Papers and Talks

Research Publications

  1. Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs [Paper]

    Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko

    Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023

    BibTeX
    @misc{Hidet,
      title     = {Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs},
      author    = {Yaoyao Ding and
                   Cody Hao Yu and
                   Bojian Zheng and
                   Yizhi Liu and
                   Yida Wang and
                   Gennady Pekhimenko},
      doi       = {10.48550/ARXIV.2210.09603},
      url       = {https://arxiv.org/abs/2210.09603},
      keywords  = {Machine Learning (cs.LG),
                   Artificial Intelligence (cs.AI),
                   Programming Languages (cs.PL),
                   FOS: Computer and information sciences},
      publisher = {arXiv},
      year      = {2022},
      copyright = {arXiv.org perpetual, non-exclusive license}
    }
    
  2. Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction [Paper]

    Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko

    Neural Information Processing Systems (NeurIPS), November 2022

    BibTeX
    @inproceedings{Tempo,
      title     = {Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction},
      author    = {Muralidhar Andoorveedu and
                   Zhanda Zhu and
                   Bojian Zheng and
                   Gennady Pekhimenko},
      booktitle = {Advances in Neural Information Processing Systems},
      editor    = {Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
      year      = {2022},
      url       = {https://openreview.net/forum?id=xqyEG7EhTZ}
    }
    
  3. DietCode: Automatic Optimization for Dynamic Tensor Programs [Paper, PPTX Slides, PDF Slides]

    Bojian Zheng, Ziheng Jiang (co-authored), Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, Gennady Pekhimenko

    Machine Learning and Systems (MLSys), August 2022

    BibTeX
    @inproceedings{DietCode,
      author    = {Zheng, Bojian and
                   Jiang, Ziheng and
                   Yu, Cody Hao and
                   Shen, Haichen and
                   Fromm, Joshua and
                   Liu, Yizhi and
                   Wang, Yida and
                   Ceze, Luis and
                   Chen, Tianqi and
                   Pekhimenko, Gennady},
      booktitle = {Proceedings of Machine Learning and Systems},
      editor    = {D. Marculescu and Y. Chi and C. Wu},
      pages     = {848--863},
      title     = {DietCode: Automatic Optimization for Dynamic Tensor Programs},
      url       = {https://proceedings.mlsys.org/paper/2022/file/fa7cdfad1a5aaf8370ebeda47a1ff1c3-Paper.pdf},
      volume    = {4},
      year      = {2022}
    }
    
  4. Automatic Horizontal Fusion for GPU Kernels [Paper]

    Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long

    International Symposium on Code Generation and Optimization (CGO), February 2022

    BibTeX
    @inproceedings{HFuse,
      author    = {Ao Li and
                   Bojian Zheng and
                   Gennady Pekhimenko and
                   Fan Long},
      editor    = {Jae W. Lee and
                   Sebastian Hack and
                   Tatiana Shpeisman},
      title     = {Automatic Horizontal Fusion for {GPU} Kernels},
      booktitle = {{IEEE/ACM} International Symposium on Code Generation and Optimization,
                   {CGO} 2022, Seoul, Korea, Republic of, April 2-6, 2022},
      pages     = {14--27},
      publisher = {{IEEE}},
      year      = {2022},
      url       = {https://doi.org/10.1109/CGO53902.2022.9741270},
      doi       = {10.1109/CGO53902.2022.9741270},
      timestamp = {Fri, 01 Apr 2022 09:28:21 +0200},
      biburl    = {https://dblp.org/rec/conf/cgo/LiZPL22.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    
  5. Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [Paper, PPTX Slides, PDF Slides, Talk]

    Bojian Zheng, Nandita Vijaykumar, Gennady Pekhimenko

    ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2020

    BibTeX
    @inproceedings{Echo,
      author    = {Bojian Zheng and
                   Nandita Vijaykumar and
                   Gennady Pekhimenko},
      title     = {Echo: Compiler-based {GPU} Memory Footprint Reduction for {LSTM} {RNN}
                   Training},
      booktitle = {47th {ACM/IEEE} Annual International Symposium on Computer Architecture,
                   {ISCA} 2020, Valencia, Spain, May 30 - June 3, 2020},
      pages     = {1089--1102},
      publisher = {{IEEE}},
      year      = {2020},
      url       = {https://doi.org/10.1109/ISCA45697.2020.00092},
      doi       = {10.1109/ISCA45697.2020.00092},
      timestamp = {Fri, 09 Jul 2021 15:51:20 +0200},
      biburl    = {https://dblp.org/rec/conf/isca/ZhengVP20.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    
  6. TBD Suite: Benchmarking and Profiling Tools for DNNs [Abstract]

    Geoffrey Yu, Hongyu Zhu, Anand Jayarajan, Bojian Zheng, Abhishek Tiwari, Gennady Pekhimenko

    Machine Learning and Systems Conference Demonstration (MLSys Demo), March 2019

    BibTeX
    @inproceedings{TBDSuite_Demo,
      author    = {Geoffrey Yu and
                   Hongyu Zhu and
                   Anand Jayarajan and
                   Bojian Zheng and
                   Abhishek Tiwari and
                   Gennady Pekhimenko},
      title     = {{TBD} Suite: Benchmarking and Profiling Tools for {DNNs}},
      booktitle = {Machine Learning and Systems Conference Demonstration},
      year      = {2019},
      url       = {https://mlsys.org/Conferences/2019/doc/2019/demo_24.pdf}
    }
    
  7. EcoRNN: Efficient Computing of LSTM RNN on GPUs [Abstract, Poster, PDF Slides]

    Bojian Zheng, Gennady Pekhimenko

    IEEE/ACM International Symposium on Microarchitecture Student Research Competition (MICRO’51 SRC, 3rd Place), October 2018

    BibTeX
    @inproceedings{EcoRNN,
      author    = {Bojian Zheng and
                   Gennady Pekhimenko},
      title     = {{EcoRNN}: Efficient Computing of {LSTM RNN} on {GPUs}},
      booktitle = {IEEE/ACM International Symposium on Microarchitecture Student Research Competition},
      year      = {2018},
      url       = {https://www.microarch.org/micro51/SRC/posters/20_zheng.pdf}
    }
    
  8. Benchmarking and Analyzing Deep Neural Network Training [Paper]

    Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand Jayarajan, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko

    IEEE International Symposium on Workload Characterization (IISWC), July 2018

    BibTeX
    @inproceedings{TBD,
      author    = {Hongyu Zhu and
                   Mohamed Akrout and
                   Bojian Zheng and
                   Andrew Pelegris and
                   Anand Jayarajan and
                   Amar Phanishayee and
                   Bianca Schroeder and
                   Gennady Pekhimenko},
      title     = {Benchmarking and Analyzing Deep Neural Network Training},
      booktitle = {2018 {IEEE} International Symposium on Workload Characterization,
                   {IISWC} 2018, Raleigh, NC, USA, September 30 - October 2, 2018},
      pages     = {88--100},
      publisher = {{IEEE} Computer Society},
      year      = {2018},
      url       = {https://doi.org/10.1109/IISWC.2018.8573476},
      doi       = {10.1109/IISWC.2018.8573476},
      timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
      biburl    = {https://dblp.org/rec/conf/iiswc/ZhuAZPJPSP18.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    
  9. IDEAL: Image DEnoising AcceLerator [Paper]

    Mostafa Mahmoud, Bojian Zheng, Alberto Delmás Lascorz, Felix Heide, Jonathan Assouline, Paul Boucher, Emmanuel Onzon, Andreas Moshovos

    IEEE/ACM International Symposium on Microarchitecture (MICRO’50), October 2017

    BibTeX
    @inproceedings{IDEAL,
      author    = {Mostafa Mahmoud and
                   Bojian Zheng and
                   Alberto Delmas Lascorz and
                   Felix Heide and
                   Jonathan Assouline and
                   Paul Boucher and
                   Emmanuel Onzon and
                   Andreas Moshovos},
      editor    = {Hillery C. Hunter and
                   Jaime Moreno and
                   Joel S. Emer and
                   Daniel S{\'{a}}nchez},
      title     = {{IDEAL}: Image DEnoising AcceLerator},
      booktitle = {Proceedings of the 50th Annual {IEEE/ACM} International Symposium
                   on Microarchitecture, {MICRO} 2017, Cambridge, MA, USA, October 14-18,
                   2017},
      pages     = {82--95},
      publisher = {{ACM}},
      year      = {2017},
      url       = {https://doi.org/10.1145/3123939.3123941},
      doi       = {10.1145/3123939.3123941},
      timestamp = {Wed, 11 Aug 2021 11:51:26 +0200},
      biburl    = {https://dblp.org/rec/conf/micro/MahmoudZLHABOM17.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    

Talks

  1. DietCode: Automatic Optimization for Dynamic Tensor Programs [PPTX Slides, PDF Slides]

    Bojian Zheng, Ziheng Jiang (co-authored)

    TVM Conference, December 2021

  2. Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [PPTX Slides, PDF Slides]

    Bojian Zheng

    AMD, July 2021

  3. Automatic Mixed Precision (AMP) Training [PPTX Slides, PDF Slides]

    Bojian Zheng

    Vector Institute, November 2019

  4. Recent Trends in Machine Learning Compilers: A Survey [PPTX Slides, PDF Slides]

    Bojian Zheng, Shang Wang (co-authored)

    IBM CASCON Workshop on Compiler Driven Performance (CDP), October 2018