Papers and Talks

Research Publications

  1. Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs

    Bojian Zheng, Cody Hao Yu, Jie Wang, Yaoyao Ding, Yizhi Liu, Yida Wang, Gennady Pekhimenko

    To appear in IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023

  2. Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs [Paper]

    Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko

    International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023

    BibTeX
    @inproceedings{Hidet,
      author       = {Yaoyao Ding and
                      Cody Hao Yu and
                      Bojian Zheng and
                      Yizhi Liu and
                      Yida Wang and
                      Gennady Pekhimenko},
      editor       = {Tor M. Aamodt and
                      Natalie D. Enright Jerger and
                      Michael M. Swift},
      title        = {Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor
                      Programs},
      booktitle    = {Proceedings of the 28th {ACM} International Conference on Architectural
                      Support for Programming Languages and Operating Systems, Volume 2,
                      {ASPLOS} 2023, Vancouver, BC, Canada, March 25-29, 2023},
      pages        = {370--384},
      publisher    = {{ACM}},
      year         = {2023},
      url          = {https://doi.org/10.1145/3575693.3575702},
      doi          = {10.1145/3575693.3575702},
      timestamp    = {Thu, 02 Feb 2023 08:48:06 +0100},
      biburl       = {https://dblp.org/rec/conf/asplos/DingYZLWP23.bib},
      bibsource    = {dblp computer science bibliography, https://dblp.org}
    }
    
  3. Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction [Paper]

    Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko

    Neural Information Processing Systems (NeurIPS), November 2022

    BibTeX
    @inproceedings{Tempo,
      author       = {Muralidhar Andoorveedu and
                      Zhanda Zhu and
                      Bojian Zheng and
                      Gennady Pekhimenko},
      title        = {Tempo: Accelerating Transformer-Based Model Training through Memory
                      Footprint Reduction},
      booktitle    = {NeurIPS},
      year         = {2022},
      url          = {http://papers.nips.cc/paper_files/paper/2022/hash/4fc81f4cd2715d995018e0799262176b-Abstract-Conference.html},
      timestamp    = {Thu, 11 May 2023 17:08:21 +0200},
      biburl       = {https://dblp.org/rec/conf/nips/AndoorveeduZZP22.bib},
      bibsource    = {dblp computer science bibliography, https://dblp.org}
    }
    
  4. DietCode: Automatic Optimization for Dynamic Tensor Programs [Paper, PPTX Slides, PDF Slides]

    Bojian Zheng, Ziheng Jiang (co-authored), Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, Gennady Pekhimenko

    Machine Learning and Systems (MLSys), August 2022

    BibTeX
    @inproceedings{DietCode,
      author    = {Zheng, Bojian and
                   Jiang, Ziheng and
                   Yu, Cody Hao and
                   Shen, Haichen and
                   Fromm, Joshua and
                   Liu, Yizhi and
                   Wang, Yida and
                   Ceze, Luis and
                   Chen, Tianqi and
                   Pekhimenko, Gennady},
      booktitle = {Proceedings of Machine Learning and Systems},
      editor    = {D. Marculescu and Y. Chi and C. Wu},
      pages     = {848--863},
      title     = {DietCode: Automatic Optimization for Dynamic Tensor Programs},
      url       = {https://proceedings.mlsys.org/paper_files/paper/2022/hash/f89b79c9a28d4cae22ef9e557d9fa191-Abstract.html},
      volume    = {4},
      year      = {2022}
    }
    
  5. Automatic Horizontal Fusion for GPU Kernels [Paper]

    Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long

    International Symposium on Code Generation and Optimization (CGO), April 2022

    BibTeX
    @inproceedings{HFuse,
     author    = {Ao Li and
                  Bojian Zheng and
                  Gennady Pekhimenko and
                  Fan Long},
     editor    = {Jae W. Lee and
                  Sebastian Hack and
                  Tatiana Shpeisman},
     title     = {Automatic Horizontal Fusion for {GPU} Kernels},
     booktitle = {{IEEE/ACM} International Symposium on Code Generation and Optimization,
                  {CGO} 2022, Seoul, Republic of Korea, April 2-6, 2022},
     pages     = {14--27},
     publisher = {{IEEE}},
     year      = {2022},
     url       = {https://doi.org/10.1109/CGO53902.2022.9741270},
     doi       = {10.1109/CGO53902.2022.9741270},
     timestamp = {Fri, 01 Apr 2022 09:28:21 +0200},
     biburl    = {https://dblp.org/rec/conf/cgo/LiZPL22.bib},
     bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    
  6. Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [Paper, PPTX Slides, PDF Slides, Talk]

    Bojian Zheng, Nandita Vijaykumar, Gennady Pekhimenko

    ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2020

    BibTeX
    @inproceedings{Echo,
      author    = {Bojian Zheng and
                   Nandita Vijaykumar and
                   Gennady Pekhimenko},
      title     = {Echo: Compiler-based {GPU} Memory Footprint Reduction for {LSTM} {RNN}
                   Training},
      booktitle = {47th {ACM/IEEE} Annual International Symposium on Computer Architecture,
                   {ISCA} 2020, Valencia, Spain, May 30 - June 3, 2020},
      pages     = {1089--1102},
      publisher = {{IEEE}},
      year      = {2020},
      url       = {https://doi.org/10.1109/ISCA45697.2020.00092},
      doi       = {10.1109/ISCA45697.2020.00092},
      timestamp = {Fri, 09 Jul 2021 15:51:20 +0200},
      biburl    = {https://dblp.org/rec/conf/isca/ZhengVP20.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    
  7. TBD Suite: Benchmarking and Profiling Tools for DNNs [Abstract]

    Geoffrey Yu, Hongyu Zhu, Anand Jayarajan, Bojian Zheng, Abhishek Tiwari, Gennady Pekhimenko

    Machine Learning and Systems Conference Demonstration (MLSys Demo), March 2019

    BibTeX
    @inproceedings{TBDSuite_Demo,
      author    = {Geoffrey Yu and
                   Hongyu Zhu and
                   Anand Jayarajan and
                   Bojian Zheng and
                   Abhishek Tiwari and
                   Gennady Pekhimenko},
      title     = {{TBD} Suite: Benchmarking and Profiling Tools for {DNNs}},
      booktitle = {Machine Learning and Systems Conference Demonstration},
      year      = {2019},
      url       = {https://mlsys.org/Conferences/2019/doc/2019/demo_24.pdf},
    }
    
  8. EcoRNN: Efficient Computing of LSTM RNN on GPUs [Abstract, Poster, PDF Slides]

    Bojian Zheng, Gennady Pekhimenko

    IEEE/ACM International Symposium on Microarchitecture Student Research Competition (MICRO SRC, 3rd Place), October 2018

    BibTeX
    @inproceedings{EcoRNN,
      author    = {Bojian Zheng and
                   Gennady Pekhimenko},
      title     = {{EcoRNN}: Efficient Computing of {LSTM RNN} on {GPUs}},
      booktitle = {IEEE/ACM International Symposium on Microarchitecture Student Research Competition},
      year      = {2018},
      url       = {https://www.microarch.org/micro51/SRC/posters/20_zheng.pdf},
    }
    
  9. Benchmarking and Analyzing Deep Neural Network Training [Paper]

    Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand Jayarajan, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko

    IEEE International Symposium on Workload Characterization (IISWC), September 2018

    BibTeX
    @inproceedings{TBD,
      author    = {Hongyu Zhu and
                   Mohamed Akrout and
                   Bojian Zheng and
                   Andrew Pelegris and
                   Anand Jayarajan and
                   Amar Phanishayee and
                   Bianca Schroeder and
                   Gennady Pekhimenko},
      title     = {Benchmarking and Analyzing Deep Neural Network Training},
      booktitle = {2018 {IEEE} International Symposium on Workload Characterization,
                   {IISWC} 2018, Raleigh, NC, USA, September 30 - October 2, 2018},
      pages     = {88--100},
      publisher = {{IEEE} Computer Society},
      year      = {2018},
      url       = {https://doi.org/10.1109/IISWC.2018.8573476},
      doi       = {10.1109/IISWC.2018.8573476},
      timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
      biburl    = {https://dblp.org/rec/conf/iiswc/ZhuAZPJPSP18.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    
  10. IDEAL: Image DEnoising AcceLerator [Paper]

    Mostafa Mahmoud, Bojian Zheng, Alberto Delmás Lascorz, Felix Heide, Jonathan Assouline, Paul Boucher, Emmanuel Onzon, Andreas Moshovos

    IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2017

    BibTeX
    @inproceedings{IDEAL,
      author    = {Mostafa Mahmoud and
                   Bojian Zheng and
                   Alberto Delm{\'{a}}s Lascorz and
                   Felix Heide and
                   Jonathan Assouline and
                   Paul Boucher and
                   Emmanuel Onzon and
                   Andreas Moshovos},
      editor    = {Hillery C. Hunter and
                   Jaime Moreno and
                   Joel S. Emer and
                   Daniel S{\'{a}}nchez},
      title     = {{IDEAL}: Image DEnoising AcceLerator},
      booktitle = {Proceedings of the 50th Annual {IEEE/ACM} International Symposium
                   on Microarchitecture, {MICRO} 2017, Cambridge, MA, USA, October 14-18,
                   2017},
      pages     = {82--95},
      publisher = {{ACM}},
      year      = {2017},
      url       = {https://doi.org/10.1145/3123939.3123941},
      doi       = {10.1145/3123939.3123941},
      timestamp = {Wed, 11 Aug 2021 11:51:26 +0200},
      biburl    = {https://dblp.org/rec/conf/micro/MahmoudZLHABOM17.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    

Talks and Presentations

  1. DietCode: Automatic Optimization for Dynamic Tensor Programs [PPTX Slides, PDF Slides]

    Annual Apache TVM Conference (TVMCon), December 2021

    Google, May 2023

  2. Recent Trends in Machine Learning Compilers [PPTX Slides, PDF Slides]

    Amazon, March 2022

  3. Recent Trends in GPU Memory Footprint Reduction Techniques (Ep.3 Weight Placement for EX-Large Models) [PPTX Slides, PDF Slides]

    Amazon, February 2022

  4. Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [PPTX Slides, PDF Slides]

    AMD, July 2021

  5. Automatic Mixed Precision (AMP) Training [PPTX Slides, PDF Slides]

    Vector Institute, November 2019

  6. Recent Trends in Machine Learning Compilers: A Survey [PPTX Slides, PDF Slides]

    Presented with Shang Wang.

    IBM CASCON Workshop on Compiler Driven Performance (CDP), October 2018