Papers and Presentations

Papers

  1. Grape: Practical and Efficient Graph-based Executions for Dynamic Deep Neural Networks on GPUs [Paper, Code, PPTX Slides, PDF Slides, Poster]

    Bojian Zheng, Cody Hao Yu, Jie Wang, Yaoyao Ding, Yizhi Liu, Yida Wang, Gennady Pekhimenko

    IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023

    BibTeX
    @inproceedings{Grape,
      author    = {Bojian Zheng and
                   Cody Hao Yu and
                   Jie Wang and
                   Yaoyao Ding and
                   Yizhi Liu and
                   Yida Wang and
                   Gennady Pekhimenko},
      title     = {{Grape}: Practical and efficient graph-based executions for dynamic deep neural networks on {GPUs}},
      year      = {2023},
      url       = {https://www.amazon.science/publications/grape-practical-and-efficient-graph-based-executions-for-dynamic-deep-neural-networks-on-gpus},
      booktitle = {IEEE/ACM International Symposium on Microarchitecture (MICRO ’23)},
    }
    
  2. Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs [Paper]

    Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko

    International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023

    BibTeX
    @inproceedings{Hidet,
      author       = {Yaoyao Ding and
                      Cody Hao Yu and
                      Bojian Zheng and
                      Yizhi Liu and
                      Yida Wang and
                      Gennady Pekhimenko},
      editor       = {Tor M. Aamodt and
                      Natalie D. Enright Jerger and
                      Michael M. Swift},
      title        = {Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor
                      Programs},
      booktitle    = {Proceedings of the 28th {ACM} International Conference on Architectural
                      Support for Programming Languages and Operating Systems, Volume 2,
                      {ASPLOS} 2023, Vancouver, BC, Canada, March 25-29, 2023},
      pages        = {370--384},
      publisher    = {{ACM}},
      year         = {2023},
      url          = {https://doi.org/10.1145/3575693.3575702},
      doi          = {10.1145/3575693.3575702},
      timestamp    = {Thu, 02 Feb 2023 08:48:06 +0100},
      biburl       = {https://dblp.org/rec/conf/asplos/DingYZLWP23.bib},
      bibsource    = {dblp computer science bibliography, https://dblp.org}
    }
    
  3. Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction [Paper]

    Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko

    Neural Information Processing Systems (NeurIPS), November 2022

    BibTeX
    @inproceedings{Tempo,
      author       = {Muralidhar Andoorveedu and
                      Zhanda Zhu and
                      Bojian Zheng and
                      Gennady Pekhimenko},
      title        = {Tempo: Accelerating Transformer-Based Model Training through Memory
                      Footprint Reduction},
      booktitle    = {NeurIPS},
      year         = {2022},
      url          = {http://papers.nips.cc/paper_files/paper/2022/hash/4fc81f4cd2715d995018e0799262176b-Abstract-Conference.html},
      timestamp    = {Thu, 11 May 2023 17:08:21 +0200},
      biburl       = {https://dblp.org/rec/conf/nips/AndoorveeduZZP22.bib},
      bibsource    = {dblp computer science bibliography, https://dblp.org}
    }
    
  4. DietCode: Automatic Optimization for Dynamic Tensor Programs [Paper, Code, PPTX Slides, PDF Slides]

    Bojian Zheng, Ziheng Jiang (co-first author), Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, Gennady Pekhimenko

    Machine Learning and Systems (MLSys), August 2022

    BibTeX
    @inproceedings{DietCode,
      author    = {Zheng, Bojian and
                   Jiang, Ziheng and
                   Yu, Cody Hao and
                   Shen, Haichen and
                   Fromm, Joshua and
                   Liu, Yizhi and
                   Wang, Yida and
                   Ceze, Luis and
                   Chen, Tianqi and
                   Pekhimenko, Gennady},
      booktitle = {Proceedings of Machine Learning and Systems},
      editor    = {D. Marculescu and Y. Chi and C. Wu},
      pages     = {848--863},
      title     = {DietCode: Automatic Optimization for Dynamic Tensor Programs},
      url       = {https://proceedings.mlsys.org/paper_files/paper/2022/hash/f89b79c9a28d4cae22ef9e557d9fa191-Abstract.html},
      volume    = {4},
      year      = {2022}
    }
    
  5. Automatic Horizontal Fusion for GPU Kernels [Paper]

    Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long

    International Symposium on Code Generation and Optimization (CGO), April 2022

    BibTeX
    @inproceedings{HFuse,
      author    = {Ao Li and
                   Bojian Zheng and
                   Gennady Pekhimenko and
                   Fan Long},
      editor    = {Jae W. Lee and
                   Sebastian Hack and
                   Tatiana Shpeisman},
      title     = {Automatic Horizontal Fusion for {GPU} Kernels},
      booktitle = {{IEEE/ACM} International Symposium on Code Generation and Optimization,
                   {CGO} 2022, Seoul, Korea, Republic of, April 2-6, 2022},
      pages     = {14--27},
      publisher = {{IEEE}},
      year      = {2022},
      url       = {https://doi.org/10.1109/CGO53902.2022.9741270},
      doi       = {10.1109/CGO53902.2022.9741270},
      timestamp = {Fri, 01 Apr 2022 09:28:21 +0200},
      biburl    = {https://dblp.org/rec/conf/cgo/LiZPL22.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    
  6. Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [Paper, PPTX Slides, PDF Slides, Talk]

    Bojian Zheng, Nandita Vijaykumar, Gennady Pekhimenko

    ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2020

    BibTeX
    @inproceedings{Echo,
      author    = {Bojian Zheng and
                   Nandita Vijaykumar and
                   Gennady Pekhimenko},
      title     = {Echo: Compiler-based {GPU} Memory Footprint Reduction for {LSTM} {RNN}
                   Training},
      booktitle = {47th {ACM/IEEE} Annual International Symposium on Computer Architecture,
                   {ISCA} 2020, Valencia, Spain, May 30 - June 3, 2020},
      pages     = {1089--1102},
      publisher = {{IEEE}},
      year      = {2020},
      url       = {https://doi.org/10.1109/ISCA45697.2020.00092},
      doi       = {10.1109/ISCA45697.2020.00092},
      timestamp = {Fri, 09 Jul 2021 15:51:20 +0200},
      biburl    = {https://dblp.org/rec/conf/isca/ZhengVP20.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    
  7. TBD Suite: Benchmarking and Profiling Tools for DNNs [Abstract]

    Geoffrey Yu, Hongyu Zhu, Anand Jayarajan, Bojian Zheng, Abhishek Tiwari, Gennady Pekhimenko

    Machine Learning and Systems Conference Demonstration (MLSys Demo), March 2019

    BibTeX
    @inproceedings{TBDSuite_Demo,
      author    = {Geoffrey Yu and
                   Hongyu Zhu and
                   Anand Jayarajan and
                   Bojian Zheng and
                   Abhishek Tiwari and
                   Gennady Pekhimenko},
      title     = {{TBD} Suite: Benchmarking and Profiling Tools for {DNNs}},
      booktitle = {Machine Learning and Systems Conference Demonstration},
      year      = {2019},
      url       = {https://mlsys.org/Conferences/2019/doc/2019/demo_24.pdf},
    }
    
  8. EcoRNN: Efficient Computing of LSTM RNN on GPUs [Abstract, Poster, PDF Slides]

    Bojian Zheng, Gennady Pekhimenko

    IEEE/ACM International Symposium on Microarchitecture Student Research Competition (MICRO SRC, 3rd Place), October 2018

    BibTeX
    @inproceedings{EcoRNN,
      author    = {Bojian Zheng and
                   Gennady Pekhimenko},
      title     = {{EcoRNN}: Efficient Computing of {LSTM RNN} on {GPUs}},
      booktitle = {IEEE/ACM International Symposium on Microarchitecture Student Research Competition},
      year      = {2018},
      url       = {https://www.microarch.org/micro51/SRC/posters/20_zheng.pdf},
    }
    
  9. Benchmarking and Analyzing Deep Neural Network Training [Paper]

    Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand Jayarajan, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko

    IEEE International Symposium on Workload Characterization (IISWC), September 2018

    BibTeX
    @inproceedings{TBD,
      author    = {Hongyu Zhu and
                   Mohamed Akrout and
                   Bojian Zheng and
                   Andrew Pelegris and
                   Anand Jayarajan and
                   Amar Phanishayee and
                   Bianca Schroeder and
                   Gennady Pekhimenko},
      title     = {Benchmarking and Analyzing Deep Neural Network Training},
      booktitle = {2018 {IEEE} International Symposium on Workload Characterization,
                   {IISWC} 2018, Raleigh, NC, USA, September 30 - October 2, 2018},
      pages     = {88--100},
      publisher = {{IEEE} Computer Society},
      year      = {2018},
      url       = {https://doi.org/10.1109/IISWC.2018.8573476},
      doi       = {10.1109/IISWC.2018.8573476},
      timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
      biburl    = {https://dblp.org/rec/conf/iiswc/ZhuAZPJPSP18.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    
  10. IDEAL: Image DEnoising AcceLerator [Paper]

    Mostafa Mahmoud, Bojian Zheng, Alberto Delmás Lascorz, Felix Heide, Jonathan Assouline, Paul Boucher, Emmanuel Onzon, Andreas Moshovos

    IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2017

    BibTeX
    @inproceedings{IDEAL,
      author    = {Mostafa Mahmoud and
                   Bojian Zheng and
                   Alberto Delmas Lascorz and
                   Felix Heide and
                   Jonathan Assouline and
                   Paul Boucher and
                   Emmanuel Onzon and
                   Andreas Moshovos},
      editor    = {Hillery C. Hunter and
                   Jaime Moreno and
                   Joel S. Emer and
                   Daniel S{\'{a}}nchez},
      title     = {{IDEAL}: Image DEnoising AcceLerator},
      booktitle = {Proceedings of the 50th Annual {IEEE/ACM} International Symposium
                   on Microarchitecture, {MICRO} 2017, Cambridge, MA, USA, October 14-18,
                   2017},
      pages     = {82--95},
      publisher = {{ACM}},
      year      = {2017},
      url       = {https://doi.org/10.1145/3123939.3123941},
      doi       = {10.1145/3123939.3123941},
      timestamp = {Wed, 11 Aug 2021 11:51:26 +0200},
      biburl    = {https://dblp.org/rec/conf/micro/MahmoudZLHABOM17.bib},
      bibsource = {dblp computer science bibliography, https://dblp.org}
    }
    

Presentations

  1. DietCode: Automatic Optimization for Dynamic Tensor Programs

    Annual Apache TVM Conference (TVMCon), December 2021 [PPTX Slides, PDF Slides]

    Google, May 2023 [PPTX Slides, PDF Slides]

  2. Recent Trends in Machine Learning Compilers [PPTX Slides, PDF Slides]

    Amazon, March 2022

  3. Recent Trends in GPU Memory Footprint Reduction Techniques (Ep. 3: Weight Placement for Extra-Large Models) [PPTX Slides, PDF Slides]

    Amazon, February 2022

  4. Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [PPTX Slides, PDF Slides]

    AMD, July 2021

  5. Automatic Mixed Precision (AMP) Training [PPTX Slides, PDF Slides]

    Vector Institute, November 2019

  6. Recent Trends in Machine Learning Compilers: A Survey [PPTX Slides, PDF Slides]

    Presented with Shang Wang.

    IBM CASCON Workshop on Compiler Driven Performance (CDP), October 2018

  7. EcoRNN: Fused LSTM RNN Implementation with Data Layout Optimization [PDF Slides, Poster]

    NSERC Computing Hardware for Emerging Intelligent Sensing Applications Annual General Meeting (COHESA AGM), July 2018