Papers and Presentations

Papers

Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs [Paper, Code, PPTX Slides, PDF Slides, Poster]

Bojian Zheng, Cody Hao Yu, Jie Wang, Yaoyao Ding, Yizhi Liu, Yida Wang, Gennady Pekhimenko

IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023

BibTeX

@inproceedings{Grape,
  author    = {Bojian Zheng and
               Cody Hao Yu and
               Jie Wang and
               Yaoyao Ding and
               Yizhi Liu and
               Yida Wang and
               Gennady Pekhimenko},
  title     = {{Grape}: Practical and efficient graph-based executions for dynamic deep neural networks on {GPUs}},
  year      = {2023},
  url       = {https://www.amazon.science/publications/grape-practical-and-efficient-graph-based-executions-for-dynamic-deep-neural-networks-on-gpus},
  booktitle = {IEEE/ACM International Symposium on Microarchitecture (MICRO '23)},
}

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs [Paper]

Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko

International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023

BibTeX

@inproceedings{Hidet,
  author       = {Yaoyao Ding and
                  Cody Hao Yu and
                  Bojian Zheng and
                  Yizhi Liu and
                  Yida Wang and
                  Gennady Pekhimenko},
  editor       = {Tor M. Aamodt and
                  Natalie D. Enright Jerger and
                  Michael M. Swift},
  title        = {Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor
                  Programs},
  booktitle    = {Proceedings of the 28th {ACM} International Conference on Architectural
                  Support for Programming Languages and Operating Systems, Volume 2,
                  {ASPLOS} 2023, Vancouver, BC, Canada, March 25-29, 2023},
  pages        = {370--384},
  publisher    = {{ACM}},
  year         = {2023},
  url          = {https://doi.org/10.1145/3575693.3575702},
  doi          = {10.1145/3575693.3575702},
  timestamp    = {Thu, 02 Feb 2023 08:48:06 +0100},
  biburl       = {https://dblp.org/rec/conf/asplos/DingYZLWP23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction [Paper]

Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko

Neural Information Processing Systems (NeurIPS), November 2022

BibTeX

@inproceedings{Tempo,
  author       = {Muralidhar Andoorveedu and
                  Zhanda Zhu and
                  Bojian Zheng and
                  Gennady Pekhimenko},
  title        = {Tempo: Accelerating Transformer-Based Model Training through Memory
                  Footprint Reduction},
  booktitle    = {NeurIPS},
  year         = {2022},
  url          = {http://papers.nips.cc/paper_files/paper/2022/hash/4fc81f4cd2715d995018e0799262176b-Abstract-Conference.html},
  timestamp    = {Thu, 11 May 2023 17:08:21 +0200},
  biburl       = {https://dblp.org/rec/conf/nips/AndoorveeduZZP22.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

DietCode: Automatic Optimization for Dynamic Tensor Programs [Paper, Code, PPTX Slides, PDF Slides]

Bojian Zheng, Ziheng Jiang (co-authored), Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, Gennady Pekhimenko

Machine Learning and Systems (MLSys), August 2022

BibTeX

@inproceedings{DietCode,
  author    = {Zheng, Bojian and
               Jiang, Ziheng and
               Yu, Cody Hao and
               Shen, Haichen and
               Fromm, Joshua and
               Liu, Yizhi and
               Wang, Yida and
               Ceze, Luis and
               Chen, Tianqi and
               Pekhimenko, Gennady},
  booktitle = {Proceedings of Machine Learning and Systems},
  editor    = {D. Marculescu and Y. Chi and C. Wu},
  pages     = {848--863},
  title     = {DietCode: Automatic Optimization for Dynamic Tensor Programs},
  url       = {https://proceedings.mlsys.org/paper_files/paper/2022/hash/f89b79c9a28d4cae22ef9e557d9fa191-Abstract.html},
  volume    = {4},
  year      = {2022}
}

Automatic Horizontal Fusion for GPU Kernels [Paper]

Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long

International Symposium on Code Generation and Optimization (CGO), April 2022

BibTeX

@inproceedings{HFuse,
 author    = {Ao Li and
              Bojian Zheng and
              Gennady Pekhimenko and
              Fan Long},
 editor    = {Jae W. Lee and
              Sebastian Hack and
              Tatiana Shpeisman},
 title     = {Automatic Horizontal Fusion for {GPU} Kernels},
 booktitle = {{IEEE/ACM} International Symposium on Code Generation and Optimization,
              {CGO} 2022, Seoul, Korea, Republic of, April 2-6, 2022},
 pages     = {14--27},
 publisher = {{IEEE}},
 year      = {2022},
 url       = {https://doi.org/10.1109/CGO53902.2022.9741270},
 doi       = {10.1109/CGO53902.2022.9741270},
 timestamp = {Fri, 01 Apr 2022 09:28:21 +0200},
 biburl    = {https://dblp.org/rec/conf/cgo/LiZPL22.bib},
 bibsource = {dblp computer science bibliography, https://dblp.org}
}

Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [Paper, PPTX Slides, PDF Slides, Talk]

Bojian Zheng, Nandita Vijaykumar, Gennady Pekhimenko

ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2020

BibTeX

@inproceedings{Echo,
  author    = {Bojian Zheng and
               Nandita Vijaykumar and
               Gennady Pekhimenko},
  title     = {Echo: Compiler-based {GPU} Memory Footprint Reduction for {LSTM} {RNN}
               Training},
  booktitle = {47th {ACM/IEEE} Annual International Symposium on Computer Architecture,
               {ISCA} 2020, Valencia, Spain, May 30 - June 3, 2020},
  pages     = {1089--1102},
  publisher = {{IEEE}},
  year      = {2020},
  url       = {https://doi.org/10.1109/ISCA45697.2020.00092},
  doi       = {10.1109/ISCA45697.2020.00092},
  timestamp = {Fri, 09 Jul 2021 15:51:20 +0200},
  biburl    = {https://dblp.org/rec/conf/isca/ZhengVP20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

TBD Suite: Benchmarking and Profiling Tools for DNNs [Abstract]

Geoffrey Yu, Hongyu Zhu, Anand Jayarajan, Bojian Zheng, Abhishek Tiwari, Gennady Pekhimenko

Machine Learning and Systems Conference Demonstration (MLSys Demo), March 2019

BibTeX

@inproceedings{TBDSuite_Demo,
  author    = {Geoffrey Yu and
               Hongyu Zhu and
               Anand Jayarajan and
               Bojian Zheng and
               Abhishek Tiwari and
               Gennady Pekhimenko},
  title     = {{TBD} Suite: Benchmarking and Profiling Tools for {DNNs}},
  booktitle = {Machine Learning and Systems Conference Demonstration},
  year      = {2019},
  url       = {https://mlsys.org/Conferences/2019/doc/2019/demo_24.pdf},
}

EcoRNN: Efficient Computing of LSTM RNN on GPUs [Abstract, Poster, PDF Slides]

Bojian Zheng, Gennady Pekhimenko

IEEE/ACM International Symposium on Microarchitecture Student Research Competition (MICRO SRC, 3rd Place), October 2018

BibTeX

@inproceedings{EcoRNN,
  author    = {Bojian Zheng and
               Gennady Pekhimenko},
  title     = {{EcoRNN}: Efficient Computing of {LSTM RNN} on {GPUs}},
  booktitle = {IEEE/ACM International Symposium on Microarchitecture Student Research Competition},
  year      = {2018},
  url       = {https://www.microarch.org/micro51/SRC/posters/20_zheng.pdf},
}

Benchmarking and Analyzing Deep Neural Network Training [Paper]

Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand Jayarajan, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko

IEEE International Symposium on Workload Characterization (IISWC), September 2018

BibTeX

@inproceedings{TBD,
  author    = {Hongyu Zhu and
               Mohamed Akrout and
               Bojian Zheng and
               Andrew Pelegris and
               Anand Jayarajan and
               Amar Phanishayee and
               Bianca Schroeder and
               Gennady Pekhimenko},
  title     = {Benchmarking and Analyzing Deep Neural Network Training},
  booktitle = {2018 {IEEE} International Symposium on Workload Characterization,
               {IISWC} 2018, Raleigh, NC, USA, September 30 - October 2, 2018},
  pages     = {88--100},
  publisher = {{IEEE} Computer Society},
  year      = {2018},
  url       = {https://doi.org/10.1109/IISWC.2018.8573476},
  doi       = {10.1109/IISWC.2018.8573476},
  timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
  biburl    = {https://dblp.org/rec/conf/iiswc/ZhuAZPJPSP18.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

IDEAL: Image DEnoising AcceLerator [Paper]

Mostafa Mahmoud, Bojian Zheng, Alberto Delmás Lascorz, Felix Heide, Jonathan Assouline, Paul Boucher, Emmanuel Onzon, Andreas Moshovos

IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2017

BibTeX

@inproceedings{IDEAL,
  author    = {Mostafa Mahmoud and
               Bojian Zheng and
               Alberto Delmas Lascorz and
               Felix Heide and
               Jonathan Assouline and
               Paul Boucher and
               Emmanuel Onzon and
               Andreas Moshovos},
  editor    = {Hillery C. Hunter and
               Jaime Moreno and
               Joel S. Emer and
               Daniel S{\'{a}}nchez},
  title     = {{IDEAL}: Image DEnoising AcceLerator},
  booktitle = {Proceedings of the 50th Annual {IEEE/ACM} International Symposium
               on Microarchitecture, {MICRO} 2017, Cambridge, MA, USA, October 14-18,
               2017},
  pages     = {82--95},
  publisher = {{ACM}},
  year      = {2017},
  url       = {https://doi.org/10.1145/3123939.3123941},
  doi       = {10.1145/3123939.3123941},
  timestamp = {Wed, 11 Aug 2021 11:51:26 +0200},
  biburl    = {https://dblp.org/rec/conf/micro/MahmoudZLHABOM17.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Presentations

DietCode: Automatic Optimization for Dynamic Tensor Programs
Annual Apache TVM Conference (TVMCon), December 2021 [PPTX Slides, PDF Slides]
Google, May 2023 [PPTX Slides, PDF Slides]

Recent Trends in Machine Learning Compilers [PPTX Slides, PDF Slides]
Amazon, March 2022

Recent Trends in GPU Memory Footprint Reduction Techniques (Ep.3 Weight Placement for EX-Large Models) [PPTX Slides, PDF Slides]
Amazon, February 2022

Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [PPTX Slides, PDF Slides]
AMD, July 2021

Automatic Mixed Precision (AMP) Training [PPTX Slides, PDF Slides]
Vector Institute, November 2019

Recent Trends in Machine Learning Compilers: A Survey [PPTX Slides, PDF Slides]
Presented with Shang Wang.
IBM CASCON Workshop on Compiler Driven Performance (CDP), October 2018

EcoRNN: Fused LSTM RNN Implementation with Data Layout Optimization [PDF Slides, Poster]
NSERC Computing Hardware for Emerging Intelligent Sensing Applications Annual General Meeting (COHESA AGM), July 2018
