Papers and Presentations
Papers
Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs [Paper, Code, PPTX Slides, PDF Slides, Poster]
Bojian Zheng, Cody Hao Yu, Jie Wang, Yaoyao Ding, Yizhi Liu, Yida Wang, Gennady Pekhimenko
IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023
BibTeX
@inproceedings{Grape,
  author = {Bojian Zheng and Cody Hao Yu and Jie Wang and Yaoyao Ding and Yizhi Liu and Yida Wang and Gennady Pekhimenko},
  title = {{Grape}: Practical and efficient graph-based executions for dynamic deep neural networks on {GPUs}},
  year = {2023},
  url = {https://www.amazon.science/publications/grape-practical-and-efficient-graph-based-executions-for-dynamic-deep-neural-networks-on-gpus},
  booktitle = {IEEE/ACM International Symposium on Microarchitecture (MICRO '23)},
}
Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs [Paper]
Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023
BibTeX
@inproceedings{Hidet,
  author = {Yaoyao Ding and Cody Hao Yu and Bojian Zheng and Yizhi Liu and Yida Wang and Gennady Pekhimenko},
  editor = {Tor M. Aamodt and Natalie D. Enright Jerger and Michael M. Swift},
  title = {Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs},
  booktitle = {Proceedings of the 28th {ACM} International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, {ASPLOS} 2023, Vancouver, BC, Canada, March 25-29, 2023},
  pages = {370--384},
  publisher = {{ACM}},
  year = {2023},
  url = {https://doi.org/10.1145/3575693.3575702},
  doi = {10.1145/3575693.3575702},
  timestamp = {Thu, 02 Feb 2023 08:48:06 +0100},
  biburl = {https://dblp.org/rec/conf/asplos/DingYZLWP23.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction [Paper]
Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko
Neural Information Processing Systems (NeurIPS), November 2022
BibTeX
@inproceedings{Tempo,
  author = {Muralidhar Andoorveedu and Zhanda Zhu and Bojian Zheng and Gennady Pekhimenko},
  title = {Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction},
  booktitle = {NeurIPS},
  year = {2022},
  url = {http://papers.nips.cc/paper\_files/paper/2022/hash/4fc81f4cd2715d995018e0799262176b-Abstract-Conference.html},
  timestamp = {Thu, 11 May 2023 17:08:21 +0200},
  biburl = {https://dblp.org/rec/conf/nips/AndoorveeduZZP22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
DietCode: Automatic Optimization for Dynamic Tensor Programs [Paper, Code, PPTX Slides, PDF Slides]
Bojian Zheng, Ziheng Jiang (co-authored), Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, Gennady Pekhimenko
Machine Learning and Systems (MLSys), August 2022
BibTeX
@inproceedings{DietCode,
  author = {Zheng, Bojian and Jiang, Ziheng and Yu, Cody Hao and Shen, Haichen and Fromm, Joshua and Liu, Yizhi and Wang, Yida and Ceze, Luis and Chen, Tianqi and Pekhimenko, Gennady},
  booktitle = {Proceedings of Machine Learning and Systems},
  editor = {D. Marculescu and Y. Chi and C. Wu},
  pages = {848--863},
  title = {DietCode: Automatic Optimization for Dynamic Tensor Programs},
  url = {https://proceedings.mlsys.org/paper_files/paper/2022/hash/f89b79c9a28d4cae22ef9e557d9fa191-Abstract.html},
  volume = {4},
  year = {2022}
}
Automatic Horizontal Fusion for GPU Kernels [Paper]
Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long
International Symposium on Code Generation and Optimization (CGO), April 2022
BibTeX
@inproceedings{HFuse,
  author = {Ao Li and Bojian Zheng and Gennady Pekhimenko and Fan Long},
  editor = {Jae W. Lee and Sebastian Hack and Tatiana Shpeisman},
  title = {Automatic Horizontal Fusion for {GPU} Kernels},
  booktitle = {{IEEE/ACM} International Symposium on Code Generation and Optimization, {CGO} 2022, Seoul, Korea, Republic of, April 2-6, 2022},
  pages = {14--27},
  publisher = {{IEEE}},
  year = {2022},
  url = {https://doi.org/10.1109/CGO53902.2022.9741270},
  doi = {10.1109/CGO53902.2022.9741270},
  timestamp = {Fri, 01 Apr 2022 09:28:21 +0200},
  biburl = {https://dblp.org/rec/conf/cgo/LiZPL22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [Paper, PPTX Slides, PDF Slides, Talk]
Bojian Zheng, Nandita Vijaykumar, Gennady Pekhimenko
ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2020
BibTeX
@inproceedings{Echo,
  author = {Bojian Zheng and Nandita Vijaykumar and Gennady Pekhimenko},
  title = {Echo: Compiler-based {GPU} Memory Footprint Reduction for {LSTM} {RNN} Training},
  booktitle = {47th {ACM/IEEE} Annual International Symposium on Computer Architecture, {ISCA} 2020, Valencia, Spain, May 30 - June 3, 2020},
  pages = {1089--1102},
  publisher = {{IEEE}},
  year = {2020},
  url = {https://doi.org/10.1109/ISCA45697.2020.00092},
  doi = {10.1109/ISCA45697.2020.00092},
  timestamp = {Fri, 09 Jul 2021 15:51:20 +0200},
  biburl = {https://dblp.org/rec/conf/isca/ZhengVP20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
TBD Suite: Benchmarking and Profiling Tools for DNNs [Abstract]
Geoffrey Yu, Hongyu Zhu, Anand Jayarajan, Bojian Zheng, Abhishek Tiwari, Gennady Pekhimenko
Machine Learning and Systems Conference Demonstration (MLSys Demo), March 2019
BibTeX
@inproceedings{TBDSuite_Demo,
  author = {Geoffrey Yu and Hongyu Zhu and Anand Jayarajan and Bojian Zheng and Abhishek Tiwari and Gennady Pekhimenko},
  title = {{TBD} Suite: Benchmarking and Profiling Tools for {DNNs}},
  booktitle = {Machine Learning and Systems Conference Demonstration},
  year = {2019},
  url = {https://mlsys.org/Conferences/2019/doc/2019/demo_24.pdf},
}
EcoRNN: Efficient Computing of LSTM RNN on GPUs [Abstract, Poster, PDF Slides]
Bojian Zheng, Gennady Pekhimenko
IEEE/ACM International Symposium on Microarchitecture Student Research Competition (MICRO SRC, 3rd Place), October 2018
BibTeX
@inproceedings{EcoRNN,
  author = {Bojian Zheng and Gennady Pekhimenko},
  title = {{EcoRNN}: Efficient Computing of {LSTM RNN} on {GPUs}},
  booktitle = {IEEE/ACM International Symposium on Microarchitecture Student Research Competition},
  year = {2018},
  url = {https://www.microarch.org/micro51/SRC/posters/20_zheng.pdf},
}
Benchmarking and Analyzing Deep Neural Network Training [Paper]
Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand Jayarajan, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko
IEEE International Symposium on Workload Characterization (IISWC), September 2018
BibTeX
@inproceedings{TBD,
  author = {Hongyu Zhu and Mohamed Akrout and Bojian Zheng and Andrew Pelegris and Anand Jayarajan and Amar Phanishayee and Bianca Schroeder and Gennady Pekhimenko},
  title = {Benchmarking and Analyzing Deep Neural Network Training},
  booktitle = {2018 {IEEE} International Symposium on Workload Characterization, {IISWC} 2018, Raleigh, NC, USA, September 30 - October 2, 2018},
  pages = {88--100},
  publisher = {{IEEE} Computer Society},
  year = {2018},
  url = {https://doi.org/10.1109/IISWC.2018.8573476},
  doi = {10.1109/IISWC.2018.8573476},
  timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
  biburl = {https://dblp.org/rec/conf/iiswc/ZhuAZPJPSP18.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
IDEAL: Image DEnoising AcceLerator [Paper]
Mostafa Mahmoud, Bojian Zheng, Alberto Delmás Lascorz, Felix Heide, Jonathan Assouline, Paul Boucher, Emmanuel Onzon, Andreas Moshovos
IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2017
BibTeX
@inproceedings{IDEAL,
  author = {Mostafa Mahmoud and Bojian Zheng and Alberto Delmas Lascorz and Felix Heide and Jonathan Assouline and Paul Boucher and Emmanuel Onzon and Andreas Moshovos},
  editor = {Hillery C. Hunter and Jaime Moreno and Joel S. Emer and Daniel S{\'{a}}nchez},
  title = {{IDEAL}: Image DEnoising AcceLerator},
  booktitle = {Proceedings of the 50th Annual {IEEE/ACM} International Symposium on Microarchitecture, {MICRO} 2017, Cambridge, MA, USA, October 14-18, 2017},
  pages = {82--95},
  publisher = {{ACM}},
  year = {2017},
  url = {https://doi.org/10.1145/3123939.3123941},
  doi = {10.1145/3123939.3123941},
  timestamp = {Wed, 11 Aug 2021 11:51:26 +0200},
  biburl = {https://dblp.org/rec/conf/micro/MahmoudZLHABOM17.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Presentations
DietCode: Automatic Optimization for Dynamic Tensor Programs
Annual Apache TVM Conference (TVMCon), December 2021 [PPTX Slides, PDF Slides]
Google, May 2023 [PPTX Slides, PDF Slides]
Recent Trends in Machine Learning Compilers [PPTX Slides, PDF Slides]
Amazon, March 2022
Recent Trends in GPU Memory Footprint Reduction Techniques (Ep.3 Weight Placement for EX-Large Models) [PPTX Slides, PDF Slides]
Amazon, February 2022
Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [PPTX Slides, PDF Slides]
AMD, July 2021
Automatic Mixed Precision (AMP) Training [PPTX Slides, PDF Slides]
Vector Institute, November 2019
Recent Trends in Machine Learning Compilers: A Survey [PPTX Slides, PDF Slides]
Presented with Shang Wang.
IBM CASCON Workshop on Compiler Driven Performance (CDP), October 2018
EcoRNN: Fused LSTM RNN Implementation with Data Layout Optimization [PDF Slides, Poster]
NSERC Computing Hardware for Emerging Intelligent Sensing Applications Annual General Meeting (COHESA AGM), July 2018