Papers and Presentations
Papers
Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs
Bojian Zheng, Cody Hao Yu, Jie Wang, Yaoyao Ding, Yizhi Liu, Yida Wang, Gennady Pekhimenko
To appear in IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023
Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs [Paper]
Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko
International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023
BibTeX
@inproceedings{Hidet,
  author = {Yaoyao Ding and Cody Hao Yu and Bojian Zheng and Yizhi Liu and Yida Wang and Gennady Pekhimenko},
  editor = {Tor M. Aamodt and Natalie D. Enright Jerger and Michael M. Swift},
  title = {Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs},
  booktitle = {Proceedings of the 28th {ACM} International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, {ASPLOS} 2023, Vancouver, BC, Canada, March 25-29, 2023},
  pages = {370--384},
  publisher = {{ACM}},
  year = {2023},
  url = {https://doi.org/10.1145/3575693.3575702},
  doi = {10.1145/3575693.3575702},
  timestamp = {Thu, 02 Feb 2023 08:48:06 +0100},
  biburl = {https://dblp.org/rec/conf/asplos/DingYZLWP23.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction [Paper]
Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko
Neural Information Processing Systems (NeurIPS), November 2022
BibTeX
@inproceedings{Tempo,
  author = {Muralidhar Andoorveedu and Zhanda Zhu and Bojian Zheng and Gennady Pekhimenko},
  title = {Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction},
  booktitle = {NeurIPS},
  year = {2022},
  url = {http://papers.nips.cc/paper\_files/paper/2022/hash/4fc81f4cd2715d995018e0799262176b-Abstract-Conference.html},
  timestamp = {Thu, 11 May 2023 17:08:21 +0200},
  biburl = {https://dblp.org/rec/conf/nips/AndoorveeduZZP22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
DietCode: Automatic Optimization for Dynamic Tensor Programs [Paper, PPTX Slides, PDF Slides]
Bojian Zheng, Ziheng Jiang (co-authored), Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, Gennady Pekhimenko
Machine Learning and Systems (MLSys), August 2022
BibTeX
@inproceedings{DietCode,
  author = {Zheng, Bojian and Jiang, Ziheng and Yu, Cody Hao and Shen, Haichen and Fromm, Joshua and Liu, Yizhi and Wang, Yida and Ceze, Luis and Chen, Tianqi and Pekhimenko, Gennady},
  booktitle = {Proceedings of Machine Learning and Systems},
  editor = {D. Marculescu and Y. Chi and C. Wu},
  pages = {848--863},
  title = {DietCode: Automatic Optimization for Dynamic Tensor Programs},
  url = {https://proceedings.mlsys.org/paper_files/paper/2022/hash/f89b79c9a28d4cae22ef9e557d9fa191-Abstract.html},
  volume = {4},
  year = {2022}
}
Automatic Horizontal Fusion for GPU Kernels [Paper]
Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long
International Symposium on Code Generation and Optimization (CGO), April 2022
BibTeX
@inproceedings{HFuse,
  author = {Ao Li and Bojian Zheng and Gennady Pekhimenko and Fan Long},
  editor = {Jae W. Lee and Sebastian Hack and Tatiana Shpeisman},
  title = {Automatic Horizontal Fusion for {GPU} Kernels},
  booktitle = {{IEEE/ACM} International Symposium on Code Generation and Optimization, {CGO} 2022, Seoul, Korea, Republic of, April 2-6, 2022},
  pages = {14--27},
  publisher = {{IEEE}},
  year = {2022},
  url = {https://doi.org/10.1109/CGO53902.2022.9741270},
  doi = {10.1109/CGO53902.2022.9741270},
  timestamp = {Fri, 01 Apr 2022 09:28:21 +0200},
  biburl = {https://dblp.org/rec/conf/cgo/LiZPL22.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [Paper, PPTX Slides, PDF Slides, Talk]
Bojian Zheng, Nandita Vijaykumar, Gennady Pekhimenko
ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2020
BibTeX
@inproceedings{Echo,
  author = {Bojian Zheng and Nandita Vijaykumar and Gennady Pekhimenko},
  title = {Echo: Compiler-based {GPU} Memory Footprint Reduction for {LSTM} {RNN} Training},
  booktitle = {47th {ACM/IEEE} Annual International Symposium on Computer Architecture, {ISCA} 2020, Valencia, Spain, May 30 - June 3, 2020},
  pages = {1089--1102},
  publisher = {{IEEE}},
  year = {2020},
  url = {https://doi.org/10.1109/ISCA45697.2020.00092},
  doi = {10.1109/ISCA45697.2020.00092},
  timestamp = {Fri, 09 Jul 2021 15:51:20 +0200},
  biburl = {https://dblp.org/rec/conf/isca/ZhengVP20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
TBD Suite: Benchmarking and Profiling Tools for DNNs [Abstract]
Geoffrey Yu, Hongyu Zhu, Anand Jayarajan, Bojian Zheng, Abhishek Tiwari, Gennady Pekhimenko
Machine Learning and Systems Conference Demonstration (MLSys Demo), March 2019
BibTeX
@inproceedings{TBDSuite_Demo,
  author = {Geoffrey Yu and Hongyu Zhu and Anand Jayarajan and Bojian Zheng and Abhishek Tiwari and Gennady Pekhimenko},
  title = {{TBD} Suite: Benchmarking and Profiling Tools for {DNNs}},
  booktitle = {Machine Learning and Systems Conference Demonstration},
  year = {2019},
  url = {https://mlsys.org/Conferences/2019/doc/2019/demo_24.pdf},
}
EcoRNN: Efficient Computing of LSTM RNN on GPUs [Abstract, Poster, PDF Slides]
Bojian Zheng, Gennady Pekhimenko
IEEE/ACM International Symposium on Microarchitecture Student Research Competition (MICRO SRC, 3rd Place), October 2018
BibTeX
@inproceedings{EcoRNN,
  author = {Bojian Zheng and Gennady Pekhimenko},
  title = {{EcoRNN}: Efficient Computing of {LSTM RNN} on {GPUs}},
  booktitle = {IEEE/ACM International Symposium on Microarchitecture Student Research Competition},
  year = {2018},
  url = {https://www.microarch.org/micro51/SRC/posters/20_zheng.pdf},
}
Benchmarking and Analyzing Deep Neural Network Training [Paper]
Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand Jayarajan, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko
IEEE International Symposium on Workload Characterization (IISWC), September 2018
BibTeX
@inproceedings{TBD,
  author = {Hongyu Zhu and Mohamed Akrout and Bojian Zheng and Andrew Pelegris and Anand Jayarajan and Amar Phanishayee and Bianca Schroeder and Gennady Pekhimenko},
  title = {Benchmarking and Analyzing Deep Neural Network Training},
  booktitle = {2018 {IEEE} International Symposium on Workload Characterization, {IISWC} 2018, Raleigh, NC, USA, September 30 - October 2, 2018},
  pages = {88--100},
  publisher = {{IEEE} Computer Society},
  year = {2018},
  url = {https://doi.org/10.1109/IISWC.2018.8573476},
  doi = {10.1109/IISWC.2018.8573476},
  timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
  biburl = {https://dblp.org/rec/conf/iiswc/ZhuAZPJPSP18.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
IDEAL: Image DEnoising AcceLerator [Paper]
Mostafa Mahmoud, Bojian Zheng, Alberto Delmás Lascorz, Felix Heide, Jonathan Assouline, Paul Boucher, Emmanuel Onzon, Andreas Moshovos
IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2017
BibTeX
@inproceedings{IDEAL,
  author = {Mostafa Mahmoud and Bojian Zheng and Alberto Delmas Lascorz and Felix Heide and Jonathan Assouline and Paul Boucher and Emmanuel Onzon and Andreas Moshovos},
  editor = {Hillery C. Hunter and Jaime Moreno and Joel S. Emer and Daniel S{\'{a}}nchez},
  title = {{IDEAL}: Image DEnoising AcceLerator},
  booktitle = {Proceedings of the 50th Annual {IEEE/ACM} International Symposium on Microarchitecture, {MICRO} 2017, Cambridge, MA, USA, October 14-18, 2017},
  pages = {82--95},
  publisher = {{ACM}},
  year = {2017},
  url = {https://doi.org/10.1145/3123939.3123941},
  doi = {10.1145/3123939.3123941},
  timestamp = {Wed, 11 Aug 2021 11:51:26 +0200},
  biburl = {https://dblp.org/rec/conf/micro/MahmoudZLHABOM17.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
Presentations
DietCode: Automatic Optimization for Dynamic Tensor Programs [PPTX Slides, PDF Slides]
Annual Apache TVM Conference (TVMCon), December 2021
Google, May 2023
Recent Trends in Machine Learning Compilers [PPTX Slides, PDF Slides]
Amazon, March 2022
Recent Trends in GPU Memory Footprint Reduction Techniques (Ep.3 Weight Placement for EX-Large Models) [PPTX Slides, PDF Slides]
Amazon, February 2022
Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [PPTX Slides, PDF Slides]
AMD, July 2021
Automatic Mixed Precision (AMP) Training [PPTX Slides, PDF Slides]
Vector Institute, November 2019
Recent Trends in Machine Learning Compilers: A Survey [PPTX Slides, PDF Slides]
Presented with Shang Wang.
IBM CASCON Workshop on Compiler Driven Performance (CDP), October 2018
EcoRNN: Fused LSTM RNN Implementation with Data Layout Optimization [PDF Slides, Poster]
NSERC Computing Hardware for Emerging Intelligent Sensing Applications Annual General Meeting (COHESA AGM), July 2018