Papers and Talks

Research Publications

Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs

Bojian Zheng, Cody Hao Yu, Jie Wang, Yaoyao Ding, Yizhi Liu, Yida Wang, Gennady Pekhimenko

To appear in IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2023

Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs [Paper]

Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko

International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), March 2023

BibTeX

@inproceedings{Hidet,
  author       = {Yaoyao Ding and
                  Cody Hao Yu and
                  Bojian Zheng and
                  Yizhi Liu and
                  Yida Wang and
                  Gennady Pekhimenko},
  editor       = {Tor M. Aamodt and
                  Natalie D. Enright Jerger and
                  Michael M. Swift},
  title        = {Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor
                  Programs},
  booktitle    = {Proceedings of the 28th {ACM} International Conference on Architectural
                  Support for Programming Languages and Operating Systems, Volume 2,
                  {ASPLOS} 2023, Vancouver, BC, Canada, March 25-29, 2023},
  pages        = {370--384},
  publisher    = {{ACM}},
  year         = {2023},
  url          = {https://doi.org/10.1145/3575693.3575702},
  doi          = {10.1145/3575693.3575702},
  timestamp    = {Thu, 02 Feb 2023 08:48:06 +0100},
  biburl       = {https://dblp.org/rec/conf/asplos/DingYZLWP23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction [Paper]

Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko

Neural Information Processing Systems (NeurIPS), November 2022

BibTeX

@inproceedings{Tempo,
  author       = {Muralidhar Andoorveedu and
                  Zhanda Zhu and
                  Bojian Zheng and
                  Gennady Pekhimenko},
  title        = {Tempo: Accelerating Transformer-Based Model Training through Memory
                  Footprint Reduction},
  booktitle    = {NeurIPS},
  year         = {2022},
  url          = {http://papers.nips.cc/paper\_files/paper/2022/hash/4fc81f4cd2715d995018e0799262176b-Abstract-Conference.html},
  timestamp    = {Thu, 11 May 2023 17:08:21 +0200},
  biburl       = {https://dblp.org/rec/conf/nips/AndoorveeduZZP22.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

DietCode: Automatic Optimization for Dynamic Tensor Programs [Paper, PPTX Slides, PDF Slides]

Bojian Zheng, Ziheng Jiang (co-authored), Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, Gennady Pekhimenko

Machine Learning and Systems (MLSys), August 2022

BibTeX

@inproceedings{DietCode,
  author    = {Zheng, Bojian and
               Jiang, Ziheng and
               Yu, Cody Hao and
               Shen, Haichen and
               Fromm, Joshua and
               Liu, Yizhi and
               Wang, Yida and
               Ceze, Luis and
               Chen, Tianqi and
               Pekhimenko, Gennady},
  booktitle = {Proceedings of Machine Learning and Systems},
  editor    = {D. Marculescu and Y. Chi and C. Wu},
  pages     = {848--863},
  title     = {DietCode: Automatic Optimization for Dynamic Tensor Programs},
  url       = {https://proceedings.mlsys.org/paper_files/paper/2022/hash/f89b79c9a28d4cae22ef9e557d9fa191-Abstract.html},
  volume    = {4},
  year      = {2022}
}

Automatic Horizontal Fusion for GPU Kernels [Paper]

Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long

International Symposium on Code Generation and Optimization (CGO), April 2022

BibTeX

@inproceedings{HFuse,
 author    = {Ao Li and
              Bojian Zheng and
              Gennady Pekhimenko and
              Fan Long},
 editor    = {Jae W. Lee and
              Sebastian Hack and
              Tatiana Shpeisman},
 title     = {Automatic Horizontal Fusion for {GPU} Kernels},
 booktitle = {{IEEE/ACM} International Symposium on Code Generation and Optimization,
              {CGO} 2022, Seoul, Korea, Republic of, April 2-6, 2022},
 pages     = {14--27},
 publisher = {{IEEE}},
 year      = {2022},
 url       = {https://doi.org/10.1109/CGO53902.2022.9741270},
 doi       = {10.1109/CGO53902.2022.9741270},
 timestamp = {Fri, 01 Apr 2022 09:28:21 +0200},
 biburl    = {https://dblp.org/rec/conf/cgo/LiZPL22.bib},
 bibsource = {dblp computer science bibliography, https://dblp.org}
}

Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [Paper, PPTX Slides, PDF Slides, Talk]

Bojian Zheng, Nandita Vijaykumar, Gennady Pekhimenko

ACM/IEEE International Symposium on Computer Architecture (ISCA), June 2020

BibTeX

@inproceedings{Echo,
  author    = {Bojian Zheng and
               Nandita Vijaykumar and
               Gennady Pekhimenko},
  title     = {Echo: Compiler-based {GPU} Memory Footprint Reduction for {LSTM} {RNN}
               Training},
  booktitle = {47th {ACM/IEEE} Annual International Symposium on Computer Architecture,
               {ISCA} 2020, Valencia, Spain, May 30 - June 3, 2020},
  pages     = {1089--1102},
  publisher = {{IEEE}},
  year      = {2020},
  url       = {https://doi.org/10.1109/ISCA45697.2020.00092},
  doi       = {10.1109/ISCA45697.2020.00092},
  timestamp = {Fri, 09 Jul 2021 15:51:20 +0200},
  biburl    = {https://dblp.org/rec/conf/isca/ZhengVP20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

TBD Suite: Benchmarking and Profiling Tools for DNNs [Abstract]

Geoffrey Yu, Hongyu Zhu, Anand Jayarajan, Bojian Zheng, Abhishek Tiwari, Gennady Pekhimenko

Machine Learning and Systems Conference Demonstration (MLSys Demo), March 2019

BibTeX

@inproceedings{TBDSuite_Demo,
  author    = {Geoffrey Yu and
               Hongyu Zhu and
               Anand Jayarajan and
               Bojian Zheng and
               Abhishek Tiwari and
               Gennady Pekhimenko},
  title     = {{TBD} Suite: Benchmarking and Profiling Tools for {DNNs}},
  booktitle = {Machine Learning and Systems Conference Demonstration},
  year      = {2019},
  url       = {https://mlsys.org/Conferences/2019/doc/2019/demo_24.pdf},
}

EcoRNN: Efficient Computing of LSTM RNN on GPUs [Abstract, Poster, PDF Slides]

Bojian Zheng, Gennady Pekhimenko

IEEE/ACM International Symposium on Microarchitecture Student Research Competition (MICRO SRC, 3rd Place), October 2018

BibTeX

@inproceedings{EcoRNN,
  author    = {Bojian Zheng and
               Gennady Pekhimenko},
  title     = {{EcoRNN}: Efficient Computing of {LSTM RNN} on {GPUs}},
  booktitle = {IEEE/ACM International Symposium on Microarchitecture Student Research Competition},
  year      = {2018},
  url       = {https://www.microarch.org/micro51/SRC/posters/20_zheng.pdf},
}

Benchmarking and Analyzing Deep Neural Network Training [Paper]

Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand Jayarajan, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko

IEEE International Symposium on Workload Characterization (IISWC), September 2018

BibTeX

@inproceedings{TBD,
  author    = {Hongyu Zhu and
               Mohamed Akrout and
               Bojian Zheng and
               Andrew Pelegris and
               Anand Jayarajan and
               Amar Phanishayee and
               Bianca Schroeder and
               Gennady Pekhimenko},
  title     = {Benchmarking and Analyzing Deep Neural Network Training},
  booktitle = {2018 {IEEE} International Symposium on Workload Characterization,
               {IISWC} 2018, Raleigh, NC, USA, September 30 - October 2, 2018},
  pages     = {88--100},
  publisher = {{IEEE} Computer Society},
  year      = {2018},
  url       = {https://doi.org/10.1109/IISWC.2018.8573476},
  doi       = {10.1109/IISWC.2018.8573476},
  timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
  biburl    = {https://dblp.org/rec/conf/iiswc/ZhuAZPJPSP18.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

IDEAL: Image DEnoising AcceLerator [Paper]

Mostafa Mahmoud, Bojian Zheng, Alberto Delmás Lascorz, Felix Heide, Jonathan Assouline, Paul Boucher, Emmanuel Onzon, Andreas Moshovos

IEEE/ACM International Symposium on Microarchitecture (MICRO), October 2017

BibTeX

@inproceedings{IDEAL,
  author    = {Mostafa Mahmoud and
               Bojian Zheng and
               Alberto Delmas Lascorz and
               Felix Heide and
               Jonathan Assouline and
               Paul Boucher and
               Emmanuel Onzon and
               Andreas Moshovos},
  editor    = {Hillery C. Hunter and
               Jaime Moreno and
               Joel S. Emer and
               Daniel S{\'{a}}nchez},
  title     = {{IDEAL}: Image DEnoising AcceLerator},
  booktitle = {Proceedings of the 50th Annual {IEEE/ACM} International Symposium
               on Microarchitecture, {MICRO} 2017, Cambridge, MA, USA, October 14-18,
               2017},
  pages     = {82--95},
  publisher = {{ACM}},
  year      = {2017},
  url       = {https://doi.org/10.1145/3123939.3123941},
  doi       = {10.1145/3123939.3123941},
  timestamp = {Wed, 11 Aug 2021 11:51:26 +0200},
  biburl    = {https://dblp.org/rec/conf/micro/MahmoudZLHABOM17.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Talks and Presentations

DietCode: Automatic Optimization for Dynamic Tensor Programs [PPTX Slides, PDF Slides]
Annual Apache TVM Conference (TVMCon), December 2021

Google, May 2023
Recent Trends in Machine Learning Compilers [PPTX Slides, PDF Slides]
Amazon, March 2022
Recent Trends in GPU Memory Footprint Reduction Techniques (Ep.3 Weight Placement for EX-Large Models) [PPTX Slides, PDF Slides]
Amazon, February 2022
Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training [PPTX Slides, PDF Slides]
AMD, July 2021
Automatic Mixed Precision (AMP) Training [PPTX Slides, PDF Slides]
Vector Institute, November 2019
Recent Trends in Machine Learning Compilers: A Survey [PPTX Slides, PDF Slides]
Presented with Shang Wang.

IBM CASCON Workshop on Compiler Driven Performance (CDP), October 2018
