
George E. Dahl
- PhD, Machine Learning Group
- Department of Computer Science
- University of Toronto
- Ontario, Canada
- email: Can be easily derived from the URL for this page.
About me
I am currently a research scientist at Google on the Brain team in Mountain View. I graduated from the U of T Machine Learning Group, where my supervisor was Geoffrey Hinton. During my PhD, my collaborators and I trained the first successful deep acoustic models for automatic speech recognition. I also led the team that won the Merck molecular activity challenge on Kaggle.
Research interests
- deep learning (and how to get good results with deep learning)
- natural language processing
- most of statistical machine learning
Selected Publications
My Google Scholar profile is sometimes more current.
- Benchmarking Neural Network Training Algorithms
  [arXiv] [bibtex coming soon]
- Predicting the utility of search spaces for black-box optimization: a simple, budget-aware approach
  [AISTATS] [bibtex]
- On Empirical Comparisons of Optimizers for Deep Learning
  [arXiv] [bibtex]
- Faster Neural Network Training with Data Echoing
  [arXiv] [bibtex]
- Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
  [NeurIPS] [arXiv] [bibtex]
- Measuring the Effects of Data Parallelism on Neural Network Training
  In JMLR [JMLR] [arXiv] [bibtex] [blog post] [raw data]
- Artificial Intelligence-Based Breast Cancer Nodal Metastasis Detection: Insights Into the Black Box for Pathologists
  In Archives of Pathology & Laboratory Medicine [archivesofpathology] [pdf] [bibtex coming soon]
- The Importance of Generation Order in Language Modeling
  In EMNLP 2018 [ACL] [arXiv] [bibtex]
- Peptide-Spectra Matching from Weak Supervision
  [arXiv] [bibtex coming soon]
- Motivating the Rules of the Game for Adversarial Example Research
  [arXiv] [bibtex coming soon]
- A deep learning approach to pattern recognition for short DNA sequences
  [bioRxiv] [bibtex]
- Parallel Architecture and Hyperparameter Search via Successive Halving and Classification
  [arXiv] [bibtex]
- Embedding Text in Hyperbolic Spaces
  In TextGraphs 2018 [arXiv] [bibtex]
- Large scale distributed neural network training through online distillation
  In ICLR 2018 [abstract] [pdf] [arXiv] [bibtex]
- Neural Message Passing for Quantum Chemistry
  In ICML 2017 [abstract] [pdf] [supplementary material] [arXiv] [bibtex] [blog post]
- Prediction errors of molecular machine learning models lower than hybrid DFT error
  In J. Chem. Theory Comput. [ACS] [bibtex coming soon]
- Fast machine learning models of electronic and energetic properties consistently reach approximation errors better than DFT accuracy
  arXiv, 2017. [arXiv] [bibtex coming soon]
  This is a preprint of the definitive journal version above.
- Detecting Cancer Metastases on Gigapixel Pathology Images
  arXiv, 2017. [arXiv] [bibtex coming soon] [blog post]
- Deep learning approaches to problems in speech recognition, computational chemistry, and natural language text processing
  Ph.D. thesis, 2015. [pdf] [bibtex]
  My dissertation includes an exposition of my own personal approach to machine learning, suitable for non-specialist readers, as well as some material not published elsewhere.
- Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships
  In Journal of Chemical Information and Modeling, 2015. [ACS] [pdf draft] [bibtex]
  This paper describes experiments performed at Merck using my code on the contest data. I answer some questions about my team's winning Kaggle solution in this blog post.
- Multi-task Neural Networks for QSAR Predictions
  arXiv, 2014. [arXiv] [bibtex]
  This paper includes experiments on public data using methods similar to those my team used to win the Merck Kaggle contest.
- Improvements to Deep Convolutional Neural Networks for LVCSR
  In ASRU 2013 [pdf] [bibtex]
  This paper includes experiments that complete the investigation of dropout and ReLUs with full sequence training started in "Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout".
- On the importance of initialization and momentum in deep learning
  In ICML 2013 [pdf] [bibtex]
- Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout
  In ICASSP 2013 [pdf] [bibtex]
- Large-Scale Malware Classification Using Random Projections and Neural Networks
  In ICASSP 2013 [pdf] [bibtex]
- Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
  IEEE Signal Processing Magazine, 29, November 2012 [pdf] [bibtex]
- Training Restricted Boltzmann Machines on Word Observations
  In ICML 2012 [pdf] [arXiv preprint] [bibtex] [alias method pseudocode] [poster]
- Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition
  In IEEE Transactions on Audio, Speech, and Language Processing [pdf] [bibtex]
  Winner of the 2013 IEEE Signal Processing Society Best Paper Award.
- Large Vocabulary Continuous Speech Recognition with Context-Dependent DBN-HMMs
  In ICASSP 2011 [pdf] [bibtex]
  This paper is a conference-length version of the journal paper listed immediately above.
- Deep Belief Networks Using Discriminative Features for Phone Recognition
  In ICASSP 2011 [pdf] [bibtex]
- Acoustic Modeling using Deep Belief Networks
  In IEEE Transactions on Audio, Speech, and Language Processing [pdf] [bibtex]
- Deep Belief Networks for Phone Recognition
  In NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009. [pdf] [bibtex]
  The journal version of this work (listed immediately above) should be viewed as the definitive version.
- Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
  In Advances in Neural Information Processing Systems 23, 2010. [pdf] [bibtex]
- Incorporating Side Information into Probabilistic Matrix Factorization Using Gaussian Processes
  In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, 2010. [pdf] [bibtex] [code]
Code
The code below is (as of 2019) several years old and you probably don't want most of it at this point. Nowadays, you should use something like JAX.
I have implemented a version of the Hessian-free (truncated Newton) optimization approach based on James Martens's exposition in his paper exploring HF for deep learning (please see James Martens's research page). My implementation was made possible by Ilya Sutskever's guidance, and some implementation choices were made to make it easier to compare my code to various optimizers he has written. Despite Ilya's generous assistance, any bugs or defects in the code I post here are my own. Please see Ilya's publication page for code he has released for HF and recurrent neural nets; it isn't too difficult to wrap his recurrent neural net model code so that my optimizer code can optimize it. Without further ado, here is the code. The file is large because it also contains a copy of the curves dataset. The code requires gnumpy to run; I recommend using cudamat, written by Volodymyr Mnih, and running the code on a GPU rather than in gnumpy's slower simulation mode.
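For readers who just want the idea rather than the old gnumpy code, here is a minimal sketch of the core computation in HF: a matrix-free curvature-vector product fed to a conjugate gradient solver. It uses JAX (per the recommendation above); the toy least-squares loss, array shapes, and iteration counts are illustrative and not taken from my code, and Martens's actual method uses Gauss-Newton products and damping rather than this bare-bones version.

    # Illustrative sketch only: approximately solve (curvature) * step = -gradient
    # with conjugate gradient, using matrix-free Hessian-vector products obtained
    # by nested differentiation. Not the posted gnumpy implementation.
    import jax
    import jax.numpy as jnp

    def loss(params, x, y):
        # Toy least-squares objective standing in for a real deep net loss.
        return jnp.mean((x @ params - y) ** 2)

    def hvp(params, x, y, v):
        # Hessian-vector product: differentiate the gradient along direction v.
        return jax.jvp(lambda p: jax.grad(loss)(p, x, y), (params,), (v,))[1]

    def cg_solve(matvec, b, iters=10):
        # Plain conjugate gradient; assumes the curvature matrix is positive definite.
        sol = jnp.zeros_like(b)
        r = p = b
        for _ in range(iters):
            Ap = matvec(p)
            alpha = (r @ r) / (p @ Ap)
            sol = sol + alpha * p
            r_new = r - alpha * Ap
            if r_new @ r_new < 1e-10:
                break
            p = r_new + ((r_new @ r_new) / (r @ r)) * p
            r = r_new
        return sol

    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (32, 5))
    y = jnp.sin(x[:, 0])
    params = jnp.zeros(5)
    g = jax.grad(loss)(params, x, y)
    params = params + cg_solve(lambda v: hvp(params, x, y, v), -g)

The key point is that the curvature matrix is never formed explicitly; only products with it are needed, which is what makes second-order methods feasible for large networks.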
I have some Python code (once again using gnumpy) that I am tentatively dubbing gdbn. In it, I have implemented (RBM) pre-trained deep neural nets (sometimes called DBNs). A gzipped copy of the data needed to run the example can be downloaded here. This is just an initial release; I hope to add more features and even some documentation later.
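To give a rough sense of the pre-training step gdbn relies on, here is a sketch of one contrastive divergence (CD-1) update for a binary restricted Boltzmann machine. The layer sizes, learning rate, and function names are made up for illustration and are not gdbn's API.

    # Sketch of one CD-1 update for a binary RBM (illustration only, not the gdbn API).
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cd1_update(v0, W, b_vis, b_hid, lr=0.01):
        # Positive phase: hidden probabilities and a binary sample given the data.
        h0_prob = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
        # Negative phase: one Gibbs step back to a reconstruction.
        v1_prob = sigmoid(h0 @ W.T + b_vis)
        h1_prob = sigmoid(v1_prob @ W + b_hid)
        # Approximate log-likelihood gradient from the two phases.
        batch = v0.shape[0]
        W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
        b_vis += lr * (v0 - v1_prob).mean(axis=0)
        b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
        return W, b_vis, b_hid

    # Toy usage: 784 visible units, 256 hidden units, one batch of binary data.
    W = 0.01 * rng.standard_normal((784, 256))
    b_vis, b_hid = np.zeros(784), np.zeros(256)
    v0 = (rng.random((64, 784)) < 0.5).astype(np.float64)
    W, b_vis, b_hid = cd1_update(v0, W, b_vis, b_hid)

In a DBN, layers are pre-trained greedily with updates like this one and then stacked and fine-tuned as an ordinary feed-forward net.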
I have just released (7/7/2015) a new Python deep neural net library on Bitbucket called gdnn. It supports learning embeddings, hierarchical softmax output layers, full DAG layer connectivity, and of course the deep neural net essentials.
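As a rough illustration of one of those features, a hierarchical softmax output layer scores a target by multiplying the probabilities of the binary left/right decisions along the target's path through a code tree, so the cost per example is logarithmic rather than linear in the vocabulary size. The sketch below shows the general technique with hypothetical names; it is not gdnn's actual interface.

    # Hierarchical softmax sketch (general technique, not gdnn's interface).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def hierarchical_softmax_logprob(hidden, path_nodes, path_signs, node_weights):
        # hidden       : (d,) activation of the final hidden layer
        # path_nodes   : indices of the internal tree nodes on the target's path
        # path_signs   : +1 or -1 for going left or right at each node
        # node_weights : (num_internal_nodes, d), one weight vector per node
        logp = 0.0
        for node, sign in zip(path_nodes, path_signs):
            logp += np.log(sigmoid(sign * (node_weights[node] @ hidden)))
        return logp

    # Toy usage: a depth-3 path through a tree with 15 internal nodes.
    rng = np.random.default_rng(0)
    node_weights = 0.1 * rng.standard_normal((15, 8))
    hidden = rng.standard_normal(8)
    print(hierarchical_softmax_logprob(hidden, [0, 3, 8], [+1, -1, +1], node_weights))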