
George E. Dahl
- PhD, Machine Learning Group
- Department of Computer Science
- University of Toronto
- Ontario, Canada
- email: Can be easily derived from the URL for this page.
About me
I am currently a research scientist at Google on the Brain team in Mountain View. I graduated from the U of T Machine Learning Group, where my supervisor was Geoffrey Hinton. During my PhD, my collaborators and I trained the first successful deep acoustic models for automatic speech recognition. I also led the team that won the Merck molecular activity challenge on Kaggle.
Research interests
- deep learning (and how to get good results with deep learning)
- natural language processing
- most of statistical machine learning
Selected Publications
My Google Scholar profile is sometimes more current.
- Benchmarking Neural Network Training Algorithms
  [arXiv] [bibtex coming soon]
- Predicting the utility of search spaces for black-box optimization: a simple, budget-aware approach
  [AISTATS] [bibtex]
- On Empirical Comparisons of Optimizers for Deep Learning
  [arXiv] [bibtex]
- Faster Neural Network Training with Data Echoing
  [arXiv] [bibtex]
- Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
  [NeurIPS] [arXiv] [bibtex]
- Measuring the Effects of Data Parallelism on Neural Network Training
  In JMLR [JMLR] [arXiv] [bibtex] [blog post] [raw data]
- Artificial Intelligence-Based Breast Cancer Nodal Metastasis Detection: Insights Into the Black Box for Pathologists
  In Archives of Pathology & Laboratory Medicine [archivesofpathology] [pdf] [bibtex coming soon]
- The Importance of Generation Order in Language Modeling
  In EMNLP 2018 [ACL] [arXiv] [bibtex]
- Peptide-Spectra Matching from Weak Supervision
  [arXiv] [bibtex coming soon]
- Motivating the Rules of the Game for Adversarial Example Research
  [arXiv] [bibtex coming soon]
- A deep learning approach to pattern recognition for short DNA sequences
  [bioRxiv] [bibtex]
- Parallel Architecture and Hyperparameter Search via Successive Halving and Classification
  [arXiv] [bibtex]
- Embedding Text in Hyperbolic Spaces
  In TextGraphs 2018 [arXiv] [bibtex]
- Large scale distributed neural network training through online distillation
  In ICLR 2018 [abstract] [pdf] [arXiv] [bibtex]
- Neural Message Passing for Quantum Chemistry
  In ICML 2017 [abstract] [pdf] [supplementary material] [arXiv] [bibtex] [blog post]
- Prediction errors of molecular machine learning models lower than hybrid DFT error
  In J. Chem. Theory Comput. [ACS] [bibtex coming soon]
- Fast machine learning models of electronic and energetic properties consistently reach approximation errors better than DFT accuracy
  arXiv, 2017. [arXiv] [bibtex coming soon]
  This is a preprint of the definitive journal version above.
- Detecting Cancer Metastases on Gigapixel Pathology Images
  arXiv, 2017. [arXiv] [bibtex coming soon] [blog post]
- Deep learning approaches to problems in speech recognition, computational chemistry, and natural language text processing
  Ph.D. thesis, 2015. [pdf] [bibtex]
  My dissertation includes an exposition of my own personal approach to machine learning, suitable for non-specialist readers, as well as some material not published elsewhere.
- Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships
  In Journal of Chemical Information and Modeling, 2015. [ACS] [pdf draft] [bibtex]
  This paper describes experiments performed at Merck using my code on the contest data. I answer some questions about my team's winning Kaggle solution in this blog post.
- Multi-task Neural Networks for QSAR Predictions
  arXiv, 2014. [arXiv] [bibtex]
  This paper includes experiments on public data using methods similar to those my team used to win the Merck Kaggle contest.
- Improvements to Deep Convolutional Neural Networks for LVCSR
  In ASRU 2013 [pdf] [bibtex]
  This paper includes experiments that complete the investigation of dropout and ReLUs with full sequence training started in "Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout".
- On the importance of initialization and momentum in deep learning
  In ICML 2013 [pdf] [bibtex]
- Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout
  In ICASSP 2013 [pdf] [bibtex]
- Large-Scale Malware Classification Using Random Projections and Neural Networks
  In ICASSP 2013 [pdf] [bibtex]
- Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
  IEEE Signal Processing Magazine, 29, November 2012 [pdf] [bibtex]
- Training Restricted Boltzmann Machines on Word Observations
  In ICML 2012 [pdf] [arXiv preprint] [bibtex] [alias method pseudocode] [poster]
- Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition
  In IEEE Transactions on Audio, Speech, and Language Processing [pdf] [bibtex]
  Winner of the 2013 IEEE Signal Processing Society Best Paper Award.
- Large Vocabulary Continuous Speech Recognition with Context-Dependent DBN-HMMs
  In ICASSP 2011 [pdf] [bibtex]
  This paper is a conference-length version of the journal paper listed immediately above.
- Deep Belief Networks Using Discriminative Features for Phone Recognition
  In ICASSP 2011 [pdf] [bibtex]
- Acoustic Modeling using Deep Belief Networks
  In IEEE Transactions on Audio, Speech, and Language Processing [pdf] [bibtex]
- Deep Belief Networks for Phone Recognition
  In NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009. [pdf] [bibtex]
  The journal version of this work (listed immediately above) should be viewed as the definitive version.
- Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
  In Advances in Neural Information Processing Systems 23, 2010. [pdf] [bibtex]
- Incorporating Side Information into Probabilistic Matrix Factorization Using Gaussian Processes
  In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, 2010. [pdf] [bibtex] [code]
Code
The code below is (as of 2019) several years old and you probably don't want most of it at this point. Nowadays, you should use something like JAX.
I have implemented a version of the Hessian-free (truncated Newton) optimization approach based on James Martens's exposition in his paper exploring HF for deep learning (please see James Martens's research page). My implementation was made possible by Ilya Sutskever's guidance, and some implementation choices were made to make it easier to compare my code to various optimizers he has written. Despite Ilya's generous assistance, any bugs or defects in the code I post here are my own. Please see Ilya's publication page for code he has released for HF and recurrent neural nets; it isn't too difficult to wrap his recurrent neural net model code so that my optimizer code can optimize it. Without further ado, here is the code. The file is large because it also contains a copy of the curves dataset. The code requires gnumpy to run; I recommend using cudamat, written by Volodymyr Mnih, and running the code on a GPU rather than in gnumpy's slower simulation mode.
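For readers who just want the idea rather than the old gnumpy code, here is a minimal sketch of the core computation in HF: a matrix-free curvature-vector product fed to a conjugate gradient solver. It uses JAX (per the recommendation above); the toy least-squares loss, array shapes, and iteration counts are illustrative and not taken from my code, and Martens's actual method uses Gauss-Newton products and damping rather than this bare-bones version.

    # Illustrative sketch only: approximately solve (curvature) * step = -gradient
    # with conjugate gradient, using matrix-free Hessian-vector products obtained
    # by nested differentiation. Not the posted gnumpy implementation.
    import jax
    import jax.numpy as jnp

    def loss(params, x, y):
        # Toy least-squares objective standing in for a real deep net loss.
        return jnp.mean((x @ params - y) ** 2)

    def hvp(params, x, y, v):
        # Hessian-vector product: differentiate the gradient along direction v.
        return jax.jvp(lambda p: jax.grad(loss)(p, x, y), (params,), (v,))[1]

    def cg_solve(matvec, b, iters=10):
        # Plain conjugate gradient; assumes the curvature matrix is positive definite.
        sol = jnp.zeros_like(b)
        r = p = b
        for _ in range(iters):
            Ap = matvec(p)
            alpha = (r @ r) / (p @ Ap)
            sol = sol + alpha * p
            r_new = r - alpha * Ap
            if r_new @ r_new < 1e-10:
                break
            p = r_new + ((r_new @ r_new) / (r @ r)) * p
            r = r_new
        return sol

    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (32, 5))
    y = jnp.sin(x[:, 0])
    params = jnp.zeros(5)
    g = jax.grad(loss)(params, x, y)
    params = params + cg_solve(lambda v: hvp(params, x, y, v), -g)

The key point is that the curvature matrix is never formed explicitly; only products with it are needed, which is what makes second-order methods feasible for large networks.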
I have some Python code (once again using gnumpy) that I am tentatively dubbing gdbn. In it, I have implemented (RBM) pre-trained deep neural nets (sometimes called DBNs). A gzipped copy of the data needed to run the example can be downloaded here. This is just an initial release; I hope to add more features and even some documentation later.
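To give a rough sense of the pre-training step gdbn relies on, here is a sketch of one contrastive divergence (CD-1) update for a binary restricted Boltzmann machine. The layer sizes, learning rate, and function names are made up for illustration and are not gdbn's API.

    # Sketch of one CD-1 update for a binary RBM (illustration only, not the gdbn API).
    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cd1_update(v0, W, b_vis, b_hid, lr=0.01):
        # Positive phase: hidden probabilities and a binary sample given the data.
        h0_prob = sigmoid(v0 @ W + b_hid)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(v0.dtype)
        # Negative phase: one Gibbs step back to a reconstruction.
        v1_prob = sigmoid(h0 @ W.T + b_vis)
        h1_prob = sigmoid(v1_prob @ W + b_hid)
        # Approximate log-likelihood gradient from the two phases.
        batch = v0.shape[0]
        W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / batch
        b_vis += lr * (v0 - v1_prob).mean(axis=0)
        b_hid += lr * (h0_prob - h1_prob).mean(axis=0)
        return W, b_vis, b_hid

    # Toy usage: 784 visible units, 256 hidden units, one batch of binary data.
    W = 0.01 * rng.standard_normal((784, 256))
    b_vis, b_hid = np.zeros(784), np.zeros(256)
    v0 = (rng.random((64, 784)) < 0.5).astype(np.float64)
    W, b_vis, b_hid = cd1_update(v0, W, b_vis, b_hid)

In a DBN, layers are pre-trained greedily with updates like this one and then stacked and fine-tuned as an ordinary feed-forward net.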
I have just released (7/7/2015) a new Python deep neural net library on Bitbucket called gdnn. It supports learning embeddings, hierarchical softmax output layers, full DAG layer connectivity, and of course the deep neural net essentials.
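As a rough illustration of one of those features, a hierarchical softmax output layer scores a target by multiplying the probabilities of the binary left/right decisions along the target's path through a code tree, so the cost per example is logarithmic rather than linear in the vocabulary size. The sketch below shows the general technique with hypothetical names; it is not gdnn's actual interface.

    # Hierarchical softmax sketch (general technique, not gdnn's interface).
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def hierarchical_softmax_logprob(hidden, path_nodes, path_signs, node_weights):
        # hidden       : (d,) activation of the final hidden layer
        # path_nodes   : indices of the internal tree nodes on the target's path
        # path_signs   : +1 or -1 for going left or right at each node
        # node_weights : (num_internal_nodes, d), one weight vector per node
        logp = 0.0
        for node, sign in zip(path_nodes, path_signs):
            logp += np.log(sigmoid(sign * (node_weights[node] @ hidden)))
        return logp

    # Toy usage: a depth-3 path through a tree with 15 internal nodes.
    rng = np.random.default_rng(0)
    node_weights = 0.1 * rng.standard_normal((15, 8))
    hidden = rng.standard_normal(8)
    print(hierarchical_softmax_logprob(hidden, [0, 3, 8], [+1, -1, +1], node_weights))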