George E. Dahl
- PhD Machine Learning Group
- Department of Computer Science
- University of Toronto
- Ontario, Canada
- email: Can be easily derived from the URL for this page.
About me
I am a recent graduate of the Machine Learning Group and my supervisor was Geoffrey Hinton.
I am a recipient of the Microsoft Research PhD Fellowship (2012). I led the team that won the Merck molecular activity challenge on Kaggle. I answer some questions about our solution in this blog post and describe some experiments on public data in the tech report below.
- deep learning architectures
- speech recognition and language processing
- undirected graphical models
- most of statistical machine learning
Embedding Text in Hyperbolic Spaces
In TextGraphs 2018 [arXiv] [bibtex]
Large scale distributed neural network training through online distillation
In ICLR 2018 [abstract] [pdf] [arXiv] [bibtex]
Neural Message Passing for Quantum Chemistry
In ICML 2017 [abstract] [pdf] [supplementary material] [arXiv] [bibtex] [blog post]
Prediction errors of molecular machine learning models lower than hybrid DFT error
In J. Chem. Theory Comput. [ACS] [bibtex coming soon]
Fast machine learning models of electronic and energetic properties consistently reach approximation errors better than DFT accuracy
ArXiv, 2017. [arXiv] [bibtex coming soon]
This is a preprint of the definitive journal version above.
Detecting Cancer Metastases on Gigapixel Pathology Images
ArXiv, 2017. [arXiv] [bibtex coming soon] [blog post]
Deep learning approaches to problems in speech recognition, computational chemistry, and natural language text processing
Ph.D. thesis, 2015. [pdf] [bibtex]
My dissertation includes an exposition of my own personal approach to machine learning suitable for non-specialist readers as well as some material not published elsewhere.
Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships
In Journal of Chemical Information and Modeling, 2015. [ACS] [pdf draft] [bibtex]
This paper describes experiments performed at Merck using my code on the contest data.
Multi-task Neural Networks for QSAR Predictions
ArXiv, 2014. [arXiv] [bibtex]
This paper includes experiments on public data using methods similar to those my team used to win the Merck Kaggle contest.
Improvements to Deep Convolutional Neural Networks for LVCSR
In ASRU 2013 [pdf] [bibtex]
This paper includes experiments that complete the investigation of dropout and ReLUs with full sequence training, which began in "Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout".
On the importance of initialization and momentum in deep learning
In ICML 2013 [pdf] [bibtex]
Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout
In ICASSP 2013 [pdf] [bibtex]
Large-Scale Malware Classification Using Random Projections and Neural Networks
In ICASSP 2013 [pdf] [bibtex]
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
IEEE Signal Processing Magazine, 29, November 2012 [pdf] [bibtex]
Training Restricted Boltzmann Machines on Word Observations
In ICML 2012 [pdf] [arXiv preprint] [bibtex] [alias method pseudocode] [poster]
Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition
In IEEE Transactions on Audio, Speech, and Language Processing [pdf] [bibtex]
Winner of the 2013 IEEE Signal Processing Society Best Paper Award
Large Vocabulary Continuous Speech Recognition with Context-Dependent DBN-HMMs
In ICASSP 2011 [pdf] [bibtex]
This paper is a conference-length version of the journal paper listed immediately above.
Deep Belief Networks Using Discriminative Features for Phone Recognition
In ICASSP 2011 [pdf] [bibtex]
Acoustic Modeling using Deep Belief Networks
In IEEE Trans. on Audio, Speech and Language Processing. [pdf] [bibtex]
Deep Belief Networks for Phone Recognition
In NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009. [pdf] [bibtex]
The journal version of this work (listed immediately above) should be viewed as the definitive version.
Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
In Advances in Neural Information Processing Systems 23, 2010. [pdf] [bibtex]
Incorporating Side Information into Probabilistic Matrix Factorization Using Gaussian Processes
In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, 2010. [pdf] [bibtex] [code]
I have implemented a version of the Hessian-free (HF, truncated Newton) optimization approach, based on James Martens's exposition of it in his paper on using HF for deep learning (please see James Martens's research page). Ilya Sutskever's guidance made my implementation possible, and I made some implementation choices so that my code is easier to compare with various optimizers he has written. Despite Ilya's generous assistance, any bugs or defects in the code I post here are my own. Please see Ilya's publication page for code he has released for HF and recurrent neural nets; it isn't too difficult to wrap his recurrent neural net model code so that my optimizer code can optimize it.
Without further ado, here is the code. The file is large because it also contains a copy of the curves dataset. The code requires gnumpy to run. I also recommend using cudamat, written by Volodymyr Mnih, and running the code on a GPU rather than in gnumpy's slower simulation mode.
I have some Python code (once again using gnumpy) that I am tentatively dubbing gdbn. In it, I have implemented RBM-pre-trained deep neural nets (sometimes called DBNs). A gzipped copy of the data needed to run the example can be downloaded here. This is just an initial release for now; hopefully there will be more features, and even some documentation, later.
I have just released (7/7/2015) a new Python deep neural net library on Bitbucket called gdnn. It supports learning embeddings, hierarchical softmax output layers, full DAG layer connectivity, and of course the deep neural net essentials.