George E. Dahl
- PhD Candidate, Machine Learning Group
- Department of Computer Science
- University of Toronto
- Ontario, Canada
- email: Can be easily derived from the URL for this page.
About me
I am a PhD student in the Machine Learning Group, supervised by Geoffrey Hinton. I am a recipient of the Microsoft Research PhD Fellowship (2012). I led the team that won the Merck molecular activity challenge on Kaggle. I answer some questions about our solution in this blog post.
Research interests
- deep learning architectures
- speech recognition and language processing
- undirected graphical models
- most of statistical machine learning
Selected Publications
-
On the importance of initialization and momentum in deep learning
In ICML 2013 [pdf] [bibtex coming soon]
-
Improving Deep Neural Networks for LVCSR Using Rectified Linear Units and Dropout
In ICASSP 2013 [pdf] [bibtex coming soon]
-
Large-Scale Malware Classification Using Random Projections and Neural Networks
In ICASSP 2013 [pdf] [bibtex coming soon]
-
Deep Neural Networks for Acoustic Modeling in Speech Recognition
IEEE Signal Processing Magazine, 29, November 2012 [pdf] [bibtex coming soon]
-
Training Restricted Boltzmann Machines on Word Observations
In ICML 2012 [pdf] [arXiv preprint] [bibtex] [alias method pseudocode] [poster]
-
Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition
In IEEE Transactions on Audio, Speech, and Language Processing [pdf] [bibtex]
-
Large Vocabulary Continuous Speech Recognition with Context-Dependent DBN-HMMs
In ICASSP 2011 [pdf] [bibtex]
This paper is a conference-length version of the journal paper listed immediately above.
-
Deep Belief Networks Using Discriminative Features for Phone Recognition
In ICASSP 2011 [pdf] [bibtex]
-
Acoustic Modeling using Deep Belief Networks
In IEEE Transactions on Audio, Speech, and Language Processing [pdf] [bibtex]
-
Deep Belief Networks for Phone Recognition
In NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, 2009. [pdf] [bibtex]
The journal version of this work (listed immediately above) should be viewed as the definitive version.
-
Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine
In Advances in Neural Information Processing Systems 23, 2010. [pdf] [bibtex]
-
Incorporating Side Information into Probabilistic Matrix Factorization Using Gaussian Processes
In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, 2010. [pdf] [bibtex] [code]
Code
I have implemented a version of the Hessian-free (truncated Newton) optimization approach, based on James Martens's exposition of it in his paper that explored using HF for deep learning (please see James Martens's research page). My particular implementation was made possible by Ilya Sutskever's guidance, and some of the implementation choices were made to make it easier to compare my code to various optimizers he has written. Despite Ilya's generous assistance, any bugs or defects that might exist in the code I post here are my own. Please see Ilya's publication page for code he has released for HF and recurrent neural nets. It isn't too difficult to wrap his recurrent neural net model code in a way that lets my optimizer code optimize it. Without further ado, here is the code. The file is large because it also contains a copy of the curves dataset. The code requires gnumpy to run, and I recommend using cudamat, written by Volodymyr Mnih, and running the code on a GPU rather than in the slower simulation mode of gnumpy.
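For readers unfamiliar with the term, here is a minimal sketch of the general truncated Newton idea in plain Python. It is not the released code and does not reproduce the details of Martens's method. Every function name is hypothetical, and the sketch assumes a finite-difference Hessian-vector product, a fixed damping constant, and no line search. Each outer step solves the damped Newton system approximately with a few conjugate gradient iterations, using only Hessian-vector products so the Hessian is never formed.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def axpy(alpha, x, y):
    """Return alpha * x + y for plain Python lists."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def hessian_vector_product(grad, x, v, eps=1e-6):
    """Approximate H(x) v by differencing gradients (assumption: a
    finite-difference stand-in; the Hessian itself is never built)."""
    g0 = grad(x)
    g1 = grad(axpy(eps, v, x))
    return [(b - a) / eps for a, b in zip(g0, g1)]


def conjugate_gradient(matvec, b, max_iters, tol=1e-10):
    """Approximately solve A d = b, truncating after max_iters steps."""
    d = [0.0] * len(b)
    r = list(b)  # residual b - A d for the initial guess d = 0
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iters):
        if rr < tol:
            break
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        d = axpy(alpha, p, d)
        r = axpy(-alpha, Ap, r)
        rr_new = dot(r, r)
        p = axpy(rr_new / rr, p, r)
        rr = rr_new
    return d


def truncated_newton(grad, x, steps=20, cg_iters=10, damping=1e-3):
    """Repeat: solve (H + damping * I) d = -g approximately, then step."""
    for _ in range(steps):
        g = grad(x)

        def matvec(v, x=x):
            return axpy(damping, v, hessian_vector_product(grad, x, v))

        d = conjugate_gradient(matvec, [-gi for gi in g], cg_iters)
        x = axpy(1.0, d, x)
    return x


def example_grad(x):
    """Gradient of the toy objective (x0 - 1)^2 + 10 (x1 + 2)^2 + 0.25 x0^4."""
    return [2.0 * (x[0] - 1.0) + x[0] ** 3, 20.0 * (x[1] + 2.0)]
```

Calling `truncated_newton(example_grad, [0.0, 0.0])` drives the gradient of the toy objective close to zero. A practical implementation for neural nets would typically compute exact curvature-vector products and adapt the damping rather than fix it.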
I have some Python code (once again using gnumpy) that I am tentatively dubbing gdbn. In it, I have implemented (RBM) pre-trained deep neural nets (sometimes called DBNs). A gzipped copy of the data needed to run the example can be downloaded here. This is just an initial release for now; hopefully there will be more features, and even some documentation, later.