George E. Dahl

About me

I am currently a research scientist at Google on the Brain team in Mountain View. I graduated from the U of T Machine Learning Group and my supervisor was Geoffrey Hinton.

I received the Microsoft Research PhD Fellowship (2012). I led the team that won the Merck molecular activity challenge on kaggle. I answer some questions about our solution in this blog post and describe some experiments on public data in the tech report below.

Research interests

Selected Publications

Google scholar profile


I have implemented a version of the Hessian Free (truncated Newton) optimization approach that is based on James Martens's exposition of it in his paper that explored using HF for deep learning (please see James Martens's research page). My particular implementation was made possible with Ilya Sutskever's guidance and some of the implementation choices have been made to make it easier to compare my code to various optimizers he has written. Despite Ilya's generous assistance, any bugs or defects that might exist in the code I post here are my own. Please see Ilya's publication page for code he has released for HF and recurrent neural nets. It isn't too difficult to wrap his recurrent neural net model code in a way that lets my optimizer code optimize it. Without further ado, here is the code. The file is large because it also contains a copy of the curves dataset. The code requires gnumpy to run and I recommend using cudamat, written by Volodymyr Mnih, and running the code on a GPU and not in the slower simulation mode of gnumpy.

I have some python code (once again using gnumpy) I am tentatively dubbing gdbn. In it, I have implemented (RBM) pre-trained deep neural nets (sometimes called DBNs). A gzipped copy of the data needed to run the example can be downloaded here. This is just an initial release for now, hopefully later there will be more features and even some documentation.

I have just released (7/7/2015) a new python deep neural net library on bitbucket called gdnn. It supports learning embeddings, hierarchical softmax output layers, full DAG layer connectivity, and of course the deep neural net essentials.