In recent years, Deep Learning has become a dominant Machine Learning tool for a wide variety of domains. One of its biggest successes has been in Computer Vision, where performance on problems such as object and action recognition has improved dramatically. In this course, we will read up on various Computer Vision problems, study the state-of-the-art techniques involving different neural architectures, and brainstorm about promising new directions.

Please sign up here at the beginning of class.

This class is a graduate seminar course in computer vision. It will cover a diverse set of topics in Computer Vision and various Neural Network architectures. It will be an interactive course: we will add interesting topics on demand and discuss the latest research. The goal of the class is to learn about different domains of vision; to understand, identify, and analyze the main challenges and what works and what doesn't; and to identify interesting new directions for future research.

Prerequisites: Courses in computer vision and/or machine learning (e.g., CSC320, CSC420, CSC411) are highly recommended (otherwise you will need some additional reading), and basic programming skills are required for projects.


When emailing me, please put CSC2523 in the subject line.

Forum

This class uses Piazza. On this webpage, we will post announcements and assignments. Students will also be able to post questions and discussions in a forum style, either to their instructors or to their peers.


We will have an invited speaker for this course:


as well as several invited lectures / tutorials:


Each student will need to write two paper reviews each week, present once or twice in class (depending on enrollment), participate in class discussions, and complete a project (done individually or in pairs).


The final grade will consist of the following:
Participation (attendance, participation in discussions, reviews): 15%
Presentation (presentation of papers in class): 25%
Project (proposal, final report): 60%
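
As a worked example of the weighting above, here is a minimal Python sketch of how the three components could combine into a final grade. The weights come from the grading scheme; the 0-100 scale for each component, the function name `final_grade`, and the sample scores are assumptions made only for illustration, not the official grading procedure.

```python
# Weights from the grading scheme above.
# Assumption: each component is scored on a 0-100 scale.
WEIGHTS = {"participation": 0.15, "presentation": 0.25, "project": 0.60}


def final_grade(scores):
    """Return the weighted sum of the component scores (hypothetical helper)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = final_grade({"participation": 90, "presentation": 80, "project": 85})
print(round(example, 2))
```

Because the project carries 60% of the weight, a change in the project score moves the final grade four times as much as the same change in participation.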

Paper reviewing

Every week (except for the first two) we will read 2 to 3 papers. The success of the class discussion will thus depend on how well prepared students come to class. Each student is expected to read all the papers that will be discussed and write two detailed reviews, one for each of the two selected papers. Depending on enrollment, each student will also need to present a paper in class. When you present, you do not need to hand in the review.

Deadline: The reviews will be due one day before the class.

Structure of the review
Short summary of the paper
Main contributions
Positive and negative points
How strong is the evaluation?
Possible directions for future work

Presentation

Depending on enrollment, each student will need to present a few papers in class. The presentation should be clear and practiced and the student should read the assigned paper and related work in enough detail to be able to lead a discussion and answer questions. Extra credit will be given to students who also prepare a simple experimental demo highlighting how the method works in practice.

A presentation should be roughly 20 minutes long (please time it beforehand so that you do not go overtime). Typically this is about 15 to 20 slides. You are allowed to take some material from presentations on the web as long as you cite the source fairly. In the presentation, also provide the citation to the paper you present and to any other related work you reference.

Deadline: The presentation should be handed in one day before the class (or earlier if you want feedback).

Structure of presentation:
High-level overview with contributions
Main motivation
Clear statement of the problem
Overview of the technical approach
Strengths/weaknesses of the approach
Overview of the experimental evaluation
Strengths/weaknesses of evaluation
Discussion: future direction, links to other work

Project

Each student will need to write a short project proposal at the beginning of the class (in January). The projects will be research oriented. In the middle of the semester you will need to hand in a progress report. The final project report is due one week before the end of the class, and the project will be presented in the last lecture of the class (April). This will be a short presentation of roughly 15-20 minutes.

Students can work on projects individually or in pairs. The project can be on an interesting topic that the student comes up with on their own or with the help of the instructor. The grade will depend on the ideas, how well you present them in the report, how well you position your work in the related literature, how thorough your experiments are, and how thoughtful your conclusions are.



The first class will present a short overview of neural network architectures; however, the details will be covered when reading on particular topics. Readings will touch on a diverse set of topics in Computer Vision. The course will be interactive: we will add interesting topics on demand and cover the latest research.


Neural Architectures:

  •   convolutional and deconvolutional neural networks
  •   recurrent neural networks
  •   autoencoders
  •   restricted Boltzmann machines
  •   analysis, visualization, "tricking" the NNs

Computer Vision:

  •   object / scene recognition
  •   semantic segmentation
  •   action recognition
  •   stereo / flow
  •   attributes
  •   3D from RGB (normals, depth from single image)
  •   recognition in RGB-D
  •   caption generation, question & answering
  •   image generation


Date: Jan 12
Topic: Admin & Introduction(s)
Speaker: Sanja Fidler
Slides: admin

Convolutional Neural Networks

Date: Jan 19
Topic: Convolutional Neural Nets (tutorial)
Reading / Material: Resources: Stanford's cs231 class, VGG's Practical CNN Tutorial
Code: CNN Tutorial for TensorFlow, Tutorial for caffe, CNN Tutorial for Theano
Speaker: Yukun Zhu (invited)
Slides: [pdf] [code]

Topic: Image Segmentation
Reading / Material: Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs [PDF] [code]
L-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L Yuille
Speaker: Shenlong Wang
Slides: [pdf] [code]

Date: Jan 26
Topic: Very Deep Networks
Reading / Material: Highway Networks [PDF] [code]
Rupesh Kumar Srivastava, Klaus Greff, Jurgen Schmidhuber
Reading / Material: Deep Residual Learning for Image Recognition [PDF]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Speaker: Renjie Liao (invited)
Slides: [pdf]

Topic: Object Detection
Reading / Material: Rich feature hierarchies for accurate object detection and semantic segmentation [PDF] [code]
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
Reading / Material: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [PDF] [code (Matlab)] [code (Python)]
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun
Speaker: Kaustav Kundu
Slides: [pdf]

Date: Feb 2
Topic: Stereo, Siamese Networks
Reading / Material: Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches [PDF] [code]
Jure Žbontar, Yann LeCun
Reading / Material: Learning to Compare Image Patches via Convolutional Neural Networks [PDF] [code]
Sergey Zagoruyko, Nikos Komodakis
Speaker: Wenjie Luo
Slides: [pdf]

Topic: Depth from Single Image
Reading / Material: Designing Deep Networks for Surface Normal Estimation [PDF]
Xiaolong Wang, David Fouhey, Abhinav Gupta
Speaker: Mian Wei
Slides: [pptx] [pdf]

Date: Feb 9
Topic: Image Generation
Reading / Material: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [PDF]
Alec Radford, Luke Metz, Soumith Chintala
Reading / Material: Generating Images from Captions with Attention [PDF]
Elman Mansimov, Emilio Parisotto, Jimmy Lei Ba, Ruslan Salakhutdinov
Speaker: Elman Mansimov (invited)
Slides: [pdf]

Topic: Domain Adaptation, Zero-shot Learning
Reading / Material: Simultaneous Deep Transfer Across Domains and Tasks [PDF]
Eric Tzeng, Judy Hoffman, Trevor Darrell
Reading / Material: Predicting Deep Zero-Shot Convolutional Neural Networks using Textual Descriptions [PDF]
Jimmy Ba, Kevin Swersky, Sanja Fidler, Ruslan Salakhutdinov
Speaker: Lluis Castrejon
Slides: [pdf]

Recurrent Neural Networks

Date: Feb 23
Topic: RNNs and Neural Language Models
Reading / Material: Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models [PDF] [code]
Ryan Kiros, Ruslan Salakhutdinov, Richard Zemel
Reading / Material: Skip-Thought Vectors [PDF] [code]
Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, Antonio Torralba, Raquel Urtasun, Sanja Fidler
Speaker: Jamie Kiros (invited)

Date: Mar 1
Topic: Modeling Words
Reading / Material: Efficient Estimation of Word Representations in Vector Space [PDF] [code]
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
Speaker: Eleni Triantafillou
Slides: [pdf]

Topic: Describing Videos
Reading / Material: Sequence to Sequence -- Video to Text [PDF]
Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko
Speaker: Erin Grant
Slides: [pdf]

Topic: Image-based QA
Reading / Material: Ask Your Neurons: A Neural-based Approach to Answering Questions about Images [PDF]
Mateusz Malinowski, Marcus Rohrbach, Mario Fritz
Speaker: Yunpeng Li
Slides: [pdf]

Date: Mar 8
Topic: Variational Autoencoders
Reading / Material: Auto-Encoding Variational Bayes [PDF]
Diederik P Kingma, Max Welling
Reading / Material: Tutorial: Bayesian Reasoning and Deep Learning [PDF]
Shakir Mohamed
Speaker: Yura Burda (invited)
Slides: [pdf]

Topic: Text-based QA
Reading / Material: End-To-End Memory Networks [PDF]
Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus
Speaker: Marina Samuel
Slides: [pdf]

Topic: Neural Reasoning
Reading / Material: Recursive Neural Networks Can Learn Logical Semantics [PDF]
Samuel R. Bowman, Christopher Potts, Christopher D. Manning
Speaker: Rodrigo Toro Icarte
Slides: [pdf]

Date: Mar 15
Topic: Neural Programming
Reading / Material: Neural GPUs Learn Algorithms [PDF]
Lukasz Kaiser, Ilya Sutskever
Reading / Material: Neural Programmer-Interpreters [PDF]
Scott Reed, Nando de Freitas
Reading / Material: Neural Programmer: Inducing Latent Programs with Gradient Descent [PDF]
Arvind Neelakantan, Quoc V. Le, Ilya Sutskever
Speaker: Jimmy Ba (invited)

Topic: Conversation Models
Reading / Material: A Neural Conversational Model [PDF]
Oriol Vinyals, Quoc Le
Speaker: Caner Berkay Antmen
Slides: [pdf]

Topic: Sentiment Analysis
Reading / Material: Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank [PDF]
Richard Socher, Alex Perelygin, Jean Y. Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng and Christopher Potts
Speaker: Zhicong Lu
Slides: [pdf]

Date: Mar 22
Topic: Video Representations
Reading / Material: Unsupervised Learning of Video Representations using LSTMs [PDF]
Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov
Speaker: Kamyar Ghasemipour
Slides: [pdf]

Topic: CNN Visualization
Reading / Material: Explaining and Harnessing Adversarial Examples [PDF]
Ian J. Goodfellow, Jonathon Shlens, Christian Szegedy
Speaker: Neill Patterson
Slides: [pdf]

Date: Mar 29
Topic: Direction Following (Robotics)
Reading / Material: Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences [PDF]
Hongyuan Mei, Mohit Bansal, Matthew R. Walter
Speaker: Alan Yusheng Wu
Slides: [pdf]

Topic: Visual Attention
Reading / Material: Recurrent Models of Visual Attention [PDF]
Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu
Speaker: Matthew Shepherd
Slides: [pdf]

Topic: Music
Reading / Material: A First Look at Music Composition using LSTM Recurrent Neural Networks [PDF]
Douglas Eck, Jurgen Schmidhuber
Reading / Material: Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network [PDF]
Andrew J.R. Simpson, Gerard Roma, Mark D. Plumbley
Speaker: Charu Jaiswal
Slides: [pdf]

Topic: Music generation
Reading / Material: Overview of music generation
Speaker: Urban Jezernik (invited)

Topic: Pose and Attributes
Reading / Material: PANDA: Pose Aligned Networks for Deep Attribute Modeling [PDF]
Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, Lubomir Bourdev
Speaker: Sidharth Sahdev
Slides: [pptx]

Topic: Image Style
Reading / Material: A Neural Algorithm of Artistic Style [PDF] [code]
Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
Speaker: Nancy Iskander
Slides: [pdf]

Date: Apr 5
Topic: Human gaze
Reading / Material: Where Are They Looking? [PDF]
Adria Recasens, Aditya Khosla, Carl Vondrick, Antonio Torralba
Speaker: Abraham Escalante
Slides: [pdf]

Topic: Instance Segmentation
Reading / Material: Monocular Object Instance Segmentation and Depth Ordering with CNNs [PDF]
Ziyu Zhang, Alex Schwing, Sanja Fidler, Raquel Urtasun
Reading / Material: Instance-Level Segmentation with Deep Densely Connected MRFs [PDF]
Ziyu Zhang, Sanja Fidler, Raquel Urtasun
Speaker: Min Bai
Slides: [pdf]

Topic: Scene Understanding
Reading / Material: Attend, Infer, Repeat: Fast Scene Understanding with Generative Models [PDF]
S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, Koray Kavukcuoglu, Geoffrey E. Hinton
Speaker: Namdar Homayounfar
Slides: [pdf]

Topic: Reinforcement Learning
Reading / Material: Playing Atari with Deep Reinforcement Learning [PDF]
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
Speaker: Jonathan Chung
Slides: [pdf]

Topic: Medical Imaging
Reading / Material: Classifying and Segmenting Microscopy Images Using Convolutional Multiple Instance Learning [PDF]
Oren Z. Kraus, Lei Jimmy Ba, Brendan Frey
Speaker: Alex Lu
Slides: [pptx]

Topic: Humor
Reading / Material: We Are Humor Beings: Understanding and Predicting Visual Humor [PDF]
Arjun Chandrasekaran, Ashwin K Vijayakumar, Stanislaw Antol, Mohit Bansal, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
Speaker: Shuai Wang
Slides: [pdf]


Tutorials, related courses:

  •   Introduction to Neural Networks, CSC321 course at University of Toronto
  •   Course on Convolutional Neural Networks, CS231n course at Stanford University
  •   Course on Probabilistic Graphical Models, CSC412 course at University of Toronto, advanced machine learning course

Software:

  •   Caffe: Deep learning for image classification
  •   TensorFlow: Open Source Software Library for Machine Intelligence (good software for deep learning)
  •   Theano: Deep learning library
  •   MXNet: Deep learning library
  •   Torch: Scientific computing framework with wide support for machine learning algorithms
  •   LIBSVM: A Library for Support Vector Machines (Matlab, Python)
  •   scikit-learn: Machine learning in Python

Popular datasets:

  •   ImageNet: Large-scale object dataset
  •   Microsoft COCO: Large-scale image recognition, segmentation, and captioning dataset
  •   MNIST: Handwritten digits dataset
  •   PASCAL VOC: Object recognition dataset
  •   KITTI: Autonomous driving dataset
  •   NYUv2: Indoor RGB-D dataset
  •   LSUN: Large-scale Scene Understanding challenge
  •   VQA: Visual question answering dataset
  •   Madlibs: Visual Madlibs (question answering)
  •   Flickr30K: Image captioning dataset
  •   Flickr30K Entities: Flickr30K with phrase-to-region correspondences
  •   MovieDescription: a dataset for automatic description of movie clips
  •   Action datasets: a list of action recognition datasets
  •   MPI Sintel Dataset: optical flow dataset
  •   BookCorpus: a corpus of 11,000 books

Online demos:


Main conferences:

  •   NIPS (Neural Information Processing Systems)
  •   ICML (International Conference on Machine Learning)
  •   ICLR (International Conference on Learning Representations)
  •   AISTATS (International Conference on Artificial Intelligence and Statistics)
  •   CVPR (IEEE Conference on Computer Vision and Pattern Recognition)
  •   ICCV (International Conference on Computer Vision)
  •   ECCV (European Conference on Computer Vision)
  •   ACL (Association for Computational Linguistics)
  •   EMNLP (Conference on Empirical Methods in Natural Language Processing)

