Homepage for CSC311, Fall 2021
Introduction to Machine Learning
Department of Mathematical and Computational Sciences
University of Toronto Mississauga
ANNOUNCEMENTS:
- Please see the Quercus page for this course.
- Lectures and tutorials will be conducted online using
Zoom. A more-detailed announcement on this will be made
shortly.
- The midterm test and the final exam will be conducted online
using Zoom and Markus.
- You will need a video camera, a microphone and a speaker for
the midterm and the final exam.
COURSE DESCRIPTION:
- Machine learning aims to build computer systems that learn
from experience, instead of being directly programmed. It is an
exciting interdisciplinary field, with historical roots in
computer science, statistics, pattern recognition, and even
neuroscience and physics. In the past ten years, many of these
approaches have converged and led to rapid advances and
real-world applications.
- This course is a broad introduction to machine learning.
It will start with basic methods of regression and
classification and problems of overfitting and the evaluation
of learning algorithms, and then move on to more sophisticated
methods such as neural networks. Besides reinforcing what
you learn in class, the homework assignments will extend your
Python skills and introduce you to the basics of scientific
programming, data visualization and computational statistics,
all of which are ubiquitous in machine learning. As a
fringe benefit, you will also find out what all that math you
learned is actually used for!
PREREQUISITES:
- Formally required: CSC207, (MAT223 or MAT240), MAT232
and STA256.
- Recommended: CSC338 (Numerical Methods) or a course in
computational statistics.
- Informally required: a solid knowledge of calculus, linear
algebra, probability, computer programming (including Python)
and good geometric intuition.
- Machine learning is highly mathematical, and the ability to
write and understand rigorous proofs is essential, as is the
ability to use mathematics to solve real problems (as in Physics
and Engineering). Consequently, mathematical maturity will
be assumed.
- Prerequisites will not be waived.
INSTRUCTOR:
- Anthony Bonner
- email: bonner [at] cs [dot] toronto [dot] edu
- Phone: 905-828-3813 (UTM), 416-978-7441 (St George)
- Office: DH 3090 (UTM), BA 5230 (St George)
- Office hours: Tues and Weds 4-5pm online using Zoom.
GENERAL INFORMATION:
- Course syllabus (The same
for all lecture sections)
- Classes: Weds 9-11am, Tues 5-7pm, Weds 5-7pm, online
using Zoom.
- Tutorials:
- Friday at 9am, 10am, 11am, noon and 5pm, online using Zoom.
- There are nine tutorial sections.
- Teaching Assistants: TBA
- Textbook: There is no required text. We will recommend specific readings from various
books and papers, mostly from The
Elements of Statistical Learning (ESL), by Hastie,
Tibshirani and Friedman. The book can be downloaded for
free as a pdf file.
- ATTENDANCE: We expect students to attend all classes and all
tutorials. This is especially important because we will cover
material in class that is not included in the textbook. Also,
the tutorials are not only for review and answering
questions; they will also cover new material.
- In general, the lectures will outline the theory of machine
learning, the tutorials will provide additional details,
examples and guidance, and the assignments will help you turn
the theory into practice. Doing the assignments is where
you will really learn machine learning!
- The assignments will require proving theorems and writing
programs. Often, your programs will implement a theorem
you have proved.
- The assignments will require you to do scientific programming
on large matrices and vectors. Most of you have never done
this before. Scientific programming minimizes the use of
loops and maximizes the opportunities for massively parallel
programming using GPUs. As you will discover, without
this, machine learning is impossibly slow, since programs can
take years instead of minutes to finish executing.
- Lecture slides
- Tutorial Slides
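To illustrate the point above about minimizing loops, here is a minimal sketch (not course material) that computes a sum of squared errors twice: once with explicit Python loops and once with a single NumPy matrix-vector product. The function names, array sizes and random data are made up for illustration, and the code runs on the CPU only; it does not use a GPU.

```python
import time

import numpy as np


def squared_error_loop(X, w, y):
    """Sum of squared residuals, computed with explicit Python loops."""
    total = 0.0
    for i in range(X.shape[0]):
        prediction = 0.0
        for j in range(X.shape[1]):
            prediction += X[i, j] * w[j]
        total += (prediction - y[i]) ** 2
    return total


def squared_error_vectorized(X, w, y):
    """The same quantity, computed with one matrix-vector product."""
    residual = X @ w - y
    return float(residual @ residual)


# Illustrative random data (hypothetical sizes): 2000 examples, 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))
w = rng.normal(size=50)
y = rng.normal(size=2000)

start = time.perf_counter()
loss_loop = squared_error_loop(X, w, y)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
loss_vec = squared_error_vectorized(X, w, y)
vec_seconds = time.perf_counter() - start

print(f"loop:       {loss_loop:.4f} in {loop_seconds:.4f}s")
print(f"vectorized: {loss_vec:.4f} in {vec_seconds:.6f}s")
```

Both versions should agree up to floating-point rounding. The vectorized version is typically far faster, and the gap grows with the size of the matrices.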
SOFTWARE:
- The Python 3.8
programming language. Be sure to install a 64-bit
version. (A 32-bit version is not accurate enough for
serious numerical computing and can result in wrong answers,
which may cost you marks.)
- The NumPy libraries
(Numerical Python)
- The SciPy libraries
(Scientific Python)
- The scikit-learn
libraries (machine learning in Python)
- The Spyder
IDE (Scientific Python Development Environment) (optional).
- Recommendations:
- Use Conda,
not Pip, to install all Python-related software.
- Create a Conda virtual environment to install all software
in, because some software (notably 64-bit Python) may conflict
with other software already installed on your computer.
- Use Anaconda,
a Python-based data-science platform that includes many
popular data-science packages, including NumPy, SciPy,
scikit-learn and Spyder. This is by far the easiest way
to go.
- Last year, there were many reported problems with running
Python on Windows. Here
is some advice for dealing with them. (The main piece of
advice is to use Anaconda.) Googling "problems with
Python on Windows" seems to indicate problems when using
Windows 10. If you are using Windows, please see if the
following installation sequence works.
- Recommended installation sequence:
- Open a terminal window (Mac) or a command line window
(Windows).
- Install Conda
(or update to the latest version by typing conda update
conda) before doing anything else.
- Create a Conda virtual environment (call it csc311)
- Activate the virtual environment
- Use Conda to install Python 3.8 (64-bit version).
- Use Conda to install Anaconda
- Within your virtual environment, run anaconda-navigator,
from which you can launch Spyder.
- In Linux and Mac (and maybe on Windows), the following
sequence of commands will accomplish this (after conda has
been installed or updated):
- conda create --name csc311
- conda activate csc311
- conda install python=3.8
- conda install anaconda
- anaconda-navigator
- To leave the virtual environment, type conda deactivate
- To re-enter the virtual environment, type conda activate
csc311
- You must enter the virtual environment, run
anaconda-navigator and then launch Spyder each time you want
to write and run programs for this course.
- We recommend the above sequence for installing the software
for this course. You may install the software
in any way you see fit, but if you do not use the
recommended installation sequence, we may not be able to help
you with software problems that you encounter.
- Documentation:
- Tutorial
on machine learning in scikit-learn
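After following the installation steps above, one way to confirm that the environment is set up as intended is a short script like the sketch below. It is only a suggestion, not an official course tool: the function name check_environment is hypothetical, and the script reports only the Python version, whether the interpreter is a 64-bit build, and which of the listed libraries can be imported.

```python
import importlib
import struct
import sys


def check_environment(packages=("numpy", "scipy", "sklearn")):
    """Report the Python version, pointer size and package versions.

    "sklearn" is the import name of scikit-learn. A package that cannot
    be imported is reported as None.
    """
    report = {
        "python_version": sys.version.split()[0],
        # Size of a C pointer in bits: 64 on a 64-bit Python build.
        "pointer_bits": struct.calcsize("P") * 8,
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = None
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value if value is not None else 'NOT INSTALLED'}")
```

Run it from inside the csc311 environment (that is, after conda activate csc311), so that it inspects the interpreter and libraries installed there rather than a system-wide Python.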
ASSIGNMENTS:
MIDTERM TEST:
- Online using Zoom and Markus.
- You will need a video camera, a microphone and a speaker.
- The test is open book.
- A summary of
the procedures for taking the midterm test online.
- Detailed instructions
for taking the test online. You should read these carefully
well before the test date.
- Friday Oct 22(?), 8-9pm, plus 15 minutes to upload your
answers.
- You are also responsible for all material covered before the
midterm in tutorials, assignments and assignment solutions.
- There will be some Python programming questions on the
midterm.
- The midterm test will follow the "I don't know" policy: if
you do not know the answer to a question, and you write "I don't
know", you will receive 20% of the marks of that question. If
you just leave a question blank with no such statement, you will
get 0 marks for that question.
- Here is an old midterm test, and
here are the solutions.
- Midterm solutions
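The arithmetic of the "I don't know" policy above can be sketched as follows. This is only an illustration: the function name and its arguments are hypothetical, and it assumes that a question is either answered and marked, answered with "I don't know" alone, or left blank.

```python
def question_marks(max_marks, earned=None, wrote_dont_know=False):
    """Marks awarded for one question under the "I don't know" policy.

    max_marks:        marks available for the question
    earned:           marks given to an attempted answer (None if no attempt)
    wrote_dont_know:  True if the student wrote "I don't know"
    """
    if wrote_dont_know:
        # Writing "I don't know" earns 20% of the question's marks.
        return 0.20 * max_marks
    if earned is None:
        # A blank answer with no such statement earns nothing.
        return 0.0
    return float(earned)


print(question_marks(10, wrote_dont_know=True))  # "I don't know" on a 10-mark question
print(question_marks(10))                        # question left blank
print(question_marks(10, earned=7))              # attempted answer marked 7/10
```

In other words, if you cannot answer a question, writing "I don't know" is always worth more than leaving it blank.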
FINAL EXAM:
- Online using Zoom and Markus, like the midterm.
- You will need a video camera, a microphone and a speaker.
- The exam is open book.
- A summary of the
procedures for taking the exam online.
- Detailed instructions for
taking the exam online. You should read these carefully well
before the exam date.
- You must receive at least 30% on the final exam to pass the
course.
- The exam will cover the entire course, but will emphasize
material not on the midterm.
- The most difficult questions (and the most marks) will be on
material related to the assignments, since this is what you know
best.
- You are responsible for all lectures, tutorials, assignments
and assignment solutions.
- There will be some Python programming questions on the exam.
- The exam will follow the "I don't know" policy: if you do not
know the answer to a question, and you write "I don't know", you
will receive 20% of the marks of that question. If you just
leave a question blank with no such statement, you will get 0
marks for that question.
- Here is an old exam
and the solutions.
(Note: this exam does not cover exactly the same material that
we covered this year, but there is a lot of overlap.)
- More details will be published shortly.
DEFERRED EXAM:
- The deferred exam will also be online and will follow the
rules and policies given above for the final exam.
- Here is a summary
of the procedures for taking the deferred exam.
- Here are detailed
instructions for taking the deferred exam online.
- You should read the summary and detailed instructions
carefully before the exam date. They are similar to those
given above for the final exam.
ADDITIONAL RESOURCES:
Machine Learning Books: Most of the following books are
either readable online as a web page or downloadable as a free
pdf.
- Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The
Elements of Statistical Learning, Second
Edition. (ESL)
- Christopher Bishop, Pattern Recognition and
Machine Learning, 2006. Free downloadable pdf.
(PRML)
- Richard S. Sutton and Andrew
G. Barto, Reinforcement
Learning: An Introduction, Second Edition,
2018. (RL)
- Ian Goodfellow, Yoshua Bengio
and Aaron Courville, Deep Learning,
2016.
- David MacKay, Information
Theory, Inference, and Learning Algorithms.
- Ethem Alpaydin, Introduction
to Machine Learning, 2nd Edition, 2010. (Good for
undergrads)
- Kevin Murphy, Machine Learning: a Probabilistic
Perspective. (advanced)
- Gareth James, Daniela Witten, Trevor Hastie, and
Robert Tibshirani, An Introduction to Statistical Learning,
2017.
- Shai Shalev-Shwartz and Shai Ben-David, Understanding
Machine Learning: From Theory to Algorithms, 2014.
Mathematical Background:
- Petersen and Pedersen, The Matrix Cookbook. Free Download
- F. R. Kschischang, Probability Refresher. Free Download
- Lipschutz and Lipson, Schaum's Outline of Linear Algebra.
(very handy, very cheap)
- Wrede and Spiegel, Schaum's Outline of Advanced Calculus.
(very handy, very cheap)
Useful videos and web-pages:
PLAGIARISM AND CHEATING:
- Students should become familiar with and are expected to
adhere to the Code
of Behaviour on Academic Matters, which can be found in
the UTM Calendar. The following web sites may also be helpful: