Overview
Data Science-related courses are offered mostly by the Department of Computer Science and the Department of Statistical Sciences. You may also be interested in exploring the offerings of the Dept. of Electrical Engineering (in particular in machine learning and information theory) and the Rotman School of Management (in business analytics). While the focus here is on the undergraduate course offerings at UofT, strong students can sometimes petition to take graduate courses. Graduate course listings are available on individual Departments' websites.
Data Scientists need a solid background in Statistics and Computer Science, but domain expertise (specific knowledge about the domain from which the data to be analyzed came) is often invaluable. During your time at UofT, it would be valuable to develop expertise in some application domain; examples include finance, life science, physical science, psychology, and marketing. Some ideas for course sequences in an application domain can be found in the description of the Applied Statistics Specialist.
Methods of Data Analysis and the Design of Scientific Studies
- STA302 and STA303 (Methods of Data Analysis I and II) introduce you to the most established and frequently used methods of data analysis. You will also learn how to use R to perform data analysis.
- STA305 (Design of Scientific Studies) introduces you to the design of scientific studies. Data Scientists in industry routinely conduct studies and experiments, for example to determine an optimal business strategy.
Statistical Theory
- STA355 (Theory of Statistical Practice) allows you to more deeply understand and explain the results of statistical analysis, and enables you to read research studies.
Machine Learning
Machine Learning is used to obtain insights from large-scale datasets. Machine Learning courses complement the Data Analysis courses.
Multiple machine learning courses are available. Their contents overlap to some extent.
- CSC321 (Introduction to Neural Networks and Machine Learning) is a third-year level course which introduces machine learning, but focuses on neural networks. Neural networks are a set of techniques whose use has been booming in recent years. They first saw applications in artificial intelligence, but Data Scientists now use them in a variety of application domains. You should take one of CSC411/STA414/ECE521 to complement CSC321.
- CSC411, STA414, and ECE521 all offer a broad introduction to Machine Learning. While it is possible to take more than one of those courses, if you did well in one of them, you might like to explore other options.
Working with Non-Data Scientists
Working with non-Data Scientists is probably the most important skill of the Data Scientist. A number of courses throughout the university (most of which are limited-enrollment) give you an opportunity to work with people who are not statisticians or computer scientists.
One such opportunity is the set of courses offered in DCSIL.
Data Wrangling
For many Data Scientists, facility with wrangling data is absolutely essential. You should take CSC343 and practice processing datasets as soon as possible. If you work with large datasets, you will also need to know Linux-like systems in order to process the data. CSC209 is a good place to start.
Facility with data wrangling comes with practice -- you’ll acquire it if you work with large datasets. It is a good idea to learn to do Data Wrangling in R as well.
Database Systems
If your interest is in building systems for storing large amounts of data, consider taking CSC369, followed by CSC443.