Course description
This seminar course supports students as they work on a data science project using a dataset of their choice. The course introduces several core data science techniques through lectures and mini-projects. Students will select a dataset of interest to them and produce an analysis or a data product, along with a project report, combining domain knowledge and technical expertise.
Assignments are only accepted up to 72 hours (3 days) after the deadline.
We will be using the Python NumPy/SciPy stack in this course. Python 2 and Python 3 are both acceptable.
The most convenient Python distribution to use is Anaconda. If you are using an IDE and download Anaconda, be sure to have your IDE use the Anaconda Python.
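One quick way to confirm that your IDE is actually running the interpreter you expect is to print the interpreter path and check that the core packages can be found. The snippet below is a minimal sketch, not an official course tool: the function name `describe_environment` and the default package list are assumptions made for illustration, and it assumes Python 3 (`importlib.util` is not available in Python 2).

```python
import importlib.util
import sys


def describe_environment(packages=("numpy", "scipy")):
    """Return the active interpreter path, its version, and package availability.

    `packages` is a hypothetical default; pass any top-level package names.
    """
    availability = {
        # find_spec returns None when a top-level package cannot be located
        name: importlib.util.find_spec(name) is not None
        for name in packages
    }
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        "packages": availability,
    }


if __name__ == "__main__":
    info = describe_environment()
    print("Interpreter:", info["executable"])
    print("Python version:", info["version"])
    for name, found in sorted(info["packages"].items()):
        print("{}: {}".format(name, "found" if found else "MISSING"))
```

Run this from inside your IDE. If the IDE is using the Anaconda Python, the interpreter path should typically point inside your Anaconda installation directory; if it points elsewhere, or a package is reported missing, check the IDE's interpreter settings.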
I recommend the Pyzo IDE available here. Jupyter Notebooks are favored by some people, though I recommend coding using an IDE with a debugger.
We will be using PyTorch and Stan/RStan towards the end of the course.
If your project requires a substantial amount of compute power, I recommend signing up for AWS Educate to obtain $100 in free credits for AWS. Instructions for running RStudio Server on AWS Educate are here. GCP and Microsoft Azure also offer free credits for students.
Adroit is another option for GPU computing.
Students for whom AWS/Google Cloud/Azure and Adroit are insufficient should consult the course instructor.
MTSR20
). Uses TensorFlow rather than PyTorch. Not as authoritative, comprehensive, or free as Goodfellow et al., but an easier read.
Design credit: CS229, Jan 2019.