Contact information

Instructors: Annie En-Shiun Lee, Raeid Saqur, and Zining Zhu.
Office: Zoom
Office hours: Wednesdays 12:30-1:30 pm
Email: csc401-2023-01@cs. (add the toronto.edu suffix)
Forum: Piazza
Quercus: https://q.utoronto.ca/courses/293764
Email policy: For non-confidential inquiries, consult the Piazza forum first. For confidential assignment-related inquiries, contact the TA associated with that assignment. Emails sent from University of Toronto email addresses with appropriate subject headings are the least likely to be redirected to junk folders.
Back to top

Course outline

This course presents an introduction to natural language computing in applications such as information retrieval and extraction, intelligent web searching, speech recognition, and machine translation. These applications will involve various statistical and machine learning techniques. Assignments will be completed in Python. All code must run on the 'teaching servers'.

The theme for this year is speech and text analysis in a post-truth society.

Prerequisites: CSC207H1 / CSC209H1 ; STA247H1 / STA255H1 / STA257H1
Recommended Preparation: MAT221H1 / MAT223H1 / MAT240H1 / CSC311 are strongly recommended. For advice, contact the Undergraduate Office.

The course information sheet is available here.

Back to top

Meeting times

TL;DR All meetings are in-person at EM 001, except for the first section (10h-11h) on Wednesdays.

Locations: EM 001, Emmanuel College [Classfinder link]; AH 100, Muzzo Family Alumni Hall [Classfinder link]
Lectures: Mondays 10-11h and 11-12h in EM 001; Wednesdays 10-11h in AH 100 and 11-12h in EM 001
Tutorials: Fridays 10-11h and 11-12h in EM 001
Back to top

Syllabus

The following is an estimate of the topics to be covered in the course and is subject to change.

  1. Introduction to corpus-based linguistics
  2. N-gram models
  3. Entropy and decisions
  4. Neural language models and word embedding
  5. Machine translation (statistical and neural) (MT)
  6. Hidden Markov models (HMMs)
  7. Natural Language Understanding (NLU)
  8. Automatic speech recognition (ASR)
  9. Information retrieval (IR)
  10. Interpretability and Large Language Models

Calendar

9 January: First lecture
17 January: Last day to add CSC2511
23 January: Last day to add CSC401
10 February: Assignment 1 due
20 February: Last day to drop CSC2511
20-24 February: Reading week (no lectures or tutorials)
10 March: Assignment 2 due
14 March: Last day to drop CSC401
8 April: Last lecture
8 April: Assignment 3 due
8 April: Project report due
TBD April: Final exam

See Dates for undergraduate students.

See Dates for graduate students.

Back to top

Readings for this course

Optional: Foundations of Statistical Natural Language Processing, C. Manning and H. Schütze
Optional: Speech and Language Processing (2nd ed.), D. Jurafsky and J.H. Martin
Optional: Deep Learning, I. Goodfellow, Y. Bengio, and A. Courville

Supplementary reading

Please see additional lecture-specific supplementary resources under the Lecture Materials section.

  • ML history: "What is science for? The Lighthill report on artificial intelligence reinterpreted", Jon Agar
  • Smoothing: "An Empirical Study of Smoothing Techniques for Language Modeling", Stanley F. Chen and Joshua Goodman
  • Hidden Markov models: "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Lawrence R. Rabiner
  • Sentence alignment: "A Program for Aligning Sentences in Bilingual Corpora", William A. Gale and Kenneth W. Church
  • Transformation-based learning: "Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging", Eric Brill
  • Sentence boundaries: "Sentence boundaries", J. Read, R. Dridan, S. Oepen, and L.J. Solberg
  • Seq2Seq: "Sequence to Sequence Learning with Neural Networks", Ilya Sutskever, Oriol Vinyals, and Quoc V. Le
  • Transformer: "Attention Is All You Need", Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin
  • Attention-based NMT: "Effective Approaches to Attention-based Neural Machine Translation", Minh-Thang Luong, Hieu Pham, and Christopher D. Manning
  • NMT: "Neural Machine Translation by Jointly Learning to Align and Translate", Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio
  • NMT: "Massive Exploration of Neural Machine Translation Architectures", Denny Britz et al.
Back to top

Evaluation policies

General
You will be graded on three homework assignments and a final exam. The relative proportions of these grades are as follows:
Assignment with lowest mark: 15%
Assignment with median mark: 20%
Assignment with highest mark: 25%
Final exam: 40%
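For illustration, here is a minimal sketch (in Python, the language used for the assignments) of how these weights combine into a course grade. The marks below are hypothetical, and this is not an official grade calculator.

    # Hypothetical marks, for illustration only (all out of 100).
    assignment_marks = [78.0, 85.0, 92.0]   # the three homework assignments
    final_exam_mark = 74.0

    # Weights from the table above: lowest 15%, median 20%, highest 25%, exam 40%.
    a_low, a_mid, a_high = sorted(assignment_marks)
    course_grade = (0.15 * a_low + 0.20 * a_mid + 0.25 * a_high
                    + 0.40 * final_exam_mark)
    print(f"Course grade: {course_grade:.1f}%")   # 11.7 + 17.0 + 23.0 + 29.6 = 81.3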
Graduate students enrolled in CSC2511 will have the option of undertaking a course project (instead of the assignments), in teams of at most two students, for 60% of the course grade (the final exam, worth 40%, is still required). Information on the course project can be found here.
Lateness
A 10% (absolute) deduction is applied to late homework starting one minute after the due time. Thereafter, an additional 10% deduction is applied for every further 24 hours, up to 72 hours late, at which point the homework receives a mark of zero. No exceptions will be made except in emergencies, including medical emergencies, at the instructor's discretion.
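Read as a step function, the schedule above can be sketched as follows. This is an unofficial, minimal sketch; the function name, the exact cut-offs, and the treatment of the 72-hour boundary are assumptions based on one reading of the policy.

    def late_mark(raw_mark: float, hours_late: float) -> float:
        """Apply the lateness schedule above to a raw mark out of 100 (unofficial sketch)."""
        if hours_late <= 0:
            return raw_mark                   # on time: no deduction
        if hours_late >= 72:
            return 0.0                        # 72 or more hours late: mark of zero
        periods = int(hours_late // 24)       # completed additional 24-hour periods
        deduction = 10.0 * (1 + periods)      # 10 points immediately, plus 10 per period
        return max(0.0, raw_mark - deduction)

    print(late_mark(90.0, 5))    # 80.0 (less than 24 hours late)
    print(late_mark(90.0, 30))   # 70.0 (between 24 and 48 hours late)
    print(late_mark(90.0, 80))   # 0.0  (more than 72 hours late)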
Final
A mark of at least D- on the final exam is required to pass the course. In other words, if you receive an F on the final exam then you automatically fail the course, regardless of your performance in the rest of the course.
Collaboration and plagiarism
No collaboration on the homeworks is permitted. The work you submit must be your own. 'Collaboration' in this context includes but is not limited to sharing of source code, correction of another's source code, copying of written answers, and sharing of answers prior to submission of the work (including the final exam). Failure to observe this policy is an academic offense, carrying a penalty ranging from a zero on the homework to suspension from the university. See Academic integrity at the University of Toronto.
Back to top

Lecture materials

Assigned readings give you more in-depth information on ideas covered in lectures. You will not be asked questions relating to readings for the assignments, but they will be useful in studying for the final exam.

Provided PDFs are ~ 10% of their original size for portability, at the expense of fidelity.

For pre-lecture readings and in-class note-taking, please see the Quercus page Lecture Materials and Handouts. The final versions (with post-hoc errata and/or other modifications) will be posted here on the course website.

  1. Introduction.
    • Date: 9 Jan.
    • Reading: Manning & Schütze: Sections 1.3-1.4.2, Sections 6.0-6.2.1
  2. Corpora, language models, Zipf, and smoothing.
    • Dates: 11, 16 Jan.
    • Reading: Manning & Schütze: Section 1.4.3, Section 6.1-6.2.2, Section 6.2.5, Sections 6.3
    • Reading: Jurafsky & Martin: 3.4-3.5
  3. Features and classification.
    • Dates: 18, 23 Jan.
    • Reading: Manning & Schütze: Section 16.1, 16.4
    • Reading: Jurafsky & Martin (2nd ed): Sections 5.1-5.5
  4. Entropy and decisions.
    • Dates: 25, 30 Jan.
    • Reading: Manning & Schütze: Sections 2.2, 5.3-5.5
  5. Neural models of language.
    • Dates: 1, 6 Feb.
    • Reading: DL (Goodfellow et al.). Sections: 6.3, 6.6, 10.2, 10.5, 10.10
    • (Optional) Supplementary resources and readings:
      • Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space. (2013)" link
      • Xin Rong. "word2Vec Parameter Learning Explained". link
      • Bolukbasi, Tolga, et al. "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings." NeurIPS (2016). link
      • Greff, Klaus, et al. "LSTM: A search space odyssey." IEEE (2016). link
      • Jozefowicz, Sutskever et al. "An empirical exploration of recurrent network architectures." ICML (2015). link
      • GRU: Cho, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." (2014). link
      • ELMo: Peters, Matthew E., et al. "Deep contextualized word representations. (2018)." link
      • Blogs:
      • The Unreasonable Effectiveness of Recurrent Neural Networks. link
      • Colah's Blog. "Understanding LSTM Networks". link.
  6. Machine translation (MT).
    • Dates: 8, 13, 15 Feb.
    • Readings:
      • Manning & Schütze: Sections 13.0, 13.1.2, 13.1.3, 13.2, 13.3, 14.2.2
      • DL (Goodfellow et al.). Sections: 10.3, 10.4, 10.7
      • Vaswani et al. "Attention is all you need." (2017). link
    • (Optional) Supplementary resources and readings:
      • Papineni, et al. "BLEU: a method for automatic evaluation of machine translation." ACL (2002). link
      • Sutskever, Ilya, Oriol Vinyals et al. "Sequence to sequence learning with neural networks."(2014). link
      • Bahdanau, Dzmitry, et al. "Neural machine translation by jointly learning to align and translate."(2014). link
      • Luong, Manning, et al. "Effective approaches to attention-based neural machine translation." arXiv (2015). link
      • Britz, Denny, et al. "Massive exploration of neural machine translation architectures."(2017). link
      • BPE: Sennrich, et al. "Neural machine translation of rare words with subword units." arXiv (2015). link
      • Wordpiece: Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv (2016). link
      • Blogs:
      • Distill: Olah & Carter "Attention and Augmented RNNs"(2016). link
      • Jay Allamar. "The Illustrated Transformer". link.
  7. More neural language models.
    • Date: 27 Feb.
    • Readings: No required readings for this lecture.
    • (Optional) Supplementary resources and readings:
      • Bommasani et al. "On the opportunities and risks of foundation models." (2022). link
      • Devlin et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." (2019). link
      • Clark et al. "What does bert look at? an analysis of bert's attention." (2019). link
      • Tenney et al. "BERT rediscovers the classical NLP pipeline." (2019). link
      • Rogers, Anna et al. "A primer in BERTology: What we know about how bert works." TACL(2020). link
      • Lewis et al. "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension." (2019). link
      • T5: Raffel et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." J. Mach. Learn. Res. 21.140 (2020). link
      • GPT-3: Brown et al. "Language models are few-shot learners." (2020). link
      • InstructGPT: Ouyang, Long, et al. "Training language models to follow instructions with human feedback. arXiv preprint (2022)." link
      • RLHF: Christiano et al. "Deep reinforcement learning from human preferences." (2017). link
      • RLHF: Stiennon et al. "Learning to summarize with human feedback." (2020). link
      • Kaplan et al. "Scaling laws for neural language models." (2020). link
      • Kudo and Richardson. "Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing." (2018). link
      • Token-free models:
      • Clark et al. "CANINE: Pre-training an efficient tokenization-free encoder for language representation." (2021). link
      • Xue et al. "ByT5: Towards a token-free future with pre-trained byte-to-byte models." (2022). link
  8. Hidden Markov Models (HMMs).
    • Dates: 1, 6, 8 Mar
    • Reading: Manning & Schütze: Section 9.2-9.4.1 (an alternative formulation)
    • (Optional) Supplementary resources and readings:
      • Rabiner, Lawrence R. "A tutorial on HMMs and selected applications in speech recognition." (1990). link
      • Stamp, Mark. "A revealing introduction to hidden Markov models." (2004). link
      • Bilmes, Jeff. "A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and HMMs." (1998). link
      • Chen and Goodman. "An empirical study of smoothing techniques for language modeling." (1999). link
      • Hidden Markov Model Toolkit (HTK). link
      • Scikit's HMM. link
  9. Natural Language Understanding (NLU). Slides: NLU-1/2; NLU-2/2, Colab notebook.
    • Dates: 13, 15 Mar.
    • Readings:
      • Maas et al., "Learning Word Vectors for Sentiment Analysis" (2011). link
      • Levesque et al., "The Winograd Schema Challenge" (2012). link
  10. Automatic Speech Recognition (ASR).
    • Dates: 20, 22 Mar.
    • Readings:
      • Jurafsky & Martin SLP3 (3rd ed.): Chapter 16. link
      • Chan et al., "Listen, Attend and Spell." (2015). link
  11. Information Retrieval (IR).
    • Date: 27 Mar.
    • Readings:
      • Jurafsky & Martin SLP3 (3rd ed.): Chapter 14, only the first part (14.1). link
  12. Interpretability.
    • Date: 29 Mar.
    • Readings:
      • Lundberg & Lee. "A Unified Approach to Interpreting Model Predictions" (2017). link
  13. Large Language Models (LLMs).
    • Date: 3 Apr.
    • (Optional) Supplementary resources and readings:
      • GPT-3: Brown et al. "Language models are few-shot learners." (2020). link
      • OpenAI. "ChatGPT: Optimizing Language models for Dialogue." (2022). link
      • Nature News. C. Walker. "ChatGPT listed as author on research papers: many scientists disapprove". (2023). link
      • AI Readings. UofT Academic and Collaborative Technologies (ACT). link
  14. Summary and Review.
    • Date: 5 Apr.

Tutorial materials

Enrolled students: Please see the Quercus page Tutorial Materials. The final versions (with post-hoc errata and/or other modifications) will be posted here on the course website (for anyone auditing).

Assignments

Enrolled students: Please use the Quercus Assignments page for all materials. The final versions (with post-hoc errata and updates) will be posted here (for anyone auditing the course). Here is the ID template that you must submit with your assignments. Here is the MarkUs link you use to submit them.

Extension requests: Please follow the extension request procedure detailed here. A copy of the Special Consideration Form is available here.

Remark requests: Please follow the remarking policy detailed here.


Project

The course project is an optional replacement for the assignments available to graduate students in CSC2511.

Back to top

Past course materials

Fall 2022: course page

Old Exams

  • The old exam repository from UofT libraries is here (it may not contain this particular course's final exams).
  • An (old) final exam from 2017 is here. Please note that while this exam is structurally similar to the current final exam, its material may not correspond to the current syllabus (e.g., all statistical MT questions are out of scope in W23) and may therefore appear esoteric.
Back to top

News and announcements

  • FIRST LECTURE: 9 January at 10h or 11h (check your section on ACORN enrolment).
  • FIRST TUTORIAL: There will be a tutorial in the first week of lectures (i.e., Friday, 13 January).
  • READING WEEK BREAK: The week of Feb. 20-24 - there will be no lectures or tutorials.
  • FINAL EXAM: 22-APR-2023: 14.00-15.00. Exam schedule and (location) info now available here.

Back to top

Frequently asked questions

Please see the Arts and Science departmental student FAQ page for additional answers.

  1. How much Machine Learning background do I need?

    Please look at the posted lecture slides and see how comfortable you are with the middle and later material. Please also consider the waitlist and the eligible students waiting to get into the class. Machine learning is highly recommended but is not a prerequisite for the course. Much of the course content assumes some basic knowledge and appreciation of machine learning, so prior exposure will help. You do not need everything covered in CSC311 Introduction to Machine Learning, but some understanding of machine learning is helpful (for example, Andrew Ng's "Machine Learning for Everyone"). For the first third of the lectures you should at least know supervised learning, classification, features, vectors, feature extraction/engineering, training/test datasets, and naive Bayes; for the remainder, deep learning, softmax, objective functions, basic training mechanics, and so on.

    Also see this instructor answer on piazza.

  2. Are the lectures recorded?

    Students are expected to attend all lectures and tutorials in person, and consistent, high-quality video recordings are not guaranteed, so it is best to assume the answer is "no" and plan accordingly. Any recordings that do exist may be posted under the Class Recordings page on Quercus.

  3. What if I am unable to attend the class?

    If a student has to miss a substantial portion of the course, or consistently misses one of the weekly lecture times and would need to rely on recordings, then they should drop the course to allow someone else on the waiting list to take it.

  4. How to audit the course?

    Students are allowed to audit the course as long as they do not take up any course resources (including instructor/TA/admin time). Auditors can attend lectures when there is space and access any publicly accessible course materials (e.g. a Quercus page that is set to Institution visibility), but they do not get access to anything else. For example, if a registered student in that section (or another section) cannot find a seat at the lecture, the auditor should give up their seat. Auditors should not expect questions to be answered on course resources or assignments/projects to be graded.

  5. Course enrollment issues?

    Unfortunately, the CSC401 course staff and instructors do not have the ability to manage lecture-section enrolment or to move a student from one section to another. You will need to contact your college registrar for anything related to ACORN/course enrolment. If you have further questions or concerns, please do not hesitate to reach out.

  6. Lecture conflicts?

    Unfortunately, the CSC401 course staff and instructors do not have the ability to move a student from one section to another, and given the long waiting lists it is unlikely that you will get a spot in a requested section on ACORN at this late date. One thing that may help: if you remain registered in the section with the conflict, we will allow you to attend lectures in any other in-person section as long as there is space in the classroom. If you cannot attend any of the weekly lecture times in person, you will need to drop CSC401 for this term and hopefully take it in the future.

  7. (Grad. students only) What does doing the 'Project' option mean for course deliverables?

    You can do the project in place of the three assignments. However, you can NOT skip the final exam (which will include material covered by the assignments). Please read the project document and see Quercus for details. The assignments are optional (but recommended for those new to NLP); the final exam is required.

Back to top