Contact information

Instructors: Gerald Penn, Raeid Saqur, and Sean Robertson.
Office: PT 283; BA 2270
Office hours: Gerald Penn: F 12h-14h at PT 283; RS/SR: M 12h-13h at BA 2270
Email: csc401-2024-01@cs. (add the toronto.edu suffix)
Forum (Piazza): Piazza (signup)
Quercus: https://q.utoronto.ca/courses/337533
Email policy: For non-confidential inquiries, consult the Piazza forum first. For confidential assignment-related inquiries, contact the TA associated with the particular assignment. Emails sent from University of Toronto email addresses with informative subject headings are the least likely to be redirected to junk email folders.

Lecture materials

Assigned readings give you more in-depth information on ideas covered in lectures. The assignments will not ask questions about the readings, but the readings will be useful when studying for the final exam.

Provided PDFs are ~ 10% of their original size for portability, at the expense of fidelity.

For pre-lecture readings and in-class note-taking, please see under Quercus Modules. The final versions (with ex-post errata and/or other modifications) will be posted here on the course website.

  1. Introduction.
    • Date: 8 Jan.
    • Reading: Manning & Schütze: Sections 1.3-1.4.2, Sections 6.0-6.2.1
  2. Corpora and Smoothing.
    • Dates: 10 Jan.
    • Reading: Manning & Schütze: Section 1.4.3, Sections 6.1-6.2.2, Section 6.2.5, Section 6.3
    • Reading: Jurafsky & Martin: 3.4-3.5
  3. Entropy and information theory.
    • Dates: 15, 17 Jan.
    • Reading: Manning & Schütze: Sections 2.2, 5.3-5.5
  4. Features and Classification.
    • Date: 22 Jan.
    • Reading: Manning & Schütze: Section 1.4.3, Sections 6.1-6.2.2, Section 6.2.5, Section 6.3
    • Reading: Jurafsky & Martin: 3.4-3.5
  5. Intro. to NNs and Neural Language Models.
    • Dates: 24, 29 Jan.
    • Reading: DL (Goodfellow et al.). Sections: 6.3, 6.6, 10.2, 10.5, 10.10
    • (Optional) Supplementary resources and readings:
      • Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space. (2013)" link
      • Xin Rong. "word2vec Parameter Learning Explained". link
      • Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to homemaker? debiasing word embeddings." NeurIPS (2016). link
      • Greff, Klaus, et al. "LSTM: A search space odyssey." IEEE (2016). link
      • Jozefowicz, Sutskever et al. "An empirical exploration of recurrent network architectures." ICML (2015). link
      • GRU: Cho, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." (2014). link
      • ELMo: Peters, Matthew E., et al. "Deep contextualized word representations. (2018)." link
      • Blogs:
      • The Unreasonable Effectiveness of Recurrent Neural Networks. link
      • Colah's Blog. "Understanding LSTM Networks". link.
  6. Machine translation (MT).
    • Dates: 31 Jan.; 5, 7 Feb.
    • Readings:
      • Manning & Schütze: Sections 13.0, 13.1.2, 13.1.3, 13.2, 13.3, 14.2.2
      • DL (Goodfellow et al.). Sections: 10.3, 10.4, 10.7
    • (Optional) Supplementary resources and readings:
      • Papineni, et al. "BLEU: a method for automatic evaluation of machine translation." ACL (2002). link
      • Sutskever, Ilya, Oriol Vinyals et al. "Sequence to sequence learning with neural networks." (2014). link
      • Bahdanau, Dzmitry, et al. "Neural machine translation by jointly learning to align and translate." (2014). link
      • Luong, Manning, et al. "Effective approaches to attention-based neural machine translation." arXiv (2015). link
      • Britz, Denny, et al. "Massive exploration of neural machine translation architectures." (2017). link
      • BPE: Sennrich, et al. "Neural machine translation of rare words with subword units." arXiv (2015). link
      • Wordpiece: Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv (2016). link
      • Blogs:
      • Distill: Olah & Carter "Attention and Augmented RNNs"(2016). link
  7. Transformers.
    • Dates: 12, 14 Feb.
    • Readings:
      • Vaswani et al. "Attention is all you need." (2017). link
    • (Optional) Supplementary resources and readings:
      • RoPE: Su, Jianlin, et al. "Roformer: Enhanced transformer with rotary position embedding." (2021). [arxiv]
      • Ba, Kiros, and Hinton. "Layer normalization." (2016). [link]
      • Xiong, Ruibin, et al. "On layer normalization in the transformer architecture." ICML PMLR (2020). [link]
      • Xie et al. "ResiDual: Transformer with Dual Residual Connections." (2023). [arxiv] [github]
      • BERTology:
      • Devlin et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." (2019). link
      • Clark et al. "What does BERT look at? An analysis of BERT's attention." (2019). link
      • Rogers, Anna et al. "A primer in BERTology: What we know about how BERT works." TACL (2020). link
      • Tenney et al. "BERT rediscovers the classical NLP pipeline." (2019). link
      • Niu et al. "Does BERT rediscover a classical NLP pipeline." (2022). link
      • Lewis et al. "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension." (2019). link
      • T5: Raffel et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." J. Mach. Learn. Res. 21.140 (2020). link
      • GPT3: Brown, Tom, et al. "Language models are few-shot learners." (2020). link
      • Attention-free models:
      • Fu, Daniel, et al. "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture." (2023). [arxiv]. [blog].
      • Token-free models:
      • Clark et al. "CANINE: Pre-training an efficient tokenization-free encoder for language representation." (2021). link
      • Xue et al. "ByT5: Towards a token-free future with pre-trained byte-to-byte models." (2022). link
      • Blogs:
      • Harvard NLP. "The Annotated Transformer". link.
      • Jay Alammar. "The Illustrated Transformer". link.
  8. Large language models.
    • Date: 26 Feb.
    • Readings: No required readings for this lecture.
    • (Optional) Supplementary resources and readings:
      • Zhao, Wayne Xin et al. "A Survey of Large Language Models." (2023). [arxiv] [list of LLMs]
      • Bommasani et al. "On the opportunities and risks of foundation models." (2022). link
      • Kaddour, Jean, et al. "Challenges and Applications of Large Language Models." (2023). [arxiv]
      • Wei, Jason, et al. "Emergent abilities of large language models." (2022). [arxiv]
      • Schaeffer, Rylan et al. "Are emergent abilities of Large Language Models a mirage?." (2023). [arxiv]
      • Kaplan et al. "Scaling laws for neural language models." (2020). link
      • Li, Xiang Lisa, and Percy Liang. "Prefix-tuning: Optimizing continuous prompts for generation." (2021). [arxiv] [github]
      • Kudo and Richardson. "Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing." (2018). link
      • Instruction Finetuning:
      • REINFORCE: Williams, Ronald J. "Simple statistical gradient-following algorithms for connectionist reinforcement learning. (1992)." link
      • InstructGPT: Ouyang, Long, et al. "Training language models to follow instructions with human feedback." arXiv preprint (2022). link
      • RLHF: Christiano et al. "Deep reinforcement learning from human preferences." (2017). link
      • RLHF: Stiennon et al. "Learning to summarize with human feedback." (2020). link
      • RLHF: Casper, Stephen, et al. "Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback." (2024). link
      • Korbak, Tomasz, et al. "Pretraining language models with human preferences." ICML. 2023. [link]
      • DPO: Rafailov, Rafael, et al. "Direct preference optimization: Your language model is secretly a reward model." (2023). [arxiv]
      • Saqur, Raeid, et al. "Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect." NeurIPS (2024). link
      • Meta AI. "LLaMA: Open and Efficient Foundation Language Models." (2023). [arxiv] [blog] [blog]
      • PEFTs & Quantizations:
      • LoRA: Edward Hu et al. "Low-rank Adaptation of Large Language Models." (2021). link
      • PEFT: Liu, Haokun, et al. "Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning." Advances in Neural Information Processing Systems (2022). [paper] [github]
      • LLM.int8(): Dettmers, Tim, et al. "8-bit matrix multiplication for transformers at scale." (2022). link
      • QLoRA: Dettmers, Tim, et al. "QLoRA: Efficient finetuning of quantized LLMs." NeurIPS (2024). link
      • Benchmarks:
      • HELM: Liang, Percy, et al. "Holistic evaluation of language models." (2022). [arxiv] [github]
      • BIG-bench: Srivastava, Aarohi, et al. "Beyond the imitation game: Quantifying and extrapolating the capabilities of language models." (2022). [arxiv] [github]
      • MMLU: Hendrycks, Dan, et al. "Measuring massive multitask language understanding." (2020). [arxiv] [github]
  9. Acoustics and phonetics.
  10. Speech features and speaker identification.
    • Dates: 6 Mar.
    • Readings:
      • Jurafsky & Martin SLP3 (3rd ed.): Chapter 16. link
  11. Dynamic programming for speech recognition.
    • Dates: 11, 13, 18 Mar.
    • Readings: N/A
  12. Information Retrieval (IR).
    • Date(s): 20 Mar.
    • Readings:
      • Jurafsky & Martin SLP3 (3rd ed.): Chapter 14, only the first part (14.1). link
  13. Text Summarization.
    • Date(s): 25 Mar.
  14. Guest Lecture on Ethics: [Module 1], [Module 2].
    • Date(s): 27 Mar., 1 Apr.
    • Supplementary materials/links:
      • Guest lecture on Ethics ft. Steven Coyne.
      • The Embedded Ethics Education Initiative at UofT, SRI Institute link
      • SRI Institute events
  15. Summary and Review (last lecture).
    • Date: 3 Apr.

Tutorial materials

Enrolled students: Please see under Quercus Modules. The final versions (with ex-post errata and/or other modifications) will be posted here on the course website for anyone auditing.

Assignments

Enrolled students: Please use the Quercus Assignments page for all materials. The final versions (with ex-post errata and updates) will be posted here for anyone auditing the course. Here is the ID template that you must submit with your assignments. Here is the MarkUs link you use to submit them.

Extension requests: Please follow the extension request procedure detailed here. A copy of the Special Consideration Form is available here.
Remark requests: Please follow the remarking policy detailed here.

General Tips & F.A.Q.:



News and announcements

  • [15-Jan-2024] ASSIGNMENTS: A1 has been released.
  • FIRST LECTURE: 8 January at 10h or 11h (check your section on ACORN enrolment).
  • FIRST TUTORIAL: There will be NO tutorial in the first week of lectures (i.e., Friday, 12 January).
  • READING WEEK BREAK: There will be no lectures or tutorials during the week of Feb. 19-23.
  • LAST LECTURE: 5 April (check sessional calendars).
  • FINAL EXAM: April 25, 2024 [ArtSci Final Exam Schedule ]
