Contact information
- Instructors: Gerald Penn, Raeid Saqur, and Sean Robertson
- Offices: PT 283; BA 2270
- Office hours: GP: F 12h-14h at PT 283; RS, SR: M 12h-13h at BA 2270
- Email: csc401-2024-01@cs. (add the toronto.edu suffix)
- Forum: Piazza (signup)
- Quercus: https://q.utoronto.ca/courses/337533
- Email policy: For non-confidential inquiries, consult the Piazza forum first. For confidential assignment-related inquiries, contact the TA associated with the particular assignment. Emails sent from University of Toronto email addresses with appropriate subject headings are least likely to be redirected to junk email folders.
Lecture materials
Lecture Videos: Please note the in-person attendance requirement and clarification regarding lecture videos (see F.A.Q.). Selected video recordings in playlists: W24, W23
Assigned readings give you more in-depth information on ideas covered in lectures. You will not be asked questions relating to readings for the assignments, but they will be useful in studying for the final exam.
Provided PDFs are ~ 10% of their original size for portability, at the expense of fidelity.
For pre-lecture readings and in-class note-taking, please see under Quercus Modules. The final versions (ex-post errata and/or other modifications) will be posted here on the course website.
- Introduction.
- Date: 8 Jan.
- Reading: Manning & Schütze: Sections 1.3-1.4.2, Sections 6.0-6.2.1
- Corpora and Smoothing.
- Dates: 10 Jan.
- Reading: Manning & Schütze: Section 1.4.3, Sections 6.1-6.2.2, Section 6.2.5, Section 6.3
- Reading: Jurafsky & Martin: 3.4-3.5
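As a concrete companion to the smoothing readings, here is a minimal sketch of add-one (Laplace) smoothing for bigram probabilities. The toy corpus and function name are illustrative only, not from the course materials.

```python
from collections import Counter

def laplace_bigram_prob(w1, w2, bigrams, unigrams, vocab_size):
    """Add-one smoothed P(w2 | w1) = (count(w1,w2) + 1) / (count(w1) + V)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)

tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)

p_seen = laplace_bigram_prob("the", "cat", bigrams, unigrams, V)    # seen bigram
p_unseen = laplace_bigram_prob("the", "sat", bigrams, unigrams, V)  # unseen, yet > 0
```

The point of smoothing is visible in the last line: an unseen bigram still receives non-zero probability mass, at the cost of slightly discounting seen events.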
- Entropy and information theory.
- Dates: 15, 17 Jan.
- Reading: Manning & Schütze: Sections 2.2, 5.3-5.5
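To accompany the entropy readings, a minimal sketch of Shannon entropy in bits (illustrative distributions, not course data):

```python
import math

def entropy(probs):
    """Shannon entropy H(p) = -sum p * log2(p), in bits; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

h_fair = entropy([0.5, 0.5])    # 1 bit: a fair coin is maximally uncertain
h_biased = entropy([0.9, 0.1])  # below 1 bit: a biased coin is more predictable
```

The comparison captures the key intuition: entropy is largest for the uniform distribution and shrinks as the distribution becomes more peaked.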
- Features and Classification.
- Date: 22 Jan.
- Reading: Manning & Schütze: Section 1.4.3, Sections 6.1-6.2.2, Section 6.2.5, Section 6.3
- Reading: Jurafsky & Martin: 3.4-3.5
- Intro. to NNs and Neural Language Models.
- Dates: 24, 29 Jan.
- Reading: DL (Goodfellow et al.). Sections: 6.3, 6.6, 10.2, 10.5, 10.10
- (Optional) Supplementary resources and readings:
- Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." (2013). link
- Xin Rong. "word2vec Parameter Learning Explained." link
- Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to homemaker? Debiasing word embeddings." NeurIPS (2016). link
- Greff, Klaus, et al. "LSTM: A search space odyssey." IEEE (2016). link
- Jozefowicz, Sutskever et al. "An empirical exploration of recurrent network architectures." ICML (2015). link
- GRU: Cho, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." (2014). link
- ELMo: Peters, Matthew E., et al. "Deep contextualized word representations." (2018). link
- Blogs:
- The Unreasonable Effectiveness of Recurrent Neural Networks. link
- Colah's Blog. "Understanding LSTM Networks". link.
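The word2vec readings above rest on the idea that embedding arithmetic captures analogies (king - man + woman ≈ queen). A sketch with made-up 3-d vectors chosen to make the analogy exact; real trained embeddings only approximate this:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy "embeddings" (hypothetical, for illustration only; not trained vectors)
king  = [0.9, 0.8, 0.1]
man   = [0.9, 0.1, 0.1]
woman = [0.1, 0.1, 0.9]
queen = [0.1, 0.8, 0.9]

analogy = [k - m + w for k, m, w in zip(king, man, woman)]
sim = cosine(analogy, queen)  # high: the offset vector lands near "queen"
```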
- Machine translation (MT).
- Dates: 31 Jan.; 5, 7 Feb.
- Readings:
- Manning & Schütze: Sections 13.0, 13.1.2, 13.1.3, 13.2, 13.3, 14.2.2
- DL (Goodfellow et al.). Sections: 10.3, 10.4, 10.7
- (Optional) Supplementary resources and readings:
- Papineni, et al. "BLEU: a method for automatic evaluation of machine translation." ACL (2002). link
- Sutskever, Ilya, Oriol Vinyals, et al. "Sequence to sequence learning with neural networks." (2014). link
- Bahdanau, Dzmitry, et al. "Neural machine translation by jointly learning to align and translate." (2014). link
- Luong, Manning, et al. "Effective approaches to attention-based neural machine translation." arXiv (2015). link
- Britz, Denny, et al. "Massive exploration of neural machine translation architectures." (2017). link
- BPE: Sennrich, et al. "Neural machine translation of rare words with subword units." arXiv (2015). link
- Wordpiece: Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv (2016). link
- Blogs:
- Distill: Olah & Carter. "Attention and Augmented RNNs." (2016). link
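The BLEU paper listed above scores a candidate translation by clipped (modified) n-gram precision. A toy sketch of the unigram case, using the pathological example from the paper; this is illustrative only, not the full corpus-level metric with brevity penalty:

```python
from collections import Counter

def modified_unigram_precision(candidate, reference):
    """Clip each candidate word's count by its count in the reference, then divide."""
    cand, ref = Counter(candidate), Counter(reference)
    clipped = sum(min(n, ref[w]) for w, n in cand.items())
    return clipped / max(1, sum(cand.values()))

# A degenerate candidate that just repeats "the" seven times
p = modified_unigram_precision(["the"] * 7, "the cat is on the mat".split())
```

Without clipping, this candidate would score a perfect 7/7; clipping caps credit for "the" at its reference count of 2, yielding 2/7.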
- Transformers.
- Dates: 12, 14 Feb.
- Readings:
- Vaswani et al. "Attention is all you need." (2017). link
- (Optional) Supplementary resources and readings:
- RoPE: Su, Jianlin, et al. "Roformer: Enhanced transformer with rotary position embedding." (2021). [arxiv]
- Ba, Kiros, and Hinton. "Layer normalization." (2016). [link]
- Xiong, Ruibin, et al. "On layer normalization in the transformer architecture." ICML PMLR (2020). [link]
- Xie et al. "ResiDual: Transformer with Dual Residual Connections." (2023). [arxiv] [github]
- BERTology:
- Devlin et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." (2019). link
- Clark et al. "What does BERT look at? An analysis of BERT's attention." (2019). link
- Rogers, Anna, et al. "A primer in BERTology: What we know about how BERT works." TACL (2020). link
- Tenney et al. "BERT rediscovers the classical NLP pipeline." (2019). link
- Niu et al. "Does BERT rediscover a classical NLP pipeline?" (2022). link
- Lewis et al. "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension." (2019). link
- T5: Raffel et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." J. Mach. Learn. Res. 21.140 (2020). link
- GPT-3: Brown et al. "Language models are few-shot learners." (2020). link
- Attention-free models:
- Fu, Daniel, et al. "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture." (2023). [arxiv] [blog]
- Token-free models:
- Clark et al. "CANINE: Pre-training an efficient tokenization-free encoder for language representation." (2021). link
- Xue et al. "ByT5: Towards a token-free future with pre-trained byte-to-byte models." (2022). link
- Blogs:
- Harvard NLP. "The Annotated Transformer". link.
- Jay Alammar. "The Illustrated Transformer". link.
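The core operation in "Attention is all you need" is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal pure-Python sketch for a single head (toy matrices, illustrative only):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: each query's output is a weighted mix of value rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs; it matches the first key.
Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
out = attention(Q, K, V)  # output is pulled toward the first value row
```

Because the softmax weights sum to 1, the output is always a convex combination of the value rows, tilted toward the keys most similar to the query.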
- Large language models.
- Date: 26 Feb.
- Readings: No required readings for this lecture.
- (Optional) Supplementary resources and readings:
- Zhao, Wayne Xin et al. "A Survey of Large Language Models." (2023). [arxiv] [list of LLMs]
- Bommasani et al. "On the opportunities and risks of foundation models." (2022). link
- Kaddour, Jean, et al. "Challenges and Applications of Large Language Models." (2023). [arxiv]
- Jason Wei et al. Emergent abilities of large language models. (2022). [arxiv]
- Schaeffer, Rylan et al. "Are emergent abilities of Large Language Models a mirage?." (2023). [arxiv]
- Kaplan et al. "Scaling laws for neural language models." (2020). link
- Li, Xiang Lisa, and Percy Liang. "Prefix-tuning: Optimizing continuous prompts for generation." (2021). [arxiv] [github]
- Kudo and Richardson. "SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing." (2018). link
- Instruction Finetuning:
- REINFORCE: Williams, Ronald J. "Simple statistical gradient-following algorithms for connectionist reinforcement learning." (1992). link
- InstructGPT: Ouyang, Long, et al. "Training language models to follow instructions with human feedback." arXiv preprint (2022). link
- RLHF: Christiano et al. "Deep reinforcement learning from human preferences." (2017). link
- RLHF: Stiennon et al. "Learning to summarize with human feedback." (2020). link
- RLHF: Casper, Stephen, et al. "Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback." (2024). link
- Korbak, Tomasz, et al. "Pretraining language models with human preferences." ICML. 2023. [link]
- DPO: Rafailov, Rafael, et al. "Direct preference optimization: Your language model is secretly a reward model." (2023). [arxiv]
- Saqur, Raeid, et al. "Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect." NeurIPS (2024). link
- Meta AI. "LLaMA: Open and Efficient Foundation Language Models." (2023). [arxiv] [blog] [blog]
- PEFTs & Quantizations:
- LoRA: Edward Hu et al. "Low-rank Adaptation of Large Language Models." (2021). link
- Liu, Haokun, et al. "PEFT: Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning." Advances in Neural Information Processing Systems (2022). [paper] [github]
- LLM.int8(): Dettmers, Tim, et al. "8-bit matrix multiplication for transformers at scale." (2022). link
- QLoRA: Dettmers, Tim, et al. "QLoRA: Efficient finetuning of quantized LLMs." NeurIPS (2024). link
- Benchmarks:
- HELM Liang, Percy, et al. "Holistic evaluation of language models." (2022). [arxiv] [github]
- BIG-bench Srivastava, Aarohi, et al. "Beyond the imitation game: Quantifying and extrapolating the capabilities of language models." (2022). [arxiv] [github]
- MMLU Hendrycks, Dan, et al. "Measuring massive multitask language understanding." (2020). [arxiv] [github]
- Acoustics and phonetics.
- Dates: 28 Feb., 4 Mar.
- Reading: Phonetics: J&M SLP2 (2nd ed.) Chapter 7; J&M SLP3 (3rd ed.) Chapter H
- Speech features and speaker identification.
- Dates: 6 Mar.
- Readings:
- Jurafsky & Martin SLP3 (3rd ed.): Chapter 16. link
- Dynamic programming for speech recognition.
- Dates: 11, 13, 18 Mar.
- Readings: N/A
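To give a flavour of the dynamic-programming ideas in these lectures, here is a minimal dynamic time warping (DTW) sketch over 1-D sequences. Real ASR systems align sequences of acoustic feature vectors, not scalars; this is illustrative only.

```python
def dtw(x, y):
    """DTW distance via the classic DP recurrence:
    D[i][j] = cost(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1])."""
    INF = float("inf")
    n, m = len(x), len(y)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

d_same = dtw([1, 2, 3], [1, 2, 3])        # identical sequences
d_warp = dtw([1, 2, 3], [1, 1, 2, 2, 3])  # a time-stretched copy aligns at zero cost
```

The second call shows why DTW suits speech: the same utterance spoken more slowly still aligns perfectly, because the warping path can repeat frames.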
- Information Retrieval (IR).
- Date(s): 20 Mar.
- Readings:
- Jurafsky & Martin SLP3 (3rd ed.): Chapter 14, only the first part (14.1). link
- Text Summarization.
- Date(s): 25 Mar.
- Guest Lecture on Ethics: [Module 1], [Module 2].
- Date(s): 27 Mar., 1 Apr.
- Supplementary materials/links:
- Guest lecture on Ethics ft. Steven Coyne.
- The Embedded Ethics Education Initiative at UofT, SRI Institute link
- SRI Institute events
- Summary and Review (last lecture).
- Date: 3 Apr.
Tutorial materials
Enrolled students: Please see under Quercus Modules. The final versions (ex-post errata and/or other modifications) will be posted here on the course website (for anyone auditing).
- Assignment 1 tutorials:
- Jan. 19, 2024: Tutorial 1
- Jan. 26, 2024: Tutorial 2 - Entropy and decisions
- Feb. 9, 2024: A1 Q/A + O.H. w/ the TAs (no slides).
- Assignment 2 tutorials:
- Feb. 16, 2024: Tutorial-I: Intro. to PyTorch (ft. Yun Shun)
- Mar. 1, 2024: Tutorial-II: Machine Translation with Transformers (ft. Julia Watson)
- Mar. 8, 2024: A2 Q/A + O.H. w/ the TAs (no slides).
- Assignment 3 tutorials:
- Mar. 15, 2024: Tutorial-I
- Mar. 22, 2024: Tutorial-II
- Mar. 29, 2024: Good Friday - No Tutorial (Q/A + O.H. w/ the TAs in person).
- See Quercus -> Pages -> Tutorial Materials
Assignments
Enrolled students: Please use the Quercus Assignments page for all materials. The final version (ex-post errata, updates) will be posted here (for anyone auditing the course). Here is the ID template that you must submit with your assignments. Here is the MarkUs link you use to submit them.
Extension requests: Please follow the extension request procedure detailed here. A copy of Special Consideration Form here.
Remark requests: Please follow the remarking policy detailed here.
General Tips & F.A.Q.:
- Working on teach.cs (wolf) server: A1-intro.pdf
- Creating a local env mimicking teach.cs environment: wolf-py310-requirements.txt
- Assignment 1: Identifying political affiliations on Reddit
- Released: Jan 15, 2024. Due: Feb 9, 2024
- For all A1 related emails, please use: csc401-2024-01-a1@cs. (add the toronto.edu suffix)
- The starter-code, marking rubric, requirements.txt files will be on teach.cs server - please see handout
- Working on teach.cs + A1 Q/As: A1-intro.pdf
- Assignment 2: Neural Machine Translation with Transformers
- Released: Feb 10, 2024. Due: Mar 8, 2024
- For all A2 related emails, please use: csc401-2024-01-a2@cs. (add the toronto.edu suffix)
- Download the starter code from MarkUs
- Marking rubric.pdf, criteria.yml
- The associated requirements file
- LaTeX report template: a2_report.zip
- Assignment 3: ASR, Speakers, and Lies
- Released: Mar 6, 2024. Due: Apr 5, 2024
- For all A3 related emails, please use: csc401-2024-01-a3@cs. (add the toronto.edu suffix)
- The starter code is available on: MarkUs
Back to top
News and announcements
- [15-Jan-2024] ASSIGNMENTS: A1 has been released.
- FIRST LECTURE: 8 January at 10h or 11h (check your section on ACORN enrolment).
- FIRST TUTORIAL: There will be NO tutorial in the first week of lectures (i.e., Friday, 12 January).
- READING WEEK BREAK: The week of Feb. 19-23 - there will be no lectures or tutorials.
- LAST LECTURE: 5 April (check sessional calendars).
- FINAL EXAM: April 25, 2024 [ArtSci Final Exam Schedule ]