Contact information

Instructor: Gerald Penn
Office: PT 283
Office hours: M 4-6pm
Email: gpenn@teach.cs.toronto.edu (please put CSC 401/2511 in the subject line)
Forum: Piazza (signup)
Quercus: https://q.utoronto.ca/courses/352606
Email policy: For non-confidential inquiries, consult the Piazza forum first. For confidential assignment-related inquiries, contact the TA associated with the particular assignment. Emails sent from University of Toronto email addresses with an appropriate subject line are the least likely to be redirected to junk folders.

Course overview

This course presents an introduction to natural language computing in applications such as information retrieval and extraction, intelligent web searching, speech recognition, and machine translation. These applications will involve various statistical and machine learning techniques. Assignments will be completed in Python. All code must run on the 'teaching servers'.

Prerequisites: CSC207/ CSC209/ APS105/ APS106/ ESC180/ CSC180 and STA237/ STA247/ STA255/ STA257/ STAB52/ ECE302/ STA286/ CHE223/ CME263/ MIE231/ MIE236/ MSE238/ ECE286 and a CGPA of 3.0 or higher or a CSC subject POSt. MAT 223 or 240, CSC 311 (or equivalent) are strongly recommended.

See also the course information sheet.


Meeting times

Locations: BA = Bahen Centre for Information Technology
Lectures: MW 10-11h at BA 1180; 11-12h at BA 1190
Tutorials: F 10-11h at BA 1180; 11-12h at BA 1190

Syllabus

The following is an estimate of the topics to be covered in the course and is subject to change.

  1. Introduction to corpus-based linguistics
  2. N-grams, linguistic features, word embeddings
  3. Entropy and information theory
  4. Intro to deep neural networks and neural language models
  5. Machine translation (statistical and neural) (MT)
  6. Transformers, attention-based models, and variants
  7. Large language models (LLMs)
  8. Acoustics and phonetics
  9. Speech features and speaker identification
  10. Dynamic programming for speech recognition.
  11. Speech synthesis (TTS)
  12. Information Retrieval (IR)
  13. Text Summarization
  14. Ethics in NLP

Calendar

4 September: First lecture
18 September: Last day to enrol
24 September: Part 1 of Assignment 1 due
8 October: Assignment 1 due
28 October: Last day to drop CSC 2511
28 October - 1 November: Reading week -- no lectures or tutorial
4 November: Last day to drop CSC 401
5 November: Assignment 2 due
3 December: Last lecture
3 December: Assignment 3 due
6-21 December: Final exam period

See Dates for undergraduate students.

See Dates for graduate students.


Readings for this course

Optional: Foundations of Statistical Natural Language Processing, C. Manning and H. Schütze
Optional: Speech and Language Processing, D. Jurafsky and J.H. Martin (2nd ed.)
Optional: Deep Learning, I. Goodfellow, Y. Bengio, and A. Courville

Supplementary reading

Please see additional lecture-specific supplementary resources under the Lecture Materials section.

Topic: "Title", Author(s)
Good-Turing Smoothing: "A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams", Kenneth Church and William Gale
ML History: "What is science for? The Lighthill report on artificial intelligence reinterpreted", Jon Agar
Smoothing: "An Empirical Study of Smoothing Techniques for Language Modeling", Stanley F. Chen and Joshua Goodman
Hidden Markov models: "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Lawrence R. Rabiner
Sentence alignment: "A Program for Aligning Sentences in Bilingual Corpora", William A. Gale and Kenneth W. Church
Transformation-based learning: "Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging", Eric Brill
Sentence boundaries: "Sentence boundaries", J. Read, R. Dridan, S. Oepen, and L.J. Solberg
Seq2Seq: "Sequence to Sequence Learning with Neural Networks", Ilya Sutskever, Oriol Vinyals, and Quoc V. Le
Transformer: "Attention Is All You Need", Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin
Attention-based NMT: "Effective Approaches to Attention-based Neural Machine Translation", Minh-Thang Luong, Hieu Pham, and Christopher D. Manning
NMT: "Neural machine translation by jointly learning to align and translate", Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio
NMT: "Massive exploration of neural machine translation architectures", Denny Britz et al.

Evaluation policies

General
You will be graded on three homework assignments, two ethics surveys, and a final exam. The relative proportions of these grades are as follows:
Assignment 1: 20%
Assignment 2: 20%
Assignment 3: 20%
Ethics Surveys (2x): 1%
Final exam: 39%
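For illustration only, here is a minimal Python sketch of how these weights combine into an overall course mark, assuming (an assumption, not the official marking scheme) that each component is marked out of 100:

# Illustrative only: combine component marks (assumed out of 100) using the weights above.
WEIGHTS = {
    "Assignment 1": 0.20,
    "Assignment 2": 0.20,
    "Assignment 3": 0.20,
    "Ethics surveys": 0.01,
    "Final exam": 0.39,
}

def course_mark(marks):
    """Weighted sum of component marks, each assumed to be out of 100."""
    return sum(weight * marks.get(name, 0.0) for name, weight in WEIGHTS.items())

# Hypothetical example: 80 on each assignment, full survey credit, 70 on the final.
print(round(course_mark({"Assignment 1": 80, "Assignment 2": 80, "Assignment 3": 80,
                         "Ethics surveys": 100, "Final exam": 70}), 1))  # 76.3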
Lateness
A 10% (absolute) deduction is applied to late homework starting one minute after the due time. Thereafter, an additional 10% deduction is applied for every 24 hours late, up to 72 hours late, at which point the homework receives a mark of zero. No exceptions will be made except in cases of documented emergencies.
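For concreteness, one possible reading of this policy as a small Python sketch (an illustration, with assumptions about edge cases, not an official calculator):

import math

def late_penalty(hours_late):
    """Absolute percentage deducted, under one reading of the lateness policy above."""
    if hours_late <= 0:
        return 0.0              # on time: no deduction
    if hours_late >= 72:
        return 100.0            # 72 hours or more: mark of zero
    # 10% as soon as the work is late, plus 10% for each additional full 24 hours.
    return 10.0 + 10.0 * math.floor(hours_late / 24)

print(late_penalty(0.02))  # 10.0  (one minute late)
print(late_penalty(30))    # 20.0
print(late_penalty(71))    # 30.0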
Final
The final exam will be a timed 3-hour test. A mark of at least 50 on the final exam is required to pass the course. In other words, if you receive a 49 or less on the final exam then you automatically fail the course, regardless of your performance in the rest of the course.
Collaboration and plagiarism
No collaboration on the homeworks is permitted. The work you submit must be your own. 'Collaboration' in this context includes, but is not limited to, sharing of source code, correction of another's source code, copying of written answers, and sharing of answers prior to or after submission of the work (including the final exam). Failure to observe this policy is an academic offense, carrying a penalty ranging from a zero on the homework to suspension from the university. The use of AI writing assistance (ChatGPT, Copilot, etc.) is allowed only for refining the English grammar and/or spelling of text that you have already written. Submitting any Python code generated or modified by any AI assistant is strictly prohibited. See Academic integrity at the University of Toronto.

Lecture materials

  1. Introduction
    • Date: 4 Sep.
    • Reading: Manning & Schütze: Sections 1.3-1.4.2, Sections 6.0-6.2.1
  2. Corpora and Smoothing
    • Dates: 9-16 Sep.
    • Reading: Manning & Schütze: Section 1.4.3, Sections 6.1-6.2.2, Section 6.2.5, Section 6.3
    • Reading: Jurafsky & Martin: 3.4-3.5
    • See also the supplementary reading for Good-Turing smoothing
  3. Features and Classification
    • Dates: 18-23 Sep.
    • Reading: Manning & Schütze: Section 1.4.3, Sections 6.1-6.2.2, Section 6.2.5, Section 6.3
    • Reading: Jurafsky & Martin: 3.4-3.5
  4. Entropy and information theory
    • Dates: 25-30 Sep.
    • Reading: Manning & Schütze: Sections 2.2, 5.3-5.5
  5. Intro. to NNs and Neural Language Models
    • Dates: 7, 9 Oct.
    • Reading: DL (Goodfellow et al.). Sections: 6.3, 6.6, 10.2, 10.5, 10.10
    • (Optional) Supplementary resources and readings:
      • Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space. (2013)" link
      • Xin Rong. "word2vec Parameter Learning Explained". link
      • Bolukbasi, Tolga, et al. "Man is to computer programmer as woman is to homemaker? debiasing word embeddings." NeurIPS (2016). link
      • Greff, Klaus, et al. "LSTM: A search space odyssey." IEEE (2016). link
      • Jozefowicz, Sutskever et al. "An empirical exploration of recurrent network architectures." ICML (2015). link
      • GRU: Cho, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." (2014). link
      • ELMo: Peters, Matthew E., et al. "Deep contextualized word representations. (2018)." link
      • Blogs:
      • The Unreasonable Effectiveness of Recurrent Neural Networks. link
      • Colah's Blog. "Understanding LSTM Networks". link.
  6. Machine Translation (MT)
    • Dates: 16,21,23 Oct.
    • Readings:
      • Manning & Schütze Sections 13.0, 13.1.2, 13.1.3, 13.2, 13.3, 14.2.2
      • DL (Goodfellow et al.). Sections: 10.3, 10.4, 10.7
    • (Optional) Supplementary resources and readings:
      • Papineni, et al. "BLEU: a method for automatic evaluation of machine translation." ACL (2002). link
      • Sutskever, Ilya, Oriol Vinyals et al. "Sequence to sequence learning with neural networks."(2014). link
      • Bahdanau, Dzmitry, et al. "Neural machine translation by jointly learning to align and translate."(2014). link
      • Luong, Manning, et al. "Effective approaches to attention-based neural machine translation." arXiv (2015). link
      • Britz, Denny, et al. "Massive exploration of neural machine translation architectures."(2017). link
      • BPE: Sennrich, et al. "Neural machine translation of rare words with subword units." arXiv (2015). link
      • Wordpiece: Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv (2016). link
      • Blogs:
      • Distill: Olah & Carter "Attention and Augmented RNNs"(2016). link
  7. Transformers
    • Dates: 4,6 Nov.
    • Readings:
      • Vaswani et al. "Attention is all you need." (2017). link
    • (Optional) Supplementary resources and readings:
      • RoPE: Su, Jianlin, et al. "Roformer: Enhanced transformer with rotary position embedding." (2021). [arxiv]
      • Ba, Kiros, and Hinton. "Layer normalization." (2016). [link]
      • Xiong, Ruibin, et al. "On layer normalization in the transformer architecture." ICML PMLR (2020). [link]
      • Xie et al. "ResiDual: Transformer with Dual Residual Connections." (2023). [arxiv] [github]
      • BERTology:
      • Devlin et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." (2019). link
      • Clark et al. "What does bert look at? an analysis of bert's attention." (2019). link
      • Rogers, Anna et al. "A primer in BERTology: What we know about how bert works." TACL(2020). link
      • Tenney et al. "BERT rediscovers the classical NLP pipeline." (2019). link
      • Niu et al. "Does BERT rediscover a classical NLP pipeline." (2022). link
      • Lewis et al. "BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension." (2019). link
      • T5: Raffel et al. "Exploring the limits of transfer learning with a unified text-to-text transformer." J. Mach. Learn. Res. 21.140 (2020). link
      • GPT-3: Brown et al. "Language models are few-shot learners." (2020). link
      • Attention-free models:
      • Fu, Daniel, et al. "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture." (2023). [arxiv]. [blog].
      • Token-free models:
      • Clark et al. "CANINE: Pre-training an efficient tokenization-free encoder for language representation." (2021). link
      • Xue et al. "ByT5: Towards a token-free future with pre-trained byte-to-byte models." (2022). link
      • Blogs:
      • Harvard NLP. "The Annotated Transformer". link.
      • Jay Allamar. "The Illustrated Transformer". link.
  8. Acoustics and Phonetics
  9. Speech Features and Speaker Identification
    • Dates: 13,18 Nov.
    • Readings:
      • Jurafsky & Martin SLP3 (3rd ed.): Chapter 16. link
  10. Dynamic Programming for Speech Recognition
    • Dates: 18,20 Nov.
    • Readings: N/A
  11. Information Retrieval (IR)
    • Date(s): 20 Nov.
    • Readings:
      • Jurafsky & Martin SLP3 (3rd ed.): Chapter 14, only the first part (14.1). link
  12. Text Summarization
    • Date(s): 25 Nov.
  13. Guest Lectures on Ethics: [Module 1], [Module 2]
    • Date(s): 27 Nov., 29 Nov.
    • Supplementary materials/links:
      • Guest lecturer: Steven Coyne
      • The Embedded Ethics Education Initiative at UofT, SRI Institute link
      • SRI Institute events
  14. Summary and Review (last lecture).
    • Date: 3 Dec.

Tutorial materials

  • Assignment 1 tutorials:
  • Assignment 2 tutorials:
  • Assignment 3 tutorials:
  • Assignments

    Here is the ID template that you must submit with your assignments.

    Head TA: Ken Shi

    Extension requests: All extension requests must be made to the head TA. All undergrads should follow the FAS student absences policy. Specifically, undergrads must file an ACORN absence declaration when it is allowed, and a VOI form for extensions due to illness when it is not allowed (because an ACORN declaration has already been filed this term). Grads should always use a VOI form for extensions due to illness.

    Remark requests: Please follow the remarking policy.

    General Tips & F.A.Q.:

    • Working on teach.cs (wolf) server: CSC401_F24_Assignments.pdf
    • Creating a local env mimicking teach.cs environment:
    • Note that a 24-hour 'silence policy' will be in effect -- we do not guarantee that the instructors or TAs will respond to requests made during the 24 hours before an assignment's due time.

    Assignment 1: Financial Sentiment Analysis

    Assignment 2: Neural Machine Translation with Transformers

    Assignment 3: ASR, Speakers, and Lies



    Past course materials

    Fall 2024: S24 course page

    Fall 2023: S23 course page

    Fall 2022: S22 course page

    Old Exams


    News and announcements

    • FIRST WEEK: Our first lecture will take place on 4th September at 10:00 or 11:00, depending on your section. There will be a tutorial on the 6th.
    • ANNOUNCEMENT FROM ACCESSIBILITY SERVICES: Accessibility Services is seeking volunteer note-takers for students in this class who are registered with Accessibility Services. By volunteering to take notes for students with disabilities, you are making a positive contribution to their academic success. You will benefit as well: it is an excellent way to improve your own note-taking skills and to maintain consistent class attendance. At the end of term, we would be happy to provide a Certificate of Appreciation for your hard work; to request one, please email us at as.notetaking@utoronto.ca. You may also qualify for a Co-Curricular Record by registering your volunteer work on Folio before the end of June. We also have a draw for qualifying volunteers throughout the academic year. Register online as a Volunteer Note-Taker at https://clockwork.studentlife.utoronto.ca/custom/misc/home.aspx. Email us at as.notetaking@utoronto.ca if you have questions or require any assistance with uploading notes. If you are no longer able to upload notes for a course, please also let us know immediately. Thank you for your support and for making notes more accessible for our students.
