Contact information
-
Instructor: Frank Rudzicz
Office: 550 University Avenue, room 12-175
Office hours: W 10h-11h
Office phone: 416 597 3422 x7971
Email: frank@cdf.utoronto.[CANADA] (fix the suffix)
Forum: https://csc.cdf.toronto.edu/mybb/forumdisplay.php?fid=432
Email policy: For non-confidential inquiries, consult the CDF forum first. For confidential assignment-related inquiries, contact the TA responsible for that assignment. Emails sent from University of Toronto email addresses with appropriate subject headings are the least likely to be redirected to junk email folders.
TAs: Krish Perumal (fix the suffix in the linked email address).
Course outline
-
This course covers, as the name suggests, computational linguistics and the understanding and generation of natural language by machines. Topics include syntactic processing, semantics and semantic interpretation, pragmatics, pronouns, definite descriptions, discourse context, and machine translation. It differs from CSC401/2511 Natural Language Computing primarily in that this course focuses more on linguistic aspects, whereas CSC401/2511 focuses more on statistics and applications.
Assignments will be completed in Python. All code must run on the CDF machines.
Prerequisites: STA247H1/STA255H1/STA257H1 or familiarity with basic probability theory; CSC209H1 or proficiency in C++, Java, or Python. CSC324H1/CSC330H1/CSC384H1 is strongly recommended. For advice, contact the Undergraduate Office in the Bahen Centre.
The course information sheet is available here.
Readings for this course
-
- Strongly recommended: Speech and Language Processing, D. Jurafsky and J.H. Martin.
- Strongly recommended: Natural Language Processing with Python, S. Bird, E. Klein, and E. Loper (free online).
- Possibly helpful: Text Processing in Python, D. Mertz.
Assigned reading
-
Instructions for how assigned course readings will work this year are in Reading-instructions.pdf.
Each entry gives the topic, the title, the author(s), and the accompanying materials.
- Philosophy of AI: "Computing Machinery and Intelligence", Alan M. Turing. Instructions.
- Morphology and IR: "An algorithm for suffix stripping", Martin F. Porter. Instructions; Background.
- Statistics, ambiguity, and PP attachment: "Statistical models for unsupervised prepositional phrase attachment", Adwait Ratnaparkhi. Instructions.
- Disambiguation, (un-)supervised learning: "Unsupervised word sense disambiguation rivaling supervised methods", David Yarowsky. Instructions.
- Word senses, neural models for word vectors: "A unified model for word sense representation and disambiguation", Xinxiong Chen, Zhiyuan Liu, Maosong Sun. Instructions; Background; Code.
- Causal relations; question answering: "Automatic detection of causal relations for question answering", Roxana Girju.
- Text mining; causal relations: "Text mining for causal relations", Roxana Girju and Dan Moldovan.
Evaluation policies
- General
- You will be graded on five homework assignments and five 'write-ups' on assigned reading material. The relative weights of these grades are as follows:
Write-up 1: 5%
Assignment 1: 12%
Write-up 2: 5%
Assignment 2: 18%
Write-up 3: 5%
Assignment 3: 12%
Write-up 4: 5%
Assignment 4: 18%
Write-up 5: 5%
Assignment 5: 15%
- Lateness
- A 10% deduction is applied to late homework one minute after the due time. Thereafter, an additional deduction of 10% of the original maximum mark is applied every 24 hours, up to 96 hours late, at which point the homework receives a mark of zero. No exceptions will be made except in emergencies, including medical emergencies, at the instructor's discretion.
- Collaboration and plagiarism
- No collaboration on the homeworks is permitted. The work you submit must be your own. 'Collaboration' in this context includes but is not limited to sharing of source code, correction of another's source code, copying of written answers, and sharing of answers prior to submission of the work. Failure to observe this policy is an academic offense, carrying a penalty ranging from a zero on the homework to suspension from the university. See Academic integrity at the University of Toronto.
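To make the grade weighting and lateness arithmetic concrete, here is a minimal Python sketch. The names WEIGHTS, weighted_total, and late_mark are hypothetical, not part of any course code. It assumes that every lateness deduction, including the first, is 10% of the maximum mark subtracted from the earned mark; that the deduction steps up at each full 24 hours after the due time; and that a mark cannot go below zero.

```python
import math

# Weights (percent of the final grade) from the evaluation policy above.
WEIGHTS = {
    "Write-up 1": 5, "Assignment 1": 12,
    "Write-up 2": 5, "Assignment 2": 18,
    "Write-up 3": 5, "Assignment 3": 12,
    "Write-up 4": 5, "Assignment 4": 18,
    "Write-up 5": 5, "Assignment 5": 15,
}

MINUTES_PER_DAY = 24 * 60


def weighted_total(percent_marks: dict[str, float]) -> float:
    """Combine per-item marks (each 0-100) into a final percentage.

    Items missing from percent_marks count as zero.
    """
    return sum(percent_marks.get(item, 0.0) * w / 100 for item, w in WEIGHTS.items())


def late_mark(raw_mark: float, max_mark: float, minutes_late: float) -> float:
    """Apply the lateness policy under the assumptions stated above (hypothetical helper)."""
    if minutes_late < 1:
        # On time, or less than one minute late: no deduction.
        return raw_mark
    if minutes_late >= 96 * 60:
        # 96 or more hours late: the homework receives zero.
        return 0.0
    # Assumption: 10% at one minute late, plus 10% for each full
    # 24 hours after the due time, all measured against max_mark.
    steps = 1 + math.floor(minutes_late / MINUTES_PER_DAY)
    deduction = max_mark * steps / 10
    return max(0.0, raw_mark - deduction)


if __name__ == "__main__":
    print(sum(WEIGHTS.values()))        # prints 100
    print(late_mark(80, 100, 30))       # 30 minutes late: prints 70.0
    print(late_mark(80, 100, 25 * 60))  # 25 hours late: prints 60.0
```

Under a different reading of "every 24 hours" (for example, counting from one minute after the due time rather than from the due time itself), only the line computing `steps` would change.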
Syllabus
-
The following is an estimate of the topics to be covered in the course and is subject to change.
- Intro to computational linguistics
- Grammars and parsing
- Chart parsing
- Parsing with features
- Ambiguity resolution
- Statistical attachment disambiguation
- Lexical semantics
- Word sense disambiguation
- Statistical parsing
- Anaphora resolution
- Semantic representations
Calendar
-
16 September: First lecture
23 September: Write-up 1 due
27 September: Last day to add CSC 485
28 September: Last day to add CSC 2501
12 October: Assignment 1 due (moved from 5 October)
7 October: Write-up 2 due
27 October: Assignment 2 due
21 October: Write-up 3 due
2 November: Last day to drop CSC 2501
4 November: Assignment 3 due
4 November: Write-up 4 due
8 November: Last day to drop CSC 485
18 November: Assignment 4 due (moved from 16 November)
18 November: Write-up 5 due
25 November: Last lecture
9 December: Assignment 5 due
News and announcements
-
- FIRST LECTURE: 16 September at 13h in LM157.
- FIRST TUTORIAL: 17 September at 16h in LM157.
- 30 September lecture postponed. A1 now due 12 October.
- A4 now due 18 November.
Lecture materials
-
- Part 1 (16 Sep): Introduction to CL. Reading: Jurafsky & Martin: section 1; Bird et al.: 1 [2.3, 4].
- Part 2 (23 Sep): Introduction to syntax. Reading: Jurafsky & Martin: sections 5.0-1, 12.0-12.3.3, 12.3.7, [13.1-2]; Bird et al.: 8.0-4.
- Part 3 (7 Oct): Chart parsing. Reading: Jurafsky & Martin: 13.3-4; Allen: 3.4, 3.6; Bird et al.: 8.4, online extras 8.2 to the end of the section "Chart Parsing in NLTK".
- Part 4 (also 7 Oct): Parsing with features. Reading: Jurafsky & Martin: 12.3.4-6, 15.0-3; [Allen: 4.1-5]; Bird et al.: 9.
- Part 5 (14 Oct): Resolution of ambiguity.
- Part 6 (21 Oct): Resolution of PP attachment ambiguity with statistics. Reading: Hindle, D., Rooth, M. (1993) Structural ambiguity and lexical relations, Computational Linguistics, Special issue on using large corpora: I, 19(1):103-120.
- Part 7 (28 Oct): Lexical semantics; (Word|Frame)Net. Reading: Jurafsky & Martin: 19.1-4, 20.8; Bird et al.: 2.5.
- Part 8 (4 Nov): Word sense disambiguation. Reading: Jurafsky & Martin: 20.1-5.
- Part 9 (11 Nov): Statistical parsing. Reading: Jurafsky & Martin: 5.2-5.5.2, 5.6, 12.4, 14.0-1, 14.3-4, 14.6-7; Bird et al.: 8.6.
- Part 10 (18 Nov): Neural models of word representations.
- Part 11 (25 Nov): Anaphora; semantic processing.
Tutorial materials
Assignments
Here is the ID template that you must submit with your assignments.
- Assignment 1.
- Assignment 2.
- Assignment 3. Here are the files you'll need: wordlist, pp-corpus, and Miller-pairs.
- Assignment 4. You will need Causal-verbs.txt; the remaining files are on CDF, as specified in the handout. The two Girju readings are listed under Assigned reading above.
- Assignment 5. Here are data_utils.py and modules.tar.bz2.
2014 website
Here is the website for the iteration of this course offered in 2014, with additional information: CSC485/2501 2014 webpage