Contact information

Instructor: Frank Rudzicz
Office: 550 University Avenue, room 12-175
Office hours: W 10h-11h
Office phone: 416 597 3422 x7971
Email: frank@cdf.utoronto.[CANADA] (fix the suffix)
Forum: https://csc.cdf.toronto.edu/mybb/forumdisplay.php?fid=432
Email policy: For non-confidential inquiries, consult the CDF forum first. For confidential assignment-related inquiries, consult the TA associated with that assignment. Emails sent from University of Toronto addresses with appropriate subject headings are the least likely to be redirected to junk email folders.

TAs: Krish Perumal (fix the suffix in the linked email address)

Meeting times

Lectures: W 13h-15h in LM 157
Tutorials: Th 16h-17h in LM 157

Course outline

This course covers, as the name suggests, computational linguistics and the understanding and generation of natural language by machines. Topics include syntactic processing, semantics and semantic interpretation, pragmatics, pronouns, definite descriptions, discourse context, and machine translation. It differs from CSC401/2511 Natural Language Computing primarily in that this course is more closely aligned with linguistic aspects, whereas CSC401/2511 is more aligned with statistics and applications.

Assignments will be completed in Python. All code must run on the CDF machines.
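
For a sense of the kind of Python code involved, here is a minimal chart-parsing sketch using NLTK, the toolkit accompanying the recommended Bird et al. text; the toy grammar and sentence are invented here for illustration and are not taken from any assignment:

    import nltk

    # A toy grammar (invented for illustration, not from any assignment).
    grammar = nltk.CFG.fromstring("""
        S  -> NP VP
        NP -> Det N | NP PP
        VP -> V NP | VP PP
        PP -> P NP
        Det -> 'the' | 'a'
        N  -> 'man' | 'dog' | 'park'
        V  -> 'saw'
        P  -> 'in'
    """)

    # A chart parser enumerates every analysis, so both attachments of
    # "in the park" are found, illustrating PP-attachment ambiguity.
    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("the man saw a dog in the park".split()):
        print(tree)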

Prerequisites: STA247H1/STA255H1/STA257H1 or familiarity with basic probability theory; CSC209H1 or proficiency in C++, Java, or Python. CSC324H1/CSC330H1/CSC384H1 is strongly recommended. For advice, contact the Undergraduate Office in the Bahen Centre.

The course information sheet is available here.


Readings for this course

Strongly recommended: Speech and Language Processing, D. Jurafsky and J.H. Martin
Strongly recommended: Natural Language Processing with Python, S. Bird, E. Klein, and E. Loper
  • Free online
Possibly helpful: Text Processing in Python, D. Mertz

    Assigned reading

    Information on how assigned course readings will be handled this year: Reading-instructions.pdf.

    Topic, title, author(s), and supporting material:
    • Philosophy of AI: "Computing Machinery and Intelligence", Alan M. Turing (Instructions)
    • Morphology and IR: "An algorithm for suffix stripping", Martin F. Porter (Instructions; Background)
    • Statistics, ambiguity, and PP attachment: "Statistical models for unsupervised prepositional phrase attachment", Adwait Ratnaparkhi (Instructions)
    • Disambiguation, (un-)supervised learning: "Unsupervised word sense disambiguation rivaling supervised methods", David Yarowsky (Instructions)
    • Word senses, neural models for word vectors: "A unified model for word sense representation and disambiguation", Xinxiong Chen, Zhiyuan Liu, and Maosong Sun (Instructions; Background; Code)
    • Causal relations; question answering: "Automatic detection of causal relations for question answering", Roxana Girju
    • Text mining; causal relations: "Text mining for causal relations", Roxana Girju and Dan Moldovan

    Evaluation policies

    General
    You will be graded on five homework assignments and five 'write-ups' on assigned reading material. The relative proportions of these grades are as follows:
    Write-up 1: 5%
    Assignment 1: 12%
    Write-up 2: 5%
    Assignment 2: 18%
    Write-up 3: 5%
    Assignment 3: 12%
    Write-up 4: 5%
    Assignment 4: 18%
    Write-up 5: 5%
    Assignment 5: 15%
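    For reference, the weights above total 100% of the final grade; a quick check in Python:
        # Weights from the table above; the total should be 100.
        writeups    = [5, 5, 5, 5, 5]
        assignments = [12, 18, 12, 18, 15]
        print(sum(writeups) + sum(assignments))   # prints 100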
    Lateness
    A 10% deduction is applied to late homework beginning one minute after the due time. Thereafter, an additional 10% of the original maximum mark is deducted every 24 hours, up to 96 hours late, at which point the homework receives a mark of zero. Exceptions will be made only in emergencies, including medical emergencies, at the instructor's discretion.
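    One reading of this schedule, sketched in Python for illustration (the policy text above governs):
        def late_penalty(hours_late):
            """Fraction of the original maximum mark deducted, under one
            reading of the lateness policy above (illustrative sketch only)."""
            if hours_late <= 0:
                return 0.0              # on time: no deduction
            if hours_late >= 96:
                return 1.0              # 96 or more hours late: mark of zero
            # 10% as soon as the work is late, plus an additional 10% of the
            # original maximum mark for each further full 24-hour period.
            return 0.10 + 0.10 * int(hours_late // 24)
        print(late_penalty(30))         # 30 hours late: 0.2, i.e. a 20% deduction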
    Collaboration and plagiarism
    No collaboration on the homeworks is permitted. The work you submit must be your own. 'Collaboration' in this context includes but is not limited to sharing of source code, correction of another's source code, copying of written answers, and sharing of answers prior to submission of the work. Failure to observe this policy is an academic offense, carrying a penalty ranging from a zero on the homework to suspension from the university. See Academic integrity at the University of Toronto.

    Syllabus

    The following is an estimate of the topics to be covered in the course and is subject to change.

    • Intro to computational linguistics
    • Grammars and parsing
    • Chart parsing
    • Parsing with features
    • Ambiguity resolution
    • Statistical attachment disambiguation
    • Lexical semantics
    • Word sense disambiguation
    • Statistical parsing
    • Anaphora resolution
    • Semantic representations

    Calendar

    16 September: First lecture
    23 September: Write-up 1 due
    27 September: Last day to add CSC 485
    28 September: Last day to add CSC 2501
    7 October: Write-up 2 due
    12 October: Assignment 1 due (originally 5 October)
    21 October: Write-up 3 due
    27 October: Assignment 2 due
    2 November: Last day to drop CSC 2501
    4 November: Assignment 3 due
    4 November: Write-up 4 due
    8 November: Last day to drop CSC 485
    18 November: Assignment 4 due (originally 16 November)
    18 November: Write-up 5 due
    25 November: Last lecture
    9 December: Assignment 5 due

    See Dates for undergraduate students.

    See Dates for graduate students.


    News and announcements

    • FIRST LECTURE: 16 September at 13h in LM157.
    • FIRST TUTORIAL: 17 September at 16h in LM157.
    • 30 September lecture postponed. A1 now due 12 October.
    • A4 now due 18 November.


    Lecture materials

    Part 1 (16 Sep): Introduction to CL
      Assigned reading: Jurafsky & Martin: section 1; Bird et al.: 1 [2.3, 4]
    Part 2 (23 Sep): Introduction to syntax
      Assigned reading: Jurafsky & Martin: sections 5.0-1, 12.0-12.3.3, 12.3.7, [13.1-2]; Bird et al.: 8.0-4
    Part 3 (7 Oct): Chart parsing
      Assigned reading: Jurafsky & Martin: 13.3-4; Allen: 3.4, 3.6; Bird et al.: 8.4, online extras 8.2 to the end of the section "Chart Parsing in NLTK"
    Part 4 (also 7 Oct): Parsing with features
      Assigned reading: Jurafsky & Martin: 12.3.4-6, 15.0-3; [Allen: 4.1-5]; Bird et al.: 9
    Part 5 (14 Oct): Resolution of ambiguity
    Part 6 (21 Oct): Resolution of PP attachment ambiguity with statistics
      Assigned reading: Hindle, D. and Rooth, M. (1993). Structural ambiguity and lexical relations. Computational Linguistics, 19(1):103-120.
    Part 7 (28 Oct): Lexical semantics; (Word|Frame)Net
      Assigned reading: Jurafsky & Martin: 19.1-4, 20.8; Bird et al.: 2.5
    Part 8 (4 Nov): Word sense disambiguation
      Assigned reading: Jurafsky & Martin: 20.1-5
    Part 9 (11 Nov): Statistical parsing
      Assigned reading: Jurafsky & Martin: 5.2-5.5.2, 5.6, 12.4, 14.0-1, 14.3-4, 14.6-7; Bird et al.: 8.6
    Part 10 (18 Nov): Neural models of word representations
    Part 11 (25 Nov): Anaphora; semantic processing

    Tutorial materials

    Assignments

    Here is the ID template that you must submit with your assignments.


    2014 website

    Here is the website for the iteration of this course offered in 2014, with additional information: CSC485/2501 2014 webpage
