CSC 401/2511 -- Natural Language Computing

Winter 2010


Contact information

Instructor: Gerald Penn
Office: PT 396B (St. George campus)
Office hours: 1-2, immediately following lectures (normally Mondays and Fridays), or by appointment
Tel: 978-7390

Meeting times

Lectures: MF 12-1, BA 1190
Tutorials: W 12-1, BA 1190
(Exceptions: there will be lectures on MWF, 4/6/8 January, with no tutorial in the first week;
there will be a lecture on Wednesday, 10 February, and a tutorial on Friday, 12 February;
there will be a tutorial on Monday, 22 February, and a lecture on Wednesday, 24 February;
there will be a lecture on Wednesday, 17 March, and a tutorial on Friday, 19 March;
there will be a tutorial on Monday, 22 March, and a lecture on Wednesday, 24 March;
there will be no lecture or tutorial on Friday, 2 April.)

Assignment  Due         Tutor
1           5 February  Siavash Kazemian
2           5 March     Jackie Cheung
3           1 April     Frank Rudzicz

A bulletin board has also been created for the class, which will be monitored by the TAs.


Texts for the Course

Required     C. Manning & H. Schuetze, Foundations of Statistical Natural Language Processing, MIT Press, 1999 (errata available; an on-line edition is available from MIT CogNet).
Optional     D. Jurafsky & J. Martin, Speech and Language Processing, 2nd ed., Prentice Hall, 2008 (errata available).
Recommended  A. Martelli, Python in a Nutshell, 2nd ed., O'Reilly, 2006 (errata available).
Optional     M. Lutz, Learning Python, 3rd ed., O'Reilly, 2007 (errata available).
Free!        Various tutorials on the Python website.

Supplementary Reading for the Lectures

Topic: Title. Author. Publication Details.
phrase structure models: Statistical Language Learning. E. Charniak. MIT Press, 1993.
machine learning: The Elements of Statistical Learning. T. Hastie, R. Tibshirani and J. Friedman. Springer, 2001.
information theory (including entropy): Elements of Information Theory. T. M. Cover and J. A. Thomas. Wiley & Sons, 1991.
maximum entropy modelling: A Maximum Entropy Approach to Natural Language Processing. A. L. Berger, S. A. Della Pietra and V. J. Della Pietra. Computational Linguistics, 22(1): 39-71.
hidden Markov models (state emission): Fundamentals of Speech Recognition, Chapter 6. L. Rabiner and B.-H. Juang. Prentice Hall, 1993.
Good-Turing estimation: A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams. K. Church and W. Gale. Computer Speech and Language 5:19-54.
information retrieval: Modern Information Retrieval. R. Baeza-Yates and B. Ribeiro-Neto. ACM Press, 1999.
text summarization: Automatic Summarization. I. Mani. Benjamins, 2001.
phonetics (articulatory and acoustic): Acoustic Phonetics. K. N. Stevens. MIT Press, 1998.


Tentative Course outline


Calendar of important course-related events

Date              Event
Mon, 4 January    First lecture
Fri, 15 January   Last day to add course (CSC 2511)
Sun, 10 January   Last day to add course (CSC 401)
Fri, 5 February   Assignment 1 due
15-19 February    Reading Week - no classes
Fri, 26 February  Last day to drop course (CSC 2511)
Sun, 7 March      Last day to drop course (CSC 401)
Fri, 5 March      Assignment 2 due
Mon, 29 March     Last lecture
Thu, 1 April      Assignment 3 due
7-23 April        Final exam period


Evaluation and related policies

There will be three homeworks and a final exam. The relative weights of these components towards the final mark are shown in the table below:
Assignment 1  20%
Assignment 2  20%
Assignment 3  20%
Final         40%

Important note on final: A mark of at least a D- on the final exam is required to pass the course.  In other words, if you receive an F on the final exam you automatically fail the course, regardless of your performance on homeworks.
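
As a rough illustration of how the weights and the final-exam rule combine, here is a minimal Python sketch. The names WEIGHTS, FINAL_EXAM_MIN, course_mark and meets_final_exam_rule are hypothetical, and the 50% cutoff for a D- is an assumption that is not stated on this page; consult the official grading scale for the actual value.

```python
# Component weights, taken from the evaluation table on this page.
WEIGHTS = {
    "assignment1": 0.20,
    "assignment2": 0.20,
    "assignment3": 0.20,
    "final": 0.40,
}

# ASSUMPTION: lowest final-exam percentage that counts as a D-.
# This page does not state the cutoff; 50 is only a placeholder.
FINAL_EXAM_MIN = 50.0


def course_mark(marks):
    """Return the weighted course mark (0-100) from per-component percentages."""
    return sum(weight * marks[name] for name, weight in WEIGHTS.items())


def meets_final_exam_rule(marks, final_exam_min=FINAL_EXAM_MIN):
    """Return True if the final-exam mark reaches the assumed D- cutoff."""
    return marks["final"] >= final_exam_min


if __name__ == "__main__":
    # Strong homework marks, but a weak final exam.
    marks = {"assignment1": 90, "assignment2": 90, "assignment3": 90, "final": 40}
    print(round(course_mark(marks), 1))   # 70.0
    print(meets_final_exam_rule(marks))   # False
```

In this example the weighted mark is 70, yet the final-exam rule is not met under the assumed cutoff, so the student would fail the course regardless of the homework marks.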

Important note on homeworks: No late homeworks will be accepted except in case of documented medical or other emergencies.

Policy on collaboration: No collaboration on homeworks is permitted.  The work you submit must be your own.  No student is permitted to discuss the final exam with any other student until the instructor or TAs make the solutions publicly available.  Failure to observe this policy is an academic offense, carrying a  penalty ranging from a zero on the homework to suspension from the university.



In this space, you will find announcements related to the course. Please check this space at least weekly.


In this space you will find on-line PDF versions of course handouts, including homeworks.

To view these handouts you will need access to a PDF viewer. If your machine does not have the required software, you can download Adobe Acrobat Reader for free.


Old Exams

Some old midterm and final exams for this course are available (without solutions).

Gerald Penn, 6 April, 2010
This web-page was adapted from the web-page for another course, created by Vassos Hadzilacos.