CSC 401/2511 -- Natural Language Computing
Winter 2008
Contact information
Instructor: Gerald Penn
- Office: PT 396B (St. George campus)
- Office hours: Wednesdays and Fridays 1-2, or by appointment
- Tel: 978-7390
- Email: gpenn@cdf.utoronto.ca
Meeting times
- Lectures: WF 12-1, RW 110
- Tutorials: M 12-1, RW 110
- (Exceptions: the tutorials on 28th January and 4th February will take
  place in BA 2200;
  there will be a lecture on Monday, 11th February and a tutorial on
  Friday, 15th February;
  there will be a lecture on Monday, 10th March and a tutorial on
  Friday, 14th March;
  there will be lectures on Monday, Wednesday and Friday in the week of
  24th March, and no tutorial.)
A bulletin board has also been created for the class, which will be
monitored by the TAs.
Texts for the Course
| Required | C. Manning & H. Schuetze, Foundations of Statistical Natural Language Processing, MIT, 1999. | Errata |
| | for which there is an on-line edition from MIT CogNet | |
| Optional | D. Jurafsky & J. Martin, Speech and Language Processing, Prentice Hall, 2000. | Errata |
| Recommended | A. Martelli, Python in a Nutshell, O'Reilly, 2003. | Errata |
| Optional | M. Lutz, D. Ascher, Learning Python, 2nd ed., O'Reilly, 2003. | Errata |
| Free! | various tutorials on the Python website | |
Supplementary Reading for the Lectures
| Topic | Title | Author | Publication Details |
| parsing, phrase structure models | Statistical Language Learning | E. Charniak | MIT Press, 1993. |
| machine learning | The Elements of Statistical Learning | T. Hastie, R. Tibshirani and J. Friedman | Springer, 2001. |
| information theory (including entropy) | Elements of Information Theory | T. M. Cover and J. A. Thomas | Wiley & Sons, 1991. |
| maximum entropy modelling | A Maximum Entropy Approach to Natural Language Processing | A. L. Berger, S. A. Della Pietra and V. J. Della Pietra | Computational Linguistics, 22(1): 39-71. |
| hidden Markov models (state emission) | Fundamentals of Speech Recognition, Chapter 6. | L. Rabiner and B.-H. Juang | Prentice Hall, 1993. |
| Good-Turing estimation | A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams | K. Church and W. Gale | Computer Speech and Language 5:19-54. |
| information retrieval | Modern Information Retrieval | R. Baeza-Yates and B. Ribeiro-Neto | ACM Press, 1999. |
| text summarization | Automatic Summarization | I. Mani | Benjamins, 2001. |
| phonetics (articulatory and acoustic) | Acoustic Phonetics | K. N. Stevens | MIT Press, 1998. |
Tentative Course outline
- Introduction to Corpus-based Linguistics
- Text Categorisation
- N-gram Models
- Markov Models
- Automatic Speech Recognition
- Part-of-Speech Tagging
- Information Retrieval
- Text Summarisation
- Statistical Machine Translation
Calendar of important course-related events
| Date | Event |
| Mon, 7 January | First lecture |
| Fri, 18 January | Last day to add course (CSC 2511) |
| Sun, 20 January | Last day to add course (CSC 401) |
| Mon, 11 February | Assignment 1 due |
| 18-22 February | Reading Week - no classes |
| Fri, 29 February | Last day to drop course (CSC 2511) |
| Sun, 9 March | Last day to drop course (CSC 401) |
| Mon, 10 March | Assignment 2 due |
| Mon, 7 April | Assignment 3 due |
| Fri, 11 April | Last lecture |
| 21 April - 9 May | Final exam period |
Evaluation and related policies
There will be three homeworks and a final exam. The relative weights of
these components towards the final mark are shown in the table below:
| Assignment 1 | 20% |
| Assignment 2 | 20% |
| Assignment 3 | 20% |
| Final | 40% |
Important note on final: A mark of at least a D- on the final
exam is required to pass the course. In other words, if you receive
an F on the final exam you automatically fail the course, regardless of
your performance on homeworks.
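As an illustration only, the weighting in the table above and the final-exam rule can be sketched in Python. The function name `course_mark` and the 50% cutoff for a D- are assumptions made for this sketch; the page itself states only "at least a D-", and the cutoff may differ between CSC 401 and CSC 2511.

```python
# Weights from the evaluation table, as whole percentages of the final mark.
WEIGHTS = {"a1": 20, "a2": 20, "a3": 20, "final": 40}

# ASSUMPTION: a D- is taken to mean 50% on the final exam. The course page
# does not give a numeric cutoff, so confirm it before relying on this.
FINAL_PASS_CUTOFF = 50.0


def course_mark(a1, a2, a3, final):
    """Return (weighted mark, final-exam rule met) for marks given in 0-100."""
    marks = {"a1": a1, "a2": a2, "a3": a3, "final": final}
    weighted = sum(WEIGHTS[k] * marks[k] for k in WEIGHTS) / 100
    return weighted, final >= FINAL_PASS_CUTOFF
```

For example, `course_mark(90, 90, 90, 40)` gives a weighted mark of 70 but reports that the final-exam rule is not met, which under the policy above means failing the course despite strong homework marks.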
Important note on homeworks: No late homeworks will be accepted
except in case of documented medical or other emergencies.
Policy on collaboration: No collaboration on homeworks is permitted.
The work you submit must be your own. Failure to observe this policy
is an academic offense, carrying a penalty ranging from a zero on
the homework to suspension from the university.
Announcements
In this space, you will find announcements related to the course. Please
check this space at least weekly.
- 2 April: MATERIAL COVERED IN WEEK 11: lexical semantics,
  word-sense disambiguation. You should read M&S Chp. 7.
- 22 March: MATERIAL COVERED IN WEEK 10: articulatory phonetics. You
  should read J&M sections 4.1-4.2 and Chp. 7.
- 15 March: MATERIAL COVERED IN WEEK 9: interpolation methods for language
  modelling, part-of-speech tagging, transformation-based taggers.
- 9 March: MATERIAL COVERED IN WEEK 8: Viterbi algorithm, Baum-Welch
  re-estimation, Good-Turing smoothing. You should read M&S Chps. 9 and 10.
- 29 February: MATERIAL COVERED IN WEEK 7: smoothing, hidden Markov models.
- 13 February: MATERIAL COVERED IN WEEK 6: maximum entropy modelling, language
  models, n-grams, maximum likelihood estimation, Bayes's rule. You
  should read M&S chapters 4 and 6.
- 12 February: MATERIAL COVERED IN WEEK 5: k-nearest-neighbours, perceptron
  classifiers, Lagrange's method. You should read M&S 2.1-2.2.4.
- 2 February: MATERIAL COVERED IN WEEK 4: decision trees. The lecture
  on 1st February will be rescheduled.
- 25 January: MATERIAL COVERED IN WEEK 3: genre classification, the cosine
  method, entropy. You should read M&S Chapter 16, 15.2-15.2.1,
  and 8.1.
- 18 January: MATERIAL COVERED IN WEEK 2: more parts of speech, corpus
  annotation. Here is our Python tutorial.
- 9 January: MATERIAL COVERED IN WEEK 1: Zipf's law, parts of speech.
  You should read M&S Chapter 1, section 3.1, and section 4.3.2.
- 7 January: PREREQUISITES. CSC 207 or 209 or 228, and STA 247 or 255 or
  257, and a CGPA of 3.0 or higher or a CSC subject POSt. MAT 223 or 240 is
  strongly recommended. Note that the University's automatic registration
  system does not check for prerequisites: even if you have registered for
  the course, you will not receive credit for it unless you had satisfied
  the prerequisites before you registered.
Handouts
In this space you will find on-line postscript versions of course handouts,
including homeworks and solutions (posted after the due date).
To view these handouts you will need access to a postscript previewer.
If your machine does not have the required software, you can allegedly
download it for free.
Old Exams
Some old midterm and final exams for this course (with no solutions).
Gerald Penn, 2 April 2008
This web-page was adapted from the web-page for another course,
created by Vassos Hadzilacos.