CSC 401/2511 -- Natural Language Computing
Winter 2010
Contact information
Instructor: Gerald Penn
- Office: PT 396B (St. George campus)
- Office hours: immediately following lectures (normally Mondays and Fridays) 1-2, or by appointment
- Tel: 978-7390
- Email: gpenn@cdf.utoronto.ca
Meeting times
- Lectures: MF 12-1, BA 1190
- Tutorials: W 12-1, BA 1190
- (Exceptions: there will be lectures on MWF, 4/6/8 January, with no tutorial in the first week;
there will be a lecture on Wednesday, 10 February, and a tutorial on Friday, 12 February;
there will be a tutorial on Monday, 22 February, and a lecture on Wednesday, 24 February;
there will be a lecture on Wednesday, 17 March, and a tutorial on Friday, 19 March;
there will be a tutorial on Monday, 22 March, and a lecture on Wednesday, 24 March;
there will be no lecture or tutorial on Friday, 2 April.)
A bulletin board has also been created for the class; it will be monitored
by the TAs.
Texts for the Course
Required | C. Manning & H. Schuetze, Foundations of Statistical Natural Language Processing, MIT, 1999 (for which there is an on-line edition from MIT CogNet) | Errata
Optional | D. Jurafsky & J. Martin, Speech and Language Processing, Prentice Hall, 2nd ed., 2008 | Errata
Recommended | A. Martelli, Python in a Nutshell, 2nd ed., O'Reilly, 2006 | Errata
Optional | M. Lutz, Learning Python, 3rd ed., O'Reilly, 2007 | Errata
Free! | various tutorials on the Python website |
Supplementary Reading for the Lectures
Topic | Title | Author | Publication Details
parsing, phrase structure models | Statistical Language Learning | E. Charniak | MIT Press, 1993.
machine learning | The Elements of Statistical Learning | T. Hastie, R. Tibshirani and J. Friedman | Springer, 2001.
information theory (including entropy) | Elements of Information Theory | T. M. Cover and J. A. Thomas | Wiley & Sons, 1991.
maximum entropy modelling | A Maximum Entropy Approach to Natural Language Processing | A. L. Berger, S. A. Della Pietra and V. J. Della Pietra | Computational Linguistics, 22(1): 39-71.
hidden Markov models (state emission) | Fundamentals of Speech Recognition, Chapter 6. | L. Rabiner and B.-H. Juang | Prentice Hall, 1993.
Good-Turing estimation | A comparison of the enhanced Good-Turing and deleted estimation methods for estimating probabilities of English bigrams | K. Church and W. Gale | Computer Speech and Language 5:19-54.
information retrieval | Modern Information Retrieval | R. Baeza-Yates and B. Ribeiro-Neto | ACM Press, 1999.
text summarization | Automatic Summarization | I. Mani | Benjamins, 2001.
phonetics (articulatory and acoustic) | Acoustic Phonetics | K. N. Stevens | MIT Press, 1998.
Tentative Course outline
- Introduction to Corpus-based Linguistics
- Text Categorisation
- N-gram Models
- Markov Models
- Automatic Speech Recognition
- Part-of-Speech Tagging
- Information Retrieval
- Text Summarisation
- Statistical Machine Translation
Calendar of important course-related events
Date | Event
Mon, 4 January | First lecture
Fri, 15 January | Last day to add course (CSC 2511)
Sun, 10 January | Last day to add course (CSC 401)
Fri, 5 February | Assignment 1 due
15-19 February | Reading Week - no classes
Fri, 26 February | Last day to drop course (CSC 2511)
Sun, 7 March | Last day to drop course (CSC 401)
Fri, 5 March | Assignment 2 due
Mon, 29 March | Last lecture
Thu, 1 April | Assignment 3 due
7-23 April | Final exam period
Evaluation and related policies
There will be three homeworks and a final exam. The relative weights of
these components towards the final mark are shown in the table below:
Assignment 1 | 20%
Assignment 2 | 20%
Assignment 3 | 20%
Final | 40%
Important note on final: A mark of at least a D- on the final
exam is required to pass the course. In other words, if you receive
an F on the final exam you automatically fail the course, regardless of
your performance on homeworks.
Important note on homeworks: No late homeworks will be accepted
except in case of documented medical or other emergencies.
Policy on collaboration: No collaboration on homeworks is permitted.
The work you submit must be your own. No student is permitted to
discuss the final exam with any other student until the instructor or TAs
make the solutions publicly available. Failure to observe this policy
is an academic offense, carrying a penalty ranging from a zero on
the homework to suspension from the university.
Announcements
In this space, you will find announcements related to the course. Please
check this space at least weekly.
- FINAL EXAM: As a reminder, you will not be permitted to take any
exam aids into the final exam. We'll be having a special tutorial
session on Wednesday, 7th April from 1:30 to 3:30 in GB 404. The
purpose of this session is to answer questions you've had in revising for
the final exam.
- MATERIAL COVERED IN WEEK 12: information retrieval, singular value decomposition.
You should read M&S Chapter 15.
- MATERIAL COVERED IN WEEK 11: text summarization, naive Bayes classification.
- MATERIAL COVERED IN WEEK 10: Fourier transforms, spectrograms, vowel classification,
relative entropy, mutual information, the flip-flop algorithm. You should
read M&S sections 7.2 and 8.4.
- MATERIAL COVERED IN WEEK 9: articulatory phonetics, sound waves, acoustic
phonetics. You should read J&M sections 4.1-4.2 and Chapter 7.
- MATERIAL COVERED IN WEEK 8: part-of-speech tagging, tagging with HMMs,
transformation-based tagging, the Brill tagger. You should read M&S
Chapter 10.
- MATERIAL COVERED IN WEEK 7: forward algorithm, backward algorithm, Viterbi
decoding, Baum-Welch re-estimation.
- MATERIAL COVERED IN WEEK 6: smoothing, Markov models, HMMs. You should
read M&S Chapter 9.
- MATERIAL COVERED IN WEEK 5: iterative scaling, language modelling, n-grams,
maximum likelihood estimation, Bayes's rule. You should read M&S
Chapter 6.
- MATERIAL COVERED IN WEEK 4: k-nearest neighbours, perceptron learning,
Lagrange's method, maximum entropy modelling. You should read M&S 2.1-2.2.4.
- MATERIAL COVERED IN WEEK 3: cosine method, entropy, decision trees.
- MATERIAL COVERED IN WEEK 2: corpus annotation, genre classification, end-of-sentence
boundary detection. You should read M&S Chapter 16, Sections
15.2-15.2.1 and Section 8.1.
- MATERIAL COVERED IN WEEK 1: Zipf's Law, parts of speech. You should
read M&S Chapter 1, Section 3.1 and Chapter 4.
- 18 December: PREREQUISITES. CSC 207 or 209 or 228, and STA 247 or 255 or
257 and a CGPA of 3.0 or higher or a CSC subject POSt. MAT 223 or 240 is
strongly recommended. Note that the University's automatic registration
system does not check for prerequisites: even if you have registered for
the course, you will not receive credit for it unless you had satisfied
the prerequisite before you registered.
Handouts
In this space you will find on-line PDF versions of course handouts,
including homeworks.
To view these handouts you will need access to a PDF viewer. If your
machine does not have the required software, you can download
Adobe Acrobat Reader for free.
Old Exams
Some old midterm and final exams for this course (with no solutions).
Gerald Penn, 6 April, 2010
This web-page was adapted from the web-page for another course,
created by Vassos Hadzilacos.