| Date |
Speaker |
Title |
|---|
| Sep. 19 |
Naishi Liu |
Computational models of linguistic humor
|
| Oct. 3 |
TBD |
To be determined
|
| Oct. 17 |
TBD |
To be determined
|
| Oct. 31 |
TBD |
To be determined
|
| Nov. 14 |
TBD |
To be determined
|
| Nov. 28 |
TBD |
To be determined
|
| Dec. 12 |
TBD |
To be determined
|
|
|
| Winter 2008 |
| Jan. 18 |
Frank Rudzicz |
Speech Recognition and Computational Linguistics: How to wreck a nice beach whenever a wand Aztecs
Speech and language research is big. Very big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to speech and language research! Listen!
And so on...
|
| Jan. 30 |
Rada Mihalcea |
Linking Documents to Encyclopedic Knowledge: Using Wikipedia as a Source of Linguistic Evidence
Note special time and place: 10:30-12:00, Pratt 266
Wikipedia is an online encyclopedia that has grown to become one of the
largest online repositories of encyclopedic knowledge, with millions of
articles available for a large number of languages. In fact, Wikipedia
editions are available for more than 200 languages, with the number of
entries per language ranging from a few pages to more than one million
articles.
In this talk, I will describe the use of Wikipedia as a source of
linguistic evidence for natural language processing tasks. In particular,
I will show how this online encyclopedia can be used to achieve
state-of-the-art results on two text processing tasks: automatic keyword
extraction and word sense disambiguation. I will also show how the two
methods can be combined into a system able to automatically enrich a text
with links to encyclopedic knowledge. Given an input document, the system
identifies the important concepts in the text and automatically links
these concepts to the corresponding Wikipedia pages. Evaluations of the
system showed that the automatic annotations are reliable and hardly
distinguishable from manual annotations. Additionally, an evaluation of
the system in an educational environment showed that the availability of
encyclopedic knowledge within easy reach of a learner can improve both the
quality of the knowledge acquired and the time needed to obtain such
knowledge.
This is joint work with Andras Csomai.
|
| Feb. 15 |
Graeme Hirst |
Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model
The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method is found to be superior, even on content words. We then show that optimizing over sentences gives better results than variants of the algorithm that optimize over fixed-length windows.
This talk represents collaborative work among Amber Wilcox-Hearn, Graeme Hirst, and Alexander Budanitsky.
|
| Feb. 29 |
Cancelled |
Graduate Visit Day
|
| Mar. 14 |
(Afra Alishahi || Afsaneh Fazly) |
A Probabilistic Incremental Model of Word Learning in the Presence of Referential Uncertainty
We present a probabilistic incremental model of word
learning in children. The model acquires the meaning of words from exposure to
word usages in sentences, paired with appropriate semantic
representations, in the presence of referential uncertainty.
A distinct property of
our model is that it continually revises its learned knowledge of a
word's meaning, but over time converges on the most likely meaning of
the word. Another key feature is that
the model bootstraps its own partial knowledge of word–meaning
associations to help more quickly learn the meanings of novel words.
Results of simulations on naturalistic child-directed data show that our
model exhibits behaviours similar to those observed in the early
lexical acquisition of children, such as vocabulary spurt
and fast mapping.
|
| Mar. 28 |
Chris Parisien |
An Incremental Bayesian Model for Learning Syntactic Categories
Abstract:
I present a method for the unsupervised learning of syntactic categories from text. The method uses an incremental Bayesian clustering algorithm to find groups of words that occur within similar syntactic contexts. The model draws information from the distributional cues of words within an utterance, while explicitly bootstrapping its development on its own partial knowledge of syntactic categories. Using a corpus of child-directed speech, we demonstrate the benefit of a syntactic bootstrap for an incremental categorization model. The model is robust to the noise in real language data, manages lexical ambiguity, and shows learning behaviours similar to what we observe in children.
|
| Apr. 11 |
Tim Fowler |
Navigating the parsing landscape
Abstract:
We will introduce context-free grammars (CFGs) and combinatory categorial grammars (CCGs) with a focus on how these formalisms deal with semantics. The known differences between the formalisms will be discussed, and the Lambek calculus will be introduced as an ideal point of comparison between the two. To do this, we will need to consider the formal language class of natural language. A recent polynomial-time parsing result for the Lambek calculus will be introduced, and we will discuss possible future research opened up by this result.
|
|
|
| Fall 2007 |
| Sept. 14 |
CL Group |
Fall 2007 Welcoming Meeting
|
| Sept. 28 |
Gerald Penn |
The Quantitative Study of Writing Systems
Abstract:
If you understood all of the world's languages, you would still not be
able to read many of the texts that you find on the World Wide Web,
because they are written in non-Roman scripts, often ones that have
been arbitrarily encoded for electronic transmission in the absence of
an accepted standard. This very modern nuisance reflects a dilemma as
ancient as writing itself: the association between a language as it is
spoken and its written form has a sort of internal logic to it that we
can comprehend, but the conventions are different in every individual
case, even among languages that use the same script, or between
scripts used by the same language. This conventional association
between language and script, called a writing system, is indeed
reminiscent of the Saussurean conception of language itself, a
conventional association of meaning and sound, upon which modern
linguistic theory is based. Despite linguists' reliance upon writing
to present and preserve linguistic data, however, writing systems were
a largely forgotten corner of linguistics until the 1960s, when Gelb
presented their first classification.
This talk will describe recent work that aims to place the study of
writing systems upon a sound computational and statistical foundation.
While archaeological decipherment may eternally remain the holy grail
of this area of research, this work also has applications to speech
synthesis, machine translation, and multilingual document retrieval.
|
| Oct. 12 |
Paul Cook |
Pulling their Weight: Exploiting Syntactic Forms for the Automatic Identification of Idiomatic Expressions in Context
Abstract:
Much work on idioms has focused on type identification, i.e.,
determining whether a sequence of words can form an idiomatic
expression. Since an idiom type often has a literal interpretation as
well, token classification of potential idioms in context is critical
for NLP. We explore the use of informative prior knowledge about the
overall syntactic behaviour of a potentially idiomatic expression
(type-based knowledge) to determine whether an instance of the
expression is used idiomatically or literally (token-based knowledge).
We develop unsupervised methods for the task, and show that their
performance is comparable to that of standard supervised techniques.
|
| Oct. 26 |
Cancelled |
Cancelled
|
| Nov. 9 |
Graeme Hirst |
Views of Text-Meaning in Computational Linguistics
Abstract:
Three views of text-meaning compete in the philosophy of language:
objective, subjective, and authorial (meaning is "in" the text, "in" the
reader, or "in" the writer). Computational linguistics has ignored the
competition and implicitly embraced all three, and rightly so; but
different views have predominated at different times and in different
applications. Contemporary applications mostly take the crudest view:
meaning is objectively "in" a text. The more-sophisticated applications
now on the horizon, however, demand the other two views: as the computer
takes on the user's purpose, it must also take on the user's subjective
views; but sometimes, the user's purpose is to determine the author's
intent. Accomplishing this requires, among other things, an ability to
determine what could have been said but wasn't, and hence a sensitivity
to linguistic nuance. It is therefore necessary to develop
computational mechanisms for this sensitivity.
|
| Nov. 23 |
Diana Raffman |
Psychological Hysteresis and the Nontransitivity of Insignificant Differences
Abstract:
Vague words in natural language cause semantic and logical problems in a variety of disciplines. An especially persistent problem has to do with the nontransitivity of insignificant differences. For example, if eating one candy won't make me fat, then eating two won't; but if eating two won't, then eating three won't; and so on. It seems to follow that eating a thousand pieces of candy won't make me fat. This paradoxical result shows that the word 'fat' is vague. Similarly, if Hillary Clinton is a person, then she was a person one second ago; and if she was a person one second ago, then she was a person two seconds ago; etc. It seems to follow that the conceptus from which Hillary Clinton developed was also a person. The word 'person' is vague.
Clearly there is something wrong with this paradoxical form of reasoning, but a satisfactory diagnosis has not been found. In this talk I will propose a diagnosis that appeals to the hysteretic nature of our judgments involving vague words. To that end I will present preliminary results of a psychological study of our use of vague words.
|
| Dec. 7 |
TBD |
TBD
|