Meetings
If you would like to schedule a meeting, or for more information, please email the meeting organizers at cl-mo followed by @cs.toronto.edu.
Date | Speaker | Title and abstract | Location |
---|---|---|---|
Winter 2019 | |||
Tuesday February 12, 1:30-3pm | Gerald Penn (UofT) | Can Deep Learning Compensate for a Shallow Evaluation?
The last ten years have witnessed an enormous increase in the application of "deep learning" methods to both spoken and textual natural language processing. Have they helped? With respect to some well-defined tasks such as language modelling and acoustic modelling, the answer is most certainly affirmative, but those are mere components of the real applications that are driving the increasing interest in our field. In many of these real applications, the answer is surprisingly that we cannot be certain because of the shambolic evaluation standards that have been commonplace --- long before the deep learning renaissance --- in the communities that specialized in advancing them. This talk will consider three examples in detail: sentiment analysis, text-to-speech synthesis, and summarization. We will discuss empirical grounding, the use of inferential statistics alongside the usual, more engineering-oriented pattern recognition techniques, and the use of machine learning in the process of conducting an evaluation itself.
| PT266 |
Tuesday February 26, 1:30-3pm | Rebecca Knowles (Johns Hopkins University) | Interactive and Adaptive Neural Machine Translation
Machine translation (using software to translate between text in different languages, e.g. Google Translate) has seen improvements in recent years, but certain translation use cases (e.g. law, medicine, marketing) require human-quality translations. Today, such translations are often created by human translators using computer aided translation (CAT) tools that are designed to increase productivity. This talk describes recent and continuing work on neural machine translation with a focus on challenges that relate to human translator interaction with machine translation output. This includes work on the handling of rare words, the incorporation of dictionaries and lexicons into neural machine translation models, and adapting neural machine translation systems to learn from human translators' corrections.
| PT266 |
Tuesday March 12, 1:30-3pm | Krishnapriya V (UofT) | TBD
TBD
| PT266 |
Tuesday March 19, 1:30-3pm | Kawin Ethayarajh (UofT) | TBD
TBD
| PT266 |
Tuesday March 26, 1:30-3pm | Julia Watson (UofT) | TBD
TBD
| PT266 |
Tuesday April 2, 1:30-3pm | Jenny Xie (UofT) | TBD
TBD
| PT266 |
Tuesday April 9, 1:30-3pm | Safwan Hossain (UofT) | TBD
TBD
| PT266 |
Tuesday April 16, 1:30-3pm | Bai Li (UofT) | TBD
TBD
| PT266 |
Fall 2018 | |||
Tuesday November 27, 1:30-3pm | Renato Ferreira | Children's overextension and the creation of chain complexes
Young children often stretch terms to novel objects when they lack the proper adult words—a phenomenon known as overextension. Psychologists have proposed that overextension relies on the formation of a chain complex, such that new objects may be linked to existing referents of a word depending on shared attributes including taxonomic similarities (e.g., ‘dog’ overextended to squirrels), visual analogies (e.g., ‘ball’ overextended to balloons), and predicate-based relations (e.g., ‘key’ overextended to doors). We build on these ideas by proposing a computational framework that creates chain complexes via multimodal fusion of resources from linguistics, psychological experiments, and deep learning algorithms. We test our models in a communicative scenario that simulates linguistic production and comprehension between a child and a caretaker. Our preliminary results show that the multimodal feature space accounts for substantial variation in children’s overextension reported from the literature. This work provides a formal approach toward characterizing linguistic creativity in early childhood.
| PT266 |
Tuesday November 13, 1:30-3pm | Eric Corlett | Probability and Program Complexity for NLP
Probabilistic models used in NLP often come from general frameworks into which otherwise difficult-to-define tasks can be embedded. The power of these frameworks can lead to situations in which traditional measures of descriptive complexity, such as worst-case running time, can overestimate the cost of running our algorithms. In this talk I look at how practical and theoretical complexities can differ by investigating the Most Probable Sentence problem, which was shown to be NP-complete by Khalil Sima'an in 2002. I show that linguistic entropy can be used to formulate a more natural bound for the running time of this problem, as well as its error of approximation.
| PT266 |
Tuesday October 30, 1:30-3pm | Demetres Kostas | Learning the Brain Rhythms of Speech
In this talk, I present work that uses deep neural networks trained with raw MEG data to predict the age of children performing a verb-generation task, a monosyllable speech-elicitation task, and a multi-syllabic speech-elicitation task. I argue that the network makes these predictions on the grounds of differences in speech development. Previous work has explored using neural networks to classify encephalographic recordings with some success, but it does little to acknowledge the structure of these data, typically relying on some popular contemporary architecture designed for a vaguely related application. Previous such approaches also typically require extensive feature engineering to succeed. I will show that configuring a neural network to mimic the common manual pipeline employed for brain-computer interface classifiers allows it to be trained with raw magnetoencephalography (MEG) and electroencephalography (EEG) recordings and to achieve state-of-the-art accuracies with no hand-engineered features.
| PT266 |
Tuesday October 16, 1:30-3pm | Nona Naderi | Computational analysis of arguments and persuasive strategies in political discourse
Various strategies are used by politicians to persuade their audience. In this talk, I focus on analyzing framing and face-saving strategies. I examine generic and issue-specific frames using various models and show that recurrent neural net models can capture representations of generic frames more effectively than classifiers trained with topics. I also show that frames are transferable across genres. I further examine face-saving strategies in the parliament and show that the features that capture the interaction between pairs of reputation threat and reputation defence are effective in the classification of these strategies. Finally, I present our first steps in automating the evaluation of arguments.
| PT266 |
Tuesday October 2, 1:30-3pm | Ella Rabinovich | A Computational Approach to the Study of Bilingualism
The goal of this talk is to propose and evaluate an approach for bridging the gap between two related areas of research on bilingualism: translation studies and second language acquisition. I investigate the characteristics of language production that is influenced by the existence of another linguistic system -- language that is produced by a variety of multilinguals, including learners, advanced non-native speakers and translators. I ask whether these language varieties are subject to unified principles, governed by phenomena that stem from the co-existence of multiple linguistic systems in a bilingual brain. By applying a range of computational methodologies, I highlight factors that account for the commonalities and the distinctions between various crosslingual language varieties. Major features of bilingualism, including grammatical, cognitive, and social aspects, have been extensively studied by scholars for over half a century. Crucially, much of this research has been conducted with small, carefully-curated datasets or in a laboratory experimental setup. I will show that the availability of large and diverse datasets of productions of non-native speakers stimulates new opportunities for pursuing the emerging direction of computational investigation of bilingualism, thereby tying empirical results to well-established theoretical foundations.
| PT266 |
Spring 2018 | |||
Tuesday April 24, 1:30-3pm | Gagandeep Singh | Statistical Parametric Speech Synthesis with focus on LDM based TTS
Over the past few years, speech synthesis based on statistical models, often called statistical parametric speech synthesis (SPSS), has gained popularity over exemplar-based speech synthesis. In this talk, I will first present the general idea behind HMM-based speech synthesis (HTS), which has historically been the most popular SPSS model. I will then introduce speech synthesis based on linear dynamical models (LDMs), which overcomes some of the challenges faced by HTS. After some general discussion of LDMs, I will discuss their use in speech synthesis. This will be followed by a brief discussion of neural-network-based speech synthesis; finally, I will relate and compare LDM-TTS to neural speech synthesis.
| PT266 |
Tuesday April 10, 1:30-3pm | Parinaz Sobhani, Georgian Partners | Stance Detection and Analysis in Social Media Computational approaches to opinion mining have mostly focused on polarity detection of product reviews by classifying the given text as positive, negative or neutral. Meanwhile, there has been less effort in the direction of socio-political opinion mining, which aims to determine favourability towards given targets of interest, particularly for social media data like news comments and tweets. In this talk, we explore the task of automatically determining from the text whether the author of the text is in favour of, against, or neutral towards a proposition or target. This talk is organized into three main parts: the first part on Twitter stance detection and interaction of stance and sentiment labels, the second part on detecting stance and the reasons behind it in online news comments, and the third part on multi-target stance classification.
| PT266 |
Tuesday March 20, 1:30-3pm | Hector Martinez Alonso, Thomson Reuters | Annotation of Nominal Regular Polysemy In this presentation, I will cover the content of my PhD defense [1], dealing with the human and automatic identification of regular polysemy in noun classes, known as 'dot types' in the Generative Lexicon [2]. The main focus of the project is to assess the possibility of identifying the underspecified --or intermediate-- sense for regular polysemy often presented in semantic theory, whereby a word like "cup" could potentially mean a container, a measure of volume, or both. The thesis presents an assessment of the ability of humans and classification algorithms to identify such readings, as well as a discussion on the annotator bias of three characteristic annotator types, namely crowdsourced, volunteer and expert, and the strengths and limitations they present for establishing conclusions based on their annotations. [1] Martínez Alonso, H. Annotation of Regular Polysemy: An empirical assessment of the underspecified sense. PhD Thesis. University of Copenhagen. 2013. [2] Pustejovsky, J. The Generative Lexicon. MIT Press. 1995. Héctor Martínez Alonso is currently a research scientist in natural language processing at Thomson Reuters Labs in Toronto. He earned his PhD in computational linguistics as a joint degree between the UCPH (Denmark) and UPF (Spain) in 2013, and has been a postdoctoral fellow at UCPH and Inria, the French national center for computing. Some of his research interests are lexical semantics, dependency syntax, semi-supervised learning and linguistic annotation. You can find him at http://hectormartinez.github.io/
| PT266 |
Tuesday Feb 27, 1:30-3pm | Saif Mohammad, NRC | The Search for Emotions in Language Emotions are central to human experience and behavior. They are crucial for organizing meaning and reasoning about the world we live in. They are ubiquitous and everyday, yet their secrets remain elusive. In this talk, I will describe our work on the search for emotions in language -- by humans and by machines. I will describe large crowdsourced studies asking people to detect emotions associated with words, phrases, sentences, and tweets. I will flesh out the various ways in which emotions can be represented, challenges in obtaining reliable annotations, and approaches that address these problems. The emotion lexicons thus created, with entries for tens of thousands of English terms, have wide-ranging applications in natural language processing, psychology, social sciences, literary analysis, digital humanities, and data sonification. The human annotations also shed light on compelling research questions involving how we organize meaning, the fine-grained distinctions we make, our shared understanding of the world, and the extent to which differences in gender, age, and personality impact this shared understanding. In the second part of my talk, I will present automatic methods for detecting emotions associated with text. This will include our NRC-Canada system that stood first in three SemEval-2013 and SemEval-2014 sentiment analysis shared task competitions. Next, I will flesh out shared tasks that we have organized from 2015 through 2018 that go beyond traditional sentiment classification. These include inferring stance from tweets that may or may not explicitly mention the target of interest and detecting fine-grained emotion intensity. Finally, I will conclude with ongoing work on assessing the degree of inappropriate biases in automatic emotion systems. Acknowledgments: This talk includes joint work with a number of researchers and graduate students, with substantial contributions from Svetlana Kiritchenko and Peter Turney. Bio: Dr. Saif M. Mohammad is a Senior Research Scientist at the National Research Council Canada (NRC). He received his Ph.D. in Computer Science from the University of Toronto. Before joining NRC, Saif was a Research Associate at the Institute of Advanced Computer Studies at the University of Maryland, College Park. His research interests are in Computational Linguistics, especially Lexical Semantics, Crowdsourced Human Annotations, Sentiment Analysis, Social Media Analysis, and Information Visualization. He has served as the area chair for Sentiment Analysis in past ACL conferences. Saif is a co-organizer of WASSA (a sentiment analysis workshop) and co-chair of SemEval (the largest shared task platform for NLP tasks). His work on detecting emotions in social media and on generating music from text has garnered media attention, including articles in Time, Slashdot, LiveScience, io9, The Physics arXiv Blog, PC World, and Popular Science. Webpage: http://saifmohammad.com
| PT266 |
Tuesday Feb 13, 1:30-3pm | Yang Xu | Colexification across languages reflects cognitive efficiency Human language relies on a finite lexicon to express a potentially infinite set of ideas. A key result of this tension is that words become polysemous over time: A single word can be extended to express multiple different senses, e.g., face may refer to “body part”, “expression”, or “surface of an object”. Certain patterns of polysemy tend to recur across languages; that is, the same set of senses is labeled by a single word form, despite variations in language genealogy, geography, climate, and culture (Youn et al., PNAS, 2016). We examine the perspective that the cross-linguistic frequency distribution of shared polysemy reflects a drive toward cognitive efficiency. We test our hypothesis using a large database of digitized lexicons from the world's languages. Preliminary results suggest that semantic associativity predicts the frequency with which senses are colexified across languages, and it does so better than other alternative variables we have considered. This outcome is consistent with the view that recurring patterns of colexification arise from a historical process of word sense extension that tends to minimize cognitive effort.
| PT266 |
Fall 2017 | |||
Tuesday Dec 5, 1:30-3pm | Menna El-Assady | Visual Analysis of Verbatim Text Transcripts Verbatim text transcripts capture the rapid exchange of opinions, arguments, and information among participants of a conversation. As a form of communication that is based on social interaction, multiparty conversations are characterized by an incremental development of their content structure. In contrast to highly-edited text data (e.g., literary, scientific, and technical publications), verbatim text transcripts contain non-standard lexical items and syntactic patterns. Thus, analyzing these transcripts automatically introduces multiple challenges. In this talk, I will present approaches developed (in context of the VisArgue project) to enable humanities and social science scholars to get different perspectives on verbatim text data in order to capture strategies of successful rhetoric and argumentation. To analyze why specific discourse patterns occur in a transcript, three main pillars of communication are studied through answering the following questions: (1) What is being said? (2) How is it being said? (3) By whom is it being said? In addition to reporting on visualization techniques for the analysis of conversation dynamics, I will argue for the importance of tuning automatic content analysis models to unique textual characteristics, appearing, for example, in verbatim text transcripts. In particular, I will present a visual analytics framework for the progressive learning of topic modeling parameters. Our human-in-the-loop process simplifies the model tuning task through intuitive user feedback on the relationship between topics and documents. Example case study: http://presidential-debates.dbvis.de/
| PT266 |
Tuesday Nov 28, 1:30-3pm | Serena Jeblee | TBA TBA
| PT266 |
Tuesday Nov 14, 1:30-3pm | Jeff Pinto | Machine Learning Approaches for Analyzing Various Data on Mental Health from a CAMH Project My talk is an early stage review of my project at the Centre for Addiction and Mental Health (CAMH). CAMH's Neurogenetics team is administering a long-running project to identify positive and negative correlations between specific genes and common psychotropic medications to improve personalized prescriptions. Approximately 7,500 patients have completed the study in its 5 years with over 20,000 pages of clinical notes collected. Biomedical experts on the team produced a Gold Standard Corpus (GSC) by manually annotating a 190-page subset of 63 patients, but this process is too expensive and time-consuming to keep pace with the study's volume of data. My project objective is to evaluate several supervised and semi-supervised NLP approaches for extracting and correlating symptoms, adverse drug events, and medications from the structured and free-text components of patients' electronic health records (EHRs). There are several challenges in this project, beginning with data fragmentation and hygiene issues due to privacy guidelines and operational changes. In addition, the GSC is <1% of the dataset, which will make it hard to generalize a model. Finally, each EHR can be several pages long with features occurring throughout the text and occasionally providing conflicting inputs, so individual records must be evaluated in toto. My initial investigation is to compare a Recurrent Neural Network (RNN) with an Attention Mechanism (AM) against three other methods: a semi-supervised Naive Bayesian approach, a bidirectional Long Short-Term Memory RNN, and the open source clinical Text Analysis and Knowledge Extraction System (cTAKES). I speculate that an AM RNN will perform best as it can leverage an RNN's ability to unearth features and use the full document context to evaluate conflicting data, link features temporally and unify discontiguous features. The focus of this talk is to present the current progress of my research and sample data, and to gather suggestions on alternative approaches.
| PT266 |
Tuesday Oct 24, 1:30-3pm | Gerald Penn | What Does it Mean to Parse with Categorial Grammar? There are many kinds of categorial grammar, but there is very little variation in how categorial grammars are being used in practice by computational linguists. This talk will reassess the several arguments that have been offered in defense of the status quo in light of research on CG membership algorithms and corpora over the last 10 years.
| PT266 |
Tuesday Oct 10, 1:30-3pm | Misha Schwartz | Morphemes in translation: segmentation by grouping This is a "work in progress" talk, so please come to hear about Misha's interesting new approach and share your feedback and ideas.
| PT266 |
Monday July 10, 3:00-4:00 | Xiaodan Zhu | Sequential and Structured LSTM for Natural Language Inference Reasoning and inference are central to human and artificial intelligence. Modeling informal inference in human language is challenging but is a fundamental problem in natural language understanding and many applications. For example, it is interesting to determine whether a sentence entails another sentence (entailment), whether they contradict each other (contradiction), or whether they are not logically related (neutral). In this talk, we present our neural-network-based models for natural language inference, which achieve state-of-the-art performance on the Stanford Natural Language Inference (SNLI) dataset. We first discuss our Enhanced Sequential Inference Model (ESIM), which outperforms the previous models that use more complicated architectures. We further demonstrate that by using recursive neural networks to explicitly encode syntax, we achieve additional improvement.
| PT266 |
Monday July 10, 11:00-12:00 | Tong Wang | Machine Comprehension by Question Answering and Question Generation Researchers in psychology have long observed that human reading comprehension can be improved by formulating questions about the reading materials before-hand. In this talk, I will present our recent work on analogous effects in machine comprehension. I will first introduce the task of question answering (QA) and related datasets. Then I will describe a sequence-to-sequence model for generating questions based on an article and an entity of interest (i.e., a potential answer). To address the limitations of current evaluation metrics for natural language generation, I will also present some alternative evaluation methods and how these metrics are used to improve generation quality using reinforcement learning. Then I will present our recent work on improving QA performance by training a QA model to *jointly* answer and ask questions through parameter sharing. Finally, I will describe our latest efforts towards autonomous information seeking via question generation conditioned only on documents. The presented studies are conducted in close collaboration with my colleagues at Microsoft Maluuba: Eric Xingdi Yuan, Adam Trischler, Alessandro Sordoni, Philip Bachman, Caglar Gulcehre (research intern, MILA), and Sandeep Subramanian (research intern, MILA), and with our academic adviser Professor Yoshua Bengio.
| PT266 |
Thursday June 22, 10:30-11:30 | Federico Nanni | Topic-based and Cross-lingual Scaling of Political Text Political text scaling aims to linearly order parties and politicians across political dimensions (e.g., left-to-right ideology) based on textual content (e.g., politician speeches or party manifestos). Existing models, such as Wordscores and Wordfish, scale texts based on relative word usage; by doing so, they do not take into consideration topical information and cannot be used for cross-lingual analyses. In my talk, I will present the efforts of the Data and Web Science Group toward developing topic-based and cross-lingual political text scaling approaches. First I will introduce our initial work, TopFish, a multilevel computational method that integrates topic detection and political scaling and shows its applicability for temporal aspect analyses of political campaigns (pre-primary elections, primary elections, and general elections). Next, I will present a new text scaling approach that leverages semantic representations of text and is suitable for cross-lingual political text scaling.
| PT266 |
Tuesday May 23, 1:30-3:30 | Rory Harder | Inferences and Neutrality of the Deliberative 'Ought' This talk provides an (opinionated) introduction to a recent debate in formal semantics about the meaning of deontic modals. This debate centres around the issue of semantic neutrality, which is roughly the requirement that a semantic theory adequately represent the different kinds of judgments that semantically competent speakers could make. First, I introduce the technical framework that many philosophers of language and linguists use when doing formal semantics: the typed lambda calculus (e.g. Heim & Kratzer, 1998). Second, I present Kratzer's (1981) classic semantics of modals, focusing on deontics. Third, I present and motivate, through inference pattern considerations, a more recent proposal that employs expected value (EV) theory (Cariani, 2008; Lassiter, 2011). Fourth, I present neutrality based criticisms, from Carr (2015) and Cariani (2016), of both the EV and Kratzer semantics, and present Carr's semantic proposal, which is a generalization of Kratzer's theory. Finally, I criticize Carr's proposal and suggest another, which is a generalization of EV semantics that employs Buchak's (2013) risk-weighted expected value theory.
| PT266 |
Tuesday May 9, 1:30-3:30 | Nona Naderi | Recognizing reputation defence strategies in critical political exchanges We examine whether and how reputation defence strategies are used in political speeches. The result is a corpus of parliamentary questions and answers that are annotated with reputation defence strategies. We then propose a model based on supervised learning to address the detection of these strategies, and report promising experimental results.
| PT266 |
Tuesday April 25, 1:30-3:30 | Ellen Korcovelos | Studying Neurodegeneration with Automated Linguistic Analysis of Speech Data Background: Recorded changes in the language and speech of aging individuals offer a new means of quantifying neurodegeneration. By analyzing linguistic features such as parts-of-speech, word length, word frequency, and acoustic variables, automated techniques in computational linguistics make it possible to classify groups of differing linguistic ability. Methods: We extract features from audio recordings, and their respective transcripts, of participants recalling the narrative of Cinderella. These features identify significant characteristics for each of four populations: aphasic stroke (ST; N=19), primary progressive aphasia (PPA; N=11), mild cognitive impairment and Alzheimer's disease (MCI/AD; N=9 and N=2, respectively), and healthy elderly controls (CT; N=26). We then use these features to train a machine-learning classifier to correctly distinguish healthy individuals from patients (CT vs. ST+PPA+MCI), MCI/AD patients from ST and PPA patients, and controls from each individual patient group (e.g., CT vs. ST). Results: Our decision tree model is able to classify CT versus ST+PPA+MCI with 76.1% accuracy. We classify controls from MCI/AD patients with 89.2% accuracy, controls from PPA with 91.9% accuracy, and controls from stroke patients with 71.1% accuracy. Finally, the MCI/AD patients versus the combined stroke and PPA groups were classified with 80.5% accuracy. Word length and filled pauses were found to serve as prominent features in identifying pathology; however, when comparing controls and the MCI/AD group, acoustic features were selected more often than for the other populations' feature sets. Conclusions: Binary classification between groups was between 13% and 21% more accurate than baseline values, and 4-way classification was 14.9% better. It appeared that linguistic features yielded better predictions than did the addition of acoustic features. Ongoing work aims to explain these phenomena and further evaluate the possible use of speech to serve as a diagnostic tool.
| PT266 |
Tuesday April 11, 1:30-3:30 | Barend Beekhuizen | TBA Abstract
| PT266 |
Tuesday April 4, 1:30-3:30 | Hui Jiang (York University) | A New General Deep Learning Approach for Various Natural Language Processing Problems Abstract The word embedding techniques to represent each discrete word as a dense vector in a continuous high-dimensional space have achieved huge success in many natural language processing (NLP) tasks. However, most NLP tasks rely on modeling a variable-length sequence of words, not just each isolated word. The conventional approach is to formulate these NLP tasks as sequence labelling problems and use conditional random fields (CRF), convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to solve them. In this talk, I will introduce a new, general deep learning approach for almost all NLP tasks, not just limited to sequence labelling problems. The proposed method is built upon a simple but theoretically-proved lossless encoding method, named fixed-size ordinally-forgetting encoding (FOFE), which can almost uniquely encode any variable-length sequence of words into a fixed-size representation. Next, a simple feedforward neural network is used as a universal function approximator to map the fixed-size FOFE codes to different targets in various NLP tasks. This framework is appealing since it is elegant and well-founded in theory and, at the same time, fairly easy and fast to train in practice. Moreover, it is totally data-driven without any feature engineering, and equally applicable to a wide range of NLP tasks. In this talk, I will introduce our recent work to apply this approach to several important NLP tasks, including word embedding, language modelling, named entity recognition (NER) and mention detection in recent KBP EDL contests, and Pronoun Disambiguation Problems (PDP) in the Winograd Schema Challenge. Experimental results have shown that the proposed approach is very effective in all of these examined NLP tasks compared with other more sophisticated conventional methods, such as CNNs and RNNs. As our future work, we will continue to explore this approach to solve more NLP problems, such as entity linking, semantic parsing, factoid Q/A and so on. Bio Hui Jiang received the B.Eng. and M.Eng. degrees from the University of Science and Technology of China (USTC) and the Ph.D. degree from the University of Tokyo, Japan, all in electrical engineering. Since 2002, he has been working at the Department of Electrical Engineering and Computer Science, York University, Toronto, Canada, initially as an assistant professor, then an associate professor and currently a full professor. His current research interests include machine learning, especially deep learning or neural networks, with applications to speech and audio processing, natural language processing and computer vision. He served as an associate editor for IEEE Trans. on Audio, Speech and Language Processing (T-ASLP) between 2009-2013, and on technical committees for several international conferences. He has recently received the 2016 IEEE SPS Best Paper Award for pioneering work in applying convolutional neural networks to speech recognition. | PT266 |
Wednesday March 29, 11:00-12:00 | Amaru Cuba Gyllensten (Swedish Institute of Computer Science) | Factorization Machines and Word Embeddings Abstract Word embeddings, such as word2vec or GloVe, have garnered much interest in the past years. Recently, an increasing focus has been put on multimodality and feature interaction, a general theme in machine learning. One field with under-studied connections to word embeddings is recommendation systems. In recommendation systems, factorization machines have proven a very efficient framework. In contrast to the hype around deep neural networks, they are essentially shallow networks with activation functions specially designed to capture specific types of interaction in the input data. Here I will try to bridge the gap between factorization machines and word embeddings, casting word-prediction as a word-recommendation problem. References: 1. Factorization Machines: http://www.algo.uni-konstanz.de/members/rendle/pdf/Rendle2010FM.pdf 2. Sound-Word2Vec: Learning Word Representations Grounded in Sounds: https://arxiv.org/abs/1703.01720 3. Discovery of Evolving Semantics through Dynamic Word Embedding Learning: https://arxiv.org/abs/1703.00607 4. Polynomial Networks and Factorization Machines: New Insights and Efficient Training Algorithms: https://arxiv.org/pdf/1607.08810.pdf 5. Field-aware Factorization Machines in a Real-world Online Advertising System: https://arxiv.org/abs/1701.04099 | PT266 |
Monday March 20, 2:00-3:00 | Hamidreza Chinaei | Cognitive computing systems for healthcare Abstract The ability to converse with humans is a common feature of many artificial intelligence products such as IBM Watson and healthcare robots. The datasets generated by such systems are complex, not only in terms of scale, but in the way the data are generated. Interesting aspects of learning in such data, collected through interaction with users, include modelling human intents as well as human decision-making processes. In this talk, I present my past work on learning human intents and human preferences using unsupervised topic modeling techniques and inverse reinforcement learning (a specific imitation learning approach), respectively. Given these learned models, I have introduced end-to-end approaches for the design of decision backends of healthcare systems, which I present in this talk. Further, I describe my current and future research in the design of cognitive computing, human computer interaction, and big data analytics within the context of healthcare, using various principled machine learning algorithms and tools. | PT266 |
Tuesday March 14, 1:30-3:30 | Yevgen Matusevych | Studying cross-linguistic influence with computational models Abstract A bilingual speaker is not a simple sum of two monolingual speakers. Languages interact in the mind, and cross-linguistic influence (CLI) is often, but not always, responsible for various differences between monolingual and bilingual speakers. To give an example, second language learners often make mistakes, many of which are attributed to CLI, but some are not. Using human experimental data alone, it is sometimes difficult to find out whether CLI is responsible for a particular kind of mistake. In this talk, I present two studies which demonstrate how the amount of CLI can be measured in computational cognitive models of language acquisition. The first study focuses on the acquisition of case-marking cues: learners of free word-order languages, such as German, often misinterpret Object-Verb-Subject sentences (e.g., The bear-ACC chases the dog-NOM). I simulate this task using a probabilistic model of bilingual construction learning, and demonstrate that CLI can account for this type of comprehension errors. In the second study, I present ongoing work on simultaneous color term acquisition in two languages using self-organizing maps. Cognitive representations of colors in bilingual speakers differ in various aspects from those of monolinguals, and CLI may explain some of these differences.
| PT266 |
Tuesday Feb 14, 12:00-2:00 | William Tunstall-Pedoe | (ML crosslist) Discussion on Evi's pre-acquisition technology on question-answering Abstract William founded Evi, which was acquired by Amazon, and its technology is now an integral part of Amazon's Alexa. William will discuss Evi's pre-acquisition question-answering technology and give a rough overview of the space. | Med. Sci. building, room 2170 |
Tuesday Feb 7, 1:30-3:30 | Ofer Shai | Unlocking the World's Scientific Insights Abstract Meta is an artificial intelligence company specializing in big data analysis of scientific and technical literature. By applying machine learning and natural language processing methods to the entire corpus of scientific and technical literature, Meta organizes, forecasts, and reasons over scientific and technical discovery at speed and scale. I will present an overview of our company and the products we've built to provide publishers, government, and others in industry with tools to accelerate their research activities. I will describe our system for predicting a paper's impact before it is published. These predictive capabilities are at the heart of our Bibliometrics Intelligence service, a tool that we make available to publishers to provide them with early feedback about submitted manuscripts. Beyond predicting impact, the service highlights key concepts for the editor, provides feedback on the manuscript's fit with the journal to which it was submitted, and suggests potential reviewers. The Bibliometrics Intelligence service is aimed at aiding editors in making quicker, informed decisions, and at speeding up the publication lifecycle. | PT266 |
Tuesday Jan 24, 1:30-3:30 | Sean Robertson | Lock-step features for Convolutional Neural Networks? Abstract In recent years, the success of Mel-scale filter banks over MFCCs (Mohamed et al., 2012) showed that deep belief networks can achieve better recognition accuracy if feature bases remain correlated. Yet filter banks still average each coefficient over a very large window to wash out short-lived events and homogenize the rate of change of the individual coefficients of feature vectors. This is beneficial if all coefficients are to be treated the same way, but the loss of transient information will limit a speech recognizer's effectiveness at characterizing some speech events, such as plosives. The next great improvement to features for speech recognition may not lie in finding new features, but in how they are fed into the recognition system. My current project aims to exploit the structure of Convolutional Neural Networks (CNNs) to allow different parts of the input space to be processed according to their implicit rate of change. Convolutions can only slow or retain the rate of change. By introducing faster-moving features in earlier layers and slower-moving features in later layers, the CNN may better model the temporal aspect of speech. My talk will be split into two parts. First, I will discuss the time-frequency tradeoff at an intuitive level, which should explain what it means for feature coefficients to move at different rates. Next, I will discuss the CNN architecture that I am building and some of the ideas I am building off of. | PT266 |
Fall 2016 | |||
Friday 16 September, 10:30--12:00 | Ted Pedersen, University of Minnesota, Duluth (Joint work with Bridget McInnes, Virginia Commonwealth University) | Improving Relatedness Measurements of Biomedical Concepts by Embedding Second-Order Vectors with Similarity Measurements
(or: eating your own tail is good for you) Abstract Vector space methods that measure semantic similarity and relatedness often rely on distributional information such as co-occurrence frequencies or statistical measures of association to weight the importance of particular co-occurrences. In this work we extend these methods by embedding a measure of semantic similarity based on a human-curated taxonomy into a second-order vector representation. This results in a measure of semantic relatedness that combines the contextual information available in a corpus-based vector space representation with the semantic knowledge found in a biomedical ontology. Our results show that embedding semantic similarity into a second-order co-occurrence matrix improves correlation with human judgments for both similarity and relatedness. | PT266 |
Winter 2016 (Default time: 14:00--15:30 Alternating Thursdays) | |||
Mar. 31st | Stephen Clark | Evaluating Compositional Distributed Semantic Models with RELPRON Abstract In this talk I will describe a new dataset designed to test the capabilities of compositional semantic models based on vector spaces. The dataset, called RELPRON, consists of pairs of terms and properties, such as telescope : device that astronomer uses. The idea is that a good compositional model will produce a vector representation of the property which is close to the vector for the term. I will also survey a number of possible approaches to composing vectors, before describing the methods based on neural networks that I have been investigating. It turns out that, in line with many existing datasets, vector addition provides a very challenging baseline for RELPRON, but we are able to improve on the baseline by finding appropriate training data for modelling the semantics of the relative pronoun. | PT266 |
Fall 2015 | |||
Nov. 5th | Xiaodan Zhu | Several Neural Network Models for Semantic Composition Abstract Modeling semantic compositionality is a core problem in NLP. In this talk, I will describe several neural-net models we developed recently, including a framework that considers both compositional and non-compositional properties in semantic composition. I will discuss two specific networks that extend Long Short-Term Memory (LSTM) to tree and DAG (directed acyclic graph) structures. Bio (http://www.xiaodanzhu.com/about.html) Xiaodan Zhu is a research scientist at the National Research Council Canada in Ottawa, Canada. His research interests include natural language processing, spoken document understanding, and machine learning. Xiaodan received his Ph.D. from the Department of Computer Science of the University of Toronto in 2010 and his Master of Engineering from the Department of Computer Science and Technology of Tsinghua University, Beijing in 2000. | PT266 |
Oct. 15th | Frank Rudzicz, Stefania Raimondo, and Jorge Gomez Garcia | Practice talks, etc. Abstract N/A | PT266 |
Sep. 23rd | G Wu and his colleagues (Maluuba, Inc) | Advanced Conversational Systems with Deep Learning Abstract Conversational systems provide a natural means of human interaction. A conversational system is usually composed of spoken language understanding, state tracking, dialogue management and natural language generation subsystems. Current methods are usually limited by the small number of domains they can support and by inflexible dialogue managers. In addition, it is difficult to make use of user feedback to improve the system. In this talk, we will discuss the practice of applying deep learning algorithms to solve some of these problems at Maluuba and also outline the main remaining challenges. | PT266 |
Winter 2015 | |||
Feb. 11th | Shalom Lappin | Predicting Grammaticality Judgements with Enriched Language Models
(Joint work with Jey Han Lau and Alexander Clark) Abstract I present recent experimental work on unsupervised language models trained on large corpora. We apply scoring functions to the probability distributions that the models generate for a corpus of test sentences. The functions discount the role of sentence length and word frequency, while highlighting other properties, in determining a grammaticality score for a sentence. The test sentences are annotated by Amazon Mechanical Turk crowd sourcing. Some of the models and scoring functions produce encouraging Pearson correlations with the mean human judgements. I also describe current work on other corpus domains, cross domain training and testing, and grammaticality prediction in other languages. Our results provide experimental support for the view that syntactic knowledge is represented as a probabilistic system, rather than as a classical formal grammar. | PT266 |
Feb. 18th | Hamidreza Chinaei | Personalized Question Answering through User Topic Models
(Joint work with Luc Lamontagne, Francois Laviolette, and Richard Khoury) Abstract In this talk, I introduce the framework that we used for building our personalized Question Answering system. In particular, I describe its personalization functionality, for which we have proposed a new probabilistic scoring approach based on the topics of the question and candidate answers. First, a set of topics of interest to the user is learned based on a topic modeling approach such as Latent Dirichlet Allocation. Then, the similarity of questions asked by the user to the candidate answers returned by the search engine is estimated by calculating the probability of the candidate answer given the question. This similarity is used to re-rank the answers returned by the search engine. Our preliminary experiments show that the re-ranking substantially increases the performance of the Question Answering system, as measured by accuracy and MRR (mean reciprocal rank). In this talk, I also introduce other research topics that we are currently pursuing in this domain. | PT266 |
May 4th | William Li | Language Technologies for Understanding Law, Politics, and Public Policy
Note special time 13h30 Abstract Through their activities, governments generate large amounts of heterogeneous text data, including judicial opinions, congressional and parliamentary bills, and laws and regulations. The public availability of these datasets offers opportunities for computational social scientists to develop novel algorithms and systems and to answer research questions in law, political science, and public policy. In this talk, I will focus on two recent projects in this domain: 1) authorship attribution of unsigned U.S. Supreme Court opinions and 2) unsupervised pattern discovery on the United States Code, in which we use analogies from software engineering to analyze and visualize the U.S. legal code like a large software codebase. Finally, I will discuss ongoing work on developing and applying text reuse methods to find and summarize repeated sections of text in government bills and citizen comments, including a probabilistic extension of existing deterministic text reuse methods inspired by topic modeling approaches. Speaker Bio William Li is a PhD student in computer science at MIT. His dissertation research focuses on natural language processing and data science on open government datasets. He is also interested in accessibility and language-based assistive technologies for people with disabilities; he co-taught a semester-long assistive technology design course in Fall 2014 and helps run the MIT Assistive Technology Club.
| PT266 |
Fall 2014 | |||
Sept. 17th | Krish Perumal | Automatic Sequence Tagging of Argument Components in Persuasive Essays
Note special time 11h30 Abstract Argumentative writing is a skill essential to convince the reader of one's opinion on a particular topic. Argumentative writing support systems aim to improve this skill by providing tailored feedback to writers of argumentation. Automatically identifying a writer's arguments (i.e. argument mining) is the first step towards the success of these systems. Argument mining is a growing field that pertains to the identification of argument components (claims and premises) and their structure in text. Existing corpora and approaches to tackle this problem are limited either in the application domain(s) or the level of argumentation granularity during mining. This talk describes an attempt to apply a supervised sequence tagging approach to classify argument components at the token level. This work was completed by the speaker during his research at TU Darmstadt (Germany) under the supervision of Prof. Iryna Gurevych and Christian Stab. | PT266 |
Oct. 1st | Andre Cunha | Coh-Metrix-Dementia: Automatic Analysis of Language Impairment in Dementia using Natural Language Processing Tools Abstract Dementia is a high-cost social problem, whose management will be a challenge in the coming decades, according to the 2012 World Health Organization report. Traditional dementia diagnosis is based on the analysis of linguistic and cognitive aspects of the patient, many of these analyses being grounded in fluency, naming, and repetition exams. Recent studies, however, have been revealing the prominence of discourse analysis as a more powerful and ecologically valid tool for assessing language performance and impairment. Unfortunately, quantitative manual analysis of speech samples is extremely effort-demanding, which hinders its broad adoption in clinical practice. In this talk, I'll present my current MSc research, which focuses on the creation of a computational environment called Coh-Metrix-Dementia. It aims at helping health professionals to employ lexical, syntactic, semantic, and discourse analysis in dementia diagnosis, by (1) using NLP tools to automatically extract relevant information; and by (2) using Machine Learning classifiers to identify the class of a subject, based on data from his transcribed speech. | PT266 |
Oct. 15th | Muyu Zhang | Triple based Background Knowledge Ranking for Document Enrichment Abstract Document enrichment is the task of retrieving additional knowledge from external resources beyond what is available in the source document. This task is essential because text is generally replete with gaps and ellipses, since authors assume a certain amount of background knowledge. The recovery of these gaps is intuitively useful for better understanding of the document. In this talk, I will introduce my work on this topic, which was published at COLING 2014. We propose a document enrichment framework which automatically extracts “argument1; predicate; argument2” triples from any text corpus as background knowledge, so as to ensure compatibility with any resource (e.g. news text, ontology, and on-line encyclopedia) and to improve enrichment accuracy. We first incorporate the source document and background knowledge together into a triple-based document-level graph and then propose a global iterative ranking model to propagate relevance scores and select the most relevant knowledge triples. | PT266 |
Oct. 29th | Erick Galani Maziero | Rhetorical analysis based on large amounts of data Abstract A text possesses an elaborated structure that relates all of its content, giving it coherence. Several methodologies have been employed in automatic discourse analysis, among them approaches based on lexical patterns and supervised machine learning. These approaches rely on annotated data, which is costly to obtain. The use of unlabelled data, which is cheap and abundant, is possible in semi-supervised learning, but many challenges arise with this approach. In this talk I am going to present an overview of my PhD research and what I am developing here at UofT. Basically, my PhD is about the use of never-ending semi-supervised learning (with large amounts of data) for discourse analysis, according to Rhetorical Structure Theory (RST). The talk will cover the corpora, the feature set, the proposed architecture, and some challenges of the proposed learning approach. Also, I am going to speak about the biggest NLP group in Brazil, which is called NILC. | PT266 |
Nov. 12th | Youness Aliyari | Mean shift type algorithms and their applications
Abstract Mean shift (MS) and subspace constrained mean shift (SCMS) algorithms are non-parametric, iterative methods to find a representation of a high dimensional data set on a principal curve or surface embedded in a high dimensional space. The representation of high dimensional data on a principal curve or surface, the class of mean shift type algorithms and their properties, and applications of these algorithms are the main focus of this talk. I will give a brief review of principal curves and different algorithms to estimate them. Then, I will review the SCMS algorithm and its theoretical properties, as a recently proposed technique to find principal curves/surfaces. Finally, I will present new potential applications of the MS and SCMS algorithms. These applications involve finding straight lines in digital images; pre-processing data before applying locally linear embedding (LLE) and ISOMAP for dimensionality reduction; noisy source vector quantization where the clean data need to be estimated before the quantization step; improving the performance of kernel regression in certain situations; and skeletonization of digitally stored handwritten characters. | PT266 |
Summer 2014 | |||
May 29th | Hal Daume III | The Many Flavors of Language: Understanding and Adapting Statistical Models
Note special time 11h00 Abstract Language use can vary along many axes, including genre, topic, register and communication medium. Rounded to two decimal points, of all text produced today, 0.00% of it is newswire. Yet most of our statistical models are built based on labeled data drawn from news and related media. These systems fall apart when applied to other types of language, often falling short of the performance of oft-maligned "rule-based systems." If we want statistical systems that we can use on the diverse types of language we see today (social media, scientific texts, speech, etc.), we essentially have two choices: annotate new types of data for all relevant tasks or develop better learning technology. We take the second approach because it scales better to the large variety of types of language and the large number of interesting tasks. I'll begin this exploration into language flavors by asking the question: when statistical models are applied to new domains, what goes wrong? Despite almost a decade of research in domain adaptation, very little effort has gone into answering this question. My goal is to convince you that by taking this analysis problem seriously, we can develop much better hypotheses about how to build better systems. Once we understand the problem, I'll discuss my work that addresses the various aspects of the adaptation problem with applications ranging from simple text categorization through structured prediction and all the way to machine translation. (Along the way I'll also highlight applications of these technologies to other domains like vision and robotics.) This is joint work with a large number of students and collaborators: Arvind Agarwal, Marine Carpuat, Larry Davis, Shobeir Fakhraei, Katharine Henry, Ann Irvine, David Jacobs, Jagadeesh Jagarlamudi, Abhishek Kumar, Daniel Marcu, John Morgan, Dragos Munteanu, Jeff Phillips, Chris Quirk, Rachel Rudinger, Avishek Saha, Abhishek Sharma, Suresh Venkatasubramanian. | PT290C |
Jun. 20 | Pushpak Bhattacharyya (IIT Bombay) | Multilingual Projection Note special time 14h00 Abstract Languages of the world, though different, share structures and vocabulary. NLP depends crucially on annotation which, however, is costly. In this presentation we show ways of using multilingual computation and resources. Our first application is word sense disambiguation, which is a particularly resource-intensive computation. In a completely unsupervised setting, we show how two languages can help each other's WSD, given only linked wordnets of the two languages. The key idea is to use expectation maximization to estimate sense distribution parameters P(S|W) for either language, where 'S' is the sense given word 'W'. From WSD we then move to IR. Lexical resource based query expansion was long replaced by pseudo relevance feedback (PRF), where expansion terms are picked from the top K retrieved documents. We propose to collect feedback terms from not one language but multiple languages. This framework of multilingual pseudo relevance feedback (MultiPRF) beats PRF by a significant margin for many European languages. We mix pseudo relevance feedback of the 'own' language with that of another 'assisting' language to achieve this superior performance. The familial proximity of the assisting language to the query language is an interesting question in its own right. The work reported was done with many PhD, Masters and B.Tech students (Mitesh, Manoj, Karthik, Salil, Saurabh, Anup, Sapan and Piyush) and has been reported at ACL, SIGIR, IJCNLP, EMNLP, LREC and so on. About the speaker Dr. Pushpak Bhattacharyya is a Professor of Computer Science and Engineering at the Indian Institute of Technology Bombay (IITB) where he heads the Center for Natural Language Processing. Dr. Bhattacharyya was educated at IIT Kharagpur (B.Tech), IIT Kanpur (M.Tech) and IIT Bombay (PhD). During his PhD, he was a visiting scholar at MIT, Cambridge, USA. Subsequently, he has been a visiting professor at Stanford University (2004) and the University of Grenoble (2005, 2009 and 2011) and a distinguished lecturer at the University of Houston, USA (2012). He has published extensively in top-quality conferences and journals (about 200 publications). He has advised 12 PhDs in NLP and ML, and is currently supervising 10 PhD students. He has also advised close to 125 masters students and over 40 bachelor's degree students for their research work. The research grants he has received from international and national agencies (government and industry included) have been substantial, with 15 completed and 8 ongoing projects in various areas of machine translation, search, sentiment analysis and text entailment. Prof. Bhattacharyya was the organizing chair of COLING 2012 at IIT Bombay. He has also been Associate Editor of the ACM Transactions on Asian Language Information Processing (TALIP, 2010 till date). He has been a PC member and area chair for ACL, COLING, IJCNLP, EMNLP, GWC, ICON and so on. Prof. Bhattacharyya has been the recipient of a number of prestigious awards and honors: invited expert speaker on multilingual computation at the prestigious Dagstuhl Seminar, Germany (2012), Yahoo Faculty Award (2011), Manthan Award (2009; given by Ministry of IT, India and Digital India Foundation), IIT Bombay's P. K. Patwardhan Award for Technology Development (2008), IBM Faculty Award (2007), Microsoft Distinguished Research Grant in a focused Area (2007) and United Nations Research Grant (1996). Personal home page URL: http://www.cse.iitb.ac.in/~pb/ | PT266 |
Jul. 17 | Brian McMahan | Towards Real-World
Conversational Competence: A Bayesian Approach to the Meanings of
Color Descriptions Note special time 11h00 Abstract Mapping color descriptions to the physical world is challenging for two reasons: (1) properties of the physical world are difficult to model and (2) language use is the result of pragmatic processes such as context sensitivity, task relevance, and speaker goals. Most models of grounded language learning and grounded language use address the first challenge. We propose a model of color vocabulary, the Lexicon of Uncertain Color Standards (LUX), that addresses the second challenge. LUX is derived using machine learning methods from a large corpus of free text descriptions of color patches. LUX supports future efforts in grounded language understanding and generation by linking 829 English color descriptions probabilistically to context-sensitive regions in HSV color space. We learn LUX with an innovative model of grounded language use. Human semantic representations are vague; they come with uncertainty about the boundaries that delimit label categories in context. Speakers choose words using these representations probabilistically; they provisionally commit to the label being true and simultaneously work to meet the background expectation that they are using language in an ordinary way. Statistical evaluation of our model and two competing models documents the accuracy of LUX and the usefulness of our modeling techniques to ground representations of linguistic meaning in the perceptual domain while respecting context and speaker choice. This work was done in collaboration with Dr. Matthew Stone at Rutgers University. | PT266 |
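To make the notion of context-sensitive, probabilistic colour regions more concrete, here is a minimal illustration with invented parameters; it is not the LUX model itself, only a sketch of how a label's applicability can fall off smoothly at uncertain boundaries in HSV space (hue wraparound is ignored here).

    import math

    def soft_region(x, lo, hi, softness):
        """P(label applies | value x) for a 1-D region [lo, hi] with fuzzy edges."""
        left  = 1.0 / (1.0 + math.exp(-(x - lo) / softness))
        right = 1.0 / (1.0 + math.exp(-(hi - x) / softness))
        return left * right

    def p_label(hsv, hue_range, sat_range, val_range, softness=5.0):
        h, s, v = hsv
        return (soft_region(h, *hue_range, softness) *
                soft_region(s * 100, *sat_range, softness) *
                soft_region(v * 100, *val_range, softness))

    # invented parameters for a "greenish" label (hue in degrees, sat/val in %)
    greenish = dict(hue_range=(75, 165), sat_range=(20, 100), val_range=(20, 100))

    for hsv in [(120, 0.8, 0.8), (95, 0.5, 0.6), (40, 0.8, 0.8)]:
        print(hsv, round(p_label(hsv, **greenish), 3))  # applicability fades near the edges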
Aug. 1st | Carlos Ramisch | Multiword expressions?
Who cares...
Abstract Multiword expressions (MWEs) are recurrent, conventional word combinations like "big deal", "give up" and "once in a blue moon". These constructions pose difficulties for natural language processing applications like parsing, machine translation and information extraction. My talk is about computational methods for dealing with MWEs in texts. First, I will present some basic definitions and examples, in order to illustrate the challenging and pervasive nature of these constructions, and the importance of dealing with them in NLP tasks. Second, I will present some experimental results on the evaluation of the impact of English phrasal verbs (a specific type of MWE) on the quality of machine translation (Ramisch et al., 2013). This research is intended as a first step towards investigating the optimal way to integrate MWE treatment into machine translation. Short Bio Carlos Ramisch is an assistant professor at Aix-Marseille University and Laboratoire d'Informatique Fondamentale de Marseille (France). He holds a double PhD in Computer Science from the University of Grenoble (France) and from the Federal University of Rio Grande do Sul (Brazil). His research interests include multiword expression acquisition, representation and applications, lexical resources, machine translation, corpus-based statistical methods and machine learning. Carlos was co-organiser of the 2010, 2011 and 2013 editions of the MWE workshop; area chair for MWEs in *SEM 2012; and guest editor of the 2013 special issue on MWEs of the ACM TSLP journal. Carlos also develops and maintains the mwetoolkit framework for automatic MWE acquisition (http://mwetoolkit.sf.net). Additionally, he maintains the website and mailing list of the MWE SIGLEX Section (http://multiword.sf.net). Homepage: http://pageperso.lif.univ-mrs.fr/~carlos.ramisch | BA5256 |
Aug. 8th | Nathan Schneider (Carnegie Mellon University) | Bridging the Gap:
Integrated Development of Linguistic Resources and Analyzers for
NLP Note special time 11h00 Abstract When building datasets and analyzers for NLP, the path from linguistic description to computational implementation need not be disjointed: the goals and methodologies of each can inform one another. This talk presents forays into NLP for new text genres, following a trajectory that tightly integrates linguistic data preparation and computational modeling methodologies. I will discuss analyzers for syntax and semantics built with a process encompassing 1) representation, 2) annotation, and 3) automation. First, I will describe a new framework for broad-coverage lexical semantic analysis, with special attention to multiword expressions (such as "high school" and "go over") in the web reviews domain (Schneider et al., 2014 in LREC and TACL). To facilitate robust and efficient modeling at the token level, the lexical semantic representation has been designed to be compatible with shallow discriminative sequence models, with algorithmic enhancements to accommodate multiword expressions containing gaps. Second, I will touch on efforts to build syntactic datasets, taggers, and dependency parsers for Twitter messages (Gimpel et al., ACL 2011; Owoputi et al., NAACL 2013; Schneider et al., LAW 2013; Kong et al., EMNLP 2014). Finally, if time permits, I will mention parallel efforts to model relational and functionalist semantics. Bio Nathan Schneider recently defended his Ph.D. at Carnegie Mellon University's Language Technologies Institute, advised by Noah Smith. Nathan’s research focuses on linguistic analysis problems in NLP involving syntax and semantics, especially in web genres. His dissertation develops a framework for broad-coverage, token-level computational lexical semantics. As an undergraduate, he studied Computer Science and Linguistics (with an emphasis on cognitive approaches to Semitic morphology) at the University of California, Berkeley. In the fall he will join the University of Edinburgh for a postdoc under the supervision of Mark Steedman. Website: http://nathan.cl | PT266 |
Winter 2014 | |||
Apr. 9 | Hong Yu (Univ. of Massachusetts, Amherst) | Biomedical Natural Language Figure Processing Note special time 13h00--14h00 Abstract To date, most work on biomedical language processing has addressed entity recognition (e.g., identifying gene names in text), information extraction (finding information about very constrained types of relations between entities, e.g., protein-protein interactions), and information retrieval (e.g., retrieving documents from large text collections), while largely ignoring the important knowledge represented in figures. The literature contains an estimated 100 million figures. An intelligent figure search engine will not only assist biocuration and allow individual biomedical researchers to access figures from full-text biomedical articles more efficiently, but is also an important step towards the automatic validation of genome-wide high-throughput predictions. In this talk, I will describe innovative biomedical natural language figure processing (BioNLfP) approaches developed in my lab. BioNLfP semantically associates text with figures, ranks figures based on biological importance, summarizes the content of figures, and evaluates new user interfaces. BioNLfP is funded by both the National Institutes of Health and Elsevier, the latter of which allows BioFigureSearch--the implementation of BioNLfP--to access over 2 million full-text biomedical articles. | PT378 |
Fall 2013 | |||
Sep. 18 | Katie Fraser | Automatic text and speech
processing for the detection of dementia (Practice research
proposal)
Abstract Dementia is a gradual decline of cognitive abilities, often resulting from neurodegeneration. In some cases, such as primary progressive aphasia (PPA), language abilities are specifically impaired. In other cases, such as Alzheimer’s disease, language disabilities may occur together with other cognitive impairments. In each of these instances, a narrative language sample can provide a wealth of data regarding an individual’s linguistic capabilities. Traditionally, analysis of speech samples was conducted by hand, but this is painstaking and time-consuming work. We showed that many lexical and syntactic features can be automatically extracted from speech transcripts and used in machine learning classifiers to distinguish between PPA and controls, and between subtypes of PPA. The features which were significant between groups were found to be well-supported by previous studies of manually analyzed transcripts. One surprising finding was that syntactic features did not distinguish between the fluent and nonfluent subtypes of PPA. In a follow-up study, we extracted acoustic features directly from the audio files of the spoken narratives. Classifiers trained on acoustic features performed worse than those trained on text features in general, while classifiers trained on a combination of text and acoustic features performed better than using text or acoustic features alone. We also examined the possibility of using automatic speech recognition (ASR) to fully automate the analysis process. Although the word error rate was high, we were still able to achieve relatively good classification accuracies. However, the introduction of an ASR component into our analysis pipeline opens up a number of research questions. In this proposal, I discuss some of these questions and outline a potential program for investigating their solutions. In particular, I discuss some challenges to measuring syntactic complexity of speech (including the problems of sentence boundary insertion and dysfluency removal), as well as how we might analyze the ASR errors and improve the recognition accuracy. | PT266 |
Sep. 18 | Abdel-Rahman Mohamed | Deep Neural Network acoustic models for ASR Abstract This thesis describes new acoustic models based on Deep Neural Networks (DNN) that have begun to replace GMMs. For ASR, the deep structure of a DNN as well as its distributed representations allow for better generalization of learned features to new situations, even when only small amounts of training data are available. Different input feature representations are analyzed to determine which one is more suitable for DNN acoustic models. Mel-frequency cepstral coefficients (MFCC) are inferior to log mel filter bank coefficients (FBANK), which help DNN models marginalize out speaker-specific information while focusing on discriminant phonetic features. Another deep acoustic model based on Convolutional Neural Networks (CNN) is also proposed. Rather than using fully connected hidden layers as in a DNN, a CNN uses a pair of convolutional and pooling layers as building blocks. The convolution operation scans the frequency axis using a learned local spectro-temporal filter, while in the pooling layer a maximum operation is applied to the learned features, utilizing the smoothness of the input FBANK features to eliminate speaker variations expressed as shifts along the frequency axis, in a way similar to vocal tract length normalization (VTLN) techniques. We show that the proposed DNN and CNN acoustic models achieve significant improvements over GMMs on various small and large vocabulary tasks. | BA5256 |
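A minimal numpy sketch of the convolution-plus-pooling building block described above, with invented dimensions (it is not the thesis code): the filter scans only the frequency axis of stacked FBANK frames, and max-pooling absorbs small shifts along frequency.

    import numpy as np

    rng = np.random.default_rng(0)
    fbank = rng.normal(size=(11, 40))        # 11 stacked frames x 40 filter banks
    filters = rng.normal(size=(8, 11, 5))    # 8 filters spanning all frames, width 5 in frequency

    def conv_freq(x, w):
        """Convolve along the frequency axis only (valid mode), then rectify."""
        n_filt, _, width = w.shape
        out = np.empty((n_filt, x.shape[1] - width + 1))
        for f in range(n_filt):
            for j in range(out.shape[1]):
                out[f, j] = np.sum(x[:, j:j + width] * w[f])
        return np.maximum(out, 0.0)

    def max_pool(x, size=3):
        """Max-pool along frequency; shifts of up to `size` bands look alike."""
        n = x.shape[1] // size
        return x[:, :n * size].reshape(x.shape[0], n, size).max(axis=2)

    hidden = max_pool(conv_freq(fbank, filters))
    print(hidden.shape)                      # (8, 12): input to the fully connected layers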
Oct. 2 | Varada Kolhatkar | Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns Abstract Interpreting anaphoric shell nouns (ASNs) such as "this issue" and "this fact" is essential to understanding virtually any substantial natural language text. One obstacle in developing methods for automatically interpreting ASNs is the lack of annotated data. We tackle this challenge by exploiting cataphoric shell nouns (CSNs) whose construction makes them particularly easy to interpret (e.g., "the fact that X"). We propose an approach that uses automatically extracted antecedents of CSNs as training data to interpret ASNs. We achieve precisions in the range of 0.35 (baseline = 0.21) to 0.72 (baseline = 0.44), depending upon the shell noun. | BA5256 |
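A rough, hypothetical sketch of the data-harvesting idea described above: cataphoric shell noun constructions like "the fact that X" make their antecedents explicit, so simple patterns can collect (shell noun, antecedent) training pairs. The pattern set and clause-boundary handling here are deliberately crude and are not the paper's pipeline.

    import re

    SHELL_NOUNS = ["fact", "issue", "idea", "decision", "possibility"]
    PATTERN = re.compile(
        r"\bthe (%s) that ([^.;!?]+)" % "|".join(SHELL_NOUNS), re.IGNORECASE)

    def harvest(text):
        """Return (shell noun, antecedent clause) pairs found in the text.
        Clause boundaries are approximated by sentence-final punctuation."""
        return [(m.group(1).lower(), m.group(2).strip())
                for m in PATTERN.finditer(text)]

    sample = ("The fact that the vote was postponed surprised nobody. "
              "They discussed the possibility that funding could be cut.")
    for noun, antecedent in harvest(sample):
        print(noun, "->", antecedent)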
Oct. 9 | Julian Brooke | Hybrid Models for Lexical Acquisition of Correlated Styles Abstract Automated lexicon acquisition from corpora represents one way that large datasets can be leveraged to provide resources for a variety of NLP tasks. Our work applies techniques popularized in sentiment lexicon acquisition and topic modeling to the broader task of creating a stylistic lexicon. A novel aspect of our approach is a focus on multiple related styles, first extracting initial independent estimates of style based on co-occurrence with seeds in a large corpus, and then refining those estimates based on the relationship between styles. We compare various promising implementation options, including vector space, Bayesian, and graph-based representations, and conclude that a hybrid approach is indeed warranted. | BA5256 |
Oct. 16 | Graeme Hirst | Finding positions in parliamentary text Abstract A goal of research on the computer processing of language is to interpret opinionated texts, such as parliamentary debates, to determine not just the position that the speaker or writer takes on a particular issue but also their reasons for doing so, the arguments they adduce, and the ideological framework within which their position is formed. I will discuss the long-term goals of this research, present some of our current results and their limitations, and describe how new methods in language processing may lead to significant improvements. The talk includes joint work with Yaroslav Riabinin, Jory Graham, Magali Boizot-Roche, Colin Morris, and Christopher Cochrane. | BA5256 |
Oct. 30 | Nona Naderi | Automated Extraction of Protein Mutation Impacts from the Biomedical
Literature Note special time 11h00 Abstract Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Accessing mutational information and its impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually reading through the rich and fast-growing repository of biomedical literature is expensive and time-consuming. A number of manually curated databases, such as BRENDA (http://www.brenda-enzymes.org), try to index and provide this information; yet the provided data seems to be incomplete. Thus, there is a growing need for automated approaches to extract this information. In this work, we present a system to automatically extract and summarize impact information from protein mutations. Our system's extraction module is split into subtasks: organism analysis, mutation detection, protein property extraction and impact analysis. Organisms, as the sources of proteins, must be extracted to help disambiguate genes and proteins. Thus, our system extracts organisms through a hybrid rule-based/machine learning approach and grounds them to NCBI. We detect mutation series to correctly ground our detected impacts. Our system also extracts the affected protein properties as well as the magnitude of the effects. The output of our system populates an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on both external and internal corpora and databases. The results show the reliability of the approaches. The Open Mutation Miner (OMM) system, including its components and resources, is published under the GNU Affero General Public License v3 (AGPL3) at http://www.semanticsoftware.info/open-mutation-miner. | BA5256 |
Nov. 15 | Haiying Li | A New Measure of Text Formality: An Analysis of Discourse of Mao Zedong Abstract Formality has long been of interest in the study of discourse, with periodic discussions of the best measure of formality and the relationship between formality and text categories. In this research, we explored what features predict formality as humans perceive the construct. The corpus consisted of 1158 discourse samples published in the Collected/Selected Works of Mao Zedong, which were classified into the following categories: conversations, speeches, letters, comments, published articles, telegrams and official documents. Two formality models were constructed to measure formality: (1) 5 out of 7 factors extracted from 73 linguistic, discourse and psychological features through a principal components analysis and (2) 5 out of 8 word classes. Comparisons were made between component measures, word class measures, and other measures when predicting human formality ratings. Model-based formality scores had much higher correlations with human-perceived scores than traditional metrics of formality. The variations among text categories could be explained by the component scores of formality more than other metrics of formality. Keywords: formality, measure, text categories | BA5256 |
Nov. 27 | Karen Livescu | Learning from Speech Production for Improved Recognition Abstract Ideas from speech production research have motivated several lines of work in the speech recognition research community. Unfortunately, our understanding of speech articulation is still quite limited, and articulatory measurement data is scarce. How can we take advantage of the potential usefulness of speech production, without relying too much on noisy information? This talk will cover recent work exploring this area, with the theme of using machine learning ideas to automatically infer information where our knowledge and data are lacking. The talk will describe new techniques for deriving improved acoustic features using articulatory data in a multi-view learning setting. The techniques here are based on canonical correlation analysis and its nonlinear extensions, including our recently introduced extension using deep neural networks. Time permitting, the talk will also cover recent work using no articulatory data at all, but treating articulatory information as hidden variables in models for lexical access and spoken term detection. | BA5256 |
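As a small illustration of the multi-view setting mentioned above, the sketch below runs scikit-learn's linear CCA on synthetic paired "acoustic" and "articulatory" views that share a latent source; the data and dimensions are invented, and linear CCA stands in for the nonlinear and deep extensions discussed in the talk.

    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)
    latent = rng.normal(size=(500, 4))            # shared phonetic "content"
    acoustic     = latent @ rng.normal(size=(4, 39)) + 0.5 * rng.normal(size=(500, 39))
    articulatory = latent @ rng.normal(size=(4, 12)) + 0.5 * rng.normal(size=(500, 12))

    cca = CCA(n_components=4)
    cca.fit(acoustic, articulatory)               # learn projections from paired views
    acoustic_proj, artic_proj = cca.transform(acoustic, articulatory)

    # the projected acoustic features keep what correlates with articulation
    corrs = [np.corrcoef(acoustic_proj[:, k], artic_proj[:, k])[0, 1] for k in range(4)]
    print(np.round(corrs, 2))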
Dec. 18 | Pascal Poupart | Exact Bayesian Learning for Latent Dirichlet Allocation by Moment Matching Abstract Latent Dirichlet Allocation (LDA) is one of the most important models for the unsupervised discovery of latent components such as topics, entities and relations based on unlabeled corpora. The estimation of LDA parameters is challenging since maximum likelihood training leads to a non-convex optimization problem and Bayesian learning yields a posterior that consists of a mixture of exponentially many Dirichlet components. As a result, practitioners typically resort to approximations based on Gibbs sampling, variational Bayes or expectation propagation. While these approximations often work well in practice, it is not clear how to detect convergence in Gibbs sampling and local optima may trap both variational Bayes and expectation propagation. Hence, there is a need for more robust techniques. In this talk, I will describe a moment matching technique that performs exact Bayesian learning with respect to a special prior. The approach starts with a partially defined prior where only the first, second and third order moments are specified. The remaining moments of the prior are selected gradually on a need basis as the data of the corpus is processed. This lazy approach allows us to select moments that ensure tractability of the computation. As a result, the algorithm scales linearly with the amount of data and performs exact Bayesian learning with respect to this incrementally specified prior. The approach will be demonstrated on a topic modeling task with social media data. This is joint work with Farheen Omar and Han Zhao Biography Pascal Poupart is an Associate Professor in the David R. Cheriton School of Computer Science at the University of Waterloo, Waterloo (Canada). He received the B.Sc. in Mathematics and Computer Science at McGill University, Montreal (Canada) in 1998, the M.Sc. in Computer Science at the University of British Columbia, Vancouver (Canada) in 2000 and the Ph.D. in Computer Science at the University of Toronto, Toronto (Canada) in 2005. His research focuses on the development of algorithms for reasoning under uncertainty and machine learning with application to Assistive Technologies and Natural Language Processing. He is most well known for his contributions to the development of approximate scalable algorithms for partially observable Markov decision processes (POMDPs) and their applications in real-world problems, including automated prompting for people with dementia for the task of handwashing and spoken dialog management. Other notable projects that his research team are currently working on include chatbots for automated personalized conversations, a smart walker to assist older people and a prompting system to encourage activity participation in retirement living. Pascal Poupart received the Early Researcher Award, a competitive honor for top Ontario researchers, awarded by the Ontario Ministry of Research and Innovation in 2008. He was also a co-recipient of the Best Paper Award Runner Up at the 2008 Conference on Uncertainty in Artificial Intelligence (UAI) and the IAPR Best Paper Award at the 2007 International Conference on Computer Vision Systems (ICVS). He is the editor for collected works at AI Access (2013 - present). He also served on the editorial board of the Journal of Artificial Intelligence Research (JAIR) (2008 - 2011) and the Journal of Machine Learning Research (JMLR) (2009 - present). 
His research collaborators include Google, Intel, Kik Interactive, In the Chat, the Alzheimer Association, the UW-Schlegel Research Institute for Aging, Sunnybrook Health Sciences Centre and the Toronto Rehabilitation Institute. | BA5256 |
Summer 2013 | |||
Jul. 10 | Paul Cook |
User-level geolocation prediction in social media
Abstract Geolocation prediction is vital to geospatial applications like localised search and local event detection. Text-based social media geolocation models are often based on full text data, including common words with little geospatial dimension (e.g., "today"), potentially hampering prediction and leading to slower and more memory-intensive models. In this talk, we first present methods for finding location indicative words (LIWs) via feature selection. Our results show that an information gain ratio-based approach surpasses other methods at LIW selection, and outperforms state-of-the-art geolocation prediction methods. The identified LIWs also reveal regional language differences, which could potentially be useful for lexicographers. We further formulate notions of prediction confidence and demonstrate that performance is even higher in cases where our model is more confident, striking a trade-off between accuracy and coverage. We then consider the incorporation of other sources of information, including user-declared meta-data, into our model using a stacking approach. We demonstrate that the stacking method substantially improves performance, achieving 49% accuracy on a benchmark dataset. We further evaluate our method on a recent crawl of Twitter data to investigate the impact of temporal factors on model generalisation. Our results suggest that user-declared location metadata is more sensitive to temporal change than the text of Twitter messages. Finally, we present a web-based demo of our geolocation system. | |
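The feature-selection idea above can be illustrated with a toy calculation; the cities, messages and counts below are invented, and this is only a sketch of information gain ratio, not the paper's full pipeline. Words such as "today" that occur everywhere score near zero, while city-specific words score highly.

    import math
    from collections import Counter

    def entropy(counts):
        n = sum(counts)
        return -sum(c / n * math.log2(c / n) for c in counts if c)

    # (city, message tokens) pairs -- hypothetical data
    data = [
        ("toronto", ["go", "leafs", "today"]), ("toronto", ["streetcar", "delayed", "today"]),
        ("melbourne", ["tram", "delayed", "today"]), ("melbourne", ["footy", "tonight"]),
        ("toronto", ["leafs", "tonight"]), ("melbourne", ["tram", "today"]),
    ]
    h_city = entropy(list(Counter(c for c, _ in data).values()))

    def gain_ratio(word):
        with_w    = [c for c, toks in data if word in toks]
        without_w = [c for c, toks in data if word not in toks]
        p = len(with_w) / len(data)
        cond = p * entropy(list(Counter(with_w).values())) + \
               (1 - p) * entropy(list(Counter(without_w).values()))
        split = entropy([len(with_w), len(without_w)])   # normalise by the word's own entropy
        return (h_city - cond) / split if split else 0.0

    vocab = {w for _, toks in data for w in toks}
    for w in sorted(vocab, key=gain_ratio, reverse=True):
        print(f"{w:10s} {gain_ratio(w):.2f}")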
Jun. 26 | Paul Cook |
Automatic identification of novel word-senses
Abstract Automatic lexical acquisition has been an active area of research in computational linguistics for over 20 years, but the automatic identification of lexical semantic change has only recently received attention. In this talk we first present a non-parametric Bayesian word-sense induction (WSI) method and its evaluation on several SemEval WSI tasks. We then apply this method to identify novel word-senses --- senses present in one corpus, but not another. One impediment to research on lexical semantic change has been a lack of appropriate evaluation resources. In this talk we further present the largest corpus-based dataset of diachronic sense differences to date. In experiments on two different corpus pairs, we show that our method is able to simultaneously identify: (a) types having taken on a novel sense over time, and (b) the token instances of such novel senses. We further show that the performance of this method can be improved through the incorporation of social knowledge about the likely topics of new word-senses. Finally, we present a lexicographer's assessment of our method in the context of updating a dictionary. | |
Winter 2013 | |||
Apr. 24 | Fraser Shein |
iWordQ: Pragmatics of word prediction to assist struggling and emerging readers and writers
Abstract Dr. Shein will present and discuss the word prediction technology used within Quillsoft's iWordQ iPad App that was designed to support struggling and emerging readers and writers. A particular focus will be on the pragmatic aspects that must be considered in delivering a commercial product to meet real-life needs. The current prediction model is based on bigram/unigram statistics derived from a billion-word blog corpus, supplemented by n-grams of up to length 5 where prediction following function words is highly ambiguous. While relatively simple in concept, practical aspects such as memory management, look-up speed, and accuracy of spelling are very real determiners of usefulness. We also created Canadian, British, and American spelling dictionaries and removed noise and inappropriate words for final use by children. While seemingly simple, this has been the most significant effort. Improvements to the algorithm remain to be made in handling poor spelling, punctuation, and contractions, among other issues. Suggestions for future research will be discussed. | |
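A simplified sketch of the kind of interpolated bigram/unigram prediction described above, on a tiny invented corpus with an arbitrary interpolation weight; the real product is trained on a billion-word blog corpus with curated dictionaries and handles many practical issues this toy ignores.

    from collections import Counter, defaultdict

    corpus = ("i want to go to the park . i want to read a book . "
              "we go to the library .").split()

    unigrams = Counter(corpus)
    bigrams = defaultdict(Counter)
    for prev, cur in zip(corpus, corpus[1:]):
        bigrams[prev][cur] += 1

    def predict(prev_word, prefix="", k=3, lam=0.7):
        """Rank candidate next words by interpolated bigram/unigram probability."""
        total_uni = sum(unigrams.values())
        total_bi = sum(bigrams[prev_word].values()) or 1
        scores = {}
        for w in unigrams:
            if not w.startswith(prefix):
                continue                      # respect the letters typed so far
            p_bi = bigrams[prev_word][w] / total_bi
            p_uni = unigrams[w] / total_uni
            scores[w] = lam * p_bi + (1 - lam) * p_uni
        return sorted(scores, key=scores.get, reverse=True)[:k]

    print(predict("to", prefix="t"))   # on this toy corpus: ['the', 'to']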
Feb. 19 | Hui Jiang |
Why Deep Neural Network (DNN) Works for Acoustic Modelling in Speech Recognition?
Note special time 10h00--11h30 Abstract Recently, deep neural network (DNN) has been combined with hidden Markov model (HMM) as basic acoustic models to replace the traditional Gaussian mixture model (GMM) in automatic speech recognition (ASR). When output nodes of DNN are expanded from a small number of phonemes into a large number of tied-states of triphone HMMs, it has been reported that the so-called context-dependent DNN/HMM hybrid model has achieved unprecedented performance gain in many challenging ASR tasks, including the well-known Switchboard task. At this point, it is interesting to investigate where the unprecedented gain comes from and how DNN has beaten GMMs in acoustic modelling for ASR. In this talk, I will report some experiments that may reveal clues to answering these questions. Our experimental results suggest that DNN does not necessarily yield better modelling capability than the conventional GMMs for standard speech features, but DNN is indeed very powerful in terms of leveraging highly correlated features. Experimental results on several large vocabulary ASR tasks (including Switchboard) have shown that the unprecedented gain of the context-dependent DNN/HMM model can be almost entirely attributed to DNN's input feature vectors that are concatenated from several consecutive speech frames within a relatively long context window. Based on these observations, I will present our recent research work to explore the concatenated features under the traditional GMM/HMM framework, where DNN is only used as a front-end feature extractor to perform dimensionality reduction. Moreover, I will also introduce a new training algorithm, called incoherent training, which attempts to explicitly de-correlate feature vectors in learning of DNN parameters. The proposed incoherent training relies on the idea of directly minimizing coherence of weight matrices of DNN during the normal back-propagation training process. Experimental results on several large-scale ASR tasks have shown that the discriminatively trained GMM/HMMs using feature vectors derived from incoherent training have consistently surpassed the state-of-the-art context-dependent DNN/HMMs in all evaluated cases. | |
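As a pointer to what "coherence of weight matrices" refers to, here is an illustrative numpy sketch of one common definition (the largest absolute cosine similarity between distinct columns); the dimensions are invented and this is not the paper's training recipe, only the quantity that a coherence-minimising penalty would push down.

    import numpy as np

    def coherence(W):
        """Largest absolute cosine similarity between distinct columns of W."""
        Wn = W / np.linalg.norm(W, axis=0, keepdims=True)
        gram = np.abs(Wn.T @ Wn)
        np.fill_diagonal(gram, 0.0)
        return float(gram.max())

    rng = np.random.default_rng(0)
    W = rng.normal(size=(429, 1024))      # e.g. 11 frames x 39 features -> 1024 hidden units
    print(round(coherence(W), 3))         # random weights: columns are nearly decorrelated

    W[:, 1] = 2.0 * W[:, 0]               # make two hidden units redundant (up to scale)
    print(round(coherence(W), 3))         # coherence jumps to 1.0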
Fall 2012 | |||
Sept. 10 | Coco Wang |
Interpreting Intentions of Speech
Note special time 10h00--11h30 Abstract Many scientists (Cohen et al., 1990) have tried to interpret intention through logical inference. However, lacking an effective means of semantic analysis, these attempts were not very satisfying. Vanderveken (1990) tried to construct a logic of illocutionary force, but he could not reveal the semantic implications, because he could not explain how different illocutionary forces compose as a whole. The purpose of this paper is to reveal the mechanism of interpreting intentions of speech. Firstly, we present a grammar system for extracting the semantic structures of intentions. The grammar system includes a taxonomy of intentions of speech, which is based on Speech Act Theory (Searle, 1969; Grice, 1989) and Searle's philosophy about "social reality" (Searle, 1997), and a set of grammar rules. Then, we give a logic of semantic implication to explain how people understand and respond to the implicit meanings of complex intentions, such as an imperative hidden in a query, or a query embedded in indirect speech. | |
Oct. 31 | Alistair Kennedy |
Measuring Semantic Relatedness Across Languages
Abstract Measures of Semantic Relatedness are well established in Natural Language Processing. Their purpose is to determine the degree of relatedness between two words, without specifying the nature of their relationship. One method of accomplishing this is to use a word's distribution to determine its meaning. Distributional measures of semantic relatedness represent a word as a weighted vector of the contexts in which it appears. The relatedness between two words is determined by their vector distance. One limitation of distributional measures is that they are successful only between pairs of words in a single language, as contexts between two languages are not usually comparable. In this presentation I will describe a novel method of measuring semantic relatedness between pairs of words in two different languages, using distributional relatedness. This new cross-language measure uses pairs of known translations to create a mapping between distributional representations in two languages. I evaluated this new measure on two data sets. For the first, I constructed a data set of cross-language word pairs, with similarity scores, from French and English versions of Rubenstein & Goodenough's data set. My cross-language measure was evaluated based on how closely it correlated with human-assigned scores. The second evaluation was to use the cross-language measure to select the correct translation of a word from a set of two candidates. I found that the new cross-language measure outperformed a unilingual baseline on both experiments. | |
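One simple way to realise a translation-pair-based mapping between two distributional spaces is a least-squares linear map, sketched below with random placeholder vectors and a handful of invented words; this is a hypothetical illustration of the general idea, not necessarily the measure developed in the thesis.

    import numpy as np

    rng = np.random.default_rng(0)
    dim_en, dim_fr = 50, 60
    # placeholders: in practice these would be context (co-occurrence) vectors
    en_vecs = {w: rng.normal(size=dim_en) for w in ["dog", "cat", "car", "road"]}
    fr_vecs = {w: rng.normal(size=dim_fr) for w in ["chien", "chat", "voiture", "route"]}

    # known translation pairs; in practice one would use many such pairs
    train_pairs = [("dog", "chien"), ("cat", "chat"), ("car", "voiture")]
    X = np.array([en_vecs[e] for e, _ in train_pairs])      # English side
    Y = np.array([fr_vecs[f] for _, f in train_pairs])      # French side
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)               # map: English -> French space

    def cross_sim(en_word, fr_word):
        u = en_vecs[en_word] @ M
        v = fr_vecs[fr_word]
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # pick whichever of two candidate translations is closer in the mapped space
    print(max(["route", "chat"], key=lambda f: cross_sim("road", f)))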
Nov. 14 | Barend Beekhuizen |
Learning relational meanings from situated caregiver-child
interaction: A computational approach
Abstract The difficulty of learning the relational meanings of words like verbs and prepositions has long been acknowledged (Gentner 1978; Gleitman 1990). This acquisition problem has been explored using human subjects (Hirsh-Pasek & Golinkoff 2006 and papers therein) and computational experiments (Siskind 1996, Alishahi & Stevenson 2008), and substantive progress has been made in understanding the acquisition of relational meaning. However, the nature of the available relational meaning in both approaches is to some extent artificial: in lab settings, the noise and variation are controlled and limited, while computational models often do not take the actual situational context into account (exceptions being Fleischman & Roy 2005 and Frank et al. 2009). In this talk, we discuss the acquisition problem using situational contexts from natural, interactional data and computational modeling techniques. We investigate the sources of the learning difficulty and discuss information that is known to affect the process. We believe this combination of situational data and computational techniques presents an important methodological direction for the (cognitive) linguistic enterprise, as we can approximate the source of the meaning closely. On the basis of natural data (video recordings of caregiver-child dyads playing a game), we first present the magnitude of the problem. In learning the mapping between a linguistic item L and a meaning M that is grounded in a part of a situation, the learner faces three (related) subproblems. First, it may be that in the situation co-occurring with L, the meaning M is absent. Second, it may be that in many situations not co-occurring with L, the meaning M is present. Finally, we often find other aspects of the situation, relating to other, irrelevant meanings, to be systematically present in the situations co-occurring with L. Next, we describe a computational model of cross-situational word learning (Fazly et al. 2010), which has been shown to perform well on natural language data with synthetic meanings. Using the natural, situated data, we find that when the model's only source of information is the set of situations holding at the moment of speech, it learns little about the meanings of either nouns or verbs. However, the child has more sources of information at its disposal, and we discuss the effects of these. Here we consider the child's insight into typical social interactions (in this case, game-related intentions; Tomasello 2001), the emergent distributional knowledge of word classes (Fazly & Alishahi 2010; Mintz 2003) and the selective attention to different aspects of the perceived situation (Alishahi et al. 2012, Nematzadeh et al. 2012). Combining these, we arrive at a usage-based computational learner that uses cues from different domains, in line with the approach suggested by Hollich et al. (2000). Taking a computational modeling approach and using natural linguistic and situational data, we can show the extent to which each cue plays a role in learning different sorts of meaning (referring to objects, their properties, static relations and behavioural actions), thus extending our understanding of the driving factors behind the acquisition of word meanings. | |
Nov. 21 | Aditya Bhargava |
Leveraging supplemental transcriptions and transliterations via re-ranking
Note special time 11h00--12h00 Abstract Grapheme-to-phoneme conversion (G2P) and machine transliteration are important tasks in natural language processing. Supplemental data can often help resolve difficult ambiguities: existing transliterations of the same word can help choose among a G2P system's candidate output transcriptions; similarly, transliterations from other languages can help choose among candidate transliterations in a given language. Transcriptions can be leveraged in this way as well. In this thesis, I investigate the problem of applying supplemental data to improve G2P and machine transliteration results. I present a unified method for leveraging related transliteration or transcription data to improve the performance of a base G2P or machine transliteration system. My approach constructs features with the supplemental data, which are then used in an SVM re-ranker. This re-ranking approach is shown to work across multiple base systems and achieves error reductions ranging from 8% to 43% over state-of-the-art base systems in cases where supplemental data are available. | |
Nov. 28 | Rouzbeh Farahmand |
Flexible Structural Analysis of Near-Meet-Semilattices for Typed Unification-based Grammar Design
Note special time 11h00--12h00 Abstract We present a new method for directly working with typed unification grammars in which type unification is not well-defined. This is often the case, as large-scale HPSG grammars now usually have type systems for which many pairs do not have least upper bounds. Our method yields a unification algorithm that compiles quickly and yet is nearly as fast during parsing as one that requires least upper bounds. The method also provides a natural naming convention for unification results in cases where no user-defined type exists. | |
Dec. 5 | Frank Rudzicz |
Communicating with Machines: An Introduction to SPOClab
Abstract In this talk I introduce SPOClab (Signal Processing and Oral Communication), which bridges Computer Science at the University of Toronto with the Toronto Rehabilitation Institute. The goal of our lab is to produce software that helps to overcome challenges of communication including speech and language disorders. This will be organized into two co-dependent streams of research. First, we will embed control-theoretic models of speech production into augmented ASR systems using various machine-learning techniques. Second, these systems will be deployed in software that can be used in practice; this involves adjacent disciplines such as human-computer interaction and general natural language processing to design and study application interfaces for disabled users. | |
Winter 2012 | |||
Jan. 13 | Julia Hirschberg |
Entrainment in Prosody, Turn-taking, and Social Behaviors
Note special time: 9h30--11h00 Abstract When people speak together, they often adapt aspects of their speaking style based upon the style of their conversational partner. This phenomenon goes by many names, including adaptation, alignment, and entrainment, inter alia. In this talk, I will describe experiments in prosodic entrainment in the Columbia Games Corpus, a large corpus of speech recorded from subjects playing a series of computer games. I will discuss how prosodic entrainment is related to turn-taking behaviors, to several measures of task and dialogue success, and to perceived social behaviors. This is joint work with Stefan Benus, Agustín Gravano, Ani Nenkova, Rivka Levitan, and Laura Willson. | |
Jan. 18 | Heike Zinsmeister |
Towards Gold Corpora for Abstract Anaphora Resolution
Abstract Abstract anaphora refer to anaphoric elements, such as that or this issue, that refer to abstract referents such as facts or events. The antecedents of abstract anaphors are often realised as verbal or clausal categories as in example (1) adapted from Byron (2002), which poses problems for the automatic resolution of the anaphoric relation.
(1) Each Fall, penguins migrate to Fiji. [That]'s why I'm going there next month
The resolution problem can be split into three subtasks: (i) deciding whether an anaphoric element refers to an abstract or a concrete referent, (ii) identifying the antecedent string, (iii) inducing the abstract referent. When creating a gold standard in this domain, it is easy for human annotators to agree on the first task. It is much harder to get reliable data with respect to the other two tasks. I will present a survey of annotation projects and discuss how they approach this challenge. Furthermore, I will outline ongoing work on cross-linguistic annotation of abstract anaphora in a parallel corpus of English and German, which also addresses the question of the reliability of translated texts as a source for feature induction. | |
Feb. 3 | Ciprian Chelba |
Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice
Note special time and location: 10h00--12h00 in BA5256 Abstract The talk presents key issues faced when building language models (LMs) for the google.com query stream, and their use for automatic speech recognition (ASR). Distributed LM tools enable us to handle a huge amount of data, and experiment with LMs that are two orders of magnitude larger than usual. An empirical exploration of the problem led us to rediscover a lesser-known interaction between Kneser-Ney smoothing and entropy pruning, as well as possible non-stationarity of the query stream and a strong dependence on various English locales---USA, Britain and Australia. LM compression techniques allowed us to use one-billion-n-gram LMs in the first pass of an ASR system built on FST technology, and to evaluate empirically whether a two-pass system architecture has any losses relative to a one-pass system. Confidence filtering on logs data provides us with enormous amounts of automatically transcribed speech data of near-human-transcriber quality, and enables us to train distributed LMs discriminatively. About the speaker Ciprian Chelba is a Research Scientist with Google. His research interests are in statistical modeling for natural language and speech. His recent projects include query language modeling for Google voice search, and indexing, ranking and snippeting for search in spoken content. Prior to Google he spent 6 years in Microsoft Research working with the speech technology group. He graduated from The Johns Hopkins University in 2000. His thesis work was on "Structured Language Modeling"---exploiting syntax for improved language modeling. | |
Feb 10 | Wael Khreich |
Adaptive Techniques for Statistical Machine Translation in Industrial Environment
Note special time and location: 10h00--12h00 in BA5256 Abstract This presentation highlights current results on domain adaptation techniques for statistical machine translation (SMT) systems, conducted during my postdoctoral research at NLP Technologies Inc. Since there are continuous requests for translation of new specific domains with limited amounts of parallel sentences, the adaptation of current SMT systems to these domains would reduce translation time and effort. The performance of different domain adaptation techniques such as log-linear models and mixture models has been evaluated in the translation environment of NLP Technologies using legal corpora. Evaluation involved human post-editing effort and time as well as automated scoring techniques (BLEU scores). Results have shown that the domain adaptation techniques can yield a significant increase in BLEU score (up to four points) and a reduction in post-editing time of about one second per word. Future work involves the dynamic integration of post-editors' feedback into the SMT system. Biography Postdoctoral Industrial R&D Fellow at NLP Technologies Inc., Montreal, Canada. PhD in Engineering from École de technologie supérieure, Montreal, Canada. Conducting research on adaptive methods for statistical machine translation. Other research interests include on-line and incremental learning, multiple classifier systems, decision fusion and novelty detection. | |
May 8 | Manfred Stede |
CANCELLED The structure of argument: Manual (and automatic) text annotation
Abstract While certain aspects of text coherence apply to almost any text, others are specific to the particular _text_type_ or _discourse_mode_ (descriptive, narrative, expository, instructive, argumentative). I describe an approach toward representing the "deep" structure of argumentative texts, which is inspired by Freeman (1991) but adds a number of modifications. After presenting some initial results on manual annotation, I will sketch our ongoing work aiming at automating this annotation process, i.e. the notion of "argument mining".
| |
May 31 | Eduard Hovy |
NLP: Its Past and 3½ Possible Futures
Note special time: 10h30--12h00 Abstract Natural Language text and speech processing (Computational Linguistics) is just over 50 years old, and is still continuously evolving — not only in its technical subject matter, but in the basic questions being asked and the style and methodology being adopted to answer them. As unification followed finite-state technology in the 1980s, statistical processing followed that in the 1990s, and large-scale processing is increasingly being adopted (especially for commercial NLP) in this decade, a new and quite interesting trend is emerging: a split of the field into three somewhat complementary and rather different directions, each with its own goals, evaluation paradigms, and methodology. The resource creators focus on language and the representations required for language processing; the learning researchers focus on algorithms to effect the transformation of representation required in NLP; and the large-scale hackers produce engines that win the NLP competitions. But where the latter two trends have a fairly well-established methodology for research and papers, the first doesn't, and consequently suffers in recognition and funding. In the talk, I describe each trend, provide some examples of the first, and conclude with a few general questions, including: Where is the heart of NLP? What is the nature of the theories developed in each stream (if any)? What kind of work should one choose to do if one is a grad student today?
|
|
August 15 | Mehdi Hafezi Manshadi |
Dealing with quantifier scope ambiguity in deep semantic analysis
Note special time: 10h30--12h00 Abstract Quantifier scope ambiguity is one of the most challenging problems in deep language understanding systems. In this talk, I briefly discuss Scope Underspecification, the most common way to deal with quantifier scope ambiguity in deep semantic representation. I will then discuss our efforts to build the first corpus of scope-disambiguated English text in which there is no restriction on the number or the type of the scopal operators. I will explain some of the major difficulties in the hand-annotation of quantifier scoping and present our solutions for overcoming them. Finally, I will explain a Maximum Entropy model adopted to perform automatic scope disambiguation on the corpus, defining a baseline for future efforts.
|
|
Fall 2011 | |||
Sep. 20 | Dan Roth |
Learning from Natural Instructions
Note special time: 9h30--11h00 About the speaker Dan Roth is a Professor in the Department of Computer Science and the Beckman Institute at the University of Illinois at Urbana-Champaign and a University of Illinois Scholar. He is the director of a DHS Center for Multimodal Information Access & Synthesis (MIAS) and also has faculty positions in Statistics, Linguistics and at the School of Library and Information Sciences. Roth is a Fellow of AAAI for his contributions to the foundations of machine learning and inference and for developing learning-centered solutions for natural language processing problems. He has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine learning based tools for natural language applications that are being used widely by the research community. Prof. Roth has given keynote talks in major conferences, including AAAI, EMNLP and ECML and presented several tutorials in universities and conferences including at ACL and EACL. Roth was the program chair of AAAI'11, CoNLL'02 and of ACL'03, and is or has been on the editorial board of several journals in his research areas and has won several teaching and paper awards. Prof. Roth received his B.A Summa cum laude in Mathematics from the Technion, Israel, and his Ph.D in Computer Science from Harvard University in 1995. Abstract Machine learning is traditionally formalized as the study of learning concepts and decision functions from labeled examples. This requires representations that encode information about the target function's domain. We are interested in providing a way for a human teacher to interact with an automated learner using natural instructions communicating relevant domain expertise to the learner without necessarily knowing anything about the internal representation used in the learning process. The underlying problem becomes that of interpreting the natural language lesson in the context of the task of interest. This talk focuses on the machine learning aspects of this problem. The key challenge is to learn intermediate structured representation - natural language interpretations - without being given direct supervision at that level. We will present research on Constrained Conditional Models (CCMs), a framework that augments probabilistic models with declarative constraints in order to support learning such interpretations. In CCMs we formulate natural language interpretation problems as Integer Linear Programs, as a way to assign values to sets of interdependent variables and perform constraints-driven learning and global inference that accounts for the interdependencies. In particular, we will focus on new algorithms for training these global models using indirect supervision signals. Learning models for structured tasks is difficult partly since generating supervision signals is costly. We show that it is often easy to obtain a related indirect supervision signal, and discuss several options for deriving this supervision signal, including inducing it from the world's response to the model's actions, thus supporting Learning from Natural Instructions. We will explain and show the contribution of easy-to-get indirect supervision to other NLP tasks such as Information Extraction, Transliteration and Textual Entailment. | |
Oct. 5 | Kathleen Fraser |
Projected Barzilai-Borwein Method with Infeasible Iterates for Nonnegative Image Deconvolution
Abstract The Barzilai-Borwein (BB) method for unconstrained optimization has attracted attention for its "chaotic" behaviour and fast convergence on image deconvolution problems. However, images with large areas of darkness, such as those often found in astronomy or microscopy, have been shown to benefit from approaches which impose a nonnegativity constraint on the pixel values. I present a new adaptation of the BB method which enforces a nonnegativity constraint by projecting the solution onto the feasible set, but allows for infeasible iterates between projections. I show that this approach results in faster convergence than the basic Projected Barzilai-Borwein (PBB) method, while achieving better quality images than the unconstrained BB method. | |
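A small numpy sketch (invented problem sizes, not the thesis code) of the scheme described above: Barzilai-Borwein steps on a nonnegative least-squares objective, with projection onto the nonnegative orthant applied only every few iterations, so intermediate iterates may be infeasible.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(80, 40))            # stand-in for a blurring operator
    x_true = np.maximum(rng.normal(size=40), 0.0)
    b = A @ x_true + 0.01 * rng.normal(size=80)

    def grad(x):
        return A.T @ (A @ x - b)             # gradient of 0.5 * ||Ax - b||^2

    x = np.zeros(40)
    x_prev, g_prev = x.copy(), grad(x)
    x = x - 1e-4 * g_prev                    # small initial gradient step
    for k in range(200):
        g = grad(x)
        s, y = x - x_prev, g - g_prev
        alpha = (s @ s) / (s @ y) if s @ y > 0 else 1e-4   # BB step length
        x_prev, g_prev = x.copy(), g.copy()
        x = x - alpha * g
        if k % 5 == 0:                       # project only occasionally;
            x = np.maximum(x, 0.0)           # iterates in between may be infeasible
    x = np.maximum(x, 0.0)                   # final projection onto the feasible set
    print(round(float(np.linalg.norm(A @ x - b)), 4))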
Oct. 26 | Sravana Reddy |
Unsupervised Learning of Pronunciations
Note special time: 10h30--12h00 Abstract How well can we guess the sound of a word from its textual representation? Translating written language to its spoken form is a key component of speech technology. The standard problem of learning a model of letter-to-phoneme transformations from an existing lexicon is especially hard in writing systems like English where there is a non-trivial mapping between letters and phonemes. The problem becomes even more complex when we involve accents and dialects, or when we have no parallel training data. In this talk, I will present some of my research that involves learning pronunciations with various degrees of unsupervision. I will first describe two methods for learning the latent alignments between letters and phonemes from an existing pronunciation lexicon in order to build a letter-to-phoneme model. I will also present a method to augment letter-to-phoneme models with speech information -- specifically, speech recognition errors on out-of-vocabulary words. The talk will then discuss the problem of extracting rhyme and meter in an unsupervised way from written poetry, both of which provide major cues to historical and dialectical pronunciations. Finally, I will present ongoing work on learning pronunciations from speech when both the lexicon and the speech transcriptions are unknown, a novel problem that is potentially useful for low-resource languages and dialects. This talk covers joint work with John Goldsmith, Kevin Knight, Evandro Gouvea, and Karen Livescu. | |
Nov. 16 | Alistair Kennedy |
A Supervised Method of Feature Weighting for Measuring Semantic Relatedness
Note special time: 10h30--12h00 Abstract Clustering of related words is crucial for a variety of Natural Language Processing applications. A popular technique is to use the context that a word appears in to build vectors that represent that word's meaning. Vector distance is then taken to determine whether two words have similar meanings. Usually these contexts are given weight based on some measure of association between the word and the context. These measures increase the weight of contexts where a word appears regularly but other words do not, and decrease the weight of contexts where many words may appear. Essentially, it is unsupervised feature weighting. I will present and discuss a method of supervised feature weighting. It identifies contexts shared by pairs of words known to be semantically related or unrelated, and then uses this information to weight these contexts according to how well they indicate word relatedness. The system can be trained with data from resources such as WordNet or Roget's Thesaurus. This work is a step towards adding new terms to Roget's Thesaurus automatically, and doing so with high confidence. | |
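The weighting idea above can be illustrated on tiny invented data; the particular weighting formula below (a smoothed ratio of how often a context is shared by related versus unrelated training pairs) is a hypothetical stand-in, not the speaker's exact scheme.

    from collections import Counter
    import math

    vectors = {                                  # word -> context counts (toy data)
        "cat":   Counter({"pet": 4, "fur": 3, "the": 9}),
        "dog":   Counter({"pet": 5, "fur": 2, "the": 8}),
        "car":   Counter({"road": 6, "the": 9}),
        "truck": Counter({"road": 5, "the": 7}),
    }
    related   = [("cat", "dog"), ("car", "truck")]
    unrelated = [("cat", "car"), ("dog", "truck")]

    def shared_contexts(pairs):
        c = Counter()
        for a, b in pairs:
            c.update(set(vectors[a]) & set(vectors[b]))
        return c

    pos, neg = shared_contexts(related), shared_contexts(unrelated)
    weight = {c: (pos[c] + 1.0) / (neg[c] + 1.0)          # boost good indicators,
              for w in vectors for c in vectors[w]}       # discount shared-by-everything contexts

    def cosine(a, b):
        va = {c: vectors[a][c] * weight[c] for c in vectors[a]}
        vb = {c: vectors[b][c] * weight[c] for c in vectors[b]}
        dot = sum(va[c] * vb.get(c, 0.0) for c in va)
        na = math.sqrt(sum(v * v for v in va.values()))
        nb = math.sqrt(sum(v * v for v in vb.values()))
        return dot / (na * nb)

    print(round(cosine("cat", "dog"), 3), round(cosine("cat", "car"), 3))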
Nov. 23 | Ulrich Germann |
Resolving Word-Order Differences in German-to-English Machine Translation
Abstract Word-order differences between human languages pose one of the big challenges in automatic translation. The currently predominant translation paradigm, phrase-based statistical machine translation (PBSMT), does reasonably well at handling local word order changes, i.e., word order changes that occur within a small window of just a few words, but often fails to perform necessary large-scale re-arrangements. Translation models based on syntactic trees provide a good account of patterns of such large-scale re-ordering, but rarely outperform PBSMT in practice, as measured by standard evaluation metrics. In this talk, I'll explain why this is the case and present a hybrid method that combines information from source-side parse forests with the strengths of PBSMT. | |
Dec. 7 | Atefeh Farzindar |
Trusted Automatic summarization and translation of Legal Information
Abstract NLP Technologies and RALI (Applied Research in Computational Linguistics, Université de Montréal) have developed a technology for the automated analysis of legal information in order to facilitate information retrieval in banks of judgments published by legal information providers. During this seminar, Atefeh Farzindar will give a presentation on TRANSLI, a statistical machine translation system specifically designed for legal texts, and DecisionExpress, a supervised machine learning system for summarizing legal documents in three legal fields: immigration, tax and intellectual property. About the speaker Dr. Atefeh Farzindar is the founder of NLP Technologies Inc., a company specializing in Natural Language Processing, automatic summarization and statistical machine translation. Dr. Farzindar received her Ph.D. in Computer Science from the Université de Montréal and Paris-Sorbonne University. She is an adjunct professor at the Department of Computer Science at the Université de Montréal. Dr. Farzindar has made many contributions to research on automatic summarization and content management systems. As president of NLP Technologies, she has managed multiple collaborative R&D projects with various industry and university partners. She is the chair of the language technologies sector of the Language Industry Association (AILIA). Dr. Farzindar is a board member of the Language Technologies Research Centre, co-chair of the Canadian Conference on Artificial Intelligence 2010 and industry chair for Canadian AI'2011 and AI'2012. About NLP Technologies Inc. NLP Technologies is a specialized company and industry leader in the field of automatic summarization and statistical translation. Founded in February 2005, NLP Technologies has developed and marketed automatic summarization and statistical translation software stemming from its research, along with software tools and related services. The company was founded in response to a specific need of the Canadian government: services that streamline the traditionally cumbersome and time-consuming processes of reading, analyzing, and researching legal information, at a time of a shortage of skilled translators and an increasing volume of texts in foreign languages. | |
Winter 2011 | |||
Feb. 4 | Kinfe Tadesse Mengistu |
Adapting Acoustic and Lexical Models to Dysarthric Speech
Abstract Dysarthria is a condition in which speech is made unintelligible due to neurological damage to the part of the brain that controls the physical production of speech and is in part characterized by pronunciation errors that include deletions, substitutions, insertions, and distortions of phonemes. These errors follow consistent intra-speaker patterns that we exploit through acoustic and lexical model adaptation to improve automatic speech recognition (ASR) on dysarthric speech. We show that acoustic model adaptation yields an average relative word error rate (WER) reduction of 36.99% and that pronunciation lexicon adaptation (PLA) reduces the relative WER further by an average of 8.29% on a large vocabulary task of over 1500 words for 6 speakers with severe to moderate dysarthria. PLA also shows an average relative WER reduction of 7.11% on speaker-dependent models evaluated using 5-fold cross-validation. | |
Feb. 18 | Chris Parisien |
Finding Structure in the Muck: Bayesian Models of How Kids Learn to Use Verbs
Abstract Children are fantastic data miners. In the first few years of their lives, they discover a vast amount of knowledge about their native language. This means learning not just the abstract representations that make up a language, but also learning how to generalize that knowledge to new situations -- in other words, figuring out how language is productive. Given the noise and complexity in what kids hear, this is incredibly difficult, yet still, it seems effortless. In verb learning, a lot of this generalization appears to be driven by strong regularities between form and meaning. Seeing how a certain verb has been used, kids can make a decent guess about what it means. Knowing what a verb means can suggest how to use it. In this talk, I present a series of hierarchical Bayesian models to explain how children can acquire and generalize abstract knowledge of verbs from the language they would naturally hear. Using a large, messy corpus of child-directed speech, these models can discover a broad range of abstractions governing verb argument structure, verb classes, and alternation patterns. By simulating experimental studies in child development, I show that these complex probabilistic abstractions are robust enough to capture key generalization behaviours of children and adults. Finally, I will discuss some promising ways that the insights gained from modeling child language can benefit the development of a valuable large-scale linguistic resource, namely VerbNet. | |
Mar. 4 | Antti Arppe |
How to THINK in Finnish? -- Making sense of multivariate statistical analysis of linguistic corpora
I will discuss an overall methodological framework presented in my dissertation for studying linguistic alternations, focusing specifically on lexical variation in denoting a single meaning, that is, synonymy. As the practical example, I employ the synonymous set of the four most common Finnish verbs denoting THINK, namely 'ajatella, miettiä, pohtia, harkita', roughly corresponding to 'think, reflect, ponder, consider'. As a continuation of previous work, I describe the extension of statistical methods from dichotomous linguistic settings (e.g., Gries 2003; Bresnan et al. 2007) to polytomous ones, that is, those concerning more than two possible alternative outcomes. As the key multivariate method, I demonstrate the use of polytomous logistic regression on the studied phenomenon. The results of the various statistical analyses confirm that a wide range of contextual features across different categories are associated with the use and selection of the selected Finnish verbs of thinking, with the differences among them being subtle but systematic. Interestingly, many of the individual contextual preferences of these currently abstract verbs can be traced back to their etymological origins denoting concrete agricultural, hunting and fishing activities in early Finnish culture and society. In terms of the overall performance of the multivariate analysis and modeling, the prediction accuracy seemingly reaches a ceiling at a recall rate of roughly two-thirds of the sentences in the research corpus. The analysis of these results suggests a limit to what can be explained and determined within the *immediate sentential context* and by applying the conventional descriptive and analytical apparatus based on currently available linguistic theories and models. Nevertheless, Inkpen and Hirst (2006) have reported over 90% accuracy in a similar synonym choice modeling task, but this required explanatory variables indicating "nuances" such as denotational microdistinctions as well as expressive ones concerning the speaker's intention to convey some attitude, in addition to the sought-after style, which are not necessarily explicitly evident in the immediate sentential context nor easily amenable to accurate automated extraction. The results also support Bresnan's (2007) and others' (e.g., Bod et al. 2003) probabilistic view of the relationship between linguistic usage and the underlying linguistic system, in which only a minority of linguistic choices are categorical, given the known context - represented as a feature cluster - that can be analytically grasped and identified. Thus, instead of viewing lexical choice in terms of context-specific rules with exceptions, we may rather interpret different contexts as exhibiting degrees of variation in their outcomes, observable as proportionate choices over longer stretches of usage in texts or speech. | |
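For readers unfamiliar with polytomous (multinomial) logistic regression, here is a minimal sketch of the model type using scikit-learn; the verbs come from the abstract, but the feature names and data are invented and bear no relation to the research corpus or feature set of the dissertation.

```python
# A toy polytomous (multinomial) logistic regression: predict which of four
# synonymous verbs a context selects, given binary contextual features.
# The feature names and data are invented for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

VERBS = ["ajatella", "miettiä", "pohtia", "harkita"]
FEATURES = ["subj_human", "obj_clause", "obj_noun", "agent_collective"]

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, len(FEATURES)))   # random binary feature vectors
y = rng.integers(0, len(VERBS), size=400)           # random verb choices

# With more than two classes and the default lbfgs solver, LogisticRegression
# fits a multinomial (i.e., polytomous) model.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

context = np.array([[1, 1, 0, 0]])                  # subj_human + obj_clause
for verb, p in zip(VERBS, model.predict_proba(context)[0]):
    print(f"P({verb} | context) = {p:.2f}")
```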
Mar. 21 | Uzzi Ornan |
Semantic Search Engine
Note special place and time: PT266 from 10h30--12h00 Abstract: One of the obstacles to a complete and accurate search engine is that every natural language has many words with more than one meaning. If the search is done according to form only, a significant portion of the results will be superfluous, and thus not accurate. The usual advice to add another word or words to the query may hurt completeness. In our engine we want to find only the intended meaning of each word. To achieve this, the whole clause or even the whole sentence must be consulted first. We follow the ideas of Fillmore's early framework, in which an expression is built around a verb as its center. We have built 'conceptual' lexicons for verbs and for nouns with distinctive semantic features. These describe the real world and thus may fit all languages. Of course, the usual morphological and syntactic tools for each language must be applied first. The engine identifies the verb of the syntactic unit and first eliminates syntactically improper NPs (for some languages word order is essential). Then it looks for the distinctive semantic features demanded by the verb, and eliminates NPs that don't have the needed features. The result is that the program chooses only the word with the requested meaning. Building the engine is still in progress; we are manually enlarging the lexicons from 20,000 to 30,000 entries. The engine has initially been built for Hebrew, which poses special problems due to its script. Most of the vowels in Hebrew or Arabic script are not written, many particles are attached to the word without a space, a double consonant is written with one letter, and some letters signify both vowels and consonants. Thus, almost every string of characters may designate several words (the average in Hebrew is almost three words). The program converts each string to all possible phonemically written words in Latin characters, giving many possible words in the expression to consider; we then choose the proper word in the whole sentence, as described above. | |
Mar. 22 | Mark Hasegawa-Johnson |
Semi-Supervised Learning for Spoken Language User Interface
Note special place and time: PT266 from 11h00--13h00 About the speaker Mark is Associate Professor, Department of Electrical and Computer Engineering, at the University of Illinois, Urbana. His field of interest is speech production and recognition by humans and computers, including landmark-based speech recognition, integration of prosody in speech recognition and understanding, audiovisual speech recognition, computational auditory scene analysis, and biomedical imaging of the muscular and neurological correlates of speech production and perception. Abstract Speech is rhythm and melody, with perceptually salient pops and hisses inserted as necessary to optimize the channel capacity. Although speech is usually transcribed using a sequence of letters, it is rarely spoken using the sounds those letters represent. In this talk I will argue that babies, polyglots, and machine learning algorithms are best able to learn speech if they treat its associated text transcription as, at best, an untrustworthy indication of things that might have been contained in the utterance. Speech is primarily prosody; the pragmatics of an utterance govern its phrasing, and the phrasing of the utterance governs the coordination and strength of the articulatory gestures implementing any particular syllable. Fortunately, the phrasing of an utterance is one of its most perceptually salient characteristics, therefore prosody can be learned with good accuracy using semi-supervised machine learning techniques, including regularized Gaussian mixture modeling methods. Phonetic landmarks can also be learned using semi-supervised methods. The sequence of articulatory gestures is hard to learn using regularized semi-supervised methods, but can be predicted pretty well from first principles, and is therefore amenable to modeling using a finite state transducer or dynamic Bayesian network. With appropriate combination of landmark-based acoustic analysis, gesture-based pronunciation analysis, and prosody-based content analysis, it becomes possible to create human-computer interfaces that perform better than the state of the art in some challenging domains, e.g., in the domain of second-language pronunciation training, and in the domain of assistive and augmentative communication for talkers with Cerebral Palsy. | |
Apr. 1 | Vivian Tsang |
Error Recovery in Learning
Abstract: Garden path sentences are sentences that are grammatically correct but written in such a way that the most likely initial interpretation is incorrect. These sentences highlight two interesting aspects of human communication: 1) humans have a tendency to make assumptions (predictions?) about the content before it is completely revealed, and 2) when an assumption is broken, it is not easy to recover unless the person is willing to backtrack and start afresh. In communication, we posit that miscommunication often occurs due to incorrect assumptions about the content or the mental state of the other person. Miscommunication itself can be repaired as long as the participants involved are willing to clarify and repair. The repair may not be so straightforward when it is cast within the context of learning, where there is a power differential between the learner and the authority, and the learner has to rely on the authority as the "gold standard." We take a more nuanced view of learning, in that a learner's error may not be entirely erroneous. For example, children are known to overgeneralize past-tense inflection, such as applying the -ed inflection to irregular verbs. This is indeed an error, but it also demonstrates an awareness of the regular inflection. How is the learner (and, importantly, the authority) to recognize the error and yet be cognizant of the partially correct aspect of the erroneous behaviour? (And how does one know if a correct behaviour is correct for the right reason?) We will describe our preliminary experimental setup to examine error recovery in human learning. | |
Fall 2010 | |||
Sept. 24 | Suresh Manandhar |
Graph based methods for inducing word senses
Note special place: BA5256 Abstract: Unsupervised learning of lexical semantics is an emerging area within NLP that poses interesting and challenging problems. The primary advantage of unsupervised and minimally supervised methods is that annotated data is either not required or required only in small quantities. In this talk, I will present our current work on word sense induction. Unsupervised sense induction is the task of discovering all the senses of a given word from raw unannotated data. Our collocational graph-based method achieves high evaluation scores while overcoming some of the limitations of existing methods. We show that graph connectivity measures can be employed to avoid the need for supervised parameter tuning. Finally, we show that hierarchical clustering and hierarchical random graphs can be employed to induce concept hierarchies. | |
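As a toy illustration of the collocational-graph idea (not the speaker's actual system), one can build a graph over the words that co-occur with a target word and read clusters of that graph off as induced senses; the contexts below are invented.

```python
# Toy graph-based word sense induction for a target word ("bank"): words that
# co-occur with the target become nodes, co-occurrence within one context
# becomes an edge, and clusters of the graph are read off as induced senses.
from itertools import combinations
import networkx as nx

contexts = [
    ["river", "water", "shore"],
    ["water", "shore", "fishing"],
    ["money", "loan", "interest"],
    ["loan", "interest", "account"],
]

G = nx.Graph()
for ctx in contexts:
    for u, v in combinations(ctx, 2):
        if G.has_edge(u, v):
            G[u][v]["weight"] += 1
        else:
            G.add_edge(u, v, weight=1)

# With richer data one would use a community-detection or clustering step;
# here the connected components already separate the two senses.
for i, component in enumerate(nx.connected_components(G), 1):
    print(f"sense {i}: {sorted(component)}")
```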
Oct. 25 | Yuji Matsumoto |
Japanese National Corpus Project and Corpus Tools
Note the atypical day and time: 25 Oct at 9h30. The meeting will be held in PT266. Abstract: We have been participating in the Japanese National Corpus Project, which aims at the construction of a 100-million-word corpus of contemporary Japanese. Our main tasks in this project are to develop corpus annotation tools such as POS taggers, chunkers, dependency parsers, and predicate-argument structure analyzers, and to implement corpus management tools for corpus retrieval and annotation error correction. After a brief introduction to the project, I will mainly talk about the problems in syntactic annotation, especially the way we handle coordination structure analysis and its annotation scheme. | |
Oct. 25 | Eric Nichols |
Statement Map: Reducing Web Information Noise through Opinion Classification
Note the atypical day and time: 25 Oct at 9h30. The meeting will be held in PT266. Abstract: On the Internet, users often encounter noise in the form of spelling errors or unknown words; however, dishonest, unreliable, or biased information also acts as noise that makes it difficult to find credible sources of information. As people come to rely on the Internet for more and more information, reducing this credibility noise grows ever more urgent. The Statement Map project's goal is to help Internet users evaluate the credibility of information sources by mining the Web for a variety of viewpoints on their topics of interest and presenting them to users together with supporting evidence in a way that makes it clear how they are related. In this presentation, we show how a Statement Map system can be constructed by combining Information Retrieval (IR) and Natural Language Processing (NLP) technologies, focusing on the task of organizing statements retrieved from the Web by viewpoint. We frame this as a semantic relation classification task, and identify four semantic relations: [AGREEMENT], [CONFLICT], [CONFINEMENT], and [EVIDENCE]. The former two relations are identified by measuring semantic similarity through sentence alignment, while the latter two are identified through sentence-internal discourse processing. As a prelude to end-to-end user evaluation of Statement Map, we present a large-scale evaluation of semantic relation classification between user queries and Internet texts in Japanese and conduct a detailed error analysis to identify the remaining areas for improvement. | |
Oct. 29 | Bob Carpenter |
Hierarchical Models of Data Coding: Inferring Ground Truth along with Annotator Accuracy, Bias, and Variability
Abstract: Supervised statistical models often rely on human-coded data. For instance, linguists might code Arabic text for syntactic categories or code newspaper titles for political bias. In epidemiology, doctors tag images or tissue samples with respect to patient disease status. Most commonly, their collective decisions are coerced by voting, adjudication, and/or censoring into a best-guess ``gold standard'' corpus, which is then used to evaluate model performance. In this talk, I'll introduce a generative hierarchical model and full Bayesian posterior inference for the annotation process for categorical data. Given a collection of annotated data, we can infer the true labels of items, the prevalence of some phenomenon (e.g. a given intonation or syntactic alternation or the disease prevalence in a population), the accuracy and category bias of each annotator, and the codability of the theory as measured by the hierarchical model of accuracy and bias of annotators and their variability. I'll demonstrate the efficacy of the approach using expert and non-expert pools of annotators for simple linguistic labelling tasks such as textual inference, morphological tagging, and named-entity extraction, as well as for dentists labeling X-rays for cavities. The model not only automatically adjusts for spam annotators, it infers more accurate gold-standard data than simpler approaches such as voting and censoring. I'll discuss applications such as monitoring an annotation effort, selecting items with active learning, and generating a probabilistic gold standard for model training and evaluation. I'll also discuss the challenge of estimating item difficulty effects, which are evident to annotators and also apparent through observed covariance among annotation decisions. | |
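To give a flavour of jointly inferring true labels and annotator reliability, here is a simplified EM sketch in the Dawid-Skene spirit; the talk describes a richer Bayesian hierarchical model, and the annotation matrix below is invented.

```python
# Simplified Dawid-Skene-style EM for binary annotations: alternately estimate
# the probability that each item's true label is 1 and each annotator's
# accuracy. A uniform class prior is assumed, and the annotation matrix
# (rows = items, columns = annotators) is invented for illustration.
import numpy as np

A = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
])
n_items, n_annotators = A.shape

p_true = A.mean(axis=1).astype(float)       # start from the raw vote share
accuracy = np.full(n_annotators, 0.7)       # initial guess at annotator accuracy

for _ in range(50):
    # E-step: posterior probability that each item's true label is 1
    for i in range(n_items):
        like1 = np.prod(np.where(A[i] == 1, accuracy, 1 - accuracy))
        like0 = np.prod(np.where(A[i] == 0, accuracy, 1 - accuracy))
        p_true[i] = like1 / (like1 + like0)
    # M-step: annotator accuracy = expected agreement with the inferred labels
    for j in range(n_annotators):
        agree = np.where(A[:, j] == 1, p_true, 1 - p_true)
        accuracy[j] = agree.mean()

print("inferred P(label = 1):", np.round(p_true, 2))
print("annotator accuracies: ", np.round(accuracy, 2))
```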
Nov. 12 | Tim Fowler | Parsing with categorial grammars | |
Nov. 26 | Tong Wang |
Associating Difficulty in Near-Synonymy Choice with Types of Nuance using Core Vocabulary
Abstract Stylistic variation among near-synonyms is an important dimension that has been frequently addressed in near-synonymy research. In this study, we hypothesize that the stylistic nature of nuances correlates with the degree of difficulty in choosing between near-synonyms. In contrast to some recent studies that focus on the contextual preferences of synonyms (e.g., Arppe & Järvikivi 2007), we investigate the internal features of near-synonym nuances. We adopt the notion of core vocabulary to associate stylistic variation in theory with the difficulty level of near-synonym choice in practice. To test our hypothesis, a near-synonym lexical choice task (Edmonds 1997) is employed to measure difficulty levels. Our study shows that variance in performance on this task is correlated with differing degrees of coreness of the near-synonyms and, in turn, with different types of near-synonym variation. Counter to intuition, the seemingly subtle stylistic nuances are usually easier for subjects to distinguish than non-stylistic differences. | |
Dec. 10 | Vanessa (Wei) Feng |
Classifying arguments by scheme
Abstract Argumentation schemes are structures or templates for various kinds of arguments. The argumentation scheme classification system that I am going to present introduces a new task in this field. To the best of our knowledge, this is the first attempt to classify arguments into argumentation schemes automatically. Given the text of an argument with premises and conclusion identified, we classify it as an instance of one of five common schemes, using general features and other features specific to each scheme, including lexical, syntactic, and shallow semantic features. We achieve accuracies of 63-91% in one-against-others classification and 80-94% in pairwise classification (baseline = 50% in both cases). We design a pipeline framework whose ultimate goal is to reconstruct the implicit premises in an argument, and our argumentation scheme classification system is aimed at addressing the third component of this framework. While the first two portions of this framework can be fulfilled by the work of other researchers, we propose a syntax-based approach to the last component. The completion of the entire system will benefit many professionals in applications such as automatic reasoning assistance. | |
Winter 2010 | |||
Mar. 3 | Frank Rudzicz |
Adaptive kernel canonical correlation analysis for estimation of task dynamics from acoustics
Abstract: I present a method for acoustic-articulatory inversion whose targets are the abstract tract variables from task dynamic theory. Towards this end I construct a non-linear Hammerstein system whose parameters are updated with adaptive kernel canonical correlation analysis. This approach is notably semi-analytical and applicable to large sets of data. Training behaviour is compared across four kernel functions and prediction of tract variables is shown to be significantly more accurate than state-of-the-art mixture density networks. | |
Mar. 17 | Chris Parisien |
CANCELLED: Learning verb alternations in a usage-based Bayesian model
Abstract One of the key debates in language acquisition involves the degree to which children's early linguistic knowledge employs abstract representations. While usage-based accounts that focus on input-driven learning have gained prominence, it remains an open question how such an approach can explain the evidence for children's apparent use of abstract syntactic generalizations. We develop a novel hierarchical Bayesian model that demonstrates how abstract knowledge can be generalized from usage-based input. We demonstrate the model on the learning of verb alternations, showing that such a usage-based model must allow for the inference of verb class structure, not simply the inference of individual constructions, in order to account for the acquisition of alternations. | |
Apr. 14 | Aida Nematzadeh | TBD | |
Apr. 15 | Jackie C.K. Cheung |
Parsing German Topological Fields with Probabilistic Context-Free Grammars
Research in statistical parsing has produced a number of high-performance parsers using probabilistic context-free grammar (PCFG) models to parse English text (Collins, 2003; Charniak and Johnson, 2005, inter alia). Problems arise, however, when applying these methods to freer-word-order languages. Languages such as Russian, Warlpiri, and German feature syntactic constructions that produce discontinuous constituents, directly violating one of the crucial assumptions of context-free models of syntax. While PCFG technologies may thus be inadequate for full syntactic analysis of all phrasal structure in these languages, clausal structure can still be fruitfully parsed with these methods. In this work, we apply a latent-variable PCFG parser (Petrov et al., 2006) to extract the topological field structure of German. These topological fields provide a high-level description of the major sections of a clause in relation to the clausal main verb and the subordinating heads, and they appear in strict linear sequences amenable to PCFG parsing. They are useful for tasks such as deep syntactic analysis, part-of-speech tagging and coreference resolution. We perform a qualitative error analysis of the parser output, and identify constructions like ellipses and parentheticals as the chief sources of remaining error. This result is confirmed by a further experiment in which parsing performance improves after restricting the training and test set to those sentences without these constructions. We also explore techniques for further improving parsing results. For example, discriminative reranking of parses made by a generative parser could incorporate linguistic information such as that derived from our qualitative analysis. Another possibility is self-training, a semi-supervised technique which utilizes additional unannotated data for training. | |
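To make the parsing setup concrete, here is a toy sketch using NLTK's PCFG machinery; the grammar is an invented stand-in for topological-field structure (VF, LK, MF), not the latent-variable grammar of Petrov et al. used in the work.

```python
# Toy PCFG parsing in the spirit of topological field parsing: the grammar
# and sentence are invented stand-ins, not the German treebank grammar.
import nltk

grammar = nltk.PCFG.fromstring("""
    S  -> VF LK MF    [1.0]
    VF -> NP          [1.0]
    LK -> V           [1.0]
    MF -> NP          [0.6]
    MF -> NP NP       [0.4]
    NP -> 'Peter'     [0.5]
    NP -> 'Maria'     [0.5]
    V  -> 'sieht'     [1.0]
""")

parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("Peter sieht Maria".split()):
    tree.pretty_print()
    print("log prob:", tree.logprob())
```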
Fall 2009 | |||
Sep. 1 | Barrou Diallo |
Research in Chinese machine translation at the European Patent Office
Note special time: 9h00 (not 9h10) to 10h00 About the speaker Barrou Diallo is the Head of Research at the European Patent Office and Advisor to the Information Retrieval Facility. His focus is on machine translation, data mining, and enterprise architecture. He holds a Ph.D. in computer science, an M.Sc. in biomathematics and an M.Sc. in law and cyberspace. He has published several papers on patent processing, computer graphics, 3D visualisation and database management. He was the project manager of the first real-time European machine translation system for patents at the EPO. Prior to his positions at the EPO, Barrou Diallo held a professorship with the Chamber of Commerce of Le Mans and was an assistant professor at the University of Compiègne. | |
Sep. 2 | Akira Ushioda |
MT research and development in Japan
Note special time and place: PT378 at 11h00 About the speaker Dr. Akira Ushioda obtained his Ph.D. from Carnegie Mellon University in 2000, and worked as a senior researcher (2000-2002) and as director of the Intelligent Systems Laboratory (2003-2004) at Fujitsu Laboratories Ltd. He is currently a Research Fellow at Fujitsu Laboratories and Guest Associate Professor at the Nara Institute of Science and Technology, Japan. Dr. Ushioda's research interests cover a range of topics in the area of Natural Language Processing and statistical learning, including lexical statistical parsing, the integration of SMT and RBMT, automatic clustering of words and phrases, and statistical word sense disambiguation. Abstract: The market size of private cramming schools and preparatory schools in Japan is 10 billion dollars, and more than a third of the market is made up of language schools, mostly English language schools. The Japanese people are thus enthusiastic about learning English, and yet the TOEIC report on test-takers worldwide shows that the average TOEIC score of Japanese test-takers is ranked 25th out of the 27 countries with the most active test-takers. The awareness of poor performance makes them more desperate to learn English. Poor human performance, on the other hand, makes the relative performance of, and expectations for, MT higher. Japan has thus been quite actively engaged in developing machine translation technology both at the government level and in the private sector. The EDR (Electronic Dictionary Research) project, a government-led electronic dictionary research project, for example, began in 1986 and continued for a decade with a total budget of 150 million dollars. The participants in the project from the private sector included major Japanese electronics companies such as Hitachi, Toshiba, Panasonic, Sharp, NEC and Mitsubishi Electric Corp. Fujitsu Laboratories, also a participant in the EDR project, began developing English-to-Japanese and Japanese-to-English MT systems in the early 1980s. Unlike other Japanese MT makers, Fujitsu employs an interlingua-oriented translation scheme, which makes differences in concept representation between Japanese and English easier to overcome. The deeper semantic representation, on the other hand, makes the grammar rule set somewhat harder to maintain and grow. Instead of further modifying the rule-based scheme, we are investigating ways to incorporate an SMT framework into the existing scheme. One of the issues at hand is how to bridge the gap between RBMT ``phrases'' and SMT ``phrases.'' This talk will provide background and an overview of MT development in Japan, describe Fujitsu's MT research and development, and discuss future directions of MT research and major challenges. | |
Sep. 16 | CL Group | Fall 2009 welcoming meeting | |
Sep. 23 | Varada Kolhatkar |
An extended analysis of a method of all words sense disambiguation
One of the central problems in processing natural language is ambiguity. In every natural language there are many potentially ambiguous words. Humans are fairly adept at resolving ambiguity by drawing on context and their knowledge of the world. However, it is not so easy for machines to understand the intended meaning of a word in a given context. Word Sense Disambiguation (WSD) is the process of selecting the correct sense of a word in a specific context. It is often useful to generalize the problem of disambiguating a single word to that of disambiguating all content words in a given text. This generalized problem is referred to as all-words sense disambiguation. The long history of WSD research includes many different supervised, unsupervised and knowledge-based approaches. But the reality is that current state-of-the-art accuracy in WSD remains far from natural human abilities. We present our analysis of some of the components that might be contributing to the level of error currently plaguing all-words sense disambiguation. Our analysis makes use of WordNet::SenseRelate::AllWords, an unsupervised knowledge-based system for all-words sense disambiguation, which is freely available on the Web as a Perl module. The system assigns a WordNet sense to each word in a text using measures of semantic similarity and relatedness. We find that the degree of difficulty in disambiguating a word is proportional to the number of senses of that word (polysemy). The experimental evidence indicates that a significant percentage of word sense disambiguation error is caused by a relatively small number of highly frequent word types. We also demonstrate that part-of-speech tagged text will be disambiguated more accurately than raw text. We show that expanding the context window helps in terms of coverage but doesn't improve disambiguation. Finally, we find that if the answer is not the most frequent sense, disambiguation turns out to be a hard problem even for an unsupervised system that doesn't use any information about sense distribution. | |
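The system named above is a Perl module and is not reproduced here; as a rough, runnable analogue of unsupervised all-words disambiguation, here is a sketch using NLTK's Lesk implementation (a different disambiguation criterion than WordNet::SenseRelate uses).

```python
# A crude all-words baseline with NLTK's Lesk implementation: assign a WordNet
# sense to every word the lexicon knows. This only illustrates the task; it is
# not the WordNet::SenseRelate::AllWords system discussed in the talk.
# (Requires the NLTK 'wordnet' data package.)
from nltk.corpus import wordnet as wn
from nltk.wsd import lesk

tokens = "the bank raised its interest rate".split()

for word in tokens:
    if wn.synsets(word):                    # only attempt words WordNet knows
        sense = lesk(tokens, word)
        if sense is not None:
            print(f"{word:10s} -> {sense.name():20s} {sense.definition()}")
```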
Oct. 7 | Mohamed Attia |
Automatic full phonetic transcription of Arabic script
Abstract: Handling most non-trivial NLP tasks via rule-based (i.e., language-factorizing) methods typically ends up with multiple possible solutions/analyses. After exhausting all the known/applicable rule-based methods, statistical methods are one of the most effective, feasible, and widely adopted approaches to automatically resolving that ambiguity. Many researchers, however, argue that if statistical disambiguation is eventually deployed to get the most likely analysis/sequence of analyses, why not go fully statistical (i.e., non-factorizing) from the very beginning and give up the burden of rule-based methods? In our attempt to get the best performance for automatic full phonetic transcription of open-domain Arabic script, which is a tough industrial problem vital for applications like Arabic TTS systems, building Arabic ASR training corpora ... etc., one fundamental design task was to decide whether to go with the former design architecture (language factorization, then statistical disambiguation) or with the latter one (statistical disambiguation on un-factorized tokens). While our years-long research on ``automatic Arabic phonetic transcription'' ended up with the best-performing system reported so far in the scientific literature (as of mid-2009), the winning architecture has interestingly been neither of the two abovementioned options alone but a hybrid of both! While the non-factorizing architecture is more computationally economical and easier to implement, the language-factorizing one overcomes the severe coverage problem that emerges with the non-factorizing one. While both approaches asymptote to the same ceiling of accuracy, the former has a faster learning curve than the latter. So, the best hybrid architecture starts by trying the non-factorizing method on the raw input Arabic string; only if a coverage failure occurs does it switch (back off) to the factorizing method. While these conclusions have been obtained on the specific problem of ``Automatic Full Phonetic Transcription of Arabic Script'', we think that many other problems - where the choice between a factorizing and a non-factorizing approach is an issue - may also benefit from this experience. | |
Oct. 21 | Julian Brooke |
A semantic approach to automated text sentiment analysis
The identification and characterization of evaluative stance in written language poses a unique set of cross-disciplinary challenges. Beginning with a review of relevant literature in linguistics and psychology, I trace recent interest in automated detection of author opinion in online product reviews, focusing on two main approaches: the semantic model, which is centered on deriving the semantic orientation (SO) of individual words and expressions, and machine learning classifiers, which rely on statistical information gathered from large corpora. To show the potential long-term advantages of the former, I describe the creation of an SO Calculator, highlighting relevant linguistic features such as intensification, negation, modality, and discourse structure, and devoting particular attention to the detection of genre in movie reviews, integrating machine classifier modules into my core semantic model. Finally, I discuss sentiment analysis in languages other than English, including Spanish and Chinese. | |
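As a toy illustration of the lexicon-based idea (far simpler than the SO Calculator described), here is a sketch in which intensifiers scale, and negators flip, the value of the following sentiment word; the lexicon entries and weights are invented.

```python
# Toy semantic orientation (SO) calculator: sum word SO values, let
# intensifiers scale the next sentiment word and negators flip its sign.
SO_LEXICON = {"good": 3, "great": 4, "bad": -3, "awful": -4, "boring": -2}
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never"}

def so_score(text: str) -> float:
    score, scale, negated = 0.0, 1.0, False
    for token in text.lower().split():
        if token in INTENSIFIERS:
            scale *= INTENSIFIERS[token]
        elif token in NEGATORS:
            negated = True
        elif token in SO_LEXICON:
            value = SO_LEXICON[token] * scale
            score += -value if negated else value
            scale, negated = 1.0, False        # modifiers apply to one word only
        else:
            scale, negated = 1.0, False        # reset on non-sentiment words
    return score

print(so_score("the movie was very good"))        #  4.5
print(so_score("the plot was not great"))         # -4.0
print(so_score("slightly boring but not awful"))  # -1.0 + 4.0 = 3.0
```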
Nov. 4 | Paul Thompson |
Semantic Hacking
About the speaker Paul Thompson is Chief Computational Linguist, Text Exploitation and Decision Support at General Dynamics Advanced Information Systems, Buffalo. Abstract Forensic linguistics, or the use of linguistic analysis techniques to interpret evidence, e.g., authorship attribution, is an established discipline. In this talk I will describe research on the application of forensic linguistic techniques to computer security in the context of the Semantic Hacking project at Dartmouth College's Institute for Security Technology Studies. I will also discuss related research projects, including research on the detection of deception in text and in computer-mediated communication. | |
Nov. 18 | Gabriel Murray |
Summarizing Conversations in Various Modalities
Abstract: In recent years, summarization research has extended beyond the extractive summarization of well-structured documents such as newswire and journal articles to consider corpora such as meeting transcripts, web-logs, lectures and emails. In many of these domains, researchers have found evidence that domain-specific features can yield additional improvement beyond the performance provided by standard text summarization algorithms. For example, prosodic features can be extracted from the speech signal to aid meeting and lecture summarization, while emails contain useful header information such as the number of recipients and the presence of attachments. In our research we investigate whether these conversational domains can be treated similarly, using a unified conversation feature set for extractive summarization. We show that this novel conversation summarization approach can perform on par with domain-specific approaches for meeting and email data, while being flexible enough to apply to many other conversation domains. This talk will also include a description of subjectivity detection and its application to conversation summarization, as well as an overview of our current approach which moves beyond extractive summarization. | |
Dec. 2 | Daphna Heller |
The use of common ground information in real-time comprehension and production
It is well known that the appropriateness of utterances depends on contextual information, but since contextual information is extremely varied in nature and has to be gathered from multiple sources, it remains an open question whether interlocutors can, in fact, use contextual information in real-time comprehension and production. In this talk, I focus on perspective information: what information is assumed to be shared among interlocutors, and what information is privileged to one interlocutor but not the other. I present two psycholinguistic experiments investigating the ability of interlocutors to use the distinction between shared and privileged information in the earliest moments of comprehension and production. Experiment 1 uses the 'visual world' eye-tracking paradigm to study the comprehension of definite descriptions containing scalar adjectives when the visual perspectives of the interlocutors differ. Experiment 2 examines the production of artificial names for novel shapes in cases where the speaker learned more names than the addressee. The results demonstrate that perspective information is used from the earliest moments of both comprehension and production, highlighting interlocutors' impressive ability to use contextual information in real time. | |
Winter 2009 | |||
Jan. 16 | Shalom Lappin |
Expressiveness and Complexity in Underspecified Semantics
Today's speaker is a visiting professor from the Department of Philosophy at King's College London. Abstract: In this paper we address an important issue in the development of an adequate formal theory of underspecified semantics. The tension between expressive power and computational tractability poses an acute problem for any such theory. Generating the full set of resolved scope readings from an underspecified representation produces a combinatorial explosion that undermines the efficiency of these representations. Moreover, Ebert (2005) shows that most current theories of underspecified semantic representation suffer from expressive incompleteness. In previous work we present an account of underspecified scope representations within Property Theory with Curry Typing (PTCT), an intensional first-order theory for natural language semantics. We review this account, and we show that filters applied to the underspecified-scope terms of PTCT permit expressive completeness. While they do not solve the general complexity problem, they do significantly reduce the search space for computing the full set of resolved scope readings in non-worst cases. We explore the role of filters in achieving expressive completeness, and their relationship to the complexity involved in producing full interpretations from underspecified representations. | |
Feb. 27 | Canceled |
Graduate Visit Day
| |
Mar. 13 | Shane Bergsma |
Web-Scale Models of Natural Language
Today's speaker is visiting from the University of Alberta. Abstract: The World Wide Web has had an enormous impact on Natural Language Processing (NLP) research, both as a source of data and as a stimulus for new language technology. In this talk, I describe several recent NLP systems that use web-scale statistics to achieve superior performance. These systems employ supervised machine learning as a simple but powerful mechanism for integrating web-scale data. I present the evolution of using the Internet for language research: from the initial enthusiasm for search-engine page counts to the more scientifically-sound usage of web-scale text databases. | |
Mar. 20 | Yang Liu |
Extractive summarization and keyword extraction using meeting transcripts
Abstract: Meeting corpora are much more challenging than written text (such as news articles) for various language processing tasks. In this talk, I will discuss some research we have done in the past two years on meeting understanding, specifically extractive meeting summarization and keyword extraction. When using a supervised learning framework for summarization, we propose different sampling methods and a regression model to address the imbalanced-data problem and human annotation disagreement. I will present improved results using these methods for meeting summarization, as well as studies on the correlation between the automatic ROUGE measures and human evaluation for summarization. I will also show various results for keyword extraction, comparing supervised and unsupervised approaches, and discuss how to leverage summaries for keyword extraction. | |
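Since ROUGE figures in the evaluation discussion, here is a minimal sketch of ROUGE-1 recall (unigram overlap with a single reference); real evaluations use the full ROUGE toolkit and multiple references, and the example texts are invented.

```python
# Minimal ROUGE-1 recall: fraction of reference unigrams that also appear in
# the system summary, with counts clipped to avoid double-crediting repeats.
from collections import Counter

def rouge1_recall(system: str, reference: str) -> float:
    sys_counts = Counter(system.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(count, sys_counts[tok]) for tok, count in ref_counts.items())
    return overlap / sum(ref_counts.values())

reference = "the committee agreed to postpone the budget decision"
system = "committee agreed to postpone decision on budget"
print(f"ROUGE-1 recall: {rouge1_recall(system, reference):.2f}")   # 6/8 = 0.75
```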
Mar. 27 | Abdel-Rahman Mohamed |
Hafss, A Computer Aided Pronunciation Learning system
Abstract: In this talk, I will describe HAFSS, a speech-enabled computer-aided pronunciation teaching (CAPT) system developed for teaching Holy Qur'an recitation rules and Arabic pronunciation. HAFSS uses a state-of-the-art speech recognizer to detect errors in the user's recitation. One point that is critical in any practical language-learning system that exploits ASR technology is user enrollment time (the time needed to adapt the system to the user's voice). I will talk about the enrollment process in HAFSS and discuss methods that were found helpful in reducing the total enrollment time needed by the system. I will also present one experiment that measures the usefulness of the system to a novice user and another that measures the correlation between the judgments of the HAFSS system and those of four human experts. | |
Apr. 3 | Tong Wang |
Extracting Synonyms from Dictionary Definitions
Abstract: Much research effort has been spent on extracting words in different lexical semantic relations from various resources; the extraction of synonyms, however, has proved to be nontrivial due to the difficulty of coming up with features that are exclusive to synonymy. I will talk about two rule-based approaches for extracting synonyms from dictionary definitions: building an inverted index, and bootstrapping and matching against regex patterns. In one of the two evaluation schemes I used, these seemingly simple approaches actually outperform the best reported lexicon-based method by a large margin. | |
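Here is a minimal sketch of how an inverted index over definitions can surface synonym candidates; the toy dictionary and the mutual-mention heuristic are illustrative assumptions, not the method evaluated in the talk.

```python
# Toy inverted index over dictionary definitions: propose two headwords as a
# synonym candidate pair when each appears in the other's definition.
# The mini-dictionary is invented for illustration.
from collections import defaultdict

definitions = {
    "glad":  "feeling happy or pleased",
    "happy": "feeling or showing pleasure; glad",
    "sad":   "feeling sorrow or unhappiness",
    "big":   "of considerable size; large",
    "large": "of great or big size",
}

index = defaultdict(set)          # word -> headwords whose definition contains it
for headword, definition in definitions.items():
    for token in definition.replace(";", " ").replace(",", " ").split():
        index[token].add(headword)

candidates = set()
for headword in definitions:
    for other in index[headword]:             # other's definition mentions headword
        if other != headword and headword in index[other]:
            candidates.add(tuple(sorted((headword, other))))

print(candidates)                 # the pairs ('glad', 'happy') and ('big', 'large')
```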
Fall 2008 | |||
Sep. 19 | Naishi Liu |
A Reduced Graph Model of Jokes
Today's speaker is a visiting scholar from Shanghai Jiao Tong University. Abstract: The talk is an introduction to a graph-theoretic model for the understanding of verbal humor (especially jokes). It follows the tradition of CL and builds on previous linguistic research, making use of graph elements such as vertices, edges, and subgraphs. The result is an interpretation model that accounts for how we understand humor, based on which algorithms may be designed to facilitate automatic humor processing. Warning: The presentation may contain some sexually oriented or sexist data. | |
Oct. 3 | Anatoliy Gruzd |
Name Networks: A Content-Based Method for Automated Discovery of Social Networks to Study Collaborative Learning
Today's speaker is a PhD student at the Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Abstract As a way to gain greater insight into the operation of e-learning communities, the presented work applies automated text mining techniques to text-based communication to identify, describe and evaluate underlying social networks. The research demonstrates that the resulting social networks can be used by members of e-learning communities to improve the learning experience, while faculty and administration can use them to understand online learning processes and to develop more appropriate and effective programs for the next generation of learners. | |
Oct. 17 | Prof. Iryna Gurevych |
Putting the "Wisdom-of-Crowds" to Use in NLP: Collaboratively Constructed Semantic Resources on the Web
About the Speaker: Iryna Gurevych is Director of the Ubiquitous Knowledge Processing (UKP) Lab at the Technical University of Darmstadt. She is a recipient of the Young Excellence Emmy-Noether Award from the German Research Foundation (DFG) and of a Lichtenberg-Professorship Award from the Volkswagen Foundation. Iryna is currently Principal Investigator of the projects ``Semantic Information Retrieval'', funded by the German Research Foundation (DFG), ``Mining Lexical-Semantic Knowledge from Dynamic and Linguistic Sources and Integration into Question Answering for Discourse-Based Knowledge Acquisition in eLearning'', funded by the DFG, and THESEUS ``TEXO - Future Business Value Networks: Business Web'', funded by the German Ministry of Economics and Technology. She is a lecturer and scientific advisor in the research training program ``Quality Enhancement in eLearning through Regenerative Processes'', funded by the DFG. Her lab conducts research in the areas of lexical semantic processing with a focus on Web-based semantic resources, integrating lexical semantic knowledge in information retrieval and question answering, and text mining with a focus on sentiment analysis. Further information: http://www.ukp.tu-darmstadt.de/ Abstract: The rise of Web 2.0 and the so-called socio-semantic technologies in recent years has led to huge amounts of user-generated content produced by ordinary users on the Web. This content called for user-generated tagging to enable better information navigation and retrieval. As a result, semantically tagged, collaboratively constructed knowledge repositories emerged that represent a novel type of Web-originated resource - we call them collaboratively constructed semantic resources (CCSR). Example instances of CCSR are collaboratively constructed and semantically enriched multilingual online encyclopedias, such as Wikipedia, and collaboratively constructed online multilingual dictionaries, such as Wiktionary. NLP researchers have started to employ CCSRs as substitutes for conventional lexical semantic resources and repositories of world knowledge, such as thesauri, machine-readable dictionaries, or wordnets. By overcoming the limitations of existing resources, such as their coverage gaps, significant construction and maintenance costs, and restricted availability, there is now hope of significantly enhancing the performance of numerous algorithms by utilizing the so-called ``wisdom-of-crowds'' in broad-coverage NLP systems. Combining CCSRs with statistical measures, yielding shallow, approximate semantic knowledge, has already demonstrated excellent results in some NLP tasks. The talk will present some of the recent work done at the Ubiquitous Knowledge Processing Lab that has had a significant impact in the above-outlined area. In the first part, a set of semantic relatedness measures operating on various datasets and utilizing either conventional wordnets or CCSRs will be examined. In the second part, the knowledge in Wikipedia and Wiktionary is employed in domain-specific information retrieval and yields significant improvements. The talk will be concluded with some remarks on the interoperability of conventional knowledge resources and CCSRs. | |
Oct. 31 | Fraser Shein |
WordQ and SpeakQ software: Writing made easier
About the speaker Fraser Shein is a new faculty member in the CL group. He is a senior rehabilitation engineer at Bloorview Kids Rehab, where his research interests include advanced computer accessibility technology, natural language processing as applied to writing software, speech recognition, and consumer-driven reporting of assistive technology experiences. He is also the President and CEO of Quillsoft Ltd., which produces software to help individuals write text using technologies such as natural-sounding text-to-speech, contextual word prediction, and speech recognition. Abstract: This presentation will discuss and demonstrate how WordQ/SpeakQ software (for both Windows and Mac OS X) helps you write more easily. Both were developed at Bloorview Kids Rehab (Toronto). As you type, WordQ continuously presents a list of relevant correctly spelled words using word prediction. When the desired word is shown, you can choose it with a single keystroke. High-quality text-to-speech feedback enables you to more easily choose words and to identify mistakes. SpeakQ plugs into WordQ and adds simple speech recognition. You can then benefit from a combination of word prediction, speech output and speech input to generate text when stuck on spelling and word forms, identifying errors, proofreading and editing. Current research at Bloorview relating to syntactic and semantic knowledge in word prediction will also be discussed. | |
Nov. 7 | Libby Barak |
Keyword based Text Categorization
Abstract The Text Categorization (TC) task is mostly approached via supervised or semi-supervised methods. These solutions require excessive manual labor to annotate text samples as training data, which is not always feasible. In this work we investigate Keyword-based Text Categorization, using as input only a taxonomy of the category names. The TC method uses a novel combination of Textual Entailment-based categorization and Latent Semantic Analysis (LSA) based categorization to create an initial set of documents classified without supervision. This initial classified set is then used as input to a standard supervised categorization method. The proposed method shows promising initial results and reveals interesting phenomena as a basis for further research. | |
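Here is a minimal sketch of the LSA side of such an approach using scikit-learn; the documents and category seed keywords are invented, and the textual-entailment component and the supervised second stage are omitted.

```python
# Toy keyword-based categorization with LSA: embed documents and short
# category keyword strings with TF-IDF, reduce with truncated SVD (the usual
# LSA construction), and assign each document to its most similar category.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "the striker scored a goal in the football match",
    "the central bank raised interest rates to fight inflation",
    "the orchestra performed a new symphony at the concert",
]
category_names = ["sports", "economy", "music"]
category_seeds = [
    "sports football match striker goal",
    "economy bank interest rates inflation",
    "music concert symphony orchestra",
]

X = TfidfVectorizer().fit_transform(documents + category_seeds)
Z = TruncatedSVD(n_components=3, random_state=0).fit_transform(X)
doc_vecs, cat_vecs = Z[: len(documents)], Z[len(documents):]

# Print the best-matching category for each document.
for doc, sims in zip(documents, cosine_similarity(doc_vecs, cat_vecs)):
    print(f"{category_names[sims.argmax()]:8s} <- {doc}")
```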
Nov. 28 | TBD | To be determined | |
Dec. 5 | Hani Safadi |
Crosslingual implementation of linguistic taggers using parallel corpora
Abstract: The talk addresses the problem of creating linguistic taggers for resource-poor languages using existing taggers in resource rich languages. Linguistic taggers are classifiers that map individual words or phrases from a sentence to a set of tags. Part of speech tagging and named entity extraction are two examples of linguistic tagging. Linguistic taggers are usually trained using supervised learning algorithms. This requires the existence of labeled training data, which is not available for many languages. We describe an approach for assigning linguistic tags to sentences in a target (resource-poor) language by exploiting a linguistic tagger that has been configured in a source (resource-rich) language. The approach does not require that the input sentence be translated into the source language. Instead, projection of linguistic tags is accomplished through the use of a parallel corpus, which is a collection of texts that are available in a source language and a target language. The correspondence between words of the source and target language allows us to project tags from source to target language words. The projected tags are further processed to compute the final tags of the target language words. A system for part of speech (POS) tagging of French language sentences using an English language POS tagger and an English/French parallel corpus has been implemented and evaluated using this approach. | |
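Here is a minimal sketch of the projection step itself, with a hand-specified word alignment standing in for the output of an automatic aligner such as GIZA++; the real system applies further processing to the projected tags.

```python
# Toy projection of POS tags across a word-aligned sentence pair.
# The alignment (source index -> target index) is given by hand here;
# in practice it would come from an automatic word aligner.
english = ["the", "black", "cat", "sleeps"]
english_tags = ["DET", "ADJ", "NOUN", "VERB"]
french = ["le", "chat", "noir", "dort"]
alignment = [(0, 0), (1, 2), (2, 1), (3, 3)]   # note the noun-adjective reordering

projected = ["UNK"] * len(french)              # default for unaligned target words
for src, tgt in alignment:
    projected[tgt] = english_tags[src]

for word, tag in zip(french, projected):
    print(f"{word}/{tag}")
```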
Dec. 9 | Dan Jurafsky |
Distinguished Lecture Series Colloquium
Note special time and place: 11:00-13:00, Bahen 1180 About the speaker Dan Jurafsky works at the nexus of language and computation, focusing on statistical models of human and machine language processing. Recent topics include the induction and use of computational models of meaning, the automatic recognition and synthesis of speech, and the comprehension and production of dialogue. He is the recipient of the MacArthur Fellowship and an NSF CAREER award. His most recent book is the second edition of his widely-used textbook with Jim Martin, Speech and Language Processing. | |
Winter 2008 | |||
Jan. 18 | Frank Rudzicz |
Speech Recognition and Computational Linguistics: How to wreck a nice beach whenever a wand Aztecs
Speech and language research is big. Very big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's a long way down the road to the chemist's, but that's just peanuts to speech and language research! Listen! And so on... | |
Jan. 30 | Rada Mihalcea |
Linking Documents to Encyclopedic Knowledge: Using Wikipedia as a Source of Linguistic Evidence
Note special time and place: 10:30-12:00, Pratt 266 Wikipedia is an online encyclopedia that has grown to become one of the largest online repositories of encyclopedic knowledge, with millions of articles available for a large number of languages. In fact, Wikipedia editions are available for more than 200 languages, with a number of entries varying from a few pages to more than one million articles per language. In this talk, I will describe the use of Wikipedia as a source of linguistic evidence for natural language processing tasks. In particular, I will show how this online encyclopedia can be used to achieve state-of-the-art results on two text processing tasks: automatic keyword extraction and word sense disambiguation. I will also show how the two methods can be combined into a system able to automatically enrich a text with links to encyclopedic knowledge. Given an input document, the system identifies the important concepts in the text and automatically links these concepts to the corresponding Wikipedia pages. Evaluations of the system showed that the automatic annotations are reliable and hardly distinguishable from manual annotations. Additionally, an evaluation of the system in an educational environment showed that the availability of encyclopedic knowledge within easy reach of a learner can improve both the quality of the knowledge acquired and the time needed to obtain such knowledge. This is joint work with Andras Csomai. | |
Feb. 15 | Graeme Hirst |
Real-word spelling correction with trigrams: A reconsideration of the Mays, Damerau, and Mercer model
The trigram-based noisy-channel model of real-word spelling-error correction that was presented by Mays, Damerau, and Mercer in 1991 has never been adequately evaluated or compared with other methods. We analyze the advantages and limitations of the method, and present a new evaluation that enables a meaningful comparison with the WordNet-based method of Hirst and Budanitsky. The trigram method is found to be superior, even on content words. We then show that optimizing over sentences gives better results than variants of the algorithm that optimize over fixed-length windows. This talk represents collaborative work by Amber Wilcox-O'Hearn, Graeme Hirst, and Alexander Budanitsky. | |
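Here is a minimal sketch of the noisy-channel idea behind the model: each typed word is kept with probability alpha or replaced by one of its real-word spelling variants, and candidate sentences are scored by a trigram language model times that channel model. The trigram table, confusion sets, and alpha below are invented for illustration.

```python
# Toy real-word spelling correction in the Mays-Damerau-Mercer style:
# score candidate sentences with trigram probabilities times a channel model.
from itertools import product

TRIGRAMS = {                                   # P(w3 | w1, w2), toy values
    ("<s>", "the", "role"): 0.04, ("<s>", "the", "roll"): 0.001,
    ("the", "role", "of"): 0.30,  ("the", "roll", "of"): 0.02,
    ("role", "of", "</s>"): 0.10, ("roll", "of", "</s>"): 0.10,
}
CONFUSION = {"roll": {"role"}, "role": {"roll"}, "of": set(), "the": set()}
ALPHA = 0.99                                   # probability the typed word is intended

def trigram_prob(sentence):
    padded = ["<s>"] + sentence + ["</s>"]
    p = 1.0
    for i in range(2, len(padded)):
        p *= TRIGRAMS.get(tuple(padded[i - 2 : i + 1]), 1e-6)
    return p

def correct(sentence):
    options = [[w] + sorted(CONFUSION.get(w, set())) for w in sentence]
    best, best_score = None, 0.0
    for candidate in product(*options):
        channel = 1.0
        for typed, chosen in zip(sentence, candidate):
            others = CONFUSION.get(typed, set())
            channel *= ALPHA if chosen == typed else (1 - ALPHA) / max(len(others), 1)
        score = channel * trigram_prob(list(candidate))
        if score > best_score:
            best, best_score = candidate, score
    return best

print(correct(["the", "roll", "of"]))          # -> ('the', 'role', 'of')
```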
Feb. 29 | Cancelled | Graduate Visit Day | |
Mar. 14 | (Afra Alishahi || Afsaneh Fazly) |
A Probabilistic Incremental Model of Word Learning in the Presence of Referential Uncertainty
Abstract: We present a probabilistic incremental model of word learning in children. The model acquires the meaning of words from exposure to word usages in sentences, paired with appropriate semantic representations, in the presence of referential uncertainty. A distinct property of our model is that it continually revises its learned knowledge of a word's meaning, but over time converges on the most likely meaning of the word. Another key feature is that the model bootstraps its own partial knowledge of word--meaning associations to help more quickly learn the meanings of novel words. Results of simulations on naturalistic child-directed data show that our model exhibits behaviours similar to those observed in the early lexical acquisition of children, such as vocabulary spurt and fast mapping. | |
Mar. 28 | Chris Parisien |
An Incremental Bayesian Model for Learning Syntactic Categories
Abstract: I present a method for the unsupervised learning of syntactic categories from text. The method uses an incremental Bayesian clustering algorithm to find groups of words that occur within similar syntactic contexts. The model draws information from the distributional cues of words within an utterance, while explicitly bootstrapping its development on its own partial knowledge of syntactic categories. Using a corpus of child-directed speech, we demonstrate the benefit of a syntactic bootstrap for an incremental categorization model. The model is robust to the noise in real language data, manages lexical ambiguity, and shows learning behaviours similar to what we observe in children. | |
Apr. 11 | Tim Fowler |
Navigating the parsing landscape
Abstract: We will introduce context free grammars (CFGs) and combinatory categorial grammars (CCGs) with a focus on how these formalisms deal with semantics. The known differences between the formalisms will be discussed and the Lambek calculus will be introduced as an ideal comparison point between the two. To do this, we will need to consider the formal language class of natural language. A recent polynomial time parsing result for the Lambek calculus will be introduced and we will discuss possible future research opened up by this result. | |
Fall 2007 | |||
Sept. 14 | CL Group | Fall 2007 Welcoming Meeting | |
Sept. 28 | Gerald Penn |
The Quantitative Study of Writing Systems
Abstract: If you understood all of the world's languages, you would still not be able to read many of the texts that you find on the world wide web, because they are written in non-Roman scripts -- often ones that have been arbitrarily encoded for electronic transmission in the absence of an accepted standard. This very modern nuisance reflects a dilemma as ancient as writing itself: the association between a language as it is spoken and its written form has a sort of internal logic to it that we can comprehend, but the conventions are different in every individual case --- even among languages that use the same script, or between scripts used by the same language. This conventional association between language and script, called a writing system, is indeed reminiscent of the Saussurean conception of language itself, a conventional association of meaning and sound, upon which modern linguistic theory is based. Despite linguists' reliance upon writing to present and preserve linguistic data, however, writing systems were a largely forgotten corner of linguistics until the 1960s, when Gelb presented their first classification. This talk will describe recent work that aims to place the study of writing systems upon a sound computational and statistical foundation. While archaeological decipherment may eternally remain the holy grail of this area of research, it also has applications to speech synthesis, machine translation, and multilingual document retrieval. | |
Oct. 12 | Paul Cook |
Pulling their Weight: Exploiting Syntactic Forms for the Automatic Identification of Idiomatic Expressions in Context
Abstract: Much work on idioms has focused on type identification, i.e., determining whether a sequence of words can form an idiomatic expression. Since an idiom type often has a literal interpretation as well, token classification of potential idioms in context is critical for NLP. We explore the use of informative prior knowledge about the overall syntactic behaviour of a potentially-idiomatic expression (type-based knowledge) to determine whether an instance of the expression is used idiomatically or literally (token-based knowledge). We develop unsupervised methods for the task, and show that their performance is comparable to that of standard supervised techniques. | |
Oct. 26 | Cancelled | Cancelled | |
Nov. 9 | Graeme Hirst |
Views of Text-Meaning in Computational Linguistics
Abstract: Three views of text-meaning compete in the philosophy of language: objective, subjective, and authorial -- "in" the text, or "in" the reader, or "in" the writer. Computational linguistics has ignored the competition and implicitly embraced all three, and rightly so; but different views have predominated at different times and in different applications. Contemporary applications mostly take the crudest view: meaning is objectively "in" a text. The more-sophisticated applications now on the horizon, however, demand the other two views: as the computer takes on the user's purpose, it must also take on the user's subjective views; but sometimes, the user's purpose is to determine the author's intent. Accomplishing this requires, among other things, an ability to determine what could have been said but wasn't, and hence a sensitivity to linguistic nuance. It is therefore necessary to develop computational mechanisms for this sensitivity. | |
Nov. 23 | Diana Raffman |
Psychological Hysteresis and the Nontransitivity of Insignificant Differences
Abstract: Vague words in natural language cause semantic and logical problems in a variety of disciplines. An especially persistent problem has to do with the nontransitivity of insignificant differences. For example, if eating one candy won't make me fat, then eating two won't; but if eating two won't, then eating three won't; and so on. It seems to follow that eating a thousand pieces of candy won't make me fat. This paradoxical result shows that the word 'fat' is vague. Similarly, if Hillary Clinton is a person, then she was a person one second ago; and if she was a person one second ago, then she was a person two seconds ago; etc. It seems to follow that the conceptus from which Hillary Clinton developed was also a person. The word 'person' is vague. Clearly there is something wrong with this paradoxical form of reasoning, but a satisfactory diagnosis has not been found. In this talk I will propose a diagnosis that appeals to the hysteretical nature of our judgments involving vague words. To that end I will present preliminary results of a psychological study of our use of vague words. | |
Dec. 7 | TBD | TBD |