
Thesis abstract

We describe a novel method for automatically generating hypertext links within and between newspaper articles. The method is based on lexical chaining, a technique for extracting the sets of related words that occur in texts. Links between the paragraphs of a single article are built by considering the distribution of the lexical chains in that article. Links between articles are built by considering how the chains in the two articles are related. By using lexical chaining we mitigate the problems of synonymy and polysemy that plague traditional information retrieval approaches to automatic hypertext generation.
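As a rough illustration of the within-article idea, the following minimal sketch links paragraphs whose lexical chains are distributed similarly. It is not the thesis's implementation: the chains are hand-written word sets rather than the output of a lexical chainer, and the names chain_vector, cosine, link_paragraphs, and the 0.5 threshold are assumptions made for this example only.

```python
import math
from collections import Counter

def chain_vector(paragraph, chains):
    """Count how many words of each chain occur in the paragraph."""
    words = Counter(paragraph.lower().split())
    return [sum(words[w] for w in chain) for chain in chains]

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def link_paragraphs(paragraphs, chains, threshold=0.5):
    """Return index pairs of paragraphs whose chain vectors are similar.

    The threshold is an illustrative assumption, not a value from the thesis.
    """
    vectors = [chain_vector(p, chains) for p in paragraphs]
    links = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                links.append((i, j))
    return links

# Hypothetical chains: each set groups related words, including synonyms.
CHAINS = [{"car", "automobile", "vehicle"}, {"bank", "loan", "money"}]
PARAGRAPHS = [
    "the car maker unveiled a new vehicle",
    "the bank approved the loan",
    "each automobile is a vehicle built locally",
]

print(link_paragraphs(PARAGRAPHS, CHAINS))
```

Because "car" and "automobile" belong to the same chain, the first and third paragraphs are linked even though that synonym pair never co-occurs in either one, which is the kind of synonymy problem the chain-based approach is meant to mitigate.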

To motivate our research, we discuss the results of a study showing that humans are inconsistent when assigning hypertext links within newspaper articles. Even if humans were consistent, the time and cost of building a large hypertext by hand make relying on human linkers untenable. This leaves automatic hypertext generation as the practical alternative.

To determine how our hypertext generation methodology performs relative to other proposed methodologies, we present a study comparing it with a methodology based on a traditional information retrieval approach. In this study, subjects were asked to perform a question-answering task using a combination of links generated by our methodology and the competing methodology. The result is that links between articles generated using our methodology have a significant advantage over links generated by the competing methodology. We show combined results for all subjects tested, along with results based on subjects' experience in using the World Wide Web.

We detail the construction of a system for performing automatic hypertext generation in the context of an online newspaper. The proposed system is fully capable of handling large databases of news articles in an efficient manner.

Download:  pdf file (440 Kb); gzipped PostScript file (258 Kb); uncompressed PostScript file (1026 Kb).
Request paper copy: Send request with postal address to gh@cs.toronto.edu.