1. What is WebCat?

We address the problem of automating search across several collections of HTML pages, where the pages within each collection are interrelated by hyperlinks and the collections share a common domain. We are interested only in collections whose common domain can be organized into some kind of hierarchy.

In this work, we define such collections as Web catalogs and present WebCat, a paradigm for integrating them. In addition, we present an implementation of this paradigm for the domain of clothing catalogs. We refer to a collection of integrated catalogs as a virtual catalog. A virtual catalog acts like a collection of real catalogs: the location of the real catalogs and the classification criteria used by each individual catalog are completely transparent to the user.
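As a rough illustration of the virtual catalog idea, the Python sketch below merges two real catalogs, each with its own classification, into one shared hierarchy that can be browsed without knowing where an item lives. It is not the WebCat implementation: the class names (Item, RealCatalog, VirtualCatalog), the hand-written category mappings, the example.com URLs and the sample clothing items are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Path = Tuple[str, ...]  # a category path, e.g. ("Clothing", "Tops")


@dataclass
class Item:
    name: str
    url: str  # location of the item's page in its real catalog


@dataclass
class RealCatalog:
    site: str
    # Items filed under this catalog's own (local) category paths.
    items_by_category: Dict[Path, List[Item]]
    # Hypothetical hand-written mapping: local path -> shared hierarchy path.
    to_shared: Dict[Path, Path]


class VirtualCatalog:
    """Presents several real catalogs as one catalog with a shared hierarchy."""

    def __init__(self, catalogs: Sequence[RealCatalog]) -> None:
        self._index: Dict[Path, List[Item]] = {}
        for catalog in catalogs:
            for local_path, items in catalog.items_by_category.items():
                shared_path = catalog.to_shared[local_path]
                self._index.setdefault(shared_path, []).extend(items)

    def browse(self, prefix: Sequence[str]) -> List[Item]:
        """Return every item filed at or below `prefix` in the shared hierarchy."""
        prefix = tuple(prefix)
        found: List[Item] = []
        for path, items in self._index.items():
            if path[:len(prefix)] == prefix:
                found.extend(items)
        return found


# Two made-up catalogs that classify their items differently.
shop_a = RealCatalog(
    site="shop-a.example.com",
    items_by_category={
        ("Men", "Shirts"): [Item("Oxford shirt", "http://shop-a.example.com/men/shirts/1")],
    },
    to_shared={("Men", "Shirts"): ("Clothing", "Tops", "Shirts")},
)
shop_b = RealCatalog(
    site="shop-b.example.com",
    items_by_category={
        ("Tops",): [Item("Linen shirt", "http://shop-b.example.com/tops/7")],
        ("Trousers",): [Item("Chinos", "http://shop-b.example.com/trousers/3")],
    },
    to_shared={
        ("Tops",): ("Clothing", "Tops", "Shirts"),
        ("Trousers",): ("Clothing", "Bottoms", "Trousers"),
    },
)

virtual = VirtualCatalog([shop_a, shop_b])
tops = [item.name for item in virtual.browse(["Clothing", "Tops"])]
print(tops)
```

Browsing the shared path returns shirts from both shops, although one shop files them under "Men / Shirts" and the other under "Tops". In this sketch the per-catalog mapping is simply given by hand.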

The WebCat method is inspired by several research projects in the areas of information integration and wrapper generation for Web sources. The novelty of our method consists in:

 


© 1998 University of Toronto