2. Web Catalogs

At the core of the Web catalog concept is its data model, the WebCat Data Model (WDM). In this data model, the information is organized as a directed tree: a directed graph with no cycles in which exactly one vertex, called the root of the tree, has in-degree zero and every other vertex has in-degree 1. Arcs are labeled with atomic values, and arcs that stem from the same vertex have distinct labels. Vertices are labeled not with atomic values but with records. Each record has a unique ID that is built using a specific rule. We present the WDM in Figure 1.

Figure 1: WebCat data model

Vertices in WDM are classified as root, branches, and leaves. A leaf is a vertex with zero out-degree. A branch is a vertex that is neither the root nor a leaf.
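The classification above can be sketched in a few lines of Python. This is only an illustration: the adjacency-dictionary representation, the function name `classify`, and the toy vertex names are assumptions and do not come from the WDM definition itself.

```python
from typing import Dict, List


def classify(children: Dict[str, List[str]], root: str) -> Dict[str, str]:
    """Label each vertex of a tree as 'root', 'branch', or 'leaf'.

    `children` maps every vertex to the list of vertices its outgoing
    arcs point to (an empty list means zero out-degree).
    """
    kinds: Dict[str, str] = {}
    for vertex, kids in children.items():
        if vertex == root:
            # Checked first, so a childless root is still reported as root.
            kinds[vertex] = "root"
        elif not kids:
            kinds[vertex] = "leaf"      # zero out-degree
        else:
            kinds[vertex] = "branch"    # neither the root nor a leaf
    return kinds


# Hypothetical toy tree; the vertex names are illustrative only.
toy = {"home": ["men", "women"], "men": ["shirt"], "women": [], "shirt": []}
kinds = classify(toy, "home")
```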

The records associated with vertices have different schemas for different vertex categories. The record schema associated with branches is:

[ID, label, URL].

The root has a similar record associated with it:

["root","root",URL]

For the leaves the associated record schema is:

[ID, label, URL, attr1, attr2]

Arcs are labeled with atomic values that are instances of string. Moreover, the value of an arc's label is the same as the value of the label attribute of the record associated with the vertex to which the arc points.
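One possible way to represent these records and the arc-label rule is sketched below in Python. It is a minimal sketch under stated assumptions: the class name `Vertex`, the method `add_child`, the ID values, the `example.com` URLs, and the values given to attr1 and attr2 are all hypothetical. In particular, the rule used to build IDs is not modeled here. Storing outgoing arcs in a dictionary keyed by the child's label makes each arc label equal to the label of the vertex it points to and keeps sibling arc labels distinct.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Vertex:
    """A WDM vertex together with its record [ID, label, URL, attr1, attr2]."""
    id: str
    label: str
    url: str
    attr1: Optional[str] = None   # used only by leaf records
    attr2: Optional[str] = None   # used only by leaf records
    # Outgoing arcs: arc label -> vertex the arc points to.
    children: Dict[str, "Vertex"] = field(default_factory=dict)

    def add_child(self, child: "Vertex") -> "Vertex":
        """Attach `child` with an arc labeled by the child's own label."""
        if child.label in self.children:
            raise ValueError(f"duplicate arc label {child.label!r}")
        self.children[child.label] = child
        return child


# Hypothetical instance: IDs, URLs, and attribute values are placeholders.
root = Vertex("root", "root", "http://example.com/")
men = root.add_child(Vertex("b1", "men", "http://example.com/men"))
men.add_child(
    Vertex("l1", "oxford shirt", "http://example.com/men/oxford",
           attr1="blue", attr2="cotton")
)
```

Attempting to add a second child labeled "men" under the root raises a `ValueError`, which mirrors the requirement that arcs leaving the same vertex carry distinct labels.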

Vertices are situated on different levels. We say that a set of vertices belong to the same level if they are situated at the same distance from the root. By distance we mean the number of vertices that have to be traversed in order to reach the vertex from the root.
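The grouping of vertices into levels can be illustrated with a breadth-first traversal. The sketch below is an assumption-laden example, not part of the WDM definition: the function name `levels`, the adjacency-dictionary input, and the toy vertex names are hypothetical, and distance is counted as the number of arcs from the root (so the root is at distance 0). Counting traversed vertices instead would shift every distance by a constant and leave the grouping unchanged.

```python
from collections import deque
from typing import Dict, List


def levels(children: Dict[str, List[str]], root: str) -> Dict[int, List[str]]:
    """Group the vertices of a tree by their distance from the root."""
    result: Dict[int, List[str]] = {}
    queue = deque([(root, 0)])
    while queue:
        vertex, depth = queue.popleft()
        result.setdefault(depth, []).append(vertex)
        # Vertices missing from `children` are treated as leaves.
        for child in children.get(vertex, []):
            queue.append((child, depth + 1))
    return result


# Hypothetical toy tree; the vertex names are illustrative only.
toy_tree = {"home": ["men", "women"], "men": ["shirts"], "women": ["dresses"]}
by_level = levels(toy_tree, "home")
```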

We present next an instance of the WebCat data model, the data schema of the Eddie-Bauer (EB) clothing catalog (Figure 2). In this schema each vertex is associated with an HTML file. The root vertex is the home page of the EB catalog, each branch represents a classification criterion, and a leaf is a product.

Figure 2: Instance of the WebCat data model, the data schema of the Eddie-Bauer catalog

By Web catalog, we mean a collection of interrelated HTML pages connected by hyperlinks according to a logical hierarchy. Therefore, a Web catalog has an entry point from which the whole hierarchy can be explored. We call this entry point the catalog home page. A Web catalog can be a subpart of a larger Web catalog.

Thus, a Web catalog is a collection of HTML pages that can accommodate a WebCat data model. By accommodate we mean that, starting with the initial collection of HTML pages and applying a small number of transformations, we can obtain a new collection that respects the WebCat data model.


© 1998 University of Toronto