In these pages we present the results of nine different link-analysis
web page ranking algorithms, for the eight queries shown above.
The algorithms operate on a collection of pages that is created
following the guidelines of Kleinberg [1]. The search engine
AltaVista is queried for each of the queries shown above (when a query
consists of more than one word, we put the `+' symbol in front of each word,
so as to ensure that all returned pages contain all the query terms).
The first 200 pages returned by AltaVista form the Root Set.
For each page in the Root Set, we store all the out-links of that
page, and the first 50 in-links, in the order
they are returned by AltaVista.
We then expand the Root Set into the
Base Set by including the in-links and out-links of
the pages in the Root Set. Given the Base Set, we construct the
underlying graph, induced by this set of pages: we include a node
for each page, and a (directed) edge for each link between two pages.
We remove edges that connect two nodes within the same domain, since they
usually serve navigational purposes, and we then delete isolated nodes.
The final graph is given as input to the link-analysis algorithms.
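
The graph construction described above can be sketched as follows. This is a minimal illustration, not the implementation used for these experiments: the names `host_of` and `build_graph` are hypothetical, and two pages are assumed to be in the same domain when their URLs share a host name, which is a simplification of the IP-based detection used here.

```python
from urllib.parse import urlparse

def host_of(url):
    """Hypothetical same-domain key: the lowercased host name of the URL."""
    return (urlparse(url).hostname or "").lower()

def build_graph(base_set, links):
    """Return (nodes, edges) for the graph induced by base_set.

    base_set: iterable of page URLs.
    links: iterable of (source_url, target_url) pairs.
    """
    pages = set(base_set)
    edges = set()
    for src, dst in links:
        # Keep only links whose two endpoints are both in the Base Set.
        if src not in pages or dst not in pages:
            continue
        # Drop intra-domain links (assumption: same host means same domain).
        if host_of(src) == host_of(dst):
            continue
        edges.add((src, dst))
    # Delete isolated nodes: keep only pages that still touch an edge.
    nodes = {page for edge in edges for page in edge}
    return nodes, edges
```

Deleting isolated nodes after the edge filtering matters: a page whose only links were intra-domain becomes isolated once those edges are removed.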
There are two parameters that determine the final graph:
We have produced three different datasets:
For each of these datasets we run nine different algorithms:
For part (a), our implementation offers two options: either include
in the Base Set only the first 50 out-links of each page in the Root Set
(which somewhat restricts the growth of the Base Set),
or include in the Base Set all out-links of the pages in the Root Set
(which complies with the specifications of Kleinberg).
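
The two options can be illustrated with a short sketch of the Base Set expansion. The function name `expand_root_set` and its parameters `max_in` and `max_out` are hypothetical; the link lists are assumed to be already fetched and stored in the order in which they were returned.

```python
def expand_root_set(root_set, out_links, in_links, max_in=50, max_out=None):
    """Expand the Root Set into the Base Set.

    root_set: list of page URLs.
    out_links, in_links: dicts mapping a page URL to a list of URLs.
    max_in: number of in-links kept per Root Set page.
    max_out: number of out-links kept per Root Set page, or None for all.
    """
    base_set = set(root_set)
    for page in root_set:
        outs = out_links.get(page, [])
        if max_out is not None:
            # Restricted option: only the first max_out out-links.
            outs = outs[:max_out]
        base_set.update(outs)
        # In-links are always capped, in the order they were returned.
        base_set.update(in_links.get(page, [])[:max_in])
    return base_set
```

With `max_out=50` this corresponds to the restricted option, and with `max_out=None` to the option that keeps all out-links.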
For the intra-domain link detection, we have implemented
two algorithms: a simple one that detects intra-domain links using the IP addresses
of the two pages, and a more refined one that uses the IP addresses
as well as a heuristic that examines the
actual web addresses of the pages.
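
As a rough illustration of the two detection variants, consider the sketch below. The heuristic actually used on the web addresses is not specified here, so everything in this example is an assumption: the helper names, the pre-resolved `ip_of` table (host name to IP address), the naive comparison of the last two labels of the host name, and the choice to combine the two tests with "or".

```python
from urllib.parse import urlparse

def same_ip(url_a, url_b, ip_of):
    """Simple variant: both hosts resolve to the same (known) IP address."""
    ip_a = ip_of.get(urlparse(url_a).hostname)
    ip_b = ip_of.get(urlparse(url_b).hostname)
    return ip_a is not None and ip_a == ip_b

def name_suffix(url, labels=2):
    """Hypothetical heuristic key: the last `labels` parts of the host name."""
    host = (urlparse(url).hostname or "").lower()
    return ".".join(host.split(".")[-labels:])

def is_intra_domain(url_a, url_b, ip_of):
    """Refined variant (assumed): same IP, or same host-name suffix."""
    return same_ip(url_a, url_b, ip_of) or name_suffix(url_a) == name_suffix(url_b)
```

The suffix comparison is deliberately naive: it would, for example, treat all hosts under a shared suffix such as `co.uk` as one domain, so a real implementation would need a more careful rule.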
The WWW10 paper [3] contains the experiments for the expanded datasets.
When selecting a query, the viewer can browse through the results
of the nine algorithms for each of the datasets. Clicking on the name
of an algorithm brings up the ranking of this algorithm for this
specific dataset. Clicking on the name of the dataset brings up the
top ten results of all the algorithms for this dataset. Clicking on
the "Comparisons" link brings up an "intersection table", which
provides for each pair of algorithms the number of pages that appear
in the top ten of both algorithms. Finally, the link
"Dataset Comparisons" brings up a table that presents
for each pair of datasets the
intersection between the top ten pages reported by the same
algorithm. The purpose of this table is to
examine the effect of the change of the dataset to the results of the
algorithm.
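
An intersection table of this kind can be computed with a few lines of code. The sketch below is illustrative only; the function name `intersection_table` and the input format (a dictionary from a name to a ranked list of pages) are assumptions.

```python
from itertools import combinations

def intersection_table(rankings, k=10):
    """Count the pages shared by the top-k lists of every pair of rankings.

    rankings: dict mapping a name (algorithm or dataset) to a ranked list
    of pages, best first.
    Returns a dict mapping each pair of names to the size of the overlap.
    """
    tops = {name: set(pages[:k]) for name, pages in rankings.items()}
    return {
        (a, b): len(tops[a] & tops[b])
        for a, b in combinations(sorted(tops), 2)
    }
```

The same function serves both tables: keyed by algorithm name for a fixed dataset it gives the "Comparisons" table, and keyed by dataset name for a fixed algorithm it gives the "Dataset Comparisons" table.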