Queries
In these pages we present the results of nine different link-analysis
web page ranking algorithms, for the eight queries shown above.
The algorithms operate on a collection of pages that is created
following the guidelines of Kleinberg [1]. The search engine
AltaVista is queried for each of the queries shown above (when a query
consists of more than one word, we put the `+' symbol in front of each word,
so as to ensure that all returned pages contain all the query terms).
The first 200 pages returned by AltaVista form the Root Set.
For each page in the Root Set, we store all of its out-links
and the first 50 of its in-links, in the order in which
AltaVista returns them.
We then expand the Root Set into the
Base Set by including the in-links and out-links of
the pages in the Root Set. Given the Base Set, we construct the
underlying graph, induced by this set of pages: we include a node
for each page, and a (directed) edge for each link between two pages.
We remove edges that connect two pages within the same domain, since such
links usually serve navigational purposes, and we delete isolated nodes.
The final graph is given as input to the link-analysis algorithms.
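The construction above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used for these experiments: it assumes the search-engine results have already been collected into plain dictionaries (the names `build_graph`, `root_set`, `out_links`, `in_links` and `same_domain` are hypothetical), and that the links among Base Set pages are all recorded in those dictionaries.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def build_graph(
    root_set: List[str],
    out_links: Dict[str, List[str]],
    in_links: Dict[str, List[str]],
    same_domain: Callable[[str, str], bool],
    out_limit: Optional[int] = 50,
    in_limit: int = 50,
) -> Tuple[Set[str], Set[Edge]]:
    """Hypothetical sketch: expand a Root Set into a Base Set and build the link graph."""
    # Base Set = Root Set plus (capped) out-links and in-links of each root page.
    # out_limit=None includes every out-link, as in Kleinberg's specification.
    base: Set[str] = set(root_set)
    for page in root_set:
        outs = out_links.get(page, [])
        base.update(outs if out_limit is None else outs[:out_limit])
        base.update(in_links.get(page, [])[:in_limit])

    # Every known link, whether recorded as an out-link of u or an in-link of v.
    links: Set[Edge] = {(u, v) for u, vs in out_links.items() for v in vs}
    links |= {(w, p) for p, ws in in_links.items() for w in ws}

    # Keep links between Base Set pages, dropping intra-domain (navigational) links.
    edges = {
        (u, v) for u, v in links
        if u in base and v in base and not same_domain(u, v)
    }

    # Isolated nodes are dropped: only pages touched by a remaining edge survive.
    nodes = {p for edge in edges for p in edge}
    return nodes, edges
```

The intra-domain test is passed in as a function so that either detection strategy can be plugged in without changing the graph construction.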
There are two parameters that determine the final graph:
- (a) The number of out-links included when expanding the Root Set.
- (b) The algorithm used for detecting intra-domain links.
For part (a), our implementation offers two options: either include
in the Base Set only the first 50 out-links of each page in the Root Set
(which somewhat restricts the growth of the Base Set),
or include in the Base Set all out-links of the pages in the Root Set
(which complies with Kleinberg's specification).
For intra-domain link detection, we have implemented
a simple algorithm that detects intra-domain links using the IP addresses
of the two pages, and a more refined algorithm that uses the IP addresses
of the two pages as well as a heuristic that examines the
actual web addresses (URLs) of the pages.
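The two detectors might look roughly as follows. The actual URL heuristic of the refined algorithm is not given here, so the comparison below (treating two hosts as the same domain when the last two labels of their hostnames agree) is a hypothetical stand-in. IP addresses are also assumed to be precomputed into an `ip_of` dictionary rather than resolved over the network; all function names are illustrative.

```python
from typing import Dict
from urllib.parse import urlsplit


def host(url: str) -> str:
    # urlsplit().hostname is already lower-cased; it is None for malformed URLs.
    return urlsplit(url).hostname or ""


def same_domain_simple(u: str, v: str, ip_of: Dict[str, str]) -> bool:
    """Simple detector: two pages are intra-domain if their hosts share an IP address."""
    ip_u, ip_v = ip_of.get(host(u)), ip_of.get(host(v))
    return ip_u is not None and ip_u == ip_v


def domain_suffix(hostname: str, labels: int = 2) -> str:
    # Hypothetical heuristic: keep only the last `labels` labels, e.g. "toronto.edu".
    return ".".join(hostname.split(".")[-labels:])


def same_domain_refined(u: str, v: str, ip_of: Dict[str, str]) -> bool:
    """Refined detector (sketch): same IP address, or matching URL domain suffix."""
    if same_domain_simple(u, v, ip_of):
        return True
    hu, hv = host(u), host(v)
    return bool(hu) and domain_suffix(hu) == domain_suffix(hv)
```

A fixed two-label suffix is deliberately crude: it would wrongly merge hosts such as `www.ox.ac.uk` and `www.cam.ac.uk`, which is one reason a practical heuristic needs more care than this sketch.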
We have produced three different datasets:
- Regular Dataset: The Base Set is constructed by including
only the first 50 out-links of each Root page. We use the simple algorithm
for detecting intra-domain links.
- Refined Dataset: The Base Set is constructed by including
only the first 50 out-links of each Root page. We use the refined
algorithm for detecting intra-domain links.
- Expanded Dataset: The Base Set is constructed by
including all out-links of each page in the Root Set. We use the
refined algorithm for detecting intra-domain links.
The WWW10 paper [3] contains the experiments for the Expanded Dataset.
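The three datasets differ only in the out-link limit and in the choice of intra-domain detector, so they can be summarized as data. The class and field names below are hypothetical and simply restate the configurations described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetConfig:
    """Hypothetical record of the two parameters that define a dataset."""
    name: str
    out_link_limit: Optional[int]  # None means "include all out-links" (Kleinberg's setting)
    intra_domain: str              # "simple" (IP only) or "refined" (IP plus URL heuristic)


DATASETS = [
    DatasetConfig("Regular", out_link_limit=50, intra_domain="simple"),
    DatasetConfig("Refined", out_link_limit=50, intra_domain="refined"),
    DatasetConfig("Expanded", out_link_limit=None, intra_domain="refined"),
]
```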
For each of these datasets we run nine different algorithms:
- Kleinberg: The hubs and authorities algorithm as
described by Kleinberg [1].
- pSALSA: The pSALSA algorithm as described
in our paper [3].
- HubAvg: The Hub-Averaging Kleinberg algorithm as
described in our paper [3].
- AThresh: The Authority Threshold Kleinberg algorithm
as described in our paper [3].
- HThresh: The Hub Threshold Kleinberg algorithm as
described in our paper [3].
- FThresh: The Full Threshold Kleinberg algorithm as
described in our paper [3].
- BFS: The BFS algorithm as described in our paper [3].
- SBayesian: The Simplified Bayesian algorithm as
described in our paper [3].
- Bayesian: The Bayesian algorithm as described in our
paper [3].
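For concreteness, here is a sketch of the first and third algorithms in the list, written from their usual definitions: an authority's weight is the sum of the weights of the hubs pointing to it, and a hub's weight is the sum (Kleinberg) or the average (HubAvg) of the authority weights of the pages it points to. The sketch runs a fixed number of iterations with L2 normalization instead of testing for convergence; the function name and parameters are illustrative, and the definitions in [1] and [3] remain authoritative.

```python
import math
from typing import Dict, Iterable, List, Tuple


def _normalize(weights: Dict[str, float]) -> Dict[str, float]:
    # Scale to unit L2 norm; leave an all-zero vector unchanged.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {k: w / norm for k, w in weights.items()} if norm > 0 else weights


def authority_weights(
    nodes: Iterable[str],
    edges: Iterable[Tuple[str, str]],
    hub_average: bool = False,
    iterations: int = 100,
) -> Dict[str, float]:
    """Kleinberg's iteration (hub_average=False) or Hub-Averaging (hub_average=True)."""
    nodes = list(nodes)
    out_nb: Dict[str, List[str]] = {n: [] for n in nodes}
    in_nb: Dict[str, List[str]] = {n: [] for n in nodes}
    for u, v in edges:
        out_nb[u].append(v)
        in_nb[v].append(u)

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of the hub weights of the pages linking in.
        auth = _normalize({i: sum(hub[j] for j in in_nb[i]) for i in nodes})
        # Hub update: sum (Kleinberg) or average (HubAvg) of authority weights linked to.
        hub = {}
        for j in nodes:
            total = sum(auth[i] for i in out_nb[j])
            if hub_average and out_nb[j]:
                total /= len(out_nb[j])
            hub[j] = total
        hub = _normalize(hub)
    return auth
```

Sorting the pages by decreasing weight, for example `sorted(w, key=w.get, reverse=True)` with `w = authority_weights(nodes, edges)`, yields a ranking of the kind these pages present.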
When selecting a query, the viewer can browse through the results
of the nine algorithms for each of the datasets. Clicking on the name
of an algorithm brings up the ranking produced by that algorithm for this
specific dataset. Clicking on the name of a dataset brings up the
top ten results of all the algorithms for that dataset. Clicking on
the "Comparisons" link brings up an "intersection table" which
provides, for each pair of algorithms, the number of pages that appear
in the top ten of both algorithms. Finally, the link
"Dataset Comparisons" brings up a table that presents,
for each pair of datasets, the
intersection between the top ten pages reported by the same
algorithm. The purpose of this table is to
examine the effect of changing the dataset on the results of each
algorithm.
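Both comparison tables come down to counting overlaps between top-ten lists. A minimal sketch, assuming each ranking is a list of page URLs ordered best first (the function name is illustrative):

```python
from itertools import combinations
from typing import Dict, List, Tuple


def intersection_table(
    rankings: Dict[str, List[str]], k: int = 10
) -> Dict[Tuple[str, str], int]:
    """For each unordered pair of labels, count the pages shared by their top-k lists."""
    top = {label: set(ranking[:k]) for label, ranking in rankings.items()}
    return {
        (a, b): len(top[a] & top[b])
        for a, b in combinations(sorted(top), 2)
    }
```

Keyed by algorithm name for a single dataset, the result corresponds to the algorithm intersection table; keyed by dataset name for a single algorithm, it corresponds to that algorithm's entries in the dataset comparison.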
Begin by clicking on one of the queries!
- [1] J. Kleinberg. Authoritative Sources in a
Hyperlinked Environment. Journal of the ACM (JACM), 46, 1999.
- [2] R. Lempel and S. Moran. The stochastic approach for
link-structure analysis (SALSA) and the TKC effect. In
9th International World Wide Web Conference, May 2000.
- [3] A. Borodin, G. O. Roberts, J. S. Rosenthal, P. Tsaparas.
Finding Authorities and Hubs from Link Structures on the World Wide Web.
In 10th International World Wide Web Conference, May 2001.