Queries


Computational Complexity | Death Penalty | Abortion | Gun Control
Computational Geometry | Movies | Net Censorship | Genetic



In these pages we present the results of nine different link-analysis web page ranking algorithms for the eight queries shown above. The algorithms operate on a collection of pages that is created following the guidelines of Kleinberg [1]. The search engine AltaVista is queried for each of the queries shown above (when a query consists of more than one word, we put the `+' symbol in front of the terms, so as to ensure that all pages contain all the query terms). The first 200 pages returned by AltaVista form the Root Set. For each page in the Root Set, we store all the out-links of that page and the first 50 in-links, in the order in which AltaVista returns them. We then expand the Root Set into the Base Set by including the pages reached through the in-links and out-links of the pages in the Root Set. Given the Base Set, we construct the underlying graph induced by this set of pages: we include a node for each page and a directed edge for each link between two pages. We remove edges that connect two nodes within the same domain, since such links usually serve navigation purposes, and we then delete isolated nodes. The final graph is given as input to the link-analysis algorithms.
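The construction above can be sketched roughly as follows. This is a minimal illustration, not the code used for the experiments: the function names, the dictionary inputs, and the hostname-equality test for "same domain" are all assumptions, and the sketch only knows about links that touch a Root Set page.

```python
from urllib.parse import urlparse


def same_host(u, v):
    # Assumption: two URLs count as "same domain" when their hostnames match.
    return urlparse(u).hostname == urlparse(v).hostname


def build_graph(root_set, out_links, in_links, max_in=50, max_out=None):
    """Return (nodes, edges) for the graph built around root_set.

    out_links / in_links map a Root Set URL to an ordered list of URLs
    (hypothetical input format). max_out=None keeps every out-link.
    """
    edges = set()
    for page in root_set:
        outs = out_links.get(page, [])
        if max_out is not None:
            outs = outs[:max_out]
        # Out-links point from the root page to the linked page.
        edges.update((page, target) for target in outs)
        # In-links point from the linking page to the root page.
        edges.update((source, page) for source in in_links.get(page, [])[:max_in])

    # Drop links inside one domain (this also removes self-loops).
    edges = {(u, v) for u, v in edges if not same_host(u, v)}
    # Keeping only endpoints of surviving edges deletes isolated nodes.
    nodes = {page for edge in edges for page in edge}
    return nodes, edges
```

Passing `max_out=50` would correspond to capping the out-links per Root Set page, while the default keeps all of them.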

There are two parameters that determine the final graph:

  1. The number of out-links included when expanding the Root Set.
  2. The algorithm used for detecting the intra-domain links.

For the first parameter, our implementation offers two options: either include in the Base Set only the first 50 out-links of each page in the Root Set (which somewhat restricts the growth of the Base Set), or include in the Base Set all out-links of the pages in the Root Set (which complies with the specifications of Kleinberg). For the second parameter, we have implemented an algorithm that detects intra-domain links using only the IP addresses of the two pages, and a more refined algorithm that uses the IP addresses as well as a heuristic that examines the actual web addresses of the pages.
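To make the difference between the two detection options concrete, here is a hedged sketch. The text does not specify the heuristic on web addresses, so the rule used below (comparing the last two labels of the hostname) is purely an illustrative assumption, as are the function names and the pluggable `resolve` argument.

```python
import socket
from urllib.parse import urlparse


def host(url):
    return (urlparse(url).hostname or "").lower()


def dns_resolve(hostname):
    # Returns an IPv4 address string, or None when the lookup fails.
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def intra_domain_by_ip(u, v, resolve=dns_resolve):
    # Simple option: both pages resolve to the same IP address.
    ip_u, ip_v = resolve(host(u)), resolve(host(v))
    return ip_u is not None and ip_u == ip_v


def site_suffix(url, labels=2):
    # Hypothetical heuristic: keep the last `labels` parts of the hostname.
    return ".".join(host(url).split(".")[-labels:])


def intra_domain_refined(u, v, resolve=dns_resolve):
    # Refined option: same IP, or web addresses that look like one site.
    if intra_domain_by_ip(u, v, resolve):
        return True
    suffix = site_suffix(u)
    return suffix != "" and suffix == site_suffix(v)
```

With a stub resolver such as `{"www.a.example": "10.0.0.1", "img.a.example": "10.0.0.2"}.get`, the IP test treats the two hosts as different domains while the refined test treats them as the same one.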

We have produced three different datasets:

The WWW10 paper [3] contains the experiments for the expanded datasets.

For each of these datasets we run nine different algorithms:

After selecting a query, the viewer can browse through the results of the nine algorithms for each of the datasets. Clicking on the name of an algorithm brings up the ranking of this algorithm for this specific dataset. Clicking on the name of a dataset brings up the top ten results of all the algorithms for this dataset. Clicking on the "Comparisons" link brings up an "intersection table", which gives, for each pair of algorithms, the number of pages that appear in the top ten of both algorithms. Finally, the "Dataset Comparisons" link brings up a table that presents, for each pair of datasets, the intersection between the top ten pages reported by the same algorithm. The purpose of this table is to examine the effect that changing the dataset has on the results of the algorithm.
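As an illustration of what an intersection table contains, the following minimal sketch counts the pages shared by the top-k lists of every pair of rankings. The function name, the input format, and the placeholder algorithm names are assumptions made for this example.

```python
from itertools import combinations


def intersection_table(rankings, k=10):
    """Map each pair of ranking names to the size of their top-k overlap.

    rankings: dict from a name to a list of pages, best page first.
    """
    tops = {name: set(pages[:k]) for name, pages in rankings.items()}
    return {
        (a, b): len(tops[a] & tops[b])
        for a, b in combinations(sorted(tops), 2)
    }


# Placeholder names and pages, with k=3 to keep the example short.
example = {
    "alg_a": ["p1", "p2", "p3"],
    "alg_b": ["p3", "p2", "p9"],
    "alg_c": ["p7"],
}
table = intersection_table(example, k=3)
```

The same function would serve for the dataset comparison if the keys were dataset names and the lists were the rankings produced by one algorithm on each dataset.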

Begin by clicking on one of the queries!


  1. J. Kleinberg. Authoritative Sources in a Hyperlinked Environment. Journal of the ACM (JACM), 46, 1999.
  2. R. Lempel and S. Moran. The stochastic approach for link-structure analysis (pSALSA) and the TKC effect. In 9th International World Wide Web Conference, May 2000.
  3. A. Borodin, G. O. Roberts, J. S. Rosenthal, and P. Tsaparas. Finding Authorities and Hubs from Link Structures on the World Wide Web. In 10th International World Wide Web Conference, May 2001.