Data Sets for Link Analysis Ranking Experiments

The data used in the experiments can be downloaded here. Clicking any of the query links downloads a compressed tar file (.tar.Z). The archive contains a directory named after the query (plus some underscores), and inside it a directory "graph" with three files: "nodes", "adj_list", and "inv_adj_list". The file formats are explained below the table of links. The link "All datasets" downloads a tar file with the directories for all queries. The link "list2matrix.c" downloads C code that converts the adjacency list into an adjacency matrix.

If you run into any problems or need further assistance, feel free to e-mail me at tsap @ cs.toronto.edu.




All datasets









abortion, affirmative action, alcohol, amusement parks, architecture, automobile industries
armstrong, basketball, blues, cheese, classical guitar, complexity
computational complexity, computational geometry, death penalty, genetic, geometry, globalization
gun control, iraq war, jaguar, jordan, moon landing, movies
national parks, net censorship, randomized algorithms, recipes, roswell, search engines
shakespeare, table tennis, vintage cars, weather








list2matrix.c




The Nodes file: The file nodes.txt is formatted as follows. The first entry gives the number of pages in the graph, followed by a list of page entries. An example of a page entry:

34 (67) [R]
http://www.ece.wpi.edu/~jinlee/events/wave/sld024.htm
Accuracy & Computational Complexity
0 1
 

The first number is the page id, a unique identifier for each page. The second number, in parentheses, is an id assigned to the page when it was first entered into the base set (it can be ignored). The character in brackets describes the type of the page: R for pages in the root set, O for pages that are pointed to by a page in the root set, and I for pages that point to a page in the root set. The next line is the HTTP address of the page, followed by the title of the page. The two numbers on the last line are the in-degree and out-degree of the node.
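As an illustration of the entry layout described above, here is a minimal Python sketch that parses a single page entry. It assumes each entry occupies exactly four lines (header, address, title, degrees), as in the example; how entries are separated in the real file is not specified here, so reading a whole file is left out. The names PageEntry and parse_page_entry are hypothetical and not part of the dataset or its tools.

```python
import re
from dataclasses import dataclass

# Header line of a page entry, e.g. "34 (67) [R]".
HEADER_RE = re.compile(r"^\s*(\d+)\s+\((-?\d+)\)\s+\[([ROI])\]\s*$")


@dataclass
class PageEntry:  # hypothetical container, not defined by the dataset
    page_id: int
    base_set_id: int  # can be ignored, per the description
    page_type: str    # "R" (root), "O" (pointed to by root), "I" (points to root)
    url: str
    title: str
    in_degree: int
    out_degree: int


def parse_page_entry(lines):
    """Parse one page entry, assumed to be exactly four lines long."""
    if len(lines) != 4:
        raise ValueError("expected 4 lines per page entry")
    match = HEADER_RE.match(lines[0])
    if match is None:
        raise ValueError(f"unrecognized header line: {lines[0]!r}")
    in_deg, out_deg = (int(tok) for tok in lines[3].split())
    return PageEntry(
        page_id=int(match.group(1)),
        base_set_id=int(match.group(2)),
        page_type=match.group(3),
        url=lines[1].strip(),
        title=lines[2].strip(),
        in_degree=in_deg,
        out_degree=out_deg,
    )


entry = parse_page_entry([
    "34 (67) [R]",
    "http://www.ece.wpi.edu/~jinlee/events/wave/sld024.htm",
    "Accuracy & Computational Complexity",
    "0 1",
])
print(entry.page_id, entry.page_type, entry.in_degree, entry.out_degree)
# prints: 34 R 0 1
```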

The Adjacency List file: Stores the adjacency list of the underlying graph of the pages. Each entry of the list has the form

pid: pid1,pid2,.....,pidN,-1

which means that the page with id pid points to the pages with ids pid1,pid2,.....,pidN. The trailing -1 marks the end of the list.

C code for converting the adjacency list into a matrix is available through the "list2matrix.c" link above.
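For readers who prefer not to compile the C program, the conversion can also be sketched in a few lines of Python. This is not a port of list2matrix.c, only an illustration of the entry format. It assumes that page ids run from 0 to n-1, where n is the page count given at the top of the nodes file; if the ids in a dataset do not follow that pattern, an id-to-index mapping would be needed. The function names and the three sample lines are made up for the example.

```python
def parse_adj_line(line):
    """Parse 'pid: pid1,pid2,...,pidN,-1' into (pid, [pid1, ..., pidN])."""
    head, _, tail = line.partition(":")
    pid = int(head)
    targets = []
    for token in tail.split(","):
        token = token.strip()
        if not token:
            continue
        value = int(token)
        if value == -1:  # -1 terminates the list
            break
        targets.append(value)
    return pid, targets


def adjacency_matrix(lines, num_pages):
    """Build a dense 0/1 matrix; assumes ids lie in 0..num_pages-1."""
    matrix = [[0] * num_pages for _ in range(num_pages)]
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        pid, targets = parse_adj_line(line)
        for target in targets:
            matrix[pid][target] = 1
    return matrix


# Made-up sample data in the documented format, not taken from a dataset.
sample = ["0: 1,2,-1", "1: 2,-1", "2: -1"]
for row in adjacency_matrix(sample, 3):
    print(row)
# prints:
# [0, 1, 1]
# [0, 0, 1]
# [0, 0, 0]
```

A dense matrix grows quadratically with the number of pages, so for larger graphs it may be preferable to keep the adjacency lists as they are.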

The Inverted Adjacency List file: Stores the inverted adjacency list of the underlying graph of the pages. Each entry of the list has the form

pid: pid1,pid2,.....,pidN,-1

which means that the page with id pid is pointed to by the pages with ids pid1,pid2,.....,pidN.
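Since the inverted list contains the same edges as the adjacency list with their direction reversed, one can be derived from the other. The Python sketch below shows the idea on a small made-up graph held in a dictionary; the function name invert is hypothetical. Under the assumption that the degrees in the nodes file are computed on this same graph, the length of a page's inverted list would equal its in-degree and the length of its adjacency list its out-degree.

```python
def invert(adj):
    """Reverse every edge of a graph given as {pid: [targets]}."""
    inverted = {pid: [] for pid in adj}
    for source in sorted(adj):
        for target in adj[source]:
            inverted.setdefault(target, []).append(source)
    return inverted


# Made-up example graph: 0 -> 1, 0 -> 2, 1 -> 2.
adj = {0: [1, 2], 1: [2], 2: []}
inv = invert(adj)
print(inv)
# prints: {0: [], 1: [0], 2: [0, 1]}

# In-degree and out-degree of page 2 on this example graph.
print(len(inv[2]), len(adj[2]))
# prints: 2 0
```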