Research Center for my thesis


Characterizing and Mining the Citation Graph of Computer Science Literature

Yuan An (Supervisors: Prof. Evangelos E. Milios, Prof. Jeannette Janssen), commenced on: Aug. 20, 2000

This topic involves characterizing the citation graph of computer science, extracted from Citeseer, in the same way that the Web has been characterized. Citeseer (also called ResearchIndex, http://citeseer.nj.nec.com/cs) reportedly contains more than 50,000 computer science papers. Their references form the citation graph, whose properties are unknown. This graph is much smaller than the Web, but it may be incomplete (i.e., some cited papers are not in the collection), and it can be extracted, in part or in whole, by querying ResearchIndex. Because of these differences, the techniques used to characterize the Web (for example, those at http://www.almaden.ibm.com/almaden/webmap_press.html, http://www.almaden.ibm.com/cs/k53/www9.final/, and the references in the "Web Science" section of http://www.cs.dal.ca/~eem/webRobots.html) may need to be extended in nontrivial ways to characterize the citation graph. Variations of the problem include characterizing the citation graph of a single subarea of computer science, which may be possible to extract in full from ResearchIndex. The project will involve some web programming (a necessary step, but not the focus of the thesis) and careful thought about the time and space requirements and the data structures for storing the citation graph. A statistical sampling of the citation graph may have to be designed, in the same style as the Web papers above. The back end will apply various graph metrics to characterize the graph.
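To make the storage and metrics steps concrete, here is a minimal sketch in Python. It assumes the citations have already been extracted from ResearchIndex as (citing_id, cited_id) pairs; the function names, paper ids, and the `collection` set are hypothetical and chosen only for illustration. It keeps the graph as in-memory adjacency lists, reports references to papers outside the collection, and computes two metrics of the kind used in the Web studies cited above: the in-degree distribution and the sizes of weakly connected components.

```python
from collections import Counter, defaultdict, deque


def build_citation_graph(edges):
    """Build directed adjacency lists from (citing_id, cited_id) pairs.

    Cited papers that are missing from the collection still become nodes,
    so the incompleteness of the graph stays visible.
    """
    out_links = defaultdict(list)  # paper -> papers it cites
    in_links = defaultdict(list)   # paper -> papers that cite it
    nodes = set()
    for citing, cited in edges:
        out_links[citing].append(cited)
        in_links[cited].append(citing)
        nodes.update((citing, cited))
    return nodes, out_links, in_links


def dangling_references(out_links, collection):
    """Return the cited ids that are not papers in the collection."""
    return {cited for refs in out_links.values() for cited in refs
            if cited not in collection}


def in_degree_distribution(nodes, in_links):
    """Map k -> number of papers cited exactly k times."""
    return Counter(len(in_links.get(n, [])) for n in nodes)


def weakly_connected_components(nodes, out_links, in_links):
    """Sizes of weakly connected components (edge direction ignored), largest first."""
    seen = set()
    sizes = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            # Follow citations in both directions to ignore edge direction.
            for nbr in out_links.get(node, []) + in_links.get(node, []):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sorted(sizes, reverse=True)


if __name__ == "__main__":
    # Toy data: paper ids and citations are made up for illustration.
    edges = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("E", "F")]
    collection = {"A", "B", "D", "E"}  # papers assumed to be in the crawl
    nodes, out_links, in_links = build_citation_graph(edges)
    print(sorted(dangling_references(out_links, collection)))  # ['C', 'F']
    print(dict(sorted(in_degree_distribution(nodes, in_links).items())))  # {0: 3, 1: 2, 3: 1}
    print(weakly_connected_components(nodes, out_links, in_links))  # [4, 2]
```

Keeping both in-link and out-link lists roughly doubles memory, but it lets in-degrees and undirected traversals be computed without a second pass over the data. For the full collection, a more compact or disk-based representation would probably be needed.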


Keywords: Artificial Intelligence, Graph Theory, Machine Learning, Web Search Engine, Citation Graph.
  • DB references
  • Information overload references
  • Cybermetrics
  • Bibliometrics
  • Google Toronto Rental
  • Hiring story
  • References in the "Web Science" section of Dr. Milios's web page
  • Useful links on Dr. Milios's web page
  • Dr. Janssen's web page
  • Dr. Lawrence's web page
  • Citeseer
  • CORA search engine
  • IBM Research Almaden News: Researchers map the Web
  • Graph structure in the Web (IBM)
  • Algorithms and Complexity
  • Dr.Kleinberg's webpage
  • Perl archive
  • Perl: libwww-perl
  • Math Concepts
  • WWW Consortium
  • Visualising Web (a paper)
  • OMG
  • Java products
  • MSDN library
  • A Java development tool: Together
  • A development environment: Sniff+
  • A debugger: Metamata debugger
  • Free Software Foundation: GNU
  • GNU Unix-workalike tools for Windows
  • CVS source code control tool
  • IBM Alphaworks Jikes compiler
  • Advanced Java for enterprise applications
  • The Elements of Style: writing in English
  • Great books online
  • EJB, JSP
  • Java 2 Docs
  • Java Servlets documentation
  • JNI doc
  • JNI tutorial
  • Forte documentation
  • Bibliometrics of the World Wide Web
  • Dr. Ray R. Larson's web page: School of Information Management and Systems, UC Berkeley
  • Cybermetrics, bibliometrics, scientometrics
  • Eugene Garfield, Ph.D.
  • The Collection of Computer Science Bibliographies
  • Research by Barabási
  • S. Redner's webpage
  • Reference on Zipf's law
  • Java Zone
  • IBM Clever searching
  • 1970s book: information retrieval
  • Networked Computer Science Technical Reference Library
  • Publications of Graph and Application
  • DB2 V7 textbook
  • DB2 software
  • LEDA: graph algorithms
  • LaTeX commands
  • How to use LaTeX
  • LaTeX help 1.1
  • Math symbols in LaTeX
  • Web trawling
  • Internet Requests for Comments (RFC)
  • Robots
  • CORA
  • Course: Information retrieval, digital libraries and the Web
  • Dr. Giles's homepage
  • Econophysics bibliography
  • Java RegExp
  • Unicode regexp
  • LEDA source code download site
  • LEDA guide
  • LEDA guide download
  • Java networking tutorial
  • Java Net FAQ (JN FAQ)
  • Good Java networking FAQ
  • RFCs
  • Power-law distributions in real and virtual worlds
  • Bibliography of CS
  • LEDA object
  • WEKA package
  • STL and Quick Reference
  • Information about C++
  • Best C++ practices
  • C++ slides
  • C++ examples
  • C reference
  • C/C++ Library reference
  • Code examples from the book Professional Java Server Programming
  • STL programmer's guide
  • GTK+: the GIMP Toolkit
  • Complete FAQ list of Java technologies
  • C++ FAQ lite
  • C++ library
  • CORBA FAQ
  • Tutorial on CORBA
  • Popular FAQ
  • Stanford Digital Libraries
  • Prof. Mendelzon
  • Linux Sources
  • Math contest problem
  • HRCanada
  • Interview tips
  • Thinking in C++ (Volume 1)
  • Thinking in C++ (Volume 2)
  • C++ goodies
  • Bjarne Stroustrup: C++
  • Scott Meyers is one of the world's foremost experts on C++ software development
  • Computer networks resources
  • Repeaters, bridges, routers, switches
  • Dr.Combinatorial
  • Dr.Approximation algorithm
  • Dr.graph decomposition
  • Kernighan-Lin (KL) algorithm for graph partitioning
  • Dr. Kleinberg
  • GTL software
  • Graph partitioning course
  • IBM: Fast and effective algorithms for graph partitioning and sparse-matrix ordering