I am mainly
interested in the practical and theoretical
aspects of information retrieval and web search/mining.
- Collecting/Analyzing/Searching the
viral word-of-mouth segment of the Web (e.g. blog and forum
data): Ongoing
project. [Details
will be provided soon]
- Web Search Personalization: The goal of web search
personalization is to let users search the web
according to their personal search preferences or context.
It has recently gained significant attention because of the large number
of users with different search intentions. There is no general
consensus on exactly what web search personalization means, and
there are no generally accepted criteria for evaluating
personalized search algorithms. In joint work
with A. Borodin, we are working toward a framework
that is general enough to cover many real application scenarios, and
yet amenable to different analysis approaches. [Send me an e-mail for
more details on this work]
- Local Search: The goal of local search is to
let the user search the web in a way that takes into account the
geographical context associated with the query. I am
fortunate to have been involved in the development of
a real local search
engine at Genieknows. I spent almost two years on the project
and really enjoyed working with great people. As part of
the project, we proposed a novel crawling method for collecting
geographically-sensitive web data [WWW2006].
Recently, in joint work with R.J. Miller and H.F. Liu, we studied how
to extend previous link analysis algorithms for ranking
geographically-sensitive web data [WI2007].
- Web communities:
Previous web community extraction algorithms have mainly focused on
linkage relations. In joint work with A. Borodin and L. Goldsmith, we
studied the problem of discovering community structures from web
graphs, taking into account the semantics of pages as well as the
linkage relations amongst them. We consider two possible scenarios for
community discovery: (1) when the input consists of seed nodes (both
good and bad), and (2) when in addition to seed nodes the input also
has representative keywords for the community. In both
cases, we applied the Random Field Ising Model (RFIM) to a suitably weighted
graph to
extract a community structure from the given set of pages in a
highly efficient manner. One additional consequence of our web
community discovery process is that it allows us to define a natural
ranking scheme, FlowRank,
for pages in the community using the
flow values derived from the extraction process. Through experimental
results, we validate the practical feasibility of our
approach. [Send
me an e-mail for more details on this work]
- Stability
Aspect of Web Mining Algorithms: In joint work
with A. Borodin, we studied another qualitative aspect of link analysis
algorithms: their stability [COCOON2003]. We also
studied this aspect in the context of web search personalization and
web community extraction.
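
To make the stability question above concrete, here is a minimal sketch under stated assumptions. It reads "stability" in one common way (how much the scores move when the graph is perturbed slightly) and uses plain PageRank as a stand-in link analysis algorithm; it is not the definition or method from [COCOON2003]. The function names `pagerank` and `l1_change`, the damping value, and the toy graph are all hypothetical choices made for illustration.

```python
def pagerank(edges, n, damping=0.85, iters=100):
    """Power-iteration PageRank on nodes 0..n-1.

    Rank held by dangling nodes (no out-links) is spread uniformly.
    """
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    rank = [1.0 / n] * n
    for _ in range(iters):
        dangling = sum(rank[u] for u in range(n) if not out[u])
        # Teleportation term plus the uniformly redistributed dangling mass.
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for u in range(n):
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        rank = new
    return rank


def l1_change(edges, n, removed_edge):
    """L1 distance between the PageRank vectors before and after deleting one edge."""
    before = pagerank(edges, n)
    after = pagerank([e for e in edges if e != removed_edge], n)
    return sum(abs(a - b) for a, b in zip(before, after))


# Toy graph (hypothetical data): a directed 4-cycle plus one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
change = l1_change(edges, 4, removed_edge=(0, 2))
print(f"L1 change after removing edge (0, 2): {change:.4f}")
```

A small L1 change for every single-edge perturbation would suggest the ranking is stable on this graph in the sense assumed here; other perturbation models and distance measures are equally reasonable choices.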