Although database systems are not in the scope of this thesis, many of the techniques developed here are connected to the database area. Since the subject of indexing in databases is very vast, just a few essential bibliographical pointers are mentioned in this section.
Databases can store large amounts of data. Usually, each stored entity is a complex structure, called a record (similar to a feature structure). Records are indexed based on the values of certain fields (features). The retrieval is usually not limited to a query where specific values are requested for a field, but must support other types of queries (such as interval queries - where the values should belong to a given interval). An interesting research topic in the area of indexing are the self-adaptive indexing methods, where the indexing can be (semi-)automatically configured. One of the first published work on this topic is [Hammer and Chan1976].
Most of the available database textbooks (such as [Elmasri and Navathe2000]) have chapters dedicated to indexing. Recent research papers on indexing can be found in Kluwer Academic Publishers' series ``Advances in Database Systems'': [Bertino et al.1997], [Manolopoulos et al.1999], or [Mueck and Polaschek1997].
A major difference between indexing in databases and indexing in a
TFSG parser should be noted. Typically, a database consists of a large
collection of objects, and the indexing scheme is designed to improve
retrieval times. It is expected that databases are persistent, with
fewer deletions and insertions than retrievals. From this point of
view, parsing can be seen as managing a volatile database, that is
always empty at start-up. The ratio between insertions and retrievals
in a database application is very small (even equal to 0 when used
only to retrieve data). For indexed parsing, this ratio is much higher
and depends on the structure of grammar rules. For this reason
(similar to those discussed in Section 5.2.3),
indexing methods such as B-trees (commonly used in databases), where
the retrieval can be performed in
operations, but the insertion
needs
operations [Ramesh et al.2001] (where
is
the number of nodes in the B-tree and
the number of index keys
assigned to a node), are not recommended for indexed parsing.