next up previous contents
Next: Statistical vs. Non-Statistical Methods Up: A Discussion About Optimization Previous: A Discussion About Optimization   Contents


Indexing vs. Filtering

Most of the research conducted on improving parsing times for TFSGs is motivated by the observation that unifying two typed feature structures is the most time-consuming operation in the parsing process. Several approaches were taken, such as improving the encoding of TFSs (in order to reduce their size or to speed up the unification process), implementing abstract machines for rule daughter matching, or filtering out unifications that are known to fail, based on fast checks before the unification is attempted. Among these, the last approach is closest to indexing.

In database systems, the de facto standard for retrieving entities of any type is through indexing or hashing [Elmasri and Navathe2000,Bertino et al.1997]. Filtering is accomplished by traversing the entire database and deciding for each entity whether it should be retrieved or not (according to a specific criterion). By indexing, only a restricted number of entities is visited. The different retrieval methods in indexing and filtering are a consequence of the fundamental difference between indexing and filtering: while indexing considers the database as a structured set, filtering assumes the database is not structured [Belkin and Croft1992].

The advantage of using indexing is the support for searches based on multiple criteria (an important aspect, as it will be shown in Section 6.2). Using multiple criteria with filtering means testing all criteria for each entity in the database, while in indexing, each criterion places a restriction on the search space. Another important advantage of using indexing instead of filtering is the support for complex queries that are not limited to membership checking [Manolopoulos et al.1999].


next up previous contents
Next: Statistical vs. Non-Statistical Methods Up: A Discussion About Optimization Previous: A Discussion About Optimization   Contents