The Empty-First-Daughter (EFD) parsing method is employed to support the indexing techniques introduced in this thesis. It is also used as the baseline parser in comparison with the improved, indexed, parser. This section outlines the principles of this algorithm; a detailed presentation can be found in [Penn1999c] and [Penn and Munteanu2003]
An important issue in bottom-up parsing, when implemented in Prolog, is the amount of copying. In standard Prolog parsers, each time an edge is matched with a daughter (regardless of the success of the matching operation), the edge is copied from the assertional database onto the heap. This copying can significantly slow down the parsing process when categories are large (such as typed feature structures).
A first approach to solving the copying problem was Carpenter's algorithm, which is used in ALE [Carpenter and Penn2001]. The algorithm traverses the input sentence breadth-first and from right to left. The grammar rules are traversed depth-first, from left to right, in a failure-driven loop. As a result, Carpenter's algorithm does not require active edges, since only one rule at a time is in the process of being completed. This is in contrast to a breadth-first traversal of rules, where a record of all rules that have matched partially with edges in the chart (but are not completed yet) must be kept.
The EFD parsing algorithm brings further improvements to Carpenter's algorithm, by limiting the number of copying operations to a maximum of two per edge. Apart from the benefit of reducing copying, the EFD parsing algorithm offers a better support for indexing, especially for Prolog implementations. The chart is not stored in the assertional database, but on the heap. Thus, indexing can be applied without copying edges.
The EFD algorithm is based on the assumption that the grammar is EFD-closed: all rules in the grammar have at least one daughter, and the leftmost daughter of each rule is blocked from deriving an empty category. The algorithm for transforming a phrase-structure grammar into an EFD-closed grammar is presented in [Penn and Munteanu2003]. During parsing, after an edge is added to the chart, a failure driven-loop visits the rules in the grammar and selects the ones for which the first (the leftmost) daughter unifies with the newly created edge. The chosen rule is then extended from left to right, starting with its second daughter, by matching the daughters with edges in the chart and empty categories.