next up previous contents
Next: Rule Filtering Up: Previous Approaches to Indexing Previous: HPSG Parsing with CFG   Contents


Quick Check

Perhaps the most well known method used in speeding up the parsing times of TFSGs is the quick check filter. Initially mentioned in [Kiefer et al.1999], a detailed presentation of quick check is found in [Malouf et al.2000].

Quick check is an empirical method that tries to reduce the burden placed on parsing by the unification operation. Its goal is to quickly identify, before the unification of two feature structures is attempted, whether the unification will fail.

The unification between two typed feature structures is successful if all the equivalent paths in the two TFSs are compatible. The compatibility is determined by the type unification operation between feature values at the end of these paths. Quick check relies on the empirical observation (made by parsing a corpus of sentences) that only a small amount of paths are likely to cause the majority of unification failures.

In order to identify the failure-causing paths, a training phase is required. The parser records, for each failed unification between two TFSs, the path responsible for the failure. A corpus of sentences is parsed, and a set containing the paths that are most probable to cause failures is determined.

After training, when parsing sentences in the test corpus, the parser associates each category in the grammar with a quick check vector. The vector contains the types of the feature values extracted from the ends of the paths selected during training. For a given grammar, a position $ i$ in a quick check vector will always be filled with the type extracted from the same path $ \pi_i$ for any feature structure. Before a unification is attempted, the quick check vectors of the two typed feature structures are matched. Only if their values are compatible is the full unification performed.

For example, assume that after training paths $ F:G$ and $ H:G:I$ are selected as the most probable to cause unification failure. The quick check vector is of the form $ \langle v_1, v_2 \rangle$, where the type extracted from path $ F:G$ will fill the value $ v_1$ and the type extracted from path $ H:G:I$ will fill the value $ v_2$ for any feature structure. For the two feature structures presented in Figure 4.2, their corresponding quick check vectors will be $ \langle t_5, t_5 \rangle$ for $ F_1$ and $ \langle
\bot, t_6 \rangle$ for $ F_2$. Before unifying $ F_1$ and $ F_2$, their quick check vectors are compared. The unification is attempted only if $ t_5 \sqcup t_6 \downarrow$.

Figure 4.2: Quick check vectors
\begin{figure}\begin{displaymath}F_1 = \left[\begin{array}{ll}
t_1 & \\
F: & ...
... quick check vector}: \langle \bot, t_6 \rangle
\end{displaymath}
\end{figure}

Quick check proves to be efficient when evaluated on a large-scale TFSG (an average improvement of 34% is reported in [Malouf et al.2000]). However, its success depends on the degree of similarity between the training corpus and the test corpus. Another requirement is that only a small number of paths should be accountable for the unification failures. It is not desirable that the the distribution of probabilities of paths to cause a failure should be flat.


next up previous contents
Next: Rule Filtering Up: Previous Approaches to Indexing Previous: HPSG Parsing with CFG   Contents