next up previous contents
Next: Evaluation on Other UBGs Up: Experiments Previous: Evaluation using the constrained   Contents


Comparison between statistical and non-statistical optimizations


Table 7.3: The set-up times for non-statistically indexed parsers and statistically optimized parsers for MERGE gramamr.
  Positional Path Quick
  Indexing Indexing Check
Compiling type unification 5'6"
Compiling grammar rules 25"
Compiling positional index 2" 2" -
Compiling static cut - 1'6" -
Compiling indexing paths - 6'45" -
Run 300-sentence training - - 42'41"
Total set-up time 5'33" 13'24" 48'12"


Figure 7.5: Parsing times for EFD and EFD with quick-check applied to the unconstrained MERGE grammar. The sentence numbers are the same as those used in Figure 7.3.
\includegraphics[width=\textwidth]{merge_new_tests_quick.eps}

Figure 7.6: Parsing times for EFD and EFD with quick-check applied to the constrained MERGE grammar. The sentence numbers are the same as those used in Figure 7.4.
\includegraphics[width=\textwidth]{merge_new_tests_quick_constr.eps}

It is not the purpose of this thesis to demonstrate the superiority of non-statistical optimizations over statistical ones. Non-statistical optimizations can be seen as a first step toward a highly efficient parser, while statistical optimization can be applied as a second step. Future work will investigate combinations of statistical and non-statistical methods to improve parsing times.

One of the purposes of non-statistical indexing is to alleviate the burden of training a statistically optimized parser by offering comparable improvements in parsing times with significantly lower set-up times. Therefore, a quick-check parser was also built and evaluated. A comparison between the set-up times for the indexed parsers and the quick-check parser was conducted in order to better reflect the advantage of non-statistical optimizations. The set-up times are presented in Table 7.3. For quick-check, training times are computed for a 300-sentence subset of the test corpus (as prescribed in [Malouf et al.2000]). The average parse time was 8.53 seconds per sentence, roughly the same average as for the entire test corpus.

The advantage of faster set-up times for positional and path indexing can be overshadowed by slower parsing times. However, as can be seen7.3 from Figure 7.5 and Figure 7.6, the average improvements brought by quick-check are less than (for the unconstrained MERGE) or comparable to (for the constrained MERGE) those of non-statistical indexing. For the unconstrained MERGE, quick-check improved the non-indexed EFD parsing times by only 6% (with a maximum of 42%), a value significantly lower than for the positional or path indexing. For the constrained MERGE, the average improvements of quick-check are 6% (with a maximum of 39%), which is with only 1% better than positional or path indexing. It should be mentioned that the quick-check evaluation presented in [Malouf et al.2000] uses only sentences with a length of at most 10 words, and the authors report only a mean figure for the parsing times (while not reporting any values for the set-up times). Also, the aforementioned paper does not specify if the training sentences are included in the test corpus; in the quick-check evaluation presented in this section, the test corpus contains the training sentences, thus offering an advantage to quick-check.

Although the number of failed unifications is smaller for the quick-check parser than for the path indexing parser (as shown in Table 7.4), the average parsing times are faster for path indexing than for quick-check in the unconstrained case. This is explained by the lower costs of managing the path index: verifying the unification of path index vectors is faster than verifying the unification of quick-check vectors. The cause for this difference resides in the static nature of non-statistical indexing: the path index vector is statically determined off-line or at compile time, and therefore the nodes at the end of the indexing paths are reachable before parsing starts, saving the time spent at run-time to fill the values in a template each time an edge is added to the chart or when a daughter is searched for in the chart. With quick-check, this is not the case - although the paths are computed off-line, many of them may not even be defined for a given feature structure.


Table 7.4: The number of successful and failed unifications for the non-indexed, path-indexed, and quick-check parsers over the unconstrained MERGE grammar. The sentence numbers are the same as those used in Figure 7.3 and in Table 7.1.
Sentence Successful Failed unifications
number unifications EFD EFD with EFD with
      path index quick-check
101 29 217 119 26
851 95 643 282 169
1357 355 2983 1594 748
1816 557 8626 4676 1821



next up previous contents
Next: Evaluation on Other UBGs Up: Experiments Previous: Evaluation using the constrained   Contents