|
|
It is not the purpose of this thesis to demonstrate the superiority of non-statistical optimizations over statistical ones. Non-statistical optimizations can be seen as a first step toward a highly efficient parser, while statistical optimization can be applied as a second step. Future work will investigate combinations of statistical and non-statistical methods to improve parsing times.
One of the purposes of non-statistical indexing is to alleviate the burden of training a statistically optimized parser by offering comparable improvements in parsing times with significantly lower set-up times. Therefore, a quick-check parser was also built and evaluated. A comparison between the set-up times for the indexed parsers and the quick-check parser was conducted in order to better reflect the advantage of non-statistical optimizations. The set-up times are presented in Table 7.3. For quick-check, training times are computed for a 300-sentence subset of the test corpus (as prescribed in [Malouf et al.2000]). The average parse time was 8.53 seconds per sentence, roughly the same average as for the entire test corpus.
The advantage of faster set-up times for positional and path indexing can be overshadowed by slower parsing times. However, as can be seen7.3 from Figure 7.5 and Figure 7.6, the average improvements brought by quick-check are less than (for the unconstrained MERGE) or comparable to (for the constrained MERGE) those of non-statistical indexing. For the unconstrained MERGE, quick-check improved the non-indexed EFD parsing times by only 6% (with a maximum of 42%), a value significantly lower than for the positional or path indexing. For the constrained MERGE, the average improvements of quick-check are 6% (with a maximum of 39%), which is with only 1% better than positional or path indexing. It should be mentioned that the quick-check evaluation presented in [Malouf et al.2000] uses only sentences with a length of at most 10 words, and the authors report only a mean figure for the parsing times (while not reporting any values for the set-up times). Also, the aforementioned paper does not specify if the training sentences are included in the test corpus; in the quick-check evaluation presented in this section, the test corpus contains the training sentences, thus offering an advantage to quick-check.
Although the number of failed unifications is smaller for the quick-check parser than for the path indexing parser (as shown in Table 7.4), the average parsing times are faster for path indexing than for quick-check in the unconstrained case. This is explained by the lower costs of managing the path index: verifying the unification of path index vectors is faster than verifying the unification of quick-check vectors. The cause for this difference resides in the static nature of non-statistical indexing: the path index vector is statically determined off-line or at compile time, and therefore the nodes at the end of the indexing paths are reachable before parsing starts, saving the time spent at run-time to fill the values in a template each time an edge is added to the chart or when a daughter is searched for in the chart. With quick-check, this is not the case - although the paths are computed off-line, many of them may not even be defined for a given feature structure.