Before presenting (in the following section) the experimental evaluation of the proposed indexing methods, the data structure used to encode TFSs in Prolog is introduced. It should be mentioned that the experiments are highly dependent on the chosen data structure, which is unrelated to the proposed indexing method. Therefore, details about the TFS encoding are given here, rather than in earlier chapters.
The feature structures are encoded as Prolog terms. From the existing
methods that efficiently encode TFSs ([Mellish1988],
[Gerdemann1995], [Penn1999a]), an optimized encoding
similar to the one presented in [Penn1999a] was chosen. As is
shown in the aforementioned paper, if the feature graph is
-colourable (but not
-colourable), the least number of
argument positions in a flat encoding is
. The feature graph is an
undirected graph where each vertex represents an introduced feature,
and each edge, a pair of features that are appropriate to a common
type [Penn1999a]. Each position
in a term represents a
feature with colour
in the feature graph. The encoding introduced
in this thesis extends the encoding from [Penn1999a] by
allowing for the enforcement of type constraints during unification of
typed feature structures.
Figure 7.1 presents the structure used to encode TFSs
as Prolog terms. The variable Type encodes the type of the
feature structure, while each variable F
represents a
feature value assigned to position
(coloured as
in the feature
graph). The arity of feats is constant, and is the minimum
number of colours
. Each of feats's arguments is either a
singleton variable (meaning that particular feature value is not
defined for the current type or is
), a shared variable
(representing structure sharing), or an embedded tfs term. It
is possible for a tfs term to have a variable as its second
argument (for structure sharing).
Types were encoded using the attributed variables library in SICStus Prolog [SICS2003]. While existing systems (such as ALE[Carpenter and Penn2001]) use a reference-based encoding for types, attributed variables are more suitable for indexing. Performing de-referencing each time a type is needed for the indexing key during parsing would result in slower parsing time. Attributed variables offer direct access to the encoded type.
In the Prolog encoding of a TFS (tfs(Type,Feats,TypeRef)) shown in Figure 7.1, the variable Type carries the term type(TypeValue,TypeRef,FeatRef) as an attribute. TypeValue represents the actual type value, while FeatRef is a pointer to the features arguments Feats in the TFS encoding tfs/3. TypeRef is a trigger variable (it has no value associated with it) used to enforce the type constraints. It is pointed to by the TypeRef variable from the tfs/3 term. Its purpose is to delay the activation of the constraint enforcing mechanism until the unification of all features in Feats is completed (considering that typical Prolog systems perform the unification of arguments in a term from left to right).
TFS for mary: tfs(Type1,Feats1,TypeRef1). Type1 attribute: type(sign,TypeRef1,Feats1). Feats1=feats(tfs(Type2,Feats2,TypeRef2),_). Type2 attribute: type(cat,TypeRef2,Feats2). Feats2=feats(tfs(Type3,Feats3,TypeRef3),tfs(Type4,Feats4,TypeRef4)). Type3 attribute: type(noun,TypeRef3,Feats3). Type4 attribute: type(head,TypeRef4,Feats4). Feats3=_, Feats4=_. |
Figure 7.2 presents the Prolog encoding for a typed feature structure representing the lexical entry for mary, given a type hierarchy and its coloured feature graph.