% ALE:  The Attribute Logic Engine  /  User's Guide Version 3.2.1
% Bob Carpenter, SpeechWorks Research
% Gerald Penn, Department of Computer Science, University of Toronto
% Process in LaTeX

% STYLE
% ==============================================================================

\documentclass[fleqn,leqno,11pt,twoside]{report}
\usepackage{epsf}
\pagestyle{headings}

\setlength{\oddsidemargin}{0.625in}
\setlength{\evensidemargin}{0.625in}
\setlength{\textwidth}{5.5in}

\setlength{\topmargin}{-0.25in}
\setlength{\textheight}{9.0in}


\setlength{\headsep}{0.275in}            

\sloppy

\raggedbottom


% AV-Logic Commands
% -----------------
\newcommand{\type}[1]{\mbox{\bf #1}}

% DAGs
% -------------
\newcommand{\feat}[1]{\mbox{{\sc #1}}}
\newcommand{\ind}[1]{\setlength{\fboxsep}{2pt}\fbox{\scriptsize #1}}
\newcommand{\sdg}[2]{\left[ 
                         \! \!
                         \begin{array}{l}
                           #1 \\
                           #2
                         \end{array}
                         \! \!
                       \right] }
\newcommand{\sdi}[2]{#1 \fv \! #2 \\[2pt]}
\newcommand{\fv}{\colon}

% Text Commands
% -------------

\newcommand{\acronym}[1]{{\sc #1}\index{{\sc #1}}}

% Picture Commands
% ----------------
\newcommand{\picpar}[1]{\begin{minipage}[t]{0pt}
                        \begin{tabbing}#1\end{tabbing}
                        \end{minipage}}
\newcommand{\puttop}[2]{\put#1{\makebox(0,0)[t]{#2}}}
\newcommand{\putleft}[2]{\put#1{\makebox(0,0)[l]{#2}}}
\newcommand{\putcenter}[2]{\put#1{\makebox(0,0){#2}}}

% References
% ----------
\newcommand{\annotate}[1]{\begin{quote}
                          \vspace{-8pt} {\small #1}
                          \end{quote}}

% Error Messages
% --------------
\newcommand{\errorcond}[2]{\mbox{ } \\[-6pt]
                           {\tt #1}
                           \begin{quote} \vspace{-4pt} #2 \end{quote}}

% Sections
% --------

\newcommand{\mychapter}[1]{\chapter{#1}}
\newcommand{\mychapterstar}[1]{\chapter*{#1}}
\newcommand{\mysection}[1]{\section{#1}}
\newcommand{\mysectionstar}[1]{\section*{#1}}
\newcommand{\myappendix}[1]{\chapter{#1}}
\newcommand{\mysubsection}[1]{\subsection{#1}}
\newcommand{\mysubsubsection}[1]{\subsubsection*{#1}}


% DOCUMENT
% ===================================================



\begin{document}

\begin{titlepage}
\begin{center}
\mbox{ } \\[.75in]
\mbox{ }\fbox{\Huge\bf ALE}\mbox{ }
\mbox{ } \\[.5in]
{\Huge\bf The Attribute Logic Engine} \\[.25in]
{\huge\bf User's Guide} \\[1.0in]
{\LARGE Version 3.2.1} \\[6pt]
{\Large December 2001} \\[1.5in]
% {\Large Bob Carpenter} \qquad \qquad {\Large Gerald Penn} \\[3pt]
\hspace*{0.7cm}
\begin{tabular}{lll}
{\Large Bob Carpenter} & \hspace*{2cm} & {\Large Gerald Penn} \\[3pt]
{\large SpeechWorks Research} & & {\large Department of Computer Science}
 \\[3pt]
{\large 55 Broad St.} & & {\large University of Toronto} \\[3pt]
{\large New York, NY 10004} & & {\large 10 King's College Rd.} \\[3pt]
{\large USA} & & {\large Toronto M5S 3G4} \\[3pt]
& & {\large Canada} \\[3pt]
{\large carp{\normalsize @}colloquial.com} & & {\large gpenn{\normalsize @}cs.toronto.edu} \\[1in]
\end{tabular}
\end{center}

\begin{center}
\begin{tabular}{l}
\copyright 1992--1995, Bob Carpenter and Gerald Penn\\
\copyright 1998, 1999, 2001, Gerald Penn
\end{tabular}
\end{center}
\end{titlepage}


\pagenumbering{roman}

\tableofcontents

\mychapterstar{Preface -- Version 3.0}

{\sc ale} 3.0 is completely compatible with {\sc ale} 2.0 grammars,
and adds the following new features:
%
\begin{itemize}
\item A semantic-head-driven generator, based on the algorithm
  presented in Shieber et al. (1990).  The generator was adapted to
  the logic of typed feature structures by Octav Popescu in his
  Carnegie Mellon Master's Thesis, Popescu (1996).  Octav also wrote
  most of the generation code for this release.  Grammars can be
  compiled for parsing only, generation only, or both.  Some glue-code
  is also available from the {\sc ale} homepage, to parse and generate
  with different grammars through a unix pipe.
\item A source-level debugger with a graphical XEmacs interface.
  This debugger works only with SICStus Prolog 3.0.6 and higher.  A
  debugger with reduced functionality will be made available to SWI
  Prolog users in a later release.  This debugger builds on, and
  incorporates the functionality of the code for the SICStus
  source-level debugger, written by Per Mildner at Uppsala University.
\item Functional descriptions.  Instead of binding variables in a
  description and calling a procedural attachment, e.g., {\tt a cons
  f:X,g:Y,h:Z goal append(X,Y,Z)}, it is now possible to incorporate
  certain functional relations into descriptions themselves, e.g., {\tt
  a cons f:X,g:Y,h:append(X,Y)}.
\item {\tt a\_/1} atoms.  There are now an infinite number of atoms
  (types with no appropriate features), implicitly declared in every
  signature.  These atoms can be arbitrary Prolog terms, including
  unbound variables, and can be used wherever normal {\sc ale} types
  can, e.g., {\tt f:(a\_ p(3.7))}.  {\tt a\_/1} atoms are extensional
  as Prolog terms, i.e., are taken to be identical according to the
  Prolog predicate, {\tt ==/2}.  In particular, this means that ground
  atoms behave exactly as {\sc ale} extensional types.
\item Optional edge subsumption checking.  For completeness of
  parsing, one only needs to ensure that, for every pair of nodes in
  the chart, the most general feature structure spanning those nodes
  is stored in the chart.  This can reduce the number of edges in many
  domains.
\item An autonomous {\tt intro/2} operator.  Features can now be
  declared on their own in a separate part of the grammar.
\item Default specifications for types.  These are NOT default types.
  If a type appears on the right-hand side of a {\tt sub/2} or {\tt
    intro/2} specification, but not on the left-hand side of one, {\sc
    ale} will assume this type is maximal, i.e., assume the
  specification, {\tt $Type$ sub [].}  Similarly, if it occurs on a
  left-hand side, but not on a right-hand side, {\sc ale} will assume
  the type is immediately subsumed by {\tt bot}, the most general
  type.  In both cases, {\sc ale} will announce these assumptions
  during compilation.
\item Several bug corrections and more compile-time warning and error
  messages.
\item An SWI Prolog 2.9.7 port.  Version 3.0 will be the last version
  for which we will port the system to Quintus Prolog.  We will now
  support {\sc ale} for SICStus Prolog and SWI Prolog.  The SICStus
  version of the system works for SICStus Prolog 3.0.5 and higher,
  except for the source-level debugger, which requires version 3.0.6.
\end{itemize}

We would like to thank all of the users who have supplied us with
feedback and suggestions over the past few years for how to improve
{\sc ale}.  In particular, we would like to thank Ion Androutsopoulos,
Mike Calcagno, Mats Carlsson, Frederik Fouvry, Gertjan van Noord,
Peter van Roy, Margriet Verlinden, and Jan Wielemaker for their
patient assistance.  This release incorporates many, but unfortunately
not all of those changes.  Quite a few more will be made in the next
several releases, along with many performance improvements.
\\
\mbox{ } \hfill Bob Carpenter and Gerald Penn
\\
\mbox{ } \hfill Murray Hill and Tuebingen, March 1998


\mychapterstar{Preface -- Version 2.0}

{\sc ale} 2.0 is a proper extension of version 1.0.  Specifically,
version 2.0 will run any grammar that will run under version 1.0.  But
version 2.0 includes many extensions to version 1.0, including the
following.
%
\begin{itemize}
\item  Inequations
\item  Extensionality
\item  General Constraints on Types
\item  Mini-interpreter
\item  Error-suppression
\end{itemize}
%
Inequations allow inequational constraints to be imposed between two
structures.  Extensionality allows structures of specified types to be
identified if they are structurally identicial.  Together, these
provide the ability to simulate Prolog II programs (Colmerauer 1987).
{\sc ale} 2.0 also allows general constraints to be placed on types,
using arbitrary descriptions from the constraint language, including
path equations, inequations and disjunctions, and procedural
attachments.  It also has a mini-interpreter, which allows the user to
traverse and edit an {\sc ale} parse tree.  Error messages for
incompatible descriptions are now automatically disabled during
lexicon and empty category compilation.

The second release of {\sc ale}, Version 2.0, is based on an extension of
the first version of {\sc ale}, that was completed for Gerald Penn's (1993)
MS Project in the Computational Linguistics Program at Carnegie Mellon
University.

There are many people whom we would like to thank for their comments
and feedback on version 1.0 and $\beta$-versions of 2.0.  These people
have actually used the system in their research and have thus had the
best opportunity to provide us with practical feedback.  First, we
would like to thank the first group of users, housed at Sharp
Laboratories of Europe, located in Oxford, England, including Pete
Whitelock, Antonio Sanfillipo, and Osamu Nishida.  They not only used
the system but provided feedback on the code.  Secondly, the group at
University of T\"ubingen, who are developing a competing system,
Troll, have rigorously tested existing systems, including {\sc ale}, both
for their ability to express grammars naturally and for efficiency.
Specifically, we would like to thank Detmar Meurers, Dale Gerdemann,
Thilo G\"otz, Paul King, John Griffith, and Erhard Hinrichs.  John and
Thilo also provided the changes necessary for the system to run
directly in Quintus Prolog.  This group is undoubtedly the best
informed when it comes to implemented grammar formalisms.  We would
also like to thank the grammar development group at Stanford
University, including Ivan Sag, Chris Manning, Suzanne Riehemann.  We
would further like to thank Bob Kasper, Carl Pollard, and Andreas
Kathol of the Ohio State University, for a great deal of feedback on
the design of HPSG grammars in general, and {\sc ale} implementations of
them in particular.  Chris Manning, in addition, found a bug in
SICStus Prologs prior to 2.1.8, which prevented cyclic structures from
being used in completed chart edges, a bug found by both Steven Bird
of Edinburgh and C.~J. Rupp of IDSIA.  Their feedback on Bob
Carpenter's prototype implementation of HPSG for English led to the
design of Gerald Penn's much more comprehensive implementation of HPSG
and was the primary impetus for the importation of general type
constraints into version 2.0.  Next, we would like to thank Claire
Gardent, who has been using {\sc ale} to develop discourse grammars in
Amsterdam.  We should also thank Carsten Guenther and Markus Walther,
of the Universities of Hamburg and D\"usseldorf, respectively, who
have used the system to develop phonological grammars.  Finally, we
should thank Michael Mastroianni, who implemented a comprehensive
approach to constraint-based phonology in {\sc ale} (Mastroianni 1993).  He
suffered through early, buggy versions of the system, thus sparing the
rest of us much of that pain. The feedback we received from these
users was invaluable.

We would like to thank EAGLES, the European Advisory Group on
Linguistic Engineering Standards, for allowing us to present our
system at a meeting in Saarbr\"ucken in March 1993 of the European
Expert Group on Linguistic Formalisms devoted to implemented
formalisms.  We learned a great deal from the other participants in
the workshop including especially Jochen D\"orre, Michael Dorna, and
Martin Emele, of Stuttgart, and Andreas Podelski, then associated with
the Digital Equipment Paris Research Lab.  We also benefitted from
discussions with Hans Uszkoreit, Rolf Backofen, and Uli
Krieger, of Saarbr\"ucken,  Steve Pulman from SRI in Cambridge, and
C.~J.~Rupp and Graham Russell, of ISSCO in Switzerland.

We had many discussions of the {\sc ale} formalism at the HPSG workshop
running concurrently with the LSA Linguistic Institute in Columbus.
We would especially like to thank Gregor Erbach for comments on our
system, including benchmark test results.  We would also like to thank
Hiroshi Tusda, of the Institute for New Gernateion Computer
Technology, for discussion of our systems and comparisons to his
system, cu-Prolog.  We also discussed {\sc ale} heavily during the
workshop on implementations of attribute-value logics, during the 1993
Summer School on Logic, Language, and Information in Lisbon, Portugal.
We especially benefitted from discussions with Suresh Manandhar, of
the University of Edinburgh, and Gerrit Rentier of Tilburg University,
and Gert Webelhuth of the University of North Carolina, among those we
have not already thanked.  We also benefitted from discussions with Ed
Stabler and Mark Johnson, and from sitting in on their class on the
implementation of constraint-based grammars.

We would also like to thank Ann Copestake and Ted Briscoe, of the
Cambridge Computing Laboratory, for feedback on the design of the
system.  

We would like to thank Richard O'Keefe, who provided some invaluable
feedback on coding style.  Of course, any glitches or failure to
follow his excellent example are our own.

We would also like to thank Elizabeth Hinkelman, who runs the Software
Registry, and Mark Kantrowitz, who administers the Prolog Resource
Guide and the Prime Time Freeware for AI CD-ROM.  They have helped in
publicizing the system description as well as providing access.  

The extensions we have
not made, though would like to, include the addition of:
\begin{itemize}
\item  Primitive, Atomic Data Types
\item  Parametric Types
\item  Partial Type Inference
\item  Assert Mode in Compiler
\item  Peephole Code Optimization
\item  Subsumption Checking of Chart Edges
\end{itemize}
The incorporation of Prolog data types such as Real, Integer,
Character, and String, is straightforward theoretically, but not so
straightforward in terms of {\sc ale}.  The same goes for parametric
polymorphism at the type level.  Partial type inference could provide
a great deal of optimization in some circumstances.  We could not
figure out how to incoprorate these three changes without drastically
modifying the underlying representations and algorithms.  The
remaining changes are lying dormant because we have other obligations.

The next wave of development of attribute-logic grammars should not be
in Prolog, but rather through the use of a direct abstract machine.
Bob Carpenter has worked on an abstract machine with Yan Qu, in the
context of her MS project in the Carnegie Mellon Computational
Linguistics Program, and with Shuly Wintner, of the Technion, in
Haifa, Israel, who is writing a PhD dissertation on the topic.  Such
an undertaking is also underway among the LIFE community, led by
Hassan A\"{\i}t-Kaci, Andreas Podelski, and Peter van Roy.

We would like to thank a number of people for discovering bugs and
providing comments on Version 2.0:  Ingo Schroeder, Frank Morawietz,
Detmar Meurers, Rob Malouf, Frederik Fouvry, Jo Calder, and Suresh
Manandhar.

Finally, we would like to thank Jo Calder,Chris Brew, Kevin Humphreys,
and Mike Reape, who developed the Pleuk grammar development
environment as well as interfacing it to {\sc ale}.  Details of that system
can be found in the appropriate Appendix.  

This material is based upon work supported under a National Science
Foundation Graduate Research Fellowship (for Gerald Penn).  Any
opinions, findings, conclusions or recommendations expressed in this
publication are those of the author(s) and do not necessarily reflect
the views of the National Science Foundation.
\\
\mbox{ } \hfill Bob Carpenter and Gerald Penn
\\
\mbox{ } \hfill Pittsburgh,  August 1994 


\mychapterstar{Preface -- Version Beta}

A number of people have asked me to make this system, along with its
documentation, available to the public.  Now that it's available, I
hope that it's useful.  But a word of caution is in order.  The system
is still only a prototype, hence the label ``version $\beta$.''  

Any bug reports would be greatly appreciated.  But what I'd really
like is comments on the functionality of the system, as well as on the
utility of its documentation.  I am also interested in hearing of any
applications that are made of the system.  I would also be glad to
answer questions about the system.  I have tried to document the
strategies used by {\sc ale} in this guide.  I have also tried to
comment the code to the point where it might be adaptable by others.
I would, of course, be interested in any kind of improvements or
extensions that are discovered or developed, and would like to have
the chance to incorporate any such improvements in future versions of
this package.

In the implementation, I have endeavored to follow the logic
programming methodology laid out by O'Keefe (1990), but there are many
spots where I have fallen short.  Thus the code is not as fast as it
could be, even in Prolog.  But I view this system more as a prototype,
indicating the utility of a typed logic programming and grammar
development system.  Borrowing techniques from the {\sc wam} directly,
implementing an abstract machine C, would lead to roughly a 100-fold
speedup, as there is no reason that {\sc ale} should be slower than
Prolog itself.

I would like to acknowledge the help of Gerald Penn in working through
many implementation details of a general constraint resolver, which
was the inspiration for this implementation.  This version of the
system is a great improvement on the last version due to Gerald's work
on the system.  Secondly, I would like to thank Michael Mastroianni,
who has actually used the system to develop grammars for phonology.
Finally, I would like to thank Carl Pollard and Bob Kasper for looking
over a grammar of {\sc hpsg} coded in {\sc ale} and providing the
impetus for the inclusion of empty categories and lexical rules.

The system is available without charge from the author.  It is
designed to run in either SICStus or Quintus Prologs.  
\\[4pt]
\mbox{ } \hfill Bob Carpenter \\
\mbox{ } \hfill Pittsburgh, 1992







\cleardoublepage
\pagenumbering{arabic}
\setcounter{page}{1}

\mychapter{Introduction}

This report serves as an introduction to both the {\sc ale} formalism
and its Prolog implementation.  {\sc ale} is an integrated phrase
structure parsing and definite clause logic programming system in
which the terms are typed feature structures.  Typed feature
structures combine type inheritance and appropriateness specifications
for features and their values.  The feature structures used in {\sc
ale} generalize the common feature structure systems found in the
linguistic programming systems {\sc patr-ii} and {\sc fug}, the
grammar formalisms {\sc hpsg} and {\sc lfg}, as well as the logic
programming systems Prolog-II and {\sc login}.  Programs in any of
these languages can be encoded directly in {\sc ale}.  

Terms in grammars and logic programs are specified in {\sc ale} using
a typed version of Rounds and Kasper's attribute-value logic with
variables.  At the term level, we have variables, types, feature value
restrictions, path equations, inequations, general constraints, and
disjunction.  The definite clause programs allow disjunction, negation
and cut, specified with Prolog syntax.  Phrase structure grammars are
specified in a manner similar to {\sc dcg}s, allowing definite clause
procedural attachment.  The grammar formalism also fully supports
empty categories.  Lexical development is supported by a very general
form of lexical rule which operates on both categories and surface
strings.  Macros are available to help organize large descriptions,
either in programs or in grammars.  Both definite clause programs and
grammars are compiled into abstract machine instructions.  These
instructions are then interpreted by an emulator compiled from the
type specifications.  Like Prolog compilers, a structure copying
strategy is used for matching both definite clauses and grammar rules.

For parsing, {\sc ale} compiles from the grammar specification a
Prolog-optimized bottom-up, dynamic chart parser.  Definite clauses
are also compiled into Prolog.  As it stands, the current version of
{\sc ale}, running the feature-structure-based naive reverse of a
30-element list on top of the SICStus 3.7 native code compiler, runs
at roughly 152,300 logical inferences per second (LIPS) on a Sun Ultra
Enterprise 450.  This is slightly faster than the speed of the SICStus
3.7 interpreter on the Prolog naive reverse, 60\% as fast as SWI
Prolog 3.2.7, and about 9\% as fast as the SICStus 3.7 compact-code
compiler.  The definite clause compiler performs last call
optimization, but does not index first arguments or use specialised
list cells at the WAM level.  Thus it will perform relatively well
versus non-optimized interpreters, but lag further behind compiled
grammars when programs are written in a more sophisticated manner than
naive reverse.

Full details of the theory behind {\sc ale} can be found in Carpenter
(1992).  

The user who is only interested in definite clause programming can
skip the material on phrase structure grammars, while those interested
in only grammars without procedural attachments may skip the material
in the section on definite clauses.



\mychapter{Prolog Preliminaries}

While it is not absolutely necessary, some familiarity with logic
programming in general, and Prolog in particular, is helpful in
understanding the definite clause portion of {\sc ale}.  Similarly,
experience with unification grammar systems such as {\sc patr-ii},
{\sc dcg}s, or {\sc fug} is helpful in understanding the phrase
structure component of the system.  In particular, writing efficient
programs and grammars in {\sc ale} involves the same kinds of
strategies necessary for writing efficient programs in Prolog or {\sc
patr-ii}.  For those not familiar with Prolog, the sequence of two
books by Sterling and Shapiro (1986) and by O'Keefe (1990) are
excellent general introductions to the theory and practice of logic
programming.  For those not familiar with unification-based grammar
formalisms, Shieber (1986), Gazdar and Mellish (1987) and Pereira and
Shieber (1987) are useful resources.

For those not familiar with Prolog, we need to point out the salient
features of the language which will be assumed throughout this report.
This section contains all of the information necessary about Prolog
required to run {\sc ale}.  


\mysection{Terms}

A Prolog {\em constant} is composed of either a sequence of
characters and/or under-scores, beginning with a lower case letter, a
number, or any sequence of symbols surrounded by apostrophes.  So,
{\tt abc, johnDoe, b\_17, 123, 'JohnDoe', '65\$'}, and {\tt '\_65a.'}
are constants, but {\tt A19, JohnDoe, B\_112, \_au8}, and {\tt [dd,e]}
are not (although see the warning at the end of this section).  
 A variable, on the other hand, is any string of letters,
underscores or numbers beginning with a capital letter.  Thus {\tt C,
C\_foo,} and {\tt TR5ab} are variables, but {\tt 1Xa, aXX}, and {\tt
\_Xy}%
\footnote{Technically, a variable may begin with an underscore, but
such variables, said to be {\em anonymous}, have a very different
status than those which begin with a capital letter.  The use of
anonymous variables is discussed later.}
%
are not.  

In general, it is a bad idea to have constants or variables
which are only distinguished by the capitalization of some of their
letters.  For instance, while {\tt aBa} and {\tt aba} are different
constants, they should not both be used in one program.  One reason
for this in the context of {\sc ale} is that the output routines adopt
standard capitalization conventions which hide the differences between
such constants.  

{\bf Warning:}  As pointed out to us by Ingo Schroeder, constants or
atoms beginning with a capital letter are not treated properly by the
compiler.  Thus constants such as {\tt 'Foo'} should not be used.  



\mysection{Space and Comments}

In your own program and grammar files, extra {\em whitespace} between
symbols beyond that needed to separate constants or variables is
ignored.  Whitespace consists of either spaces, blank lines or line
breaks are ignored.  This allows you to format your programs in a
manner that is readable.  Furthermore, any symbols on a line appearing
after a {\tt \%} symbol are treated as {\em comments} and ignored.


\mysection{Running Prolog}

To fire up Prolog locally, you should contact your systems
administrator.  You should have either SICStus or SWI Prolog, or a
Prolog compiler compatible with one of these.  Once Prolog is fired
up, you will see a {\em prompt}.  The Prolog prompt should look like:
%
\begin{verbatim}
  | ?-
\end{verbatim}
%

It is important that Prolog be invoked from a directory for which the
user has write permission.  {\sc ale}, in the process of compiling
user programs, writes a number of local files.

\mysection{Queries}

What you type after the prompt is called a {\em query}.  Queries
should always end with a period and be followed by a carriage return.
In fact, all of the grammar rules, definite clauses, macros and
lexical entries in your programs should also end with periods.  Most
of the interface in {\sc ale} is handled directly by top-level Prolog
queries.  Many of these will return {\tt yes} or {\tt no} after they
are called, the significance of which within {\sc ale} is explained on
a query by query basis.


\mysection{Running ALE}

To run {\sc ale}, it is only necessary to type the following query:
%
\begin{quote}
  {\tt | ?- compile({\it File}).}\label{COMPILE}  
\end{quote}
%
where {\it File} is the file in which the file {\tt ale.pl} resides.
SWI Prolog users must type:
%
\begin{quote}
  {\tt | ?- consult({\it File}).}\label{CONSULT}
\end{quote}
%
Note that {\it File} does not have to be local to the directory from
which Prolog was invoked.  

In SICStus Prolog, when {\sc ale} is loaded, it turns on Prolog
  character escapes.  {\sc ale} will not be able to generate code
  properly during compilation without this.

\mysection{Exiting Prolog and Breaking}

To exit from Prolog, you can type {\tt halt}\label{HALT} at any prompt
(followed by a period, of course).

If you find Prolog hanging at some point, and you are working on a
standard Unix implementation, typing {\tt control-c}\label{CTRLC}
should produce something like the following message:
%
\begin{verbatim}
  Prolog interruption (h for help)?
\end{verbatim}
%
You should reply with the character {\tt a}, with or without a
following period, followed by a carriage return.  If this doesn't
work, typing {\tt control-z}\label{CTRLZ} should take you out of
Prolog altogether.


\mysection{Saved States}

All information concerning an {\sc ale} state is encoded in the
current Prolog state.  Thus, any options presented by the local system
to save Prolog states should be able to save {\sc ale} states.  



\mychapter{Feature Structures, Types and Descriptions}

This section reviews the basic material from Carpenter (1992), Chapters
1--10, which is necessary to use {\sc ale}.  


\mysection{Inheritance Hierarchies}

{\sc ale} is a language with strong typing.  What this means is that
every structure it uses comes with a type.  These types are arranged
in an inheritance hierarchy, whereby type constraints on more general
types are inherited by their more specific subtypes, leading to what
is known as {\em inheritance-based polymorphism}.  Inheritance-based
polymorphism is a cornerstone of object-oriented programming.  In this
section, we discuss the organization of types into an inheritance
hierarchy.  Thus many types will have {\em subtypes}, which are more
specific instances of the type.  For instance, {\tt person} might have
subtypes {\tt male} and {\tt female}.  

{\sc ale} does much of its processing of types at compile time, as it
is reading and processing the grammar file.  Thus the user is required
to declare all of the types that will be used along with the subtyping
relationship between them.  An example of a simple
{\sc ale} type declaration is as follows:
%
\begin{verbatim}
  bot sub [b,c].          % two basic types -- b and c
    b sub [d,e].
      d sub [g,h].        
      e sub [].
    c sub [d,f].          % b and c unify to d
      f sub [].
\end{verbatim}
%
There are quite a few things to note about this declaration.  The
types declared here are \label{BOT}{\tt bot, b, c, d, e, f} and {\tt
  g}.  Note that each type that is mentioned gets its own
specification.  Of course, the whitespace is not important, but it is
convenient to have each type start its own line.  A simple type
specification consists of the name of the type, followed by the
keyword {\tt sub}\label{SUB}, followed by a list of its subtypes
(separated by whitespace).  In this case, {\tt bot} has two subtypes,
{\tt b} and {\tt c}, while {\tt f, d} and {\tt e} have no subtypes.
The subtypes are specified by a Prolog list.  In this case, a Prolog
list consists of a sequence of elements separated by commas and
enclosed in square brackets.  Note that no whitespace is needed
between the list brackets and types, between the types and commas, or
between the final bracket and the period.  Whitespace is only needed
between constants.  The extra whitespace on successive lines is
conventional, indicating the level in the ordering for the user, but
is ignored by the program.  Also notice that there are comments on two
of the lines; recall that comments begin with a {\tt
  \%}\label{PERCENT} sign and continue the length of the line.  Every
type (except {\tt a\_/1} atoms, discussed below) must have at most one
{\tt sub} declaration, i.e., all of the immediate subtypes must be
declared in one declaration.

The subtyping relation is only specified by {\it immediate} subtyping
declarations; but subtyping itself is transitive.  Thus, in the
example, {\tt d} is a subtype of {\tt c}, and {\tt c} is a subtype of
{\tt bot}, so {\tt d} is also a subtype of {\tt bot}.  The user only
needs to specify the direct subtyping relationship.  The transitive
closure of this relation is computed by the compiler.  While redundant
specifications, such as putting {\tt d} directly on the subtype list
of {\tt bot}, will not alter the behavior of the compiler, they are
confusing to the reader of the program and should be avoided.  In
addition, the derived transitive subtyping relationship must be
anti-symmetric.  In particular, this means that there should not be
two distinct types each of which is a subtype of the other.

There are two additional restrictions on the inheritance hierarchy
beyond the requirement that it form a partial order.  First, there is
a special type {\tt bot}, which must be declared as the unique most
general type.  In other words, every type must be a subtype of {\tt
bot}.  If a type is used on the left-hand side of a {\tt sub}
declaration, but never declared as a sub-type of anything else, it is
assumed that this type is an immediate subtype of {\tt bot}.
Similarly, {\sc ale} assumes that all types for which no subtypes are
declared are maximal, i.e., have no subtypes.

The second and more subtle restriction on type hierarchies is that
they be {\em bounded complete}.  Since type declarations must be
finite, this amounts to the restriction that every pair of types which
have a common subtype have a unique most general common subtype.  In
the case at hand, {\tt b} and {\tt c} have three common subtypes, {\tt
d, g}, and {\tt h}.  But these subtypes of {\tt b} and {\tt c} are
ordered in such a way that {\tt d} is the most general type in the
set, as both {\tt g} and {\tt h} are subtypes of {\tt d}.  An example
of a type declaration violating this condition is:
%
\begin{verbatim}
  bot sub [a,b].
    a sub [c,d].
      c sub [].
      d sub [].
    b sub [c,d].
\end{verbatim}
%
The problem here is that while {\tt a} and {\tt b} have two common
subtypes, namely {\tt c} and {\tt d}, they do not have a most general
common subtype, since {\tt c} is not a subtype of {\tt d}, and {\tt d}
is not a subtype of {\tt c}.  In general, a violation of the bounded
completeness condition such as is found in this example can be patched
without destroying the ordering by simply adding additional types.
In this case, the following type hierarchy preserves all of the
subtyping relations of the one above, but satisfies bounded
completeness:
%
\begin{verbatim}
  bot sub [a,b].
    a sub [e].
      e sub [c,d].
        c sub [].
        d sub [].
    b sub [e].
\end{verbatim}
%
In this case, the new type {\tt e} is the most general subtype of {\tt
a} and {\tt b}.  

This last example brings up another point about inheritance
hierarchies.  When a type only has one subtype, the system provides a
warning message (as opposed to an error message).  This condition will
not cause any compile-time or run-time errors, and is perfectly
compatible with the logic of the system.  It is simply not a
very good idea from either a conceptual or implementational point of
view.  For more on this topic, see Carpenter (1992:Chapter 9).  
 

\mysection{Feature Structures}

The primary representational device in {\sc ale} is the {\em typed
feature structure}.  In phrase structure grammars, feature structures
model categories, while in the definite clause programs, they serve
the same role as first-order terms in Prolog, that of a universal data
structure.  Feature structures are much like the frames of AI systems,
the records of imperative programming languages like C or Pascal, and
the feature descriptions used in standard linguistic theories of
phonology, and more recently, of syntax.

Rather than presenting a formal definition of feature structures,
which can be found in Carpenter (1992:Chapter 2), we present an
informal description here.  In fact, we begin by discussing feature
structures which are not necessarily well-typed.  In the next section,
the type system is presented.

A feature structure consists of two pieces of information.  The first
is a type.  Every feature structure must have a type drawn from the
inheritance hierarchy.  The other kind of information specified by a
feature structure is a finite, possibly empty, collection of
feature/value pairs.  A feature value pair consists of a feature and a
value, where the value is itself a feature structure.  The difference
between feature structures and the representations used in phonology
and in {\sc gpsg}, for instance, is that it is possible for two different
substructures (values of features at some level of nesting) to be
token identical in a feature structure.  Consider the following
feature structure drawn from the lexical entry for {\em John} in the
categorial grammar in the appendix, displayed in the output notation
of {\sc ale}:
%
\begin{verbatim}
  cat
  QSTORE e_list
  SYNSEM basic
         SEM j  
         SYN np
\end{verbatim}
%
The type of this feature structure is {\tt cat}, which is interpreted
to mean it is a category.  It is defined for two features, {\tt
QSTORE} and {\tt SYNSEM}.  As can be seen from this example, we follow
the {\sc hpsg} notational convention of displaying features in all caps,
while types are displayed in lower case.  Also note that features and
their values are printed in alphabetic order of the feature names.  
In this case, the value of the {\tt QSTORE} feature is
the simple feature structure of type {\tt e\_list},%
%
\footnote{Set values, like those employed in {\sc hpsg}, are not
supported by {\sc ale}.  In the categorial grammar in the appendix,
they are represented by lists and treated by attached procedures for
union and selection.}
%
which has no feature
values.  On the other hand, the feature {\tt SYNSEM} has a complex
feature as its value, which is of type {\tt basic}, and has two feature
values {\tt SEM} and {\tt SYN}, both of which have simple values.  

This last feature structure doesn't involve any structure sharing.
But consider the lexical entry for {\em runs}:
%
\begin{verbatim}
  cat
  QSTORE e_list
  SYNSEM backward
         ARG basic
             SEM [0] individual
             SYN np
         RES basic
             SEM run
                 RUNNER [0] 
             SYN s
\end{verbatim}
%
Here there is structure sharing between the path {\tt SYNSEM ARG SEM}
and the path {\tt SYNSEM RES SEM RUNNER}, where a {\em path} is simply
a sequence of features.  This structure sharing is indicated by the
{\em tag} {\tt [0]}.  In this case, the sharing indicates that the
semantics of the argument of {\em runs} fills the runner role in the
semantics of the result.  Also note that a shared structure is only
displayed once;  later occurrences simply list the tag.  Of course,
this example only involves structure sharing of a very simple feature
structure, in this case one consisting of only a type with no
features.  In general, structures of arbitrary complexity may be
shared, as we will see in the next example. 

{\sc ale}, like Prolog II and {\sc hpsg}, but unlike most other systems,
allows cyclic structures to be processed and even printed.  For
instance, consider the following representation we might use for the
liar sentence {\em This sentence is false}:
%
\begin{verbatim}
  [0] false
      ARG1 [0]
\end{verbatim}
%
In this case, the empty path and the feature {\tt ARG1} share a value.
Similarly, the path {\tt ARG1 ARG1 ARG1} and the path {\tt ARG1 ARG1},
both of which are defined, are also identical.  But consider a
representation for the negation of the liar sentence, {\em It is false
that this sentence is false}:
%
\begin{verbatim}
  false
  ARG1 [0] false
       ARG1 [0]
\end{verbatim}
%
Unlike Prolog II, {\sc ale} does not necessarily treat these two feature
structures as being identical, as it does not conflate a cyclic
structure with its infinite unfolding.  We take up the notion of
token identical structures in the section below on extensionality.

It is interesting to note that with typed feature structures, there is
a choice between representing information using a type and
representing the same information using feature values.  This is a
familiar situation found in most inheritance-based representation
schemes.  Thus the relation specified in the value of the path {\tt
SYNSEM RES SEM} is represented using a type, in:
%
\begin{verbatim}
  SEM run
      RUNNER [0]
\end{verbatim}
%
An alternative encoding, which is not without merit, is:
%
\begin{verbatim}
  SEM unary_rel
      REL run
      ARG1 [0]
\end{verbatim}
%
In general, type information is processed much more efficiently than
feature value information, so as much information as possible should
be placed in the types.  The drawback is that type information must be
computed at compile-time and remain accessible at run-time.  More
types simply require more memory.%
%
\footnote{In general, the amount of memory required to represent $n$
types is proportional to the number of pairs of consistent types.  
In the worst case, this is ${\cal O}(n^2)$ in the number of types.}


\mysection{Subsumption and Unification}

Feature structures are inherently partial in the information they
provide.  Based on the type inheritance ordering, we can order feature
structures based on how much information they provide.  This ordering
is referred to as the {\em subsumption} ordering.  The notion of
subsumption, or information containment, can be used to define the
notion of unification, or information combination.  Unification
conjoins the information in two feature structures into a single
result if they are consistent and detects an inconsistency otherwise.


\mysubsection{Subsumption}

We define subsumption, saying that $F$ {\it subsumes} $G$, if and only if: 
%
\begin{itemize}
\item
  the type of $F$ is more general than the type of $G$
\item
  if a feature $f$ is defined in $F$ then $f$ is also defined in $G$
such that the value in $F$ subsumes the value in $G$
\item
  if two paths are shared in $F$ then they are also shared in $G$
\end{itemize}
%
Consider the following examples of subsumption, where we let {\tt <}
stand for subsumption:
%
\begin{verbatim}
  agr         <  agr
  PERS first     PERS first
                 NUM plu
\end{verbatim}
%
\begin{verbatim}
  sign                phrase                 
  SUBJ agr         <  SUBJ agr 
       PERS pers           PERS first
                           NUM plu
\end{verbatim}
%
\begin{verbatim}
  sign                sign
  SUBJ agr            SUBJ [0] agr
       PERS first          PERS first
       NUM plu     <       NUM plu
  OBJ agr             OBJ [0]
      PERS first
      NUM plu
\end{verbatim}
%
\begin{verbatim}
  false               false              [1] false
  ARG1 false       <  ARG1 [0] false  <  ARG1 [1]
       ARG1 false          ARG1 [0]
\end{verbatim}
%
Note that the second of these subsumptions holds only if {\tt pers} is
a more general type than {\tt first}, and {\tt sign} is a more general
type than {\tt phrase}.  It is also important to note that the feature
structure consisting simply of the type {\tt bot} will subsume every
other structure, as the type {\tt bot} is assumed to be more general
than every other type.

\mysubsection{Unification}

Unification is an operation defined over pairs of feature structures that
combines the information contained in both of them if they are
consistent and fails otherwise.  In {\sc ale}, unification is very
efficient.%
%
\footnote{Using a typed version of the Martelli and Montanari (1982)
algorithm, which was adapted to cyclic structures by Jaffar (1984),
unification can be performed in what is known as quasi-linear time in
the size of the input structures, where in this case, quasi-linear in
$n$ is defined to be ${\cal O}(n \cdot \mbox{\it ack}^{-1}(n))$, where
${\it ack}^{-1}$ is the inverse of Ackermann's function, which will
never exceed 4 or 5 for structures that can be represented on existing
computers.  There is also a factor in the complexity of unification
stemming from the type hierarchy and appropriateness conditions, which
we discuss below.}
%
Declaratively, unifying two feature structures computes a result which
is the most general feature structure subsumed by both input
structures.  But the operational definition is more enlightening, and
can be given by simple conditions which tell us how to unify two
structures.  We begin by unifying the types of the structures in the
type hierarchy.  This is why we required the bounded completeness
condition on our inheritance hierarchies; we want unification to
produce a unique result.  If the types are inconsistent, unification
fails.  If the types are consistent, the resulting type is the
unification of the input types.  Next, we recursively unify all of the
feature values of the structures being unified which occur in both
structures.  If a feature only occurs in one structure, we copy it
over into the result.  This algorithm terminates because we only need
to unify structures which are non-distinct and there are a finite
number of nodes in any input structure.

Some examples of unification follow, where we use {\tt +} to represent
the operation:
%
\begin{verbatim}
  agr          +  agr      =  agr
  PERS first      NUM plu     PERS first
                              NUM sing
\end{verbatim}
%
\begin{verbatim}
  sign              sign             sign
  SUBJ agr       +  SUBJ [0] bot  =  SUBJ [0] agr
       PERS 1st     OBJ [0]               PERS first
  OBJ agr                                 NUM plu
      NUM plu                        OBJ [0]
\end{verbatim}
%
\begin{verbatim}
  t           t           t
  F [0] t  +  F t      =  F [1] t
  G [0]         F [1]       F [1]
              G [1]       G [1]
\end{verbatim}
%
\begin{verbatim}
  agr         +  agr          =  *failure*
  PERS first     PERS second    
\end{verbatim}
%
\begin{verbatim}
  e_list  +  ne_list     =  *failure*
             HD a
             TL e_list
\end{verbatim}
%
Note that the second example respects our assumption that the type
{\tt bot} is the most general type, and thus more general than {\tt
agr}.  The second example illustrates what happens in a simple case of
structure sharing:  information is retrieved from both the {\tt SUBJ}
and {\tt OBJ} and shared in the result.  The third example shows how
two structures without cycles can be unified to produce a structure
with a cycle.  Just as the feature structure {\tt bot} subsumes every
other structure, it is also the identity with respect to unification;
unifying the feature structure consisting just of the type {\tt bot}
with any feature structure $F$ results simply in $F$.  The last two
unification attempts fail, assuming that the types {\tt first} and
{\tt second} and the types {\tt e\_list} and {\tt ne\_list} are
incompatible. 

\mysection{Inequations}

Feature structures may also incorporate inequational constraints
following (Carpenter 1992), which is in turn based on the notion of
inequation in Prolog II (Colmerauer 1987).  For instance, we might
have the following representation of the semantics of a sentence:
%
\begin{verbatim}
  SEM binary_rel
      REL know
      ARG1  [0] referent
            GENDER masc
            PERS third
            NUM sing
      ARG2  [1] referent
            GENDER masc
            PERS third
            NUM sing
  [0] =\= [1]
\end{verbatim}
%
Below the feature information, we have included the constraint that
the value of the structure {\tt [0]} is not identical to that of
structure {\tt [1]}.  As a result, we cannot unify this structure with
the following one:
%
\begin{verbatim}
  REL know
  ARG1 [2]
  ARG2 [2]
\end{verbatim}
%
Any attempt to unify the structures {\tt [0]} and {\tt [1]} causes
failure.  

\mysection{Type System}

As we mentioned in the introduction, what distinguishes {\sc ale} from
other approaches to feature structures and most other approaches to
terms, is that there is a strong type discipline enforced on feature
structures.  We have already demonstrated how to define a type
hierarchy, but that is only half the story with respect to typing.
The other component of our type system is a notion of feature {\em
appropriateness}, whereby each type must specify which features it can
be defined for, and furthermore, which types of values such features
can take.  The notion of appropriateness used here is similar to that
found in object-oriented approaches to typing.  For instance, if a
feature is appropriate for a type, it will also be appropriate for all
of the subtypes of that type.  In other words, appropriateness
specifications are inherited by a type from its supertypes.
Furthermore, value restrictions on feature values are also inherited.
Another important consideration for {\sc ale}'s type system is the
notion of type inference, whereby types for structures which are
underspecified can be automatically inferred.  This is a property our
system shares with the functional language {\sc ml}, though our notion
of typing is only first-order.  To further put {\sc ale}'s type system
in perspective, we note that type inheritance must be declared by the
user at compile time, rather than being inferred.  Furthermore, types
in {\sc ale} are semantic, in Smolka's (1988b) terms, meaning that
types are used at run-time.  Even though {\sc ale} employs semantic
typing, the type system is employed statically (at compile-time) to
detect type errors in grammars and programs.

As an example of an appropriateness declaration, consider the simple
type specification for lists with a head/tail encoding:
%
\begin{verbatim}
  bot sub [list,atom].
    list sub [e_list,ne_list].
      e_list sub [].
      ne_list sub []
              intro [hd:bot,
                     tl:list].
  atom sub [a,b].
    a sub [].
    b sub [].
\end{verbatim}
%
This specification tells us that a list can be either empty ({\tt
e\_list}) or non-empty ({\tt ne\_list}).  It implicitly tells us that
an empty list cannot have any features defined for it, since none are
declared directly or inherited from more general types.  The
declaration also tells us that a non-empty list has two features,
representing the head and the tail of a list, and, furthermore, that
the head of a list can be anything (since every structure is of type
{\tt bot}), but the tail of the list must itself be a list.  Note that
features must also be Prolog constants, even though the output
routines convert them to all caps.  The appropriateness declaration,
{\tt intro}\label{INTRO}, can be specified along with subsumption, as
shown above, or separately; but for any given type, all features must
be declared at once.  If no {\tt intro} declaration is given for a
type, it is assumed that that type introduces no appropriate features.
If an {\tt intro} declaration is made for a type that does not occur
on either side of a {\tt sub} declaration, that type is assumed to be
an immediate subtype of {\tt bot} with no subtypes of its own.  If a
value restrictor (such as {\tt list} above for feature {\tt tl}) does
not occur on either side of a {\tt sub} declaration, it too is assumed
to be maximal and an immediate subtype of {\tt bot}.

In {\sc ale}, every feature structure must respect the appropriateness
restrictions in the type declarations.  This amounts to two
restrictions.  First, if a feature is defined for a feature structure
of a given type, then that type must be appropriate for the feature.
Furthermore, the value of the feature must be of the appropriate type,
as declared in the appropriateness conditions.  The second condition
goes the other way around: if a feature is appropriate for a type,
then every feature structure of that type must have a value for the
feature.  A feature structure respecting these two conditions is said
to be {\em totally well-typed} in the terminology of Carpenter (1992,
Chapter 6).%
%
\footnote{The choice of totally well-typed structures was motivated by
the desire to represent feature structures as records at run-time,
without listing their features.  Internally, a feature structure is
represented as a term of the form {\tt Tag-Sort(V1,...,VN)} where {\tt
Tag} represents the token identity of the structure using a Prolog
variable, {\tt Sort} is the type of structure, and {\tt V1} through
{\tt VN} are the values of the appropriate features, in alphabetical
order of the features' names, which are themselves left implicit.
Furthermore, the {\tt Tag} is used for forwarding and dereferencing
during unification.}
%
For instance, consider the following feature structures:
%
\begin{verbatim}
  list
  HD a
  TL bot
\end{verbatim}
%
\begin{verbatim}
  ne_list
  HD bot
  TL ne_list
     HD atom
     TL list
\end{verbatim}
%
\begin{verbatim}
  ne_list
  HD [0] ne_list
     HD [0]
     TL [0]
  TL e_list
\end{verbatim}
%
The first structure violates the typing condition because the type
{\tt list} is not appropriate for any features, only {\tt ne\_list} is.
But even if we were to change its type to {\tt ne\_list}, it would
still violate the type conditions, because {\tt bot} is not an
appropriate type for the value of {\tt TL} in a {\tt ne\_list}.  On
the other hand, the second and third structures above are totally
well-typed.  Note that the second such structure does not specify what
kind of list occurs at the path {\tt TL TL}, nor does it specify what
the {\tt HD} value is, but it does specify that the second element of
the list, the {\tt TL HD} value is an {\tt atom}, but it doesn't
specify which one.  

To demonstrate how inheritance works in a simple case, consider the
specification fragment from the categorial grammar in the appendix:
%
\begin{verbatim}
  functional sub [forward,backward]
             intro [arg:synsem,
                    res:synsem].
    forward sub [].
    backward sub [].
\end{verbatim}
%
This tells us that {\tt functional} objects have {\tt ARG} and {\tt
RES} features.  Because {\tt forward} and {\tt backward} are subtypes
of {\tt functional}, they will also have {\tt ARG} and {\tt RES}
features, with the same restrictions.  

There are a couple of important restrictions placed on appropriateness
conditions in {\sc ale}.  The most significant of these is the
acyclicity requirement.  This condition disallows type specifications
which require a type to have a value which is of the same or more
specific type.  For example, the following specification is {\em not}
allowed:
%
\begin{verbatim}
  person sub [male,female]
         intro [father:male,
                mother:female].
    male sub [].
    female sub [].
\end{verbatim}
%
The problem here is the obvious one that there are no most general
feature structures that are both of type {\tt person} and totally
well-typed.%
%
\footnote{The only finite feature structures that could meet this type
system would have to be cyclic, as noted in Carpenter (1992).  The
problem is that there is no most general such cyclic structure, so
type inference cannot be unique.}
%
This is because any person must have a father and mother feature,
which are male and female respectively, but since male and female are
subtypes of person, they must also have mother and father values.  It
is significant to note that the acyclicity condition does not rule out
recursive structures, as can be seen with the example of lists.  The
{\tt list} type specification is acceptable because not every list is
required to have a head and tail, only non-empty lists are.  The
acyclicity restriction can be stated graph theoretically by
constructing a directed graph from the type specification.  The nodes
of the graph are simply the types.  There is an edge from every type
to all of its supertypes, and an edge from every type to the types in
the type restrictions in its features.  Type specifications are only
acceptable if they produce a graph with no cycles.  One cycle in the
{\tt person} graph is from {\tt male} to {\tt person} (by the
supertype relation) and from {\tt person} to {\tt male} (by the {\tt
FATHER} feature).  On the other hand, there are no cycles in the
specification of {\tt list}.

The second restriction placed on appropriateness declarations is
designed to limit non-determinism in much the same way as the
bounded completeness condition on the inheritance hierarchy.  This
second condition requires every feature to be {\em introduced} at a unique
most general type.  In other words, the set of types appropriate for a
feature must have a most general element.  Thus the following type
declaration fragment is {\em invalid}:
%
\begin{verbatim}
  a sub [b,c,d].
    b sub []
      intro [f:w,
             g:x].
    c sub []
      intro [f:y,
             h:z].
    d sub [].
\end{verbatim}
%
The problem is that the feature {\tt F} is appropriate for types {\tt
b} and {\tt c}, but there is not a unique most general type for which
it's appropriate.  In general, just like the bounded completeness
condition, type specifications which violate the feature introduction
condition can be patched, without violating any of their existing
structure, by adding additional types.  In this case, we add a new type
between {\tt a} and the types {\tt b} and {\tt c}, producing the
equivalent well-formed specification:
%
\begin{verbatim}
  a sub [e,d].
    e sub [b,c]
      intro [f:bot].
      b sub []
        intro [f:w,
               g:x].
      c sub []
        intro [f:y,
               h:z].
    d sub [].
\end{verbatim}
%
This example also illustrates how subtypes of a type can place
additional restrictions on values on features as well as introducing
additional features.

As a further illustration of how feature introduction can be obeyed in
general, consider the following specification of a type system for
representing first-order terms:
%
\begin{verbatim}
  sem_obj sub [individual,proposition].
    individual sub [a,b].
      a sub []. 
      b sub [].
  proposition sub [atomic_prop,relational].
    atomic_prop sub [].
    relational_prop sub [unary_prop,transitive_prop]
                    intro [arg1:individual].
      unary_prop sub [].
      transitive_prop sub [binary_prop,ternary_prop]
                      intro [arg2:individual].
        binary_prop sub [].
        ternary_prop sub []
                     intro [arg3:individual].
\end{verbatim}
%
In this case, unary propositions have one argument feature, binary
propositions have two argument features, and ternary propositions have
three argument features, all of which must be filled by individuals.


\mysection{Extensionality}

{\sc ale} also respects the distinction between {\em intensional} and
{\em extensional} types (see Carpenter (1992:Chapter 8).  The concept
of extensional typing has its origins in the assumption in standard
treatments of feature structures, that there can only be one copy of
any {\em atom} (a feature structure with no appropriate features) in a
feature structure.  Thus, if path $\pi_{1}$ leads to atom {\em a}, and
path $\pi_{2}$ leads to atom {\em a}, then the values for those two
paths are token-identical.  Token-identity refers to an identity
between two feature structures as objects, as opposed to {\em
  structure-identity}, which refers to an identity between two feature
structures that contain the same information.

Smolka (1988a) partitioned his atoms according to whether more than
one copy could exist or not.  In {\sc ale}, following Carpenter
(1992), this notion of copyability has been extended to arbitrary
types -- loosely speaking, those types which are copyable we call {\em
intensional}, and those which are not we call {\em extensional}.
Thus, it is possible to have two feature structures of the same
intensional type which, although they may be structure-identical, are
not token-identical.  Formally:
%
\begin{quote}
  Given the set of types, {\sf Type}, defined by an {\sc ale}
  signature, we designate a subset, ${\sf ExtType} \subseteq {\sf
    Type}$, as the set of extensional types.  With the exception of
  {\tt a\_/1} atoms, discussed below, this set consists only of {\em
    maximally specific types}, i.e., for each $\sigma \in {\sf
    ExtType}$, there is no type $\tau$ such that $\sigma$ subsumes
  $\tau$.
\end{quote}
%
The restriction of {\sf ExtType} to maximally specific types is
peculiar to {\sc ale}, and is levied in order to reduce the
computational complexity of enforcing extensionality.%
%
\footnote{In theory (Carpenter 1992), this set is only required to be
  {\em upward closed}, which means that if $\sigma \in {\sf ExtType}$,
  and $\sigma$ subsumes $\tau$, then $\tau \in {\sf ExtType}$.  This
  relaxation of our requirement that extensional types be maximal
  would actually not be too difficult to implement.}

We need one more definition to formally state the effect which an
extensional type has on feature structures in {\sc ale}.
%
\begin{quote}
Given a set of extensional types, {\sf ExtType}, we define an
equivalence relation, $\asymp$, the {\em collasping relation}, on
well-typed feature structures, such that $F_{1} \asymp F_{2}$ for
$F_{1} \neq F_{2}$ only if:
\begin{itemize}
\item $F_{1}$ has the same type, $\sigma$, as $F_{2}$, and $\sigma \in
{\sf ExtType}$, and
\item for every feature, $f$, appropriate to $\sigma$,
$F_{1}^{f}$, the value of $f$ in $F_{1}$, and $F_{2}^{f}$, the value
of $f$ in $F_{2}$, are defined, and $F_{1}^{f} \asymp F_{2}^{f}$.
\end{itemize}
\end{quote}
%
In {\sc ale}, all feature structures behave as if they are what
Carpenter (1992) referred to as {\em collapsed}.
That is, the only collapsing relation that exists between any two
feature structures is the trivial collapsing relation, namely:
%
\begin{quote}
$F_{1} \asymp F_{2}$ if and only if $F_{1}$ is token-identical to $F_{2}$.
\end{quote}
%
In the case of acyclic feature structures, this definition is
equivalent to saying that two feature structures of the same
extensional type are token-identical if and only if, for every feature
appropriate to that type, their respective values on that feature are
token-identical.  For example, supposing that we have a signature
representing dates, then the two substructures representing dates in
the following structure must be token identical.
%
\begin{verbatim}
  married_person
  BIRTHDAY [1] date
           DAY  12
           MONTH nov
           YEAR 1971
  SPOUSE BIRTHDAY [1] date
                  DAY 12
                  MONTH nov
                  YEAR 1971
\end{verbatim}
%
In other words, this represents a person born on 12 November 1971, who
is married to a person with the same birthdate.

Now consider a slightly more complex example, which employs the
following signature.
%
\begin{verbatim}
  bot sub [a,b,c,g].
    a sub []
      intro [f:b,g:c].
    b sub [].
    c sub [].
    g sub []
      intro [h:a,j:a].
\end{verbatim}
%
If {\tt a}, {\tt b}, and {\tt c} are extensional, then the values of
{\tt H} and {\tt J} in {\tt g} are always token-identical, i.e., every
feature structure of type {\tt g} satisfies:
%
\begin{verbatim}
  g
  H [0] a
    F b
    G c
  J [0]
\end{verbatim}
%
But if only {\tt a}, and {\tt b} are extensional, and {\tt c} is
intensional, then the values of {\tt H} and {\tt J} are not
necessarily token-identical, although they are always
structure-identical:
%
\begin{verbatim}
  g
  H a
    F [1] b
    G c
  J a
    F [1]
    G c
\end{verbatim}
%

To cite an earlier example, suppose we were to specify that the type
{\tt false}, used in the liar sentence and its negation, were
extensional.  Now the liar sentence's representation is:
%
\begin{verbatim}
  [0] false
      ARG1 [0]
\end{verbatim}
%
as before, but the negation of the liar sentence would also be
represented by:
%
\begin{verbatim}
  [0] false
      ARG1 [0]
\end{verbatim}
%
since if were still represented by:
%
\begin{verbatim}
  [1] false
  ARG1 [0] false
       ARG1 [0]
\end{verbatim}
%
then we could cite a non-trivial collasping relation, $\asymp$, in
which $[1] \asymp [0]$.  

As a related example, consider:
%
\begin{verbatim}
  s
  A [0] t
    C [0]
  B [1] t
    C [1]
\end{verbatim}
%
Assuming that {\tt t} is extensional and only appropriate for the
feature {\tt C}, then the structures {\tt [0]} and {\tt [1]} in the
above structure would be identified.  

Extensionality allows the proper representation of feature structures
and terms in both PATR-II, the Rounds-Kasper system, and in Prolog and
Prolog II.  For PATR-II and the Rounds-Kasper system, all atoms (those
types with no appropriate features) are assumed to be extensional.
Furthermore, in the Rounds-Kasper and PATR-II systems, which are
monotyped, there is only one type that is appropriate for any
features, and it must be appropriate for all features in the grammar.
In Prolog and Prolog II, the type hierarchy is assumed to be flat, and
every type is extensional.  

Just as with implementations of Prolog, collapsing is only performed
as it is needed.  As shown by Carpenter (1992), collapsing can be
restricted to cases where inequations are tested between two
structures, with exactly the same failure behavior.  It turns out to
be less efficient to collapse structures before asserting them into
{\sc ale}'s parsing chart, primarily because the time to test arbitrary
structures for collapsibility is at least quadratic in the size of the
structures being collapsed.  See the section below on inequations for
further discussion.  Currently, extensionality is only enforced before
the answer to a user query is given.

Extensional types in {\sc ale} are specified all at once in
a list: 
%
\begin{quote}
  ${\tt ext([}ext_{1}, \ldots, ext_{n}{\tt ]).}$\label{EXT}
\end{quote}
%
in the same file in which the subsumption relation is defined.  All
types that are not mentioned in the {\tt ext} specification are
assumed to be intensional, except {\sc ale}'s {\tt a\_/1} atoms,
discussed below, which have the same extensionality as Prolog terms,
i.e., if they are ground or have the same variables in the same
positions.%
%
\footnote{This is given by the {\tt ==} operator in Prolog.}
%
These do not need to be declared as such.  If more than
one {\tt ext} specification is given, the first one is used.  If no
{\tt ext} specification is given, then the specification:
%
\begin{verbatim}
  ext([]).
\end{verbatim}
%
is assumed.  If a type occurs in {\tt ext/1}, but does not appear on
the left or right-hand side of a {\tt sub} declaration, it assumed to
be maximal, and immediately subsumed by {\tt bot}.

Of course, collapsing is only enforced between feature structures
whose life-spans in {\sc ale}%
%
\footnote{The life-span of a feature
structure in {\sc ale} is the period from its creation to the point
when the user command currently being executed finishes, unless
that feature structure is asserted as an edge in {\sc ale}'s chart
parser.  In this case, the life of the feature structure ends when the
edge is removed.  Every new request for a parse to {\sc ale} removes
all of the current edges.} 
%
overlap.  So, for example, if one request
is made for the representation of the liar sentence:
%
\begin{verbatim}
  [0] false
      ARG1 [0]
\end{verbatim}
%
and then another is made for that of its negation, the output is not:
\begin{verbatim}
  [0]   
\end{verbatim}
(referring to the same token above) but rather:
%
\begin{verbatim}
  [0] false
      ARG1 [0]
\end{verbatim}
%
Every time a new context arises, numbering of structures begins again
from {\tt [0]}.

\mysection{{\tt a\_/1} Atoms}

{\sc ale} also provides an infinite collection of atoms.
These are of the form:\label{ATOM}
\begin{quote}
{\tt a\_} {\it Term}
\end{quote}
where {\it Term} is a prolog term.  Two {\tt a\_/1} atoms subsume each
other if and only if their terms subsume each other as prolog terms.
As a result, no two different, ground atoms subsume each
other; and the most general atom of this collection is {\tt a\_ \_}.
They implicitly exist in every type hierarchy, with {\tt a\_ \_} being
immediately subsumed by {\tt bot}, and with every ground
atom being maximal.  {\tt a\_/1} atoms are extensional; and non-ground
{\tt a\_/1} atoms are extensionally identical as Prolog terms, i.e.,
if they have the same variables in the same positions.  For example,
{\tt a\_ f(X)} and {\tt a\_ g(X)} are not taken to be the same atom,
nor are {\tt a\_ f(X)} and {\tt a\_ f(f(X)}, nor are {\tt a\_ f(X)}
and {\tt a\_ f(Y)}.  But {\tt a\_ f(X)} and {\tt a\_ f(X)} are.  Their
status in the type hierarchy should not be explicitly declared, nor
should the fact that they bear no features, nor should their
extensionality.

Some care must be exercised when using non-ground atoms in chart
edges.  {\sc ale}'s chart parser copies edges, including the Prolog
variables inside {\tt a\_/1} atoms.  When these variables are copied,
identity among variables within a single edge is preserved, but
identity among variables between different edges may be lost.  Because
{\sc ale} delays the enforcement of extensional type checking, this
could result in {\sc ale} losing a path equation between two atoms.
The best way to avoid this is always to use ground atoms in chart
edges.  Otherwise, the user should at least avoid relying on
extensional identity when writing grammars by not using the {\sc ale}
built-in {\tt @=} or inequations between non-ground atoms from
different edges.

If the user requires intensional atoms, they must be explicitly
declared.  There must also be no user-defined type, {\tt a\_}, in the
type hierarchy.  Certain arguments to {\tt a\_/1} cannot be used, such
as {\tt a\_} itself, and other prolog reserved words, such as {\tt
  mod}, unless they are used with the proper operator precedence and
proper number of arguments to be parsed by Prolog.

Otherwise, {\tt a\_/1} atoms can be used wherever a normal type can.
They are particularly useful as members of large domains that are too
tedious to define, such as phonology attributes in natural language
grammars, or to pass extra-logical information around a parse tree,
such as numbers representing probabilities.  To declare a feature's
value as any {\tt a\_/1} atom, use {\tt a\_ \_}:
%
\begin{verbatim}
sign intro [phon:(a_ _)].
\end{verbatim}
%
The parentheses are recommended for readability, but not necessary.
Because subsumption among {\tt a\_/1} atoms mirrors subsumption as
prolog terms, one can also declare features appropriate for only
certain kinds of atoms.  For example:
%
\begin{verbatim}
sign intro [phon:(a_ phon(_))].
\end{verbatim}
%
declares {\sc phon} appropriate to any atom whose term's functor is
{\tt phon/1}.

Structure-sharing between {\tt a\_/1} prolog terms in feature
appropriateness declarations is ignored by {\sc ale}.  For example,
the declaration:
%
\begin{verbatim}
foo intro [f:(a_ X),g:(a_ X)].
\end{verbatim}
%
is treated as:
%
\begin{verbatim}
foo intro [f:(a_ _),g:(a_ _)].
\end{verbatim}
%
{\sc ale} does respect structure-sharing between {\tt a\_/1} prolog
terms in descriptions.

\mysection{Attribute-Value Logic}

Now that we have seen how the type system must be specified, we turn
our attention to the specification of feature structures themselves.
The most convenient and expressive method of describing feature
structures is the logical language developed by Kasper and Rounds
(1986), which we modify here in three ways.  First, we replace the
notion of path sharing with the more compact and expressive notion of
variable due to Smolka (1988a).  Second, we extend the language to
types, following Pollard (in press). Finally, we add inequations.

The collection of {\em descriptions} used in {\sc ale} can be described by
the following {\sc bnf} grammar:
%
\begin{verbatim}
  <desc> ::= <type>
           | <variable>
           | (<feature>:<desc>)
           | (<desc>,<desc>)
           | (<desc>;<desc>)
           | (=\= <desc>)
\end{verbatim}
%
As we have said before, both types and features are represented by
Prolog constants.  Variables, on the other hand, are represented by
Prolog variables.  As indicated by the {\sc bnf}, no whitespace is
needed around the feature selecting colon, conjunction
comma\label{COMMADESC} and disjunction semi-colon\label{SEMIDESC}, but any
whitespace occurring will be ignored.

These descriptions are used for picking out feature structures that
satisfy them.  We consider the clauses of the definition in turn.  A
description consisting of a type picks out all feature structures of
that type.  A variable can be used to refer to any feature structure,
but multiple occurrences of the same variable must refer to the same
structure.  A description of the form {\tt
  (<feature>:<desc>)}\label{COLON} picks out a feature structure whose
value for the feature satisfies the nested description.  An inequation
{\tt =$\backslash$= <desc>}{\label{INEQ}} is satisfied by those
feature structures that are not token-identical to the feature
structure described by {\tt <desc>}.  Inequations are discussed in
more detail below.

There are two ways of logically combining descriptions: following
Prolog, the comma represents conjunction and the semi-colon represents
disjunction.  A feature structure satisfies a conjunction of
descriptions just in case it satisfies both conjuncts, while it
satisfies a disjunction of descriptions if it satisfies either of the
disjuncts.

We should also add to the above {\sc bnf} grammar the following line:
%
\begin{verbatim}
  <desc> ::= (<path> == <path>)
\end{verbatim}
%
This is an equational description\label{EQUAL}, of which inequations
are the negation.  Equational or inequational descriptions are
satisfied by the presence or absence, respectively, of token-identity.
In particular, an inequation between two structurally-identical
feature structures can be satisfied, while a path equation can {\it
 only} be satisfied by two structurally-identical feature structures,
but is not necessarily satisfied.

All instances of equational descriptions can be captured by using
multiple occurrences of variables.  For example, the description:
%
\begin{verbatim}
  ([arg1]==[arg2])
\end{verbatim}
%
is equivalent to the description:
%
\begin{verbatim}
  (arg1:X,arg2:X).
\end{verbatim}
%
assuming there are no other occurrences of {\tt X}.  

Standard assumptions about operator precedence and association are
followed by {\sc ale}, allowing us to omit most of the parentheses in
descriptions.  In particular, equational descriptions bind the most
tightly, followed by feature selecting colon, then by inequations,
then conjunction and finally disjunction.  Furthermore, conjunction
and disjunction are left-associative, while the feature selector is
right-associative.  For instance, this gives us the following
equivalences between descriptions:
%
\begin{verbatim}
  a, b ; c, d ; e = (a,b);(c,d);e

  a,b,c = a,(b,c)
  
  f:g:bot,h:j = (f:(g:bot)),(h:j)

  f:g: =\=k,h:j = (f:(g: =\=(k))),(h:j)

  f:[g]==[h],h:j = (f:([g]==[h])),(h:j) 
\end{verbatim}
Note that a space must occur between {\tt =$\backslash$=} and other
operators such as {\tt :}.

A description may be satisfied by no structure, a finite number of
structures or an infinite collection of feature structures.  A
description is said to be {\em satisfiable} if it is satisfied by at
least one structure.  A description $\phi$ {\em entails} a description
$\psi$ if every structure satisfying $\phi$ also satisfies $\psi$.
Two descriptions are {\em logically equivalent} if they entail each
other, or equivalently, if they are satisfied by exactly the same set
of structures.  

{\sc ale} is only sensitive to the differences between logically
equivalent formulas in terms of speed.  For instance, the two
descriptions {\tt (tl:list,ne\_list,hd:bot)} and {\tt hd:bot} are
satisfied by exactly the same set of totally well-typed structures
assuming the type declaration for lists given above, but the smaller
description will be processed much more efficiently.  There are also
efficiency effects stemming from the order in which conjuncts (and
disjuncts) are presented.  The general rule for speedy processing is
to eliminate descriptions from a conjunction if they are entailed by
other conjuncts, and to put conjuncts with more type and feature
entailments first.  Thus with our specification for relations above,
the description {\tt (arg1:a, binary\_proposition)} would be slower
than {\tt (binary\_proposition,arg1:a)}, since {\tt
binary\_proposition} entails the existence of the feature {\tt arg1},
but not conversely.%
%
\footnote{This is because the depth of dereferencing depends on the
history of types a given structure is instantiated to.  There is no
path compression on-line, but it is carried out before an edge is
asserted into the chart.} 

At run-time, {\sc ale} computes a representation of the most general
feature structure that satisfies a description.  Thus a description
such as {\tt hd:a} with respect to the list grammar is satisfied by
the structure:
%
\begin{verbatim}
  ne_list
  HD a
  TL list
\end{verbatim}
%
Every other structure satisfying the description {\tt hd:a} is
subsumed by the structure given above.  In fact, the above structure
is said to be a {\em vague} representation of all of the structures
that satisfy the description.  The type conditions in {\sc ale} were
devised to obey the very important property, first noted by Kasper and
Rounds (1986), that every non-disjunctive description is satisfied by
a unique most general feature structure.  Thus in the case of {\tt
hd:a}, there is no more general feature structure than the one above
which also satisfies {\tt hd:a}.

The previous example also illustrates the kind of {\em type inference}
used by {\sc ale}.  Even though the description {\tt hd:a} does not
explicitly mention either the feature {\tt TL} or the type {\tt
ne\_list}, to find a feature structure satisfying the description,
{\sc ale} must infer this information.  In particular, because {\tt
ne\_list} is the most general type for which {\tt HD} is appropriate,
we know that the result must be of type {\tt ne\_list}.  Furthermore,
because {\tt ne\_list} is appropriate for both the features {\tt HD}
and {\tt TL}, {\sc ale} must add an appropriate {\tt TL} value.  The
value type {\tt list} is also inferred, due to the fact that a {\tt
ne\_list} must have a {\tt TL} value which is a list.  As far as type
inference goes, the user does not need to provide anything other than
the type specification; the system computes type inference based on
the appropriateness specification.  In general, type inference is very
efficient in terms of time.  The biggest concern should be how large
the structures become.%
%
\footnote{Finding most general satisfiers for non-disjunctive
descriptions, even those involving type inference, is quasi-linear in
the size of the description.  But it should be kept in mind that there
is also a factor of complexity determined by the size of the type
specification.  In practice, this factor is proportional to how large
the inferred structure is.  In general, the size of the inferred
structure is linear in the size of the description, with a constant
for the type specification.}
%
In contrast to a vague description, a disjunctive description is
usually {\em ambiguous}.  Disjunction is where the complexity arises
in satisfying descriptions, as it corresponds operationally to
non-determinism.%
%
\footnote{It corresponds so closely with non-determinism that
satisfiability of descriptions with disjunctions is $\mbox{\sc
NP}$-complete.  Furthermore, the algorithm employed by {\sc ale} may
produce up to $2^n$ satisfiers for a description with $n$
disjunctions.}
%
For instance, the description {\tt hd:(a;b)}
is satisfied by two distinct minimal structures, neither of which
subsumes the other:
%
\begin{verbatim}
  ne_list       ne_list
  HD a          HD b
  TL list       TL list
\end{verbatim}
%
On the other hand, the description {\tt hd:atom} is satisfied by the
structure:
%
\begin{verbatim}
  ne_list
  HD atom
  TL list
\end{verbatim}
%
Even though the descriptions {\tt hd:atom} and {\tt hd:(a;b)} are not
logically equivalent (though the former entails the latter), they have
the interesting property of being unifiable with exactly the same set
of structures.  In other words, if a feature structure can be unified
with the most general satisfier of {\tt hd:atom}, then it can be
unified with one of the minimal satisfiers of {\tt hd:(a;b)}.  

In terms of efficiency, it is very important to use vagueness wherever
possible rather than ambiguity.  In fact, it is almost always a good
idea to arrange the type specification with just this goal in mind.
For instance, consider the difference between the following pair of
type specifications, which might be used for English gender:
%
\begin{verbatim}
  gender sub [masc,fem,neut].        gender sub [animate,neut].
    masc sub [].                       animate sub [masc,fem].
    fem sub [].                          masc sub [].
    neut sub [].                         fem sub [].
                                       neut sub [].
\end{verbatim}
%
Now consider the fact that the relative pronouns {\em who} and {\em
which} are distinguished on the basis of whether they select animate
or inanimate genders.  In the flatter hierarchy, the only way to
select the animate genders is by the ambiguous description {\tt
masc;fem}.  The hierarchy with an explicit animate type can capture
the same possibilities with the vague description {\tt animate}.  An
effective rule of thumb is that {\sc ale} does an amount of work at
best proportional to the number of most general satisfiers of a
description and at worst proportional to $2^n$, where $n$ is the
number of disjuncts in the description.  Thus the ambiguous
description requires roughly twice the time and memory to process than
the vague description.  Whether the amount of work is closer to the
number of satisfiers or exponential in the number of disjuncts depends
on how many unsatisfiable disjunctive possibilities drop out early in
the computation.

\mysubsection{Enforcement of Inequations}

Inequations are persistent in that once that are created, they remain
as long as one of the structures being inequated remains.  Thus the
following two descriptions are logically equivalent:
%
\begin{verbatim}
  f:(=\=c), f:c

  f:c, f:(=\=c)
\end{verbatim}
%
Both will cause failure; but they are not operationally equivalent.
An inequation is evaluated when it arises, and again after high-level
unifications in the system; but inequations are not evaluated every
time an inequated structure is modified.  In an ideal system,
inequations would be attached directly to structures so that they
could be evaluated on-line during unification.  As things stand, {\sc
  ale} represents a feature structure as a regular feature
structure with a separate set of inequations.  Also, the complexity is
sensitive to the conjunctive normal form of inequations at the time at
which it is evaluated, though this form may later be reduced.

These sets of inequations are evaluated at run-time at the point they
are encountered, before answers are given to top-level queries, before
any edge is added to {\sc ale}'s parsing chart, after every daughter
is matched to a description in an {\sc ale} grammar rule%
%
\footnote{In the case of {\tt cats>}, they are enforced after
the list description itself is matched, and also after every element
of the list is matched.},
%
and after the head of an {\sc ale} definite clause has been matched
to a goal.  At compile-time, inequations are checked for every empty
category, for every lexical entry, and after every lexical rule
application.

Inequations are also symmetric.  Thus the following two descriptions
are logically equivalent:
%
\begin{verbatim}
  f:(=\= X),g:X

  f:X,g:(=\= X)
\end{verbatim}
%
Both inequate the values of {\sc f} and {\sc g}.  Again, these are not
operationally equivalent.  Because inequations are evaluated at the
time they are encountered, the second ordering will normally detect an
immediate failure {\it sooner} than the first.

An inequation between two feature structures is a requirement for them
not to be {\em token-identical}.  Thus, if a type is intensional, it
is possible for two feature structures to be of that same type, and
still satisfy an inequation between them.  Thus, any attempt to
inequate two structures that should be identical as a result of
extensional typing will also cause failure.  For instance, consider
the following:
%
\begin{verbatim}
  s              s
  F [1] t        F [3]
    H [1]   +    G [4]           =  failure
  G [2] t        [3] =\= [4]
    H [1]
\end{verbatim}
%
The values of the features {\tt F} and {\tt G} cannot be inequated
because they are extensionally identical (assuming the type {\tt t} is
declared to be extensional and is only appropriate for the feature
{\tt H}.  

When inequations are evaluated, they are reduced.  This reduction
consists, in part, of partial evaluation of extensionality checking,
which is otherwise delayed in {\sc ale}.  For instance, consider the
following:
%
\begin{verbatim}
  F [1] s
    H bot
    J bot
  G [2] s
    H bot
    J bot
  [1] =\= [2]
\end{verbatim}
%
If the type {\tt s} is extensional and appropriate for the features
{\tt H} and {\tt J}, then the inequation above is reduced to the
following:
%
\begin{verbatim}
  F [1] s
    H [3] bot
    J [4] bot
  G [2] s
    H [5]
    J [6]
  [3] =\= [5] ; [4] =\= [6]
\end{verbatim}
%
The set of inequations is stored in conjunctive normal form.  The cost
is some space over the re-evaluation of inequations.  Of course, if
the types on {\tt [3]} and {\tt [4]} were more refined than {\tt bot},
then the inequations {\tt [3] \verb+=\=+ [5]} and {\tt [4] \verb+=\=+
[6]} would further reduce.  In addition, when reducing inequations in
this way, we eliminate those that are trivially satisfied.  The ones
that are printed are only the residue after reduction.  For instance,
consider the following structure:
%
\begin{verbatim}
  F [1] s
    H [3] a
    J bot
  G [2] s
    H [3]
    J bot
  [1] =\= [2]
\end{verbatim}
%
Since the {\tt H} values are token-identical, this
structure reduces to the following.
%
\begin{verbatim}
  F s
    H [3] a
    J [4] bot
  G s
    H [3]
    J [5] bot
  [4] =\= [5]
\end{verbatim}
%
If structures {\tt [4]} and {\tt [5]} had been of non-unifiable types,
then there would be no residual inequation at all --- the original
inequation would trivially be satisfied.

An important subcase is that of an inequation between extensional
atoms.  If an atom is extensional, then there is only one instance of
it.  Thus an inequation between two identical, extensional atoms
always fails.  For example, if a type signature includes:
%
\begin{verbatim}
  bot sub [..., a, b, ...].
  a sub [].
    intro [f:bot].
  b sub [].
  ...
  ext([..., b, ...]).
\end{verbatim}
%
then the description:
%
\begin{verbatim}
  (a,f:=\= b)
\end{verbatim}
%
is satisfied just in those cases where the value of {\sc f} is not of
type {\bf b}.  If {\bf b} were intensional, then the inequation in
this description would essentially have no effect.  In fact, the only
productive instances of inequations between two intensionally typed
feature structures are those used with multiply occurring variables.
In all other instances, there is no way for the inequation to be
violated, since there is no way to render a structurally-identical
copy of an intensionally typed feature structure token-identical to
any other structure.  {\sc ale} detects these trivially satisfied
inequations and disposes of them.

\mysection{Macros}

{\sc ale} allows the user to employ a general form of parametric
macros in descriptions.  Macros allow the user to define a description
once and then use a shorthand for it in other descriptions.  We first
consider a simple example of a macro definition, drawn from the
categorial grammar in the appendix.  Suppose the user wants to employ
a description {\tt qstore:e\_list} frequently within a program.  The
following macro definition\label{MACROSIG} can be used in the program
file:
%
\begin{verbatim}
  quantifier_free macro
    qstore:e_list.
\end{verbatim}
%
Then, rather than including the description {\tt qstore:e\_list} in
another description, {\tt @ quantifier\_free} can be used instead.
Whenever {\tt @ quantifier\_free} is used, {\tt qstore:e\_list} is
substituted.

In the above case, the {\tt <macro\_spec>} was a simple atom, but in
general, it can be supplied with arguments.  
The full {\sc bnf} for macro definitions is as follows:
%
\begin{verbatim}
  <macro_def> ::= <macro_head> macro <desc>.

  <macro_head> ::= <macro_name>
                 | <macro_name>(<seq(<var>)>)

  <macro_spec> ::= <macro_name>
                 | <macro_name>(<seq(<desc>)>)

  <seq(X)> ::= X
             | X, <seq(X)>
\end{verbatim}
%
Note that {\tt <seq(X)>} is a parametric category in the {\sc bnf}
which abbreviates non-empty sequences of objects of category {\tt X}.
The following clause should be added to recursive definition of
descriptions: 
%
\begin{verbatim}
  <desc> ::=  @ <macro\_spec>
\end{verbatim}
%
A feature structure satisfies a description of the form {\tt @
<macrospec>}\label{AT} just in case the structure satisfies the body
of the definition of the macro.

Again considering the categorial grammar in the appendix, we have the
following macros with one and two arguments respectively:
%
\begin{verbatim}
  np(Ind) macro
    syn:np,
    sem:Ind.
\end{verbatim}
%
\begin{verbatim}
  n(Restr,Ind) macro
    syn:n,
    sem:(body:Restr,
         ind:Ind).
\end{verbatim}
%
In general, the arguments in the definition of a macro must be Prolog
variables, which can then be used as variables in the body of the
macro.  With the first macro, the description {\tt @ np(j)} would then
be equivalent to the description {\tt syn:np,sem:j}.  When evaluating
a macro, the argument supplied, in this case {\tt j}, is substituted
for the variable when expanding the macro.  In general, the argument
to a macro can itself be an arbitrary description (possibly containing
macros).  For instance, the description:
%
\begin{verbatim}
  n((and,conj1:R1,conj2:R2),Ind3)
\end{verbatim}
%
would be equivalent to the description:
%
\begin{verbatim}
  syn:n, 
  sem:(body:(and,conj1:R1,conj2:R2), 
       ind:Ind3)
\end{verbatim}
%
This example illustrates how other variables and even complex
descriptions can be substituted for the arguments of macros.
Also note the parentheses around the arguments to the first argument
of the macro.  Without the parentheses, as in {\tt
n(and,conj1:R1,conj2:R2,Ind3)}, the macro expansion routine would take
this to be a four argument macro, rather than a two argument macro
with a complex first argument.  This brings up a related point, which
is that different macros can have the same name as long as they have
the different numbers of arguments.  

Macros can also contain other macros, as illustrated by the macro for
proper names in the categorial grammar:
%
\begin{verbatim}
  pn(Name) macro
    synsem: @ np(Name),
    @ quantifier_free.
\end{verbatim}
%
In this case, the macros are expanded recursively, so that the
description {\tt pn(j)} would be equivalent to the description
%
\begin{verbatim}
  synsem:(syn:np,sem:j),qstore:e_list
\end{verbatim}

It is usually a good idea to use macros whenever the same description
is going to be re-used frequently.  Not only does this make the
grammars and programs more readable, it reduces the number of simple
typing errors that lead to inconsistencies.

As is to be expected, macros can't be recursive.  That is, a macro,
when expanded, is not allowed to invoke itself, as in the {\em
ill-formed} example:
%
\begin{verbatim}
  infinite_list(Elt) macro
    hd:Elt,
    tl:infinite_list(Elt)
\end{verbatim}
%
The reason is simple;  it is not possible to expand this macro to a
finite description.  Thus all recursion must occur in grammars or
programs;  it can't occur in either the appropriateness conditions or
in macros.  

The user should note that variables in the scope of a macro are not
the same as {\sc ale} feature structure variables --- they denote
where macro-substitutions of parameters are made, {\em not}\/
instances of re-entrancy in a feature structure.  If we employ
the following macro:
%
\begin{verbatim}
  blah(X) macro
    b,
    f: X,
    g: X.
\end{verbatim}
%
with the argument {\tt (c,h:a)} for example we obtain the following
feature structure:
%
\begin{verbatim}
  b
  F c
    H a
  G c
    H a
\end{verbatim}
%
where the values of {\tt F} and {\tt G} are not shared (unless {\tt c}
and {\tt a} are extensional).  We can, of course, create a shared
structure using {\tt blah}, by including an {\sc ale} variable in the
actual argument to the macro.  Thus {\tt blah((Y,c,h:a))} yields:
%
\begin{verbatim}
  b
  F [0] c
    H a
  G [0]
\end{verbatim}

Because programming with lists is so common, {\sc ale} has a special
macro\label{LIST} for it, based on the Prolog list notation.  A
description may also take any of the forms on the left, which will be
treated equivalently to the descriptions on the right in the following
diagram:
%
\begin{verbatim}
  []                e_list

  [H|T]             (hd:H, 
                     tl:T)

  [A1,A2,...,AN]    (hd:A1,
                     tl:(hd:A2,
                         tl: ...
                                 tl:(hd:AN,
                                     tl:e_list)...))

  [A1,...,AN|T]      (hd:A1,
                     tl:(hd:A2,
                         tl: ...
                                 tl:(hd:AN,
                                     tl:T)...))
\end{verbatim}
%
Note that this built-in macro does not require the macro operator {\tt
@}.  Thus, for example, the description {\tt [a|T3]} is equivalent to
{\tt hd:a,tl:T3}, and the description {\tt [a,b,c]} is equivalent to
{\tt hd:a,tl:(hd:b,tl:(hd:c,tl:e\_list))}.  There are many example of
this use of Prolog's list notation in the grammars in the appendix.

\mysection{Functional Descriptions}

{\sc ale} also provides the means to define functions mapping
descriptions to descriptions.  The syntax is:
%
\begin{verbatim}
  <func_def> ::= <func_spec> +++> <desc>.

  <func_spec> ::= <func_name>
                 | <func_name>(<seq(desc)>)
\end{verbatim}
%
Functional descriptions are compiled into code that is evaluated at
run-time, first adding to each argument (if any) the description given
for that argument, and then evaluating to the resulting description,
which can itself include other functional descriptions, including
recursive calls.  As an example, one may consider the {\tt append/2}
function:\label{APPEND-DEF}
%
\begin{verbatim}
append([],L) +++> L.
append([X|L1],L2) +++> [X|append(L1,L2)].
\end{verbatim}
%
The lists shown here are instances of {\sc ale}'s list macro notation.
Notice that the only type checking in this definition is performed by
appropriateness and the list macro, so the first clause could succeed
with any {\tt L}.  To add more, one could redefine the first clause
as:
%
\begin{verbatim}
append([],(list,L)) +++> L.
\end{verbatim}
%
Functional descriptions can be used wherever other descriptions can.
The user should be particularly careful with ensuring that the
arguments to a functional description call will be sufficiently
instantiated for the call to terminate; and the clauses, correctly
ordered for the description to terminate correctly.  Type checking
cannot ensure this, in general.  For example, in the definition for
{\tt append/2}, with or without explicit type-checking on the second
argument, both clauses will match a functional description whose first
argument or any {\sc tl} value in the first argument is strictly of
type {\tt list}, i.e.,  a list that is not known to be {\tt elist} or
{\tt nelist}.  Such a functional description will, thus, evaluate to
an infinite number of results.  Clauses in functional descriptions are
considered in the order they are given; and the search for solutions
{\it always} backtracks into subsequent clauses.

Any functional description that can be defined so that its argument
descriptions are only variables, and its result has no (mutual)
recursion should be defined as a macro instead, which is completely
expanded at compile-time.  Remember, however, that it is not always
sufficient to replace the {\tt +++>} with {\tt macro}: variable
replacement in macros works by true textual replacement, whereas the
variables in functional descriptions are {\sc ale} descriptions.  For
example, the functional description:
%
\begin{verbatim}
foo(X) +++> (bar,f:X,g:X).
\end{verbatim}
%
evaluates to a feature structure with a re-entrancy.  The macro:
%
\begin{verbatim}
foo(X) macro (bar,f:X,g:X).
\end{verbatim}
%
does not necessarily evaluate to a feature structure with a
re-entrancy.  The functional description, {\tt foo/1} is correctly
converted to the macro:
%
\begin{verbatim}
foo(X) macro (bar,f:(Y,X),g:(Y,X)).
\end{verbatim}
%
The {\sc ale} variable, {\tt Y}, establishes the re-entrancy that the macro
variable, {\tt X}, does not.

\mysection{Type Constraints}

Our logical language of descriptions can be used with the type system
in order to enforce constraints on feature structures of a particular
type.  Constraints are attached to types, and may consist of arbitrary
descriptions.  Their effect is to require every structure of the
constrained type to always satisfy the constraint description.  

Constraints are enforced using the {\tt cons}\label{CONS} operator,
e.g.:
%
\begin{verbatim}
  bot sub [a,b].
    a sub []
      intro [f:b,g:b].
    b sub [].
  a cons (f:X,g:=\= X).
\end{verbatim}
%
The constraint on the type {\tt a} (which must occur within
parentheses) requires all feature structures of type {\tt a} to have
non-token-identical values for features {$f$} and {$g$}.  Notice
that the type {\tt b} has no constraints expressed.  This is
equivalent to specifying the constraint:
%
\begin{verbatim}
  b cons bot.
\end{verbatim}
%
which is satisfied by any feature structure (of type {\tt b}).  A type
constraint may use any of the operators in the description language,
including further type descriptions, which may themselves be
constrained.  The type, {\tt bot}, may not have type constraints, nor
may {\tt a\_/1} atoms.

It is crucial that the type descriptions be finitely resolvable.
Because simple depth-first search is used to evaluate constraints,
infinite resolution paths will cause the system to hang.  For example,
the following signature should not contain the following constraints:
%
\begin{verbatim}
  bot sub [a,b].
    a sub [c]
      intro [f:bot].
      c sub [].
    b sub []
      intro [g:bot].
  a cons f:b.
  b cons g:c.
\end{verbatim}
%
This is because {\tt a} subsumes {\tt c}.  Notice, however, that type
constraints can be used to provide additional information regarding
value restrictions on appropriate features.  In general, {\sc ale}
performs more efficiently when restrictions are provided in the
appropriateness conditions, rather than in general constraints; but
type constraints can encode a greater variety of restrictions.
Specifically, they allow constraints to express path equations and
inequations, as well as deeper path restrictions.  Constraints may
include relational constraints, which are defined using definite
clauses, which are discussed below.  Type constraints are efficiently
compiled in the same way as other descriptions.  Also, like
appropriateness conditions, they are only enforced once for any
given structure.
% future version: error flagging when constraint is inconsistent with
%  appropriateness conditions
%                 error flagging cyclicity of constraints?

It is also important to note that because of the delay in {\sc ale}'s
inequational enforcement, type constraints that involve recursion that
terminates by an inequation failure may go into infinite loops due to
this delay in enforcement.  Because extensionality is only enforced
before the answer to a top-level query is given, recursive type
constraints that rely on the extensional identity of two feature
structures to terminate on the basis of their type will not terminate.

\mysection{Example: The Zebra Puzzle}

We now provide a simplified form of the Zebra Puzzle
(Figure~\ref{zebra-fig}), a common puzzle for constraint resolution.
This puzzle was solved by A{\"{\i}}t-Kaci (1984) using roughly the
same methods as we use here.  The puzzle illustrates the expressive
power of the combination of extensional types, inequations and type
constraints.  Such puzzles, sometimes known as logic puzles or
constraint puzzles, require one to find a state of affairs within some
situation that simultaneously satisfies a set of constraints.  The
situation itself very often implicitly levies certain constraints.
%
\begin{figure}
{\em Puzzle}:
Three different houses each contain a different pet, a different
drink, and an inhabitant of a different nationality.  The following
six facts are true about these houses:
\begin{enumerate}
\item The man in the third (middle) house drinks milk.
\item The Spaniard owns the dog.
\item The Ukranian drinks the tea.
\item The Norwegian lives in the first house.
\item The Norwegian lives next to the tea-drinker.
\item The juice-drinker owns the fox.
\end{enumerate}
{\em Questions}: Who owns the zebra?  Who drinks juice? 
\caption{The Zebra Puzzle.}\label{zebra-fig}
\end{figure}
%

We can represent the simplified Zebra Puzzle in {\sc ale} as:
%
\begin{verbatim}
  % Subsumption
  %=======================
  
  bot sub [house,descriptor,background].
  
    descriptor sub [nat_type,ani_type,bev_type].
      nat_type sub [norwegian,ukranian,spaniard].
        norwegian sub [].
        ukranian sub [].
        spaniard sub [].
      ani_type sub [fox,dog,zebra].
        fox sub [].
        dog sub [].
        zebra sub [].
      bev_type sub [juice,tea,milk].
        juice sub [].
        tea sub [].
        milk sub [].
  
    house sub []
        intro [nationality:nat_type,animal:ani_type,beverage:bev_type].
  
    background sub [clue]
               intro [house1:house,house2:house,house3:house].
      clue sub [maximality].
        maximality sub [].
  
  ext([norwegian,ukranian,spaniard,fox,dog,zebra,juice,tea,milk]).
  
  % Constraints
  %=============================
  background cons
    (house1:nationality:N1,                          % inequational constraints
     house2:nationality:(N2,(=\= N1)),
     house3:nationality:((=\= N1),(=\= N2)),
  
     house1:animal:A1,
     house2:animal:(A2,(=\= A1)),
     house3:animal:((=\= A1),(=\= A2)),
  
     house1:beverage:B1,
     house2:beverage:(B2,(=\= B1)),
     house3:beverage:((=\= B1),(=\= B2))).
  
  clue cons
    (house3:beverage:milk,                             % clue 1
  
     (house1:nationality:spaniard,house1:animal:dog    % clue 2
     ;house2:nationality:spaniard,house2:animal:dog
     ;house3:nationality:spaniard,house3:animal:dog),
  
     (house1:nationality:ukranian,house1:beverage:tea  % clue 3
     ;house2:nationality:ukranian,house2:beverage:tea
     ;house3:nationality:ukranian,house3:beverage:tea),
  
     house1:nationality:norwegian,                     % clue 4
  
     (house1:nationality:norwegian,house2:beverage:tea % clue 5
     ;house2:nationality:norwegian,house3:beverage:tea
     ;house2:nationality:norwegian,house1:beverage:tea
     ;house3:nationality:norwegian,house2:beverage:tea),

     (house1:beverage:juice,house1:animal:fox          % clue 6
     ;house2:beverage:juice,house2:animal:fox
     ;house3:beverage:juice,house3:animal:fox)).
  
  
  maximality cons
    (house1:nationality:(norwegian;ukranian;spaniard), % maximality constraints
     house2:nationality:(norwegian;ukranian;spaniard),
     house3:nationality:(norwegian;ukranian;spaniard),
  
     house1:animal:(fox;dog;zebra),
     house2:animal:(fox;dog;zebra),
     house3:animal:(fox;dog;zebra),
  
     house1:beverage:(juice;tea;milk),
     house2:beverage:(juice;tea;milk),
     house3:beverage:(juice;tea;milk)).
\end{verbatim}
%    
The subsumption hierarchy is shown pictorially in
Figure~\ref{inher-fig}.  The type, {\tt background}, with the
assistance of the types subsumed by {\tt house} and {\tt descriptor},
represents the situation of three houses (the features of {\tt
  background}), each of which has three attributes (the features of
{\tt house}).  The implicit constraints levied by the situation appear
as constraints on the type, {\tt background}, namely that no two
houses can have the same value for any attribute.  These are
represented by inequations.  But notice that, since we are interested
in treating the values of attributes as tokens, we must represent
those values by extensional types.  If we had not done this, then we
could still, for example, have two different houses with the beverage,
{\tt juice}, since there could be two different feature structures of
type {\tt juice} that were not token-identical.  Notice also that all
of these types are maximal, and hence satisfy the restriction that
{\sc ale} places on extensional types.

The explicit constraints provided by the clues to the puzzle are
represented as constraints on the type {\tt clue}, a subtype of {\tt
background}.  We also need a subtype of {\tt clue}, {\tt maximality},
to enforce another constraint implicit to all constraint puzzles,
namely the one which requires that we provide only maximally specific
answers, rather than vague solutions which say, for example, that the
beverage for the third house is a type of beverage ({\tt bev\_type}),
which may actually still satisfy a puzzle's constraints.

To solve the puzzle, we simply type:
%
\begin{verbatim}
| ?- mgsat maximality.

MOST GENERAL SATISFIER OF: maximality

maximality
HOUSE1 house
       ANIMAL fox
       BEVERAGE juice
       NATIONALITY norwegian
HOUSE2 house
       ANIMAL zebra
       BEVERAGE tea
       NATIONALITY ukranian
HOUSE3 house
       ANIMAL dog
       BEVERAGE milk
       NATIONALITY spaniard


ANOTHER?  y.

no
| ?- 
\end{verbatim}
%
So the Ukranian owns the zebra, and the Norwegian drinks juice.  A
most general satisfier of {\tt maximality} will also satisfy the
constraints of its supertypes, {\tt background} and {\tt clue}.
%
\begin{figure}
\setlength{\unitlength}{.5cm}
%\begin{center}
   \begin{picture}(30,15)(0,1)
     \put(16,1){\makebox(0,0){\type{$\bot$}}}
%
     \put(13,6){\makebox(0,0){\type{descriptor}}}
%
     \put(4,6){\makebox(0,0){\type{house}}}
     \put(4,5){\makebox(0,0)[r]{\feat{nationality}}}
     \put(4,4){\makebox(0,0)[r]{\feat{animal}}}
     \put(4,3){\makebox(0,0)[r]{\feat{beverage}}}
%
     \put(25,6){\makebox(0,0){\type{background}}}
     \put(25,5){\makebox(0,0)[l]{\feat{house1}}}
     \put(25,4){\makebox(0,0)[l]{\feat{house2}}}
     \put(25,3){\makebox(0,0)[l]{\feat{house3}}}
%
     \put(6,10){\makebox(0,0){\type{nat-type}}}
     \put(11,10){\makebox(0,0){\type{ani-type}}}
     \put(16,10){\makebox(0,0){\type{bev-type}}}
     \put(25,10){\makebox(0,0){\type{clue}}}
%
     \put(7,12){\makebox(0,0){\it{spaniard}}}
     \put(7,13){\makebox(0,0){\it{ukranian}}}
     \put(7,14){\makebox(0,0){\it{norwegian}}}
%
     \put(12,12){\makebox(0,0){\it{dog}}}
     \put(12,13){\makebox(0,0){\it{fox}}}
     \put(12,14){\makebox(0,0){\it{zebra}}}
%
     \put(25,14){\makebox(0,0){\type{maximality}}}
%
     \put(17,12){\makebox(0,0){\it{tea}}}
     \put(17,13){\makebox(0,0){\it{milk}}}
     \put(17,14){\makebox(0,0){\it{orange-juice}}}
%
     \put(16,2){\line(-4,1){12}}
     \put(16,2){\line(-1,1){3}}
     \put(16,2){\line(3,1){9}}
%
     \put(13,7){\line(-3,1){6}}
     \put(13,7){\line(-1,1){2}}
     \put(13,7){\line(1,1){2}}
%
     \put(25,7){\line(0,1){2}}
     \put(25,11){\line(0,1){2}}
%
     \put(4,10){\line(0,1){4}}
     \put(4,10){\line(1,0){.2}}
     \put(4,12){\line(1,0){.4}}
     \put(4,13){\line(1,0){.4}}
     \put(4,14){\line(1,0){.4}}
%
     \put(10.3,10.5){\line(0,1){3.5}}
     \put(10.3,12){\line(1,0){.4}}
     \put(10.3,13){\line(1,0){.4}}
     \put(10.3,14){\line(1,0){.4}}
%
     \put(14,10){\line(0,1){4}}
     \put(14,10){\line(1,0){.1}}
     \put(14,12){\line(1,0){.4}}
     \put(14,13){\line(1,0){.4}}
     \put(14,14){\line(1,0){.4}}
%
   \end{picture}
%\end{center}
\caption{Inheritance Network for the Zebra Puzzle.}
\label{inher-fig}
\end{figure}%
%

Although {\sc ale} is capable of representing such puzzles and solving
them, it is not actually very good at solving them efficiently.  To
solve such puzzles efficiently, a system must determine an optimal
order in which to satisfy all of the various constraints.  {\sc ale}
does not do this since it can express definite clauses in its
constraints, and the reordering would also be very difficult for the
user to keep track of while designing a grammar.  A system that does
do this is the general constraint resolver proposed by Penn and
Carpenter (1993)%
%
\footnote{This system was actually the precursor to {\sc ale}.
It implemented a completely reversible constraint-based
parser/generator with a weighting on the constraints based on their
maximal non-determinism.  Re-ordering constraints, however, proved to
be insufficient for efficient parsing or generation, compared to a
uni-directional system such as {\sc ale}.}.
%

\mychapter{Definite Clauses}

The next two sections, covering the constraint logic programming
and phrase structure components of {\sc ale}, simply describe how to
write {\sc ale} programs and how they will be executed.  Discussion of
interacting with the system itself follows the description of the
programming language {\sc ale} provides.

The definite logic programming language built into {\sc ale} is a
constraint logic programming ({\sc clp}) language, where the
constraint system is the attribute-value logic described above.  Thus,
it is very closely related to both Prolog and {\sc login}.  Like
Prolog, definite clauses may be defined with disjunction, negation and
cut.  The definite clauses of {\sc ale} are executed in a
depth-first, left to right search, according to the order of clauses
in the database.  {\sc ale} performs last call optimization, but does
not perform any clause indexing.%
%
\footnote{Thus, additional cuts might be necessary to ensure
determinism, so that last call optimization is effective.}
%
Those familiar with Prolog should have no trouble adapting that
knowledge to programming with definite clauses in {\sc ale}.  The only
significant difference is that first-order terms are replaced with
descriptions of feature structures.

While it is not within the scope of this user's guide to detail the
logic programming paradigm, much less {\sc clp}, this section will
explain all that the user familiar with logic programming needs to
know to exploit the special features of {\sc ale}.  For background,
the user is encouraged to consult Sterling and Shapiro (1986) with
regard to general logic programming techniques, most of which are
applicable in the context of {\sc ale}, and A\"{\i}t-Kaci and Nasr
(1986) for more details on programming with sorted feature structures.
For more advanced material on programming in Prolog with a compiler,
see O'Keefe (1990).  The general theory of {\sc clp} is developed in a
way compatible with {\sc ale} in H\"ohfeld and Smolka (1988).  Of
course, since {\sc ale} is literally an implementation of the theory
found in Carpenter (1992), the user is strongly encouraged to consult
Chapter 14 of that book for full theoretical details.

The syntax of {\sc ale}'s logic programming component is broadly
similar to that of Prolog, with the only difference being that
first-order terms are replaced with attribute-value logic
descriptions.  The language in which clauses are expressed in {\sc
ale} is given in {\sc bnf} as:
%
\begin{verbatim}
  <clause> ::= <literal> if <goal>.

  <literal> ::=  <pred_sym>
              |  <pred_sym>(<seq(desc)>)

  <goal> ::= true
           | <literal>
           | (<goal>,<goal>)
           | (<goal>;<goal>)
           | (<desc> =@ <desc>)
           | (<cut_free_goal> -> <goal>)
           | (<cut_free_goal> -> <goal> ; <goal>)
           | !
           | (\+ <goal>)
           | prolog(<prolog_goal>)
\end{verbatim}
%
Just as in Prolog, predicate symbols must be Prolog atoms.  This is a
more restricted situation than the definite clause language discussed
in Carpenter (1992), where literals were also represented as feature
structures and described using attribute-value logic.  Also note that
{\sc ale} requires every clause to have a body, which might simply be
the goal {\tt true}.  There must be whitespace around the {\tt if}
operator, but none is required around the conjunction comma, the
disjunction semicolon, the cut or shallow cut symbols {\tt !,->}, or
the unprovability symbol {\tt $\setminus$+}.  Parentheses, in general,
may be dropped and reconstructed based on operator precedences.  The
precedence is such that the comma binds more tightly than the
semicolon, while the unprovability symbol binds the most tightly.
Both the semicolon and comma are right associative.

The operational behavior of {\sc ale} is nearly identical to Prolog
with respect to goal resolution.  That is, it evaluates a sequence of
goals depth-first, from the left to right, using the order of clauses
established in the program.  The only difference arises from the fact
that, in Prolog, terms cannot introduce non-determinism.  In {\sc
ale}, due to the fact that disjunctions can be nested inside of
descriptions, additional choice points might be created both in
matching literals against the heads of clauses and in expanding the
literals within the body of a clause.  In evaluating these choices,
{\sc ale} maintains a depth-first left to right strategy.  

We begin with a simple example, the {\tt member/2} predicate:%
%
\footnote{As in Prolog, we refer to predicates by their name and
arity.} 
%
\begin{verbatim}
  member(X,hd:X) if 
    true.
  member(X,tl:Xs) if 
    member(X,Xs).
\end{verbatim}
%
As in Prolog, {\sc ale} clauses may be read logically, as
implications, from right to left.  Thus the first clause above states
that {\tt X} is a member of a list if it is the head of a
list\label{IF}.  The second clause states that {\tt X} is a member of
a list if {\tt X} is a member of the tail of the list, {\tt Xs}.  Note
that variables in {\sc ale} clauses are used the same way as in
Prolog, due to the notational convention of our description language.
Further note that, unlike Prolog, {\sc ale} requires a body for every
clause.  In particular, note that the first clause above has the
trivial body {\tt true}\label{TRUE}.  The compiler is clever enough to
remove such goals at compile time, so they do not incur any run-time
overhead.

Given the notational convention for lists built into {\sc ale}, the
above program could equivalently be written as:
%
\begin{verbatim}
  member(X,[X|_]) if
    true.
  member(X,[_|Xs]) if
    member(X,Xs).
\end{verbatim}
%
But recall that {\sc ale} would expand {\tt [X|\_]} as {\tt
(hd:X,tl:\_)}.  Not only does {\sc ale} not support anonymous variable
optimizations, it also creates a conjunction of two descriptions,
where {\tt hd:X} would have sufficed.  Thus the first method is not
only more elegant, but also more efficient.  

Due to the fact that lists have little hierarchical structure, list
manipulation predicates in {\sc ale} look very much like their
correlates in Prolog.  They will also execute with similar
performance.  But when the terms in the arguments of literals have
more interesting taxonomic structure, {\sc ale} actually provides a
gain over Prolog's evaluation method, as pointed out by A\"{\i}t-Kaci
and Nasr (1986).  Consider the following fragment drawn from the
syllabification grammar in the appendix, in which there is a
significant interaction between the inheritance hierarchy and the
definite clause {\tt less\_sonorous/2}:
%
\begin{verbatim}
  segment sub [consonant,vowel].
    consonant sub [nasal,liquid,glide].
      nasal sub [n,m].
        n sub [].
        m sub [].
      liquid sub [l,r].
        l sub [].
        r sub [].
      glide sub [y,w].
        y sub [].
        w sub [].
    vowel sub [a,e,i]
      a sub [].
      e sub [].
      i sub [].

  less_sonorous_basic(nasal,liquid) if true.
  less_sonorous_basic(liquid,glide) if true.
  less_sonorous_basic(glide,vowel) if true.

  less_sonorous(L1,L2) if
    less_sonorous_basic(L1,L2).
  less_sonorous(L1,L2) if
    less_sonorous_basic(L1,L3),
    less_sonorous(L3,L2).
\end{verbatim}
%
For instance, the third clause of {\tt less\_sonorous\_basic/2}, being
expressed as a relation between the types {\tt glide} and {\tt vowel},
allows solutions such as {\tt less\_sonorous\_basic(w,e)}, where {\tt
glide} and {\tt vowel} have been instantiated as the particular
subtypes {\tt w} and {\tt e}.  This fact would not be either as
straightforward or as efficient to code in Prolog, where relations
between the individual letters would need to be defined.  The loss in
efficiency stems from the fact that Prolog must either code all 14
pairs represented by the above three clauses and type hierarchy, or
perform additional logical inferences to infer that {\tt w} is a
glide, and hence less sonorous than the vowel {\tt e}.  {\sc ale}, on
the other hand, performs these operations by unification, which, for
types, is a simple table look-up.%
%
\footnote{Table look-ups involved in unification in {\sc ale} rely on
double hashing, once for the type of each structure being unified.}
%
All in all, the three clauses for {\tt less\_sonorous\_basic/2} given
above represent relations between 14 pairs of letters.  Of course, the
savings is even greater when considering the transitive closure of
{\tt less\_sonorous\_basic/2}, given above as {\tt less\_sonorous/2}, and
would be greater still for a type hierarchy involving a greater degree
of either depth or branching.  

While we do not provide examples here, suffice it to say that
cuts\label{CUT}, shallow cuts\label{SHALLOW},
negation\label{NEGATION}, conjunction\label{COMMADCL} and
disjunction\label{SEMIDCL} work exactly the same as they do in Prolog.
In particular, cuts conserve stack space representing backtracking
points, disjunctions create choice points and negation is evaluated by
failure, with the same results on binding as in Prolog.

The definite clause language also allows arbitrary prolog goals, using
the predicate, {\tt prolog(<prolog\_goal>)}\label{PROLOG}.  This is
perhaps most useful when used with the Prolog predicates, {\tt
assert}\label{ASSERT} and {\tt retract}\label{RETRACTPRO}, which provide
{\sc ale} users with access to the Prolog database, and with I/O
statements, which can be quite useful for debugging definite clauses.

Should {\tt prolog\_goal} contain a variable that has been
instantiated to an {\sc ale} feature structure, this will appear to
Prolog as {\sc ale}'s internal representation of that feature
structure.  Feature structures can be asserted and retracted, however,
without regard to their internal structure.  The details of {\sc
ale}'s internal representation of feature structures can be found in
Carpenter and Penn (1996), and briefly on p.~\pageref{INTERNAL-REP}.

Another side effect of not directly attaching inequations to feature
structures is that if a feature structure with inequations is asserted
and a copy of it is later instantiated from the Prolog database or
retracted, the copy will have lost the inequations.  In general,
passing feature structures with inequations to Prolog hooks should be
avoided.

Because the enforcement of extensionality is delayed in
{\sc ale}, a variable which is instantiated to an extensionally typed
feature structure and then passed to a prolog hook may also not
reflect token identities as a result of extensionality.  Provided that
there are no inequations (to which the user does not have direct
access), this can be enforced within the hook by calling the {\sc ale}
internal predicate {\tt extensionalise(FS,[])}.

There is a special literal predicate, {\tt =@}\label{EQAT}, used with
infix notation, which is true when its arguments are token-identical.
As with inequations, which forbid token-identity, the {\tt =@}
operator is of little use unless multiply occurring variables are used
in its arguments' descriptions.  Note, however, that while inequations
(\verb+=\=+) and path equations (\verb+==+) are part of the description
language, {\tt =@} is a definite clause predicate, and cannot be used
as a description.  It is more important to note that while the
negation of the structure-identity operator (\verb+==+), namely the
inequation (\verb+=\=+), is monotonic when interpreted persistently,
the negation of the token-identity operator ({\tt =@}), achieved by
using it inside the argument of the \verb1\+1 operator, is
non-monotonic, and thus its use should be avoided.

It is significant to note that clauses in {\sc ale} are truly definite
in the sense that only a single literal is allowed as the {\em head}
of a clause, while the {\em body} can be a general goal.  In
particular, disjunctions in descriptions of the arguments to the head
literal of a clause are interpreted as taking wide scope over the
entire clause, thus providing the effect of multiple solutions rather
than single disjunctive solutions.  The most simple example of this
behavior can be found in the following program:
%
\begin{verbatim}
  foo((b;c)) if true.

  bar(b) if true.

  baz(X) if foo(X), bar(X).
\end{verbatim}
%
Here the query {\tt foo(X)} will provide two distinct solutions, one
where {\tt X} is of type {\tt b}, and another where it is of type {\tt
c}.  Also note that the queries {\tt foo(b)} and {\tt foo(c)} will
succeed.  Thus the disjunction is equivalent to the two single clauses:
%
\begin{verbatim}
  foo(b) if true.
  foo(c) if true.
\end{verbatim}
%
In particular, note that the query {\tt baz(X)} can be solved, with
{\tt X} instantiated to an object of type {\tt b}.  In general, using
embedded disjunctions will usually be more efficient than using multiple
clauses in {\sc ale}, especially if the disjunctions are deeply nested
late in the description.  On the other hand, cuts can be inserted for
control with multiple clauses, making them more efficient in some
cases.

\mysection{Type Constraints Revisited}

The type constraints mentioned in the last chapter can also
incorporate relational constraints defined by definite clauses, with
the optional operator {\tt goal}\label{GOAL}.  Consider the following
example from {\sc hpsg}:
%
\begin{verbatim}
  word cons W
       goal (single_rel_constraint(W),
             clausal_rel_prohibition(W)).
\end{verbatim}
%
In this example, the constraint from the description language is
simply the variable {\tt W}, which is used to match any feature
structure of type {\tt word}.  That feature structure is then passed
as an argument to the two procedural attachments {\tt
single\_rel\_constraint/1} and {\tt clausal\_rel\_prohibition/1},
which each represent a principle from {\sc hpsg} which governs words
(among other objects).  Notice that the goal, when non-literal, must
occur within parentheses.

While every type constraint must have a description, procedural
attachments are optional.  If they do occur, they occur after the
description.  The syntax is given in {\sc bnf} as:
%
\begin{verbatim}
  <cons_spec> ::= <type> cons <desc>
                | <type> cons <desc>
                         goal <goal>
\end{verbatim}


\mychapter{Phrase Structure Grammars}

The {\sc ale} phrase structure processing component is loosely based
on a combination of the functionality of the {\sc patr-ii} system and
the {\sc dcg} system built into Prolog.  Roughly speaking, {\sc ale}
provides a system like that of {\sc dcg}s, with two primary
differences.  The first difference stems from the fact that {\sc ale}
uses attribute-value logic descriptions of typed feature structures
for representing categories and their parts, while {\sc dcg}s use
first-order terms (or possibly cyclic variants thereof).  The second
primary difference is that {\sc ale}'s parser uses a bottom-up active
chart parsing algorithm and a semantic-head-driven generator rather
than encoding grammars directly as Prolog clauses and evaluating them
top-down and depth-first.  In the spirit of {\sc dcg}s, {\sc ale}
allows definite clause procedures to be attached and evaluated at
arbitrary points in a phrase structure rule, the difference being that
these rules are given by definite clauses in {\sc ale}'s logic
programming system, rather than directly in Prolog.

Phrase structure grammars come with two basic components, one for
describing lexical entries and empty categories, and one for
describing grammar rules.  We consider these components in turn.

\mysection{Lexical Entries}

Lexical entries\label{ARROW} in {\sc ale} are specified as rewriting
rules, as given by the following {\sc bnf} syntax:
%
\begin{verbatim}
  <lex_entry> ::=  <word> ---> <desc>.
\end{verbatim}
%
For instance, in the categorial grammar lexicon in the appendix, the
following lexical entry is provided, along with the relevant macros:
%
\begin{verbatim}
  john ---> 
    @ pn(j).

  pn(Name) macro
    synsem: @ np(Name),
    @ quantifier_free.

  np(Ind) macro
    syn:np,
    sem:Ind.

  quantifier_free macro
    qstore:[].
\end{verbatim}
%
Read declaratively, this rule says that the word {\tt john} has as its
lexical category the most general satisfier of the description
{\tt @ pn(j)}, which is:
%
\begin{verbatim}
  cat
  SYNSEM basic
         SYN np
         SEM j
  QSTORE e_list
\end{verbatim}
%
Note that this lexical entry is equivalent to that given without
macros by: 
%
\begin{verbatim}
  john --->
    synsem:(syn:np,
            sem:j),
    qstore:e_list.
\end{verbatim}
%
Macros are useful as a method of organizing lexical information to
keep it consistent across lexical entries.  The lexical entry for the
word {\tt runs} is:
%
\begin{verbatim}
  runs ---> @ iv((run,runner:Ind),Ind).

  iv(Sem,Arg) macro
    synsem:(backward,
            arg: @ np(Arg),
            res:(syn:s,
                 sem:Sem)),
    @ quantifier_free.
\end{verbatim}
%
This entry uses nested macros along with structure sharing, and
expands to the category:
%
\begin{verbatim}
  cat
  SYNSEM backward
         ARG synsem
             SYN np
             SEM [0] sem_obj
         RES SYN s
             SEM run
                 RUNNER [0]
  QSTORE e_list
\end{verbatim}
%
It also illustrates how macro parameters are in fact treated as
variables.  

Multiple lexical entries may be provided for each word.  Disjunctions
may also be used in lexical entries, but are expanded out at
compile-time.  Thus the first three lexical entries, taken together,
compile to the same result as the fourth:
%
\begin{verbatim}
  bank ---> 
    syn:noun, 
    sem:river_bank.
  bank ---> 
    syn:noun, 
    sem:money_bank.
  bank ---> 
    syn:verb, 
    sem:roll_plane.

  bank ---> 
    ( syn:noun, 
      sem:( river_bank
          ; money_bank
          )
    ; syn:verb, 
      sem:roll_plane
    ).
\end{verbatim}
%
Note that this last entry uses the standard Prolog layout conventions
of placing each conjunct and disjunct on its own line, with commas at
the end of lines, and disjunctions set off with vertically aligned
parentheses at the beginning of lines.

The compiler finds all the most general satisfiers for lexical entries
at compile time, reporting on those lexical entries that have
unsatisfiable descriptions.  In the above case of {\tt bank}, the
second combined method is marginally faster at compile-time, but their
run-time performance is identical.  The reason for this is that both
entries have the same set of most general satisfiers.

{\sc ale} supports the construction of large lexica, as it relies on
Prolog's hashing mechanism to actually look up a lexical entry for a
word during bottom-up parsing.  For generation, {\sc ale} indexes
lexical entries for faster unification, as described in Penn and
Popescu (1997).  Constraints on types can also be used to enforce
conditions on lexical representations, allowing for further
factorization of information.

\mysection{Empty Categories}

{\sc ale} allows the user to specify certain categories as occurring
without any corresponding surface string.  These are usually referred
to somewhat misleadingly as {\em empty categories}, or sometimes as
{\em null productions}.  In {\sc ale}, they are supported by a special
declaration of the form\label{EMPTYSIG}:
%
\begin{verbatim}
  empty <desc>.
\end{verbatim}
%
Where {\tt <desc>} is a description of the empty category.  

For example, a common treatment of bare plurals is to hypothesize an
empty determiner.  For instance, consider the contrast between the
sentences {\em kids overturned my trash cans} and {\em a kid
overturned my trash cans}.  In the former sentence, which has a plural
subject, there is no corresponding determiner.  In our categorial
grammar, we might assume an empty determiner with the following
lexical entry (presented here with the macros expanded):
%
\begin{verbatim}
  empty @ gdet(some).

  gdet(Quant) macro
    synsem:(forward,
            arg:(syn:(n,
                      num:plu),
                 sem:(body:Restr,
                      ind:Ind)),
            res:(syn:(np,
                      num:plu),
                 sem:Ind),
    qstore:[ (Quant,
              var:Ind,
              restr:Restr) ].
\end{verbatim}
%
Of course, it should be noted that this entry does not match the
type system of the categorial grammar in the appendix, as it assumes a
number feature on nouns and noun phrases.

Empty categories are expensive to compute under a bottom-up parsing
scheme such as the one used in {\sc ale}.  The reason for this is that
these categories can be used at every position in the chart during
parsing (with the same begin and end points).  If the empty categories
cause local structural ambiguities, parsing will be slowed down
accordingly as these structures are calculated and then propagated.
Consider the empty determiner given above.  It can be used as an
inactive edge at every node in the chart, then match the forward
application rule scheme and search through every edge to its right
looking for a nominal complement.  If there are relatively few
nouns in a sentence, not many noun phrases will be created by this
rule and thus not many structural ambiguities will propagate.  But in
a sentence such as {\em the kids like the toys}, there will be an edge
spanning {\em kids like the toys} corresponding to an empty determiner
analysis of {\em kids}.  The corresponding noun phrase created
spanning {\em toys} will not propagate any further, as there is no way
to combine a noun phrase with the determiner {\em the}.  But now
consider the empty slash categories of form $X/X$ in {\sc gpsg}.
These categories, when coupled with the slash passing rules, would
roughly double parsing time, even for sentences that can be analyzed
without any such categories.  The reason is that these empty
categories are highly underspecified and thus have many options for
combinations.  Thus empty categories should be used sparingly, and
prefarably in environments where their effects will not propagate.

Another word of caution is in order concerning empty categories: they
can occur in constructions with other empty categories.  For instance,
if we specify categories $C_{1}$ and $C_{2}$ as empty categories, and
have a rule that allows a $C$ to be constructed from a $C_{1}$ and a
$C_{2}$, then $C$ will act as an empty category, as well.  These
combinations of empty categories are computed at compile-time; but the
sheer number of empty categories produced under this closure may be a
processing burden if they apply at run-time too productively.  Keep in
mind that {\sc ale} computes all of the inactive edges that can be
produced from a given input string, so there is no way of eliminating
the extra work produced by empty categories interacting with other
categories, including empty ones.


\mysection{Lexical Rules}

Lexical rules provide a mechanism for expressing redundancies in the
lexicon, such as the kinds of inflectional morphology used for word
classes, derivational morphology as found with suffixes and prefixes,
as well as zero-derivations as found with detransitivization,
nominalization of some varieties and so on.  The format {\sc ale}
provides for stating lexical rules is similar to that found in both 
{\sc patr-ii} and {\sc hpsg}.  

In order to implement them efficiently, lexical rules, as well as
their effects on lexical entries, are compiled in much the same way as
grammars.  To enhance their pow