
%
%  $Description: Author guidelines and sample document in LaTeX 2.09$
%
%  $Author: ienne $
%  $Date: 1995/09/15 15:20:59 $
%  $Revision: 1.4 $
%

\documentclass[times, 10pt,twocolumn]{article}
\usepackage{latex8}
\usepackage{times}
\usepackage{psfig,epsfig}
\usepackage{algorithmic}
\usepackage{algorithm}

%\documentstyle[times,art10,twocolumn,latex8]{article}

%-------------------------------------------------------------------------
% take the % away on next line to produce the final camera-ready version
\pagestyle{empty}

%-------------------------------------------------------------------------
\usepackage{setspace}
\doublespace
\begin{document}

\title{Routing of XML and XPath Queries in Data Dissemination Networks}

\author{Shuang Hou \\
Dept. Computer Science\\ University of Toronto, Canada \\ shou@cs.toronto.edu\\
% For a paper whose authors are all at the same institution,
% omit the following lines up until the closing ``}''.
% Additional authors and addresses can be added with ``\and'',
% just like the second author.
\and
Guoli Li\\
Dept. Computer Science\\ University of Toronto, Canada \\ gli@cs.toronto.edu\\
\and H.-A. Jacobsen \\
Dept. Electrical and Computer Engineering \\ University of
Toronto, Canada \\ jacobsen@eecg.utoronto.ca }

\maketitle \thispagestyle{empty}

\begin{abstract}
In XML data dissemination networks, data producers publish XML
documents and data consumers subscribe to documents of interest by
specifying XPath expressions. The problem lies in routing the XML
document throughout a network of XML routers to the interested
consumers. To solve this problem we show how to extract XPath-like
information from document type definitions at the data producer
site. This information is first disseminated throughout the
network and then evaluated against XPath filter expressions
submitted by data consumers signaling interest in receiving
documents. To reduce the amount of information required for
storing routing state at each node in the network, we introduce
covering and merging techniques for XPath expressions. Our
experimental evaluation shows that these optimizations reduce
routing table size by up to 90\%, improve routing performance by
roughly 85\%, and reduce overall network traffic by up to
approximately 35\%.

%Publish/subscribe systems offer a flexible way for information
%producers and information consumers to exchange data.
%Publish/subscribe systems can cope with large amounts of dynamic
%information by exploiting several optimizations, such as
%advertisement, covering, and merging techniques. Motivated by the
%fact that XML has evolved as a standard for publishing and
%exchanging data on the Internet, the routing of XML data has
%become an important research challenge. In this paper, we develop
%algorithms to apply the above optimizations to XML document and to
%XPath query routing. We present schemes to represent non-recursive
%and recursive advertisement relative to XML documents, propose
%several novel routing algorithms for advertisement and
%subscription forwarding, and present covering algorithms for XPath
%queries. We explore merging, and discuss merging rules in the
%context of XML data.

\end{abstract}



%-------------------------------------------------------------------------
\Section{Introduction}

Over the past decade, XML has rapidly evolved as the standard for
data representation and exchange. In XML marked-up message traffic
in intranets and on the Internet range from insurance claims,
health-care requests, corporate memos, online ads to news items
and entertainment information.  The standardization of the mark-up
language, the wide range of related standards, and the wide-spread
adoption of this technology is further amplifying the network
externalities created by this technology.

Data-centric, selective XML-based data dissemination networks are
starting to become a reality. In a dissemination network data,
expressed as XML documents, are routed based on filter expressions
stored at intermediate nodes that indicate where the XML document
is to be routed to. Filter expressions, often expressed as XPath
expressions, are submitted by data consumers who express interest
in receiving certain kinds of documents. For instance, a globally
operating insurance company with many branch offices distributed
world-wide is linked by an overlay network of content-based
routers that comprise the XML document dissemination network. An
insurance claim, an insurance bid, or a request for proposal can
be submitted anywhere into the overlay network (e.g., by a third
party insurance broker or an online client) and is routed towards
a currently online, specific expert employee, speaking the
language of the requester. Note, the latter constrains are
expressed as filter expressions against which the XML document is
evaluated in transit. This design fully decouples information
requester and information provider, avoids a single point of
control and single point of failure, and increases scalability due
to decentralization and distribution.

\begin{figure}[htbp]
\centering \epsfig{file=pub_sub.eps, scale=0.4} \caption{Distributed Dissemination Network}\label{pubsub}
\end{figure}

It is this XML/XPath routing problem that this paper addresses.
More specifically, this paper addresses the problem of efficiently
routing an XML document emitted from a data producers at one point
in the network to a set of data consumers located anywhere
throughout the network. Prior to receiving XML documents,
consumers must have expressed interest in receiving XML documents
by registering XPath expressions with the network. This problem
statement is akin to the well-known publish/subscribe matching
problem. However, the main difference here is that in the case of
data dissemination networks there exists no one single centralized
publish/subscribe system, but a network of content-based routers
(i.e., a network or federation of publish/subscribe systems.)  In
the dissemination network, XML documents are routed based on their
content and not based on IP address information, which is, due to
the completely decoupled design, not available -- all routing
decisions are exclusively based on content information.
Figure~\ref{pubsub} provides an overview of the kind of
dissemination network this paper assumes. In the overlay network
depicted in Figure~\ref{pubsub} each broker only knows their
neighbors (i.e., in terms of IP network address information.)
However, none of the clients -- neither publishing data producers
nor subscribing data consumers know about each other or about the
network topology, except the router they connect to.

In the context of XML-based data dissemination, one of the main
challenges is the ability to efficiently deliver relevant XML
documents to a large and dynamically changing group of consumers.
Centralized XML filtering~\cite{xfilter, yfilter::journal, xtrie}
and distributed query-based XML retrieval
approaches~\cite{diao04towards, chan02tree, nick,
georgia::edbt2004} have found wide-spread interest, but do not
address the distributed, content-based routing problem articulated
above and addressed in this paper. Content-based
routing~\cite{Routing, REBECA, JEDI} in distributed
publish/subscribe architectures, have been studied for
non-XML-based data. Their operational model assumes sets of
attribute-value-pairs and through Boolean operators joined
predicates over these pairs. It is not at all obvious how to
extend these approaches for semi-structured data, especially due
to the hierarchical data model of XML.

In this paper, we develop algorithms for dissemination of XML
documents throughout a network of content-based routers towards
data consumers who have through XPath expressions specified data
needs. First, we demonstrate how to use XML document DTD
information for defining dissemination path. We distinguish
between a non-recursive and a recursive case depending on the DTD
defining the data emitting source. We then develop routing
algorithms for both cases. Second, we present covering algorithms
for XPath queries to reduce the routing table size stored at each
router and speed up routing decisions in the content-based
routers. Third, we discuss the merging of XPath expression
techniques and explore merging rules to further optimize routing.
Finally, we perform a detailed experimental evaluation of our
approach.

The remainder of this paper is organized as follows. Section~2
introduces some definitions and background material. Section~3
reviews related work. Section~4 presents the concept of
advertisement with respect to XML data, and proposes routing
algorithms for XML/XPath advertisement and subscription
forwarding. Section~5 discusses covering and merging techniques
for XPath queries to optimize XML-based routing. In Section~6, we
experimentally validate our approach and evaluate the performance
of various routing strategies.


%Over the past few years, XML has rapidly evolved as the standard for data representation and exchange on the
%Internet. It is changing the way applications exchange information, the way systems are integrated, and the way
%information is stored and processed. Given this rapid growth of XML traffic on the Internet,
%the XML filtering~\cite{xfilter, yfilter::journal, xtrie} and dissemination~\cite{diao04towards, chan02tree,
%nick, georgia::edbt2004} have aroused significant interest. The XML filtering and routing addresses the problem
%of effectively delivering XML document to the interested consumers. In the information dissemination, consumers
%specify their interest, indicating constraints on the document's structure, attributes and content, using some
%XML query language (e.g., XPath expression).
%
%\begin{figure}[htbp]
%\centering \epsfig{file=pub_sub.eps, scale=0.4} \caption{Publish/Subscribe System}\label{pubsub}
%\end{figure}
%
%In the context of XML-based data dissemination, one of the main challenges is the ability to effectively and
%timely deliver relevant XML documents to a large, dynamic group of consumers. In this paper, we try to address
%this problem from the point of view of publish/subscribe system~\cite{eugster-many}. The publish/subscribe
%system provides a loosely coupled network for disseminating data, and effectively routes relevant data to
%interested consumers. Figure~\ref{pubsub} shows the overview of publish/subscribe system. Over the past few
%years, many interested issues, for content-based routing~\cite{Routing, REBECA, JEDI} in the distributed
%publish/subscribe architectures, have made significant progress. Content-based routing differs significantly
%from traditional communication, in which messages are routed on the basis of their content rather than the IP
%address of their destination. Many optimizations for speeding up the routing, such as advertisement,
%subscription covering and merging, have been proposed to reduce the network traffic and routing table
%size~\cite{REBECA, Routing} for content-based publish/subscribe system. In this paper, we apply these methods in
%the context of XML data to address the problem of effectively directing high volumes of XML document to
%interested consumers in the XML content- based routing. The first contribution of this paper is we propose
%schemes to represent non-recursive and recursive advertisement with respect to XML documents, using DTD
%information, and propose several novel matching algorithms for advertisement and subscription. Second, we
%present covering algorithms for XPath queries. Third, we discuss the merging technique, and explore merging
%rules in the context of XPath expressions.
%
%
%The remainder of this paper is structured as follows. Section~2 introduces some definitions and background
%material, as they are closely related to our discussion. Section~3 reviews related work. Section~4 presents the
%concept of advertisement relative to XML data, and proposes our novel routing algorithms for advertisement and
%subscriptions. Section~5 discusses the covering algorithms for XPath queries. In Section~6, we experimentally
%validate our optimizations and compare the performance of advertisement and covering techniques.


%-------------------------------------------------------------------------
\Section{Background}

To keep this paper self-contained, we provide an overview of key
concepts used in the remainder of this paper. This includes
publish/subscribe systems, content-based routing,
advertisement-based routing, covering and merging techniques.

A publish/subscribe system is comprised of information producers
who publish and information consumers who subscribe to
information~\cite{eugster-many}. The publish/subscribe system
matches or routes relevant information to interested consumers. As
shown in Figure~\ref{pubsub}, in the publish/subscribe paradigm,
information producers, referred to as publishers, submit data as
publications to the system, and information consumers, referred to
as subscribers, indicate their interests by submitting
subscriptions.  On receiving a publication, the router determines
the set of matching subscriptions and notifies the appropriate
subscribers. Publishers and subscribers are loosely coupled in
space and time, and have no knowledge of each other.
%Instead of
%address-based routing, messages in content-based publish/subscribe
%systems are routed based on their content.

%A content-based publish/subscribe system
%provides a flexible and extensible environment for information
%exchange.


%-------------------------------------------------------------------------
\SubSection{Content-based Routing}

Content-based publish/subscribe systems~\cite{Routing, REBECA,
JEDI} provide a flexible and extensible environment for
information exchange. Messages in content-based pub/sub systems
are routed based on their contents rather than the IP address of
their destinations. In order to handle a large amount of dynamic
information and reduce the network traffic, many optimization
techniques, such as advertisements~\cite{Routing}, covering and
merging techniques~\cite{JEDI, Routing, REBECA}, have been proven
to gain significant benefits for non-XML based publish/subscribe
systems.

\begin{figure}
\centering \epsfig{file=adv-route.eps, scale=0.8}\caption{Advertisement-based Content-based
Routing}\label{advroute}
\end{figure}

In advertisement-based publish/subscribe systems, advertisements
are specifications of information that the publisher will publish
in the future. A publisher must issue an advertisement before
publishing. An advertisement allows the publisher to issue a set
of matching publications~\cite{Routing}. Advertisements are
flooded in the publish/subscribe overlay. We assume that the
number of advertisement is much less than subscriptions and
publications. The relationship can be summarized as $N_{advs} <
N_{subs} << N_{pubs}$. Advertisements are used to avoid
broadcasting subscriptions in the network, so that subscriptions
are only routed to the publishers who advertise what the
subscribers are interested in. Both subscription and advertisement
define a filter on publications. Later on, matching publications
will be delivered to subscribers along the pathes built by
subscriptions. Figure~\ref{advroute} shows a scenario for
advertisement-based content-based routing. The subscription
routing table~(SRT) stores advertisements in order to route
subscriptions. Publications will trace back along the path setup
by subscriptions to interested subscribers. The publication
routing table (PRT) maintains the path information. For example,
in Figure~\ref{advroute}, $adv_1$ is broadcast in the network, and
is stored on each broker of the network with a different last hop.
Consequently, subscriptions that match $adv_1$ will be routed
according to these last hops (e.g., $sub_1$ is routed along the
link $5-4-3-1$). Note that the $sub_1$ will not be forwarded to
brokers 2 and 6 since the $adv_1$ indicates the publisher is from
broker 1. The $pub_1$ is, therefore, routed along a reverse path
$1-3-4-5$. We use the notations $P(s)$ and $P(a)$, to refer to the
set of publications that match subscription $s$ and advertisement
$a$, respectively, in the later part of this paper.

%-------------------------------------------------------------------------
\SubSection{Covering and Merging}

The goal of covering-based routing~\cite{Routing, REBECA} is to
remove redundant subscriptions from the network in order to
maintain a compacted routing table and reduce the network traffic.
In Figure~\ref{advroute}, if $sub_1$ covers $sub_2$, at broker~4,
$sub_2$ will not be forwarded to broker~3. That is, we can safely
remove $sub_2$ on broker 3 and gain a compacted routing table,
since all the publications matching $sub_2$ must match $sub_1$. A
formal definition of covering relation as following: A
subscriptions $sub_1$ covers $sub_2$, if and only if, P($sub_1$)
$\supseteq$ P($sub_2$), denoted as $sub_1$ $\sqsupseteq$ $sub_2$.
The covering relationship defines a partial order on the set of
all subscriptions with respect to $\sqsupseteq$. Since
advertisements have the same format as subscriptions, the covering
relations among advertisements can be defined in the same way.

If two subscriptions do not have covering relations, but their
publication sets overlap each other. The two subscriptions can be
merged into a more general subscription, which covers the original
subscriptions. Suppose $sub_m$ is a merger of $sub_1$ and $sub_2$,
then we have P($sub_m$) $\supseteq$ P($sub_1$) $\cup$ P($sub_2$).
There are two kinds of mergers. If the publication set of the
merger is exactly equal to the union of the publication set of the
original subscriptions, the merger is a \emph{perfect merger};
otherwise, if P($sub_m$) $\supset$ P($sub_1$) $\cup$ P($sub_2$),
it is an \emph{imperfect merger}. After merging, only the merger
is forwarded into the network. The merging technique~\cite{REBECA}
is used for further minimizing the routing table size. It is an
extension of the covering relation.

%In the context of XML/XPath routing, determining covering relations
%between subscriptions is actually a string
%matching problem. We will discuss how to determine covering relations in
%Section~\ref{covering}.


%-------------------------------------------------------------------------
\Section{Related Work}

A large body of work has focused on developing publish/subscribe-style matching algorithms for evaluating an XML
document against a set of XPath queries~\cite{xfilter, yfilter::journal, xtrie, bruno}. However, all these
approaches exclusively address centralized matching architectures, not the distributed, content-based XML
dissemination networks we address in this work. While matching is an integral step in a content-based router,
other routing operations studied in this paper are equally important in a distributed data dissemination
architecture. Thus, our work complements matching algorithms for the design of a content-based XML router.

Advertisement-based techniques for optimizing content-based routing have been developed in the area of
distributed publish/subscribe~\cite{Routing, REBECA, JEDI}. In this body of work it has also been demonstrated
that the network traffic and routing table size can be reduced by using different routing strategies, including
advertising, covering and merging techniques. However, the main differences between these approaches and our
approach lie in the subscription language and publication data format. Our approach is based on the
hierarchical, tree-based XPath and XML model, whereas the traditional content-based routing approaches operate
with attribute-value pairs and predicate constraints over these pairs. The advertising performed by data sources
is very different and more complex for XML and XPath, as the hierarchical structure of the model and the
potentially recursive structure of the data needs to be accounted for. Basically, the standard publish/subscribe
approach does not know an equivalent to a DTD.

The query aggregation scheme given in~\cite{chan02tree} addresses the problem of determining a more precise and
smaller set of XPath expressions from a given set of input XPath expressions. This problem is similar to the
covering and merging problem discussed in this paper. And we think it could be used for these purposes. It would
need to be evaluated whether the tree aggregation scheme always yields a query set that maintains the covering
relationships and whether the aggregate query set introduces false positives (i.e., matches for XML documents
not originally matched) or false negatives (i.e., misses XML documents originally matched.) These are
non-trivial extensions of this work. Furthermore, the tree aggregation approach does not address the generation
of advertisements from DTDs, which is central to our approach.

To the best of our knowledge, no prior work on content-based XML document routing and dissemination has
introduced advertisements to improve the performance of XML content distribution. This technique is unique to
our approach.

Much recent research has focused on XML data dissemination~\cite{georgia::edbt2004, diao04towards, mesh, nick}.
Koloniari and Pitoura~\cite{georgia::edbt2004} present work on XML dissemination in a peer-to-peer network. They
propose a decentralized approach to the problem of routing path queries among nodes of a peer-to-peer network.
However, in their approach queries are severely restricted in that no wildcards are allowed.  Koudas {\it et
al.}~\cite{nick} study the routing of XML queries in a data-sharing peer-to-peer network. Both approaches
propose a solution to the {\it location problem}. This problem is inverse to the problem we solve. The location
problem states that given a collection of XML database servers and an XPath query, find the database that
contain data relevant to the query. Diao {\it et al.}~\cite{diao04towards} deploy XML-based services on an
Internet-scale, and provides a detailed architectural design of the system. However, this work does not look at
the problem of XPath query covering, merging, or advertising. In ~\cite{mesh}, the authors propose an approach
for content-based routing of XML data in mesh-based overlay networks. The focus of this work is on reliable
delivery of streaming data, and does not explicitly address XML query management.

%To the best of our knowledge, our work is the first to address the
%problem of advertisement for XML data dissemination. However, the
%advertisement-based routing has been widely studied in the area of
%content-based publish/subscribe system~\cite{Routing, REBECA,
%JEDI}. They reduce the network traffic and routing table size by
%using different routing strategies, including advertisement,
%covering and merging techniques. In contrast to the format of
%advertisement, conjunction of attribute-value pairs, in the
%context of content-based publish/subscribe system, our paper
%focuses on a more complex format (as the advertisement in our work
%is defined as recursive and non-recursive with respect to
%different DTDs), which includes both data contents and structure.
%The query aggregation scheme given in~\cite{chan02tree} addresses
%the problem as part of our work, and they do not address the
%advertisement-based optimization and support powerful language
%like recursive advertisement format.
%
%Other XML data dissemination work~\cite{georgia::edbt2004,
%diao04towards, mesh, nick} have been widely studied recently.
%~\cite{georgia::edbt2004} is the work on XML dissemination in
%peer-to-peer system. It proposes a decentralized approach to the
%problem of routing path queries among nodes of a P2P system,
%however, it does not allow wildcards in the path query.
%~\cite{nick} also studies the routing of XML queries in a
%data-sharing P2P network, and proposes a scalable solution to the
%location problem. This problem is inverse to ours: given a
%collection of XML database servers and an XPath query, find the
%database that contain data relevant to the query.
%~\cite{diao04towards} deploys XML-based services on an
%Internet-scale, and provides a detailed architectural design of
%the system, however, it does not focus on the problem of XML
%queries covering and merging. In ~\cite{mesh}, the authors propose
%an approach for content-based routing of XML data in mesh-based
%overlay networks. The focus of this work is on reliable delivery
%of streaming data, and dose not explicitly address XML query
%management.

%-------------------------------------------------------------------------
\Section{Advertisement-based Routing}

When a broker receives a subscription, it matches the subscription
against its advertisements. If there is an advertisement whose
publication set is overlapping with the subscription's, it means
there is a \emph{match} between the subscription and the
advertisement. Then the broker will route the subscription to the
broker where the advertisement came from. Before we discuss the
matching algorithm of advertisement and subscription, we present
the definition and format of advertisement in our approach.

%-------------------------------------------------------------------------
\SubSection{Definition of Advertisement}\label{advDef}

In the context of XML/XPath routing, the advertisement is
generated by exploiting Document Type Definition (DTD)
information. The purpose of a DTD is to define the legal building
blocks of an XML document. The main building blocks of XML
documents are elements surrounded with tags, e.g.,
$<root>...</root>$. $root$ is the element name in the documents.
All elements appearing in the XML document must be defined in the
corresponding DTD, which determines the structure of elements and
their sequence in the document. In this paper, our discussion
focuses on the main building block: elements.  Our approach could
be easily extended to element attributes and content, which we
omit due to space limitations.

In this paper, we use the common interpretation of an XML document
as a tree of nodes and consider each path from the root node to a
leaf node. Thus, we decompose each XML document into a set of XML
paths and each path is represented as $e = /t_1/t_2/.../t_n$,
where $t_i$ is the XML element name. These paths are extracted
from the document before the publisher submits the document to the
network. Thus, a publication routed in our system is actually an
XML path annotated with a \textit{pathId} and \textit{docId}. This
is transparent to publishers and subscribers who handle entire XML
documents. Publishers submit entire XML documents, commonly
referred to as publications, and subscribers submit XPath
expressions (XPEs), commonly referred to as subscriptions.  We use
the terms XPE and subscription interchangeably in the later part
of this paper.

In the content-based publish/subscribe system, the advertisement
has the same format as a subscription. However, we restrict the
format of advertisement to the abstract XPath expression without
//-operators in the context of XML/XPath data routing. Note, this
is not a restriction of our subscription language. Advertisement
is a system internal mechanism, which is not exposed to the
application or the user.  An advertisement is described as $a =
/t_1/t_2/.../t_{n-1}/t_n$, where $t_i$ can be either an element
name or a wildcard, and $a$ has the same length with the
publication it advertises. In our approach, advertisements are
derived from the DTD, since the DTD contains all possible paths
from the root to the leaf appearing in the XML documents.

We call an advertisement a \textit{non-recursive} advertisement if
it is extracted from a non-recursive DTD. The format of
non-recursive advertisement is described as above. A DTD is said
to be recursive if it has some element that is defined in terms of
itself, directly or indirectly, e.g., the popular NITF DTD is
recursive. We define an advertisement as \textit{recursive}
advertisement if it is extracted from a recursive DTD. If the DTD
has a recursive structure, the advertisement can be recursive
anywhere, for example, the recursive part can appear at the
beginning, the middle and the end of an advertisement, and two
recursive parts can appear in sequence or be embedded in each
other. We generalize these patterns to three types of recursive
advertisements as below:

\begin{enumerate}
{\bf \item Simple-recursive advertisements:} There is only one
recursive pattern in the simple-recursive advertisement. It is
described as $a = /t_1/t_2/.../t_{i-1}$
$(/t_i/.../t_j)^{+}/.../t_n $, where the $+$ operator declares
that elements $t_i,...,t_j$ must occur one or more times in the
advertisement. Note, this is not XPath syntax, and advertisements
are used with these extensions only system internal and not
exposed to client or application.

{\bf \item Series-recursive advertisements:} A series-recursive
advertisement can include more than one recursive pattern in
sequence. For example, the advertisement containing two recursive
patterns in sequence can be described as $a =
/t_1/t_2/.../t_{i-1}$
$(/t_i/.../t_j)^{+}/.../t_{l-1}(/t_l/.../t_o)^{+}/ .../t_n$.

{\bf \item Embedded-recursive advertisements:} In an
embedded-recursive advertisement, recursive patterns can be
embedded in other recursive patterns. The simplest case is one
recursive pattern is embedded in another recursive pattern,
described as $a = /t_1/t_2/.../$
$t_{i-1}(/t_i/.../t_{l-1}(/t_l/.../t_o)^{+}/.../t_j)^{+}/.../t_n
$.
\end{enumerate}

More types of recursive advertisements can be easily extended
based on the above three types of advertisements. We discuss the
matching algorithms for non-recursive and recursive advertisements
in Sections~\ref{nonRecAdv} and ~\ref{recAdv}, respectively.



%The most simple recursion is only one recursive pattern in the advertisement, described as $a =
%/t_1/t_2/.../t_{i-1}(/t_i/.../t_j)^{+}/.../t_n $, where the $+$ operator declares that elements $t_i,...,t_j$
%must occur one or more times in the advertisement. For the case that more than one recursive patterns in series
%in the advertisement, we describe it as $a =
%/t_1/t_2/.../t_{i-1}(/t_i/.../t_j)^{+}/.../t_{l-1}(/t_l/.../t_o)^{+}/ .../t_n $ (e.g., two recursive patterns in
%series). For the case that one more recursive pattern embedded in another recursive pattern in the
%advertisement, we describe it as $a = /t_1/t_2/.../t_{i-1}(/t_i/.../t_{l-1}(/t_l/.../t_o)^{+}/
%.../t_j)^{+}/.../t_n $. More types of recursive advertisements can be easily extended based on above three types
%of advertisements. We discuss non-recursive and recursive advertisements in Sections~\ref{nonRecAdv} and
%~\ref{recAdv}, respectively.




%-------------------------------------------------------------------------
\SubSection{Non-recursive Advertisement}\label{nonRecAdv}

In this section, we discuss the algorithms for how a subscription
matches a non-recursive advertisement in the context of XML/XPath.
We say an advertisement $a$ matches a subscription $s$ if the
publication sets $P(a)$ and $P(s)$ overlap, that is, $P(a) \bigcap
P(s) \neq \Phi$. Figure~\ref{advsub}(a) shows all possible
relations between two sets. To forward subscriptions, we need to
identify the first two overlapping cases in
Figure~\ref{advsub}~(a). In this paper, we focus on the
subscriptions including parent-child operator (/), wildcard
operator (*), and ancestor-descendant operator (//). For other
operators appearing in the subscription, such as attribute
filters, our approach can be extended easily to support them
through simple value comparisons. We discuss the matching
algorithm with following three cases.

%\begin{figure}
%\centering \epsfig{file=xmltree.eps} \caption{XML document and
%corresponding XML paths}\label{xmltree}
%\end{figure}
%
%An XML document starts with a special element, the root element;
%all other elements in the document are descendant of the root
%element. These elements are organized hierarchically, and can be
%nested to any depth. Elements can contain data and attributes. An
%XML document can be represented as a tree of nodes, which
%correspond to elements. Figure~\ref{xmltree} displays an XML
%document $d$ and the corresponding tree. To simplify the
%presentation we use tags like $<a>,<b>$ to represent XML elements.
%Document $d$ consists of 3 XML paths $e_1, e_2$, and $e_3$. Each
%XML path is from the root element of the document to one leaf.

{\bf Absolute simple XPEs:} A simple XPE only contains
parent-child and wildcard operators. Algorithm~\ref{absAdv} shows
the matching for absolute XPE (without //-operator) and
advertisement. The algorithm does not have to be applied, if the
given XPE is longer than the advertisement (lines 1-3). This
observation is exploited because the advertisement has the same
length as its publications, and thus, publications in $P(a)$ will
not match the longer XPE. Next, the algorithm compares each pair
of elements or wildcards in advertisement and subscription (lines
4-8), according to the matching rules shown in
Figure~\ref{advsub}(b). It returns 0, if some pair does not
overlap, otherwise it returns 1. For example, given $a =
/b/*/*/c/c/d$ and $s = /*/c/*/b/c$, the algorithm return 0, since
the matching rules fail to satisfy for $i = 4$. As shown in
Figure~\ref{advsub}(b), the fifth row indicates that the
advertisement including an element $c$ and the subscription
including an element $b$ at the same position do not overlap. That
is publications matching the advertisement cannot match the
subscription.

\begin{figure}[htbp]
\centering \epsfig{file=adv-sub.eps} \caption{Adv. and Sub.
Relations}\label{advsub}
\end{figure}

%\begin{table}[htbp]\centering
%\begin{tabular}{|c|c|c|}
%\hline
%{\rule[-1mm]{0mm}{4mm}}{\footnotesize{\bf Adv.}} & {\footnotesize{\bf Sub.}} & {\footnotesize{\bf Overlap}}\\
%\hline
%    {\footnotesize $*$ }& {\footnotesize $*$} & {\footnotesize $\surd$} \\
%\hline
%  {\footnotesize $*$ } & {\footnotesize $t$} & {\footnotesize $\surd$} \\
%\hline
%    {\footnotesize $t$} & {\footnotesize $*$} & {\footnotesize $\surd$}\\
%\hline
%    {\footnotesize $t$} & {\footnotesize $t$} & {\footnotesize $\surd$}\\
%\hline
%    {\footnotesize $t_1$ }&{\footnotesize $t_2$} & {\footnotesize $\times$}\\
%\hline
%\end{tabular}\caption{Matching Rules for Adv. and Sub.}\label{advsubmatch}
%\end{table}

\begin{algorithm}[t]
\caption{AbsExpr. and Adv. Matching} \label{absAdv}
\begin{algorithmic}[1]
\REQUIRE{advertisement $a = /t_1/t_2/.../t_n $, and subscription $s = /m_1/m_2/.../m_k$} \ENSURE{1 if $P(a)
\bigcap P(s) \neq \Phi$, 0 if $P(a) \bigcap P(s) = \Phi$}

\IF {$k > n$} \STATE {return 0} \ENDIF

\FOR{$i = 1$ to $k$} \IF{$t_i \neq m_i$ and $t_i \neq \ast $ and
$m_i \neq \ast $}
\STATE{return 0}
\ENDIF
\ENDFOR

\STATE {return 1}
\end{algorithmic}
\end{algorithm}

%\begin{figure}
%\label{adv1} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize advertisement $a = /t_1/t_2/.../t_n $, and subscription  $s = /m_1$}\\
%     \hspace*{0.8cm} {\footnotesize $/m_2/.../m_k$ }\\
%    {\footnotesize \bf Output:} {\footnotesize 1 if $P(a) \bigcap P(s) \neq \Phi$}\\
%    \hspace*{1cm} {\footnotesize 0 if $P(a) \bigcap P(s) = \Phi$}\\
%    {\footnotesize 1:} {\footnotesize {\bf If} $k > n$ {\bf then} return 0}\\
%    {\footnotesize 2:} {\footnotesize {\bf For} $i = 1 : k$ {\bf do}}\\
%    {\footnotesize 3:} \hspace*{0.4cm} {\footnotesize {\bf If} $t_i \neq m_i$ and $t_i \neq \ast $ and $m_i \neq \ast
%    $ {\bf then} return 0}\\
%    {\footnotesize 4:} {\footnotesize return 1}
%    }\caption{Adv1}
%\end{figure}
\textbf{Relative simple XPEs:} These expressions are similar to
absolute simple XPEs except for the first operator, that is the
XPE is relative. The matching could start at any position of the
advertisement because the subscription is relative. A naive
matching algorithm in this case is repeatedly calling
Algorithm~\ref{absAdv}. In iteration $i$, take the subscription as
an absolute one and start the matching from the $i$th bit of the
advertisement. We skip the detail of the naive algorithm because
it is straightforward. The complexity of the naive algorithm is
$O(n*k)$ where $n$ is the length of the advertisement and $k$ is
the length of the subscription. We propose an optimized version of
the matching algorithm for relative simple XPEs.

%\begin{algorithm}[t]
%\caption{RelExpr. and Adv. Matching} \label{relAdv}
%\begin{algorithmic}[1]
%\REQUIRE{advertisement $a = /t_1/t_2/.../t_n $, and subscription
%$s = m_1/m_2/.../m_k$} \ENSURE{\textit {matchPos} if $P(a) \bigcap
%P(s) \neq \Phi$, 0 if $P(a) \bigcap P(s)$ $ = \Phi$}
%
%\IF {$k > n$} \STATE {return 0} \ENDIF
%
%\FOR{$i = 1$ to $n-k+1$}
%\IF{AbsAdv($/t_i/t_{i+1}/.../t_n$,$/m_1/m_2/.../m_k$) = 1}
%\STATE{return \textit {matchPos = $i$}} \ENDIF \ENDFOR
%
%\STATE {return 0}
%\end{algorithmic}
%\end{algorithm}

%\begin{figure}
%\label{adv2} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize advertisement $a = /t_1/t_2/.../t_n $, and subscription $s = m_1/$}\\
%    \hspace*{0.8cm} {\footnotesize $m_2/.../m_k$ }\\
%    {\footnotesize \bf Output:} {\footnotesize \textit {matchPos} if $P(a) \bigcap P(s) \neq \Phi$}\\
%    \hspace*{1cm} {\footnotesize $0$ if $P(a) \bigcap P(s) = \Phi$}\\
%    {\footnotesize 1:} {\footnotesize {\bf If} $k > n$ {\bf then} return $0$} \\
%    {\footnotesize 2:} {\footnotesize {\bf For} $i = 1 : n-k+1$ {\bf do}}\\
%    {\footnotesize 3:} \hspace*{0.4cm} {\footnotesize {\bf If} Adv1($/t_i/t_{i+1}/.../t_n$,$/m_1/m_2/.../m_k
%    $) = 1 {\bf then} return  }\\
%     \hspace*{1cm} {\footnotesize \textit {matchPos = $i$}}\\
%    {\footnotesize 4:} {\footnotesize return 0}
%    }\caption{Adv2}
%\end{figure}

It is important to note that the matching for relative simple XPE
and advertisement is a string matching problem~\cite{kmp}. We try
to match the XPE, $s$, inside the advertisement, $a$, by starting
at the first element of $a$ that matches $m_1$ and continuing
(i.e., comparing to $m_2$ and so on) until we either complete the
match or find a mismatch. In the latter case, we must go back to
the place from which we started. The difference between the
traditional string matching problem and ours is that the $*$ can
match any element in our matching rules, as shown in
Figure~\ref{advsub}(b). To improve this algorithm, the KMP
algorithm~\cite{kmp} is applied to reduce the the number of
comparisons to $O(n)$. As shown in Algorithm~\ref{nextAdv}, KMP
preprocesses the subscription once to find relevant information
about it regardless of the advertisement. It computes a $next$
table, recording all repeating patterns of $s$, to avoid
backtracking. Algorithm~\ref{optRelAdv} preforms the matching by
taking advantage of the $next$ table.

\begin{algorithm}[htbp]
\caption{Next Table for Sub.} \label{nextAdv}
\begin{algorithmic}[1]
\REQUIRE{subscription $s = m_1/m_2/.../m_k $}
\ENSURE{\textit{next} for $s$ (an array with size $k$)}

\STATE{\textit{next}$(1)$ $\leftarrow$ -1}
\STATE{\textit{next}$(2)$ $\leftarrow$ 0}

\FOR{$i = 3$ to $k$} \STATE{$j \leftarrow \textit{next}$
$(i-1)+1$} \WHILE{$m_{i-1} \neq m_j$ and $m_{i-1} \neq \ast$ and
$m_j \neq \ast$ and $j > 0$} \STATE{$j \leftarrow
\textit{next}(j)+1$} \ENDWHILE \STATE{\textit{next}$(i)$
$\leftarrow j$} \ENDFOR

\STATE {return \textit{next}}
\end{algorithmic}
\end{algorithm}

\begin{algorithm}[htbp]
\caption{Optimization for RelExpr. and Adv. Matching}
\label{optRelAdv}
\begin{algorithmic}[1]
\REQUIRE {advertisement $a = /t_1/t_2/.../t_n $, and subscription
$s = m_1/m_2/.../m_k$}

\ENSURE {\textit {matchPos} if $P(a) \bigcap P(s) \neq \Phi$, 0 if
$P(a) \bigcap P(s)$ $ = \Phi$}

\IF {$k > n$} \STATE {return 0} \ENDIF

\STATE {$j \leftarrow 1$, $i \leftarrow 1$, \textit {matchPos}
$\leftarrow 0$}

\WHILE{\textit {matchPos} = 0 and $i \leq n$} \IF{$t_i = \ast $ or
$m_j = \ast$ or $t_i = m_j$} \STATE{$j \leftarrow j+1$, $i
\leftarrow i+1$} \ELSE \STATE{$j \leftarrow
\textit{next}(j)+1$,}\IF{$j = 0$}\STATE{$j \leftarrow 1$, $i
\leftarrow i+1$}\ENDIF \ENDIF \IF{$j = k+1$}\STATE{\textit
{matchPos} $\leftarrow i-k$}\ENDIF\ENDWHILE

\STATE{return \textit {matchPos}}
\end{algorithmic}
\end{algorithm}


%\begin{figure}
%\label{optadv2} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize advertisement $a = /t_1/t_2/.../t_n $, and subscription $s = m_1/$}\\
%    \hspace*{0.8cm} {\footnotesize $m_2/.../m_k $ }\\
%    {\footnotesize \bf Output:} {\footnotesize \textit {matchPos} if $P(a) \bigcap P(s) \neq \Phi$}\\
%    \hspace*{1cm} {\footnotesize $0$ if $P(a) \bigcap P(s) = \Phi$}\\
%    {\footnotesize 1:} {\footnotesize {\bf If} $k > n$ {\bf then} return $0$} \\
%    {\footnotesize 2:} {\footnotesize $j \leftarrow 1$, $i \leftarrow 1$, \textit {matchPos} $\leftarrow 0$}\\
%    {\footnotesize 3:} {\footnotesize {\bf While} \textit {matchPos} = 0 and $i \leq n$ {\bf do}}\\
%    {\footnotesize 4:} \hspace*{0.7cm} {\footnotesize {\bf If} $t_i = \ast $ or $m_j = \ast
%    $ or $t_i = m_j$ {\bf then}$j \leftarrow j+1$, $i \leftarrow i+1$}\\
%    {\footnotesize 5:} \hspace*{0.7cm} {\footnotesize {\bf Else} $j \leftarrow \textit{next}(j)+1$,}\\
%    {\footnotesize 6:} \hspace*{1.3cm} {\footnotesize {\bf If} $j = 0$ {\bf then} $j \leftarrow 1$, $i \leftarrow i+1$}\\
%    {\footnotesize 7:} \hspace*{0.7cm} {\footnotesize {\bf If} $j = k+1$ {\bf then} \textit {matchPos} $\leftarrow i-k$}\\
%    {\footnotesize 8:} {\footnotesize return \textit {matchPos}}
%    }\caption{optAdv2}
%\end{figure}


%\begin{figure}
%\label{nextAdv} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize subscription $s = m_1/m_2/.../m_k $}\\
%    {\footnotesize \bf Output:} {\footnotesize \textit{next} for $s$ (an array of size $k$)}\\
%    {\footnotesize 1:} {\footnotesize \textit{next}$(1)$ $\leftarrow$ -1} \\
%    {\footnotesize 2:} {\footnotesize \textit{next}$(2)$ $\leftarrow$ 0}\\
%    {\footnotesize 3:} {\footnotesize {\bf For} $i = 3 : k$ {\bf do}}\\
%    {\footnotesize 4:} \hspace*{0.4cm} {\footnotesize $j \leftarrow \textit{next}$ $(i-1)+1$}\\
%    {\footnotesize 5:} \hspace*{0.4cm} {\footnotesize {\bf While} $m_{i-1} \neq m_j$ and $m_{i-1} \neq \ast$ and $m_j \neq \ast$ and $j > 0$ {\bf do}}\\
%    \hspace*{1.5cm} {\footnotesize $j \leftarrow \textit{next}(j)+1$}\\
%    {\footnotesize 6:}\hspace*{0.55cm}  {\footnotesize \textit{next}$(i)$ $\leftarrow
%    j$}\\
%    {\footnotesize 7:} {\footnotesize return \textit{next}}
%    }\caption{nextAdv}
%\end{figure}


\textbf{Descendant operators in XPEs}: Descendant operators
indicate that more than one element should appear in the matching
advertisement. Algorithm~\ref{desAdv} for XPE with descendant
operators and advertisement matching is based on the above XPE
matching algorithms (i.e., Algorithm~\ref{absAdv} and
~\ref{relAdv}). We split XPE in maximal length sub-XPEs that do
not contain any descendant operators, and, match each sub-XPE
against the advertisement with sequence comparisons. As shown in
Algorithm ~\ref{desAdv}, lines 1-5 guarantee that the
advertisement is longer than the subscription. Next, we match the
first sub-XPE against the advertisement according to the different
types of subscriptions (lines 6-10), and, recompute the next
available matching position in the advertisement (lines 11-15).
The algorithm repeats this process until the end of the
subscription is reached (lines 16-22), or returns 0 immediately if
it finds the rest of the advertisement is shorter (lines 23- 27).
For instance, given $a = /a/*/e/*/d/*/c/b$ and $s =
*/a//d/*/c//b$, the algorithm matches all sub-XPEs in $s$ against
$a$ in order. It returns 1 because it finds each sub-XPE $*/a$,
$d/*/c$ and $b$ matches different parts, $a/*$, $*/d/*$ and $b$,
in $a$ sequentially.


\begin{algorithm}[t]
\caption{DesExpr. and Adv. Matching} \label{desAdv}
\begin{algorithmic}[1]
\REQUIRE{advertisement $a = /t_1/t_2/.../t_n $, and subscription
$s = /m_{11}/m_{12}/.../m_{1k_1}//m_{21}/m_{22}/.../$
$m_{2k_2}//...//m_{p1}/m_{p2}/.../m_{pk_p}$ or $s =
m_{11}/m_{12}/...$
$/m_{1k_1}//m_{21}/m_{22}/.../m_{2k_2}//...//m_{p1}/m_{p2}/.../m_{pk_p}$}
\ENSURE{1 if $P(a) \bigcap P(s) \neq \Phi$, 0 if $P(a) \bigcap
P(s) = \Phi$}

\FOR{$i = 1$ to $p$} \IF{$k_1+...+k_i > n$}\STATE{return
0}\ENDIF\ENDFOR

\IF{$s$ is absolute }\STATE{\textit{temp} $\leftarrow$
AbsAdv($/t_1/t_2/.../t_n$, $/m_{11}/m_{12}/.../$ $m_{1k_1})$}\ELSE
\STATE{\textit{temp} $\leftarrow$ RelAdv($/t_1/t_2/.../t_n$,
$m_{11}/m_{12}/.../m_{1k_1}$)}\ENDIF

\IF{\textit{temp} = 0} \STATE{return 0}\ELSE\STATE{$j \leftarrow
k_1+ \textit{temp}$ , $\textit{temp} \leftarrow 0$}\ENDIF

\FOR{$i = 2$ to $p$}\STATE{\textit{temp} $\leftarrow$
RelAdv($/t_j/t_{j+1}/.../t_n$, $m_{i1}/m_{i2}/.../$
$m_{ik_i}$)}\IF{\textit{temp} = 0}\STATE{return 0}\ELSE\STATE{$j
\leftarrow j + \textit{temp} + k_i -1$}\ENDIF\FOR{$l = i + 1$ to
$p$}\IF{$k_{i+1}+...+k_l > n-j+1$}\STATE{ return
0}\ENDIF\ENDFOR\ENDFOR

\STATE {return 1}
\end{algorithmic}
\end{algorithm}


%\begin{figure}
%\label{adv3} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize advertisement $a = /t_1/t_2/.../t_n $, and subscription $s = /m_{11}$}\\
%    \hspace*{0.8cm} {\footnotesize $/m_{12}/.../m_{1k_1}//m_{21}/m_{22}/.../m_{2k_2}//...//m_{p1}/m_{p2}/... $}\\
%    \hspace*{0.8cm} {\footnotesize $/m_{pk_p}$ or $s = m_{11}/m_{12}/.../m_{1k_1}//$$m_{21}/m_{22}/.../m_{2k_2}//$}\\
%    \hspace*{0.8cm} {\footnotesize $...//m_{p1}/m_{p2}/.../m_{pk_p}$}\\
%    {\footnotesize \bf Output:} {\footnotesize 1 if $P(a) \bigcap P(s) \neq \Phi$}\\
%    \hspace*{1cm} {\footnotesize 0 if $P(a) \bigcap P(s) = \Phi$}\\
%    {\footnotesize 01:} {\footnotesize {\bf For} $i = 1:p$ {\bf do}} \\
%    {\footnotesize 02:} \hspace*{0.5cm}{\footnotesize {\bf If} $k_1+...+k_i > n$ {\bf then} return 0} \\
%    {\footnotesize 03:} {\footnotesize {\bf If} $s$ is absolute {\bf then} }\\
%    \hspace*{1cm} {\footnotesize \textit{temp} = Adv1($/t_1/t_2/.../t_n$, $/m_{11}/m_{12}/.../m_{1k_1}
%    $)}\\
%    {\footnotesize 04:} {\footnotesize {\bf Else} \textit{temp} = Adv2($/t_1/t_2/.../t_n$, $m_{11}/m_{12}/.../m_{1k_1}
%    $)}\\
%    {\footnotesize 05:} {\footnotesize {\bf If} \textit{temp} = 0 {\bf then} return
%    0}\\
%    {\footnotesize 06:} {\footnotesize {\bf Else}  $j \leftarrow k_1+ \textit{temp}$ , $\textit{temp} \leftarrow 0$}\\
%    {\footnotesize 07:} {\footnotesize {\bf For} $i = 2 : p $ {\bf do}}\\
%    {\footnotesize 08:} \hspace*{0.4cm} {\footnotesize \textit{temp}=
%    Adv2($/t_j/t_{j+1}/.../t_n$, $m_{i1}/m_{i2}/.../m_{ik_i}
%    $)}\\
%    {\footnotesize 09:} \hspace*{0.4cm} {\footnotesize {\bf If} \textit{temp} = 0 {\bf then} return 0}\\
%    {\footnotesize 10:} \hspace*{0.4cm} {\footnotesize {\bf Else} $j \leftarrow j+
%    \textit{temp}+ k_i -1$}\\
%    {\footnotesize 11:} \hspace*{0.5cm}{\footnotesize {\bf For} $l = i+1:p$ {\bf do}} \\
%    {\footnotesize 12:} \hspace*{1cm}{\footnotesize {\bf If} $k_{i+1}+...+k_l > n-j+1$ {\bf then} return 0} \\
%    {\footnotesize 13:} {\footnotesize return 1}
%    }\caption{Adv3}
%\end{figure}

%-------------------------------------------------------------------------
\SubSection{Recursive Advertisement}\label{recAdv}

In this section, we mainly discuss the matching algorithms for XPEs and recursive advertisement.
%First, we define the format of recursive advertisement. If the DTD has a recursive structure, the advertisement
%can be recursive anywhere. The most simple recursion is only one recursive pattern in the advertisement,
%described as $a = /t_1/t_2/.../t_{i-1}(/t_i/.../t_j)^{+}/.../t_n $, where the $+$ sign declares that elements
%$t_i,...,t_j$ must occur one or more times in the advertisement. For the case that more than one recursive
%patterns in series in the advertisement, we describe it as $a = /t_1/$
%$t_2/.../t_{i-1}(/t_i/.../t_j)^{+}/.../t_{l-1}(/t_l/.../t_o)^{+}/ .../t_n $ (e.g., two recursive patterns in
%series). For the case that one more recursive pattern embedded in another recursive pattern in the
%advertisement, we describe it as $a = /t_1/t_2/.../t_{i-1}(/t_i/.../t_{l-1}(/t_l/.../t_o)^{+}/
%.../t_j)^{+}/.../t_n $. More types of recursive advertisements can be easily extended based on above three types
%of advertisements.
According to the format of recursive advertisement defined in the Section~\ref{advDef}, we discuss the matching
algorithm for each type of recursive advertisement, respectively.


\begin{algorithm}[t]
\caption{AbsExpr. and Simple RecAdv. Matching} \label{simRecAdv}
\begin{algorithmic}[1]
\REQUIRE{advertisement $a =
/t_1/t_2/.../t_{i-1}(/t_i/.../t_j)^{+}$ $/.../t_n $, and
subscription $s = /m_1/m_2/.../m_k$} \ENSURE{1 if $P(a) \bigcap
P(s) \neq \Phi$, 0 if $P(a) \bigcap P(s) = \Phi$}

\IF{$k \leq j$} \STATE{return AbsAdv($/t_1/t_2/.../t_j$,
$/m_1/m_2/.../m_k$)}\ELSE\STATE{\textit{temp} $\leftarrow$
AbsAdv($/t_1/t_2/.../t_j$, $/m_1/m_2/.../m_j$)}\IF{\textit{temp} =
0}\STATE{return 0}\ENDIF\ENDIF



\IF{ $k \leq n$}\STATE{$q = 0$}\ELSE\STATE{$q = Int((k-n)/(j-i+1))
+ 1$}\ENDIF \STATE{$p = q + Int((k-j-q*(j-i+1))/(j-i+1))$}

\FOR{$c = q$ to $p$}\STATE{\textit{temp} $\leftarrow$
AbsAdv($/t_{j+1}/.../t_n$, $/m_{c*(j-i+1)+j+1}/$
$m_{c*(j-i+1)+j+2}/.../m_k$)}\IF{\textit{temp} = 1}\STATE{return
1}\ENDIF\IF{$c = p$}\STATE{\textit{temp} $\leftarrow$
AbsAdv($/t_i/.../t_j$, $/m_{c*(j-i+1)+j+1}/...$ $/m_k
$)}\ELSE\STATE{\textit{temp} $\leftarrow$ AbsAdv($/t_i/.../t_j$,
$/m_{c*(j-i+1)+j+1}/...$ $/m_{(c+1)*(j-i+1)+j}$)}\ENDIF\IF{
\textit{temp} =0}\STATE{return 0}\ENDIF\ENDFOR

\STATE{return 1}
\end{algorithmic}
\end{algorithm}

We mainly focus on the matching for absolute XPE and recursive advertisements, since the matching for other
types of XPEs and recursive advertisement can be implemented based on them. Algorithm~\ref{simRecAdv}, matching
algorithm for absolute XPE and simple recursive advertisement, calls the Algorithm~\ref{absAdv} if the
subscription is not longer than the end of the recursive pattern (lines 1-2). If the subscription is longer, the
algorithm determines how many times the recursive pattern will be repeated in the advertisement according to the
length of both subscription and advertisement (lines 9-14). Next, the algorithm tries all possible
advertisements according to the times that the recursive pattern is repeated (lines 15-28). For example, given
$a = /a/*/c(/e/d)^{+}/*/c/e$ and $s = /*/a/c/*/d/e/d/*$, first, the algorithm compares $/a/*/c/e/d$ in $a$ with
$/*/a/c/*/d$ in $s$ (line 4), and computes $q = 0$ and $p = 1$ from lines 9-14. Second, it supposes that the
recursive pattern is repeated only once, compares $*/c/e$ in $a$ with $e/d/*$ in $s$ (line 16) and fails to
match. Next, it repeats the recursive part $e/d$ twice, and continues the comparisons (line 23). Finally, it
returns 1 (lines 17-18) when it finds $a$ matches $s$ with double recursive patterns in $a$. The complexity of
the algorithm is $O(n^2)$, since it actually matches the subscription against each possible advertisement
without recursive pattern.
%In practice, the recursion step of DTD is reasonable with a maximum nesting depths.
%Thus, the matching for the recursive advertisement can be reduced to the non-recursive advertisement problem.
From a practical point of view, it would be straight forward to
limit the maximum nesting depth of documents allowed to publish,
which would reduce the complexity of processing DTDs.

%\begin{figure}
%\label{recAdv1} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize advertisement $a = /t_1/t_2/.../t_{i-1}(/t_i/.../t_j)^{+}/.../t_n $, }\\
%    \hspace*{0.8cm} {\footnotesize and subscription $s = /m_1/m_2/.../m_k$}\\
%    {\footnotesize \bf Output:} {\footnotesize 1 if $P(a) \bigcap P(s) \neq \Phi$}\\
%    \hspace*{1cm} {\footnotesize 0 if $P(a) \bigcap P(s) = \Phi$}\\
%    {\footnotesize 01:} {\footnotesize {\bf If} $k \leq j$ {\bf then} return Adv1($/t_1/t_2/.../t_j$, $/m_1/m_2/.../m_k$)}\\
%    {\footnotesize 02:} {\footnotesize {\bf Else} \textit{temp} $\leftarrow$ Adv1($/t_1/t_2/.../t_j$, $/m_1/m_2/.../m_j$)}\\
%    {\footnotesize 03:} {\footnotesize {\bf If} \textit{temp} = 0 {\bf then} return
%    0}\\
%    {\footnotesize 04:} {\footnotesize {\bf If}  $k \leq n$ {\bf then} $q = 0$}\\
%    {\footnotesize 05:} {\footnotesize {\bf Else} $q = Int((k-n)/(j-i+1)) + 1$}\\
%    {\footnotesize 06:} {\footnotesize $p = q + Int((k-j-q*(j-i+1))/(j-i+1))$}\\
%    {\footnotesize 07:} {\footnotesize {\bf For} $c = q : p $ {\bf do}}\\
%    {\footnotesize 08:} \hspace*{0.4cm} {\footnotesize
%    \textit{temp} $\leftarrow$ Adv1($/t_{j+1}/.../t_n$,$/m_{c*(j-i+1)+j+1}/$}\\
%    \hspace*{2cm}  {\footnotesize $m_{c*(j-i+1)+j+2}/.../m_k$) }\\
%    {\footnotesize 09:} \hspace*{0.4cm} {\footnotesize {\bf If} \textit{temp} =1 {\bf then} return 1}\\
%    {\footnotesize 10:} \hspace*{0.4cm} {\footnotesize {\bf If} $c = p$ {\bf then} \textit{temp} $\leftarrow$ Adv1($/t_i/.../t_j$, $/m_{c*(j-i+1)+j+1}/$}\\
%    \hspace*{2cm}  {\footnotesize $.../m_k$) }\\
%    {\footnotesize 11:} \hspace*{0.4cm} {\footnotesize {\bf Else} \textit{temp} $\leftarrow$ Adv1($/t_i/.../t_j$, $/m_{c*(j-i+1)+j+1}/.../$}\\
%    \hspace*{2cm}  {\footnotesize $m_{(c+1)*(j-i+1)+j}$) }\\
%    {\footnotesize 12:} \hspace*{0.5cm}{\footnotesize {\bf If} \textit{temp} =0 {\bf then} return 0
%    }\\
%    {\footnotesize 13:}{\footnotesize  return 1}
%    }\caption{recAdv1}
%\end{figure}

\begin{algorithm}[t]
\caption{AbsExpr. and Series RecAdv. Matching} \label{serRecAdv}
\begin{algorithmic}[1]
\REQUIRE{advertisement $a =
/t_1/t_2/.../t_{i-1}(/t_i/.../t_j)^{+}$
$/.../t_{l-1}(/t_l/.../t_o)^{+}/.../t_n $, and subscription $s =
/m_1/m_2/.../m_k$} \ENSURE{1 if $P(a) \bigcap P(s) \neq \Phi$, 0
if $P(a) \bigcap P(s) = \Phi$}

\IF{$k \leq j$} \STATE{return AbsAdv($/t_1/t_2/.../t_j$,
$/m_1/m_2/.../m_k$)}\ELSE\STATE{$p = Int((k-j)/(j-i+1))+1$}\ENDIF

\FOR{$c = 0$ to $p$}\STATE{return
SimRecAdv($/t_1/t_2/.../t_{i-1}(/t_i/.../t_j)^{c+1}/$ $.../
t_{l-1}(/t_l/.../t_o)^{+}/.../t_n, /m_1/m_2/.../m_k$)}\ENDFOR


\end{algorithmic}
\end{algorithm}

Algorithm~\ref{serRecAdv} describes the matching for absolute XPE
and series-recursive advertisement (i.e., with two recursive
patterns only). The matching for the advertisement with more
recursive patterns in sequence can be easily built in this
algorithm by nested calling Algorithm~\ref{simRecAdv}.
Algorithm~\ref{serRecAdv} determines how many times the first
recursive pattern has to be repeated (lines 3-5), and, calls
Algorithm~\ref{simRecAdv} repeatedly (lines 6-8) to try all
possible advertisement formats. $(/t_i/.../t_j)^{c+1}$ in
Algorithm~\ref{serRecAdv} indicates that elements $t_i,...,t_j$
will be repeated $c+1$ times, thus,
$/t_1/t_2/.../t_{i-1}(/t_i/.../t_j)^{c+1}/$ $.../
t_{l-1}(/t_l/.../t_o)^{+}/.../t_n$ can be treated as a simple
recursive advertisement. Algorithm~\ref{embRecAdv} is similar to
Algorithm~\ref{serRecAdv}. First, it determines how many times the
outer recursive pattern has to be repeated (lines 3-5), and, calls
the matching algorithm for series-recursive advertisement (not
restricted to two recursive patterns) repeatedly (lines 6-8), as
we do in Algorithm~\ref{serRecAdv}.




%\begin{figure}
%\label{recAdv2} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize advertisement $a = /t_1/t_2/.../t_{i-1}(/t_i/.../t_j)^{+}/.../t_{l-1}$ }\\
%    \hspace*{0.8cm} {\footnotesize $(/t_l/.../t_o)^{+}/.../t_n $, and subscription $s = /m_1/m_2/.../m_k$}\\
%    {\footnotesize \bf Output:} {\footnotesize 1 if $P(a) \bigcap P(s) \neq \Phi$}\\
%    \hspace*{1cm} {\footnotesize 0 if $P(a) \bigcap P(s) = \Phi$}\\
%    {\footnotesize 01:} {\footnotesize {\bf If} $k \leq j$ {\bf then} return Adv1($/t_1/t_2/.../t_j$, $/m_1/m_2/.../m_k$)}\\
%    {\footnotesize 02:} {\footnotesize $p = Int((k-j)/(j-i+1))+1$}\\
%    {\footnotesize 03:} {\footnotesize {\bf For} $c = 0 : p $ {\bf do}}\\
%    {\footnotesize 04:} \hspace*{0.4cm} {\footnotesize
%    return recAdv1($/t_1/t_2/.../t_{i-1}(/t_i/.../t_j)^{c+1}/.../t_{l-1}$}\\
%    \hspace*{1cm} {\footnotesize $(/t_l/.../t_o)^{+}/.../t_n,
%    /m_1/m_2/.../m_k$)}
%    }\caption{recAdv2}
%\end{figure}

\begin{algorithm}
\caption{AbsExpr. and Embedded RecAdv. Matching} \label{embRecAdv}
\begin{algorithmic}[1]
\REQUIRE{advertisement $a = /t_1/t_2/.../t_{i-1}(/t_i/.../t_{l-1}$
$(/t_l/.../t_o)^{+}/.../t_j)^{+}/.../t_n $, and subscription $s =
/m_1/m_2/.../m_k$} \ENSURE{1 if $P(a) \bigcap P(s) \neq \Phi$, 0
if $P(a) \bigcap P(s) = \Phi$}

\IF{$k \leq j$} \STATE{return
SimRecAdv($/t_1/t_2/...(/t_l/.../t_o)^{+}.../t_j$, $/m_1/
m_2/.../m_k$)}\ELSE\STATE{$p = Int((k-j)/(j-i+1))+1$}\ENDIF

\FOR{$c = 0$ to $p$}\STATE{return
SerRecAdv($/t_1/t_2/.../t_{i-1}(/t_i/.../t_{l-1}(/t_l/$ $...
/t_o)^{+}/.../t_j)^{c+1}/.../t_n, /m_1/m_2/.../m_k$)}\ENDFOR


\end{algorithmic}
\end{algorithm}

%\begin{figure}
%\label{recAdv3} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize advertisement $a = /t_1/t_2/.../t_{i-1}(/t_i/.../t_{l-1}(/t_l/.../t_o)^{+}/$ }\\
%    \hspace*{0.8cm} {\footnotesize $.../t_j)^{+}/.../t_n $, and subscription $s = /m_1/m_2/.../m_k$}\\
%    {\footnotesize \bf Output:} {\footnotesize 1 if $P(a) \bigcap P(s) \neq \Phi$}\\
%    \hspace*{1cm} {\footnotesize 0 if $P(a) \bigcap P(s) = \Phi$}\\
%    {\footnotesize 01:} {\footnotesize {\bf If} $k \leq j$ {\bf then} return recAdv1($/t_1/t_2/...(/t_l/.../t_o)^{+}.../t_j$, $/m_1/$}\\
%    \hspace*{2cm} {\footnotesize $m_2/.../m_k$)}\\
%    {\footnotesize 02:} {\footnotesize $p = Int((k-j)/(j-i+1))+1$}\\
%    {\footnotesize 03:} {\footnotesize {\bf For} $c = 0 : p $ {\bf do}}\\
%    {\footnotesize 04:} \hspace*{0.4cm} {\footnotesize
%    return recAdv2($/t_1/t_2/.../t_{i-1}(/t_i/.../t_{l-1}(/t_l/.../t_o)^{+}/$}\\
%    \hspace*{1cm} {\footnotesize $.../t_j)^{c+1}/.../t_n,
%    /m_1/m_2/.../m_k$)}
%    }\caption{recAdv3}
%\end{figure}

%-------------------------------------------------------------------------
\Section{Covering and Merging}

In this section, first, we describe a data structure for
maintaining subscriptions. The data structure captures the
covering relations among subscriptions. Second, we present the
covering algorithms for absolute simple XPEs, relative simple XPEs
and descendant XPEs, respectively. Last, we explore the merging
technique, and discuss the merging rules in the context of XPEs.

\subsection{Subscription Tree}



\SubSection{Covering Algorithm}\label{covering}

In covering-based routing, if a newly arriving subscription is
covered by an existing subscription in the routing table, the new
subscription is not forwarded to the next-hop broker. One the
other hand, if a newly arriving subscription covers an existing
subscription, before it is forwarded, the broker needs to
unsubscribe all the subscriptions that are covered by the new
subscription. Therefore, the network traffic is reduced by
removing the redundant subscriptions and the routing table in the
next-hop broker is compact. Therefore, the key problem is how to
determine the relationship between two given subscriptions.

The covering relation between subscriptions is the containment
problem in the context of XPEs. It has been proven that
containment of simple path expressions can be tested in
PTIME~\cite{DBLP:conf/icdt/MiloS99}. It is studied as a part of
the problem that checking/finding a prefix replacement for a
simple query is in PTIME, in the context of semistructured
database. In this section, we extend this proof to the XPath
queries, containing wildcard, /- and //-operator, and present
covering rules and algorithms for determining covering relation
between single path XPEs. Table~\ref{subCover} shows the covering
rules used in our algorithms. For example, if $Sub_1$ contains an
element $t$, and $Sub_2$ contains a wildcard at the corresponding
position (e.g., the third row), the $Sub_1$ would not cover the
$Sub_2$ since $*$ can match any elements except for $t$.

{\bf Absolute simple XPEs:} The covering relation between two
absolute XPEs (without //-operator) is the simplest case.
Algorithm~\ref{absCov} shows whether $s_1$ covers $s_2$ or not. An
important observation is $s_1$ must be shorter than $s_2$ if $s_1$
wants to cover $s_2$ (lines 1-3). It is exploited because a
shorter XPE $s$ refers to a bigger matching set $P(s)$. Next, the
algorithm compares each pair of $t_i$ and $m_i$ in $s_1$ and
$s_2$, respectively, according to the covering rules listed in
Table~\ref{subCover}.

\begin{table}[htbp]\centering
\begin{tabular}{|c|c|c|}
\hline
{\rule[-1mm]{0mm}{4mm}}{\footnotesize{\bf $Sub_1$}} & {\footnotesize{\bf $Sub_2$}} & {\footnotesize{\bf $Sub_1$ Covers $Sub_2$}}\\
\hline
    {\footnotesize $*$} &{\footnotesize $*$} & {\footnotesize $\surd$}\\
\hline
    {\footnotesize $*$} &{\footnotesize $t$} & {\footnotesize $\surd$}\\
\hline
    {\footnotesize $t$} &{\footnotesize $*$} & {\footnotesize $\times$}\\
\hline
    {\footnotesize $t$} &{\footnotesize $t$} & {\footnotesize $\surd$}\\
\hline
 {\footnotesize $t_1$ } &{\footnotesize $t_2$} & {\footnotesize $\times$}\\
\hline
\end{tabular}\caption{Covering Rules for Subs}\label{subCover}
\end{table}

\begin{algorithm}[t]
\caption{Absolute Simple XPEs Covering} \label{absCov}
\begin{algorithmic}[1]
\REQUIRE{two subscriptions $s_1 = /t_1/t_2/.../t_n $ and $s_2 =
/m_1/m_2/.../m_k $} \ENSURE{1 if $s_1$ covers $s_2$, 0 if $s_1$
does not cover $s_2$}

\IF {$k < n$} \STATE {return 0} \ENDIF

\FOR{$i = 1$ to $n$} \IF{($t_i \neq m_i$ and $t_i \neq \ast $ and
$m_i \neq \ast$) or ($t_i \neq \ast $ and
$m_i = \ast$)} \STATE{return 0} \ENDIF \ENDFOR

%\IF{$t_i \neq \ast $ and
%$m_i = \ast$}\STATE{return 0}\ENDIF \ENDFOR

\STATE {return 1}
\end{algorithmic}
\end{algorithm}

%\begin{figure}[htbp]
%\label{cov1} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize two subscriptions $s_1 = /t_1/t_2/.../t_n $ and $s_2 = /m_1/m_2/...$ }\\
%    \hspace*{0.8cm} {\footnotesize $/m_k $ }\\
%    {\footnotesize \bf Output:} {\footnotesize 1 if $s_1$ covers $s_2$}\\
%    \hspace*{1cm} {\footnotesize 0 if $s_1$ does not cover $s_2$}\\
%    {\footnotesize 1:} {\footnotesize {\bf If} $k < n$ {\bf then} return 0} \\
%    {\footnotesize 2:} {\footnotesize {\bf For} $i = 1 : n$ {\bf do}}\\
%    {\footnotesize 3:} \hspace*{0.4cm} {\footnotesize {\bf If} $t_i \neq m_i$ and $t_i \neq \ast $ and $m_i \neq \ast
%    $ {\bf then} return 0}\\
%    {\footnotesize 4:} \hspace*{0.4cm} {\footnotesize {\bf Else if} $t_i \neq \ast $ and $m_i = \ast
%    $ {\bf then} return 0}\\
%    {\footnotesize 5:} {\footnotesize return 1}
%    }\caption{Cov1}
%\end{figure}

{\bf Relative simple XPEs:} Algorithm ~\ref{relCov} shows the
covering algorithm for relative simple XPEs. This algorithm is
similar to Algorithm~\ref{relAdv}. It calls Algorithm~\ref{absCov}
repeatedly (lines 4-8) to find if $s_1$ is inside subscription
$s_2$ or not. Our covering algorithm does not include the case
that $s_1$ is absolute and $s_2$ is relative, since an absolute
XPE $s_1$ definitely refers to a smaller matching set $P(s_1)$,
and thus, $s_1$ will not cover $s_2$.

It is important to note that the covering algorithm for relative
simple XPEs is also a string matching problem, as we analyze in
Algorithm~\ref{relAdv}. The covering algorithm uses different
rules from Algorithm~\ref{relAdv}, however, a similar optimization
can be applied to reduce the complexity from $O(k*n)$ to $O(k)$.

\begin{algorithm}
\caption{Relative Simple XPEs Covering} \label{relCov}
\begin{algorithmic}[1]
\REQUIRE{two subscriptions $s_1 = t_1/t_2/.../t_n $, $s_2 =
/m_1/m_2/.../m_k $ or $s_2 = m_1/m_2/.../m_k $} \ENSURE{\textit
{matchPos} if $s_1$ covers $s_2$, 0 if $s_1$ does not cover $s_2$}

\IF {$k < n$} \STATE {return 0} \ENDIF

\FOR{$i = 1$ to $k-n+1$}
\IF{AbsCov($/t_1/t_2/.../t_n,/m_i/m_{i+1}/.../m_k$) = 1}
\STATE{return \textit {matchPos = $i$}} \ENDIF \ENDFOR

\STATE {return 0}
\end{algorithmic}
\end{algorithm}

%\begin{figure}[htbp]
%\label{cov2} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize two subscriptions $s_1 = t_1/t_2/.../t_n $, $s_2 = /m_1/m_2/.../m_k $ }\\
%    \hspace*{0.8cm} {\footnotesize or $s_2 = m_1/m_2/.../m_k $ }\\
%    {\footnotesize \bf Output:} {\footnotesize \textit {matchPos} if $s_1$ covers $s_2$}\\
%    \hspace*{1cm} {\footnotesize 0 if $s_1$ does not cover $s_2$}\\
%    {\footnotesize 1:} {\footnotesize {\bf If} $k < n$ {\bf then} return 0} \\
%    {\footnotesize 2:} {\footnotesize {\bf For} $i = 1 : k-n+1$ {\bf do}}\\
%    {\footnotesize 3:} \hspace*{0.4cm} {\footnotesize {\bf If} Cov1($/t_1/t_2/.../t_n,/m_i/m_{i+1}/.../m_k$) = 1 {\bf then} return }\\
%    \hspace*{1cm} {\footnotesize $\textit {matchPos} = i$}\\
%    {\footnotesize 4:} {\footnotesize return 0}
%    }\caption{Cov2}
%\end{figure}

%\begin{algorithm}
%\caption{nextCov} \label{nextCov}
%\begin{algorithmic}[1]
%\REQUIRE{subscription $s = t_1/t_2/.../t_n $}
%\ENSURE{\textit{next} for $s$ (an array with size $n$)}
%
%\STATE{\textit{next}$(1)$ $\leftarrow$ -1}
%\STATE{\textit{next}$(2)$ $\leftarrow$ 0}
%
%\FOR{$i = 3$ to $n$} \STATE{$j \leftarrow \textit{next}$
%$(i-1)+1$} \WHILE{($t_{i-1} \neq t_j$ and $t_{i-1} \neq \ast$ and
%$t_j \neq \ast$ and $j > 0$) or ($t_{i-1} = \ast$ and $t_j \neq
%\ast$ and $j > 0$)} \STATE{$j \leftarrow \textit{next}(j)+1$}
%\ENDWHILE \STATE{\textit{next}$(i)$ $\leftarrow j$} \ENDFOR
%
%\STATE {return \textit{next}}
%\end{algorithmic}
%\end{algorithm}


%\begin{algorithm}
%\caption{Optimization for Relative XPEs Covering} \label{optRelCov}
%\begin{algorithmic}[1]
%\REQUIRE {two subscriptions $s_1 = t_1/t_2/.../t_n $, $s_2 =
%/m_1/m_2/.../m_k $ or $s_2 = m_1/m_2/.../m_k$}
%
%\ENSURE {\textit {matchPos} if $s_1$ covers $s_2$, 0 if $s_1$ does
%not cover $s_2$}
%
%\IF {$k < n$} \STATE {return 0} \ENDIF
%
%\STATE {$j \leftarrow 1$, $i \leftarrow 1$, \textit {matchPos}
%$\leftarrow 0$}
%
%\WHILE{\textit {matchPos} = 0 and $i \leq k$} \IF{($t_j = \ast $)
%or ($t_j = m_i$ and $t_j \neq \ast$ and $m_i \neq \ast$)}
%\STATE{$j \leftarrow j + 1$, $i \leftarrow i + 1$} \ELSE \STATE{$j
%\leftarrow \textit{next}(j) + 1$,}\IF{$j = 0$}\STATE{$j \leftarrow
%1$, $i \leftarrow i + 1$}\ENDIF \ENDIF \IF{$j = n +
%1$}\STATE{\textit {matchPos} $\leftarrow i - n$}\ENDIF\ENDWHILE
%
%\STATE{return \textit {matchPos}}
%\end{algorithmic}
%\end{algorithm}

%\begin{figure}
%\label{optcov2} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize two subscriptions $s_1 = t_1/t_2/.../t_n $,  $s_2 = /m_1/m_2/.../m_k $ }\\
%    \hspace*{0.8cm} {\footnotesize or $s_2 = m_1/m_2/.../m_k $}\\
%    {\footnotesize \bf Output:} {\footnotesize \textit {matchPos} if $s_1$ covers $s_2$}\\
%    \hspace*{1cm} {\footnotesize $0$ if $s_1$ does not cover $s_2$}\\
%    {\footnotesize 1:} {\footnotesize {\bf If} $k < n$ {\bf then} return $0$} \\
%    {\footnotesize 2:} {\footnotesize $j \leftarrow 1$, $i \leftarrow 1$, \textit {matchPos} $\leftarrow 0$}\\
%    {\footnotesize 3:} {\footnotesize {\bf While} \textit {matchPos} = 0 and $i \leq k$ {\bf do}}\\
%    {\footnotesize 4:} \hspace*{0.7cm} {\footnotesize {\bf If} ($t_j = \ast $) or ($t_j = m_i$ and $t_j \neq \ast$ and $m_i \neq \ast$) {\bf then}}\\
%    \hspace*{1.35cm} {\footnotesize $j \leftarrow j+1$, $i \leftarrow i+1$}\\
%    {\footnotesize 5:} \hspace*{0.7cm} {\footnotesize {\bf Else} $j \leftarrow \textit{next}$ $(j)+1$, }\\
%    {\footnotesize 6:} \hspace*{1.3cm} {\footnotesize {\bf If} $j = 0$ {\bf then} $j \leftarrow 1$, $i \leftarrow i+1$}\\
%    {\footnotesize 7:} \hspace*{0.7cm} {\footnotesize {\bf If} $j = n+1$ {\bf then} \textit {matchPos} $\leftarrow i-n$}\\
%    {\footnotesize 8:} {\footnotesize return \textit {matchPos}}
%    }\caption{optCov2}
%\end{figure}


%\begin{figure}
%\label{nextCov} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize subscription $s = t_1/t_2/.../t_n $}\\
%    {\footnotesize \bf Output:} {\footnotesize \textit{next} for $s$ (an array of size $n$)}\\
%    {\footnotesize 1:} {\footnotesize \textit{next}(1) $\leftarrow$ -1} \\
%    {\footnotesize 2:} {\footnotesize \textit{next}(2) $\leftarrow$ 0}\\
%    {\footnotesize 3:} {\footnotesize {\bf For} $i = 3 : n$ {\bf do}}\\
%    {\footnotesize 4:} \hspace*{0.4cm} {\footnotesize $j \leftarrow \textit{next}$ $(i-1)+1$}\\
%    {\footnotesize 5:} \hspace*{0.4cm} {\footnotesize {\bf While} ($t_{i-1} \neq t_j$ and $t_{i-1} \neq \ast$ and $t_j \neq \ast$ and $j > 0$)}\\
%    \hspace*{1.55cm} {\footnotesize or ($t_{i-1} = \ast$ and $t_j \neq \ast$ and $j > 0$) {\bf do}}\\
%     {\footnotesize 6:}\hspace*{1.3cm} {\footnotesize $j \leftarrow \textit{next}$ $(j)+1$}\\
%    {\footnotesize 7:}\hspace*{0.6cm}  {\footnotesize \textit{next}$(i)$ $\leftarrow
%    j$}\\
%    {\footnotesize 8:} {\footnotesize return \textit{next}}
%    }\caption{nextCov}
%\end{figure}

{\bf Descendant operators in XPEs:} Algorithm~\ref{desCov} shows
the covering algorithm for descendant XPEs. It splits the XPE into
sub-XPEs without //-operator, and, matches each sub-XPE in $s_1$
against sub-XPEs in $s_2$ with sequence comparisons. Lines 1-3
guarantee that $s_2$ is longer than $s_1$. Next, it matches the
first sub-XPE in $s_1$ against $s_2$ according to different types
of $s_1$ and $s_2$ (lines 4-5, 12-13, 17-18). The algorithm moves
to the next sub-XPE in $s_1$ if it finds a match, and moves to the
next sub-XPE in $s_2$ if it does not find a match (lines 22-25 and
33-36). \textit{$temp$} is used to compute the next available
matching position in $s_2$, as we do in the Algorithm~\ref{desAdv}
for subscription and advertisement. Finally, it returns 1 or 0
when it reaches the end of $s_1$ or $s_2$, respectively (lines
42-44 and 30-32). For example, given $s_1 = /*/a//*/c$ and $s_2 =
/a/a/*//c/e/c/d$, first, the algorithm compares the sub-XPE $/*/a$
in $s_1$ with the sub-XPE $/a/a/*$ in $s_2$ (lines 4-5). Second,
it moves to the next sub-XPE $*/c$ in $s_1$ and compares it with
$*$ in $s_2$ (lines 34-36). Next, it compares $*/c$ in $s_1$ with
the next sub-XPE $c/e/c/d$ in $s_2$ (lines 23-25), and finally, it
returns 1 from lines 42-43 since the end of $s_1$ is reached and a
match is found. Generally speaking, a sub-XPE in $s_1$ could not
match a part of $s_2$ that includes a //-operator. For instance,
given $s_1 = /*/a//*/c$ and $s_2 = /a/a/*//c/b/d$, the sub-XPE
$*/c$ in $s_1$ does not cover $*//c$ in $s_2$ since $*/c$ refers
to a smaller matching set. However, there is a special case that
the sub-XPE $s_{1i}$ in $s_1$ could cover a part of $s_2$ that
includes a //-operator if $s_{1i}$ ended with a wildcard and the
matched part in $s_2$ ended with $//t$, where $t$ can be either a
wildcard or an element (lines 6-7, 14-15, 26-27 and 37-38). For
example, given $s_1 = /a/*//*/d$ and $s_2 = /a//b/c/d$, first, the
algorithm compares $/a/*$ in $s_1$ with $/a//b$ in $s_2$ (lines
6-7), where $flag = 1$ is used to record the current sub-XPE in
$s_1$ matches a part of $s_2$ with //-operator. Second, it moves
to the next sub-XPE $*/d$ in $s_1$ (lines 34 and 40-41) and
compares it with $c/d$ in $s_2$ (lines 24-25). Finally, it returns
1 since a match is found (lines 42-43).

It is important to note that the covering detection between
non-recursive advertisements is the same with the covering
detection for subscriptions, since the non-recursive advertisement
has the same format with an absolute simple subscription.

%An important observation is a sub-XPE of $s_1$ must not match an
%XPE with //-operator, since descendant operator indicates that

%\begin{algorithm}[htbp]
%\caption{Descendant XPEs Covering} \label{desCov}
%\begin{algorithmic}[1]
%{\footnotesize \REQUIRE{ two subscriptions $s_1 = /t_{11}/.../t_{1n_1}//t_{21}/.../t_{2n_2}//...$
%$//t_{q1}/.../t_{qn_q} $ or $s_1 = t_{11}/.../t_{1n_1}//t_{21}/.../t_{2n_2}//...//t_{q1}$ $/.../t_{qn_q} $, and
%$s_2 = /m_{11}/.../m_{1k_1}//m_{21}/.../m_{2k_2}//...//$ $m_{p1}/.../m_{pk_p}$ or $s_2 =
%m_{11}/.../m_{1k_1}//m_{21}/.../m_{2k_2}//...//$ $m_{p1}/.../m_{pk_p}$} \ENSURE{ 1 if $s_1$ covers $s_2$, 0 if
%$s_1$ does not cover $s_2$}
%
%\IF{$n_1 + ... + n_q
%> k_1 + ... + k_p$} \STATE{return 0}\ENDIF
%\IF{ both $s_1$ and $s_2$ are absolute}\STATE{\textit{$t$} $\leftarrow $ AbsCov($/t_{11}/.../t_{1n_1},
%/m_{11}/.../m_{1k_1}$), $f \leftarrow$ 0}\IF{$t = 0$ and $n_1$ = $k_1$ + 1 and $t_{1n_1} = \ast$ and $p \geq
%2$}\STATE{\textit{$t$} $\leftarrow $ AbsCov($/t_{11}/.../t_{1(n_1-1)}, /m_{11}/.../m_{1k_1}$), $f$ $\leftarrow$
%1}\ENDIF \IF{\textit{t} = 0}\STATE{return 0}\ENDIF\ELSIF{$s_1$ is relative}\STATE{\textit{t} $\leftarrow$
%RelCov($t_{11}/.../t_{1n_1}, m_{11}/.../m_{1k_1}$), $f \leftarrow 0$}\IF{$t$ = 0 and $k_1 \geq n_1$ and
%$t_{1n_1} = \ast$ and $p \geq 2$}\STATE{\textit{t} $\leftarrow$ RelCov($t_{11}/.../t_{1(n_1-1)},
%m_{1(k_1-n_1+2)}/.../m_{1k_1}$), $f \leftarrow 1$}\ENDIF\ELSE\STATE{return 0}\ENDIF
%
%\STATE{$i \leftarrow 1$, $j \leftarrow 1$, \textit{$t_m$} $\leftarrow $ 0, $t_1 \leftarrow 0$}
%
%\WHILE{$i \leq q$ and $j \leq p$}\IF{\textit{t} = 0}\STATE{$j \leftarrow j+1$, \textit{$t_m$} $\leftarrow
%0$}\IF{$j \neq p + 1$}\STATE{\textit{t} $\leftarrow$ RelCov($t_{i1}/.../t_{in_i},m_{j(1 + t_1)}/.../m_{jk_j}) $,
%$f \leftarrow 0$}\IF{$t$ = 0 and $k_j \geq n_i + t_1$ and $t_{in_i} = \ast$ and $j < p$}\STATE{\textit{t}
%$\leftarrow$ RelCov($t_{i1}/.../t_{i(n_i-1)},m_{j(k_j-n_i+2)}/.../m_{jk_j}) $, $f \leftarrow
%1$}\ENDIF\STATE{$t_1 \leftarrow 0$}\ELSE\STATE{return 0}\ENDIF
%
%\ELSE\STATE{\textit{$t_m$} $\leftarrow$ \textit{t} + \textit{$t_m$} + $n_i$ - 1, $i \leftarrow i+1$,}\IF{$i \neq
%q + 1$ and $f = 0$}\STATE{\textit{t} $\leftarrow$ RelCov($t_{i1}/.../t_{in_i}, m_{j(\textit{$t_m$} +
%1)}/.../m_{jk_j})$, $f \leftarrow 0$}\IF{$t = 0$ and $k_j \geq n_i + t_m$ and $t_{in_i} = \ast$ and $j <
%p$}\STATE{\textit{t} $\leftarrow$ RelCov($t_{i1}/.../t_{i(n_i-1)}, m_{j(k_j-n_i+2)}/.../m_{jk_j}) $, $f
%\leftarrow 1$}\ENDIF\ELSIF{$i \neq q + 1$ and $f = 1$}\STATE{$t \leftarrow 0$, $t_1 = 1$}\ELSE\STATE{return
%1}\ENDIF\ENDIF\ENDWHILE}
%
%\end{algorithmic}
%\end{algorithm}

\begin{algorithm}[htbp]
\caption{Descendant XPEs Covering} \label{desCov}
\begin{algorithmic}[1]
{\footnotesize \REQUIRE{two subscriptions $s_1 = /t_{11}/t_{12}/.../t_{1n_1}//t_{21}/t_{22}/...$
$/t_{2n_2}//...//t_{q1}/t_{q2}/.../t_{qn_q} $ or $s_1 = t_{11}/t_{12}/.../t_{1n_1}//t_{21}$
$/t_{22}/.../t_{2n_2}//...//t_{q1}/t_{q2}/.../t_{qn_q} $, and $s_2 = /m_{11}/m_{12}/$
$.../m_{1k_1}//m_{21}/m_{22}/.../m_{2k_2}//...//m_{p1}/m_{p2}/.../m_{pk_p}$ or $s_2 =
m_{11}/m_{12}/.../m_{1k_1}//m_{21}/m_{22}/.../m_{2k_2}//...//m_{p1}/$ $m_{p2}/.../m_{pk_p}$} \ENSURE{1 if $s_1$
covers $s_2$, 0 if $s_1$ does not cover $s_2$}

\IF{$n_1 + n_2 + ... + n_q
> k_1 + k_2 + ... + k_p$} \STATE{return 0}\ENDIF
\IF{ both $s_1$ and $s_2$ are absolute}\STATE{\textit{$temp$} $\leftarrow $ AbsCov($/t_{11}/t_{12}/.../t_{1n_1},
/m_{11}/m_{12}/...$ $/m_{1k_1}$), $flag \leftarrow$ 0}\IF{$temp = 0$ and $n_1$ = $k_1$ + 1 and $t_{1n_1} = \ast$
and $p \geq 2$}\STATE{\textit{$temp$} $\leftarrow $ AbsCov($/t_{11}/t_{12}/.../t_{1(n_1-1)},
/m_{11}/m_{12}/.../$ $m_{1k_1}$), $flag$ $\leftarrow$ 1}\ENDIF \IF{\textit{temp} = 0}\STATE{return
0}\ENDIF\ELSIF{$s_1$ is relative}\STATE{\textit{temp} $\leftarrow$ RelCov($t_{11}/t_{12}/.../t_{1n_1},
m_{11}/m_{12}/.../m_{1k_1}$), $flag \leftarrow 0$}\IF{$temp$ = 0 and $k_1 \geq n_1$ and $t_{1n_1} = \ast$ and $p
\geq 2$}\STATE{\textit{temp} $\leftarrow$ RelCov($t_{11}/t_{12}/.../t_{1(n_1-1)}, m_{1(k_1-n_1+2)}/$
$m_{1(k_1-n_1+3)}/.../m_{1k_1}$), $flag \leftarrow 1$}\ENDIF\ELSE\STATE{return 0}\ENDIF

\STATE{$i \leftarrow 1$, $j \leftarrow 1$, \textit{$temp_m$}
$\leftarrow $ 0, $temp_1 \leftarrow 0$}

\WHILE{$i \leq q$ and $j \leq p$}\IF{\textit{temp} = 0}\STATE{$j \leftarrow j+1$, \textit{$temp_m$} $\leftarrow
0$}\IF{$j \neq p + 1$}\STATE{\textit{temp} $\leftarrow$ RelCov($t_{i1}/t_{i2}/.../t_{in_i},m_{j(1 + temp_1)}/$
$m_{j(2+temp_1)}/.../m_{jk_j}) $, $flag \leftarrow 0$}\IF{$temp$ = 0 and $k_j \geq n_i + temp_1$ and $t_{in_i} =
\ast$ and $j < p$}\STATE{\textit{temp} $\leftarrow$ RelCov($t_{i1}/t_{i2}/.../t_{i(n_i-1)}, m_{j(k_j-n_i+2)}/$
$m_{j(k_j-n_i+3)}/.../m_{jk_j}) $, $flag \leftarrow 1$}\ENDIF\STATE{$temp_1 \leftarrow 0$}\ELSE\STATE{return
0}\ENDIF

\ELSE\STATE{\textit{$temp_m$} $\leftarrow$ \textit{temp} + \textit{$temp_m$} + $n_i$ - 1, $i \leftarrow
i+1$,}\IF{$i \neq q + 1$ and $flag = 0$}\STATE{\textit{temp} $\leftarrow$ RelCov($t_{i1}/t_{i2}/.../t_{in_i},
m_{j(\textit{$temp_m$} + 1)}$ $/m_{j(\textit{$temp_m$} + 2)}/.../m_{jk_j})$, $flag \leftarrow 0$}\IF{$temp = 0$
and $k_j \geq n_i + temp_m$ and $t_{in_i} = \ast$ and $j < p$}\STATE{\textit{temp} $\leftarrow$
RelCov($t_{i1}/t_{i2}/.../t_{i(n_i-1)}, m_{j(k_j-n_i+2)}/$ $m_{j(k_j-n_i+3)}/.../m_{jk_j}) $, $flag \leftarrow
1$}\ENDIF\ELSIF{$i \neq q + 1$ and $flag = 1$}\STATE{$temp \leftarrow 0$, $temp_1 = 1$}\ELSE\STATE{return
1}\ENDIF\ENDIF\ENDWHILE}

\end{algorithmic}
\end{algorithm}

%\begin{figure}[htbp]
%\label{cov3} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize two subscriptions $s_1 = /t_{11}/t_{12}/.../t_{1n_1}//t_{21}/t_{22}/...$$/t_{2n_2}$}\\
%    \hspace*{0.8cm} {\footnotesize $//...//t_{q1}/t_{q2}/.../t_{qn_q} $ and $s_2 = /m_{11}/m_{12}/.../m_{1k_1}//$}\\
%    \hspace*{0.8cm} {\footnotesize $m_{21}/m_{22}/.../m_{2k_2}//...//m_{p1}/m_{p2}/.../m_{pk_p}$}\\
%    {\footnotesize \bf Output:} {\footnotesize 1 if $s_1$ covers $s_2$}\\
%    \hspace*{1cm} {\footnotesize 0 if $s_1$ does not cover $s_2$}\\
%    {\footnotesize 01:} {\footnotesize {\bf If} $n_1 + n_2 + ... + n_q > k_1 + k_2 + ... + k_p$ {\bf then} return 0} \\
%    {\footnotesize 02:} {\footnotesize {\bf If} Cov1($/t_{11}/t_{12}/.../t_{1n_1}, /m_{11}/m_{12}/.../m_{1k_1}$) = 0 {\bf then} return 0}\\
%    {\footnotesize 03:} {\footnotesize {\bf Else} \textit{$temp_m$} $\leftarrow n_1$, \textit{temp} $\leftarrow$ Cov2($t_{21}/t_{22}/.../t_{2n_2}, $}\\
%    \hspace*{2cm} {\footnotesize $m_{1(\textit{$temp_m$}+1)}/m_{1(\textit{$temp_m$}+2)}/.../m_{1k_1}$)}\\
%    {\footnotesize 04:} {\footnotesize $i \leftarrow 2$, $j \leftarrow 1$ }\\
%    {\footnotesize 05:} {\footnotesize {\bf While} $i \leq q$ and $j \leq p$ {\bf do}}\\
%    {\footnotesize 06:}  \hspace*{0.7cm} {\footnotesize {\bf If} \textit{temp} = 0 {\bf then} $j \leftarrow j+1$, \textit{$temp_m$} $\leftarrow 0$,}\\
%    {\footnotesize 07:} \hspace*{1cm} {\footnotesize {\bf If} $j \neq p + 1$ {\bf then} \textit{temp} $\leftarrow$ Cov2($t_{i1}/t_{i2}/.../t_{in_i}, $}\\
%    \hspace*{1.8cm} {\footnotesize $m_{j1}/m_{j2}/.../m_{jk_j})$}\\
%    {\footnotesize 08:} \hspace*{1cm} {\footnotesize {\bf Else} return 0}\\
%    {\footnotesize 09:} \hspace*{0.7cm}  {\footnotesize {\bf Else} \textit{$temp_m$} $\leftarrow$ \textit{temp} + \textit{$temp_m$} + $n_i$ - 1, $i \leftarrow i+1$,} \\
%    {\footnotesize 10:} \hspace*{1.3cm} {\footnotesize  {\bf If} $i \neq q + 1$ {\bf then} \textit{temp} $\leftarrow$ Cov2($t_{i1}/t_{i2}/.../t_{in_i}, $ }\\
%    \hspace*{2cm} {\footnotesize $m_{j(\textit{$temp_m$} + 1)}/m_{j(\textit{$temp_m$} + 2)}/.../m_{jk_j})$}\\
%    {\footnotesize 11:}  \hspace*{1.3cm} {\footnotesize {\bf Else} return 1}
%    }\caption{Cov3}
%\end{figure}

%\begin{algorithm}[htbp]
%\caption{Relative Descendant XPEs Covering} \label{relDesCov}
%\begin{algorithmic}[1]
%\REQUIRE{two subscriptions $s_1 =
%t_{11}/t_{12}/.../t_{1n_1}//t_{21}$
%$/t_{22}/.../t_{2n_2}//...//t_{q1}/t_{q2}/.../t_{qn_q} $ and $s_2
%= /m_{11}$
%$/m_{12}/.../m_{1k_1}//m_{21}/m_{22}/.../m_{2k_2}//...//m_{p1}/m_{p2}$
%$/.../m_{pk_p}$ or $s_2 =
%m_{11}/m_{12}/.../m_{1k_1}//m_{21}/m_{22}$
%$/.../m_{2k_2}//...$$//m_{p1}/m_{p2}/.../m_{pk_p}$} \ENSURE{1 if
%$s_1$ covers $s_2$, 0 if $s_1$ does not cover $s_2$}
%
%\IF{$n_1 + n_2 + ... + n_q > k_1 + k_2 + ... + k_p$} \STATE{return
%0}\ELSE\STATE{\textit{temp} $\leftarrow$
%RelCov($t_{11}/t_{12}/.../t_{1n_1}, m_{11}/m_{12}/.../m_{1k_1}$), $i
%\leftarrow 1$, $j \leftarrow 1$, \textit{$temp_m$} $\leftarrow
%0$}\ENDIF
%
%\WHILE{$i \leq q$ and $j \leq p$}\IF{\textit{temp} = 0}\STATE{$j
%\leftarrow j+1$, \textit{$temp_m$} $\leftarrow 0$,}\IF{$j \neq p +
%1$}\STATE{\textit{temp} $\leftarrow$
%RelCov($t_{i1}/t_{i2}/.../t_{in_i},m_{j1}/m_{j2}/.../m_{jk_j})
%$}\ELSE\STATE{return 0}\ENDIF\ELSE\STATE{\textit{$temp_m$}
%$\leftarrow$ \textit{temp} + \textit{$temp_m$} + $n_i$ - 1, $i
%\leftarrow i+1$,}\IF{$i \neq q + 1$}\STATE{\textit{temp}
%$\leftarrow$ RelCov($t_{i1}/t_{i2}/.../t_{in_i},
%m_{j(\textit{$temp_m$} + 1)}/$ $m_{j(\textit{$temp_m$} +
%2)}/.../m_{jk_j})$}\ELSE\STATE{return 1}\ENDIF\ENDIF\ENDWHILE
%
%\end{algorithmic}
%\end{algorithm}

%\begin{figure}[htbp]
%\label{cov4} \setlength{\fboxsep}{0.5cm}
%    \centering
%    \thicklines
%    \shortstack[l]{\\
%    {\footnotesize \bf Input:} {\footnotesize two subscriptions $s_1 = t_{11}/t_{12}/.../t_{1n_1}//t_{21}/t_{22}/.../t_{2n_2}//$}\\
%    \hspace*{0.8cm} {\footnotesize $...//t_{q1}/t_{q2}/.../t_{qn_q} $ and $s_2 = /m_{11}/m_{12}/...$$/m_{1k_1}//m_{21}/$ }\\
%    \hspace*{0.8cm} {\footnotesize $m_{22}/.../m_{2k_2}//...//m_{p1}/m_{p2}/.../m_{pk_p}$ or $s_2 = m_{11}/$}\\
%    \hspace*{0.8cm} {\footnotesize $m_{12}/.../m_{1k_1}//m_{21}/m_{22}/.../m_{2k_2}//...$$//m_{p1}/m_{p2}/.../m_{pk_p}$}\\
%    {\footnotesize \bf Output:} {\footnotesize 1 if $s_1$ covers $s_2$}\\
%    \hspace*{1cm} {\footnotesize 0 if $s_1$ dose not cover $s_2$}\\
%    {\footnotesize 1:} {\footnotesize {\bf If} $n_1 + n_2 + ... + n_q > k_1 + k_2 + ... + k_p$ {\bf then} return 0} \\
%    {\footnotesize 2:} {\footnotesize {\bf Else} \textit{temp} $\leftarrow$ Cov2($t_{11}/t_{12}/.../t_{1n_1}, m_{11}/m_{12}/.../m_{1k_1}$), }\\
%    \hspace*{0.9cm} {\footnotesize $i \leftarrow 1$, $j \leftarrow 1$, \textit{$temp_m$} $\leftarrow 0$}\\
%    {\footnotesize 3:} {\footnotesize {\bf While} $i \leq q$ and $j \leq p$ {\bf do}}\\
%    {\footnotesize 4:}  \hspace*{0.7cm} {\footnotesize {\bf If} \textit{temp} = 0 {\bf then} $j \leftarrow j+1$, \textit{$temp_m$} $\leftarrow 0$,}\\
%    {\footnotesize 5:} \hspace*{1cm} {\footnotesize {\bf If} $j \neq p + 1$ {\bf then} \textit{temp} $\leftarrow$ Cov2($t_{i1}/t_{i2}/.../t_{in_i}, m_{j1}/m_{j2}$}\\
%    \hspace*{1.8cm} {\footnotesize $/.../m_{jk_j}$)}\\
%    {\footnotesize 6:} \hspace*{1cm} {\footnotesize {\bf Else} return 0}\\
%    {\footnotesize 7:} \hspace*{0.7cm}  {\footnotesize {\bf Else} \textit{$temp_m$} $\leftarrow$ \textit{temp} + \textit{$temp_m$} + $n_i$ - 1, $i \leftarrow i+1$,} \\
%    {\footnotesize 8:} \hspace*{1.3cm} {\footnotesize  {\bf If} $i \neq q + 1$ {\bf then} \textit{temp} $\leftarrow$ Cov2($t_{i1}/t_{i2}/.../t_{in_i},$}\\
%    \hspace*{2cm} {\footnotesize $m_{j(\textit{$temp_m$} + 1)}/m_{j(\textit{$temp_m$} + 2)}/.../m_{jk_j}$)}\\
%    {\footnotesize 9:}  \hspace*{1.3cm} {\footnotesize {\bf Else} return 1}
%    }\caption{Cov4}
%\end{figure}

\SubSection{Merging}

If there is no covering relation among a set of subscriptions,
subscriptions can be merged into a new subscription to create a
more concise routing table. In this section, we exploit the
merging rules for XPath queries.

%The merging technique~\cite{} is used for further minimizing the
%routing table size and the reducing network traffic in a
%content-based contexts. It further extends the concept of
%covering. It is applied if two subscriptions $s_1$ and $s_2$ do
%not cover each other but have largely overlap. They can be merged
%into a new subscription $s$, where $P(s) \supseteq P(s_1) \bigcup
%P(s_2)$. If $=$ holds, the merging is perfect merging, otherwise,
%it is an imperfect merging that will introduce false
%positive~\cite{}. More general case is if there is no covering
%relation among a set of subscriptions, they can be merged into a
%new subscription to create more concise routing table. In this
%section, we discuss what should be merged and how to do the
%merging in the contexts of XPath data. The interested reader is
%referred to ~\cite{} for discussion of when to do the merging.

The subscriptions can be merged if they have only one difference
(e.g., different elements). For instance, two subscriptions $s_1 =
a/*/c/d$ and $s_2 = a/*/c/e$ can be merged into $s = a/*/c/*$.
Note that if they differ in one operator, they should be in a
covering relation to other. The general form of this rule is:
\begin{itemize}
\item $s_1 = o_1$ $t_1$... $o_i$ $t_i$ $o_{i+1}$ $m$ $o_{i+2}$
$t_{i+1}$...$o_{n+1}$ $t_n$ \item $s_2 = o_1$ $t_1$... $o_i$ $t_i$
$o_{i+1}$ $k$ $o_{i+2}$ $t_{i+1}$...$o_{n+1}$ $t_n$
\end{itemize}
are merged into
\begin{itemize}
\item $s = o_1$ $t_1$... $o_i$ $t_i$ $o_{i+1}$ $*$ $o_{i+2}$
$t_{i+1}$...$o_{n+1}$ $t_n$
\end{itemize}
where $m$ and $k$ are different elements, $o_i$ is either a
/-operator or a //-operator, and, $t_i$ is a wildcard or an
element.

The merging rules can also be applied, if the subscriptions have
two differences (e.g., different operators or different elements).
For example, two subscriptions $s_1 = /a/c/*/*$ and $s_2 =
/a//c/*/c$ that do not cover each other can be merged into $s =
/a//c/*/*$. That is, different operators are merged into
//-operator, and different elements are merged into $*$. We
represent the general form of this rule as:
\begin{itemize}
\item $s_1 = o_1$ $t_1$... $o_i$ $t_i$ $o_{i+1}$ $m$... $o_{j+1}$
$t_j$ $/$ $t_{j+1}$...$o_{n+2}$ $t_n$ \item $s_1 = o_1$ $t_1$...
$o_i$ $t_i$ $o_{i+1}$ $k$... $o_{j+1}$ $t_j$ $//$
$t_{j+1}$...$o_{n+2}$ $t_n$
\end{itemize}
are merged into
\begin{itemize}
\item $s = o_1$ $t_1$... $o_i$ $t_i$ $o_{i+1}$ $*$... $o_{j+1}$
$t_j$ $//$ $t_{j+1}$...$o_{n+2}$ $t_n$
\end{itemize}
where $m$, $k$ and $t_i$ is a wildcard or an element, and $o_i$ is
a /-operator or a //-operator.

More general rules can be applied, for example, two subscriptions
which have a common part can be merged by simply replacing the
different parts with the //-operator. We generalize this rule to:
\begin{itemize}
\item $s_1 = o_1$ $t_1$... $o_i$ $t_i$ $XPE_1$ $o_{i+1}$
$t_{i+1}$...$o_n$ $t_n$ \item $s_2 = o_1$ $t_1$... $o_i$ $t_i$
$XPE_2$ $o_{i+1}$ $t_{i+1}$...$o_n$ $t_n$
\end{itemize}
are merged into
\begin{itemize}
\item $s = o_1$ $t_1$... $o_i$ $t_i$ // $t_{i+1}$...$o_n$ $t_n$
\end{itemize}
where $XPE_1$ and $XPE_2$ are different XPath expressions,
$t_i$ is a wildcard or an element, and $o_i$ is a /-operator or a
//-operator. This rule is applied if most parts in two subscriptions are equal, otherwise, more false
positives will be introduced.

Based on the above merging rules, we can compute an
\textit{imperfect degree} ~\cite{MBD} if each broker in the
network knows the DTD relative to the XML data producer. The
imperfect degree of a new merger $s$, derived from $s_1$,
$s_2$,..., $s_n$, is:
\begin{displaymath}
\vspace*{-1mm} D_{imperfect} = \frac{|P(s)-
\cup_{i=1}^nP(s_i)|}{|P(s)|}
    %\end{equation}
\end{displaymath}
It measures the imperfectness of an individual new merger. If the
publications are distributed uniformly, the bigger the imperfect
degree, the more false positive introduced by the new merger. For
example, two subscriptions $s_1 = /a/*/c/d$ and $s_2 = /a/*/c/e$
can be merged into $s = /a/*/c/*$. If the corresponding DTD
indicates that the elements $a, b, c, d, e$ are allowed at the
fourth position, $60\%$ false positive will be introduced at
position 4. We need to consider the distribution of other elements
in the subscription, e.g., the probability of each element
appearing at other positions, to compute the total number of false
positive introduced.


%-------------------------------------------------------------------------
\Section{Evaluation}

In this section, we experimentally evaluate the performance of our
routing and covering algorithms. All algorithms are implemented in
C++. We perform all experiments on a computer with an Intel Xeon
2.4GHz processor with 2GB RAM. For generating the XPE workload, we
used the XPath generator released by Diao \textit{et
al.}~\cite{yfilter::journal}. We set the flag of distinct vs.
non-distinct $(D)$ to \textit{true} (that is, queries are
distinct), and the maximum length of XPE $(L)$ to 10. For
generating the XML document workload, we used the IBM XML
Generator~\cite{xmlgenerator}. This generator was used with
default parameters except that we fixed the maximum number of
levels of the resulting XML documents to 10. This was set
consistently with the maximum length of XPEs. We used two
different DTDs: the NITF (News Industry Text Format)
DTD~\cite{nitf-dtd} and the PSD (Protein Sequence Database)
DTD~\cite{psd-dtd}. The performance metrics we took include
routing table size and publication routing time in a single
broker, and the network traffic in a broker topology of 8 brokers.


%-------------------------------------------------------------------------
\SubSection{Routing Table Size}

\begin{figure}[htbp]
\centering \epsfig{file=rts.eps, scale=0.65}\caption{Routing Table Size}\label{rts}
\end{figure}

Our algorithms exploit the covering relations among XPath queries.
To verify this fact, we generate two data sets for NITF which
include 100,000 XPath queries each. We vary the probability of
``$*$'' occurring at a location step ($W$) and the probability of
``$//$'' occurring at a location step ($DO$) to obtain data sets
with different numbers of covering rates.
%Set $A$ is comprised of
%data with $DO$ set to 0 and $W$ set to 0 or 0.1, and Set $B$ is
%comprised of data with $W$ set to 0 and $DO$ set to 0 or 0.1.
For each data set, we evaluate the effect of the covering
optimization against no covering. As shown in Figure~\ref{rts},
for Set $A$, the routing table size is reduced dramatically by
covering. The routing table size is the number of XPath queries in
the table. The subscriptions in Set $A$ have a higher degree of
overlap. About 90\% subscriptions are covered by other
subscriptions in Set $A$; while the covering rate of Set $B$ is
about 50\%. The covering algorithm perform better on Set $A$ due
to the higher degree of overlap, in this set.


%-------------------------------------------------------------------------
\SubSection{Publication Routing Performance}

In this experiment, we compare the covering-based routing
performance of the system with the routing performance of the
system without covering technique, using the above two data sets
$A$ and $B$. We generate 500 XML documents and extract 23,098
publications from these documents. Table~\ref{prt} shows the
routing time of the publications against 100,000 XPath queries.
The measurements were obtained by averaging the time taken to
route all publications. Both Set $A$ and Set $B$ exhibit benefits,
derived from subscription covering. After applying the covering
algorithm, the routing time for Set $A$ and Set $B$ are reduced by
84.6\% and 47.5\%, respectively.

\begin{table}[htbp]\centering
\begin{tabular}{|l|c|c|}
\hline
{\rule[-1mm]{0mm}{4mm}}{\footnotesize{Method}} & {\footnotesize{Set $A$ (ms)}} & {\footnotesize{Set $B$ (ms)}}\\
\hline
    {\footnotesize No Covering} &{\footnotesize 13.96} & {\footnotesize 14.23}\\
\hline
    {\footnotesize Covering} &{\footnotesize 2.15} & {\footnotesize 7.47}\\
\hline
\end{tabular}\caption{Publication Routing Performance}\label{prt}
\end{table}

%-------------------------------------------------------------------------
\SubSection{Network Traffic}

The network traffic can be influenced by the broker topology, the distribution of subscribers and publishers and
the routing strategy applied. In this section, we investigate the impact of advertisement and covering
techniques on network traffic, given a tree-like broker topology. As shown in Figure~\ref{tplg}, starting from a
root broker which is connected to one publisher, all brokers except the leaves are connected to one or two
subordinate brokers. Each leaf broker is connected to one subscriber.

\begin{figure}[htbp]
\centering \epsfig{file=topology.eps, scale=0.65}\caption{Broker Topology}\label{tplg}
\end{figure}

In this experiment, we compare four routing methods with different
optimization techniques. They are, the routing with neither
advertisement, nor covering technique ({\footnotesize {\sf
no-Adv-no-Cov}}), the routing with covering ({\footnotesize {\sf
no-Adv-with-Cov}}), the routing with advertisement ({\footnotesize
{\sf with-Adv-no-Cov}}) and the routing with both advertisement
and covering techniques ({\footnotesize {\sf with-Adv-with-Cov}}).
We generate 1,000 distinct XPath queries for each subscriber using
the PSD DTD, and 50 XML documents for the publisher. 4,182
publications are extracted from these documents. Table~\ref{nt}
shows the total number of messages, including advertisements,
publications and subscriptions, received by all brokers in the
network against each routing strategy. As can be seen, the two
advertisement-based routing methods significantly reduce the
network traffic, because in this case a subscription is only
forwarded to brokers that are on a path from the respective
subscriber to the publisher. The introduction of advertisements
reduces the network traffic to 68.5\% and 75.6\% for
non-covering-based and covering-based routing strategies,
respectively. Moreover, the covering-based routing strategies
outperform the two corresponding non-covering-based routing
strategies, and the network traffic is reduced by 12.4\% and
3.4\%, respectively. This improvement is a direct consequence of
the covering technique. However, the with-Adv-with-Cov technique
does not perform much better than with-Adv-no-Cov technique, as
the difference between the two non-advertisement-based routings,
because the use of advertisements significantly reduces the number
of subscriptions in the network, and consequently, reduces the
probability of the covering technique applied.


\begin{table}[htbp]\centering
\begin{tabular}{|c|c|}
\hline
{\rule[-1mm]{0mm}{4mm}} & {\footnotesize{\bf Network Traffic (total number of messages}} \\
    \raisebox{1mm}{\footnotesize{\bf Method}} & {\footnotesize {\bf received by all brokers in the whole network)}} \\
    \hline
    {\footnotesize no-Adv-no-Cov} &{\footnotesize 58,138} \\
\hline
    {\footnotesize no-Adv-with-Cov} &{\footnotesize 50,931} \\
\hline
    {\footnotesize with-Adv-no-Cov} &{\footnotesize 39,849} \\
\hline
    {\footnotesize with-Adv-with-Cov} &{\footnotesize 38,492} \\
\hline
\end{tabular}\caption{Network Traffic}\label{nt}
\end{table}


%-------------------------------------------------------------------------
\Section{Conclusion}

In this paper, we have studied the problem of efficiently routing XML documents through a data dissemination
network. In the dissemination network, XPath-like expressions, referred to as advertisements, serve to determine
dissemination paths from data producers to data consumers. Advertisements are used to establish XML document
routing trees defined through XPath filter expressions submitted to the data dissemination network by data
consumers. The filter expressions propagate in the reverse direction from data consumers to data produces
establishing the routing trees along which the XML documents are disseminated. To enable this processing for XML
and XPath, routing protocols that evaluate advertisements against advertisements and advertisements against
subscriptions were developed in this paper. Advertisements are determined by the DTD of data producers.
Non-recursive DTDs yield simple expressions, whereas recursive DTDs require complex processing. From a practical
point of view, it would be straight forward to limit the recursive depth of documents a data producer is allowed
to publish, which would reduce the complexity of processing DTDs. In this paper, we have presented algorithms
for all these cases. Furthermore, we have shown how to apply several optimizations, including advertising,
covering and merging techniques to content-based routing of XML documents. In the experiments, we have shown
that the use of covering techniques can significantly reduce the routing table size and improve the routing time
by 84.6\% and 47.5\% for different data sets, respectively. The introduction of advertisements reduces the
network traffic to 68.5\% and 75.6\% for non-covering-based and covering-based routing strategies, respectively,
given a tree-like broker topology.

%In this paper, we have studied the problem of efficiently routing
%XML paths and XPath queries in the XML-based publish/subscribe
%system. Several optimizations, including advertisement, covering
%and merging techniques, that have been widely studied in the
%content-based publish/subscribe routing, are extended to XML/XPath
%context. We have presented schemes to represent non-recursive and
%recursive advertisement with respect to XML documents. The
%advertisement is generated by exploiting different types of DTD
%information, non-recursive or recursive. We have proposed several
%novel routing algorithms for advertisement and subscription
%forwarding in the context of XML/XPath routing. The Covering
%algorithms are presented for XPath expressions, including
%wildcards, child and descendant operators. Finally, we have
%explored merging technique, and discussed merging rules among
%XPath queries. Results from a prototype implementation have
%verified that advertisement-based routing can significantly reduce
%the network traffic under a given broker topology, and the
%covering technique further reduces dissemination costs.


%-------------------------------------------------------------------------
%\nocite{ex1,ex2}
\bibliographystyle{latex8}
\bibliography{latex8}

\end{document}

