Designing Data Warehouses

                 Jeffrey D. Ullman
             Department of Computer Science
                Stanford University

It is becoming increasingly common to store copies of information
gleaned from various databases in another database called a "warehouse."
Planning and analysis of operations such as selecting merchandise to
display in different stores can be performed on the warehouse.
The use of warehouses to support complex queries is called OLAP (on-line
analytical processing), as opposed to the OLTP (on-line transaction
processing) that is performed at the source databases.

Because warehouses tend to be extremely large, and OLAP queries tend to
involve large amounts of data, there is an opportunity to represent the
data of the warehouse redundantly, typically using materialized views of
the data to support some of the expected OLAP queries.
We consider the common case of a "data cube," where the data is thought
of as multidimensional, and the potential materialized views are formed
by projecting out some attributes and aggregating.
We show that a simple greedy algorithm for selecting materialized views
cannot be worse than 63% of the optimal choice for a given space budget
and assumed query population.  Moreover, it has been shown that no
polynomial algorithm whatsoever can be better than 63% of optimal in the
worst case.  Finally, we consider the effect of allowing the creation of
indexes on the materialized views.  The simple greedy algorithm can be
arbitrarily bad, but we show there are cheap ways to add lookahead and
get close to the 63% performance guarantee.