Designing Data Warehouses Jeffrey D. Ullman Department of Computer Science Stanford University It is becoming increasingly common to store copies of information gleaned from various databases in another database called a "warehouse." Planning and analysis of operations such as selecting merchandise to display in different stores can be performed on the warehouse. The use of warehouses to support complex queries is called OLAP (on-line analytical processing), as opposed to the OLTP (on-line transaction processing) that is performed at the source databases. Because warehouses tend to be extremely large, and OLAP queries tend to involve large amounts of data, there is an opportunity to represent the data of the warehouse redundantly, typically using materialized views of the data to support some of the expected OLAP queries. We consider the common case of a "data cube," where the data is thought of as multidimensional, and the potential materialized views are formed by projecting out some attributes and aggregating. We show that a simple greedy algorithm for selecting materialized views cannot be worse than 63% of the optimal choice for a given space budget and assumed query population. Moreover, it has been shown that no polynomial algorithm whatsoever can be better than 63% of optimal in the worst case. Finally, we consider the effect of allowing the creation of indexes on the materialized views. The simple greedy algorithm can be arbitrarily bad, but we show there are cheap ways to add lookahead and get close to the 63% performance guarantee.