# Database Architectures for New Hardware

### a tutorial by

### Anastassia Ailamaki

Database Group Carnegie Mellon University http://www.cs.cmu.edu/~natassa















# Breaking the Memory Wall

Databases @Carnegie Mellor

#### Wish for a Database Architecture:

- that uses hardware intelligently
- that won't fall apart when new computers arrive
- that will adapt to alternate configurations

#### Efforts from multiple research communities

- Cache-conscious data placement and algorithms
- Instruction stream optimizations
- Novel database software architectures
- Novel hardware designs (covered briefly)

| Detailed Outline                                          | Databases<br>@Carnegie Mellon |
|-----------------------------------------------------------|-------------------------------|
| Introduction and Overview                                 |                               |
| New Hardware                                              |                               |
| Execution Pipelines                                       |                               |
| <ul> <li>Cache memories</li> </ul>                        |                               |
| Where Does Time Go?                                       |                               |
| <ul> <li>Measuring Time (Tools and Benchmarks)</li> </ul> |                               |
| Analyzing DBs: Experimental Results                       |                               |
| Bridging the Processor/Memory Speed Gap                   |                               |
| <ul> <li>Data Placement</li> </ul>                        |                               |
| <ul> <li>Access Methods</li> </ul>                        |                               |
| <ul> <li>Query Processing Alorithms</li> </ul>            |                               |
| <ul> <li>Instruction Stream Optimizations</li> </ul>      |                               |
| <ul> <li>Staged Database Systems</li> </ul>               |                               |
| Newer Hardware                                            |                               |
| Hip and Trendy                                            |                               |
| <ul> <li>Query co-processing</li> </ul>                   |                               |
| <ul> <li>Databases on MEMStore</li> </ul>                 |                               |
| Directions for Future Research                            |                               |
| ©2004 Anastassia Ailamaki                                 | 10                            |







































# This Section's Goals

Hardware takes time: how do we measure time?

- Understand how to efficiently analyze microarchitectural behavior of database workloads
  - Should we use simulators? When? Why?
  - How do we use processor counters?
  - Which tools are available for analysis?
  - Which database systems/benchmarks to use?
- Survey experimental results on workload characterization
  - Discover what matters for database performance









| Example: 1                 | Intel PPRO/PIII                                | Databases<br>@Carnegie Mellon |
|----------------------------|------------------------------------------------|-------------------------------|
| Cycles                     | CPU_CLK_UNHALTED                               |                               |
| Instructions               | INST_RETIRED                                   |                               |
| L1 Data (L1D) accesses     | DATA_MEM_REFS                                  |                               |
| L1 Data (L1D) misses       | DCU_LINES_IN                                   | "41:00 0"                     |
| L2 Misses                  | L2_LINES_IN                                    | •"time"                       |
| Instruction-related stalls | IFU_MEM_STALL                                  |                               |
| Branches                   | BR_INST_DECODED                                |                               |
| Branch mispredictions      | BR_MISS_PRED_RETIRED                           |                               |
| TLB misses                 | ITLB_MISS                                      |                               |
| L1 Instruction misses      | IFU_IFETCH_MISS                                |                               |
| Dependence stalls          | PARTIAL_RAT_STALLS                             |                               |
| Resource stalls            | RESOURCE_STALLS /                              |                               |
|                            | asurable events, stati<br>neasure the same thi |                               |









Databases @Carnegie Mellon

### Microbenchmarks [KPH98,ADH99,KP00,SAF04]

What matters is basic execution loops

- Isolate three basic operations:
  - Sequential scan (no index)
  - Random access on records (non-clustered index)
  - Join (access on two tables)
- Vary parameters:
  - selectivity, projectivity, # of attributes in predicate
  - join algorithm, isolate phases
  - table size, record size, # of fields, type of fields
- Determine behavior and trends
  - Microbenchmarks can efficiently mimic TPC microarchitectural behavior!
  - Widely used to analyze query execution

Excellent for microarchitectural analysis











































Databases @Carnegie Mellon

[HP03]

## Dynamic PAX: Data Morphing

PAX random access: more cache misses in record
Store attributes accessed together contiguously
Dynamic partition updates with changing workloads

Optimize total cost based on cache misses
Partition algorithms: naïve & hill-climbing algorithms

Fewer cache misses

Better projectivity and scalability for index scan queries
Up to 45% faster than NSM & 25% faster than PAX

Same I/O performance as PAX and NSM
Future work: how to handle conflicts?





| Page layout | Cache-memory Performance |                          | Memory-disk Performance |                          |
|-------------|--------------------------|--------------------------|-------------------------|--------------------------|
|             | full-record<br>access    | partial record<br>access | full-record<br>access   | partial record<br>access |
| NSM         | 0                        | 8                        | ٢                       | 8                        |
| DSM         | 8                        | 0                        | 8                       | <b>O</b>                 |
| PAX         | Θ                        | <b></b>                  | ٢                       | 8                        |

64



























































































| Outline                                 | Databases<br>@Carnegie Mellon |
|-----------------------------------------|-------------------------------|
| Introduction and Overview               |                               |
| New Hardware                            |                               |
| Where Does Time Go?                     |                               |
| Bridging the Processor/Memory Speed Gap |                               |
| Hip and Trendy                          |                               |
| Query co-processing                     |                               |
| Databases on MEMStore                   |                               |
| Directions for Future Research          |                               |
|                                         |                               |
|                                         |                               |
| ©2004 Anastassia Ailamaki               | 110                           |



















### Future research directions

- Rethink Query Optimization with increasing complexity, cost-based optimization not ideal
- Multiprocessors and really new modular software architectures to fit new computers
  - Current research in DB workloads only scratches surface
  - Optimize execution on multiple-core chips
  - Exploit multithreaded processors
- Power-aware database systems
  - On embeded processors, laptops, etc.
- Automatic data placement and memory layer optimization one level should not need to know what others do
  - Auto-everything
- Aggressive use of hybrid processors

©2004 Anastassia Ailamaki



## Special thanks go to ...







Databases @Carnegie Mellor

 Shimin Chen, Minglong Shao, Stavros Harizopoulos, and Nikos Hardavellas for invaluable contributions to this talk



- > Steve Schlosser (MEMStore)
- > Ravi Ramamurthy (fractured mirrors)
- > Babak Falsafi and Chris Colohan (h/w architecture)

©2004 Anastassia Ailamaki



|              | References Databas<br>@Carnegie M                                                                                                                                                                                                                                                                            |     |
|--------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| ١            | Where Does Time Go? (simulation only)                                                                                                                                                                                                                                                                        |     |
| [ADS02]      | Branch Behavior of a Commercial OLTP Workload on Intel IA32 Processors. M.<br>Annavaram, T. Diep, J. Shen. International Conference on Computer Design: VLSI in<br>Computers and Processors (ICCD), Freiburg, Germany, September 2002.                                                                       |     |
| [SBG02]      | A Detailed Comparison of Two Transaction Processing Workloads. R. Stets, L.A. Barroso, and K. Gharachorloo. <i>IEEE Annual Workshop on Workload Characterization (WWC), Austin, Texas, November 2002.</i>                                                                                                    |     |
| [BGN00]      | Impact of Chip-Level Integration on Performance of OLTP Workloads. L.A. Barroso, K. Gharachorloo, A. Nowatzyk, and B. Verghese. <i>IEEE International Symposium on High-Performance Computer Architecture (HPCA), Toulouse, France, January 2000.</i>                                                        |     |
| [RGA98]      | Performance of Database Workloads on Shared Memory Systems with Out-of-Order<br>Processors. P. Ranganathan, K. Gharachorloo, S. Adve, and L.A. Barroso. International<br>Conference on Architecture Support for Programming Languages and Operating Systems<br>(ASPLOS), San Jose, California, October 1998. |     |
| [LBE98]      | An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors. J. Lo, L.A. Barroso, S. Eggers, K. Gharachorloo, H. Levy, and S. Parekh. ACM International Symposium on Computer Architecture (ISCA), Barcelona, Spain, June 1998.                                                    |     |
| [EJL96]      | <b>Evaluation of Multithreaded Uniprocessors for Commercial Application Environments.</b><br>R.J. Eickemeyer, R.E. Johnson, S.R. Kunkel, M.S. Squillante, and S. Liu. <i>ACM International Symposium on Computer Architecture (ISCA), Philadelphia, Pennsylvania, May</i> 1996.                              |     |
|              |                                                                                                                                                                                                                                                                                                              |     |
| ©2004 Anasta | ssia Ailamaki                                                                                                                                                                                                                                                                                                | 123 |

|                | References Databas<br>@Carnegie M                                                                                                                                                                                                                                                         |     |
|----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Where          | e Does Time Go? (real-machine/simulation)                                                                                                                                                                                                                                                 | )   |
| [RAD02]        | Comparing and Contrasting a Commercial OLTP Workload with CPU2000. J. Rupley II,<br>M. Annavaram, J. DeVale, T. Diep and B. Black (Intel). <i>IEEE Annual Workshop on Workload</i><br><i>Characterization (WWC), Austin, Texas, November 2002.</i>                                        |     |
| [CTT99]        | Detailed Characterization of a Quad Pentium Pro Server Running TPC-D. Q. Cao, J. Torrellas, P. Trancoso, J. Larriba-Pey, B. Knighten, Y. Won. <i>International Conference on Computer Design (ICCD), Austin, Texas, October 1999.</i>                                                     |     |
| [ADH99]        | <b>DBMSs on a Modern Processor: Experimental Results</b> A. Ailamaki, D. J. DeWitt, M. D. Hill, D.A. Wood. <i>International Conference on Very Large Data Bases (VLDB), Edinburgh, UK, September 1999.</i>                                                                                |     |
| [KPH98]        | Performance Characterization of a Quad Pentium Pro SMP using OLTP Workloads. K. Keeton, D.A. Patterson, Y.Q. He, R.C. Raphael, W.E. Baker. ACM International Symposium on Computer Architecture (ISCA), Barcelona, Spain, June 1998.                                                      |     |
| [BGB98]        | <b>Memory System Characterization of Commercial Workloads.</b> L.A. Barroso, K. Gharachorloo, and E. Bugnion. <i>ACM International Symposium on Computer Architecture (ISCA), Barcelona, Spain, June 1998.</i>                                                                            |     |
| [TLZ97]        | The Memory Performance of DSS Commercial Workloads in Shared-Memory<br>Multiprocessors. P. Trancoso, J. Larriba-Pey, Z. Zhang, J. Torrellas. <i>IEEE International</i><br><i>Symposium on High-Performance Computer Architecture (HPCA), San Antonio, Texas,</i><br><i>February</i> 1997. |     |
| ©2004 Anastass | ia Ailamaki                                                                                                                                                                                                                                                                               | 124 |

Databases @Carnegie Mellon

#### References

#### Architecture-Conscious Data Placement

[SSS04] Clotho: Decoupling memory page layout from storage organization. M. Shao, J. Schindler, S.W. Schlosser, A. Ailamaki, G.R. Ganger. International Conference on Very Large Data Bases (VLDB), Toronto, Canada, September 2004.

[SSS04a] Atropos: A Disk Array Volume Manager for Orchestrated Use of Disks. J. Schindler, S.W. Schlosser, M. Shao, A. Ailamaki, G.R. Ganger. USENIX Conference on File and Storage Technologies (FAST), San Francisco, California, March 2004.

2004 Anastassia Ailamaki

|              |                                                                                                                                                                                                                     | Databases<br>Carnegie Mellon |
|--------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------|
| /            | Architecture-Conscious Access Methods                                                                                                                                                                               | 5                            |
| [ZR03a]      | Buffering Accesses to Memory-Resident Index Structures. J. Zhou and K. International Conference on Very Large Data Bases (VLDB), Berlin, Germany, Septem                                                            | .A. Ross.<br>ber 2003.       |
| [HP03a]      | Effect of node size on the performance of cache-conscious B+ Trees. R.A. Har<br>J.M. Patel. ACM International conference on Measurement and Modeling of Computer<br>(SIGMETRICS), San Diego, California, June 2003. |                              |
| [CGM02]      | Fractal Prefetching B+ Trees: Optimizing Both Cache and Disk Performance.<br>P.B. Gibbons, T.C. Mowry, and G. Valentin. ACM International Conference on Manag<br>Data (SIGMOD), Madison, Wisconsin, June 2002.      | S. Chen, gement of           |
| [GL01]       | B-Tree Indexes and CPU Caches. G. Graefe and P. Larson. International Conference<br>Engineering (ICDE), Heidelberg, Germany, April 2001.                                                                            | e on Data                    |
| [CGM01]      | Improving Index Performance through Prefetching. S. Chen, P.B. Gibbons, and T.C. ACM International Conference on Management of Data (SIGMOD), Santa Barbara, (May 2001.                                             |                              |
| [BMR01]      | Main-memory index structures with fixed-size partial keys. P. Bohannon, P. McIlro<br>Rastogi. ACM International Conference on Management of Data (SIGMOD), Santa<br>California, May 2001.                           |                              |
| [BDF00]      | Cache-Oblivious B-Trees. M.A. Bender, E.D. Demaine, and M. Farach-Colton. Symp<br>Foundations of Computer Science (FOCS), Redondo Beach, California, November 200                                                   | oosium on<br>)0.             |
| [RR00]       | Making B+ Trees Cache Conscious in Main Memory. J. Rao and K.A. Ro<br>International Conference on Management of Data (SIGMOD), Dallas, Texas, May 2000                                                              | oss. ACM<br>).               |
| [RR99]       | Cache Conscious Indexing for Decision-Support in Main Memory. J. Rao and K<br>International Conference on Very Large Data Bases (VLDB), Edinburgh, the United<br>September 1999.                                    | .A. Ross.                    |
| [LC86]       | Query Processing in main-memory database management systems. T. J. Lehma J. Carey. ACM International Conference on Management of Data (SIGMOD), 1986.                                                               | an and M.                    |
| ©2004 Anasta | secia Ailamaki                                                                                                                                                                                                      | 126                          |

|                                         | References Databa                                                                                                                                                                                                                                  |     |
|-----------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Architecture-Conscious Query Processing |                                                                                                                                                                                                                                                    |     |
| [MBN04]                                 | Cache-Conscious Radix-Decluster Projections. Stefan Manegold, Peter A. Boncz, Niels Nes, Martin L. Kersten. In Proceedings of the International Conference on Very Large Data Bases (VLDB), Toronto, Canada, September 2004.                       |     |
| [GLW04]                                 | Fast Computation of Database Operations using Graphics Processors. N.K. Govindaraju, B. Lloyd, W. Wang, M. Lin, D. Manocha. ACM International Conference on Management of Data (SIGMOD), Paris, France, June 2004.                                 |     |
| [CAG04]                                 | Improving Hash Join Performance through Prefetching. S. Chen, A. Ailamaki, P. B. Gibbons, and T.C. Mowry. International Conference on Data Engineering (ICDE), Boston, Massachusetts, March 2004.                                                  |     |
| [ZR04]                                  | Buffering Database Operations for Enhanced Instruction Cache Performance. J. Zhou, K. A. Ross. ACM<br>International Conference on Management of Data (SIGMOD), Paris, France, June 2004.                                                           |     |
| [SAA03]                                 | Hardware Acceleration for Spatial Selections and Joins. C. Sun, D. Agrawal, A.E. Abbadi. ACM<br>International conference on Management of Data (SIGMOD), San Diego, California, June, 2003.                                                        |     |
| [CHK01]                                 | Cache-Conscious Concurrency Control of Main-Memory Indexes on Shared-Memory Multiprocessor<br>Systems. S. K. Cha, S. Hwang, K. Kim, and K. Kwon. International Conference on Very Large Data Bases<br>(VLDB), Rome, Italy, September 2001.         |     |
| [PMA01]                                 | Block Oriented Processing of Relational Database Operations in Modern Computer Architectures. S. Padmanabhan, T. Malkemus, R.C. Agarwal, A. Jhingran. <i>International Conference on Data Engineering (ICDE), Heidelberg, Germany, April 2001.</i> |     |
| [MBK00]                                 | What Happens During a Join? Dissecting CPU and Memory Optimization Effects. S. Manegold, P.A. Boncz, and M.L Kersten. International Conference on Very Large Data Bases (VLDB), Cairo, Egypt, September 2000.                                      |     |
| [SKN94]                                 | Cache Conscious Algorithms for Relational Query Processing. A. Shatdal, C. Kant, and J.F. Naughton.<br>International Conference on Very Large Data Bases (VLDB), Santiago de Chile, Chile, September 1994.                                         |     |
| [NBC94]                                 | AlphaSort: A RISC Machine Sort. C. Nyberg, T. Barclay, Z. Cvetanovic, J. Gray, and D.B. Lomet. ACM<br>International Conference on Management of Data (SIGMOD), Minneapolis, Minnesota, May 1994.                                                   |     |
| 2004                                    |                                                                                                                                                                                                                                                    | 107 |



|          | References Databa                                                                                                                                                                                                                                                                  |     |
|----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
|          | Newer Hardware                                                                                                                                                                                                                                                                     |     |
| [BWS0    | 3] Improving the Performance of OLTP Workloads on SMP Computer Systems by Limiting<br>Modified Cache Lines. J.E. Black, D.F. Wright, and E.M. Salgueiro. IEEE Annual Workshop<br>on Workload Characterization (WWC), Austin, Texas, October 2003.                                  |     |
| [GH03]   | Technological impact of magnetic hard disk drives on storage systems. E. Grochowski and R. D. Halem <i>IBM Systems Journal 42(2), 2003.</i>                                                                                                                                        |     |
| [DJN02   | ] Shared Cache Architectures for Decision Support Systems. M. Dubois, J. Jeong , A. Nanda, <i>Performance Evaluation 49(1), September 2002</i> .                                                                                                                                   |     |
| [G02]    | Put Everything in Future (Disk) Controllers. Jim Gray, talk at the USENIX Conference on<br>File and Storage Technologies (FAST), Monterey, California, January 2002.                                                                                                               |     |
| [BGM0    | D] Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing. L.A. Barroso, K. Gharachorloo, R. McNamara, A. Nowatzyk, S. Qadeer, B. Sano, S. Smith, R. Stets, and B. Verghese. International Symposium on Computer Architecture (ISCA). Vancouver, Canada, June 2000. |     |
| [AUS98   | Active disks: Programming model, algorithms and evaluation. A. Acharya, M. Uysal, and<br>J. Saltz. International Conference on Architecture Support for Programming Languages and<br>Operating Systems (ASPLOS), San Jose, California, October 1998.                               |     |
| [KPH9    | A Case for Intelligent Disks (IDISKs). K. Keeton, D. A. Patterson, J. Hellerstein. SIGMOD<br>Record, 27(3):42–52, September 1998.                                                                                                                                                  |     |
| [PGK8    | A Case for Redundant Arrays of Inexpensive Disks (RAID). D. A. Patterson, G. A. Gibson,<br>and R. H. Katz. ACM International Conference on Management of Data (SIGMOD), June 1988.                                                                                                 |     |
|          |                                                                                                                                                                                                                                                                                    |     |
| 2004 Ana | tassia Ailamaki                                                                                                                                                                                                                                                                    | 129 |



Databases @Carnegie Mellon

# **Useful Links**

Info on Intel Pentium4 Performance Counters: ftp://download.intel.com/design/Pentium4/manuals/25366814.pdf

AMD hardware performance counters http://www.amd.com/us-en/Processors/DevelopWithAMD/

PAPI Performance Library

http://icl.cs.utk.edu/papi/

Intel® VTune™ Performance Analyzers http://developer.intel.com/software/products/vtune/

©2004 Anastassia Ailamaki