2024 IEEE 40th International Conference on Data Engineering (ICDE)

Abstract

Indexes in a database system can consume a large amount of memory. When they grow too large to be held entirely in memory, selected portions of the indexes have to be unloaded to secondary storage. Designing an extensible index that spans memory and disk raises several challenges. First, the designs of the in-memory and on-disk portions of the index must be decoupled so that the best choice for each device can be made independently. Second, selective unloading of the in-memory portion to disk must be carefully designed to maximize the chance that accesses are served from memory and to produce the most disk-friendly I/O access patterns. Third, the strategy for reloading index data from disk and retaining it in memory must be optimized for the highest memory efficiency. In this paper, we propose a memory-disk-spanning index design, named IndeXY, that effectively addresses these challenges. IndeXY distinguishes itself by being a framework that allows an in-memory index design and an on-disk data organization and access scheme to be adopted separately, each chosen as the one deemed most efficient for its workloads. Rather than being just another one-size-fits-all index across memory and disk, the framework provides well-designed mechanisms and policies to integrate a selected in-memory index (Index X) and an on-disk index (Index Y) into one extensible index (IndeXY). We have implemented IndeXY with alternative in-memory indexes (ART tree or B+ tree) and alternative on-disk indexes (LSM tree or B+ tree). As an anecdotal example, experiments show that integrating an ART tree and an LSM tree in the framework can improve throughput by up to 8.6X on a TPC-C workload over LeanStore, which uses B+-tree indexes in both memory and disk, and can improve performance for almost all YCSB workloads.
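The abstract does not spell out IndeXY's internal mechanisms, so the following is only a minimal sketch, under our own assumptions, of the general idea of composing an in-memory index with an on-disk index. A Python `OrderedDict` stands in for Index X, and a list of immutable key-sorted runs stands in for an LSM-style Index Y. The class name `TwoTierIndex` and the parameters `mem_capacity` and `unload_batch` are hypothetical, and the simple LRU unloading and reload-on-hit policy shown here is not claimed to be IndeXY's actual policy.

```python
from bisect import bisect_left
from collections import OrderedDict
from typing import List, Optional, Tuple


class TwoTierIndex:
    """Hypothetical sketch: an in-memory index (X) backed by on-disk sorted runs (Y)."""

    def __init__(self, mem_capacity: int, unload_batch: int) -> None:
        self.mem_capacity = mem_capacity  # assumed: max entries kept in memory
        self.unload_batch = unload_batch  # assumed: entries moved to "disk" per unload
        # Stand-in for Index X; insertion order doubles as LRU order.
        self.mem: "OrderedDict[int, str]" = OrderedDict()
        # Stand-in for an LSM-style Index Y: immutable sorted runs, newest last.
        self.runs: List[List[Tuple[int, str]]] = []

    def put(self, key: int, value: str) -> None:
        self.mem[key] = value
        self.mem.move_to_end(key)  # mark as most recently used
        if len(self.mem) > self.mem_capacity:
            self._unload()

    def _unload(self) -> None:
        # Select the least recently used entries as the cold set.
        n = min(self.unload_batch, len(self.mem))
        cold = [self.mem.popitem(last=False) for _ in range(n)]
        # Sort by key so the batch can be written as one sequential run.
        cold.sort(key=lambda kv: kv[0])
        self.runs.append(cold)

    def get(self, key: int) -> Optional[str]:
        if key in self.mem:
            self.mem.move_to_end(key)
            return self.mem[key]
        # Probe runs newest-first so the most recent version wins.
        for run in reversed(self.runs):
            i = bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                value = run[i][1]
                # Reload: retain the now-hot entry in memory (may trigger an unload).
                self.put(key, value)
                return value
        return None
```

Writing cold entries as key-sorted batches turns scattered evictions into sequential writes, which is one plausible reading of "disk-friendly I/O"; a real system would also need deletes, compaction of the runs, and actual persistence, all omitted here.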