Title: Machine Learned DB Components
Technical Area: Database
A database management system (DBMS) includes several components (such as indexes, the SQL optimizer, and the cache algorithm) that are critical to system performance and storage cost. For example, the data volume at Alibaba is huge, which requires the DBMS to identify and predict hot and cold data. Based on the prediction, hot and cold data can be separated and stored in different layers. As another example, a traditional cache algorithm such as LRU does not adapt well to different scenarios. It is rule-based and caches data items without prediction, so the cache miss rate is not stable across workloads. The maintenance cost of the cache structure is high, as is its storage cost.
The goal is to explore which components of a database system can be replaced by machine learning algorithms. Specifically, the projects will focus on (but are not limited to):
1. How we can identify and predict hot/cold data and store it with both access and storage efficiency.
As mentioned above, Alibaba has a huge volume of data that is accessed at different rates. The data should be divided and stored in different layers to save storage cost. Several challenges should be addressed: 1) identifying and predicting hot and cold data, which is the fundamental preparation for storage and data layout; 2) tiering and storing hot and cold data in different layers (such as DRAM, NVM, SSD, and HDD) to save storage cost; 3) guaranteeing access efficiency for online transaction processing, including lookups, inserts, and updates.
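As a rough illustration of challenges 1) and 2), the sketch below scores each key with an exponentially smoothed access count and fills the tiers from hottest to coldest. It is a minimal sketch under stated assumptions, not a proposed design: the class `HotColdEstimator`, the smoothing weight `alpha`, the epoch boundary, and the per-tier capacities are all hypothetical, and a simple smoothing rule stands in for a trained prediction model.

```python
from collections import defaultdict

TIERS = ["DRAM", "NVM", "SSD", "HDD"]  # fastest to slowest, as listed in the text


class HotColdEstimator:
    """Tracks an exponentially smoothed access score per key (hypothetical helper)."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha                # weight given to the most recent epoch (assumed)
        self.score = defaultdict(float)   # smoothed accesses per epoch
        self.current = defaultdict(int)   # raw accesses in the current epoch

    def record_access(self, key):
        self.current[key] += 1

    def end_epoch(self):
        # Blend this epoch's counts into the running score; untouched keys decay.
        for key in set(self.score) | set(self.current):
            self.score[key] = (
                self.alpha * self.current.get(key, 0)
                + (1 - self.alpha) * self.score[key]
            )
        self.current.clear()

    def assign_tiers(self, capacities):
        """Fill tiers from hottest to coldest; capacities maps tier -> max keys."""
        ranked = sorted(self.score, key=lambda k: (-self.score[k], str(k)))
        placement, start = {}, 0
        for tier in TIERS:
            size = capacities.get(tier, 0)
            for key in ranked[start:start + size]:
                placement[key] = tier
            start += size
        # Keys that do not fit anywhere fall through to the slowest tier.
        for key in ranked[start:]:
            placement[key] = TIERS[-1]
        return placement
```

A key that was hot in an earlier epoch keeps part of its score for a while, so one quiet epoch does not immediately demote it. How quickly it cools is controlled by `alpha`.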
2. How we can replace the cache algorithm with a machine learning model.
Specifically, the LRU algorithm has high storage and maintenance costs. A machine learning algorithm, however, requires training data before it can perform classification and prediction, which is a problem at system initialization because no historical data exists at startup. The project is expected to handle such practical issues and replace LRU in Alibaba's OLTP DBMS.
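One possible way to handle the startup problem is to run plain LRU until enough access history has accumulated and only then switch to model-driven eviction. The sketch below illustrates that idea under stated assumptions: the class `WarmStartCache`, the `warmup` threshold, and the `predictor` callback are hypothetical, and the default predictor is just a per-key access count standing in for a trained model.

```python
from collections import OrderedDict, Counter


class WarmStartCache:
    """LRU until `warmup` accesses are seen, then evicts by a predicted reuse score."""

    def __init__(self, capacity, warmup, predictor=None):
        self.capacity = capacity
        self.warmup = warmup              # accesses to observe before trusting the model
        self.seen = 0
        self.history = Counter()          # access history collected since startup
        self.data = OrderedDict()         # key -> value, ordered by recency
        # Stand-in for a trained model: predicts reuse from past access counts.
        self.predictor = predictor or (lambda key, history: history[key])

    def _victim(self):
        if self.seen < self.warmup:
            return next(iter(self.data))  # cold start: least recently used key
        # Warmed up: evict the key with the lowest predicted reuse.
        return min(self.data, key=lambda k: self.predictor(k, self.history))

    def get(self, key):
        self.seen += 1
        self.history[key] += 1
        if key in self.data:
            self.data.move_to_end(key)
            return self.data[key]
        return None

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.capacity:
            del self.data[self._victim()]
        self.data[key] = value
```

The history gathered during the LRU phase doubles as the first training set, so the system never has to serve requests without an eviction policy.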
3. How we can replace the index structure in a DBMS and extend the design to support inserts and updates.
While there has been an explosion of work on machine-learned indexes, the technical requirements for building models for Alibaba's OLTP DBMS are quite different, because inserts and updates occur frequently. Most notably, we need to co-design the models with the system components to meet the stringent performance requirements, learning objectives, and space requirements.
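To make the insert problem concrete, the sketch below pairs a linear model over a sorted key array with a small buffer that absorbs new keys until the model is retrained. It is a toy illustration under stated assumptions, not a design for the OLTP setting described above: the class `LearnedIndex`, the `buffer_limit` parameter, and the full retrain on merge are hypothetical choices, and the keys are assumed to be numeric, unique, and non-empty at construction.

```python
import bisect


class LearnedIndex:
    """Linear model over a sorted key array plus a small buffer for new keys."""

    def __init__(self, keys, buffer_limit=4):
        self.buffer_limit = buffer_limit  # inserts tolerated before retraining (assumed)
        self.buffer = []                  # sorted keys inserted since the last retrain
        self._train(sorted(keys))         # assumes at least one key

    def _train(self, keys):
        self.keys = keys
        n = len(keys)
        # Fit position ~ slope * key + intercept by least squares.
        mean_k = sum(keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Record the worst prediction error so lookups can bound their search.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(keys))

    def _predict(self, key):
        return round(self.slope * key + self.intercept)

    def contains(self, key):
        n = len(self.keys)
        guess = self._predict(key)
        # Search only the window the model's error bound allows.
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return True
        j = bisect.bisect_left(self.buffer, key)
        return j < len(self.buffer) and self.buffer[j] == key

    def insert(self, key):
        bisect.insort(self.buffer, key)
        if len(self.buffer) >= self.buffer_limit:
            # Merge the buffer and retrain from scratch (the simplest possible policy).
            merged = sorted(set(self.keys) | set(self.buffer))
            self.buffer = []
            self._train(merged)
```

Retraining on every merge is the expensive part, which is why frequent inserts and updates make the co-design of model and system components the central difficulty.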
Related Research Topics
- ML for systems: the intersection of artificial intelligence, machine learning, and systems design.
- Workload (especially SQL workload) prediction and analysis.
- Time series data analysis.
- Clustering and classification for real time analysis.
- Reinforcement learning for online workload adaptation.