Title: Machine Learning Empowered Query and System Optimization
Technical Area: Database
Query optimization is a typical NP-hard problem. Conventional database optimization technologies, e.g., dynamic programming and genetic algorithms, are applied to select the optimal access plan for a given query within a bounded time. Cost estimation is usually the foundation of this decision. However, cost estimation depends on cardinality estimation, which relies heavily on the available statistics. In modern database systems, statistics are collected and used in a limited way, and data skew and correlation are not handled well. As a result, the selectivity estimation of predicates, single or compound, is often inaccurate, which causes cost estimation errors and in turn leads to sub-optimal query plans and query performance problems.
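To make the correlation problem concrete, the following minimal sketch compares the selectivity of a compound predicate estimated under the common attribute-independence assumption with its true selectivity. The table, column values, and function name are hypothetical and chosen only for illustration; real optimizers use richer statistics than this.

```python
def estimate_and_actual(rows, pred_a, pred_b):
    """Return (independence-based estimate, true selectivity) of pred_a AND pred_b."""
    n = len(rows)
    sel_a = sum(1 for r in rows if pred_a(r)) / n
    sel_b = sum(1 for r in rows if pred_b(r)) / n
    # Independence assumption: sel(A AND B) = sel(A) * sel(B)
    independent_estimate = sel_a * sel_b
    actual = sum(1 for r in rows if pred_a(r) and pred_b(r)) / n
    return independent_estimate, actual


# Hypothetical table (city, country): the city fully determines the country,
# so the two columns are perfectly correlated.
country_of = {"Paris": "FR", "Lyon": "FR", "Berlin": "DE", "Munich": "DE"}
rows = [(city, country) for city, country in country_of.items()] * 2500

est, act = estimate_and_actual(
    rows,
    lambda r: r[0] == "Paris",  # city = 'Paris'
    lambda r: r[1] == "FR",     # country = 'FR'
)
print(f"independence estimate: {est}, actual: {act}")
```

In this toy data the independence assumption underestimates the result size by a factor of two, and such errors compound across joins, which is one reason learned estimators are of interest here.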
Furthermore, queries are becoming more and more complex as database systems evolve to incorporate new hardware, new data sources, and new computation models. Conventional query optimization approaches are less capable of handling such complex scenarios.
Besides query optimization, system performance also relies on many other subsystems, e.g., workload management, resource management, and database physical design. Tuning a database system usually requires extensive expert experience with many such subsystems. However, manual tuning by such experts is less and less feasible when cloud database services provide thousands of database instances, which raises huge challenges for cloud database service providers.
Meanwhile, artificial intelligence, and machine learning in particular, has in recent years shown promise in solving traditionally challenging problems in a large number of domains. Therefore, this collaborative research project will explore the opportunities and appropriate approaches for solving challenging query and system optimization problems by exploiting evolving machine learning technologies.
With this collaborative research project, we aim to solve the following problems:
- Better selectivity (i.e., cardinality) estimation, given benchmark queries and certain specific scenarios.
- Machine learning based incremental calibration of the cost model that generates better cost and resource estimation at the operator level as well as the workload level, with given benchmarks and specific scenarios.
- Machine learning based resource and performance prediction with higher prediction accuracy, with given benchmarks and specific scenarios.
- Better query plan selection with given benchmarks and specific scenarios.
- Automatic resource optimization at the workload and system levels, with improved system resource utilization and reduced cost of system computation and I/O (local as well as network).
- Autonomous system, including system and query auto tuning, automatic system resource management, automatic system model calibration, automatic physical design modification, etc.
- Architecture design to integrate machine learning into an MPP DB system.
One or more patents and/or paper publications are expected from solving each of the above problems. Detailed implementations are expected upon mutual agreement between the research institute and Alibaba Group.
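As one illustration of what incremental cost model calibration could look like, the sketch below keeps a linear cost model and adjusts its weights after every executed operator using a normalized least-mean-squares update. This is an assumption made for illustration, not a method prescribed by the project; the class name, feature choice, and numbers are all hypothetical.

```python
class IncrementalCostModel:
    """Linear cost model: cost = sum(weight_i * feature_i), calibrated online."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, observed_cost):
        """Apply one normalized LMS step and return the pre-update error."""
        error = observed_cost - self.predict(features)
        norm = sum(x * x for x in features) or 1.0  # guard against all-zero features
        step = self.learning_rate * error / norm
        self.weights = [w + step * x for w, x in zip(self.weights, features)]
        return error


# Hypothetical per-operator features: (rows processed, pages read).
# The "true" per-unit costs are unknown to the model and used only to
# simulate observed runtimes.
true_weights = (0.002, 0.05)
observations = [(1000, 10), (500, 40), (2000, 5), (100, 100)]

model = IncrementalCostModel(n_features=2)
for _ in range(200):  # replay the feedback stream several times
    for feats in observations:
        runtime = sum(w * x for w, x in zip(true_weights, feats))
        model.update(feats, runtime)

calibrated = model.predict((800, 20))
print(f"calibrated cost for unseen operator: {calibrated:.4f}")
```

Normalizing the step by the squared feature norm keeps the update stable when features differ by orders of magnitude, which is typical for row counts versus page counts. A real system would also have to cope with noisy runtimes and non-linear operator costs.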
Related Research Topics
- Cardinality estimation by utilizing machine learning technologies
- Machine learning based incremental calibration of cost model
- Intelligent query plan selection
- Machine learning based system performance and resource prediction
- Automatic resource optimization of workload and DB system
- System and query performance auto tuning
- Autonomous DB system
- Unstructured data computation and optimization