Topic Title: Forecasting the execution time of large-scale heterogeneous SQL

 

Technical Area: Large-scale Heterogeneous SQL, Semi-Supervised Learning, Reinforcement Learning

 

Background

1. Alibaba holds massive amounts of data, at the exabyte (EB) to zettabyte (ZB) scale

2. Data Coverage: e-commerce, finance, entertainment, catering, cloud computing, logistics, IoT, etc.

3. Data Enablement: data drives business development and has a major impact on the business

4. The Key Point: SQL is the standard language for interaction between applications and data storage, so SQL statements and SQL execution are central

5. Important Value: forecasting SQL execution time supports index optimization, fusing, dynamic caching, and dynamic routing. This improves query efficiency, lowers computation cost, and helps ensure the stability of the whole Alibaba business ecosystem

 

SQL status at Alibaba

1. SQL runs against various heterogeneous data sources such as MySQL, HBase, and MaxCompute

2. Total number of SQL invocations: billions per day

3. The impact of optimization and cost reduction is significant: with SQL execution time reduced by one second, optimizing 0.1% can save 1 KCU of computation cost per year, and optimizing 1% can save 10 KCU per year
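
The two figures in item 3 imply a linear rate of 10 KCU per year for each 1% optimized. A minimal sketch of that arithmetic follows. The function name and the assumption that the rate stays linear outside the two quoted points are illustrative and do not come from the topic description.

```python
# Rate implied by the text: optimizing 1% saves 10 KCU per year
# (equivalently, 0.1% saves 1 KCU per year).
KCU_PER_PERCENT = 10.0


def yearly_savings_kcu(optimized_percent: float) -> float:
    """Estimated yearly saving in KCU for a given optimized percentage.

    Assumes the saving scales linearly with the optimized percentage,
    which the text only states for 0.1% and 1%.
    """
    if optimized_percent < 0:
        raise ValueError("optimized_percent must be non-negative")
    return KCU_PER_PERCENT * optimized_percent


if __name__ == "__main__":
    for pct in (0.1, 1.0):
        print(f"{pct}% optimized: {yearly_savings_kcu(pct):.1f} KCU per year")
```

Under this reading, the 1 KCU per year figure corresponds to optimizing 0.1%.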

 

Target

1. Quantity optimization and upgrading: the number of SQL statements moved from after-the-fact identification to advance identification increases by 50%

2. A significant reduction in computation cost (CU): a decrease of 1 KCU per year

 

Related Research Topics

1. Large-scale heterogeneous SQL analysis: real-time monitoring and real-time computation under changing execution environment conditions.

2. Research on SQL execution time prediction with complex learning methods: supervised learning on large-scale data, such as classification and regression; unsupervised learning, such as time series analysis and anomaly detection; and knowledge-graph-based correlation analysis of how SQL, the environment, and data volume influence one another. Semi-supervised learning is also in scope, for example letting the learner automatically use unlabeled SQL execution time samples to improve learning performance without relying on external interaction.

3. Research on SQL performance enhancement technology: based on reinforcement learning and SQL execution time prediction, the SQL execution environment and state can be adjusted dynamically ahead of time, so that SQL performance is improved.
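
The semi-supervised idea in topic 2 can be illustrated with self-training: a model fitted on labeled samples assigns pseudo-labels to the unlabeled samples it is confident about, then reuses them as training data. The sketch below is only one possible instance and is not a method prescribed by this topic. It assumes a k-nearest-neighbour regressor, a single hypothetical feature per SQL statement (for example, log10 of rows scanned), and hypothetical thresholds `max_dist` and `max_spread`; all data values are made up for illustration.

```python
from statistics import mean, pstdev


def nearest(labeled, x, k):
    """Return the k labeled (feature, seconds) pairs closest to feature x."""
    return sorted(labeled, key=lambda p: abs(p[0] - x))[:k]


def predict(labeled, x, k=3):
    """Predict execution time as the mean over the k nearest labeled samples."""
    return mean(t for _, t in nearest(labeled, x, k))


def self_train(labeled, unlabeled, k=3, max_dist=0.5, max_spread=0.5, rounds=5):
    """Self-training: pseudo-label unlabeled samples the model is confident about.

    A sample counts as confident when its k nearest labeled neighbours are
    all within max_dist of it and their execution times have a population
    standard deviation of at most max_spread seconds.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        confident = []
        for x in pool:
            neighbours = nearest(labeled, x, k)
            far = max(abs(f - x) for f, _ in neighbours)
            times = [t for _, t in neighbours]
            if far <= max_dist and pstdev(times) <= max_spread:
                confident.append((x, mean(times)))
        if not confident:
            break  # nothing new to learn from, stop early
        labeled.extend(confident)
        accepted = {x for x, _ in confident}
        pool = [x for x in pool if x not in accepted]
    return labeled


if __name__ == "__main__":
    # Hypothetical samples: (feature, measured execution time in seconds).
    measured = [(1.0, 0.10), (1.1, 0.11), (1.2, 0.12),
                (5.0, 2.0), (5.1, 2.1), (5.2, 2.2)]
    # Hypothetical statements for which no execution time was recorded.
    unmeasured = [1.05, 5.15, 3.0]
    augmented = self_train(measured, unmeasured)
    print(len(augmented), "training samples after self-training")
    print("predicted seconds at feature 5.15:", predict(augmented, 5.15))
```

In this toy data the sample at 3.0 sits far from every measured statement, so it is never pseudo-labeled. A real system would need richer features (SQL structure, data source, environment state) and a stronger confidence estimate.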