Title: New Generation OLAP Engine
Technical Area: Database
Traditional OLAP databases are designed for high-performance analytics on structured data, and their execution engines are typically based on the Volcano (iterator) model. In the era of big data, an OLAP database must do more than provide a SQL interface for structured data analytics. Diverse kinds of data such as images, video, and audio are being generated. New performance techniques such as LLVM-based code generation and SIMD instructions are now available. New analytics scenarios such as approximate computing have also emerged. Together these call for a new generation of OLAP engine, built on a new technology platform, to serve these new scenarios.
The goal of this theme is to build a new-generation OLAP engine, with research in the following three directions.
1. Vector Engine for image/video data analytics.
Besides structured data, more and more image and video data are being generated. These data need to be processed in real time inside an analytic database, together with structured data, to extract their value. Building indexes efficiently and querying in real time are urgent requirements. This workload differs from that of a traditional RDBMS engine and calls for new processing models and algorithms.
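As a baseline for the vector workload described above, the following minimal Python sketch assumes that each image or video frame has already been converted by some upstream model into a fixed-length feature vector (embedding) stored next to its structured attributes. The row layout and the names `hybrid_top_k` and `where` are hypothetical. The sketch performs an exact scan that combines a structured predicate with cosine-similarity ranking. Its cost grows with rows times dimensions for every query, which is the cost that approximate nearest-neighbor indexes (for example, inverted-file or graph-based indexes such as HNSW) try to avoid.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical row layout: (row_id, embedding, structured attributes).
Row = Tuple[int, Sequence[float], Dict[str, object]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_top_k(
    query: Sequence[float],
    rows: List[Row],
    k: int,
    where: Callable[[Dict[str, object]], bool] = lambda attrs: True,
) -> List[Tuple[float, int]]:
    """Exact top-k search: apply the structured filter, score every surviving
    row, and keep the k highest cosine scores with a bounded heap."""
    candidates = (
        (cosine(query, vec), row_id)
        for row_id, vec, attrs in rows
        if where(attrs)
    )
    return heapq.nlargest(k, candidates, key=lambda pair: pair[0])


if __name__ == "__main__":
    # Hypothetical table of product images with a structured "category" column.
    table: List[Row] = [
        (1, [1.0, 0.0, 0.0], {"category": "shoes"}),
        (2, [0.0, 1.0, 0.0], {"category": "shoes"}),
        (3, [0.9, 0.1, 0.0], {"category": "bags"}),
        (4, [0.8, 0.2, 0.1], {"category": "shoes"}),
    ]
    hits = hybrid_top_k([1.0, 0.0, 0.0], table, k=2,
                        where=lambda a: a["category"] == "shoes")
    print([row_id for _, row_id in hits])  # prints [1, 4]
```

Row 3 is the second most similar vector overall, but the structured filter excludes it. This is the kind of mixed structured/vector query that a dedicated vector engine would need to answer through an index rather than a full scan.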
2. Code Gen (LLVM) and SIMD for OLAP engine performance improvement.
Traditional RDBMS engines target the x86 instruction set and are generally implemented in C or Java. In recent years, new techniques such as LLVM-based code generation and SIMD have been used to improve system performance by executing fewer CPU instructions. These techniques can be applied to an OLAP engine to boost analytic performance.
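To make the code-generation idea concrete, here is a minimal Python sketch that contrasts two approaches for `SELECT SUM(value) WHERE pred`. The first interprets an expression tree tuple-at-a-time. The second generates one specialized loop for the whole query. Python source generation stands in for emitting LLVM IR, and the AST encoding and all function names are hypothetical. In a real engine the generated loop would be compiled to machine code, where a backend such as LLVM can vectorize it into SIMD instructions. Python does neither, so this shows the structure of the technique, not its speed.

```python
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical mini expression AST:
#   ("col", name) | ("const", value) | (op, lhs, rhs) with op in OPS
Expr = Tuple
Row = Dict[str, Union[int, float]]

OPS = {"add": "+", "mul": "*", "gt": ">", "lt": "<"}
PY_OPS: Dict[str, Callable] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
    "gt": lambda a, b: a > b,
    "lt": lambda a, b: a < b,
}


def interpret(expr: Expr, row: Row):
    """Interpreted evaluation: walk the AST again for every single row."""
    kind = expr[0]
    if kind == "col":
        return row[expr[1]]
    if kind == "const":
        return expr[1]
    return PY_OPS[kind](interpret(expr[1], row), interpret(expr[2], row))


def interpreted_filter_sum(pred: Expr, value: Expr, rows: List[Row]):
    """SELECT SUM(value) WHERE pred, evaluated by tree-walking per row."""
    return sum(interpret(value, r) for r in rows if interpret(pred, r))


def to_source(expr: Expr) -> str:
    """Translate the AST into one flat source expression (the code-gen step)."""
    kind = expr[0]
    if kind == "col":
        return f"row[{expr[1]!r}]"
    if kind == "const":
        return repr(expr[1])
    return f"({to_source(expr[1])} {OPS[kind]} {to_source(expr[2])})"


def compile_filter_sum(pred: Expr, value: Expr) -> Callable[[List[Row]], object]:
    """Generate and compile one specialized loop for SELECT SUM(value) WHERE pred."""
    src = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if {to_source(pred)}:\n"
        f"            total += {to_source(value)}\n"
        "    return total\n"
    )
    namespace: Dict[str, object] = {}
    exec(compile(src, "<generated-query>", "exec"), namespace)
    return namespace["query"]


if __name__ == "__main__":
    rows = [{"price": 50, "qty": 2}, {"price": 150, "qty": 1}, {"price": 200, "qty": 3}]
    pred = ("gt", ("col", "price"), ("const", 100))
    value = ("mul", ("col", "price"), ("col", "qty"))
    print(interpreted_filter_sum(pred, value, rows))  # prints 750
    print(compile_filter_sum(pred, value)(rows))      # prints 750
```

Both paths return the same answer. The generated version removes the per-row tree walk and dispatch, which is the overhead that query compilation targets. It also leaves a tight loop that a compiler has a chance to vectorize.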
3. Approximate query in analytics.
In traditional analytics scenarios, such as the enterprise data warehouse, precise computation is mandatory. In many new big data scenarios, however, it is not necessary, and exploring PB- or EB-scale data within seconds is a challenge. For such scenarios, approximate query is very useful. For example, in advertising, a brand merchant on TAOBAO may want to find the top 100k customers who are likely to be interested in its goods. Suppose the OLAP database can return a 95%-accurate approximate result within seconds, 100x faster than the precise result. That is a far more useful way to explore the value of the data.
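One common building block for approximate query processing is uniform sampling. The engine reads a small random sample, scales the result up, and reports an error bound along with it. The minimal Python sketch below estimates a SUM this way. The function name and parameters are hypothetical. It uses the standard scaled sample-mean estimator with a normal-approximation confidence interval, which is only one of several approximation techniques; others include stratified sampling and sketches. Top-k questions such as the TAOBAO example would likely need more specialized methods than this simple aggregate estimator.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def approx_sum(values: Sequence[float], sample_size: int,
               seed: int = 0, z: float = 1.96) -> Tuple[float, float]:
    """Estimate SUM(values) from a uniform random sample without replacement.

    Returns (estimate, half_width); estimate +/- half_width is an approximate
    confidence interval (z = 1.96 gives roughly 95% under the normal
    approximation). Requires 2 <= sample_size <= len(values).
    """
    n_total = len(values)
    if not 2 <= sample_size <= n_total:
        raise ValueError("sample_size must be between 2 and len(values)")
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)                  # sample standard deviation
    estimate = n_total * mean                      # scale the sample mean up to the table
    fpc = 1.0 - sample_size / n_total              # finite population correction
    half_width = z * n_total * sd * math.sqrt(fpc / sample_size)
    return estimate, half_width


if __name__ == "__main__":
    # Hypothetical fact column of order amounts.
    gen = random.Random(7)
    amounts = [gen.uniform(1.0, 500.0) for _ in range(200_000)]
    est, hw = approx_sum(amounts, sample_size=5_000)
    print(f"approx SUM = {est:,.0f} +/- {hw:,.0f} (exact = {sum(amounts):,.0f})")
```

Sampling the whole table gives the exact answer with a zero-width interval. Smaller samples trade accuracy for speed. In this in-memory sketch the full list already exists, so the speedup a real engine gets would come from reading only the sampled rows or blocks from storage.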
Related Research Topics
- Vector Engine for image/video data analytics
- Code Gen (LLVM) and SIMD for OLAP engine performance improvement
- Approximate query in analytics