Theme Title: Actor Model Tasks in Big Data Processing

 

Technical Area: Big Data Processing

 

Background

Tasks that run in today's big data systems have relatively stable and predictable utilization of each resource throughout their execution (e.g., Map tasks use CPU, while Reduce tasks use network). Existing schedulers (e.g., YARN, Fuxi) can pack these tasks according to their estimated resource demands to maximize resource utilization.

 

However, actor model tasks are becoming more and more important with the rising tide of machine learning. These are tasks that may be much shorter-lived, run for many iterations, and use multiple resources dynamically within each iteration. Typical workloads that consist of actor model tasks include machine learning, graph analytics, and general scientific computation (matrix or tensor calculation). For example, machine learning models are typically refined iteratively until convergence during training, and each iteration may last less than a second yet use CPU/GPU, memory, local disk, and network.

 

The utilization pattern of actor model tasks is both dynamic and fast-paced. Existing big data systems find it very difficult to respond in a timely manner to such rapid changes in resource utilization over short intervals.

 

We invite researchers who are experts in designing distributed systems to identify the best scheduler or engine design for actor model tasks, and to investigate the possibility of introducing such schedulers into MaxCompute.

 

Target

We have noticed new systems (e.g., Ray Project, Dask) that are designed only for actor model tasks. Briefly, the former targets reinforcement learning and the latter targets more general scientific computation. There are several differences in their system designs. For example, the former introduces two-level local/global scheduling, while the latter keeps to a more traditional centralized scheduling approach. All of these systems are at an early stage at the moment.
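
To make the contrast between the two scheduling styles concrete, the following is a minimal Python sketch. It is not taken from Ray or Dask. The names Node, GlobalScheduler, submit_two_level and submit_centralized are hypothetical, and the placement policy (pick the node with the most free slots) is an assumption chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A worker node with a fixed number of task slots (hypothetical model)."""
    name: str
    free_slots: int
    running: list = field(default_factory=list)

    def try_run(self, task: str) -> bool:
        """Run the task on this node if a slot is free."""
        if self.free_slots > 0:
            self.free_slots -= 1
            self.running.append(task)
            return True
        return False


class GlobalScheduler:
    """Sees every node; places a task on the node with the most free slots."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.decisions = 0  # placements that went through the global scheduler

    def place(self, task: str) -> str:
        self.decisions += 1
        target = max(self.nodes, key=lambda n: n.free_slots)
        if not target.try_run(task):
            raise RuntimeError("cluster is full")
        return target.name


def submit_two_level(origin: Node, task: str, global_sched: GlobalScheduler) -> str:
    """Local-first: spill to the global scheduler only when the origin node is full."""
    if origin.try_run(task):
        return origin.name
    return global_sched.place(task)


def submit_centralized(task: str, global_sched: GlobalScheduler) -> str:
    """Centralized: every task is placed by the single global scheduler."""
    return global_sched.place(task)
```

In this sketch, the two-level path touches the global scheduler only when the submitting node has no free slot, which keeps the decision path short for sub-second tasks. The centralized path makes every placement with a cluster-wide view, at the cost of one global decision per task.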

 

We propose to investigate existing systems oriented toward actor model tasks, identify their advantages and disadvantages, and form a new scheduler or engine design for running actor model tasks in large clusters.

 

Furthermore, this new scheduler or engine should handle traditional tasks (e.g., Map/Reduce tasks) as well, and should achieve better resource utilization.

 

Related Research Topics

Actor model tasks are becoming more and more important in big data processing. They span machine learning, graph analytics, and scientific computation. Problems in each domain are complex enough to lead to diverse system designs. For example, machine learning might emphasize iteration latency, while matrix multiplication in scientific computation cares more about resource utilization, in order to avoid serious long-tail issues.

 

We accept related topics including, but not limited to: