Topic Title: Machine-learning Based Data Placements and Job Scheduling Optimization
Technical Area: Machine Learning and Big Data Processing
As the business of Alibaba grows and becomes more complex, more and more data need to be stored and processed in MaxCompute. As a result, more and more resources are required for processing a large amount of data at any time of the day, which is becoming a big problem.
On the other hand, there are already tens of thousands of machines serving in several MaxCompute clusters, which are located in different cities of China and even in different countries. It is the most important thing for us to improve the efficiency continually. Developing better scheduling methods is one of the ways to do this.
As mentioned above, we have several clusters in different datacenters. A critical problem is that in which cluster we should place a table, schedule and run a job, especially in the case when a job joins two tables distributed in different clusters of different cities. We call this problem Geo-distributed scheduling. A better scheduling method can reduce both the job completion time and the probability of failures.
Another important problem is about the fairness of scheduling. Some large jobs, which require lots of resources, may wait for a long time to be scheduled. It will cause serious resource under-utilization if we stop scheduling small jobs in order to wait for enough resources to run a large job. This is called All-or-Nothing scheduling. Effective solutions are needed to make better balance between utilization and fairness.
Geo-distributed scheduling: A cross-datacenters scheduling approach needs to schedule jobs with data (both inputs and outputs) stored in multiple Geo-distributed datacenters. A simple example workload is to join two datasets from two datacenters. We aim to design an auto-learning Geo-distributed scheduling method, which can show us the best cluster to place a certain MaxCompute-Project, and maintain the data copy strategy automatically.
All-or-Nothing scheduling: We expect to design a scheduling algorithm, which can make better balance between utilization and fairness. Fox example, if we can estimate the jobs’ arrival time and completion time, we can improve the resource utilization when we have made resources reservations for huge jobs.
Related Research Topics
We invite researchers who have strong interests and expertise in the following domains:
Geo-distributed scheduling: This problem can be further divided by two sub-problems, although there is interconnect among them:
1) Project placement problems. Considering the size of a Project, it is quite expensive to move from one datacenter to another. So it is more a static planning problem. And at the moment, it is done manually using a very basic strategy.
2) Data copy strategies, such as to read a remote table directly or to read from local cluster after copying from a remote cluster, or when and what pattern (the full table, a partition or only several columns) should be copied when it is necessary.
Both the information about the clusters (such as bandwidth, storage space, resources usage patterns and so on) and information about the jobs (such as the dependencies, data size and placement, expected completion times and so on) should be considered in order to design such a Geo-distributed scheduling method.
All-or-Nothing scheduling: There are two straightforward choices to this problem:
1) Stop scheduling small jobs until the large job has enough resources, which makes low resource utilization.
2) Continue the scheduling of other jobs, which may starve the large job. Possible solutions to this problem:
i) Predict the arrival time of large jobs using Machine-learning methods.
ii) Find out the critical path of jobs, and estimate the Job Completion Times. If we can estimate the JCTs, we can improve the resource utilization when the large job is waiting for enough resources (by filling the gap with predictable small jobs).
iii) Make resource reservations for large jobs, or rearrange the schedule of the jobs according to their relationships, sizes, and expected complete times.
Detailed and dynamic resource management and scheduling: Currently there are two different scheduling types in MaxCompute, which are process scheduling and service scheduling. Both of them are synchronous. Resources are pre-allocated before the tasks begin to run and then they do not change until the tasks terminate. As a result, there are gaps between allocated resources and real used resources, or OOMs happen. Thus, detailed and dynamic methods to manage jobs’ resources are needed. We also need to improve scheduling methods to reduce the E2E time of jobs’ running.