Title: Towards a New Garbage Collector for Alibaba’s Large Scale Java Applications

 

Technical Area: System Software

 

Background

OpenJDK provides several garbage collectors to Java developers, such as the Parallel, CMS, and G1 collectors, which are widely used in production environments. The OpenJDK community is also developing new garbage collection algorithms, including Shenandoah and ZGC, which are designed for large heaps. All of these are implemented for general Java workloads; none is specifically optimized for the characteristics of applications in a particular domain. Most applications at Alibaba are written in Java, amounting to more than a billion lines of Java code. Alibaba has customized most of its Java software on top of the rich open-source ecosystem. These Java programs support online trading, payments, and logistics operations. In fact, Alibaba uses almost the full spectrum of Java technologies, including middleware (Apache Tomcat, Jetty, etc.), big data (Spark, HBase, and Hadoop), modularity (OSGi), etc. We look forward to collaborating with researchers in the memory management area and exploring new opportunities to optimize garbage collector performance in Alibaba’s real-world workload environment.

 

Target

Based on an understanding of the characteristics of Alibaba’s real-world Java workloads, the major goal of this research project is to

 

Related Research Topics

To minimize Stop-the-World time, popular GC implementations usually perform work in advance, concurrently with mutator threads, which in turn reduces application throughput. Most existing GC technologies do not fully respect the object allocation patterns seen in our real workloads. For example, the generational pattern assumed by the CMS and G1 collectors is common, but in reality it does not fit all our needs. We have the opportunity to devise different GC algorithms for different object allocation patterns.
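As a rough illustration of how the GC cost of a given allocation pattern might be observed on a stock JVM, the following minimal sketch reads the standard java.lang.management GarbageCollectorMXBean counters before and after a synthetic workload. The class name GcOverheadProbe, the allocate helper, and the workload parameters are hypothetical and chosen only for illustration. The figures the beans report depend on the collector in use and are not necessarily pure Stop-the-World pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe, not part of any Alibaba or OpenJDK tooling.
public class GcOverheadProbe {

    // Sum of collection counts over all collectors the JVM exposes.
    // A bean returns -1 when the count is undefined; such values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Sum of approximate accumulated collection time in milliseconds.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Assumed synthetic workload: most arrays die immediately,
    // while every keepEvery-th array is retained (long-lived).
    static List<byte[]> allocate(int objects, int sizeBytes, int keepEvery) {
        List<byte[]> retained = new ArrayList<>();
        for (int i = 0; i < objects; i++) {
            byte[] block = new byte[sizeBytes];
            block[0] = (byte) i; // touch the array so the allocation is used
            if (keepEvery > 0 && i % keepEvery == 0) {
                retained.add(block);
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();

        List<byte[]> retained = allocate(200_000, 1024, 100);

        System.out.println("retained objects: " + retained.size());
        System.out.println("collections during workload: " + (totalCollections() - countBefore));
        System.out.println("approx. GC time (ms): " + (totalCollectionMillis() - millisBefore));
    }
}
```

Running the same program under different collectors (for example with -XX:+UseParallelGC or -XX:+UseG1GC) and with different keepEvery values gives a first, coarse impression of how sensitive each collector is to the share of long-lived objects.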

 

New hardware technologies are emerging, e.g. new CPU architectures and instructions, new 3D XPoint memory, etc. How to leverage these new hardware features to improve GC performance is an interesting area for further exploration.