Title: High Quality Code Detection and Recommendation

 

Technical Area: Program Analysis

 

Background

Alibaba currently operates a large number of systems, such as Taobao, Tmall, and Alipay, which together contain a massive amount of code. Code quality varies because developers differ in skill level and experience. High-quality code is valuable: much programmer effort and time has gone into developing, testing, debugging, annotating, and fixing bugs in these programs, and this accumulated work can be reused to improve development efficiency and quality and to increase the company's productivity. However, the existing code is not well utilized. For example, a new developer may write many duplicate, low-quality implementations of algorithms that already exist. It would be very helpful if the existing code base could be used to recommend similar code fragments or examples of excellent programming style to him/her.

 

We have seen successful data-driven applications of machine learning in many fields, including speech recognition, image classification, natural language understanding, and semantic search. The growing number of projects in Alibaba and in open source repositories such as GitHub and BitBucket provides a large corpus of high-quality code ("Big Code"). In fact, there are many similarities between these applications and program analysis tasks based on "Big Code". For example, code de-obfuscation can be seen as image denoising, where noisy pixels correspond to obfuscated program parts. Code completion corresponds to image completion, and code documentation is akin to words describing an image. These similarities inspire not only new applications on "Big Code" but also the formalization of these problems in terms similar to those used in vision. We can then transfer advances such as efficient feature learning and a rich arsenal of inference techniques to programming language problems.

 

Target

In this project, we aim to develop an intelligent recommendation framework based on the "Big Code" in Alibaba and in open source repositories. The framework automatically explores the valuable semantic information in existing code and intelligently recommends similar code fragments or examples of excellent programming style.
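
As a rough illustration of what recommending a similar code fragment could look like, the sketch below ranks stored fragments by Jaccard similarity over their lexical tokens. It is purely lexical and is not the semantic analysis this project targets. The tokenizer regex, the function names (tokens, jaccard, recommend), and the toy corpus are all assumptions made for this example.

```python
import re

# Hypothetical tokenizer: identifiers, integer literals, and single
# punctuation characters. A real system would use a language-aware parser.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokens(source):
    """Return the set of lexical tokens found in a code fragment."""
    return set(TOKEN_RE.findall(source))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def recommend(query, corpus, top_k=1):
    """Return the names of the top_k corpus fragments most similar to query."""
    query_tokens = tokens(query)
    ranked = sorted(
        corpus.items(),
        key=lambda item: jaccard(query_tokens, tokens(item[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Toy corpus standing in for an indexed repository of existing fragments.
corpus = {
    "sum_list": (
        "def sum_list(xs):\n"
        "    total = 0\n"
        "    for x in xs:\n"
        "        total += x\n"
        "    return total"
    ),
    "read_file": (
        "def read_file(path):\n"
        "    with open(path) as f:\n"
        "        return f.read()"
    ),
}

# A newly written fragment that duplicates existing functionality.
query = (
    "def add_all(values):\n"
    "    total = 0\n"
    "    for v in values:\n"
    "        total += v\n"
    "    return total"
)

print(recommend(query, corpus))  # ranks "sum_list" first
```

Token overlap of this kind only catches near-verbatim duplicates. Matching fragments that differ in naming or structure but share behavior is where the semantic information mentioned above would be needed.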

 

Related Research Topics