Building the World's Best Chatbot - CIKM AnalytiCup 2018
    2018-06-05    Alibaba Tech

As part of its strategy to help develop chatbot technologies around the world, Alibaba is hosting a contest at CIKM 2018 to find the globe’s best algorithm engineers.

  

In October 2018, the Italian city of Turin plays host to the International Conference on Information and Knowledge Management (CIKM), the world’s leading conference on the management of knowledge, information, and data. The event brings together experts from various fields, and as participants this year, the AliMe algorithm team has been preparing a data mining contest, the CIKM AnalytiCup, to be held during the conference.

 

Last year, the contest was organized by Alibaba Cloud and the Meteorological Bureau of Shenzhen Municipality, and by the end, contestants had built a model that could accurately forecast precipitation. The best entry scored an RMSE (root-mean-square error) of 10.997, 25% better than the baseline provided by the organizers, and the model has improved the accuracy of short-term precipitation forecasts. This year, CIKM invited Alibaba’s AliMe team to organize the contest.
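For readers unfamiliar with the metric, the sketch below shows how an RMSE score and a relative improvement over a baseline can be computed. The function names and the toy rainfall values are assumptions made for illustration; they are not last year's contest data or the organizers' scoring code.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, model_rmse):
    """Fraction by which the model's RMSE is lower than the baseline's."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Toy values (hypothetical), chosen so the arithmetic is easy to follow.
observed = [0.0, 2.0, 4.0, 6.0]
baseline_forecast = [4.0, 6.0, 8.0, 10.0]  # every forecast is off by 4
model_forecast = [3.0, 5.0, 7.0, 9.0]      # every forecast is off by 3

baseline_score = rmse(baseline_forecast, observed)
model_score = rmse(model_forecast, observed)
print(baseline_score, model_score)
print(relative_improvement(baseline_score, model_score))
```

Lower RMSE is better, so an improvement here means a smaller error than the baseline.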

 

What is AliMe?

 

AliMe is a series of service chatbots used in e-commerce that fall under Alibaba Group's Intelligent Services Division. Driven by data and technology, this department is committed to improving user experience and problem-solving efficiency through artificial intelligence (AI) algorithms. Currently, four core systems – smart dialogue, smart assistance, smart decision-making, and smart management – are powered by AI technology and are helping to make services more intelligent. In 2017, through natural language processing, knowledge graphs, and deep learning technologies, the AliMe team brought their AliMe family of products to the world stage. Supported by the Alibaba Cloud ecosystem, AliMe has energized millions of businesses and SMEs around the globe, in countries where English, Russian, Portuguese, Spanish, Indonesian, and Thai are spoken.

 

In the past year, the AliMe team has carried out extensive research in both text matching and transfer learning, serving not only real customers in the industry, but also participants at leading international conferences such as ACL, WSDM, and CIKM. For example, at WSDM 2018, the AliMe team proposed a text-matching model within a transfer learning framework, aiming to solve the cold-start problem that text-matching models face in a new scenario with little labeled data.

Read more about the technology behind AliMe: https://102.alibaba.com/detail?id=36

 

What is Tianchi?

 

Tianchi is a platform connecting businesses and governments with data scientists globally to provide solutions to the toughest problems across industries. Established in 2014, Tianchi has become one of the biggest data science communities, with 200,000 users globally and a proven track record in machine learning and artificial intelligence.

Learn more about Tianchi: https://tianchi.aliyun.com/

 

Why did we choose the topic "cross-lingual short text matching" for the competition?

 

With the arrival of AI, the development of Internet companies has been driven more and more by big data and algorithms. Chatbots are a good example of how AI is helping companies reduce labor costs and improve user experience. Over the past few years, chatbots have become a mainstay on many companies’ websites and vast sums of money have been invested in them. AliMe, a personal assistant used in e-commerce, not only assists users in seeking information, but also helps them select products and book flights. From Microsoft's Xiaoice and Amazon's Echo, to customer service robots in various vertical industries, chatbots are thriving – and while they may all look different, they have a lot in common.

 

There are three different types of chatbot – retrieval-based, generative, and hybrid – depending on how responses are generated. In a retrieval-based chatbot, the text-matching model is crucial to the chatbot's ability to solve users’ problems. The system receives a user's question, finds a matching question-answer pair in the FAQ data set, and passes the manually written answer back to the user. To build such a system, the FAQ data set needs to be collected manually. To train the text-matching model, a large amount of manually labeled data, consisting of users’ questions paired with question-answer pairs in the FAQ, also needs to be collected. However, for minority languages, this is not realistic. On the one hand, there is a lack of labeled data. On the other hand, NLP R&D engineers who understand minority languages are few and far between. Both of these factors limit the research and development of chatbots.
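The retrieval flow described above can be sketched in a few lines. This is a minimal illustration, not AliMe's model: it swaps the learned text-matching model for a simple bag-of-words cosine similarity, and the FAQ entries, the `answer` function, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ data set: (question, manually written answer) pairs.
FAQ = [
    ("How do I track my order?",
     "Open 'My Orders' and select the order to see its shipping status."),
    ("How can I get a refund?",
     "Request a refund from the order details page."),
    ("How do I change my delivery address?",
     "Edit the address in your account settings before the order ships."),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(user_question, faq=FAQ, threshold=0.3):
    """Return the answer of the best-matching FAQ question, or None
    if no FAQ question is similar enough to the user's question."""
    query = Counter(tokenize(user_question))
    best_score, best_answer = 0.0, None
    for question, reply in faq:
        score = cosine(query, Counter(tokenize(question)))
        if score > best_score:
            best_score, best_answer = score, reply
    return best_answer if best_score >= threshold else None


print(answer("Where can I track my order?"))
print(answer("completely unrelated gibberish"))
```

A word-overlap score like this only works when the user's question and the FAQ share vocabulary, which is one reason a trained text-matching model, and the labeled data it requires, matters so much in practice.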

 

Alibaba Group started to accelerate its internationalization last year. Not only has it now expanded to over 120 countries and regions through AliExpress, but it has also acquired or invested in e-commerce companies around the world. Examples include acquiring Lazada, the largest e-commerce provider in Southeast Asia, and investing in the Indian e-commerce payment system Paytm.

 

Looking ahead, Alibaba Group is soon expected to be providing services for 2 billion users. With this in mind, the AliMe team – the largest service team in the Alibaba Group – has to start optimizing the way it serves overseas customers, including focusing on how to solve customer service problems. However, deep learning systems generally require a large amount of labeled data. For some minority languages, such as Indonesian, Thai, or Filipino, there is a lack of both large-scale labeled data and algorithm developers who know the language. So, how do we provide high-quality services to customers in countries and regions that use these minority languages? Alibaba's proposed solution is to transfer capabilities learned from languages rich in data resources to resource-poor languages.

 

We look forward to your participation

 

By hosting this contest, we hope not only to foster academic exchange by presenting a problem that needs to be solved in real applications, but also to seek out talent to help improve the capabilities of chatbots in general. We want customers to be able to choose their language, with speakers of minority languages offered the same high-quality services as English speakers.

 

The contest has rules, such as not relying too heavily on machine translation technology, because we want participants to focus on a language's own characteristics and on transfer capabilities instead of using external resources. Participants are encouraged to use their imagination and creativity and to propose a variety of models and solutions.

 

The contest is open to anyone who would like to take part. Based on our experience from last year, we hope to see many leading algorithm engineers from universities, research institutes, and Internet companies.

 

To register for the contest or learn more about it, please go to:

https://tianchi.aliyun.com/markets/tianchi/CIKM2018?spm=a2c41.11603215.0.0