Behind the Chat: How E-commerce Bot AliMe Works

How can you tell whether your shop assistant is a person or a robot?

Smart chatbots, or personal assistants, are among the most significant AI innovations of recent years, and they are only a glimpse of what the future holds. Technology companies such as Google, Facebook, Microsoft, Amazon and Apple are at the forefront of personalized interactive products, where intelligent human-computer interaction (IHCI) technology will continue to play a central role in automated messaging, task assistance and the Internet of Things. As the market matures, chatbots are becoming increasingly specialized for their intended purposes, such as customer service, entertainment, personal assistance, or education.

Launched in July 2015, AliMe is an IHCI-based shopping guide and assistant for e-commerce that overhauls traditional services and improves the online user experience. During 2016’s Double 11 shopping festival, AliMe successfully responded to 6.43 million queries and accounted for 95% of the customer service rendered by Alibaba’s e-commerce platforms.

Intelligent human-computer interaction (IHCI) systems are commonly referred to as chatbots or bot systems. Natural language understanding (NLU) is the very foundation of IHCI, a dialogue system that processes users’ questions and generates answers in natural language. This in itself is quite a feat as computers are built on logic-heavy cognitive bases that are not suited for processing dynamic human languages.

The first step in creating AliMe required setting up abstract frameworks for different fields, strata, and scenarios.

Standard IHCI flow

. . .

AliMe’s Stratified Framework

The majority of intelligent matching processes in use today fall into three main categories: rule-based matching, retrieval, and deep learning (DL). The technology behind AliMe is based on a combination of all three.

The dialogue system is thus divided into the following strata:

1. Intention identification stratum

This stratum identifies the underlying intention for each message, classifying them and then extracting their attributes. Since intentions determine the subsequent domain identification flow, the intention stratum is a necessary first step in initiating contextual and domain data model processes.

The technical framework for AliMe’s intention and matching stratification

2. Answering stratum

Questions are matched and identified to generate answers; AliMe’s dialogue system employs three answering strategies according to different intentions:

a. FAQs such as "?" trigger a query on the knowledge graph or the retrieval model.

The knowledge graph is constructed by mining entities and phrases, whose relations are predefined, from the vast pool of available data. Though knowledge graph-based methods identify answers accurately, they also involve higher maintenance costs and looser initial data structures. AliMe's Q&A design overcomes this by integrating traditional retrieval models.

Mining data for creating knowledge graphs

b. Tasks such as "I’d like to book a one-way flight from New York to Paris for tomorrow" can be solved by intention commitment plus slot filling matching, or by a deep reinforcement learning (DRL) model.

c. Chitchat, such as "I'm in a bad mood", triggers a method that combines the retrieval model with deep learning (DL).
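As a rough illustration of the slot filling mentioned in strategy (b), the sketch below extracts slots for a flight-booking intention with regular expressions and reports which slots are still missing, so that a follow-up question could be asked. The slot names, the patterns, and the `fill_slots` helper are assumptions made for this example; the article does not describe how AliMe implements this step.

```python
import re

# Hypothetical slot patterns for a flight-booking intention (illustrative only).
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "destination": re.compile(r"\bto ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "date": re.compile(r"\b(today|tomorrow)\b"),
    "trip_type": re.compile(r"\b(one-way|round-trip)\b"),
}


def fill_slots(utterance, slots=None):
    """Merge slot values found in the utterance into the slots gathered so far.

    Returns the updated slots and the list of slot names still missing.
    """
    slots = dict(slots or {})
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            slots[name] = match.group(1)
    missing = [name for name in SLOT_PATTERNS if name not in slots]
    return slots, missing


# Single-turn request that fills every slot.
slots, missing = fill_slots(
    "I'd like to book a one-way flight from New York to Paris for tomorrow"
)

# Multi-turn request: the second message completes some of the missing slots.
partial, still_missing = fill_slots("I'd like to book a flight to Paris")
partial, still_missing = fill_slots("from New York tomorrow", partial)
```

Passing the previous `slots` back in is what lets the dialogue continue over several rounds instead of restarting with each message.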

The chitchat domain mainly involves two kinds of models: the retrieval-based model and the deep generative model. The former selects from a fixed corpus of answers relevant to a given query, while the latter is more advanced, generating answers without relying on any corpus. The integrated merits of the two models form the core of AliMe's chat engine. First, candidate answers are retrieved using the traditional retrieval model; the candidates are then re-ranked by the Seq2Seq model. The top candidate is returned when its ranking score is higher than a preset threshold; otherwise, the Seq2Seq model is used to generate an answer.

AliMe’s chatting module flow
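The retrieve, re-rank, and fall-back flow described above can be sketched as follows. This is a minimal illustration under stated assumptions: the word-overlap `retrieve` function stands in for the retrieval model, and `rerank_score` and `generate` are hypothetical callables standing in for the Seq2Seq model, which is not implemented here. The threshold value is also arbitrary.

```python
def retrieve(query, corpus, top_k=3):
    """Toy retrieval: rank stored (question, answer) pairs by word overlap."""
    query_words = set(query.lower().split())
    scored = []
    for question, answer in corpus:
        overlap = len(query_words & set(question.lower().split()))
        if overlap:
            scored.append((overlap, answer))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [answer for _, answer in scored[:top_k]]


def chat_reply(query, corpus, rerank_score, generate, threshold=0.5):
    """Return the best retrieved answer if it scores above the threshold,
    otherwise fall back to the generative model."""
    candidates = retrieve(query, corpus)
    scored = [(rerank_score(query, c), c) for c in candidates]
    if scored:
        best_score, best = max(scored, key=lambda item: item[0])
        if best_score > threshold:
            return best
    return generate(query)


# Hypothetical corpus and model stand-ins for demonstration.
CORPUS = [
    ("i am in a bad mood", "Sorry to hear that. Want to talk about it?"),
    ("what is your name", "I am a shopping assistant."),
]
reply = chat_reply(
    "I am in a bad mood",
    CORPUS,
    rerank_score=lambda query, candidate: 0.9,  # stand-in for a Seq2Seq score
    generate=lambda query: "(generated answer)",  # stand-in for Seq2Seq decoding
)
```

Using one model for both re-ranking and generation means the fallback path needs no extra component; only the threshold decides which path answers.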

. . .

The Deep Learning Practices of AliMe’s Intention Identification

AliMe’s identification and extraction of intentions relies on classification results. AliMe incorporates both traditional textual features and user behavior features to analyze incomplete user intentions.

The user behavior-based DL model’s classification of intentions

While creating DL-based prediction systems, the team considered two modeling options. The multi-classification model, though faster, required retraining whenever a new label was added to the class family. The binary classification model clearly underperformed and needed a separate yes/no decision for every label, but it allowed new domains to be added to the original platform without restriction. With their specific strengths and drawbacks, the two models serve very distinct sets of scenarios.

AliMe’s DL-based intention classification embeds behavioral factors and textual features, and concatenates the resulting vectors before multi-classification or binary classification. Textual features can be represented as a bag of words or as word embeddings.

Classification of intentions by DL accounting for user behavior
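To make the concatenation step and the trade-off between the two modeling options concrete, here is a small sketch in plain Python. The vocabulary, behavior flags, labels, and hand-set weights are all hypothetical; a real system would learn the weights with a neural network rather than use linear scorers like these.

```python
import math

VOCAB = ["refund", "where", "order", "coupon"]            # hypothetical vocabulary
BEHAVIORS = ["viewed_order_page", "viewed_coupon_page"]   # hypothetical behavior flags


def featurize(text, behavior):
    """Concatenate a bag-of-words text vector with a behavior vector."""
    words = text.lower().split()
    text_vec = [float(words.count(w)) for w in VOCAB]
    behavior_vec = [1.0 if behavior.get(b) else 0.0 for b in BEHAVIORS]
    return text_vec + behavior_vec


def dot(w, x):
    return sum(a * b for a, b in zip(w, x))


def multiclass_predict(x, weights):
    """Softmax over all labels at once: a new label changes the output layer."""
    labels = list(weights)
    logits = [dot(weights[label], x) for label in labels]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs


def binary_predict(x, scorers, threshold=0.5):
    """One independent sigmoid scorer per label: labels can be added freely."""
    probs = {
        label: 1.0 / (1.0 + math.exp(-(dot(w, x) + b)))
        for label, (w, b) in scorers.items()
    }
    return [label for label, p in probs.items() if p >= threshold], probs


# Toy, hand-set weights (not learned).
WEIGHTS = {
    "refund": [2, 0, 0, 0, 0.5, 0],
    "logistics": [0, 2, 1, 0, 1, 0],
    "coupon": [0, 0, 0, 2, 0, 1.5],
}
x = featurize("where is my order", {"viewed_order_page": True})
label, _ = multiclass_predict(x, WEIGHTS)

scorers = {"logistics": ([0, 2, 1, 0, 1, 0], -2.0)}
scorers["coupon"] = ([0, 0, 0, 2, 0, 1.5], -1.0)  # added without touching "logistics"
accepted, _ = binary_predict(x, scorers)
```

In the binary setup, adding the `coupon` scorer leaves the existing `logistics` scorer untouched, which mirrors the expansion advantage described above; the cost is one decision per label at prediction time.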

How AliMe Works as an Intelligent Shopping Guide

Intelligent shopping guide systems interact with users to analyze their intentions with the goal of providing a better shopping experience. The interactions serve two main purposes: helping machines understand user intentions, and optimizing recommendation rankings and the interactive process itself.

Standard technical framework for the AliMe intelligent shopping guide

Intelligent shopping guide systems are created to deduce what users want, and the attributes of those goods. This brings with it a new set of issues:

Challenge 1: Users tend to express themselves in short sentences, so identifying intentions accurately requires multiple rounds of interaction.

Challenge 2: Users often interact inconsistently, detailing or modifying parts of their intentions.

Challenge 3: Shoppers’ stated intentions may not always be semantically correct or accurate.

Challenge 4: Relations between intentions are very complex.

AliMe can accommodate phrasal expressions, intention boundary switches and logical modifications owing to the intention stack and product knowledge graph. Due to the vast variety of goods, knowledge graphs are combined with semantic indexes to make identification more effective.

Under intelligent shopping guide scenarios, category management consists of category identification and calculation of category relations.

Category relations framework

Category Identification

AliMe’s identification plans are built on knowledge graphs, semantic indexes and DSSM (deep semantic similarity model). The semantic indexes are built on textual information as well as search and click data. Similarities between word segmentations and candidate categories are calculated using word embeddings.

AliMe’s goods identification plan based on semantic indexing and DSSM
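The embedding-based similarity step can be illustrated with a toy example: average the embeddings of the segmented query words and pick the candidate category with the highest cosine similarity. The three-dimensional vectors and category names below are invented for illustration, and this is not a DSSM; it only shows the similarity computation itself.

```python
import math

# Hypothetical 3-dimensional word embeddings (real ones would be learned).
EMBEDDINGS = {
    "running": [0.9, 0.1, 0.0],
    "shoes": [0.8, 0.2, 0.1],
    "sneakers": [0.85, 0.15, 0.05],
    "cup": [0.0, 0.9, 0.3],
    "mug": [0.05, 0.85, 0.35],
}
CATEGORIES = ["sneakers", "mug"]  # hypothetical candidate categories


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def identify_category(tokens):
    """Return the category closest to the mean embedding of the known tokens."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return max(CATEGORIES, key=lambda c: cosine(mean, EMBEDDINGS[c]))


category = identify_category(["running", "shoes"])
```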

Calculation of Category Relations

The calculation of category relations addresses intentions arising from the intelligent shopping guide. Two important examples of these relations are hyponymy relations and similarity relations.

For example, when a user first intends to buy some clothes but later changes their mind and decides to buy a cup, the attributes associated with clothes should not be passed down to the cup. On the other hand, if the user changes their mind and buys a shirt, a hyponym of clothes, the attributes associated with clothes should be passed down to the shirt.
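The clothes, cup, and shirt example can be written as a short sketch. The `HYPERNYM` map is a hypothetical stand-in for edges in a product knowledge graph, and the state layout is an assumption made for this illustration.

```python
# Hypothetical child -> parent (hypernym) edges from a product knowledge graph.
HYPERNYM = {
    "shirt": "clothes",
    "jeans": "clothes",
    "clothes": "goods",
    "cup": "goods",
}


def is_hyponym(child, ancestor):
    """True if `ancestor` is reachable from `child` by following hypernym edges."""
    node = child
    while node in HYPERNYM:
        node = HYPERNYM[node]
        if node == ancestor:
            return True
    return False


def switch_category(state, new_category):
    """Carry attributes over only when the new category is a hyponym of the old one."""
    keep = is_hyponym(new_category, state["category"])
    return {
        "category": new_category,
        "attributes": dict(state["attributes"]) if keep else {},
    }


state = {"category": "clothes", "attributes": {"color": "red"}}
to_shirt = switch_category(state, "shirt")  # attributes are inherited
to_cup = switch_category(state, "cup")      # attributes are dropped
```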

Hyponymy relations can be calculated through the following two options:

a) Knowledge graph-based relation calculation

b) Extraction from users' queries

Similarity relations can be calculated through the following two options:

a) Use of the same hypernym: For example, both Xiaomi and Huawei share the phrase ‘mobile phone’ as a hypernym.

b) Semantic similarity based on embedding computation
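Both options for similarity relations can be combined in a few lines: check for a shared hypernym first, then fall back to embedding similarity. The hypernym sets follow the examples in the text, while the two-dimensional embeddings, the threshold, and the order of the two checks are assumptions made for this sketch.

```python
import math

# Hypernym sets taken from the examples in the text.
HYPERNYMS = {
    "Xiaomi": {"mobile phone"},
    "Huawei": {"mobile phone"},
    "shirt": {"clothes"},
}
# Hypothetical 2-dimensional embeddings (illustrative values only).
EMBEDDINGS = {
    "Xiaomi": [0.9, 0.1],
    "Huawei": [0.88, 0.15],
    "shirt": [0.1, 0.95],
}


def share_hypernym(a, b):
    """Option (a): the two terms have at least one hypernym in common."""
    return bool(HYPERNYMS.get(a, set()) & HYPERNYMS.get(b, set()))


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def are_similar(a, b, threshold=0.9):
    """Shared hypernym first, then option (b): embedding similarity."""
    if share_hypernym(a, b):
        return True
    if a in EMBEDDINGS and b in EMBEDDINGS:
        return cosine(EMBEDDINGS[a], EMBEDDINGS[b]) >= threshold
    return False


phones_similar = are_similar("Xiaomi", "Huawei")
phone_vs_shirt = are_similar("Xiaomi", "shirt")
```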

. . .

The Road Ahead for IHCI Technologies

Though the technological progress observed in the 21st century is significant, AI and its applications are still at a nascent stage. Fields ranging from perception to cognition require vast improvement for IHCI to continue enabling industry. Efforts in gathering data and refining knowledge graphs will contribute to IHCI’s development. Task-oriented bots across industrial verticals are poised to provide explosive economic growth; interactive bots targeted at open domains, however, require further scrutiny and experimentation over the long term. Following its successful adoption in fields such as voice recognition, DL will continue to be applied in the domain of natural language processing (NLP).

Fortunately, the urgency of development in AI has been met with equal enthusiasm from various stakeholders, from private enterprises to governments, and from academic circles to industrial communities. Given this, we can expect IHCI to fulfill our expectations and visions for the near and long-term future, where even the wildest of science fiction movies and books pale in comparison to the actualized level of technology.

