Topic Title: Cross-Lingual Knowledge Transfer for Large Scale Product Graph

 

Technical Area: Transfer Learning, Knowledge Graph, Natural Language Processing

 

Background

The knowledge graph was first introduced by Google to enhance the value of information returned by Google searches. It automatically gathers and merges information from across the Internet into a knowledge base capable of answering direct questions. Over the past decades, numerous knowledge graph initiatives, including Facebook Graph, Microsoft Satori, IBM Watson, and Amazon Product Graph, have been launched and later applied successfully in question answering, decision support, and many other real business scenarios.

 

Taobao has pioneered the construction and application of Chinese e-commerce knowledge graphs, leveraging billions of product records accumulated over the years within its e-commerce ecosystem in the Chinese market. However, its recent expansion into global markets brings new challenges: it must serve 10 million active users across 200+ countries in hundreds of languages, which creates strong demand for localized knowledge graphs to support local markets.

 

We invite researchers who are experts in, or keenly aware of, the challenges and opportunities that their fields bring to cross-lingual knowledge representation and transfer. The goal is a generalized solution that extends the Chinese Product Graph to other countries with minimal labeling effort.

 

Target

Current knowledge graph development is relatively fragmented: the knowledge graph for a new language is either built separately from the work on other languages or simply copied from them. A complete framework for cross-lingual knowledge transfer is lacking.

 

We propose a principled effort to investigate a complete solution for moving the existing large-scale Chinese Product Graph to other languages in support of market localization.

 

Related Research Topics

Machine translation is the most intuitive approach for moving from one language to another. However, information loss is unavoidable during machine translation. In addition, it cannot uncover all reasonable expressions of a concept, which some real use cases require. This poses both a challenge and an opportunity for cross-lingual generalization and representation.

 

Beyond the language issue, another practical challenge lies in fusing information from different domains and markets. Knowledge inferences based on regulations and consumer understanding in one market may not hold in other countries, so simple replication does not work and domain adaptation is needed. Moreover, it is common in global e-commerce for the same product to be sold under different names, following a brand's market differentiation strategy. Entity resolution and linking are therefore mandatory when integrating knowledge extracted from different markets.
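
To make the entity resolution problem concrete, the following is a minimal sketch in Python, not a proposed solution. It assumes each listing carries a market code, a title, a brand, and a shared product identifier; the `Listing` record, the `gtin` field, the brand "Acme", and the 0.5 similarity threshold are all hypothetical choices made for illustration. Listings are linked when their identifiers match, or when the brand matches and the titles share enough tokens. Linked listings are then grouped with a small union-find.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Listing:
    """Hypothetical product listing from one market."""
    market: str
    title: str
    brand: str
    gtin: str  # assumed shared identifier; may be empty in practice


def tokens(text):
    """Lowercased whitespace tokens of a title."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def same_entity(x, y, threshold=0.5):
    """Link two listings by identifier, else by brand plus title overlap."""
    if x.gtin and x.gtin == y.gtin:
        return True
    if x.brand.lower() != y.brand.lower():
        return False
    return jaccard(tokens(x.title), tokens(y.title)) >= threshold


def resolve(listings, threshold=0.5):
    """Group listings into entities using pairwise links and union-find."""
    parent = list(range(len(listings)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(listings)), 2):
        if same_entity(listings[i], listings[j], threshold):
            parent[find(i)] = find(j)

    clusters = {}
    for i, listing in enumerate(listings):
        clusters.setdefault(find(i), []).append(listing)
    return list(clusters.values())


# Made-up data: one product sold under two names, plus a different product.
listings = [
    Listing("CN", "Acme Breeze shampoo 400ml", "Acme", "0001"),
    Listing("ES", "Acme Brisa champú 400ml", "Acme", "0001"),
    Listing("US", "Acme Storm conditioner 400ml", "Acme", "0002"),
]
clusters = resolve(listings)
```

Here the first two listings fall into one cluster because they share an identifier, even though their titles differ across markets. Token overlap alone would not link them, which is why title matching is insufficient across languages and why cross-lingual representations are of interest. The pairwise comparison is quadratic, so a large-scale system would need blocking or learned embeddings instead.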