Topic Title: Exploring Innovative Cross-Language Information Retrieval (CLIR) Algorithms

 

Technical Area: Information Retrieval, Machine Translation

 

Background

In the era of globalization, helping users access, consume, and understand information across languages has become a critical task for Alibaba. Achieving this goal requires exploring novel bilingual, and in many cases multilingual, natural language processing (NLP) technologies from an information retrieval viewpoint. Empowered by Cross-Language Information Retrieval (CLIR) technologies, users will no longer be confined to their own language. The task is challenging, however, because classical machine translation algorithms cannot always be applied directly to CLIR problems. For instance, when a user enters a short query, the translated query may be a noisy representation of the user's original information need, and that noise can pollute the retrieval results.

 

Over the past decades, scholars have proposed a variety of methodologies for CLIR, e.g., machine translation approaches, dictionary-based approaches, latent semantic approaches, probabilistic approaches, and deep learning approaches. While each method has its own strengths and limitations, there is no single best solution for CLIR thus far. For instance, CLIR should not be limited to popular languages, so novel methods are needed to address training data sparsity, e.g., when parallel corpora or lexicon resources are unavailable or sparse. Meanwhile, new CLIR models need to be tested with different retrieval strategies and combined closely with the MT model and the latest ranking models, e.g., Learning to Rank (L2R) or the Deep Structured Semantic Model (DSSM), to improve the performance of the cross-language information retrieval engine.
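To make the query-noise problem concrete, the following is a minimal sketch of a dictionary-based approach: each query term is replaced by all of its dictionary translations, and documents are then scored with a simple TF-IDF sum. The lexicon, the documents, and the function names (`translate_query`, `tfidf_rank`) are all hypothetical and chosen only for illustration; this is not a description of any production system.

```python
import math
from collections import Counter

# Hypothetical toy Spanish -> English lexicon; entries are illustrative only.
LEXICON = {
    "banco": ["bank", "bench"],  # ambiguous source term
    "río": ["river"],
}

# Hypothetical target-language collection.
DOCS = {
    "d1": "the river bank was flooded after the storm",
    "d2": "the bank raised its interest rates",
    "d3": "a wooden bench in the park",
}


def translate_query(query, lexicon):
    """Replace each source term with ALL of its dictionary translations."""
    terms = []
    for word in query.lower().split():
        # Out-of-vocabulary words are passed through unchanged.
        terms.extend(lexicon.get(word, [word]))
    return terms


def tfidf_rank(terms, docs):
    """Rank documents by the sum of tf * idf over the translated query terms."""
    tokenized = {doc_id: text.split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))  # document frequency of each term
    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        scores[doc_id] = sum(
            tf[t] * math.log(1 + n_docs / df[t]) for t in terms if df[t] > 0
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for query in ("banco", "banco río"):
        terms = translate_query(query, LEXICON)
        print(query, "->", terms, "->", tfidf_rank(terms, DOCS))
```

In this toy setup, the one-word query "banco" expands to both "bank" and "bench", and the rarer sense "bench" pulls the park document to the top. Adding a second term ("río") supplies enough context for the river document to rank first, which illustrates why short queries are particularly vulnerable to translation noise.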

 

We invite researchers who are experts in their fields, or who are keenly aware of the challenges and opportunities those fields bring to CLIR, to work on a new framework that seamlessly integrates information retrieval and machine translation and enables us to address critical CLIR problems in a globalized environment.

 

Target

We propose a principled effort to investigate new frameworks, methods, and algorithms for CLIR that seamlessly integrate cutting-edge IR and MT methods.

 

Related Research Topics

This meeting will provide the scientific and industrial communities with a dedicated forum to investigate and discuss novel methods and algorithms for CLIR. We believe it will be an excellent place to bring together scholars from academia and industry. Topics of interest include, but are not limited to, the following: