Topic Title: Exploring Innovative Cross-Language Information Retrieval (CLIR) Algorithms
Technical Area: Information Retrieval, Machine Translation
In the era of globalization, helping users access, consume, and understand information across different languages has become a critical task for Alibaba. To achieve this goal, it is important to explore novel bilingual, and in many cases multilingual, natural language processing (NLP) technologies from an information retrieval viewpoint. Empowered by Cross-Language Information Retrieval (CLIR) technologies, users will no longer be constrained to their own language. However, this is a challenging task, because classical machine translation algorithms cannot always be applied directly to CLIR problems. For instance, when a user enters a short query, the translated query may be a noisy representation of the user's original information need, and such noise may pollute the retrieval results.
Over the past decades, scholars have proposed a variety of methodologies to address CLIR tasks, e.g., machine translation approaches, dictionary-based approaches, latent semantic approaches, probabilistic approaches, and deep learning approaches. Each method has its own strengths and limitations, and no single best solution for CLIR has emerged thus far. For instance, CLIR should not be limited to popular languages, so novel methods are needed to address training data sparseness, e.g., when parallel corpora or lexicon resources are unavailable or sparse. Meanwhile, new CLIR models need to be tested with different retrieval strategies and tightly combined with the MT model and the latest ranking models, e.g., Learning to Rank (L2R) or the Deep Structured Semantic Model (DSSM), to improve the performance of the cross-language information retrieval engine.
We invite researchers who are experts in, or keenly aware of, the challenges and opportunities that their fields bring to CLIR to work on a new framework that seamlessly integrates information retrieval and machine translation, enabling us to address the critical CLIR problems in a globalized environment.
We propose a principled effort to investigate new frameworks, methods, and algorithms for CLIR that seamlessly integrate cutting-edge IR and MT methods.
Related Research Topics
This meeting will provide the scientific and industrial communities with a dedicated forum to investigate and discuss novel methods and algorithms for CLIR. We believe it will be an excellent place to connect scholars from academic and industrial environments. Topics of interest include, but are not limited to, the following:
- Use multilingual embeddings for joint learning, making full use of resource-rich languages to solve NLP tasks in resource-poor languages.
- Address translation ambiguity by exploring new CLIR models. Optimize Learning to Rank (L2R) or DSSM performance with MT N-best output to improve the retrieval results. By jointly modeling L2R and MT, we hope to optimize CLIR performance and improve the quality and relevance of retrieved results.
- Propose novel methods/algorithms to better characterize user information needs for CLIR systems.
- Address the personalization problem for CLIR systems.
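As an illustration of the topic on using MT N-best output to reduce translation ambiguity, the following is a minimal sketch, not a proposed method. It assumes the MT system returns several candidate query translations with log-probabilities, converts those into weights, and ranks documents by the weighted sum of per-translation relevance scores. All names (`nbest_weights`, `overlap_score`, `rank_with_nbest`), the example translations, and their log-probabilities are hypothetical, and the simple term-overlap scorer only stands in for a real ranking model such as L2R or DSSM.

```python
import math


def nbest_weights(log_probs):
    """Turn MT log-probabilities into weights that sum to 1 (softmax)."""
    m = max(log_probs)
    exps = [math.exp(lp - m) for lp in log_probs]
    total = sum(exps)
    return [e / total for e in exps]


def overlap_score(query, doc):
    """Placeholder relevance score: fraction of query terms found in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rank_with_nbest(nbest, docs):
    """Rank documents by the translation-weighted sum of relevance scores.

    nbest: list of (translated_query, log_prob) pairs from an MT system.
    docs:  list of target-language document strings.
    Returns a list of (doc_index, score) sorted by descending score.
    """
    weights = nbest_weights([lp for _, lp in nbest])
    scored = []
    for i, doc in enumerate(docs):
        score = sum(w * overlap_score(q, doc) for (q, _), w in zip(nbest, weights))
        scored.append((i, score))
    return sorted(scored, key=lambda pair: -pair[1])


# Hypothetical N-best translations of one short source-language query.
nbest = [("apple phone", -0.2), ("apple mobile", -1.0), ("fruit phone", -2.5)]
docs = [
    "apple phone new model release",
    "fresh fruit apple market",
    "mobile phone repair shop",
]
weights = nbest_weights([lp for _, lp in nbest])
ranked = rank_with_nbest(nbest, docs)
for doc_index, score in ranked:
    print(doc_index, round(score, 3), docs[doc_index])
```

In this toy example the first document ranks highest because it fully matches the most probable translation, while the less likely translations still contribute partial evidence to every document instead of being discarded. A joint L2R and MT model would presumably learn both the weights and the scoring function rather than fixing them as done here.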