Code-switch speech recognition
With globalization, frequent interaction among people from different countries and regions has become common. Words from a foreign language are often used in daily life and at work. When more than one language is used in a conversation, the result is referred to as code-switching speech. For example, during a technical sharing session conducted in Chinese, speakers often use English terminology, and in commercial trading product names are often given in another language. To meet this challenge, a practical speech recognition system must recognize not only the main language but also the foreign words embedded in it.
The data-driven approach is one of the most commonly adopted methods for building a speech recognition system. It requires a large amount of labeled speech data to train the acoustic model, which models the acoustic characteristics of the language, and a large collection of text data to train the language model. However, collecting code-switching speech and text data is very challenging. Code-switching data has more variability than single-language data, and code-switching is hard to predict: speakers differ in speaking style, and their educational background and proficiency in the second language also affect how they produce code-switching speech.
With the increasing demand for recognizing code-switched speech, we would like to build a framework for constructing code-switching speech recognition systems that recognize foreign words while maintaining recognition performance on speech in the main language.
Related Research Topics
Current attempts to tackle the code-switching problem fall into three categories.
- Pronunciation dictionary refinement: the objective is to accommodate words and phrases from other languages. Techniques to explore include the careful application of linguistic knowledge and data-driven pronunciation generation.
- Acoustic modelling: advanced acoustic modelling methods can be adopted; for example, end-to-end modelling reduces the dependency on a pronunciation dictionary.
- Language modelling: how to model the characteristics of more than one language in a unified model is worth exploring, for example through artificial data augmentation and cross-lingual word embeddings.
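To make the idea of artificial data augmentation for language modelling more concrete, the following is a minimal sketch, not a description of any particular system. It assumes that the main-language text is already segmented into space-separated tokens and that a small bilingual lexicon is available. The names `LEXICON`, `make_code_switched`, and `augment` are hypothetical, and the toy entries are for illustration only. Each main-language token found in the lexicon is replaced by its foreign equivalent with a chosen probability, which produces synthetic code-switching sentences that could be added to language model training text.

```python
import random

# Hypothetical toy lexicon: main-language (Chinese) tokens -> English equivalents.
LEXICON = {"模型": "model", "数据": "data", "训练": "training"}


def make_code_switched(sentence, lexicon, switch_prob=0.3, rng=None):
    """Return a copy of `sentence` in which each token found in `lexicon`
    is replaced by its foreign equivalent with probability `switch_prob`.

    Assumes `sentence` is already segmented into space-separated tokens.
    """
    if rng is None:
        rng = random.Random()
    tokens = []
    for token in sentence.split():
        if token in lexicon and rng.random() < switch_prob:
            tokens.append(lexicon[token])
        else:
            tokens.append(token)
    return " ".join(tokens)


def augment(corpus, lexicon, n_variants=2, switch_prob=0.5, seed=0):
    """Generate `n_variants` synthetic code-switching sentences per input sentence."""
    rng = random.Random(seed)  # fixed seed so the augmentation is reproducible
    augmented = []
    for sentence in corpus:
        for _ in range(n_variants):
            augmented.append(make_code_switched(sentence, lexicon, switch_prob, rng))
    return augmented


if __name__ == "__main__":
    corpus = ["我们 用 数据 训练 模型"]
    for line in augment(corpus, LEXICON):
        print(line)
```

Random word substitution ignores where speakers actually tend to switch, so a sketch like this would likely overgenerate unnatural sentences. Constraining the switch points with linguistic knowledge or with statistics from real code-switching data is one possible refinement.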