Research Topic: Multimodal Speech Interaction Technology
Technical Area: Speech, Artificial Intelligence, Interaction
Human-machine interaction is the bridge that connects users to Internet services and content. It is one of the most fundamental and sophisticated Internet technologies today. Human-human interaction relies naturally on speech, facial expression, and gesture, yet current human-machine interaction still falls far short of that naturalness. Breakthroughs in interaction technology have repeatedly brought tremendous changes to industry: the keyboard and mouse made the graphical user interface possible, and touch screen technology started the era of smartphones. New interaction technologies will be key to bringing a new experience of Internet access in the future. Recently, new speech interaction modalities have emerged.
However, the new technology is still immature, and the resulting experience is not as natural as expected. For example, a wake-up word is usually required. Current speech interaction systems are also unlikely to work well in noisy environments, cannot understand spontaneous spoken language, and are very sensitive to speech recognition errors.
In this topic, we are interested in developing multimodal speech interaction, i.e., integrating other modalities with speech. By doing so, we expect the interaction experience to become more robust and natural.
Suggested Research Topics:
- Sensor and hardware technologies: acoustic design, microphone hardware, optical cameras, depth cameras, AI chips for multimodal interaction
- Signal processing technologies: microphone array signal processing, image / video enhancement, etc.
- Perception technologies: speech recognition (far-field and robust speech recognition), computer vision (face recognition, mouth tracking, motion detection), multimodal speech recognition (e.g., speech + lip-reading)
- Spoken language understanding and dialog systems: dialog systems optimized for spoken language interaction, multimodal cognition (e.g., topic understanding, emotion, rich context understanding)
- Feedback: emotional speech synthesis, other feedback methodologies in interaction
- Engineering platform: engineering platforms and technologies that support a highly natural interaction experience (e.g., low-latency response)