Fraunhofer upHear Voice Quality Enhancement enables far-field voice commands and voice calls on Mail.ru’s first smart speaker

Full-duplex communication enables users to make voice calls with the Capsula smart speaker.

The Mail.ru Group, a major Russian online services provider that hosts the region’s most popular social networks, VKontakte (ВКонтакте) and Odnoklassniki (Одноклассники), has launched its first smart speaker: Capsula (Капсула). The Fraunhofer upHear Voice Quality Enhancement (VQE) technology optimizes the signals captured by the device’s microphone array and provides a clean speech signal to the Mail.ru voice assistant, Marusia (Маруся), while also enabling far-field voice calls.

In the communication mode, the full-duplex communication functionalities of Fraunhofer upHear VQE ensure that the conversational partners in a voice call can talk to each other in the best possible audio quality. This is achieved by canceling acoustic echoes and removing reverberation and noise, while keeping the perceived loudness at the same level, even when the user moves closer to or farther from the smart speaker.
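As a rough illustration of the loudness-leveling idea described above, the following Python sketch applies a simple frame-based automatic gain control. It is not the upHear VQE implementation, which is not public. The function name `automatic_gain_control`, its parameters (`target_rms`, `frame`, `smoothing`, `max_gain`) and the 400 Hz test tones are all assumptions made for this example, and the sketch tracks plain signal level rather than perceived loudness.

```python
import math

def automatic_gain_control(samples, target_rms=0.1, frame=160,
                           smoothing=0.9, max_gain=30.0):
    """Scale each frame so the output level drifts toward target_rms.

    Hypothetical helper for illustration only; all defaults are assumptions.
    """
    gain = 1.0
    out = []
    for start in range(0, len(samples), frame):
        block = samples[start:start + frame]
        level = math.sqrt(sum(x * x for x in block) / len(block))
        if level > 1e-6:  # leave the gain unchanged during silence
            desired = min(target_rms / level, max_gain)
            # Smooth the gain so it changes gradually between frames.
            gain = smoothing * gain + (1.0 - smoothing) * desired
        out.extend(gain * x for x in block)
    return out

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def tone(amplitude, n=16000, freq=400.0, rate=16000.0):
    """Test tone standing in for a talker at a given distance."""
    return [amplitude * math.sin(2.0 * math.pi * freq * i / rate)
            for i in range(n)]

near = automatic_gain_control(tone(0.5))   # talker close to the device
far = automatic_gain_control(tone(0.02))   # talker far from the device
print(rms(near[-1600:]), rms(far[-1600:]))
```

In this toy setup, both the loud and the quiet input settle close to the same output level once the gain has adapted, which is the behavior the paragraph above describes for a user moving around the room.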

In the smart assistant mode, upHear VQE enables Marusia to accurately hear voice commands issued from anywhere in the room by delivering them in the best possible audio quality. The Fraunhofer technology removes interfering sounds for far-field operation and cancels the acoustic echoes caused by the device’s own loudspeaker signal during playback, which enables barge-in: the user can interrupt playback with a voice command. As a result, no matter where in the room the commands are given, and even while the smart speaker is playing music, the keyword spotter and speech recognizer receive a clean audio signal.
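The echo cancellation step mentioned above can be sketched with a textbook normalized least-mean-squares (NLMS) adaptive filter. This is a generic single-channel technique chosen for illustration, not Fraunhofer's multichannel method. The function `nlms_echo_canceller`, the four-tap `room` echo path and the random playback signal are assumptions for this toy example.

```python
import random

def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the loudspeaker echo from the mic signal.

    Hypothetical helper for illustration; taps, mu and eps are assumed values.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent loudspeaker samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # what remains after removing the estimated echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: step size normalized by the recent playback energy.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual

def energy(signal):
    """Sum of squared samples."""
    return sum(s * s for s in signal)

# Toy scenario: the microphone hears only the device's own playback,
# filtered by a short, made-up room response.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
room = [0.6, 0.3, -0.2, 0.1]  # assumed echo path, not measured data
mic = [sum(room[k] * far_end[n - k] for k in range(len(room)) if n - k >= 0)
       for n in range(len(far_end))]

residual = nlms_echo_canceller(far_end, mic)
echo_energy = energy(mic[-1000:])
residual_energy = energy(residual[-1000:])
print(residual_energy < echo_energy)
```

Once the filter has adapted, the residual contains far less of the playback signal than the raw microphone input. In a real device, a user's voice command would remain in that residual and be passed on to the keyword spotter.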

Fraunhofer upHear VQE processes microphone signals to enable far-field full-duplex conversations across the full perceptible audio bandwidth for communication devices. It also allows far-field voice commands and barge-in during audio playback for smart assistant devices, always with outstanding audio quality. This is achieved by combining advanced multichannel acoustic echo cancellation, source localization, noise reduction, dereverberation, automatic gain control and beamforming methods. The fully integrated technology is suitable for numerous applications, including natural language understanding in mobile and smart assistant devices, as well as conferencing solutions. upHear VQE’s flexibility allows it to be used with a wide range of microphone array geometries built into mobile phones and smart assistant devices, such as smart speakers, soundbars, cameras and TVs. It can also be configured to fit the computational resources available on the target device. upHear VQE is optimized for mono and stereo as well as surround and even immersive audio devices.

