3GPP Immersive Phone Calls

IVAS makes Immersive Calls and Conferencing a Reality

Many people find telephone conferences and other group calls more exhausting than face-to-face conversations. One reason is that the brain has to untangle the voices coming out of the loudspeaker and match each of them to a person in the conversation. In most cases, the audio is played back in mono, which makes this task even harder for the brain.

Call quality in today’s mobile networks has come a long way since the conversational “guessing games” of past decades. Communication codecs like Enhanced Voice Services (EVS) deliver great audio quality and are very robust and highly efficient. But even EVS, which has become the de facto standard in telephony since its release in 2014, is built on the assumption that we hold the phone to one ear during a call. This leads to a focus on mono sound, which is certainly justified in many cases but does not cover the range of options that rapid technological development has added to telephone calls.

Spatial audio for mobile phone networks

Today, devices like earbuds, sound bars, and car and laptop speakers are everywhere, and people already use them for calls and online conferencing. Although many of them can deliver stereo or immersive experiences, this capability is rarely used for telephony today. The new IVAS (Immersive Voice and Audio Services) codec can change this: IVAS is an extension of EVS and enables the transmission of stereo and immersive audio over mobile networks. As a communication codec, it is optimized for highly efficient compression of spatial audio, which is crucial in this environment. Whereas past development of communication codecs focused on improving compression efficiency and audio bandwidth for monaural signals only, IVAS finally overcomes this limitation. It supports stereo and immersive signals, including multi-channel, Ambisonics, objects, a novel metadata-assisted spatial audio (MASA) format, and even combinations of Ambisonics or MASA with objects.

Creating communication spaces

The transmission of spatial and immersive audio with the IVAS codec and suitable devices can open up completely new communication spaces:

  • Immersive Calls: IVAS enables participants to capture immersive scenes and convey them to one another. This is ideal for letting others share in an event or an outdoor experience.
  • Ad-hoc Conferencing: When the phone is placed on a conference table, it can pick up a realistic acoustic image of the people around it, which is then reproduced at one or more receivers. Rendering the immersive scene makes it easier to distinguish the speakers’ voices from one another and to separate them from ambient sounds.
  • Multi-party Conferencing: For more complex situations, the voices of multiple participants are transmitted as individual streams and spatially rendered on the receiving device to match the video transmitted in parallel. Users can then customize the audio, for instance by changing the volume of individual participants or moving them around the room. A conference server in the network could also combine participants calling from different locations into a shared virtual immersive scene.

This makes IVAS a great technology for business calls as well as for sharing the full immersive experience of an event or outdoor expedition. In fact, it suits a range of environments and can even connect them, for example in a meeting with people at home, in the office, and in the car. Other applications connect outdoor, city, and industrial environments by rendering participants into a captured immersive scene. In all these situations, IVAS delivers a more lifelike experience that reduces the effort required for listening and concentrating and helps minimize fatigue.
In addition to these main application areas, IVAS can also add an immersive impression to audio messages, which are still commonly mono, in messaging services like SMS/iMessage and RCS.

3GPP recently adopted IVAS as a feature of 5G Advanced (Release 18), the next evolution of mobile networks. As one of the world’s most renowned research and development institutions for audio technologies, Fraunhofer IIS contributed significantly to the “IVAS Public Collaboration,” a cooperation of eleven companies on the IVAS standard.
