International Conference on Semantic Audio at Fraunhofer IIS

The Audio Engineering Society’s (AES) International Conference on Semantic Audio took place from June 22 to 24 at Fraunhofer IIS in Erlangen, Germany. Alongside the technical discussions, the program included demos, poster sessions and cultural and social activities every evening.

Before the main program, attendees were offered an in-depth introduction in the form of three tutorials on music performance analysis, sonic interactions for virtual reality applications and phase reconstruction from magnitude spectrograms. Although the tutorials were optional, almost all delegates attended them. Alexander Lerch and Stefan Weinzierl (“Music Performance Analysis”) explained how semantic analysis can support learning a musical instrument by giving feedback on the accuracy of a performance in terms of rhythm, pitch and tone.

In the second tutorial, “Sonic Interactions for Virtual Reality Applications”, Stefania Serafin discussed ways to enhance VR applications with sonic and haptic feedback, for use in gaming, virtual musical instruments and rehabilitation.

Christian Dittmar presented a comprehensive overview of methods for reconstructing phase from magnitude spectrograms. Such algorithms are used, for example, to improve the sound quality of signals produced by source separation methods.

The conference officially started on June 22 with a welcome speech by Bernhard Grill, Director of Fraunhofer IIS. Mark Plumbley, Professor of Signal Processing at the University of Surrey, gave the keynote speech on audio event detection and scene recognition. He talked about computational methods for analyzing recordings of everyday sounds, with the aim of classifying both the sounds themselves, such as a door slamming or gunshots, and the environment, such as a train station or an office.

Fraunhofer IIS’s Audio and Media Technologies division presented demos of EVS, MPEG-H and Cingo. Delegates were keen to see the latest audio technologies in action and discussed the implementation of MPEG-H in Korea.

Awards were presented later that evening. Rodrigo Schramm and Emmanouil Benetos received the Best Paper Award for their paper “Automatic Transcription of A Cappella Recordings from Multiple Singers”. The Best Student Paper Award went to Rachel M. Bittner, Justin Salamon, Juan J. Bosch and Juan P. Bello for their paper “Pitch Contours as a Mid-Level Representation for Music Informatics”. The evening continued with a concert by LINda Capo (http://www.lindamund.de/), who performed a pleasant blend of jazz and pop.

The conference’s second day started with the “Pitch Tracking” oral session, which included the keynote “Pitch-based Audio Algorithms” by Udo Zölzer. The Helmut Schmidt University Hamburg professor, who is well versed in digital audio processing, discussed various pitch-related topics, including how to estimate the fundamental frequency of an audio signal and how to use this information for creative audio effects, such as automatic harmonization of singing and novel methods of waveform synthesis.

Attendees also dined at a Franconian beer garden, where they toured the maze of historic cellars that kept beer cool in summer before the advent of modern refrigeration.

The conference finished on June 24 with an oral session on Deep Learning, followed by an invited talk by Masataka Goto, a well-known researcher who has made major contributions to the field of Semantic Audio. Goto introduced Hatsune Miku, a digital avatar powered by singing synthesis, which has inspired many people in Japan to create and share multimedia content. Goto also explained how audio analysis technology can be used to browse and interact with this web-native content.

Facts and figures

This year’s conference was the third AES International Conference on Semantic Audio. The first was held in Ilmenau, Germany, in July 2011, and the second in London, UK, in January 2014.
The 2017 conference was jointly organized by the Fraunhofer Institute for Integrated Circuits IIS, the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the International Audio Laboratories Erlangen.
76 delegates from 15 countries in Europe, Australia, Asia and America participated; roughly 30% came from industry and 70% from academia. The conference was chaired by Dr. Christian Uhle (Fraunhofer IIS) and Prof. Meinard Müller (FAU, AudioLabs). The scientific program was coordinated by the paper chairs Christian Dittmar (FAU, AudioLabs) and Dr. Jakob Abeßer (Fraunhofer IDMT).
The organizers received 38 paper submissions, 27 of which were presented: 13 as lectures and 14 as posters. The call for late-breaking demos attracted five presenters, and Fraunhofer IIS presented two demos. The technical program also comprised two keynotes, given by Prof. Mark Plumbley and Prof. Udo Zölzer, and an invited talk by Masataka Goto.

Header image © Fraunhofer IIS/David Willner
