Robust Audio Project

Vision

Two issues are influencing this project. The first is that audio is usually the most important component in multimedia communication. The second is that multimedia communication is and will be done over heterogeneous networks. Therefore we have requirements in terms of audio compression, error recovery, acoustic echo cancellation, platform independence and low computational cost. Our goal is to find mechanisms which provide the best audio quality at the least cost.

Subprojects

ICSI Audio Lab - IAL

by Roy Chua, Rainer Storn, Hartmut Chodura

The ICSI Audio Lab was built to provide a test environment for audio applications on Unix workstations and Windows 95 PCs. Much like the tools vic and vat, this tool is designed to be public domain and to give researchers the opportunity to test new signal processing and communication algorithms easily. Although many audio tools are publicly available, few of them provide the source code, and none of them can easily be modified to serve as a test environment for new ideas in audio and communication processing. We have implemented and tested a howl remover relying on UDP communication. As the tool is in its final stages, we will soon be able to quickly implement and test signal processing components such as echo suppression, audio or speech compression, and audio protection with error correcting mechanisms. In addition, we want to test different communication mechanisms, for example multicast. Documentation of the IAL can be found in Roy's report.

Audio Encoding and Protection - AEP

by Martin Isenburg, Rainer Storn, Hartmut Chodura

Audio communication over networks is often degraded by packet losses, either in the network or in the end stations, and the result is substantially reduced audio quality. Our goal is to encode the audio information in such a way that this quality degradation is graceful. Two different scenarios have to be kept in mind: one is broadcast, as in internet radio, the other is full duplex communication, as in internet telephony. In broadcast the encoding procedure is allowed to be time consuming, whereas in full duplex communication delay is critical. We are looking into protection schemes like PET and the one used in the MICE project in order to cope with both situations. We are investigating how well the protection schemes work, whether they provide a graceful quality degradation under various conditions, and how much delay they introduce. To allow for graceful degradation, the encoding of the audio signal has to be layered, so that different protection priorities can be assigned to the different layers. Our current focus is on wavelet based encoding, which seems to be well suited for that task. Eventually all tests will be done with the IAL. Documentation of our current work is available in Martin's study.
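To make the idea of layered encoding concrete, here is a minimal sketch in Python. It is not the project's encoder: it assumes the simplest possible wavelet (Haar, i.e. pairwise averages and differences), and the names haar_split, haar_merge, encode_layers and decode_layers are hypothetical. The base layer would get the highest protection priority; each lost detail layer is replaced by zeros, so the quality drops step by step instead of collapsing.

```python
def haar_split(samples):
    """One Haar analysis step: pairwise averages and pairwise differences."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    avg = [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    diff = [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples), 2)]
    return avg, diff


def haar_merge(avg, diff):
    """Inverse of haar_split: rebuild sample pairs from averages and differences."""
    out = []
    for a, d in zip(avg, diff):
        out.extend((a + d, a - d))
    return out


def encode_layers(frame, levels):
    """Return [base, coarsest detail, ..., finest detail] for one audio frame."""
    details = []
    current = list(frame)
    for _ in range(levels):
        current, d = haar_split(current)
        details.append(d)
    return [current] + details[::-1]


def decode_layers(layers, received):
    """Rebuild the frame from the first `received` layers (base counts as 1).

    Detail layers that did not arrive are treated as all zeros.
    """
    current = layers[0]
    for k, detail in enumerate(layers[1:], start=1):
        if k >= received:
            detail = [0.0] * len(current)
        current = haar_merge(current, detail)
    return current


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


frame = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
layers = encode_layers(frame, levels=2)
for received in (1, 2, 3):
    approx = decode_layers(layers, received)
    print(received, "layer(s) received, squared error:", squared_error(frame, approx))
```

With all layers the frame is rebuilt exactly; with fewer layers the decoder returns a coarser, block-averaged version of the same frame.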

Acoustic Echo Cancellation - AEC

by Rainer Storn, Roy Chua

State of the art multimedia full duplex communication calls for a headset free environment, i.e. the communicating parties can freely walk about their rooms and communicate via omnidirectional microphones and loudspeakers. As the loudspeaker signal is picked up by the local microphone, an acoustic echo occurs which disturbs natural dialog. Electronic cancellation of the echoes, performed by an adaptive filter or Acoustic Echo Canceller (AEC), is therefore mandatory. Although good AECs are already on the market, they are still expensive and therefore are not part of the standard equipment of workstations and notebooks. Our goal is to build a cost-effective AEC that can be implemented in software on future generation workstations. The AEC should also have the graceful degradation property, i.e. it should adapt to the processing power of the workstation or notebook and fall back to mere echo suppression if the computational power doesn't suffice. Current workstations and notebooks are still unable to run AECs because they lack the necessary speed and a real-time OS. Therefore most of our algorithms still have to be implemented on a DSP. In order to keep the cost low, we concentrate on low cost 16-bit fixed point DSPs like the ADSP2181 from Analog Devices.
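The graceful degradation property can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names, the energy threshold, the attenuation factor and the cpu_budget_ok flag are hypothetical and not taken from our implementation. The suppressor simply attenuates the microphone signal while the loudspeaker is active, which is far cheaper than adaptive filtering but makes the conversation half duplex.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def suppress_echo(far_frame, mic_frame, threshold=1e-3, attenuation=0.1):
    """Cheap fallback: attenuate the microphone while the far end is talking."""
    if frame_energy(far_frame) > threshold:
        return [attenuation * s for s in mic_frame]
    return list(mic_frame)


def process_frame(far_frame, mic_frame, cpu_budget_ok, canceller):
    """Use the full canceller if the machine can afford it, else only suppress.

    `canceller` is any callable taking (far_frame, mic_frame); how
    `cpu_budget_ok` is measured is left open in this sketch.
    """
    if cpu_budget_ok:
        return canceller(far_frame, mic_frame)
    return suppress_echo(far_frame, mic_frame)


# Usage with a stand-in canceller that returns the microphone frame unchanged.
def passthrough(far, mic):
    return list(mic)


print(process_frame([0.5] * 4, [1.0] * 4, cpu_budget_ok=False, canceller=passthrough))
```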

A preliminary version of a simple software echo canceller with merely 342 FIR filter taps is already available. It uses a modified LMS algorithm. Listen to a distorted message which mimics speech + echo, and to the filtered message which has been processed by the canceller and is thus much easier to understand. In the current example you can hear the filter "on its way to adaptation", which takes quite some time for speech signals. Our current results concerning AEC are summarized in Rainer's techreport.
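For readers unfamiliar with adaptive FIR echo cancellation, the following Python sketch shows the general principle. It is not the modified LMS algorithm used in our canceller: it assumes the textbook normalized LMS (NLMS) update instead, a short made-up echo path, white noise in place of speech, and only 8 taps instead of 342 so that it runs quickly. All names and parameter values are hypothetical.

```python
import random


def simulate_echo(far_end, echo_path):
    """Microphone signal containing only the echo: far_end filtered by echo_path."""
    mic = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, h in enumerate(echo_path):
            if n - k >= 0:
                acc += h * far_end[n - k]
        mic.append(acc)
    return mic


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Subtract an adaptively estimated echo from mic; return (residual, weights)."""
    w = [0.0] * taps   # adaptive FIR coefficients (estimate of the echo path)
    x = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for s, d in zip(far_end, mic):
        x.insert(0, s)
        x.pop()
        y = sum(wi * xi for wi, xi in zip(w, x))    # estimated echo
        e = d - y                                   # what remains after cancellation
        norm = sum(xi * xi for xi in x) + eps       # input power for normalization
        step = mu * e / norm
        for i in range(taps):
            w[i] += step * x[i]
        residual.append(e)
    return residual, w


def power(signal):
    return sum(s * s for s in signal) / len(signal)


random.seed(1)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]
path = [0.0, 0.6, 0.0, -0.3, 0.1]      # assumed echo path, for illustration only
mic = simulate_echo(far, path)
res, w = nlms_echo_cancel(far, mic)
print("echo power before:", power(mic[-200:]))
print("echo power after: ", power(res[-200:]))
```

With white noise as input the filter adapts within a few hundred samples. Speech is strongly correlated and non-stationary, which is one reason why adaptation takes noticeably longer in the audio example above.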

This page is under construction by Rainer Storn