1. Introduction
Low bitrate audio coding is an enabling technology for a number
of applications like digital radio, Internet streaming (web radio)
and mobile multimedia applications.

The limited overall bandwidth available for a digital radio system
(terrestrial or satellite based) makes it desirable to use a low
bitrate per channel in order to create an attractive portfolio
of programs for the listener. Therefore, system designers have
to use highly efficient perceptual audio codecs (like mp3 or AAC)
at low bitrates.

In Internet streaming applications, the connection bandwidth that
can be established between the web radio server and the listener's
client application depends on the listener's connection to the
Internet. In most cases today, people use analog modems or ISDN
lines with a fairly limited datarate, lower than the rate that
conventional perceptual audio codecs need to deliver appealing
audio quality. And even if consumers
connect to the Internet through high bandwidth connections such
as xDSL, the ever-present congestion on the Internet limits the
connection bitrate that can be used in a stable manner over a
longer time period.

In mobile communications, the situation is similar to the digital
radio scenario. Since the overall bandwidth available for all
services in a certain geographic area (a network cell) is limited,
the system operator has to take measures to allow as many users
as possible in that network cell to access mobile communication
services in parallel. For commercial reasons, network operators
must therefore use their available spectrum as efficiently as
possible, which calls for highly efficient speech and audio codecs.
Given the impact that the advent of multimedia services has on the
datarate demands in mobile communication systems, even with UMTS,
cellular networks will have to use perceptual codecs at a fairly
low datarate.
2. The Technical Challenge
Using perceptual codecs at low bitrates, however, is not without its downside.
State-of-the-art perceptual audio codecs achieve "CD-quality" or
"transparent" audio quality at a bitrate of approximately 128 kbps
(~ 11:1 compression relative to CD audio at about 1411 kbps). Below 128 kbps,
the perceived audio quality
of most of these codecs begins to degrade significantly. The codecs
either start to reduce the audio bandwidth and to modify the stereo
image, or they introduce annoying coding artifacts resulting from
a shortage of bits in the attempt to represent the complete audio
bandwidth. Both kinds of degradation become unacceptable beyond a certain
point. At 64 kbps, for instance, mp3
would either offer an audio bandwidth of only ~ 10 kHz or introduce
a fair amount of coding artifacts. Either outcome severely
affects the listening experience.
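
The compression figures in this section can be checked with a few lines of arithmetic. The sketch below assumes that the uncompressed reference is standard CD audio (16-bit, 44.1 kHz, stereo) and that 1 kbps means 1000 bit/s; the function names are illustrative only. Under those assumptions, 128 kbps works out to roughly 11:1 and 64 kbps to roughly 22:1, in the same range as the ratio quoted above.

```python
# Assumed reference: standard CD audio parameters.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def cd_bitrate_kbps() -> float:
    """Uncompressed CD bitrate in kbps (1 kbps = 1000 bit/s)."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS / 1000


def compression_ratio(coded_kbps: float) -> float:
    """How many times smaller the coded stream is than CD audio."""
    return cd_bitrate_kbps() / coded_kbps


for rate in (128, 64):
    print(f"{rate} kbps -> about {compression_ratio(rate):.0f}:1")
```

Halving the bitrate from 128 to 64 kbps doubles the required compression ratio, which is why a codec tuned for 128 kbps has to give up either bandwidth or coding accuracy.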
3. The Technical Solution
SBR (Spectral Band Replication) is a new audio coding enhancement
tool, which is standardized in ISO/IEC 14496-3:2001/Amd.1:2003.
It offers the possibility to improve the performance of low bitrate
audio and speech codecs by either increasing the audio bandwidth
at a given bitrate or by improving coding efficiency at a given
quality level.

SBR can increase the limited audio bandwidth that a conventional perceptual
codec offers at low bitrates, so that it equals or exceeds analogue
FM audio bandwidth (15 kHz). SBR can also improve the performance
of narrow-band speech codecs, offering the broadcaster speech-only
channels with 12 kHz audio bandwidth, used for example in multilingual
broadcasting. As most speech codecs are heavily band-limited, SBR
is important not only for improving speech quality, but also for
improving speech intelligibility and speech comprehension. SBR
is mainly a post-process, although some pre-processing is performed
in the encoder in order to guide the decoding process.

From a technical point of view, SBR is a new method for highly
efficient coding of high frequencies in audio compression algorithms.
When used in conjunction with SBR, the underlying coder is only
responsible for transmitting the lower part of the spectrum. The
higher frequencies are generated by the SBR decoder, which is
mainly a post-process following the conventional waveform decoder.
Instead of transmitting the high-frequency spectrum itself, SBR reconstructs the higher
frequencies in the decoder based on an analysis of the lower frequencies
transmitted by the underlying coder. To ensure an accurate reconstruction,
some guidance information is transmitted in the encoded bitstream
at a very low datarate.

The reconstruction is efficient for harmonic as well as for noise-like
components and allows for proper shaping in the time domain as
well as in the frequency domain. As a result, SBR allows full
bandwidth audio coding at very low data rates, thus offering a
significantly increased compression efficiency compared to the
core coder.
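
The division of labour described in this section can be illustrated with a toy model. The sketch below is not the standardized SBR algorithm. It assumes the signal is already available as a list of spectral magnitudes, it uses a simple per-band RMS value as the "guidance information", and all names (`encode`, `decode`, `crossover`, `band_size`) are hypothetical. It only shows the principle: the low band is transmitted as is, and the high band is rebuilt from it in the decoder and shaped by a small amount of side information.

```python
import math


def encode(spectrum, crossover, band_size):
    """Toy encoder: keep the low band, summarise the high band as per-band RMS."""
    low = spectrum[:crossover]
    high = spectrum[crossover:]
    envelope = []
    for start in range(0, len(high), band_size):
        band = high[start:start + band_size]
        envelope.append(math.sqrt(sum(x * x for x in band) / len(band)))
    return low, envelope


def decode(low, envelope, band_size, total_bins):
    """Toy decoder: copy the low band upward, then rescale it to the envelope."""
    n_high = total_bins - len(low)
    # Replication step: reuse low-band values (cyclically) as raw high-band content.
    patch = [low[i % len(low)] for i in range(n_high)]
    high = []
    for b, target_rms in enumerate(envelope):
        band = patch[b * band_size:(b + 1) * band_size]
        rms = math.sqrt(sum(x * x for x in band) / len(band))
        gain = target_rms / rms if rms > 0 else 0.0
        # Envelope adjustment: match the level measured by the encoder.
        high.extend(x * gain for x in band)
    return low + high


# Made-up magnitudes for 16 frequency bins, falling off towards high frequencies.
spectrum = [8, 7, 6, 5, 4, 4, 3, 3, 2, 2, 1.5, 1.5, 1, 1, 0.5, 0.5]
low, envelope = encode(spectrum, crossover=8, band_size=4)
rebuilt = decode(low, envelope, band_size=4, total_bins=len(spectrum))
print(len(low) + len(envelope), "values transmitted for", len(rebuilt), "bins")
```

In this made-up example, 8 low-band values plus 2 envelope values stand in for all 16 bins. The rebuilt high band matches the original in level per band but not in fine detail, which is the trade-off the text describes.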
4. The Performance
SBR can enhance the efficiency of perceptual audio codecs
by ~ 30 % (even more in certain configurations) in the medium
to low bitrate range. The exact level of improvement that SBR
can offer also depends on the underlying codec. For instance,
SBR in conjunction with mp3 (see below under mp3PRO) achieves
a quality at 64 kbps stereo that is comparable to conventional
mp3 at a bitrate of > 100 kbps stereo. SBR can be used with mono
and stereo as well as with multichannel audio.

SBR offers maximum efficiency in the bitrate range where the underlying
codec itself is able to encode audio signals with an acceptable
level of coding artifacts at a limited audio bandwidth.
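
The figures in this section can be related to each other with simple arithmetic. The sketch below assumes that "efficiency" means the bitrate saved at equal perceived quality, which is one plausible reading of the text rather than a definition it gives. It takes 100 kbps as the lower bound of the "> 100 kbps" mp3 figure, and the function names are illustrative.

```python
def bitrate_saving(reference_kbps: float, enhanced_kbps: float) -> float:
    """Fraction of the bitrate saved at (assumed) equal quality."""
    return 1 - enhanced_kbps / reference_kbps


def equivalent_bitrate(enhanced_kbps: float, saving: float) -> float:
    """Bitrate a conventional codec would need, given a fractional saving."""
    return enhanced_kbps / (1 - saving)


# 64 kbps with SBR versus conventional mp3 at 100 kbps or more.
print(f"saving: at least {bitrate_saving(100, 64):.0%}")      # saving: at least 36%

# A 30 % saving applied to a 64 kbps stream.
print(f"equivalent: {equivalent_bitrate(64, 0.30):.1f} kbps")  # equivalent: 91.4 kbps
```

Under this reading, the mp3 example corresponds to a saving of at least 36 %, consistent with the statement that the gain can exceed ~ 30 % in certain configurations.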