DIGITAL AUDIO: Tutorial on Audio MPEG

Davide Cilano
Fall 1995
© IEEE Computer Society Press.

1.1. Compressed Encoding

The preceding chapters explained that digital audio offers greater flexibility of use than analog signals. However, to obtain good signal quality, digital systems must be able to handle a substantial quantity of information at high data rates.

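For instance, with Pulse Code Modulation (PCM), if the signal bandwidth extends to 20,000 Hz and a signal-to-noise ratio (SNR) of at least 90 dB is desired, a bit rate of 768 kbit/s is required for each monophonic channel. A 48 kHz sampling frequency is more than twice the 20 kHz bandwidth, as the sampling theorem requires, and 16-bit samples provide roughly 6 dB of SNR per bit (about 96 dB), hence:

16 (bits/sample) * 48,000 (samples/s, the sampling frequency) = 768,000 bit/s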
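The arithmetic above can be checked with a short Python sketch. The helper names are hypothetical, and the SNR function uses the common rule of thumb for an ideal uniform quantizer driven by a full-scale sine wave (6.02 * N + 1.76 dB), ignoring dither and real converter noise.

```python
def pcm_bit_rate(bits_per_sample: int, sample_rate_hz: int, channels: int = 1) -> int:
    """Raw PCM bit rate in bit/s: bits per sample * samples per second * channels."""
    return bits_per_sample * sample_rate_hz * channels


def quantization_snr_db(bits_per_sample: int) -> float:
    """Theoretical SNR (dB) of an ideal uniform quantizer for a full-scale sine wave."""
    return 6.02 * bits_per_sample + 1.76


def nyquist_ok(bandwidth_hz: float, sample_rate_hz: float) -> bool:
    """True if the sampling rate exceeds twice the signal bandwidth."""
    return sample_rate_hz > 2 * bandwidth_hz


if __name__ == "__main__":
    print(pcm_bit_rate(16, 48_000))            # 768000 bit/s, one mono channel
    print(round(quantization_snr_db(16), 2))   # 98.08 dB, above the 90 dB target
    print(nyquist_ok(20_000, 48_000))          # True
    print(pcm_bit_rate(16, 44_100, channels=2))  # 1411200 bit/s, stereo "CD quality"
```

The last line shows the same calculation for the 16-bit, 44.1 kHz stereo format referred to below as "CD quality".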

Thus, storage in PCM format calls for costly equipment and sophisticated transmission channels; it has required high-precision optical media (Compact Discs) and magnetic devices (DAT: Digital Audio Tape), while radio broadcasting in PCM format is unrealistic because of the high costs involved.

The need to broadcast high-quality digital signals by radio has stimulated the study of encoding systems that reduce the quantity of data required to store the signal without sacrificing the quality of 16-bit, 44.1 kHz PCM systems, from here on labeled "CD quality". Such encoding systems are called "compressed" because they compress the numerical data representing the signal and reduce its size. Two examples, mentioned above, are A-law and μ-law, which compress the signal by reducing its dynamic range according to the 40-phon isosensation curve. These encoding systems preserve all the information in the signal (its entropy), optimizing only its amplitude encoding on the basis of the perceptive capability of the human ear; in practice, they assume that the listener uses all of the audio information to construct the final sensation of "music".

In an experiment (in our case, an audio signal) whose outcome can take the values $x_1, \dots, x_n$ with probabilities $p_1, \dots, p_n$ such that $\sum_{i=1}^{n} p_i = 1$, the ENTROPY or AVERAGE INFORMATION (in bits) is defined as:

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i$$
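As an illustration of the entropy definition, the following Python sketch estimates $H$ from a sequence of quantized samples by using relative frequencies as the probabilities $p_i$. The function name is hypothetical, and treating successive samples as independent is a simplifying assumption; real audio samples are correlated.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def entropy_bits(samples: Iterable[Hashable]) -> float:
    """Empirical entropy H = -sum(p_i * log2(p_i)) in bits per symbol.

    Each p_i is estimated as the relative frequency of value x_i in `samples`.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    h = 0.0
    for count in counts.values():
        p = count / total
        h -= p * math.log2(p)
    return h


if __name__ == "__main__":
    print(entropy_bits([0, 0, 0, 0]))  # 0.0: a constant signal carries no information
    print(entropy_bits([0, 0, 1, 1]))  # 1.0 bit per sample
    print(entropy_bits([0, 1, 2, 3]))  # 2.0 bits per sample
```

Under the independence assumption, $H$ is the lower bound on the average number of bits per sample that a lossless (entropic) code can achieve.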

However, psychoacoustic research has shown that the human ear is not capable of capturing all the information in a PCM signal. It follows that the required channel capacity can be reduced by an encoding system that exploits the characteristics of human perception, complementing entropy encoding. In fact, the ear perceives only a portion of the sound information (much of it is not perceived because of the behavior of the eardrum, the cochlea, and the receptor cells). Moreover, when the ear is presented with audio signals containing many frequency components, a large part of them contributes nothing to the "musical sensation" because of masking effects. Each sound masks weaker sounds that are close to it in frequency (a well-known phenomenon in the frequency domain) and therefore contributes to a global masking threshold.
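The masking idea can be made concrete with a deliberately simplified Python sketch. Frequencies are mapped to the critical-band (Bark) scale with Zwicker's analytic approximation, and each component casts a triangular threshold that falls off with Bark distance. The offset and slopes are hypothetical illustrative values, not parameters of any standard model, and the absolute threshold of hearing is ignored.

```python
import math
from typing import List, Tuple

# Hypothetical, illustrative parameters (not taken from any standard).
OFFSET_DB = 10.0       # threshold lies this far below the masker at its own frequency
SLOPE_LOWER_DB = 27.0  # dB per Bark toward lower frequencies
SLOPE_UPPER_DB = 10.0  # dB per Bark toward higher frequencies

Component = Tuple[float, float]  # (frequency in Hz, level in dB)


def hz_to_bark(f_hz: float) -> float:
    """Zwicker's analytic approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def threshold_from(masker: Component, f_hz: float) -> float:
    """Masking threshold (dB) that a single masker imposes at frequency f_hz."""
    fm_hz, level_db = masker
    dz = hz_to_bark(f_hz) - hz_to_bark(fm_hz)
    slope = SLOPE_UPPER_DB if dz >= 0 else SLOPE_LOWER_DB
    return level_db - OFFSET_DB - slope * abs(dz)


def audible(components: List[Component]) -> List[bool]:
    """For each component, True if it exceeds the threshold set by all the others."""
    flags = []
    for i, (f_hz, level_db) in enumerate(components):
        # Global threshold simplified as the maximum of the individual thresholds.
        global_threshold = max(
            (threshold_from(m, f_hz) for j, m in enumerate(components) if j != i),
            default=-math.inf,
        )
        flags.append(level_db > global_threshold)
    return flags


if __name__ == "__main__":
    tones = [(1000.0, 80.0), (1100.0, 50.0), (4000.0, 40.0)]
    print(audible(tones))  # [True, False, True]
```

Here the weak 1100 Hz tone lies less than one Bark above the loud 1000 Hz tone and is masked, while the 4000 Hz tone is far enough away to remain audible. Taking the maximum of the individual thresholds is a simplification; practical psychoacoustic models typically combine the contributions of all maskers.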

Therefore, the next step is to exploit the perceptive capability of the ear to significantly reduce the quantity of information needed to represent the audio signal, encoding only those parts that are actually perceptible. Such encodings, called "perceptive", optimize the use of the medium by exploiting the characteristics of the receiver, thereby achieving the desired signal compression.

The techniques that satisfy these constraints are called "Perceptive Compression Techniques"; they typically reduce the quantity of information to transmit by a factor of 4 to 8 (1:4 to 1:8), substantially reducing the required channel capacity and making low-cost radio broadcasting possible.
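Applied to the 768 kbit/s PCM channel computed earlier, these reduction factors can be expressed with the compression factor of [Salton], i.e. 100 * (uncoded size - coded size) / uncoded size. The Python sketch below uses a hypothetical helper name and treats bit rates as the "sizes".

```python
def compression_factor_percent(uncoded_size: float, coded_size: float) -> float:
    """Compression factor (%) = 100 * (uncoded - coded) / uncoded."""
    if uncoded_size <= 0:
        raise ValueError("uncoded_size must be positive")
    return 100.0 * (uncoded_size - coded_size) / uncoded_size


PCM_RATE = 768_000  # bit/s: one 16-bit channel sampled at 48 kHz

if __name__ == "__main__":
    for ratio in (4, 8):
        coded = PCM_RATE / ratio
        factor = compression_factor_percent(PCM_RATE, coded)
        print(f"1:{ratio} -> {coded / 1000:.0f} kbit/s, compression factor {factor:.1f} %")
    # 1:4 -> 192 kbit/s, compression factor 75.0 %
    # 1:8 -> 96 kbit/s, compression factor 87.5 %
```

A reduction of 1:4 to 1:8 therefore corresponds to a compression factor between 75 % and 87.5 %.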

The compression factor (%) is defined as follows [Salton]:

Compression_Factor = 100 * (Uncoded_Dim - Coded_Dim) / Uncoded_Dim

where Dim stands for "dimension", i.e. the size of the data (for example, in bits). For practical applications, a digital encoding system must satisfy the following criteria:

COMPRESSION, for distribution at a low bit rate and thus at low cost
HIGH QUALITY of the audio signal, comparable to Compact Disc quality
ROBUSTNESS, which must be guaranteed for transmission to mobile receivers
FLEXIBILITY, to adapt to various needs
DECODING IN REAL TIME

One encoding system that satisfies all these constraints is MPEG Audio, which is known for its efficiency and flexibility.
