DIGITAL AUDIO: Tutorial on Audio MPEG

Davide Cilano
Fall 1995
© IEEE Computer Society Press.

2.4. THE ENCODING PROCESS Layer II

The encoder produces the encoded sequence from the PCM samples.

INPUT: audio signal as PCM samples
OUTPUT: compressed sequence according to the ISO-MPEG-Audio standard (sequence of frames)

The encoding process is not fully standardized and can vary with respect to:

the model of perception
the table of the encoding quantization steps
the table of the encoding scaling factors

The fundamental numerical quantities of MPEG Audio encoding are:

Signal to Noise Ratio, SNR, i.e. the difference in dB between the signal and the quantization noise; it is always positive and, to a first approximation, is about 6*B, where B is the number of bits used to quantize the signal; the SNR value depends exclusively on the number of bits used to quantize the signal and therefore on the bit allocation value.
Signal to Mask Ratio, SMR, i.e. the difference in dB between the signal and the masking threshold; the latter is computed by the psychoacoustical model; a positive SMR value indicates that the signal is audible, while a negative one shows that the signal cannot be heard because it is masked by other signal components.
Mask to Noise Ratio, MNR, i.e. the difference in dB between the masking threshold and the quantization noise; its sign has a very important qualitative meaning, since a negative value indicates that the quantization noise is audible in that band, while a positive one implies complete masking of the quantization noise; quantitatively, a positive value gives the margin in dB left for further signal processing, and a negative value tells how far the quantization noise rises above the mask.

These quantities have the following relation:

MNR = SNR - SMR
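
The relation can be checked numerically with a short Python sketch. It assumes the first-order approximation SNR = 6*B given above; the function names are illustrative only, and a real encoder would take its SNR figures from the tables it uses for each number of quantization steps.

```python
# Approximation from the text: about 6 dB of SNR per quantization bit.
DB_PER_BIT = 6.0


def snr_db(bits: int) -> float:
    """Approximate signal to noise ratio for a given number of bits."""
    return DB_PER_BIT * bits


def mnr_db(bits: int, smr: float) -> float:
    """Mask to noise ratio: MNR = SNR - SMR (all values in dB)."""
    return snr_db(bits) - smr


def noise_is_masked(bits: int, smr: float) -> bool:
    """Quantization noise is inaudible when the MNR is positive."""
    return mnr_db(bits, smr) > 0


if __name__ == "__main__":
    # A subband with SMR = 20 dB: 3 bits leave audible noise, 4 bits mask it.
    for b in (3, 4):
        print(b, mnr_db(b, 20.0), noise_is_masked(b, 20.0))
```

With an SMR of 20 dB, three bits give an MNR of about -2 dB (audible noise), while four bits give about +4 dB.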

The efficiency of encoding depends mainly on the accuracy of the computations of the psychoacoustical model, which can be seen as a computational system that takes as input the 1152 time samples and gives as output the SMR value in dB for every subband and channel. The standard allows the use of any model, leaving open the possibility of updating the encoder with more accurate and efficient psychoacoustical models that may become available in the future. Note that this will not make existing sequences obsolete, as the decoding process does not depend on the encoding model. What the psychoacoustical model must accomplish, however, is well defined: it must give the SMR (Signal to Mask Ratio, or difference in dB between the maximum signal and the minimum masking level) for every channel and every subband.

Schematic operation of the perceptual model for each monophonic channel:

input of 1152 PCM audio samples
application of the psychoacoustical model
output of 32 SMR values

The relationship among SNR, SMR and MNR in each subband can be graphically illustrated as follows:

Fig. 2.1 Diagram of the qualitatively best values for Signal, Mask and Noise. The figure illustrates that the SMR directly determines the BIT_ALLOCATION value and that the SNR depends only on the number of quantization steps for the subband.

BIT_ALLOCATION follows from the SMR value because it is useless to quantize with a number of steps that introduces a quantization noise much smaller than the noise already present in each subband (because of the signal characteristics); here, noise means all of the signal below the masking threshold. Recall that the quantization noise decreases by about 6 dB for every additional quantization bit, consistent with an SNR of about 6*B. Any signal component below the masking threshold cannot be perceived in any way and is therefore not needed for sound reproduction. This justifies dynamic quantization of the frequency samples: the Signal to Noise Ratio, SNR, is set through the number of bits used for quantizing, and that number is chosen according to the ratio between Signal and Mask (SMR). It is therefore sufficient to quantize the samples in each band with a number of bits such that:

SNR > SMR
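
As a rough illustration of this condition, the following Python sketch finds the smallest number of bits B for which SNR > SMR. It assumes SNR = 6*B, the first-order approximation used in this section; the function name and the default of 6 dB per bit are assumptions made for the example, not part of the standard.

```python
import math


def min_bits_for_smr(smr_db: float, db_per_bit: float = 6.0) -> int:
    """Smallest bit count B such that db_per_bit * B > smr_db.

    A negative SMR means the signal itself is masked, so no bits
    are needed for that subband.
    """
    if smr_db < 0:
        return 0
    # floor(...) + 1 guarantees a strict inequality even when
    # smr_db is an exact multiple of db_per_bit.
    return math.floor(smr_db / db_per_bit) + 1


if __name__ == "__main__":
    for smr in (-5.0, 0.0, 12.0, 13.0):
        print(smr, min_bits_for_smr(smr))
```

An SMR of 12 dB needs three bits under this approximation, because two bits would give an SNR of 12 dB, which equals the SMR rather than exceeding it.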

The quality of the audio, and the margin for audio editing, increase with the difference MNR = SNR - SMR, which depends only on the number of bits available to encode all 1152 samples, i.e. on the size of the FRAME, which is determined by the audio mode and the bit rate.

An optimum allocation is one that obtains MNR > 0 for all subbands with an audible signal. Since the frame has a fixed size, there is no guarantee that this can be achieved; and if it is achieved, some bits may remain available, which can be allocated at no cost to further increase the MNR value in all subbands.

The number of bits used to quantize the samples (BIT_ALLOCATION) for each subband is found by successive increments: at each step the band with the least MNR is selected and given more bits, thus obtaining an MNR that is as constant as possible among all subbands, while in any case trying to make it positive.
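
The selection rule described above can be sketched as a greedy loop in Python. This is a simplified illustration, not the allocation procedure of the standard: it assumes SNR = 6*B, charges 36 bits for each one-bit increase in a subband (1152 samples / 32 subbands = 36 samples per subband), and ignores the cost of scaling factors and other side information. The names `allocate_bits`, `bit_budget` and `max_bits` are hypothetical.

```python
def allocate_bits(smr_db, bit_budget, samples_per_subband=36,
                  max_bits=16, db_per_bit=6.0):
    """Greedy allocation: repeatedly give one more bit per sample
    to the subband whose MNR is currently the smallest."""
    bits = [0] * len(smr_db)
    remaining = bit_budget

    def mnr(i):
        # MNR = SNR - SMR, with SNR approximated as db_per_bit * bits
        return db_per_bit * bits[i] - smr_db[i]

    while remaining >= samples_per_subband:
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break  # every subband already uses the maximum word length
        worst = min(candidates, key=mnr)
        bits[worst] += 1
        remaining -= samples_per_subband
    return bits


if __name__ == "__main__":
    # Three subbands only, to keep the example readable.
    print(allocate_bits([20.0, 5.0, -10.0], bit_budget=180))
```

Note that the loop keeps spending bits even after every MNR is positive, which mirrors the remark that leftover bits can be used to raise the MNR further.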

Note also that psychoacoustical models, even though common for all three layers, use a different accuracy per layer, as more accuracy implies more complexity and thus increased cost for hardware devices to perform encoding in real time.

Encoding syntax eliminates a portion of the irrelevance of the audio signal, as variations less than 6 dB cannot practically be perceived.

ENCODING ALGORITHM:

The filter bank obtains the frequency representation of the input samples, mapped in 32 subbands with equal bandwidth
Computation of scaling factors and scaling indices is performed
The psychoacoustical model computes the masking threshold and supplies the SMR values for each subband
The number of quantization steps for each subband is computed so that quantization noise is minimal (using the results of the psychoacoustical model)
Samples are normalized (divided by the THRESHOLD FACTOR)
Samples are quantized
Data structures are sent to a module which formats them as frames and adds auxiliary information (such as a code for error detection or auxiliary data), thus yielding the final compressed sequence in the MPEG syntax
The sequence of frames at the encoder output can thus be stored or transmitted.
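
The normalization and quantization steps of the list above can be pictured with the following Python sketch for a single subband. It is only a schematic under stated assumptions: the normalization factor is taken as the peak absolute value of the samples, whereas an actual encoder selects it from a table of scaling factors, and the quantizer is a plain uniform one with 2^bits - 1 levels. The function names are hypothetical.

```python
def quantize_subband(samples, bits):
    """Normalize the samples of one subband and quantize them uniformly.

    Returns the normalization factor and the integer codes.
    """
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    # Assumed normalization factor: peak magnitude (1.0 if all samples are 0).
    scale = max(abs(s) for s in samples) or 1.0
    half = (2 ** bits - 2) // 2  # codes lie in the range -half .. +half
    codes = [round(s / scale * half) for s in samples]
    return scale, codes


def dequantize_subband(scale, codes, bits):
    """Inverse operation, as a decoder would perform it."""
    half = (2 ** bits - 2) // 2
    return [c / half * scale for c in codes]


if __name__ == "__main__":
    scale, codes = quantize_subband([0.4, -1.0, 0.25], bits=4)
    print(scale, codes)
    print(dequantize_subband(scale, codes, bits=4))
```

With more bits the step size shrinks, and the reconstruction error, which is the quantization noise, shrinks with it.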

The complexity of the encoder depends on the layer.
