The encoder produces the encoded sequence from the PCM samples.
These quantities have the following relation:
The efficiency of encoding depends mainly on the accuracy of the computations of the psychoacoustical model, which can be seen as a computational system that takes the 1152 time samples as input and gives as output the SMR value in dB for every subband and channel. The standard allows the use of any model, which leaves open the possibility of updating the encoder with more accurate and efficient psychoacoustical models that may become available in the future. Note that this will not make existing sequences obsolete, as the decoding process does not depend on the encoding model. What the psychoacoustical model must accomplish, however, is well defined: it must give the SMR (Signal to Mask Ratio, the difference in dB between the maximum signal and the minimum masking level) for every channel and every subband.
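As a small illustration of the SMR definition given above, the following sketch computes the SMR of one subband from per-line signal levels and masking levels. The function name and the two input lists are hypothetical; how a real psychoacoustical model derives these levels from the 1152 samples is not shown here.

```python
def smr_db(signal_levels_db, mask_levels_db):
    """SMR of one subband, per the definition in the text:
    maximum signal level minus minimum masking level, both in dB.

    Both arguments are hypothetical inputs: lists of levels (in dB)
    for the spectral lines that fall inside the subband.
    """
    return max(signal_levels_db) - min(mask_levels_db)


# Example: strongest signal line at 72 dB, lowest masking level at 48 dB.
example_smr = smr_db([60.0, 72.0, 65.0], [50.0, 48.0, 55.0])
print(example_smr)
```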
Schematic sequence of the function of the perceptive model for every monophonic channel:
Fig. 2.1 Diagram of the qualitatively best values for Signal, Mask and Noise.

The figure illustrates that SMR directly influences the BIT_ALLOCATION value and that SNR depends solely on the number of quantization steps for the subband. BIT_ALLOCATION follows from the SMR value because it is useless to quantize with a number of steps that introduces a quantization noise much smaller than the noise already present in each subband (because of signal characteristics); noise here means all the signal below the masking threshold. Recall that the quantization noise decreases by about 6 dB for every additional quantization bit. In fact, any signal below the masking threshold cannot be perceived in any way and is therefore not needed for sound reproduction. This justifies dynamic quantization of the frequency samples: the Signal-to-Noise Ratio (SNR), which is a function of the number of bits used for quantizing, is chosen according to the Signal-to-Mask Ratio (SMR). It is therefore sufficient to quantize the samples in each band with a number of bits such that:

SNR > SMR
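The link between the number of quantization bits and the SNR can be sketched as follows. This is only a rough illustration: it assumes the usual rule of thumb for a uniform quantizer (each bit adds 20*log10(2), about 6.02 dB, of SNR), whereas an actual encoder would typically look up the SNR for each quantizer in a table. The function names and the `max_bits` limit are hypothetical.

```python
import math

# Rule-of-thumb SNR gain per quantization bit (assumption, about 6.02 dB).
DB_PER_BIT = 20 * math.log10(2)


def approx_snr_db(bits):
    """Approximate SNR in dB of a uniform quantizer using `bits` bits."""
    return DB_PER_BIT * bits


def min_bits_for_smr(smr_db, max_bits=15):
    """Smallest bit count whose approximate SNR reaches the given SMR.

    A subband whose SMR is not positive lies entirely below the masking
    threshold, so it needs no bits at all. `max_bits` is an assumed cap.
    """
    if smr_db <= 0:
        return 0
    return min(math.ceil(smr_db / DB_PER_BIT), max_bits)


# A subband with SMR = 20 dB needs 4 bits under this approximation,
# since 3 bits give about 18.1 dB and 4 bits give about 24.1 dB.
print(min_bits_for_smr(20.0), round(approx_snr_db(4), 1))
```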
The quality of the audio, and the margin for audio editing, is greater the larger the difference MNR = SNR - SMR, and this depends solely on the number of bits available to encode all 1152 samples, i.e. on the size of the FRAME, which is determined by the audio mode and the bit rate.
An optimum allocation is one that obtains MNR > 0 for all subbands with an audible signal. Since the frame has a fixed size, there is no guarantee that this can be achieved; conversely, if it is achieved, some bits may remain available, and these can be allocated at no cost to further increase the MNR in all subbands.
The number of bits used to quantize the samples (BIT_ALLOCATION) for each subband is found incrementally: the band with the least MNR is selected and given more bits, and this is repeated so as to obtain an MNR value as constant as possible among all subbands and, in any case, to try to make it positive.
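The allocation procedure described above can be sketched as a greedy loop. This is a simplified illustration, not the allocation procedure of the standard: it assumes about 6.02 dB of SNR per bit, 36 samples per subband (1152 samples over 32 subbands), one-bit allocation steps, and it ignores the bits needed for scalefactors and other side information. All names are hypothetical.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # assumed SNR gain per bit, about 6.02 dB


def allocate_bits(smr_db, bit_budget, samples_per_subband=36, max_bits=15):
    """Greedy allocation: keep giving one more bit per sample to the
    subband with the lowest MNR = SNR - SMR until the budget runs out."""
    alloc = [0] * len(smr_db)
    remaining = bit_budget

    def mnr(i):
        return DB_PER_BIT * alloc[i] - smr_db[i]

    while remaining >= samples_per_subband:
        # Only subbands that have not reached the assumed cap can grow.
        candidates = [i for i in range(len(smr_db)) if alloc[i] < max_bits]
        if not candidates:
            break
        worst = min(candidates, key=mnr)
        alloc[worst] += 1
        remaining -= samples_per_subband

    return alloc, [mnr(i) for i in range(len(smr_db))]


# Three subbands, budget for five one-bit steps (5 * 36 bits).
alloc, mnr_values = allocate_bits([20.0, 5.0, -10.0], bit_budget=180)
print(alloc)
```

In this example the third subband starts with a positive MNR and receives no bits, while the other two are raised until the budget is exhausted.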
Note also that the psychoacoustical models, although common to all three layers, are used with a different accuracy in each layer, as more accuracy implies more complexity and thus increased cost for hardware devices that perform encoding in real time.
Encoding syntax eliminates a portion of the irrelevance of the audio signal, as variations less than 6 dB cannot practically be perceived.
ENCODING ALGORITHM: