DIGITAL AUDIO: The Sony/Philips ADPCM Audio Coder

Fabio Minazzi
Fall 1995
© IEEE Computer Society Press.

3 The encoder and the compression techniques involved

The CD-I audio encoder belongs to the category of medium-complexity coders, since the data to be stored on the disc are processed in 4 simple stages, each involving a different processing technique:

Subsampling,
Linear Switched Prediction (SP),
Adaptive Quantization Forward (AQF),
Noise Feedback Coding (NFC).

Since the theory and application of sampling frequency conversion are well known and documented (see, for example, [CR] and [BY]), we shall focus on techniques 2 to 4. Level C audio requires subsampling by a factor of 2 (from 37800 to 18900 Hz). Figure 1 shows a detailed view of the encoder.

Figure 1: The ADPCM Encoder

Because of the low number of bits per sample used to represent the signal (Table 2), a linear filtering stage must be introduced before quantization, so that the quantizer is fed a reduced version of the input signal, stripped of some of its redundancy. This removal, which could be complete if infinite-length filters were used, is only partial in practice, where only finite-order filters can be used. Moreover, for a given filter, the degree of redundancy removal depends on the characteristics of the input signal x(n). Since this is expressed in terms of better or worse prediction of the signal by the filters, the filtering stage is called prediction, and its output (from now on d(n)) is called the "prediction error".
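The idea of redundancy removal can be illustrated with a minimal sketch. It assumes a second-order predictor p(n) = K1*x(n-1) + K2*x(n-2), so that d(n) = x(n) - p(n), with zero initial history and no quantization. The values K1 = 60/64, K2 = 0 are an assumption (the pair commonly used for filter 1 by CD-ROM XA ADPCM decoders), and the function names are hypothetical. The same filter is applied to a slowly varying sine and to white noise, to show that the amount of redundancy removed depends on x(n).

```python
import math
import random

def prediction_error(x, k1, k2):
    """d(n) = x(n) - (k1*x(n-1) + k2*x(n-2)), with zero initial history."""
    d = []
    x1 = x2 = 0.0  # x(n-1) and x(n-2)
    for xn in x:
        d.append(xn - (k1 * x1 + k2 * x2))
        x2, x1 = x1, xn
    return d

def rms(v):
    return math.sqrt(sum(s * s for s in v) / len(v))

FS = 37800.0           # level A/B sampling frequency (Hz)
N = 2800               # number of samples in the test signals
K1, K2 = 60 / 64, 0.0  # assumed coefficients (first-order predictor)

# Highly correlated input: a 200 Hz sine
sine = [math.sin(2 * math.pi * 200.0 * n / FS) for n in range(N)]
# Uncorrelated input: uniform white noise (fixed seed for repeatability)
rng = random.Random(1)
noise = [rng.uniform(-1.0, 1.0) for _ in range(N)]

for name, x in (("sine", sine), ("noise", noise)):
    ratio = rms(prediction_error(x, K1, K2)) / rms(x)
    print(f"{name}: rms(d)/rms(x) = {ratio:.3f}")
```

With these parameters the sine gives a ratio well below 1 (most of its energy is predictable), while the noise gives a ratio above 1: for an uncorrelated signal the predictor has nothing to remove and only adds the weighted past samples.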

The transfer functions of the 4 prediction filters of the CD-I audio coder are:

It is important to note that CD-I encoding works in the digital domain with two possible sampling frequencies (37.8 kHz and 18.9 kHz), so the prediction filters behave differently for level A and B coding than for level C coding, as shown in figure 2.

Figure 2: Predictor transfer functions versus frequency (Hz) for (a) Fs = 37.8 kHz, (b) Fs = 18.9 kHz

Note that the characteristics at the two sampling frequencies can differ by as much as 20 dB. This partly accounts for the poorer performance of level C ADPCM audio.
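The reason the same coefficients behave differently at the two rates is the digital frequency w = 2*pi*f/Fs: halving Fs doubles w for a given analog frequency f, so each filter's characteristic is compressed onto half the audio band. The sketch below evaluates this, assuming that the characteristic of interest is the magnitude of the prediction-error filter A(z) = 1 - K1*z^-1 - K2*z^-2, and assuming the four (K1, K2) pairs used by common CD-ROM XA ADPCM decoders. Both assumptions and the function name are hypothetical and are not read off figure 2.

```python
import cmath
import math

# Assumed (K1, K2) pairs, as used by common CD-ROM XA ADPCM decoders
FILTERS = [(0.0, 0.0), (60 / 64, 0.0), (115 / 64, -52 / 64), (98 / 64, -55 / 64)]

def error_filter_gain_db(k1, k2, f, fs):
    """|A(e^jw)| in dB for A(z) = 1 - k1*z^-1 - k2*z^-2, at analog frequency f (Hz)."""
    w = 2 * math.pi * f / fs  # digital frequency depends on the sampling rate
    a = 1 - k1 * cmath.exp(-1j * w) - k2 * cmath.exp(-2j * w)
    return 20 * math.log10(abs(a))

for idx, (k1, k2) in enumerate(FILTERS):
    for f in (100.0, 1000.0, 5000.0):
        g_ab = error_filter_gain_db(k1, k2, f, 37800.0)  # levels A and B
        g_c = error_filter_gain_db(k1, k2, f, 18900.0)   # level C
        print(f"filter {idx}, {f:6.0f} Hz: {g_ab:7.2f} dB at 37.8 kHz, {g_c:7.2f} dB at 18.9 kHz")
```

For filter 1 at 1 kHz, for example, the attenuation is roughly 15 dB at 37.8 kHz but only about 10 dB at 18.9 kHz; at f = 0 both rates give the same value, since w = 0 in either case.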

To understand the logic behind the choice of the best filter, it is worth making a short digression on adaptive systems, which also introduces some concepts used in the following chapters.

Let us therefore consider the two general classes of adaptive systems [WS]: "open loop" and "closed loop". The difference between the two schemes is that in the open-loop scheme the adaptation is performed on the basis of "a priori" knowledge about the system operation, while in the closed-loop scheme the adaptation is based on the "trial and error" concept. Error control is performed using the processed signal, a reference signal (which can also be the input signal), and information coming from other sources. From these data one or more control parameters are produced, whose nature is specific to the adaptive system in question.

Figure 3: The encoder's prediction stage

In the prediction stage of the CD-I encoder (figure 3):

the control parameter is derived from the input signal and the predicted signal; more precisely, it is the peak of their difference d(n) over a group of 28 samples,
the signal processing is performed with the filter units shown in figure 1.3,
the adaptation is restricted to a set of only 4 filters, so the system is said to be "switched" rather than adaptive.
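The switched, trial-and-error choice can be sketched as follows. The sketch assumes that, for each group of 28 samples, the encoder computes d(n) with each of the 4 filters and keeps the one with the smallest peak |d(n)|, and it assumes the (K1, K2) pairs used by common CD-ROM XA ADPCM decoders. Quantization is ignored and prediction uses the input samples directly; choose_filters is a hypothetical name.

```python
# Assumed (K1, K2) pairs, as used by common CD-ROM XA ADPCM decoders
FILTERS = [(0.0, 0.0), (60 / 64, 0.0), (115 / 64, -52 / 64), (98 / 64, -55 / 64)]
GROUP = 28  # samples per group, as stated in the text

def choose_filters(x):
    """For each 28-sample group, return the index of the filter whose
    prediction error d(n) has the smallest peak magnitude."""
    choices = []
    x1 = x2 = 0.0  # history x(n-1), x(n-2), carried across groups
    for start in range(0, len(x), GROUP):
        block = x[start:start + GROUP]
        peaks = []
        for k1, k2 in FILTERS:  # try every filter on the same group
            h1, h2, peak = x1, x2, 0.0
            for xn in block:
                peak = max(peak, abs(xn - (k1 * h1 + k2 * h2)))
                h2, h1 = h1, xn
            peaks.append(peak)
        # keep the filter with the smallest peak (first one on ties)
        choices.append(min(range(len(FILTERS)), key=peaks.__getitem__))
        for xn in block:  # advance the history past this group
            x2, x1 = x1, xn
    return choices

print(choose_filters([1.0] * 56))
```

For a constant input, all filters tie in the first group (the first sample has zero history, so d equals x for every filter) and the first one is kept; from the second group on, filter 2 gives the smallest peak.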

Although the limited adaptation outlined above introduces a non-ideality, since the filter coefficients lie only occasionally at the minimum of the performance curve [WS] whatever performance parameter is chosen, Tuyt has shown that this does not entail deviations from the ideal case [Tu].

In summary, in the switched prediction stage of the CD-I audio encoder the coefficients (K1 and K2) used for filtering are chosen every 28 samples, and only the number of the chosen filter (0 to 3) is written to the output channel. This solution is simple and robust because it avoids transmitting the filter coefficients.
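Why the filter number alone is enough can be sketched with a matching encoder/decoder pair: since both sides hold the same (assumed) coefficient table, the decoder rebuilds x(n) = d(n) + K1*x(n-1) + K2*x(n-2) from d(n) and the per-group index. This is a simplification under stated assumptions: the (K1, K2) values are those used by common CD-ROM XA ADPCM decoders, quantization (AQF, NFC) is left out so the round trip is exact up to floating-point rounding, and encode/decode are hypothetical names.

```python
# Assumed coefficient table, held by both encoder and decoder (never transmitted)
FILTERS = [(0.0, 0.0), (60 / 64, 0.0), (115 / 64, -52 / 64), (98 / 64, -55 / 64)]
GROUP = 28  # samples per group

def encode(x, indices):
    """Prediction error d(n), using filter indices[g] for group g."""
    d, x1, x2 = [], 0.0, 0.0
    for n, xn in enumerate(x):
        k1, k2 = FILTERS[indices[n // GROUP]]
        d.append(xn - (k1 * x1 + k2 * x2))
        x2, x1 = x1, xn
    return d

def decode(d, indices):
    """Rebuild x(n) = d(n) + K1*x(n-1) + K2*x(n-2) from d and the filter numbers."""
    x, x1, x2 = [], 0.0, 0.0
    for n, dn in enumerate(d):
        k1, k2 = FILTERS[indices[n // GROUP]]
        xn = dn + (k1 * x1 + k2 * x2)
        x.append(xn)
        x2, x1 = x1, xn
    return x
```

The side information is one index in the range 0 to 3 per group, i.e. 2 bits of information every 28 samples (about 0.07 bit/sample), instead of two coefficients per group.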
