DIGITAL AUDIO: Notes about Audio Standards

Goffredo Haus editor
Fall 1995
© IEEE Computer Society Press.

3.6. AES3-1992 (ANSI S4.40-1992) "AES-EBU"

Serial transmission format for two-channel linearly represented digital audio data,
Revision of AES3-1985 (ANSI S4.40-1985)

Description

The format provides for the serial digital transmission of two channels of periodically sampled and uniformly quantized audio signals on a single shielded twisted wire pair. The transmission rate is such that samples of audio data, one from each channel, are transmitted in time-division multiplex in one sample period. Provision is made for the transmission of both user and interface-related data as well as of timing-related data, which may be used for editing and other purposes. It is expected that the format will be used to convey audio data that have been sampled at any of the sampling frequencies recognized by AES5, "Recommended Practice for Professional Digital Audio Applications Employing Pulse-Code Modulation - Preferred Sampling Frequencies".

Contents

Scope
Interface format (Terminology: Sampling frequency, Audio sample word, Auxiliary sample bits, Validity bit, Channel status, User data, Parity bit, Preambles, Subframe, Frame, Block, Channel coding; Structure of format: Subframe format, Frame format; Channel coding; Preambles, Validity bit)
User data format
Channel status format
Interface format implementation (General; Transmitter: "Minimum", "Standard", and "Enhanced" implementation of channel status)
Electrical Requirements (General characteristics, Line driver characteristics, Line receiver characteristics, Connectors)
Normative References
Provision of additional, voice-quality channels via the digital audio interface
Generation of CRCC (byte 23) for channel status

Detailed Description

Scope: This AES3-1992 standard specifies a recommended interface for the serial digital transmission of two channels of periodically sampled and linearly represented digital audio data from one transmitter to one receiver.
It is expected that the format will be used to convey audio data sampled at any of the sampling frequencies recognized by AES5, "Recommended Practice for Professional Digital Audio Applications Employing Pulse-Code Modulation - Preferred Sampling Frequencies". AES3-1992 was designed as a self-synchronizing digital interface that can also carry additional data identifying, for example, the type of digital audio being carried, whether emphasis has been applied, copy protection information, two time codes, and a number of other useful identification bits.
A first step towards AES3-1992 was AES3-1985, "AES Recommended Practice for Digital Audio Engineering - Serial Transmission Format for Linearly Represented Digital Audio Data" (also described in ANSI S4.40-1985 and IEC 958). AES3-1992 is a revision of AES3-1985. Because the EBU (European Broadcasting Union) has also republished the AES3 document, digital I/O ports that conform to this standard are commonly called "AES/EBU-format" ports.
The format is intended for use with shielded twisted-pair cable of conventional design over distances of up to 100 m without transmission equalization or any special equalization at the receiver. Longer cable runs may be used, but they demand rapidly increasing care in cable selection and may require receiver equalization, active repeaters, or both.
The AES3-1992 standard does not cover connection to any common carrier equipment, nor does it specifically address any questions about the synchronizing of large systems, although by its nature the format permits easy synchronization of receiving devices to the transmitting device.
Specific synchronization issues are covered in AES11.
In this interface specification, mention is made of an interface for consumer use. The two interfaces are not identical.

Interface format:

Terminology

sampling frequency: Frequency of the samples representing an audio signal. When more than one audio signal is transmitted through the same interface, the sampling frequencies are identical.
audio sample word: Amplitude of a digital audio sample. Representation is linear in 2's complement binary form. Positive numbers correspond to positive analog voltages at the input of the analog-to-digital converter (ADC). The number of bits per word can be specified from 16 to 24 in two coding ranges (less than or equal to 20 bits and less than or equal to 24 bits).
auxiliary sample bits: 4 least significant bits (LSBs) which can be assigned as auxiliary sample bits and used for auxiliary information when the number of audio sample bits is less than or equal to 20.
validity bit: Bit indicating whether the audio sample bits in the subframe (time slots 4 to 27 or 8 to 27, depending on the audio word length as described below) are suitable for conversion to an analog audio signal.
channel status: Bits carrying, in a fixed format derived from the block (see below), information associated with each audio channel which is decodable by any interface user.
user data: Channel provided to carry any other information.
parity bit: Bit provided to permit the detection of an odd number of errors resulting from malfunctions in the interface.
preambles: Specific patterns used for synchronization. There are three different preambles (see below).
subframe: Fixed structure used to carry the information described in the terms above, from sampling frequency through preambles (see also Subframe format and Frame format below).
frame: Sequence of two successive and associated subframes.
block: Group of 192 consecutive frames. The start of a block is designated by a special subframe preamble (see below).
channel coding: Coding describing the method by which the binary digits are represented for transmission through the interface.

Summary of the structure of format

As shown later, the data are divided into two subframes of 32 bits each, transmitted in one sample period at the standard sampling frequencies of 48 kHz, 44.1 kHz, or 32 kHz. Each subframe comprises a 4-bit preamble (sync group); 4 bits of auxiliary audio data (or 4 additional audio bits, for a maximum of 24, if the sample word extends beyond the default 20 bits); 20 bits representing the sampled digital audio; and four ancillary bits for validity, user data, channel status, and parity. The data are sent LSB (least significant bit) first, with subframes alternating between channel #1 and channel #2. The channel code used to transmit the data is biphase mark, from the family of self-clocking Manchester codes (binary frequency modulation).
Of the four ancillary bits, the validity bit indicates whether the preceding audio sample bits are error free (logic 0 = valid; logic 1 = defective) and therefore suitable for conversion to an analog audio signal. The parity bit is set to give even parity over time slots 4 to 31 of the current subframe, which enables simple detection of transmission errors. The user bit can be used freely by the sending equipment to carry hardware- or system-specific information. The channel status information is of major significance. It is sent over the AES3 interface in blocks of 192 bits (24 eight-bit bytes), which at a sampling frequency of 48 kHz repeat every 4 ms. As we shall see, it is the channel status that carries the operationally useful data concerning copy protection, emphasis, sampling frequency, and a host of other information.
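The figures above follow from simple arithmetic: a frame holds two 32-bit subframes and is sent once per sample period, and a channel status block spans 192 frames. The Python sketch below (the function name `interface_rates` is hypothetical, for illustration only) computes the resulting bit rate, the biphase-mark line state rate (two states per time slot), and the channel status block period.

```python
from fractions import Fraction

def interface_rates(fs_hz: int):
    """Return (bit rate in bit/s, line state rate in states/s, block period in s)."""
    bits_per_frame = 2 * 32              # two subframes of 32 time slots each
    bit_rate = bits_per_frame * fs_hz
    state_rate = 2 * bit_rate            # biphase mark: two states per time slot
    block_period = Fraction(192, fs_hz)  # 192 frames per channel status block
    return bit_rate, state_rate, block_period

for fs in (48_000, 44_100, 32_000):
    bit_rate, state_rate, period = interface_rates(fs)
    print(f"{fs} Hz: {bit_rate} bit/s, {state_rate} states/s, block {float(period) * 1e3:.3f} ms")
```

At 48 kHz this gives 3.072 Mbit/s and a block period of exactly 4 ms, matching the repeat interval quoted above.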
Subframes are separated by a unique 4-bit preamble, or sync group, which marks the start of each data sequence. Three unambiguous preambles are used, designating: the start of subframe A (channel #1), and hence of a frame, that also begins a new channel status block; the start of subframe A, and hence of a frame; and the start of subframe B (channel #2). These are preambles "Z", "X", and "Y" respectively (see Frame format).

Subframe format

Each subframe is divided into 32 time slots, numbered from 0 to 31 (see Figure 6).

Figure 6 - Subframe format of AES3

Time slots 0 to 3 (preambles) carry one of the three permitted preambles (see frame format and preambles; also see Figure 7).
Time slots 4 to 27 (audio sample word) carry the audio sample word in linear 2's complement representation. The most significant bit (MSB) is carried by time slot 27.
When a 24-bit coding range is used, the LSB is in time slot 4 (see Figure 6a). When a 20-bit coding range is sufficient, time slots 8 to 27 carry the audio sample word with the LSB in time slot 8. Time slots 4 to 7 may be used for other applications. Under these circumstances, the bits in time slots 4 to 7 are designated auxiliary sample bits (see Figure 6b).
If the source provides fewer bits than the interface allows (either 20 or 24), the unused LSBs are set to logic 0.
Time slot 28 (validity bit) carries the validity bit associated with the audio sample word (see validity bit below).
Time slot 29 (user data bit) carries 1 bit of the user data channel associated with the audio channel transmitted in the same subframe (see User data format).
Time slot 30 (channel status bit) carries 1 bit of the channel status information associated with the audio channel transmitted in the same subframe (see Channel status format).
Time slot 31 (parity bit) carries a parity bit such that time slots 4 to 31 inclusive will carry an even number of ones and an even number of zeros (even parity).
NOTE - The preambles have even parity as an explicit property.
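The slot assignments above can be illustrated by packing time slots 4 to 31 of one subframe. This is a minimal Python sketch, not a reference implementation: `pack_subframe` is a hypothetical name, the preamble (slots 0 to 3) is left out because it is a pattern of line states rather than data bits, and the caller is assumed to have already left-justified shorter words so that unused LSBs are logic 0.

```python
def pack_subframe(sample: int, *, coding_range: int = 24, aux: int = 0,
                  validity: int = 0, user: int = 0, status: int = 0) -> list[int]:
    """Return the 28 bits for time slots 4..31 in transmission order (slot 4 first)."""
    if coding_range == 24:
        if not -(1 << 23) <= sample < (1 << 23):
            raise ValueError("sample out of 24-bit range")
        word = sample & 0xFFFFFF                        # 24-bit 2's complement, LSB in slot 4
    elif coding_range == 20:
        if not -(1 << 19) <= sample < (1 << 19):
            raise ValueError("sample out of 20-bit range")
        word = ((sample & 0xFFFFF) << 4) | (aux & 0xF)  # aux bits in slots 4-7, LSB in slot 8
    else:
        raise ValueError("coding range must be 20 or 24")
    slots = [(word >> i) & 1 for i in range(24)]        # slots 4..27; slot 27 holds the MSB
    slots += [validity & 1, user & 1, status & 1]       # slots 28, 29, 30
    slots.append(sum(slots) % 2)                        # slot 31: even parity over slots 4..31
    return slots
```

For example, `pack_subframe(-1)` sets all 24 audio slots to 1, an even count, so the parity slot is 0.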

Frame format

A frame is uniquely composed of two subframes (see Figure 7). The rate of transmission of frames corresponds exactly to the source sampling frequency.

Figure 7 - Frame format of AES3

The first subframe normally starts with preamble "X". However, the preamble changes to preamble "Z" once every 192 frames. This defines the block structure used to organize the channel status information. The second subframe always starts with preamble "Y".
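The block structure can be expressed as a rule for choosing preambles. In this hypothetical helper, frames are counted from 0 starting at a "Z" frame (an assumption about how a transmitter numbers its frames). The same frame count also gives the position of the channel status bit carried in that frame, since each subframe carries one channel status bit and a block spans 192 frames.

```python
def preamble_for(frame_index: int, subframe: int) -> str:
    """Preamble that starts the given subframe (1 or 2) of the given frame."""
    if subframe == 2:
        return "Y"
    if subframe != 1:
        raise ValueError("subframe must be 1 or 2")
    return "Z" if frame_index % 192 == 0 else "X"

def channel_status_position(frame_index: int) -> tuple[int, int]:
    """(byte, bit) of the channel status bit carried in this frame's time slot 30."""
    n = frame_index % 192
    return n // 8, n % 8
```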
The modes of transmission are signaled by setting bits 0 to 3 of byte 1 of channel status.
Two-channel mode: In two-channel mode, the samples from both channels are transmitted in consecutive subframes. Channel 1 is in subframe 1, and channel 2 is in subframe 2.
Stereophonic mode: In stereophonic mode, the interface is used to transmit stereophonic audio in which the two channels are presumed to have been simultaneously sampled. The left, or "A", channel is in subframe 1, and the right, or "B," channel is in subframe 2.
Single-channel mode (monophonic): In monophonic mode, the transmitted bit rate remains at the normal two-channel rate and the audio sample word is placed in subframe 1. Time slots 4 to 31 of subframe 2 either carry the bits identical to subframe 1 or are set to logic 0. A receiver normally defaults to channel 1 unless manual override is provided.
Primary/secondary mode: In some applications requiring two channels where one of the channels is the main or primary channel while the other is a secondary channel, the primary channel is in subframe 1, and the secondary channel is in subframe 2.

Channel coding

To minimize the direct-current (dc) component on the transmission line, to facilitate clock recovery from the data stream, and to make the interface insensitive to the polarity of connections, time slots 4 to 31 are encoded in biphase-mark.
Each bit to be transmitted is represented by a symbol comprising two consecutive binary states. The first state of a symbol is always different from the second state of the previous symbol. The second state of the symbol is identical to the first if the bit to be transmitted is logic 0. However, it is different if the bit is logic 1 (see Figure 8).

Figure 8 - Channel coding of AES3
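The coding rule above translates directly into code. The Python sketch below (hypothetical function names) encodes bits as pairs of states and decodes them by comparing the two states of each symbol. Because decoding looks only at whether a symbol contains a transition, inverting every state, as a polarity reversal in the cable would, leaves the decoded bits unchanged.

```python
def biphase_mark_encode(bits, last_state: int = 0) -> list[int]:
    """Encode bits as two states each; last_state is the second state of the previous symbol."""
    states = []
    level = last_state
    for bit in bits:
        level ^= 1              # first state always differs from the previous symbol's second state
        states.append(level)
        if bit:
            level ^= 1          # logic 1: second state differs from the first
        states.append(level)    # logic 0: second state equals the first
    return states

def biphase_mark_decode(states) -> list[int]:
    """Recover bits: a symbol whose two states differ carries logic 1."""
    return [int(states[i] != states[i + 1]) for i in range(0, len(states), 2)]
```

With `last_state=0`, `biphase_mark_encode([0, 1])` yields `[1, 1, 0, 1]`: a transition at every symbol boundary, plus one in the middle of the symbol for the 1.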

Preambles

Preambles are specific patterns providing synchronization and identification of the subframes and blocks.
To achieve synchronization within one sampling period and to make this process completely reliable, these patterns violate the biphase-mark code rules, thereby avoiding the possibility of data imitating the preambles.
A set of three preambles is used. These preambles are transmitted in the time allocated to four time slots at the start of each subframe (time slots 0 to 3), and are represented by eight successive states. The first state of the preamble is always different from the second state of the previous symbol (representing the parity bit). Depending on this state the preambles are as in Table VIII.

Table VIII.

Like biphase code, these preambles are dc free and provide clock recovery. They differ in at least two states from any valid biphase sequence.
Figure 9 represents preamble "X".

Figure 9 - Preamble X (11100010) of AES3

NOTE - Owing to the even-parity bit in time slot 31, all preambles will start with a transition in the same direction (see Subframe format). Thus only one of these sets of preambles will, in practice, be transmitted through the interface. However, it is necessary for either set to be decodable because a polarity reversal might occur in the connection.
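For illustration, the sketch below lists the three preambles for a preceding state of 0 (X = 11100010, as in Figure 9; Y = 11100100; Z = 11101000). It derives the set used after a preceding state of 1 by inverting every state, which is how the two sets of Table VIII relate. It also checks the properties stated above. Each pattern is dc free, with four states at each level. Each also contains a run of three equal states, which a biphase-mark sequence can never produce, because its runs are at most two states long. Names are hypothetical.

```python
PREAMBLES_AFTER_0 = {
    "X": "11100010",   # subframe 1
    "Y": "11100100",   # subframe 2
    "Z": "11101000",   # subframe 1, start of block
}

def preamble_states(kind: str, previous_state: int) -> list[int]:
    """Eight states of preamble X, Y or Z, given the last state sent before it."""
    states = [int(c) for c in PREAMBLES_AFTER_0[kind]]
    if previous_state:
        states = [1 - s for s in states]   # the other set is the state-by-state inverse
    return states

def longest_run(states) -> int:
    """Length of the longest run of identical consecutive states."""
    best = run = 1
    for prev, cur in zip(states, states[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best
```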

Validity bit

The validity bit is logic 0 if the audio sample word is suitable for conversion to an analog audio signal, and it is logic 1 if it is not. There is no default state for the validity bit.

User data format

User data bits may be used in any way desired by the user.
Possible formats for the user data channel are indicated by the channel status byte 1, bits 4-7.
The default value of the user data bit is logic 0.

Channel status format

The channel status for each audio signal carries information associated with that audio signal, and thus it is possible for different channel status data to be carried in the two subframes of the digital audio signal. Examples of information to be carried in the channel status are: length of audio sample words, number of audio channels, sampling frequency, sample address code, alphanumeric source and destination codes, and emphasis.
Channel status information is organized in 192-bit blocks, subdivided into 24 bytes (see Figure 10). The first bit of each block is carried in the frame with preamble "Z".

Figure 10 - Channel status data format of AES3

The specific organization follows, wherein the suffix 0 designates the first byte or bit.

Byte 0

bit 0
0  Consumer use of channel status block (see note below).
1  Professional use of channel status block.

bit 1
0  Normal audio mode.
1  Nonaudio mode.

bits 2-4
Encoded audio signal emphasis. States, listed as bit 2, bit 3, bit 4:
0 0 0  Emphasis not indicated. Receiver defaults to no emphasis with manual override enabled.
1 0 0  No emphasis. Receiver manual override is disabled.
1 1 0  50/15-µs emphasis. Receiver manual override is disabled.
1 1 1  International Telegraph and Telephone Consultative Committee (CCITT) J.17 emphasis (with 6.5-dB insertion loss at 800 Hz). Receiver manual override is disabled.
All other states of bits 2-4 are reserved and are not to be used until further defined.

bit 5
1  Source sampling frequency unlocked.
0  Default; source sampling frequency locked.

bits 6-7
Encoded sampling frequency. States, listed as bit 6, bit 7:
0 0  Sampling frequency not indicated. Receiver defaults to 48 kHz, and manual override or auto set is enabled.
0 1  48-kHz sampling frequency. Manual override or auto set is disabled.
1 0  44.1-kHz sampling frequency. Manual override or auto set is disabled.
1 1  32-kHz sampling frequency. Manual override or auto set is disabled.

NOTE - The significance of byte 0, bit 0 is such that a transmission from an interface conforming to IEC 958 "consumer use" can be identified, and a receiver conforming only to IEC 958 "consumer use" will correctly identify a transmission from a "professional-use" interface as defined in this standard. Connecting a "professional-use" transmitter to a "consumer-use" receiver, or vice versa, might result in unpredictable operation. Thus the following byte definitions apply only when bit 0 = logic 1 (professional use of the channel status block).
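A receiver's handling of byte 0 can be sketched as a table lookup. The Python below is illustrative only. It assumes bit n of a channel status byte is stored as `(value >> n) & 1`, so that bit 0 is the first bit received, and the returned field names are hypothetical. Following the note above, it decodes nothing beyond bit 0 when bit 0 indicates consumer use.

```python
EMPHASIS = {  # keyed by (bit 2, bit 3, bit 4)
    (0, 0, 0): "not indicated",
    (1, 0, 0): "none",
    (1, 1, 0): "50/15 us",
    (1, 1, 1): "CCITT J.17",
}
SAMPLING_HZ = {  # keyed by (bit 6, bit 7); None = not indicated (receiver defaults to 48 kHz)
    (0, 0): None,
    (0, 1): 48_000,
    (1, 0): 44_100,
    (1, 1): 32_000,
}

def decode_byte0(value: int) -> dict:
    def bit(n: int) -> int:
        return (value >> n) & 1
    if not bit(0):
        return {"professional": False}   # consumer format: not decoded here
    return {
        "professional": True,
        "audio": bit(1) == 0,
        "emphasis": EMPHASIS.get((bit(2), bit(3), bit(4)), "reserved"),
        "fs_locked": bit(5) == 0,
        "fs_hz": SAMPLING_HZ[(bit(6), bit(7))],
    }
```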

Byte 1

bits 0-3
Encoded channel mode. States, listed as bit 0, bit 1, bit 2, bit 3:
0 0 0 0  Mode not indicated. Receiver defaults to two-channel mode. Manual override is enabled.
0 0 0 1  Two-channel mode. Manual override is disabled.
0 0 1 0  Single-channel mode (monophonic). Manual override is disabled.
0 0 1 1  Primary/secondary mode (subframe 1 is primary). Manual override is disabled.
0 1 0 0  Stereophonic mode (channel 1 is left channel). Manual override is disabled.
0 1 0 1  Reserved for user-defined applications.
0 1 1 0  Reserved for user-defined applications.
1 1 1 1  Vector to byte 3. Reserved for future applications.
All other states of bits 0-3 are reserved and are not to be used until further defined.

bits 4-7
Encoded user bits management. States, listed as bit 4, bit 5, bit 6, bit 7:
0 0 0 0  Default; no user information is indicated.
0 0 0 1  192-bit block structure. Preamble "Z" indicates the start of block.
0 0 1 0  Reserved for the AES18 standard.
0 0 1 1  User defined.
All other states of bits 4-7 are reserved and are not to be used until further defined.
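Byte 1 can be decoded the same way. As before, this sketch assumes bit n is `(value >> n) & 1` and uses hypothetical names. A receiver would apply the two-channel default when the mode is "not indicated".

```python
CHANNEL_MODE = {  # keyed by (bit 0, bit 1, bit 2, bit 3)
    (0, 0, 0, 0): "not indicated",
    (0, 0, 0, 1): "two-channel",
    (0, 0, 1, 0): "single-channel",
    (0, 0, 1, 1): "primary/secondary",
    (0, 1, 0, 0): "stereophonic",
    (0, 1, 0, 1): "user-defined",
    (0, 1, 1, 0): "user-defined",
    (1, 1, 1, 1): "vector to byte 3",
}
USER_BITS = {  # keyed by (bit 4, bit 5, bit 6, bit 7)
    (0, 0, 0, 0): "no user information",
    (0, 0, 0, 1): "192-bit block",
    (0, 0, 1, 0): "AES18",
    (0, 0, 1, 1): "user defined",
}

def decode_byte1(value: int) -> tuple[str, str]:
    bits = tuple((value >> n) & 1 for n in range(8))
    mode = CHANNEL_MODE.get(bits[0:4], "reserved")
    user = USER_BITS.get(bits[4:8], "reserved")
    return mode, user
```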

Byte 2

bits 0-2
Encoded use of auxiliary sample bits. States, listed as bit 0, bit 1, bit 2:
0 0 0  Maximum audio sample word length is 20 bits (default). Use of auxiliary sample bits is not defined.
0 0 1  Maximum audio sample word length is 24 bits. Auxiliary sample bits are used for main audio sample data.
0 1 0  Maximum audio sample word length is 20 bits. Auxiliary sample bits in this channel are used to carry a single coordination signal (the signal coding used for the coordination channel is described in the final paragraph of this section).
0 1 1  Reserved for user-defined applications.
All other states of bits 0-2 are reserved and are not to be used until further defined.

bits 3-5
Encoded audio sample word length of the transmitted signal (see notes below). States, listed as bit 3, bit 4, bit 5:

bits 3 4 5   Word length if the maximum      Word length if the maximum
             length is 24 bits, as           length is 20 bits, as
             indicated by bits 0-2           indicated by bits 0-2
     0 0 0   Not indicated (default)         Not indicated (default)
     0 0 1   23 bits                         19 bits
     0 1 0   22 bits                         18 bits
     0 1 1   21 bits                         17 bits
     1 0 0   20 bits                         16 bits
     1 0 1   24 bits                         20 bits
All other states of bits 3-5 are reserved and are not to be used until further defined.

bits 6-7
Reserved and set to logic 0 until further defined.

NOTES - The default state of bits 3-5 indicates that the transmitter does not specify the number of active bits within the 20- or 24-bit coding range. The receiver should default to the maximum number of bits specified by the coding range and enable manual override or auto set.
A nondefault state of bits 3-5 indicates the number of bits within the 20- or 24-bit coding range that might be active. This is also an indirect statement of the number of LSBs that are certain to be inactive, which equals 20 or 24 minus the number corresponding to the bit state. The receiver should disable manual override and auto set for these bit states.
Irrespective of the audio sample word length indicated by any state of bits 3-5, the MSB is in time slot 27 of the transmitted subframe, as specified above.
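Byte 2 combines two fields, and the notes above explain how a receiver should interpret them together. The sketch below (hypothetical names, same bit-numbering assumption as before) returns the maximum word length, the number of possibly active bits, and the number of LSBs that are certain to be inactive. It raises an error for the user-defined and reserved states, which it does not attempt to interpret.

```python
MAX_LENGTH = {(0, 0, 0): 20, (0, 0, 1): 24, (0, 1, 0): 20}  # keyed by (bit 0, bit 1, bit 2)
ACTIVE_LENGTH = {  # keyed by (bit 3, bit 4, bit 5): (length if max is 24, length if max is 20)
    (0, 0, 1): (23, 19),
    (0, 1, 0): (22, 18),
    (0, 1, 1): (21, 17),
    (1, 0, 0): (20, 16),
    (1, 0, 1): (24, 20),
}

def decode_word_length(value: int) -> tuple[int, int, int]:
    """Return (maximum length, possibly active bits, LSBs certain to be inactive)."""
    b = [(value >> n) & 1 for n in range(8)]
    aux_state = (b[0], b[1], b[2])
    if aux_state not in MAX_LENGTH:
        raise ValueError("user-defined or reserved auxiliary-bit state")
    max_len = MAX_LENGTH[aux_state]
    len_state = (b[3], b[4], b[5])
    if len_state == (0, 0, 0):
        active = max_len                 # not indicated: default to the full coding range
    elif len_state in ACTIVE_LENGTH:
        active = ACTIVE_LENGTH[len_state][0 if max_len == 24 else 1]
    else:
        raise ValueError("reserved word-length state")
    return max_len, active, max_len - active
```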

Byte 3
bits 0-7

Vectored target byte from byte 1.

Reserved for future use as multichannel function description.

These bits are set to logic 0 at present.

Byte 4

bits 0-1
Digital audio reference signal (per AES11). States, listed as bit 0, bit 1:
0 0  Not a reference signal (default).
0 1  Grade 1 reference signal.
1 0  Grade 2 reference signal.
1 1  Reserved and not to be used until further defined.

bits 2-7
Reserved and set to logic 0 until further defined.

Byte 5
bits 0-7

Reserved and set to logic 0 until further defined.

Bytes 6-9

Alphanumeric channel origin data. The first character in the message is byte 6.

bits 0-7 (each byte)
7-bit International Organization for Standardization (ISO) 646, American Standard Code for Information Interchange (ASCII), data with no parity bit. LSBs are transmitted first, with logic 0 in bit 7. Nonprinting control characters (codes 01 to 1F hex and 7F hex) are not permitted. The default value is logic 0 (code 00 hex, ASCII null).

Bytes 10-13

Alphanumeric channel destination data. The first character in the message is byte 10.

bits 0-7 (each byte)
7-bit ISO 646 (ASCII) data with no parity bit. LSBs are transmitted first, with logic 0 in bit 7. Nonprinting control characters (codes 01 to 1F hex and 7F hex) are not permitted. The default value is logic 0 (code 00 hex, ASCII null).
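Packing and checking the four-character origin and destination fields can be sketched as follows. The function names are hypothetical. Treating trailing null bytes as unused padding is an assumption based on the default value of 00 hex. Serializing each byte LSB first is left to the transmitter's bit-level code.

```python
FORBIDDEN = set(range(0x01, 0x20)) | {0x7F}   # nonprinting control characters

def encode_label(text: str) -> bytes:
    """Return the 4 channel status bytes for an origin or destination label."""
    data = text.encode("ascii")                  # 7-bit ISO 646 / ASCII only
    if len(data) > 4 or any(b in FORBIDDEN for b in data):
        raise ValueError("label must be at most 4 printable ASCII characters")
    return data.ljust(4, b"\x00")                # pad with ASCII null (the default)

def decode_label(field: bytes) -> str:
    """Validate a 4-byte field and return its text without trailing nulls."""
    if len(field) != 4:
        raise ValueError("field must be 4 bytes")
    for b in field:
        if b & 0x80 or b in FORBIDDEN:           # bit 7 must be logic 0
            raise ValueError(f"invalid byte {b:#04x}")
    return field.rstrip(b"\x00").decode("ascii")
```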

Bytes 14-17

Local sample address code (32-bit binary with LSBs first). The value is that of the first sample of the current block.

bits 0-7 (each byte)
LSBs are transmitted first. The default value is logic 0.

NOTE - This has the same function as a recording index counter.

Bytes 18-21

Time-of-day sample address code (32-bit binary with LSBs first). The value is that of the first sample of the current block.

bits 0-7 (each byte)
LSBs are transmitted first. The default value is logic 0.

NOTE - This is the time of day laid down during the source encoding of the signal and remains unchanged during subsequent operations. A value of all zeros for the binary sample address code is, for transcoding to real time, or to time codes in particular, to be taken as midnight (i.e., 00 h, 00 min, 00 s, 00 frame). Transcoding of the binary number to any conventional time code requires accurate sample frequency information to provide a sample accurate time.
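Both sample address codes are 32-bit integers whose bytes are sent least significant first, so byte 14 (or 18) holds the least significant eight bits. The sketch below (hypothetical names) decodes such a field. As the note requires, it then uses the sampling frequency to turn a time-of-day address into hours, minutes, seconds, and a residual sample count; conversion to a particular time-code frame rate is left out. A 32-bit count spans about 24.9 hours at 48 kHz, so a full day fits.

```python
def sample_address(field: bytes) -> int:
    """Decode a 4-byte sample address code (bytes 14-17 or 18-21), least significant byte first."""
    if len(field) != 4:
        raise ValueError("field must be 4 bytes")
    return int.from_bytes(field, "little")

def time_of_day(address: int, fs_hz: int) -> tuple[int, int, int, int]:
    """(hours, minutes, seconds, samples) since midnight, which is address 0."""
    seconds, samples = divmod(address, fs_hz)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds, samples
```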

Byte 22

Flags identifying whether the information carried by the channel status data is reliable. For each group of bytes in the table below, the corresponding bit is set to logic 0 (default) if the information is reliable and to logic 1 if it is unreliable.

bits 0-3  Reserved and set to logic 0 until further defined.
bit 4     Bytes 0 to 5
bit 5     Bytes 6 to 13
bit 6     Bytes 14 to 17
bit 7     Bytes 18 to 21

Byte 23

Channel status data cyclic redundancy check character (CRCC).

bits 0-7
Generating polynomial: G(x) = x^8 + x^4 + x^3 + x^2 + 1.
The CRCC allows the receiver to verify correct reception of the entire channel status data block (bytes 0 to 22 inclusive). For serial implementations, the initial condition of all ones should be used in generating the check bits, with the LSB transmitted first. The default value is logic 0 for the "minimum" implementation of channel status only (see below).

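The CRCC computation can be sketched in Python. This sketch assumes that the serial description above (initial register of all ones, bits sent LSB first) is equivalent to the common bit-reflected, byte-wise CRC-8 form. In that form the polynomial 0x1D (x^8 + x^4 + x^3 + x^2 + 1 with the x^8 term implied) appears bit-reversed as 0xB8. Function names are hypothetical. A convenient property for testing: running the same computation over all 24 bytes, including a correct byte 23, yields 0.

```python
def channel_status_crcc(data: bytes) -> int:
    """CRC-8 with G(x) = x^8 + x^4 + x^3 + x^2 + 1, initial value all ones, LSB first."""
    crc = 0xFF
    for byte in data:
        crc ^= byte                      # bit 0 of each byte is the first bit on the line
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xB8  # 0x1D with its bit order reversed
            else:
                crc >>= 1
    return crc

def finalize_block(block: bytearray) -> bytearray:
    """Fill in byte 23 of a 24-byte channel status block from bytes 0-22."""
    if len(block) != 24:
        raise ValueError("channel status block must be 24 bytes")
    block[23] = channel_status_crcc(bytes(block[:23]))
    return block
```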
Interface format implementation:

General

To promote compatible operation between items of equipment built to this specification it is necessary to establish which information bits and operational bits need to be encoded and sent by a transmitter and decoded by an interface receiver.

Documentation shall be provided describing the channel status features supported by the interface transmitters and receivers.

Transmitter

Transmitters shall follow all the formatting and channel coding rules established in earlier sections of this specification including all notes therein. Along with the audio sample word, all transmitters shall correctly encode and transmit the validity bit, user bit, parity bit, and the three preambles. The channel status shall be encoded to one of the implementations given below.

The following three implementations are defined: "minimum," "standard," and "enhanced." These terms are used to communicate in a simple manner the level of implementation of the interface transmitter involving the many features of channel status. Irrespective of the level of implementation, all reserved states of bits defined in the Section above shall remain unchanged.

"Minimum" implementation of channel status

The "minimum" implementation represents the lowest level of implementation of the interface that meets the requirements of this specification document. In the "minimum" implementation, transmitters shall encode and transmit channel status byte 0 bit 0 with a state of logic 1 signifying "professional use of channel status block". All other channel status bits of byte 0 to byte 23 inclusive shall be transmitted with the default state of all logic 0's. In this circumstance, the receiver will adopt the default conditions specified in bytes 0 to 2.

If additional bytes of channel status (which do not fully comply with the "standard" implementation, see below) are implemented as required by an application, the interface transmitter shall be classified as a "minimum" implementation of channel status.

It should be noted that the "minimum" implementation imposes severe operational restrictions on some receiving devices which may be connected to it. For example, receivers implementing byte 23 will normally show a cyclic redundancy check error when the default value of logic 0 is received as the CRCC. Also, reception of the default value for byte 0 bits 6-7 might cause improper operation in receiving devices not supporting manual override or auto set capabilities.

"Standard" implementation of channel status

The "standard" implementation provides a fundamental level of implementation which should prove sufficient for general applications in professional audio or broadcasting. In addition to conforming to the requirements described above for the "minimum" implementation, a "standard" implementation interface transmitter shall correctly encode and transmit all channel status bits in byte 0, byte 1, byte 2, and byte 23 (CRCC) in the manner specified in this document.

"Enhanced" implementation of channel status

In addition to conforming to the requirements described above for the "standard" implementation, the "enhanced" implementation shall provide further capabilities.

Receivers

Implementation in receivers is highly dependent on the application. Proper documentation shall be provided on the level of implementation of the interface receiver for decoding the transmitted information (validity, user, channel status, parity) and on whatever subsequent action is taken by the equipment of which it is a part.

Electrical requirements:

General characteristics

The electrical parameters of the interface are based on those defined in CCITT V.11 which allow transmission of balanced-voltage digital signals up to a few hundred meters distance.
The interconnecting cable shall be balanced and screened (shielded), with a nominal characteristic impedance of 110 Ω at frequencies from 0.1 to 6.0 MHz.

Connectors

The standard connector for both outputs and inputs shall be the circular latching three-pin connector described in IEC 268-12. (This type of connector is normally called XLR).
An output connector fixed on an item of equipment shall use male pins with a female shell. The corresponding cable connector shall thus have female pins with a male shell.
An input connector fixed on an item of equipment shall use female pins with a male shell. The corresponding cable connector shall thus have male pins with a female shell. The pin usage shall be:

Pin 1 Cable shield or signal earth;
Pin 2 Signal;
Pin 3 Signal.

(Note that the relative polarity of pins 2 and 3 is not important in the digital case.)

Equipment manufacturers should clearly label digital audio inputs and outputs as such, including the terms "digital audio input" or "digital audio output" as appropriate.

In such cases where panel space is limited and the function of the connector might be confused with an analog signal connector, the abbreviation DI or DO should be used to designate digital audio inputs and outputs, respectively.

Provision of additional, voice-quality channels via the digital audio interface

When a 20-bit coding range is sufficient for the audio signal, the 4 auxiliary sample bits may be used for a voice-quality coordination signal (talkback).

The voice-quality signal is sampled at exactly one-third of the main audio sampling frequency and coded uniformly with 12 bits per sample in 2's complement form. It is sent 4 bits at a time in the auxiliary sample bits of the interface subframes. One such signal may be sent in subframe 1 and another in subframe 2. The "Z" preamble at the start of each block serves as the frame alignment word for the voice-quality signals: the two subframes of frame 0 each contain the 4 LSBs of a sample of their respective voice-quality signal. Figure 11 shows this arrangement for two voice-quality signals, one in each subframe.

Figure 11 - Frame and block structure of AES3 for two voice-quality signals
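The split of each 12-bit voice-quality sample across three frames can be sketched as below. The text above fixes only that frame 0 of a block carries the 4 LSBs. This sketch assumes that the next two frames carry the middle and most significant 4 bits, and that the pattern repeats every three frames. Because 192 is divisible by 3, each block holds exactly 64 voice samples per subframe, so the alignment to the "Z" preamble holds from block to block. At a 48 kHz main sampling frequency, the voice-quality signal is sampled at 16 kHz. Function names are hypothetical.

```python
def split_voice_sample(sample: int) -> list[int]:
    """Split a 12-bit 2's complement sample into three 4-bit auxiliary-bit groups."""
    if not -2048 <= sample < 2048:
        raise ValueError("sample out of 12-bit range")
    word = sample & 0xFFF
    # Group k goes in the frame whose index within the block satisfies index % 3 == k
    # (assumed order: LSB group first, starting at the "Z" frame).
    return [(word >> (4 * k)) & 0xF for k in range(3)]

def join_voice_sample(groups: list[int]) -> int:
    """Reassemble three 4-bit groups (LSB group first) into a signed 12-bit sample."""
    word = groups[0] | (groups[1] << 4) | (groups[2] << 8)
    return word - 0x1000 if word & 0x800 else word
```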
