Serial transmission format for two-channel linearly represented digital audio data,
Revision of AES3-1985 (ANSI S4.40-1985)
Description
The format provides for the serial digital transmission of two channels of periodically sampled and uniformly quantized audio signals on a single shielded twisted wire pair. The transmission rate is such that samples of audio data, one from each channel, are transmitted in time division multiplex in one sample period. Provision is made for the transmission of both user and interface related data as well as of timing related data, which may be used for editing and other purposes. It is expected that the format will be used to convey audio data that have been sampled at any of the sampling frequencies recognized by AES5, "Recommended Practice for Professional Digital Audio Applications Employing Pulse-Code Modulation. Preferred Sampling Frequencies".
Contents
Interface format:
sampling frequency: Frequency of the samples representing an audio signal. When more than one audio signal is transmitted through the same interface, the sampling frequencies are identical.
audio sample word: Amplitude of a digital audio sample. Representation is linear in 2's complement binary form. Positive numbers correspond to positive analog voltages at the input of the analog-to-digital converter (ADC). The number of bits per word can be specified from 16 to 24 in two coding ranges (less than or equal to 20 bits and less than or equal to 24 bits).
auxiliary sample bits: 4 least significant bits (LSBs) which can be assigned as auxiliary sample bits and used for auxiliary information when the number of audio sample bits is less than or equal to 20.
validity bit: Bit indicating whether the audio sample bits in the subframe (time slots 4 to 27 or 8 to 27, depending on the audio word length as described below) are suitable for conversion to an analog audio signal.
channel status: Bits carrying, in a fixed format derived from the block (see below), information associated with each audio channel which is decodable by any interface user.
user data: Channel provided to carry any other information.
parity bit: Bit provided to permit the detection of an odd number of errors resulting from malfunctions in the interface.
preambles: Specific patterns used for synchronization. There are three different preambles (see below).
subframe: Fixed structure used to carry the information described above, from "sampling frequency" to "preambles" (see also subframe format and frame format below).
frame: Sequence of two successive and associated subframes.
block: Group of 192 consecutive frames. The start of a block is designated by a special subframe preamble (see below).
channel coding: Coding describing the method by which the binary digits are represented for transmission through the interface.
As shown later, the data are divided into two subframes, each of 32 bits, transmitted in one sample period at the standard sampling frequencies of 48 kHz, 44.1 kHz or 32 kHz. Each subframe comprises a 4-bit preamble or sync group, 4 bits of auxiliary audio data (or 4 additional audio data bits, for a possible maximum of 24, if the sample word extends beyond the 20 default bits), 20 bits representing the sampled digital audio, plus four ancillary bits for validity, user data, channel status and parity. The digital data are sent LSB (least significant bit) first, with alternating subframes for Channel #1 then Channel #2. The channel code used to transmit the data is biphase mark, from the family of self-clocking Manchester codes (binary frequency modulation).
Of the four ancillary bits, the validity bit indicates whether the previous audio sample data bits are secure or error free (logic 0 = valid; 1 = defective) and suitable for conversion to an analog audio signal. A parity bit is set to provide even parity over the current subframe, and hence enables simple detection of transmission errors. The user bit can be used freely by the sending equipment to designate hardware- or system-specific information.
The channel status information is of major significance. It is sent over the AES3 interface as blocks of 192 bits, organized as 24 eight-bit bytes, which at a sampling frequency of 48 kHz equates to a repeat interval of 4 ms. As we shall see, it is the channel status that carries the unique and operationally useful data concerning copy protection, emphasis, sampling frequency, and a host of other information.
Subframes are separated by a unique 4-bit preamble or sync group, which designates the start of each data sequence. Three unambiguous types of preamble are used to designate the following conditions: the start of subframe A (channel #1), and hence of a frame, which also marks the start of a new channel status data block; the start of subframe A, and hence of a frame, within a block; and the start of subframe B (channel #2).
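The 4 ms figure follows directly from the frame structure: one frame per sample period, 192 frames per block. A minimal sketch of that arithmetic is given here; the function and constant names are illustrative only and do not come from the standard.

```python
SLOTS_PER_SUBFRAME = 32   # time slots (bits) in one subframe
SUBFRAMES_PER_FRAME = 2   # one subframe per channel
FRAMES_PER_BLOCK = 192    # frames in one channel status block


def interface_timing(fs_hz):
    """Return (data bit rate in bit/s, channel status block period in s)."""
    # One frame (two 32-bit subframes) is sent per sample period.
    bit_rate = fs_hz * SUBFRAMES_PER_FRAME * SLOTS_PER_SUBFRAME
    # One channel status bit per subframe, so a 192-bit block spans 192 frames.
    block_period = FRAMES_PER_BLOCK / fs_hz
    return bit_rate, block_period


for fs in (48_000, 44_100, 32_000):
    rate, period = interface_timing(fs)
    print(f"{fs} Hz: {rate / 1e6:.4f} Mbit/s, block every {period * 1e3:.3f} ms")
```

The bit rate computed here counts data bits before channel coding; biphase-mark coding uses two states per bit.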
Each subframe is divided into 32 time slots, numbered from 0 to 31 (see Figure 6).
Figure 6 - Subframe format of AES3
Time slots 0 to 3 (preambles) carry one of the three permitted preambles (see frame format and preambles; also see Figure 7).
Time slots 4 to 27 (audio sample word) carry the audio sample word in linear 2's complement representation. The most significant bit (MSB) is carried by time slot 27.
When a 24-bit coding range is used, the LSB is in time slot 4 (see Figure 6a).
When a 20-bit coding range is sufficient, time slots 8 to 27 carry the audio sample word with the LSB in time slot 8. Time slots 4 to 7 may be used for other applications. Under these circumstances, the bits in time slots 4 to 7 are designated auxiliary sample bits (see Figure 6b).
If the source provides fewer bits than the interface allows (either 20 or 24), the unused LSBs are set to logic 0.
Time slot 28 (validity bit) carries the validity bit associated with the audio sample word (see validity bit below).
Time slot 29 (user data bit) carries 1 bit of the user data channel associated with the audio channel transmitted in the same subframe (see User data format).
Time slot 30 (channel status bit) carries 1 bit of the channel status information associated with the audio channel transmitted in the same subframe (see Channel status format).
Time slot 31 (parity bit) carries a parity bit such that time slots 4 to 31 inclusive will carry an even number of ones and an even number of zeros (even parity).
NOTE - The preambles have even parity as an explicit property.
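The slot assignments above can be sketched as a small packing routine. This is an illustration under stated assumptions, not a reference implementation: the function name and argument names are hypothetical, and the bit order of the auxiliary bits within slots 4 to 7 (LSB first) is an assumption, since the text does not specify it.

```python
def build_subframe_bits(sample, word_bits=24, validity=0, user=0, status=0, aux=0):
    """Return the 28 bits for time slots 4..31 as a list (index 0 = slot 4)."""
    if not 16 <= word_bits <= 24:
        raise ValueError("word length must be 16 to 24 bits")
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    if not lo <= sample <= hi:
        raise ValueError("sample does not fit in the stated word length")

    coding_range = 20 if word_bits <= 20 else 24
    # Two's complement word, shifted so the MSB lands in slot 27 and
    # any unused LSBs of the coding range are logic 0.
    word = (sample & ((1 << word_bits) - 1)) << (coding_range - word_bits)

    bits = []
    if coding_range == 20:
        # Slots 4..7: auxiliary sample bits (ordering assumed LSB first).
        bits += [(aux >> i) & 1 for i in range(4)]
    # Audio sample word, LSB first, MSB in slot 27.
    bits += [(word >> i) & 1 for i in range(coding_range)]
    # Slots 28, 29, 30: validity, user data, channel status.
    bits += [validity & 1, user & 1, status & 1]
    # Slot 31: parity bit making the number of ones in slots 4..31 even.
    bits.append(sum(bits) & 1)
    return bits


subframe = build_subframe_bits(-1, word_bits=16)
print(subframe)
```

For a 16-bit sample the 20-bit coding range is used, so the first eight entries (four auxiliary bits and four unused LSBs) are zero and the sixteen audio bits follow.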
A frame is uniquely composed of two subframes (see Figure 7). The rate of transmission of frames corresponds exactly to the source sampling frequency.
Figure 7 - Frame format of AES3
The first subframe normally starts with preamble "X". However, the preamble changes to preamble "Z" once every 192 frames. This defines the block structure used to organize the channel status information. The second subframe always starts with preamble "Y".
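The preamble sequencing rule can be expressed in a few lines. The sketch below assumes frames are counted from the start of a block (frame 0 carries "Z"); the function name is illustrative.

```python
FRAMES_PER_BLOCK = 192


def preamble_labels(num_frames, first_frame_index=0):
    """Return a (first subframe, second subframe) preamble pair per frame."""
    labels = []
    for n in range(first_frame_index, first_frame_index + num_frames):
        # "Z" replaces "X" once every 192 frames to mark the block start.
        first = "Z" if n % FRAMES_PER_BLOCK == 0 else "X"
        # The second subframe always starts with "Y".
        labels.append((first, "Y"))
    return labels


print(preamble_labels(3))
```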
The modes of transmission are signaled by setting bits 0 to 3 of byte 1 of channel status.
Two-channel mode: In two-channel mode, the samples from both channels are transmitted in consecutive subframes. Channel 1 is in subframe 1, and channel 2 is in subframe 2.
Stereophonic mode: In stereophonic mode, the interface is used to transmit stereophonic audio in which the two channels are presumed to have been simultaneously sampled. The left, or "A", channel is in subframe 1, and the right, or "B," channel is in subframe 2.
Single-channel mode (monophonic): In monophonic mode, the transmitted bit rate remains at the normal two-channel rate and the audio sample word is placed in subframe 1. Time slots 4 to 31 of subframe 2 either carry the bits identical to subframe 1 or are set to logic 0. A receiver normally defaults to channel 1 unless manual override is provided.
Primary/secondary mode: In some applications requiring two channels where one of the channels is the main or primary channel while the other is a secondary channel, the primary channel is in subframe 1, and the secondary channel is in subframe 2.
To minimize the direct-current (dc) component on the transmission line, to facilitate clock recovery from the data stream, and to make the interface insensitive to the polarity of connections, time slots 4 to 31 are encoded in biphase-mark.
Each bit to be transmitted is represented by a symbol comprising two consecutive binary states. The first state of a symbol is always different from the second state of the previous symbol. The second state of the symbol is identical to the first if the bit to be transmitted is logic 0. However, it is different if the bit is logic 1 (see Figure 8).
Figure 8 - Channel coding of AES3
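The coding rule described above can be sketched as follows. The function names are illustrative, and the states are modelled simply as 0 and 1 rather than as line voltages.

```python
def biphase_mark_encode(bits, last_state=0):
    """Encode data bits into biphase-mark states (two states per bit)."""
    states = []
    for bit in bits:
        # The first state always differs from the previous symbol's second state.
        first = last_state ^ 1
        # Logic 1 adds a mid-symbol transition; logic 0 holds the state.
        second = first ^ 1 if bit else first
        states += [first, second]
        last_state = second
    return states


def biphase_mark_decode(states):
    """Recover data bits: a bit is 1 when the two states of its symbol differ."""
    return [a ^ b for a, b in zip(states[0::2], states[1::2])]


encoded = biphase_mark_encode([1, 0, 0, 1])
print(encoded)
print(biphase_mark_decode(encoded))
```

Because decoding looks only at whether the two states of a symbol differ, inverting every state (a polarity reversal in the connection) yields the same decoded bits.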
Preambles are specific patterns providing synchronization and identification of the subframes and blocks.
To achieve synchronization within one sampling period and to make this process completely reliable, these patterns violate the biphase-mark code rules, thereby avoiding the possibility of data imitating the preambles.
A set of three preambles is used. These preambles are transmitted in the time allocated to four time slots at the start of each subframe (time slots 0 to 3), and are represented by eight successive states. The first state of the preamble is always different from the second state of the previous symbol (representing the parity bit). Depending on this state the preambles are as in Table VIII.
Table VIII.
Like biphase code, these preambles are dc free and provide clock recovery. They differ in at least two states from any valid biphase sequence.
Figure 9 represents preamble "X".
Figure 9 - Preamble X (11100010) of AES3
NOTE - Owing to the even-parity bit in time slot 31, all preambles will start with a transition in the same direction (see Subframe format). Thus only one of these sets of preambles will, in practice, be transmitted through the interface. However, it is necessary for either set to be decodable because a polarity reversal might occur in the connection.
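As a quick check of these properties, the sketch below examines preamble "X" using the state pattern quoted in the Figure 9 caption (11100010) and its polarity-inverted counterpart. The helper names are hypothetical; the other two preambles are omitted because their patterns are not reproduced in this text.

```python
PREAMBLE_X = [1, 1, 1, 0, 0, 0, 1, 0]  # pattern from the Figure 9 caption


def invert(states):
    """Return the pattern as seen after a polarity reversal."""
    return [s ^ 1 for s in states]


def is_dc_free(states):
    """True when the pattern spends equal time in each state."""
    return states.count(1) == states.count(0)


def violates_biphase_mark(states, previous_state):
    """True if a symbol boundary lacks the transition biphase-mark requires."""
    seq = [previous_state] + list(states)
    # Symbol boundaries fall before states 0, 2, 4 and 6 of the pattern.
    return any(seq[i] == seq[i + 1] for i in range(0, len(states), 2))


for pattern, prev in ((PREAMBLE_X, 0), (invert(PREAMBLE_X), 1)):
    print(pattern, is_dc_free(pattern), violates_biphase_mark(pattern, prev))
```

Both polarities are DC free and both break the boundary-transition rule, which is why a receiver can recognize either one as a preamble rather than as data.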
The validity bit is logic 0 if the audio sample word is suitable for conversion to an analog audio signal, and it is logic 1 if it is not. There is no default state for the validity bit.
User data bits may be used in any way desired by the user.
Possible formats for the user data channel are indicated by the channel status byte 1, bits 4-7.
The default value of the user data bit is logic 0.
The channel status for each audio signal carries information associated with that audio signal, and thus it is possible for different channel status data to be carried in the two subframes of the digital audio signal. Examples of information to be carried in the channel status are: length of audio sample words, number of audio channels, sampling frequency, sample address code, alphanumeric source and destination codes, and emphasis.
Channel status information is organized in 192-bit blocks, subdivided into 24 bytes (see Figure 10). The first bit of each block is carried in the frame with preamble "Z".
Figure 10 - Channel status data format of AES3
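A receiver collecting one channel's status bits, starting at the frame with preamble "Z", could group them as sketched below. The function names are illustrative. Bytes are kept as lists of bits in transmission order so that index 0 is bit 0, which avoids assuming any particular packing into integers.

```python
BLOCK_BITS = 192
BYTE_BITS = 8


def channel_status_bytes(status_bits):
    """Group one block of channel status bits into 24 lists of 8 bits."""
    if len(status_bits) != BLOCK_BITS:
        raise ValueError("a channel status block has exactly 192 bits")
    return [status_bits[i:i + BYTE_BITS] for i in range(0, BLOCK_BITS, BYTE_BITS)]


def status_bit(status_bits, byte_index, bit_index):
    """Return bit `bit_index` of byte `byte_index` (both counted from 0)."""
    return channel_status_bytes(status_bits)[byte_index][bit_index]


block = [1] + [0] * 191
print(len(channel_status_bytes(block)), status_bit(block, 0, 0))
```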
The specific organization follows, wherein the suffix 0 designates the first byte or bit.
Byte 0
bit 0
NOTE - The significance of byte 0, bit 0 is such that a transmission from an interface conforming to IEC 958 "consumer use" can be identified, and a receiver conforming only to IEC 958 "consumer use" will correctly identify a transmission from a "professional-use" interface as defined in this standard. Connection of a "professional-use" transmitter with a "consumer-use" receiver or vice versa might result in unpredictable operation. Thus the following byte definitions only apply when bit 0 = logic 1 (professional use of the channel status block).
Byte 1
bits 0-3
Byte 2
bits 0-2
NOTES - The default state of bits 3-5 indicates that the number of active bits within the 20- or 24-bit coding range is not specified by the transmitter. The receiver should default to the maximum number of bits specified by the coding range and enable manual override or auto set.
The nondefault state of bits 3-5 indicates the number of bits within the 20- or 24-bit coding range which might be active. This is also an indirect expression of the number of LSBs that are certain to be inactive, which is equal to 20 or 24 minus the number corresponding to the bit state. The receiver should disable manual override and auto set for these bit states.
Irrespective of the audio sample word length as indicated by any of the states of bits 3-5, the MSB is in time slot 27 of the transmitted subframe as specified above.
Byte 3
bits 0-7
Byte 4
bits 0-1
Byte 5
bits 0-7
Bytes 6-9
Bytes 10-13
Bytes 14-17
NOTE - This has the same function as a recording index counter.
Bytes 18-21
NOTE - This is the time of day laid down during the source encoding of the signal and remains unchanged during subsequent operations. A value of all zeros for the binary sample address code is, for transcoding to real time, or to time codes in particular, to be taken as midnight (i.e., 00 h, 00 min, 00 s, 00 frame). Transcoding of the binary number to any conventional time code requires accurate sample frequency information to provide a sample-accurate time.
Byte 22
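The note on bytes 18-21 says that a time-of-day sample address of zero means midnight and that conversion to clock time needs the exact sampling frequency. A minimal sketch of that conversion follows. It assumes the four bytes have already been assembled into an integer sample count (the byte and bit ordering is not given in this text), and it reports leftover samples rather than time-code frames because no frame rate is stated. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sample_address_to_time(address, fs_hz):
    """Convert a sample count since midnight to (hours, minutes, seconds, samples)."""
    total_seconds, samples = divmod(address, fs_hz)
    # Wrap past 24 h so the result is always a time of day.
    total_seconds %= SECONDS_PER_DAY
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return hours, minutes, seconds, samples


print(sample_address_to_time(0, 48_000))
print(sample_address_to_time(48_000 * 3661 + 5, 48_000))
```

Using the wrong sampling frequency in this division shifts the result by the ratio of the two frequencies, which is why the note insists on accurate sample frequency information.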
Interface format implementation:
To promote compatible operation between items of equipment built to this specification it is necessary to establish which information bits and operational bits need to be encoded and sent by a transmitter and decoded by an interface receiver.
Documentation shall be provided describing the channel status features supported by the interface transmitters and receivers.
Transmitters shall follow all the formatting and channel coding rules established in earlier sections of this specification including all notes therein. Along with the audio sample word, all transmitters shall correctly encode and transmit the validity bit, user bit, parity bit, and the three preambles. The channel status shall be encoded to one of the implementations given below.
The following three implementations are defined: "minimum," "standard," and "enhanced." These terms are used to communicate in a simple manner the level of implementation of the interface transmitter involving the many features of channel status. Irrespective of the level of implementation, all reserved states of bits defined in the Section above shall remain unchanged.
The "minimum" implementation represents the lowest level of implementation of the interface that meets the requirements of this specification document. In the "minimum" implementation, transmitters shall encode and transmit channel status byte 0 bit 0 with a state of logic 1 signifying "professional use of channel status block". All other channel status bits of byte 0 to byte 23 inclusive shall be transmitted with the default state of all logic 0's. In this circumstance, the receiver will adopt the default conditions specified in bytes 0 to 2.
If additional bytes of channel status (which do not fully comply with the "standard" implementation, see below) are implemented as required by an application, the interface transmitter shall be classified as a "minimum" implementation of channel status.
It should be noted that the "minimum" implementation imposes severe operational restrictions on some receiving devices which may be connected to it. For example, receivers implementing byte 23 will normally show a cyclic redundancy check error when the default value of logic 0 is received as the CRCC. Also, reception of the default value for byte 0 bits 6-7 might cause improper operation in receiving devices not supporting manual override or auto set capabilities.
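The "minimum" implementation's channel status block, as described above, can be sketched as a list of 192 bits in transmission order. The function name is illustrative.

```python
BLOCK_BITS = 192


def minimum_channel_status_block():
    """Return the 192 channel status bits of a "minimum" implementation."""
    bits = [0] * BLOCK_BITS  # every bit defaults to logic 0
    bits[0] = 1              # byte 0, bit 0: professional use of the block
    return bits


block = minimum_channel_status_block()
print(block[:8], block[184:])
```

The last eight entries are byte 23, left at zero here, which is the condition the text above warns may be reported as a CRCC error by receivers that implement byte 23.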
The "standard" implementation provides a fundamental level of implementation which should prove sufficient for general applications in professional audio or broadcasting. In addition to conforming to the requirements described above for the "minimum" implementation, a "standard" implementation interface transmitter shall correctly encode and transmit all channel status bits in byte 0, byte 1, byte 2, and byte 23 (CRCC) in the manner specified in this document.
In addition to conforming to the requirements described above for the "standard" implementation, the "enhanced" implementation shall provide further capabilities.
Implementation in receivers is highly dependent on the application. Proper documentation shall be provided on the level of implementation of the interface receiver for decoding the transmitted information (validity, user, channel status, parity) and on whatever subsequent action is taken by the equipment of which it is a part.
Electrical requirements:
The electrical parameters of the interface are based on those defined in CCITT V.11 which allow transmission of balanced-voltage digital signals up to a few hundred meters distance.
The interconnecting cable shall be balanced and screened (shielded) with a nominal characteristic impedance of 110 Ω at frequencies from 0.1 to 6.0 MHz.
The standard connector for both outputs and inputs shall be the circular latching three-pin connector described in IEC 268-12. (This type of connector is normally called XLR).
An output connector fixed on an item of equipment shall use male pins with a female shell. The corresponding cable connector shall thus have female pins with a male shell.
An input connector fixed on an item of equipment shall use female pins with a male shell. The corresponding cable connector shall thus have male pins with a female shell. The pin usage shall be:
Equipment manufacturers should clearly label digital audio inputs and outputs as such, including the terms "digital audio input" or "digital audio output" as appropriate.
In such cases where panel space is limited and the function of the connector might be confused with an analog signal connector, the abbreviation DI or DO should be used to designate digital audio inputs and outputs, respectively.
When a 20-bit coding range is sufficient for the audio signal, the 4 auxiliary sample bits may be used for a voice-quality coordination signal (talk back).
The voice-quality signal is sampled at exactly one-third of the sampling frequency for the main audio, coded uniformly with 12 bits per sample represented in 2's complement form. It is sent 4 bits at a time in the auxiliary sample bits of the interface subframes. One such signal may be sent in subframe 1 and another in subframe 2. The "Z" preamble at the start of each block is used as a frame alignment word for the voice-quality signals. The two subframes of frame 0 each contain the 4 LSBs of a sample of their respective voice-quality signal, as shown in Figure 11. Figure 11 also shows two voice-quality signals, one in each subframe.
Figure 11 - Frame and block structure of AES3 for two voice-quality signals
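Since each 12-bit voice-quality sample is spread over three frames, a 192-frame block carries 64 such samples per subframe. The sketch below splits a sample into three 4-bit groups and reassembles it. The text states only that frame 0 carries the 4 LSBs; placing the middle and most significant groups in the next two frames is an assumption, and the function names are hypothetical.

```python
VOICE_SAMPLES_PER_BLOCK = 192 // 3  # three frames per 12-bit sample


def voice_sample_to_aux_nibbles(sample):
    """Split a 12-bit two's complement sample into three 4-bit groups, LSBs first."""
    if not -2048 <= sample <= 2047:
        raise ValueError("sample does not fit in 12 bits")
    word = sample & 0xFFF
    # Assumed order: 4 LSBs, then middle 4 bits, then 4 MSBs.
    return [(word >> shift) & 0xF for shift in (0, 4, 8)]


def aux_nibbles_to_voice_sample(nibbles):
    """Rebuild the signed 12-bit sample from three 4-bit groups."""
    word = nibbles[0] | (nibbles[1] << 4) | (nibbles[2] << 8)
    # Undo two's complement when the sign bit (bit 11) is set.
    return word - 0x1000 if word & 0x800 else word


print(VOICE_SAMPLES_PER_BLOCK)
print(voice_sample_to_aux_nibbles(0x123))
```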