DIGITAL AUDIO: AIFF-C

Draft date: 7/30/90
Apple Computer, Inc.

Appendix D. Sound Accelerator (Saxel) Chunk

*** Caution ***

The use of a Sound Accelerator Chunk (Saxel) and the specific implementation of a Saxel have not yet been finalized! The draft version of this document is being prepared to get early developer feedback on the other sections of this specification. Such early feedback is considered to be more valuable than waiting for the details on this particular Chunk to be finalized. However, we do have something to say about what Saxels could look like and we would like your input on this topic as well. The following section is NOT finalized - it contains a possible, not a probable, implementation of a Saxel. Its inclusion in this specification is primarily to give you, the developer, an opportunity to learn about the purpose of a Saxel and to send us your considered feedback on this topic.

**************

Saxel Definition

Audio decompression algorithms contain internal parameters that track the behavior of the sound being expanded. Because these parameters depend on the history of previous sound samples, simply beginning playback at an arbitrary position in the compressed sound data would produce artifacts and distortion in the initial portion of the expanded sound. A Saxel stores information about the compressed sound at a Marker position, which makes high-quality playback of arbitrary selections of compressed sound data possible.

Background

Generally, a decompressor must start from the beginning of the compressed data stream, because it needs running state (e.g. internal filter parameters or recently decompressed samples) to decompress the next sample. To start playback at a marker point somewhere within the audio stream, you could:

(a) decompress the data from the beginning and start playing once you reach the marker, or
(b) use additional data to locate the marked point within the compressed data stream and load up the decompressor state, then start playing, or
(c) compute the marked point within the compressed data stream (only possible for fixed-ratio compression types), initialize the decompressor as if it were starting at the beginning, and ignore the startup transient (only useful for decompressors that would "settle down" in this case).

Method (a) is always possible as a fallback. Method (b) is much faster, provided you have the required data, and supplying that data is what Saxel (Sound Accelerator) chunks are for. Method (c) may be acceptable for certain applications and/or certain classes of audio compression. At this time, no firm decision has been made on which method to implement. The following is a tentative implementation of a variation of method (b), although there is no commitment to using this approach. We would value your feedback on this.

A Sound Accelerator (Saxel) chunk is used in combination with a Marker when the sound data is compressed. The saxel carries the required data to locate a point in the compressed data stream and to initialize the decompressor. Saxels enable method (b) and a modified method (a):

(d) decompress the data from the previous marker that has a Saxel and start playing once you reach the desired marker.
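Choosing between methods (a), (b), and (d) comes down to finding where decoding should begin. The sketch below assumes a hypothetical in-memory `MarkerInfo` record (marker id, position in uncompressed sample frames, and whether a Saxel exists for it); neither the type nor `choose_decode_start` is part of this specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of a marker: its id, its position in
   uncompressed sample frames, and whether a Saxel exists for it. */
typedef struct {
    int16_t  id;
    uint32_t position;
    bool     has_saxel;
} MarkerInfo;

/* Returns the position at which decompression should begin in order to
   play from `target`:
   - `target` itself if a marker there has a Saxel (method (b)),
   - else the closest earlier marker with a Saxel (method (d)),
   - else 0, i.e. decompress from the beginning (method (a)).
   Samples decoded before `target` are not played. */
uint32_t choose_decode_start(const MarkerInfo *markers, size_t count,
                             uint32_t target)
{
    uint32_t best = 0;
    for (size_t i = 0; i < count; i++) {
        if (markers[i].has_saxel &&
            markers[i].position <= target &&
            markers[i].position > best) {
            best = markers[i].position;
        }
    }
    return best;
}
```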

The data format for a Saxel is inherently specific to the compression type. Here, we specify saxels for the currently supported Apple compression techniques listed below. For other compression algorithms supported by developers, other schemes for a Saxel may be employed. Applications which support compressed audio need to understand how to process the Saxel Chunks for each compression type they support. We are primarily interested in your feedback on the following:

Is the Saxel Chunk described below suitable for Apple's audio compression algorithms?
What would you need in a Saxel for another compression algorithm you want to support?

Saxel

A Saxel has the following format:


typedef struct {
    MarkerId        id;          /* link accelerator data to a marker */
    unsigned short  size;        /* size of saxelData, in bytes */
    char            saxelData[]; /* algorithm-specific accelerator data */
} Saxel;

id identifies the marker for which the sound accelerator data is to be used. It's considered good practice to supply a saxel for every marker. That way, you don't have to guess which markers will be used as playback points.

size indicates the length in bytes of the sound accelerator data, saxelData. If that length is odd, the data must be followed by a single pad byte so that it occupies an even number of bytes. This pad byte, if present, is not included in size.
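The padding rule means a Saxel's footprint on disk is 4 header bytes plus size, rounded up to even. Here is a minimal sketch of that arithmetic and of writing one Saxel, assuming the usual big-endian (IFF) byte order and treating `MarkerId` as a 16-bit signed value; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes a Saxel occupies on disk: 2-byte MarkerId, 2-byte size, the data,
   and one pad byte when size is odd. The pad byte is not counted in size. */
size_t saxel_stored_size(uint16_t size)
{
    return 2u + 2u + (size_t)size + (size & 1u);
}

/* Writes one Saxel in big-endian order into `out`, which must hold at
   least saxel_stored_size(size) bytes. Returns the number of bytes
   written. `data` must be non-NULL even when size is 0 (memcpy rule). */
size_t write_saxel(uint8_t *out, int16_t marker_id,
                   const uint8_t *data, uint16_t size)
{
    uint16_t id = (uint16_t)marker_id;
    out[0] = (uint8_t)(id >> 8);
    out[1] = (uint8_t)(id & 0xFF);
    out[2] = (uint8_t)(size >> 8);
    out[3] = (uint8_t)(size & 0xFF);
    memcpy(out + 4, data, size);
    if (size & 1u)
        out[4 + size] = 0;          /* pad byte, excluded from size */
    return saxel_stored_size(size);
}
```

For example, a 96-byte saxelData takes 100 bytes on disk, while a 3-byte one takes 8 bytes (4 + 3 + 1 pad byte).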

saxelData contains the sound accelerator data, whose format depends on the compression type. See appendix C for a description of saxelData formats for the Macintosh and Apple IIGS compression types.

Saxel Chunk Format

The format for the data within a Saxel Chunk is shown below.


#define SaxelID 'SAXL' /* ckID for Saxel Chunk */

typedef struct {
    ID              ckID;        /* 'SAXL' */
    long            ckDataSize;
    unsigned short  numSaxels;
    Saxel           saxels[];
} SaxelChunk;

ckID is always 'SAXL'. ckDataSize is the size of the data portion of the chunk, in bytes. It does not include the 8 bytes used by ckID and ckDataSize.
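Because ckDataSize excludes ckID and ckDataSize themselves, it is the 2-byte numSaxels field plus the stored size of every Saxel. A sketch, assuming the caller has the size field of each Saxel in an array; the function name is hypothetical.

```c
#include <stdint.h>

/* ckDataSize for a Saxel Chunk: 2 bytes for numSaxels plus, for each
   Saxel, 4 header bytes + data + 1 pad byte when size is odd. Returned
   as int64_t so the caller can check that it fits in the 32-bit
   ckDataSize field before writing it. */
int64_t saxel_chunk_data_size(const uint16_t *sizes, uint16_t num_saxels)
{
    int64_t total = 2;                       /* numSaxels */
    for (uint16_t i = 0; i < num_saxels; i++)
        total += 4 + sizes[i] + (sizes[i] & 1u);
    return total;
}
```

Since every Saxel occupies an even number of bytes, the result is always even and the chunk needs no trailing pad byte of its own.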

numSaxels is the number of saxels in the Saxel Chunk. Multiple Saxel Chunks are allowed in a single FORM AIFC file. Since the total amount of Saxel data for a heavily-edited sound file may be quite large, it may be easier for an application to store the various Saxels independently of each other.

numSaxels, if non-zero, is followed by the saxels themselves. Since each saxel occupies an even number of bytes, the saxels are packed together with no unused bytes between them. The saxels need not be ordered in any particular manner.
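Since the Saxels are packed back to back and each one's length follows from its own size field, a reader can walk them sequentially. The following is a sketch of such a reader, assuming big-endian fields and a buffer holding exactly the ckDataSize bytes after ckID and ckDataSize; `SaxelReader` and its functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Cursor over the data portion of a Saxel Chunk. */
typedef struct {
    const uint8_t *p;
    size_t len;
    size_t off;
    uint16_t remaining;   /* Saxels not yet read */
} SaxelReader;

static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if there is no room for numSaxels. */
int saxel_reader_init(SaxelReader *r, const uint8_t *p, size_t len)
{
    if (len < 2) return -1;
    r->p = p;
    r->len = len;
    r->off = 2;
    r->remaining = read_be16(p);
    return 0;
}

/* Reads the next Saxel. Returns 1 and fills the outputs on success,
   0 once numSaxels Saxels have been read, and -1 if the chunk data ends
   in the middle of a Saxel. Each Saxel occupies 4 + size bytes, plus a
   pad byte when size is odd. */
int saxel_reader_next(SaxelReader *r, int16_t *id,
                      const uint8_t **data, uint16_t *size)
{
    if (r->remaining == 0) return 0;
    if (r->len - r->off < 4) return -1;
    uint16_t sz = read_be16(r->p + r->off + 2);
    size_t stored = 4 + (size_t)sz + (sz & 1u);
    if (r->len - r->off < stored) return -1;
    *id = (int16_t)read_be16(r->p + r->off);
    *data = r->p + r->off + 4;
    *size = sz;
    r->off += stored;
    r->remaining--;
    return 1;
}
```

Because the Saxels are unordered, a reader typically collects them into a table keyed by MarkerId rather than relying on their sequence.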

The Saxel Chunk is optional. Any number of Saxel Chunks may appear in a FORM AIFC.

Saxels for ACE & Macintosh Audio Compression Types

An application asked to begin playback at a specific marker first passes the data in saxelData[] to the buffered_expansion_playback routine without producing audible output. The internal parameters of the expansion algorithm will then have stabilized, and expansion_playback of the compressed data beginning at the marker position may begin. See the section on Markers for placing markers in compressed sound data. See appendix C for specific information on the compaction methods for Macintosh and Apple IIGS compressed sounds.
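The prime-then-play sequence can be illustrated with a deliberately tiny, hypothetical expander whose only state is the previous compressed byte. It is not MACE or ACE; real expanders keep state that depends on a longer history, which the 48-frame Saxel is meant to let settle. Running the Saxel data through the expander with output discarded plays the role of the silent buffered_expansion_playback pass, and the second call plays the role of expansion_playback.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a real expander (hypothetical): the only state is the
   previous compressed byte, and each output sample is the sum of the
   current and previous input bytes, modulo 256. */
typedef struct {
    uint8_t prev;
} ToyExpander;

static void toy_reset(ToyExpander *e) { e->prev = 0; }

/* Expands n bytes. If out is NULL, the output is discarded but the
   expander state is still updated. */
static void toy_expand(ToyExpander *e, const uint8_t *in, size_t n,
                       uint8_t *out)
{
    for (size_t i = 0; i < n; i++) {
        uint8_t y = (uint8_t)(in[i] + e->prev);
        if (out) out[i] = y;
        e->prev = in[i];
    }
}

/* Play from a marker: prime the expander with the Saxel data silently,
   then expand the compressed data that starts at the marker. */
void play_from_marker(const uint8_t *saxel_data, size_t saxel_size,
                      const uint8_t *from_marker, size_t n, uint8_t *out)
{
    ToyExpander e;
    toy_reset(&e);
    toy_expand(&e, saxel_data, saxel_size, NULL);  /* silent priming */
    toy_expand(&e, from_marker, n, out);           /* audible playback */
}
```

With the stream {10, 20, 30, 40, 50, 60} and a marker at index 3, priming with the single preceding byte yields 70, 90, 110, exactly what a full decode produces at those positions; starting cold would instead yield 40 for the first sample, which is the startup artifact a Saxel avoids.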

The saxelData for the compression types 'ACE2', 'ACE8', 'MAC3', and 'MAC6' consists of the 48 sample frames of compressed sound data preceding the Marker. If fewer than 48 compressed sample frames precede the compressed sample frame that contains the Marker's sample, the Saxel contains the compressed sound data from the beginning up to, but not including, the compressed sample frame containing the initial sample to be played, and the Saxel size is set accordingly. A Saxel is not necessary for a Marker that references a sample expanded from the first sample frame of the compressed sound data.
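Locating the Saxel bytes is a matter of converting the Marker position (in uncompressed sample frames) into a compressed frame index and stepping back up to 48 frames. A sketch, assuming single-channel data stored as contiguous compressed frames and offsets measured from the start of the compressed sound data; `saxel_span` is hypothetical. The 6 uncompressed samples per 2-byte frame used for MAC3 below is derived from the 3:1 ratio assuming 8-bit source samples.

```c
#include <stdint.h>

#define SAXEL_FRAMES 48u   /* compressed frames kept in a full Saxel */

typedef struct {
    uint32_t offset;  /* byte offset of saxelData in the compressed data */
    uint32_t size;    /* value for the Saxel size field; 0 = no Saxel needed */
} SaxelSpan;

/* marker_pos:        Marker position, in uncompressed sample frames
   samples_per_frame: uncompressed sample frames per compressed frame (> 0)
   frame_bytes:       bytes per compressed frame
   The span ends just before the compressed frame holding the marker's
   sample and covers at most 48 compressed frames. */
SaxelSpan saxel_span(uint32_t marker_pos, uint32_t samples_per_frame,
                     uint32_t frame_bytes)
{
    uint32_t marker_frame = marker_pos / samples_per_frame;
    uint32_t first = marker_frame > SAXEL_FRAMES
                         ? marker_frame - SAXEL_FRAMES : 0;
    SaxelSpan s = { first * frame_bytes,
                    (marker_frame - first) * frame_bytes };
    return s;
}
```

For MAC3 mono, a Marker at sample 600 falls in compressed frame 100, so the Saxel is frames 52 through 99: 96 bytes starting at byte 104. A Marker at sample 60 gets a shorter, 20-byte Saxel, and a Marker at sample 5 lies in the first compressed frame and needs none.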
Here are some examples:

Macintosh 3:1 single-channel compressed sound (frame size = 2 bytes)
Macintosh 6:1 single-channel compressed sound (frame size = 1 byte)
ACE 2:1 single-channel compressed sound (frame size = 1 byte)
ACE 8:3 single-channel compressed sound (frame size = 3 bytes)
Macintosh 6:1 stereo compressed sound (frame size = 2 bytes)

For the stereo case, refer to the paragraph on multichannel compressed sound storage in this appendix for clarification of which bytes are used to store the Saxel data. Remember that Marker positions reference uncompressed sample frames (see Marker Chunk).
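The frame sizes in these examples fix the size of a full Saxel (48 × frame size) and, given how many uncompressed sample frames each compressed frame holds, the first Marker position that receives a full rather than shortened Saxel. The samples-per-frame values below are an assumption derived from each compression ratio with 8-bit source samples; the table and function names are hypothetical.

```c
/* Example compression types from the list above. */
typedef struct {
    const char *type;              /* AIFF-C compressionType */
    unsigned    channels;
    unsigned    frame_bytes;       /* compressed frame size */
    unsigned    samples_per_frame; /* assumption: 8-bit source samples */
} SaxelExample;

static const SaxelExample kExamples[] = {
    { "MAC3", 1, 2, 6 },   /* 3:1: 2 bytes hold 6 samples        */
    { "MAC6", 1, 1, 6 },   /* 6:1: 1 byte holds 6 samples        */
    { "ACE2", 1, 1, 2 },   /* 2:1: 1 byte holds 2 samples        */
    { "ACE8", 1, 3, 8 },   /* 8:3: 3 bytes hold 8 samples        */
    { "MAC6", 2, 2, 6 },   /* stereo: 2 bytes hold 6 sample frames */
};

/* Size field of a full Saxel (48 compressed frames) for example i. */
unsigned full_saxel_size(unsigned i)
{
    return 48u * kExamples[i].frame_bytes;
}

/* Smallest Marker position, in uncompressed sample frames, whose Saxel
   contains all 48 compressed frames for example i. */
unsigned first_full_saxel_marker(unsigned i)
{
    return 48u * kExamples[i].samples_per_frame;
}
```

Under these assumptions, full Saxels are 96, 48, 48, 144, and 96 bytes respectively, and Markers earlier than sample frame 288, 288, 96, 384, and 288 get shortened Saxels.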

Saxels and Markers

In general, whenever a Marker is created, a Saxel should be created for that Marker (except in the case where the Marker position is within the first sample frame of compressed sound data). Whenever a Marker is deleted, the Saxel for that Marker should be deleted. If a Marker exists within a portion of sound data which has been relocated, both the Marker position and Saxel for that Marker need to be updated.
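Two of these bookkeeping rules are easy to express directly: whether a Marker needs a Saxel at all, and removing a deleted Marker's Saxel. A sketch over a hypothetical in-memory `SaxelRec` array; because Saxels need not be ordered, a deleted entry is simply replaced by the last one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory Saxel record; data is owned elsewhere. */
typedef struct {
    int16_t        marker_id;
    uint16_t       size;
    const uint8_t *data;
} SaxelRec;

/* A Marker needs a Saxel unless its sample lies in the first compressed
   frame, i.e. its position is below samples_per_frame. */
int marker_needs_saxel(uint32_t marker_pos, uint32_t samples_per_frame)
{
    return marker_pos >= samples_per_frame;
}

/* Called when a Marker is deleted: removes every record for that Marker
   by moving the last record into the hole (order is not significant).
   Returns the new record count. */
size_t delete_saxels_for_marker(SaxelRec *recs, size_t count, int16_t id)
{
    size_t i = 0;
    while (i < count) {
        if (recs[i].marker_id == id)
            recs[i] = recs[--count];
        else
            i++;
    }
    return count;
}
```

Relocating a Marker then amounts to deleting its old Saxel and building a new one at the updated position, subject to the corner case discussed next.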

If the Marker is at or near the beginning of the relocated sound data, consider the following when updating its Saxel. In the following figure, segment A of sound data, containing Marker A and its associated Saxel A, is cut and pasted into the sound data stream at position B. Because Marker A is near the beginning of segment A, Saxel A contains sound data from outside segment A. Updating Marker A to its new position would normally mean building its Saxel from the sound data preceding position B, shown as potential Saxel B.

Saxel A, however, contains the natural progression of sound which leads to marker A and creates a smooth transition to the sound data beginning at Marker A. The sound data located at potential Saxel B is not necessarily related to the sound data in segment A and may actually cause a discontinuity if used as a Saxel for the updated Marker A. Thus, to preserve sound quality for playback beginning at the updated Marker A, the original contents of Saxel A may be used for the Saxel referring to the updated Marker A.
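Whether a relocated Marker falls into this corner case can be tested from positions alone: its Saxel reaches outside the moved segment when the Marker's compressed frame lies fewer than 48 compressed frames after the segment's first frame. A sketch, assuming positions in uncompressed sample frames and a hypothetical two-way policy supplied by the application or the user.

```c
#include <stdint.h>

typedef enum {
    SAXEL_KEEP_ORIGINAL,   /* reuse Saxel A's bytes after the paste     */
    SAXEL_REBUILD_AT_DEST  /* rebuild from data preceding the new spot  */
} SaxelPolicy;

/* Nonzero if the Marker's Saxel includes data from before the start of
   the relocated segment. Requires marker_pos >= segment_start and
   samples_per_frame > 0. A segment starting at frame 0 has nothing
   before it, so its Saxels never cross. */
int saxel_crosses_segment_start(uint32_t segment_start, uint32_t marker_pos,
                                uint32_t samples_per_frame)
{
    uint32_t seg_frame = segment_start / samples_per_frame;
    uint32_t mk_frame  = marker_pos / samples_per_frame;
    return seg_frame > 0 && mk_frame - seg_frame < 48u;
}

/* When the Saxel lies entirely inside the segment, rebuilding at the
   destination reproduces the same bytes, so only the crossing case
   involves a real choice. */
SaxelPolicy choose_saxel_source(int crosses, SaxelPolicy preferred)
{
    return crosses ? preferred : SAXEL_REBUILD_AT_DEST;
}
```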

A potential problem with this may be seen in the following two scenarios:

(1) Suppose the original Saxel A data are kept for the updated Marker A, which is now near position B. The user deletes the updated Marker A, then inserts a new Marker at the same position. The new Saxel for that Marker is created from the data at potential Saxel B, so playback beginning at the Marker position now sounds different. The reason may not be obvious to a user who simply deleted one Marker and created another in its place.
(2) A new Marker is created one sample position after the updated Marker A, which still uses the original Saxel A data. The new Marker's Saxel is built from approximately the sound data at potential Saxel B. Playback beginning at the new Marker sounds quite different from playback beginning at the updated Marker A.

Which Saxel data to use in this (hopefully unlikely) corner case depends on the contents of the sound, the position of the Marker relative to the beginning of the relocated segment, and the expectations and sophistication of the user. Your application should weigh these criteria and make the best choice it can, or pass the choice on to the user if required.
