Fall 1995
© IEEE Computer Society Press.


File formats

Historically, almost every type of machine used its own file format for audio data, but some file formats are more generally applicable, and in general it is possible to define conversions between almost any pair of file formats -- sometimes losing information, however.

File formats are a separate issue from device characteristics. There are two types of file formats: self-describing formats, where the device parameters and encoding are made explicit in some form of header, and "raw" formats, where the device parameters and encoding are fixed.

Self-describing file formats generally define a family of data encodings, where a header field indicates the particular encoding variant used. Headerless formats define a single encoding and usually allow no variation in device parameters (except sometimes sampling rate, which can be a pain to figure out other than by listening to the sample).

The header of self-describing formats contains the parameters of the sampling device and sometimes other information (e.g. a human-readable description of the sound, or a copyright notice). Most headers begin with a simple "magic word". (Some formats do not simply define a header format, but may contain chunks of data intermingled with chunks of encoding info.) The data encoding defines how the actual samples are stored in the file, e.g. signed or unsigned, as bytes or short integers, in little-endian or big-endian byte order, etc. Strictly speaking, channel interleaving is also part of the encoding, although so far I have seen little variation in this area.
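To make the encoding parameters concrete, here is a minimal Python sketch that turns a block of raw sample bytes into integers. The function name decode_samples and its arguments are made up for illustration, it handles only 8- and 16-bit samples, and it assumes the usual frame-by-frame channel interleaving (channel 0, channel 1, ..., then the next frame).

```python
import struct

def decode_samples(data, width, signed, big_endian, channels=1):
    """Decode raw PCM bytes into one list of integers per channel.

    Hypothetical helper: width is the sample size in bytes (1 or 2).
    """
    if width == 1:
        code = "b" if signed else "B"
    elif width == 2:
        code = "h" if signed else "H"
    else:
        raise ValueError("this sketch handles only 8- and 16-bit samples")
    count = len(data) // width
    prefix = ">" if big_endian else "<"
    flat = struct.unpack(prefix + str(count) + code, data[:count * width])
    # Assumed layout: samples of one frame are stored channel after channel.
    return [list(flat[ch::channels]) for ch in range(channels)]
```

The same four bytes decode to quite different values depending on the parameters, which is why a headerless file is useless unless the encoding is known in advance.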

Some file formats apply some kind of compression to the data, e.g. Huffman encoding, or simple silence deletion.

Here's an overview of popular file formats.

Self-describing file formats

extension, name    origin                  variable parameters (fixed; comments)
.au or .snd        NeXT, Sun               rate, #channels, encoding, info string
.aif(f), AIFF      Apple, SGI              rate, #channels, sample width, lots of info
.aif(f), AIFC      Apple, SGI              same (extension of AIFF with compression)
.iff, IFF/8SVX     Amiga                   rate, #channels, instrument info (8 bits)
.voc               Soundblaster            rate (8 bits/1 ch; can use silence deletion)
.wav, WAVE         Microsoft               rate, #channels, sample width, lots of info
.sf                IRCAM                   rate, #channels, encoding, info
none, HCOM         Mac                     rate (8 bits/1 ch; uses Huffman compression)
none, MIME         Internet                (see below)
none, NIST SPHERE  DARPA speech community  (see below)
.mod or .nst       Amiga                   (see below)

Note that the filename extension ".snd" is ambiguous: it can be either the self-describing NeXT format or the headerless Mac/PC format, or even a headerless Amiga format.

I know nothing for sure about the origin of HCOM files, only that there are a lot of them floating around on our system and probably at FTP sites all over the world. The filenames usually don't have a ".hcom" extension, but this is what SOX (see below) uses. The file format recognized by SOX includes a MacBinary header, where the file type field is "FSSD". The data fork begins with the magic word "HCOM" and contains Huffman compressed data; after decompression it is 8-bit unsigned data.
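The recognition rule above can be sketched in a few lines of Python. The offsets are assumptions that do not come from this text: I am taking the MacBinary header to be 128 bytes long with the four-byte file type at offset 65, so the data fork starts at byte 128. The function name looks_like_hcom is made up.

```python
MACBINARY_HEADER_SIZE = 128   # assumed length of the MacBinary header
FILE_TYPE_OFFSET = 65         # assumed offset of the 4-byte file type field

def looks_like_hcom(blob):
    """Return True if blob resembles a MacBinary-wrapped HCOM file."""
    if len(blob) < MACBINARY_HEADER_SIZE + 4:
        return False
    file_type = blob[FILE_TYPE_OFFSET:FILE_TYPE_OFFSET + 4]
    data_magic = blob[MACBINARY_HEADER_SIZE:MACBINARY_HEADER_SIZE + 4]
    return file_type == b"FSSD" and data_magic == b"HCOM"
```

This only identifies the file; decoding the Huffman-compressed data is a separate job.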

IFF/8SVX allows for amplitude contours for sounds (attack/decay/etc). Compression is optional (and extensible), volume is variable, and there are properties for author, notes and copyright, among others.

AIFF, AIFC and WAVE are similar in spirit but allow more freedom in encoding style (other than 8 bit/sample), amongst others.

There are other sound formats in use on Amiga by digitizers and music programs, such as IFF/SMUS.

Appendices describe the NeXT and VOC formats; pointers to more info about AIFF, AIFC, 8SVX and WAVE (which are too complex to describe here) are also in appendices.

DEC systems (e.g. DECstation 5000) use a variant of the NeXT format that uses little-endian encoding and has a different magic number (0x0064732E in little-endian encoding).
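A small Python sketch of telling the two variants apart by their first four bytes follows. The NeXT magic value 0x2E736E64 (the characters ".snd" read as a big-endian word) is not given in this section and is supplied from my own knowledge; the DEC value is compared exactly as quoted above, that is, after reading the word in little-endian order. The function name is made up.

```python
import struct

NEXT_MAGIC = 0x2E736E64   # ".snd" as a big-endian 32-bit word (not from this section)
DEC_MAGIC = 0x0064732E    # value quoted in the text, read as a little-endian word

def identify_snd_variant(header):
    """Return "next", "dec", or None based on the first four bytes."""
    if len(header) < 4:
        return None
    if struct.unpack(">I", header[:4])[0] == NEXT_MAGIC:
        return "next"
    if struct.unpack("<I", header[:4])[0] == DEC_MAGIC:
        return "dec"
    return None
```

A reader that gets "dec" back would presumably go on to read the remaining header fields in little-endian order as well.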

The standard file formats used in the CD-I world are IFF, but on the disc they are stored in realtime files.

An interesting "interchange format" for audio data is described in the proposed Internet Standard "MIME", which describes a family of transport encodings and structuring devices for electronic mail. This is an extensible format, and initially standardizes a type of audio data dubbed "audio/basic", which is 8-bit U-LAW data sampled at 8000 samples/sec.
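For readers who have never looked inside U-LAW data, here is a sketch of expanding one 8-bit U-LAW byte to a linear sample in Python. It follows the widely used G.711 expansion (invert the bits, then split into sign, 3-bit exponent and 4-bit mantissa, with a bias of 0x84); none of these details are spelled out in this section, and the function name is made up.

```python
ULAW_BIAS = 0x84  # bias used by the usual G.711 u-law expansion

def ulaw_to_linear(byte):
    """Expand one 8-bit U-LAW byte to a signed linear sample."""
    u = ~byte & 0xFF                 # U-LAW bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS
    return -magnitude if sign else magnitude

def decode_audio_basic(data):
    """Decode a block of audio/basic bytes (one channel, 8000 samples/sec)."""
    return [ulaw_to_linear(b) for b in data]
```

Silence is encoded as 0xFF (or 0x7F), and the extreme byte values 0x80 and 0x00 expand to the largest positive and negative samples.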

The "IRCAM" sound file system has now been superseded by the so-called "BICSF" (for Berkeley/IRCAM/CARL Sound File system) software release. More recently, there has been an effort at Princeton (Prof. Paul Lansky) and Stanford (Stephen Travis Pope) to standardize several extensions to BICSF. A description of BICSF and the Princeton/Stanford extensions is available by anonymous ftp from ftp.cwi.nl, in directory /pub/audio/BICSF-info. This file contains further ftp pointers to software.

A sound file format popular in the DARPA speech community is the NIST SPHERE standard. The most recent version of the SPHERE package is available via anonymous ftp from jaguar.ncsl.nist.gov in compressed tar form as "sphere-v.tar.Z" (where "v" is the version code). The NIST SPHERE header is an object-oriented, 1024-byte blocked, ASCII structure which is prepended to the waveform data. The header is composed of a fixed-format portion followed by an object-oriented variable portion. I have placed a short description of NIST SPHERE on ftp.cwi.nl:/pub/audio/NIST-SPHERE.
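As a rough illustration of how such an ASCII header can be read, here is a Python sketch. The details are assumptions based on my understanding of SPHERE files and are not taken from this text: the first line is "NIST_1A", the second line gives the header size in bytes, each following line has the form "name -type value" (with -i for integers, -r for reals and -sN for strings), and the header ends at a line reading "end_head". The function name is made up.

```python
def parse_sphere_header(blob):
    """Return (header_size, fields) for a NIST SPHERE file image."""
    first = blob.split(b"\n", 2)
    if len(first) < 3 or first[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(first[1].decode("ascii").strip())
    text = blob[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue                      # skip blank or malformed lines
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        else:
            fields[name] = value          # assumed string type such as -s3
    return header_size, fields
```

The waveform data then starts at byte offset header_size in the file.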

Finally, a somewhat different but popular format is the "MOD" file, usually with extension ".mod" or ".nst" (the name can also have a prefix of "mod."). The format originated on the Amiga, but players now exist for many platforms. MOD files are music files containing two parts: (1) a bank of digitized samples; (2) sequencing information describing how and when to play the samples. See the appendix "The Amiga MOD Format" for a description of this file format (and pointers to ftp'able players and example MOD files).

Headerless file formats

extension or name  origin         parameters
.snd, .fssd        Mac, PC        variable rate, 1 channel, 8 bits unsigned
.ul                US telephony   8 k, 1 channel, 8 bit "U-LAW" encoding
.snd?              Amiga          variable rate, 1 channel, 8 bits signed

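It is usually easy to distinguish 8-bit signed formats from unsigned by looking at the beginning of the data with 'od -b'. Most sounds start with a little near-silence, so signed data shows bytes clustered around 0 (octal 376, 377, 000, 001, 002), while unsigned data shows bytes clustered around octal 200 (176, 177, 200, 201, 202).

The Apple IIgs records raw data in the same format as the Mac, but uses a 0 byte as a terminator; samples with value 0 are replaced by 1.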
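The 'od -b' inspection of signed versus unsigned data mentioned above can be roughly automated. The Python sketch below looks at the first few hundred bytes and counts how many lie close to 0 (where silence sits in signed data, wrapping around to 255) versus close to 128 (where silence sits in unsigned data). The function name, the probe length and the thresholds are arbitrary choices of mine, and the heuristic assumes the sound starts quietly.

```python
def guess_signedness(data, probe=256):
    """Guess whether 8-bit raw samples are "signed" or "unsigned"."""
    head = data[:probe]
    # Signed near-silence: small positive values or small negatives (252..255).
    near_zero = sum(1 for b in head if b <= 4 or b >= 252)
    # Unsigned near-silence: values hovering around the midpoint 128.
    near_mid = sum(1 for b in head if 124 <= b <= 132)
    if near_zero > near_mid:
        return "signed"
    if near_mid > near_zero:
        return "unsigned"
    return "unknown"
```

A file that starts at full volume will defeat this check, in which case listening to the sample remains the fallback.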

Sound formats and the Apple Macintosh

(Thanks to Bill Houle, <Bill.Houle@SanDiegoCA.NCR.COM>)

                          SOX/DOS   MAC
Sound Format              file ext  type  Mac program to convert to 'snd'
Mac snd                   .snd      sfil  [n/a]
Amiga IFF/8SVX            .iff            AmigaSndConverter, BST
Amiga SoundTracker        .mod      STrk  ModVoicer
Audio IFF                 .aiff     AIFF  SoundExtractor, Sample Editor, UUTool, BST, M5Mac
DSP Designer                        DSPs  SoundHack
IRCAM                     .sf       IRCM  SoundHack
MacMix                              MSND  SoundHack
RIFF WAVE                 .wav            SoundExtractor, BST, Balthazar
SoundBlaster              .voc            SoundExtractor, BST
SoundDesigner/AudioMedia            Sd2f  SoundHack
Sound[Edit|Cap|Wave]      .hcom     FSSD  SoundExtractor, SoundEdit, Wavicle, BST
Sun uLaw/Next .snd        .au/.snd  NxTS  SoundExtractor, SoundHack, au<->snd, UUTool, BST

