

Important: The information in this document is obsolete and should not be used for new development.

Core Audio Overview

This chapter describes the architecture of Core Audio and explains how its various pieces fit together functionally.

In this section:

Apple’s Objectives
Introduction to Core Audio
Using Core Audio


Apple’s Objectives

In creating Core Audio, Apple’s objectives in the audio space have been twofold. The primary goal is to deliver a high-quality, superior audio experience to Macintosh users. The second goal reflects a shift in emphasis: rather than requiring developers to establish their own audio and MIDI protocols in their applications, Apple assumes responsibility for these services on the Macintosh platform.

The Core Audio architecture in Mac OS X provides a number of key features, which are described throughout this chapter.

Figure 2-1 illustrates the Core Audio architecture in Mac OS X and its various building blocks.


Figure 2-1  The Core Audio Architecture


The theory of operation behind the Core Audio architecture is discussed in subsequent chapters of this document.

Introduction to Core Audio

Hardware Abstraction Layer (HAL)

Note: In its preliminary form, this document does not yet contain documentation for the Hardware Abstraction Layer. The final document will contain information on this technology.

The Hardware Abstraction Layer (HAL) is provided by the Core Audio framework and defines the lowest level of audio hardware access available to an application. It presents the global properties of the system, such as the list of available audio devices. It also provides an audio device object that allows the application to read input data from, and write output data to, the audio device that the object represents, and to manipulate and control that device through a property mechanism.
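
As a minimal sketch of this property mechanism, the following code asks the HAL for the user’s default output device and then reads that device’s current stream format. It uses the older AudioHardwareGetProperty and AudioDeviceGetProperty calls; the master-channel and output-section arguments shown are illustrative assumptions.

    #include <stdio.h>
    #include <CoreAudio/CoreAudio.h>

    // Sketch: query the default output device, then its current stream format.
    static void PrintDefaultOutputFormat(void)
    {
        AudioDeviceID device = kAudioDeviceUnknown;
        UInt32 size = sizeof(device);
        OSStatus err = AudioHardwareGetProperty(kAudioHardwarePropertyDefaultOutputDevice,
                                                &size, &device);
        if (err != noErr) return;

        AudioStreamBasicDescription format;
        size = sizeof(format);
        err = AudioDeviceGetProperty(device, 0, false,            // master channel, output section
                                     kAudioDevicePropertyStreamFormat,
                                     &size, &format);
        if (err == noErr)
            printf("%.0f Hz, %u channels\n", format.mSampleRate,
                   (unsigned)format.mChannelsPerFrame);
    }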

The service supports devices that use PCM-encoded data. For PCM devices, the generic format is 32-bit floating point, which maintains a high resolution of the audio data regardless of the actual physical format of the device. This is also the generic format of PCM data streams throughout the Core Audio API.
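
For reference, here is a minimal sketch of how this canonical format might be expressed in an AudioStreamBasicDescription; the 44.1 kHz sample rate and stereo channel count are assumptions for illustration.

    #include <CoreAudio/CoreAudioTypes.h>

    // Sketch: canonical format -- interleaved, packed, native-endian 32-bit float PCM.
    static const AudioStreamBasicDescription kCanonicalStereo44k = {
        .mSampleRate       = 44100.0,                             // assumed rate
        .mFormatID         = kAudioFormatLinearPCM,
        .mFormatFlags      = kAudioFormatFlagsNativeFloatPacked,
        .mFramesPerPacket  = 1,
        .mChannelsPerFrame = 2,                                   // assumed stereo
        .mBitsPerChannel   = 32,
        .mBytesPerFrame    = 2 * sizeof(Float32),
        .mBytesPerPacket   = 2 * sizeof(Float32),
    };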

An audio stream object represents n channels of interleaved samples that correspond to a particular I/O endpoint of the device itself. Some devices (for example, a card that has both digital and analog I/O) may present more than one audio stream.

The service provides the scheduling and user/kernel transitions required both to deliver audio data to the audio device and to retrieve it. Timing information is an essential component of this service; time stamps are ubiquitous throughout both the audio and MIDI systems. This provides the capability to know the state of any particular sample of the device (that is, “sample-accurate timing”).

Audio Unit

An audio unit is a single processing unit that is either a source of audio data (for example, a software synthesizer), a destination of audio data (for example, an audio unit that wraps an audio device), or both a source and a destination (for example, a DSP unit, such as a reverb, that takes audio data and processes or transforms it).

The Audio Unit API uses a property mechanism similar to that of the Core Audio framework and uses the same structures for both the buffers of audio data and timing information. Audio units also provide real-time control capabilities, called parameters, that can be scheduled, allowing a change in the audio rendering to take effect at a particular sample offset within any given “slice” of an audio unit’s rendering process.
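
A minimal sketch of scheduling such a change at a sample offset within the next render slice; the unit and parameter ID are placeholders for whatever audio unit you are controlling.

    #include <AudioUnit/AudioUnit.h>

    // Sketch: apply a parameter change part-way into the next render slice.
    static OSStatus ScheduleParameterChange(AudioUnit unit, AudioUnitParameterID param,
                                            Float32 value, UInt32 sampleOffset)
    {
        return AudioUnitSetParameter(unit,
                                     param,
                                     kAudioUnitScope_Global,   // scope the parameter lives on
                                     0,                        // element
                                     value,
                                     sampleOffset);            // offset into the next slice
    }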

An application can use an AudioOutputUnit to interface to a device. The DefaultOutputAudioUnit tracks the device the user has selected as the “default” output for audio and provides additional services, such as sample rate conversion, offering a simpler means of interfacing to an output device.

Audio Codec

Audio codecs are the encoders and decoders available to the system for audio compression and decompression. Using the Audio Codec API for conversion between audio formats is deprecated in favor of using the Audio Converter API, described in the “Audio Toolbox” section.

You deploy an audio codec by subclassing either the ACBaseCodec class or the ACSimpleCodec class, both provided in the Core Audio SDK. Once subclassed, the abstract methods (those set equal to zero) must be implemented, and the methods designated as virtual may be overridden as needed.

Audio Toolbox

This framework currently provides five primary services:

  1. Audio Converter provides format conversion services. Use an audio converter when encoding or decoding audio data, as it supports many different type and format conversions, including conversions between linear PCM data and compressed audio data.

  2. Audio Format helps you handle information about different audio formats. It can inspect AudioStreamBasicDescription instances to provide information about various aspects of an audio stream. The Audio Format API can also provide information about the encoders and decoders available on the system.

  3. Audio File provides file services for creating, opening, modifying, and saving audio files. It features file-creation and format-specification capabilities, as well as reading and writing mechanisms and the ability to open files in the file system. Audio File uses a property system to keep track of a file’s file format, data format, channel layout, and more.

  4. AUGraph allows for the construction and management of a signal processing graph of audio units, managing the connections and run-time state of the units that make up a particular graph, including run-time insertion or removal of nodes (see the sketch following this list). The ubiquitous timing information in the signal chain allows a graph to deal with both feedback and fan-out connections.

  5. Music Sequence services provide a sequence object made up of one or more tracks of music events (both system-defined and user-defined). Track data can be edited while a sequence is playing, and its data can be iterated over. A music sequence typically addresses a graph of audio units, where individual tracks can be targeted at different nodes (audio units) of the graph or at a MIDI endpoint. A music player is responsible for playing a sequence.
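
To illustrate the AUGraph service named in item 4, here is a minimal sketch that builds a two-node graph, a DLS software synthesizer feeding the Default Output unit, using the AudioComponentDescription-based AUGraph calls found in later SDKs. Error handling is omitted, and the choice of nodes is an assumption for illustration.

    #include <AudioToolbox/AudioToolbox.h>

    // Sketch: build, open, initialize, and start a synth-to-output graph.
    static OSStatus BuildSynthGraph(AUGraph *outGraph)
    {
        AUGraph graph;
        NewAUGraph(&graph);

        AudioComponentDescription synthDesc  = { kAudioUnitType_MusicDevice,
                                                 kAudioUnitSubType_DLSSynth,
                                                 kAudioUnitManufacturer_Apple, 0, 0 };
        AudioComponentDescription outputDesc = { kAudioUnitType_Output,
                                                 kAudioUnitSubType_DefaultOutput,
                                                 kAudioUnitManufacturer_Apple, 0, 0 };

        AUNode synthNode, outputNode;
        AUGraphAddNode(graph, &synthDesc, &synthNode);
        AUGraphAddNode(graph, &outputDesc, &outputNode);

        // Connect the synth's output 0 to the output unit's input 0.
        AUGraphConnectNodeInput(graph, synthNode, 0, outputNode, 0);

        AUGraphOpen(graph);        // instantiate the underlying audio units
        AUGraphInitialize(graph);  // prepare the graph for rendering
        AUGraphStart(graph);       // begin pulling data through to the output device

        *outGraph = graph;
        return noErr;
    }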

MIDI Services

Note: In its preliminary form, this document does not yet contain documentation for MIDI Services. Please consult the Core Audio SDK, available from http://developer.apple.com/audio, for more information on developing MIDI Services.

This framework provides the representation of MIDI hardware and the interapplication communication of MIDI data to an application. The MIDIDevice object represents a MIDI-capable piece of hardware. A discrete MIDI source or destination (16 channels of MIDI data) is represented by the MIDIEndpoint object. This may be a real device or another application that is presented to your application as a virtual MIDIEndpoint, thus providing the interapplication communication of MIDI data.
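
A minimal sketch, assuming you simply want to receive MIDI from every source the MIDI server currently publishes: create a client and an input port, then connect the port to each source endpoint. The client and port names are placeholders.

    #include <CoreMIDI/CoreMIDI.h>

    // Sketch: each MIDIPacket carries time-stamped MIDI bytes from a connected source.
    static void MyMIDIRead(const MIDIPacketList *packets, void *readRefCon, void *srcRefCon)
    {
        // Parse packets->packet[...] here.
    }

    static void ConnectAllSources(void)
    {
        MIDIClientRef client;
        MIDIPortRef   inPort;
        MIDIClientCreate(CFSTR("My Client"), NULL, NULL, &client);
        MIDIInputPortCreate(client, CFSTR("Input"), MyMIDIRead, NULL, &inPort);

        ItemCount count = MIDIGetNumberOfSources();
        for (ItemCount i = 0; i < count; i++)
            MIDIPortConnectSource(inPort, MIDIGetSource(i), NULL);
    }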

The framework provides the I/O service and hosts the drivers, supplied by both Apple and third-party companies, that represent MIDI hardware within the system.

Core Audio Types

Core Audio utilizes a series of structures and constants to encapsulate various pieces of information. CoreAudioTypes.h declares structures such as AudioStreamBasicDescription, AudioBuffer, AudioBufferList, and AudioTimeStamp, which are used consistently throughout Core Audio.

In addition, many constants are declared, including channel layout constants, used in identifying the layout of audio sources, and format ID constants, useful when specifying the format of the audio data.

Using Core Audio

There are many tasks that you can accomplish with Core Audio. This section outlines the architecture of Core Audio, highlighting the various uses of Mac OS X’s audio technology.

Audio Data Operations

One of the main functions of Core Audio is to work with and manipulate audio data that is either stored on disk or already in memory. Effects can be applied to the data, and data sources can be mixed. Beyond that, Core Audio is also responsible for pulling data from input devices and sending data out to output devices. Finally, data can be written back out to disk as a file, possibly converted to another format.


Figure 2-2  Reading in an audio file


In order to use audio data from a file, it first must be read in. The Audio File API is provided for this purpose. An audio file instance can be created to act as a proxy for the file on disk, or for a buffer in memory (using callbacks).

Once the audio file has been created and bound to a file or memory, its data can be read in. If the data in the file is encoded, an audio converter is needed to convert the data into 32-bit native-endian floating-point pulse-code modulated (PCM) data, also known as the canonical format. Once the data is in this format, it is ready to be used in an audio unit or by another portion of Core Audio.
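
A minimal sketch of this sequence, using the CFURL-based Audio File calls from later SDKs: open the file, ask for its data format, and create an audio converter whose destination is the canonical 32-bit float format. The URL is assumed to point at an existing audio file.

    #include <AudioToolbox/AudioToolbox.h>

    // Sketch: open a file and build a converter that decodes it to canonical float PCM.
    static OSStatus MakeDecoderForFile(CFURLRef url, AudioFileID *outFile,
                                       AudioConverterRef *outConverter)
    {
        OSStatus err = AudioFileOpenURL(url, kAudioFileReadPermission, 0, outFile);
        if (err != noErr) return err;

        AudioStreamBasicDescription fileFormat;
        UInt32 size = sizeof(fileFormat);
        err = AudioFileGetProperty(*outFile, kAudioFilePropertyDataFormat, &size, &fileFormat);
        if (err != noErr) return err;

        AudioStreamBasicDescription canonical = {
            .mSampleRate       = fileFormat.mSampleRate,
            .mFormatID         = kAudioFormatLinearPCM,
            .mFormatFlags      = kAudioFormatFlagsNativeFloatPacked,
            .mFramesPerPacket  = 1,
            .mChannelsPerFrame = fileFormat.mChannelsPerFrame,
            .mBitsPerChannel   = 32,
            .mBytesPerFrame    = fileFormat.mChannelsPerFrame * sizeof(Float32),
            .mBytesPerPacket   = fileFormat.mChannelsPerFrame * sizeof(Float32),
        };
        return AudioConverterNew(&fileFormat, &canonical, outConverter);
    }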

It is worth noting that an audio converter instance inherently uses the audio codecs available on the system. Using an audio codec directly for this kind of data conversion is discouraged, since the Audio Converter API takes care of the buffering and other details that must be handled during a conversion.

To write a file back out to disk, simply reverse this process. Data output in the canonical format can be converted to an encoded format with an audio converter, and then saved to disk or memory via the Audio File API.


Figure 2-3  Converting audio files


Converting a file uses a process similar to the previous example. The Audio File API is used to open the file on disk, an audio converter takes the incoming data and converts it to the desired format, and another audio file instance is used to save the data back out to disk. Again, the codecs needed to decode the incoming data and encode the outgoing data are used automatically by the converter; it is not necessary for you to read in the encoded data, convert it to the canonical format, and then encode it in the resulting format yourself before writing it out. This service is provided by the Audio Converter API.


Figure 2-4  Playing back audio files


Playing back the contents of an audio file is one of the most common tasks that developers perform. In Core Audio, this is accomplished by reading in the data using an audio file instance. Once the instance is set up, an I/O unit instance can pull on the file, extracting the audio data and outputting it to the assigned audio device. If the data is already in the canonical format, no further decoding is needed to output the sound. If the data is encoded, it must be converted into the canonical format before it can be played back.

An I/O unit is a type of audio unit that acts as a proxy for an audio device. When data is sent to it, the data is relayed to the device that the unit represents. The most common use is to send data to the default output, as specified by the user; the unit to use in this case is the Default Output unit. A System Output unit is also provided, which is discussed in the next example.
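
As a minimal sketch, the following assumes a Default Output unit instance has already been opened (for example, by retrieving it from a graph like the one sketched earlier) and attaches a render callback to its input bus 0 before starting it. The callback here only writes silence; a real application would supply decoded file data.

    #include <string.h>
    #include <AudioUnit/AudioUnit.h>

    // Sketch: render callback invoked by the output unit when it needs data.
    static OSStatus RenderSilence(void *refCon, AudioUnitRenderActionFlags *flags,
                                  const AudioTimeStamp *timeStamp, UInt32 bus,
                                  UInt32 frames, AudioBufferList *ioData)
    {
        for (UInt32 i = 0; i < ioData->mNumberBuffers; i++)
            memset(ioData->mBuffers[i].mData, 0, ioData->mBuffers[i].mDataByteSize);
        return noErr;
    }

    // Sketch: attach the callback to input bus 0 and start the unit pulling.
    static OSStatus StartDefaultOutput(AudioUnit outputUnit)
    {
        AURenderCallbackStruct callback = { RenderSilence, NULL };
        OSStatus err = AudioUnitSetProperty(outputUnit,
                                            kAudioUnitProperty_SetRenderCallback,
                                            kAudioUnitScope_Input, 0,
                                            &callback, sizeof(callback));
        if (err != noErr) return err;

        err = AudioUnitInitialize(outputUnit);
        if (err != noErr) return err;
        return AudioOutputUnitStart(outputUnit);
    }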


Figure 2-5  I/O unit hierarchy


Each I/O unit inherits from AUConverter, an audio unit that owns an audio converter instance; this unit can be used in a graph to convert data between formats, sample rates, and the like. A GenericOutput unit adds the ability to start and stop the pulling of data to the output device.

When playing out to any piece of hardware, an AUHAL unit is needed. An instance of AUHAL can be attached to any audio device, making the instance a proxy for getting input and providing output to that device.

The Default Output unit is provided to play audio out to the user’s preferred output, as designated in System Preferences. Likewise, the System Output unit is provided to play back to the current system output device.


Figure 2-6  Using an I/O unit for input and output


Any of these I/O units may be used to pull input data from its associated audio device, route it through any number or combination of audio units or audio unit graphs, and output it back through the I/O unit. The I/O unit itself has two busses, 0 and 1, where bus 0 is designated as the output bus and bus 1 is the input bus. The connections between the output of bus 0 and the audio device, and between the audio device and the input of bus 1, are made when the unit is associated with the device.

To process data from a device and play it back through the same device, simply associate the device with the unit, connect the output of bus 1 to whatever audio units or graphs are being used to process the data, and connect the output of those units to the input of bus 0 on the I/O unit. To start, tell the I/O unit to render. This, in turn, causes the unit to ask the units attached to it to render, eventually leading back to the I/O unit’s input bus, which pulls from the audio device. The data passes through the input bus and works its way through all of the attached units until it reaches the I/O unit’s output bus, where it is automatically output to the audio device.
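
A minimal sketch of configuring an AUHAL instance for this kind of input-plus-output use, assuming you already have the unit instance and a device ID: enable bus 1 for input and bus 0 for output, then bind the unit to the device.

    #include <CoreAudio/CoreAudio.h>
    #include <AudioUnit/AudioUnit.h>

    // Sketch: enable both busses of an AUHAL unit and associate it with a device.
    static OSStatus ConfigureAUHAL(AudioUnit auhal, AudioDeviceID device)
    {
        UInt32 enable = 1;
        OSStatus err;

        // Enable input on element (bus) 1.
        err = AudioUnitSetProperty(auhal, kAudioOutputUnitProperty_EnableIO,
                                   kAudioUnitScope_Input, 1, &enable, sizeof(enable));
        if (err != noErr) return err;

        // Enable output on element (bus) 0.
        err = AudioUnitSetProperty(auhal, kAudioOutputUnitProperty_EnableIO,
                                   kAudioUnitScope_Output, 0, &enable, sizeof(enable));
        if (err != noErr) return err;

        // Associate the unit with a particular HAL device.
        return AudioUnitSetProperty(auhal, kAudioOutputUnitProperty_CurrentDevice,
                                    kAudioUnitScope_Global, 0, &device, sizeof(device));
    }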


Figure 2-7  Audio Format Services


When working with streams of audio data, information about the data, and about the formats the system has available, becomes important. The Audio Format API provides a mechanism to get information about audio data, such as the available codecs for encoding and decoding, the encoding information for channel layouts, and panning information for use with the Matrix Mixer audio unit. The Audio File API also provides a function useful for determining the file formats available on the system.
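
A minimal sketch of one such query, asking Audio Format Services for a human-readable name for a stream description; the caller is assumed to release the returned string.

    #include <AudioToolbox/AudioToolbox.h>

    // Sketch: get a display name for an AudioStreamBasicDescription.
    static CFStringRef CopyFormatName(const AudioStreamBasicDescription *asbd)
    {
        CFStringRef name = NULL;
        UInt32 size = sizeof(name);
        OSStatus err = AudioFormatGetProperty(kAudioFormatProperty_FormatName,
                                              sizeof(*asbd), asbd, &size, &name);
        return (err == noErr) ? name : NULL;   // caller releases the returned string
    }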

MIDI Data Operations

MIDI stands for Musical Instrument Digital Interface, the standard method of communication between music devices. Core Audio features full-fledged MIDI support, including provisions for communicating with MIDI devices and for reading in and playing back standard MIDI files.


Figure 2-8  Reading in a standard MIDI file


The Music Sequence API is provided to sequence events for MIDI endpoints and audio units. One of its functions is the ability to read in MIDI files and parse their contents into tracks. Normally, each channel of MIDI data in the file becomes one track in the sequence, allowing each track, and therefore each channel of data, to be targeted at a different MIDI endpoint.
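
A minimal sketch of reading a standard MIDI file into a sequence, using the CFURL-based loading call from later SDKs; the URL is an assumption, and error handling is abbreviated.

    #include <AudioToolbox/AudioToolbox.h>

    // Sketch: create a sequence and parse a standard MIDI file into its tracks.
    static OSStatus LoadMIDIFile(CFURLRef url, MusicSequence *outSequence)
    {
        OSStatus err = NewMusicSequence(outSequence);
        if (err != noErr) return err;

        err = MusicSequenceFileLoad(*outSequence, url, kMusicSequenceFile_MIDIType, 0);
        if (err != noErr) return err;

        // Each MIDI channel in the file normally becomes one track.
        UInt32 trackCount = 0;
        return MusicSequenceGetTrackCount(*outSequence, &trackCount);
    }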


Figure 2-9  Music Sequence play through


To play back the MIDI file as audio data, a music player is assigned to a sequence, and the sequence’s tracks are assigned to a music device. A music device is a particular type of audio unit that generates audio data by having its parameters altered; in this case, the event track is assigned to a music device that is part of a graph, and the events in that track contain the parameter changes needed to affect the output of the music device. The graph itself is assigned to the sequence, so that the sequence knows which instances its tracks are assigned to. Beyond that, the music player assigned to the sequence communicates with the I/O unit at the head of the graph to ensure that all timing issues for outputting sound to the unit’s assigned device are taken care of. This happens inherently when the sequence is assigned to the graph, so no extra steps need to be taken for this synchronization to happen. The compressor shown in the figure is included to make sure a constant stream of data is supplied to the I/O unit.
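
A minimal sketch of wiring a sequence to a graph and starting playback, assuming the graph contains a music device and an I/O unit as described above.

    #include <AudioToolbox/AudioToolbox.h>

    // Sketch: target a sequence at an AUGraph and start a music player on it.
    static OSStatus PlaySequenceThroughGraph(MusicSequence sequence, AUGraph graph)
    {
        OSStatus err = MusicSequenceSetAUGraph(sequence, graph);
        if (err != noErr) return err;

        MusicPlayer player;
        err = NewMusicPlayer(&player);
        if (err != noErr) return err;

        err = MusicPlayerSetSequence(player, sequence);
        if (err != noErr) return err;

        err = MusicPlayerPreroll(player);     // resolve timing with the graph's I/O unit
        if (err != noErr) return err;
        return MusicPlayerStart(player);
    }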


Figure 2-10  Music Sequence play through


To play MIDI data back through an attached MIDI device, an event track needs to be assigned to a MIDI endpoint, a proxy for a MIDI device. As with the previous example, the music player inherently communicates with Core MIDI to ensure that all timing issues are resolved and that a constant amount of data is fed to the MIDI server, and therefore to the MIDI device.
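
A minimal sketch of targeting a track at a MIDI endpoint, assuming the first destination published by the MIDI server is the device you want.

    #include <AudioToolbox/AudioToolbox.h>
    #include <CoreMIDI/CoreMIDI.h>

    // Sketch: point track 0 of a sequence at the first available MIDI destination.
    static OSStatus SendTrackToFirstDestination(MusicSequence sequence)
    {
        if (MIDIGetNumberOfDestinations() == 0) return -1;   // placeholder error code
        MIDIEndpointRef destination = MIDIGetDestination(0);

        MusicTrack track;
        OSStatus err = MusicSequenceGetIndTrack(sequence, 0, &track);
        if (err != noErr) return err;
        return MusicTrackSetDestMIDIEndpoint(track, destination);
    }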


Figure 2-11  MIDI device input


When a MIDI control surface is used to control the properties of a software component, such as an audio unit, the surface is assigned to an endpoint, which in turn is assigned to an AUMIDIController. The AUMIDIController parses the incoming MIDI signals into parameter changes for use with an audio unit.

To play back the signals generated by a MIDI keyboard, a similar scheme is used. An endpoint is assigned to the keyboard, and the signals coming from the keyboard are routed to an AUMIDIController, which, in turn, issues parameter changes to a music device. The music device synthesizes the audio data based on the parameters given to it via the AUMIDIController.


Figure 2-12  MIDI input parsing


When taking in MIDI data for saving, it is common to have existing data playing while the new data comes in and is recorded. The playback of existing data is handled as before, with the track being assigned to a music device, which outputs its data, via a graph, to an I/O unit. Beyond that, however, the data coming in from any MIDI device needs to be parsed and placed in another event track within the sequence. The Music Player API provides functions for determining when to place the events, based on the time at which each event happens.
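
A minimal sketch of the time-stamping step, assuming an incoming note has already been parsed into a MIDINoteMessage: ask the music player for its current beat position and add the note to a track at that time.

    #include <AudioToolbox/AudioToolbox.h>

    // Sketch: record an incoming note into a track at the player's current time.
    static OSStatus RecordIncomingNote(MusicPlayer player, MusicTrack recordTrack,
                                       const MIDINoteMessage *note)
    {
        MusicTimeStamp now = 0;
        OSStatus err = MusicPlayerGetTime(player, &now);   // current position, in beats
        if (err != noErr) return err;
        return MusicTrackNewMIDINoteEvent(recordTrack, now, note);
    }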

Higher Level Audio Operations

Often, elements from the audio data operations and the MIDI data operations come together to provide a complete audio experience. These examples look at some cases where MIDI data is synthesized and also output to a MIDI device concurrently, or when events control a music device’s synthesis of audio data and the parameters of an audio unit, all while mixing in data from an encoded file being read off of disk.


Figure 2-13  MIDI synthesis and output


In this example, you can see a music sequence being used to control the synthesis of sound, via an audio unit graph containing a music device, while additional events are sent to a MIDI endpoint, which, in turn, are assigned to MIDI devices. This is common when using the Mac as another MIDI device, generating synthesized data to accompany an external MIDI device. Note that the music player automatically takes care of all timing issues between the different outputs, ensuring that the output remains in sync.


Figure 2-14  Mixing MIDI and audio data


This example focuses on an audio unit graph, which is used to mix synthesized MIDI data, via a music device, with audio data coming in from a file. This scenario is common in games, where ambient noises are saved as MIDI data and the sound track is an encoded file on disk. Note that the sequence controls a 3D Mixer audio unit, often used to mix various audio sources and to provide a spatial orientation for the sources and the output. As with the previous example, the music player ensures that the output stays in sync with the sequence.

Interfacing with Hardware

Most of the processing done with audio and MIDI data in Core Audio is eventually played back via audio or MIDI hardware. As a developer, you will find it helpful to understand the architecture behind the hardware interfaces, even if you use an abstraction when developing an application.


Figure 2-15  Audio hardware architecture


When accessing audio hardware, whether via on-board audio inputs and outputs, USB, or other means, a driver must exist to handle the exchange of data between the hardware and the Mac. For the driver to be used by Core Audio, it must conform to the IO Audio Family of IOKit drivers; this means that the driver must implement IO Audio Device functionality so that proper communication can exist between the driver and the Hardware Abstraction Layer.

The Hardware Abstraction Layer, or HAL, is provided to make discovery of and access to audio hardware simpler. Each driver in the IO Audio Family is represented as an audio device in the HAL. To make communication with audio devices easier, an I/O unit may be created and bound to an audio device, allowing the device to be used as a source, a destination, or both in connections with audio units and audio unit graphs. This is common and encouraged when working with audio hardware.


Figure 2-16  MIDI hardware architecture


The MIDI hardware architecture differs from that of audio hardware in that MIDI drivers live in user space, usually working with default drivers provided by the operating system. This means that raw incoming and outgoing data is passed between the hardware and the MIDI driver, and the MIDI driver takes care of formatting and preparing the data. The MIDI Server then works with Core MIDI, routing MIDI data via endpoints, the abstraction provided to allow easy access to MIDI devices.





© 2008 Apple Inc. All Rights Reserved. (Last updated: 2008-10-15)

