Spatial Sound: An Introduction

Spatial hearing

The hearing scientist Georg von Békésy once said that the purpose of the ears is to point the eyes. As with vision, hearing is three-dimensional. We hear sounds not only to the left or right, but also up or down and near or far. How we do this has been studied for a long time, and although some mysteries remain, the major mechanisms are well understood. For example, it is well known that the primary right/left or azimuth cue comes from the difference in the times at which sound waves arrive at the two ears, and that the primary up/down or elevation cues come from the spectral changes produced by the outer ears or pinnae. By manipulating these cues, it is possible to change the apparent location of a sound in space.
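
As a rough illustration of the arrival-time cue described above, the sketch below estimates the interaural time difference with Woodworth's spherical-head approximation, ITD = (a/c)(theta + sin theta), and then delays the far-ear channel by the corresponding number of samples. The formula, the head radius, the speed of sound, the sample rate, and the function names itd_seconds and apply_itd are all assumptions introduced for this example; none of them comes from the text. It is a minimal sketch, not a full azimuth renderer.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees C
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def itd_seconds(azimuth_deg, head_radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference from Woodworth's spherical-head model.

    Positive azimuth means the source is to the listener's right.
    The model is only meant for azimuths between -90 and +90 degrees.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must lie in [-90, 90] degrees")
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


def apply_itd(mono, azimuth_deg, sample_rate=44100):
    """Return (left, right) channels with the far ear delayed by the ITD.

    The delay is rounded to a whole number of samples, which is a
    simplification; both channels are zero-padded to the same length.
    """
    delay = round(abs(itd_seconds(azimuth_deg)) * sample_rate)
    pad = [0.0] * delay
    direct = list(mono) + pad    # near ear: sound arrives first
    delayed = pad + list(mono)   # far ear: sound arrives 'delay' samples later
    if azimuth_deg >= 0:
        return delayed, direct   # source on the right: left ear is the far ear
    return direct, delayed       # source on the left: right ear is the far ear


if __name__ == "__main__":
    left, right = apply_itd([1.0, 0.0, 0.0], azimuth_deg=45.0)
    print(itd_seconds(45.0), left.index(1.0), right.index(1.0))
```

Under these assumptions the delay for a source directly to one side is well under a millisecond, so at common audio sample rates it amounts to only a few tens of samples.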


In the past few years, interest in the computer synthesis of 3-D sound has increased significantly. In several important areas, accurately synthesized spatial sound is of great value and of growing importance: human/computer interfaces for workstations and wearable computers, sound output for computer games, aids for the visually impaired, virtual reality systems, "eyes-free" displays for pilots and air-traffic controllers, spatial audio for teleconferencing and shared electronic workspaces, and auditory displays of scientific or business data.

Multichannel versus two-channel methods

The "3-D" sound cards in many personal computers are remarkably effective at controlling the azimuth (left/right location) of synthesized sounds. However, controlling the elevation and the range is still problematic.

The simplest way to produce three-dimensional sound is to physically position loudspeakers at many different points in space. However, this multichannel approach is both cumbersome and expensive. Fortunately, because we have only two ears, it is also possible to generate fully three-dimensional sound using only two channels.

HRTFs -- Head-Related Transfer Functions

The key to this binaural approach to generating synthetic spatial sound is the so-called Head-Related Transfer Function, or HRTF for short. The HRTF captures the location-dependent spectral changes that occur when a sound wave propagates from a sound source to the listener's eardrum. These spectral changes are due to diffraction of the sound wave by the torso, head, and outer ears or pinnae, and their character depends on the azimuth, elevation, and range from the listener to the source. In general, the HRTF is a complicated function of the location of the source relative to the listener, as well as of the physical size and shape of the particular listener. When a sound signal is filtered by accurate HRTFs and sent to the listener's two ears (for example, over headphones), the synthesized sound is experienced as a virtual source at the desired location in space.
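
The filtering step described above can be sketched in the time domain, where applying an HRTF is equivalent to convolving the signal with the corresponding head-related impulse response (HRIR), one per ear. The sketch below is a minimal illustration under stated assumptions: the function names convolve and binaural_render are hypothetical, and the two short impulse responses are toy placeholders rather than measured data. A real system would load measured HRIRs for the desired source location and would normally use an optimized convolution routine.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences of floats."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def binaural_render(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left/right HRIR pair.

    Returns (left, right) channels intended for headphone playback.
    """
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy placeholder HRIRs (assumed, NOT measured): the left-ear response is
# later and weaker, loosely imitating a source on the listener's right.
HRIR_LEFT = [0.0, 0.0, 0.5, 0.25]
HRIR_RIGHT = [1.0, 0.5]

if __name__ == "__main__":
    mono = [1.0, 0.0, -1.0]
    left, right = binaural_render(mono, HRIR_LEFT, HRIR_RIGHT)
    print(left)
    print(right)
```

Because each ear gets its own impulse response, the interaural time difference, the level difference, and the spectral shaping all come from the measured data itself; nothing in the rendering code is specific to a particular source location.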