Byte

June 1986
Digital Music Synthesis
By Robert Moog

The many different shapes of the waveform of the present

MUSIC IS ONE of the most information-rich (wide-bandwidth) forms of human communication. A compact disk, for instance, uses nearly 1.5 megabits per second to faithfully transmit a stereo recording, as opposed to the several hundred bits per second needed to transmit a written message as fast as you can read it. Only video, the faithful transmission of which requires over 50 megabits per second, has a significantly higher information density.
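The compact-disk figure quoted above can be checked with a little arithmetic. The sketch below assumes the usual CD audio parameters (44,100 samples per second, 16 bits per sample, two channels), which are not stated in the text; the variable names are purely illustrative.

```python
# Assumed CD audio parameters (standard values, not given in the article).
SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
BITS_PER_SAMPLE = 16
CHANNELS = 2              # stereo

cd_bits_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS
print(cd_bits_per_second)         # 1411200
print(cd_bits_per_second / 1e6)   # roughly 1.4 megabits per second
```

This counts only the audio samples themselves; error-correction and formatting overhead on the disk are ignored.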
Music is also a highly structured form of human communication. The hierarchy of a piece of music may be as deep as that of the federal bureaucracy: Notes, phrases, lines, sections, and movements are carefully arranged to heighten and clarify the intent of the music.
These two general properties of music, wide bandwidth and complex structure, happen to match the information-handling capabilities of today's personal computers. In addition to the personal computer's wide bandwidth (or high speed) and information-organizing and -processing capabilities, you can access a growing list of instruments, accessories, and components designed specifically to produce musical tones in response to high-level digital instructions. These devices owe their existence to rock 'n' roll, the consumerization of digital audio, and dramatic advances in LSI (large-scale integration). They employ a wide variety of sound-producing techniques, each with its own set of features and limitations. This article discusses the general attributes of musical sound and how to produce it, the capabilities of specific sound synthesis techniques, how musicians are using these techniques, and what you can expect the future to bring to digital music.

THE PROPERTIES OF MUSICAL SOUND

Music is an arrangement in time (and to a lesser but still important extent, in space) of a collection of sonic events generally called notes. This is actually a subjective description of music. We hear individual notes only because our ears and mind pick acoustic information apart into events that we perceive to be distinct. What actually exists outside our ears is an ongoing series of vibrations of the air. The graph of air pressure versus time is the waveform, an unbroken pattern, present even in the quietest of sound-proof rooms. The ratio between the height of a sound waveform that you can barely hear and one that is so loud that it begins to hurt is about one to a million. That's about 120 decibels. Music, speech, and other normal sounds occur in the upper 60 dB of your hearing range.
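The decibel figures above follow from the standard definition for amplitude ratios, 20 times the base-10 logarithm of the ratio. The short sketch below assumes that the "height" of the waveform means its pressure amplitude (hence the factor of 20 rather than 10); the function name is hypothetical.

```python
import math

def amplitude_ratio_to_db(ratio):
    """Convert a ratio of waveform heights (amplitudes) to decibels."""
    return 20.0 * math.log10(ratio)

# Barely audible versus painfully loud: a ratio of one to a million.
full_range_db = amplitude_ratio_to_db(1_000_000)

# Going the other way: the amplitude ratio spanned by a 60 dB range.
ratio_for_60_db = 10 ** (60 / 20)

print(full_range_db, ratio_for_60_db)
```

By the same formula, the 60 dB range in which normal sounds occur corresponds to an amplitude ratio of about one to a thousand.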
You can describe any waveform that tends to repeat as a collection of frequency components, each one of which has a sine waveform. This is the spectrum of a sound. Thus, any sound has two complementary, equivalent representations: its waveform and its spectrum. The waveform is the sound's time-domain representation; the spectrum is its frequency-domain representation. Either one is capable of fully describing a sound. The relation of the waveform to the spectrum is described by a mathematical relationship called the Fourier theorem.
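To make the waveform-spectrum relationship concrete, here is a minimal sketch that recovers the strengths of the harmonics from one stored cycle of a waveform, using a direct (slow) discrete Fourier sum. The function name and the test waveform are assumptions chosen for illustration; a practical program would use a fast Fourier transform instead.

```python
import cmath
import math

def harmonic_amplitudes(cycle, max_harmonic):
    """Return the amplitudes of harmonics 1..max_harmonic of one waveform cycle."""
    n = len(cycle)
    amplitudes = []
    for k in range(1, max_harmonic + 1):
        # Correlate the cycle with a complex sinusoid at k times the repetition rate.
        total = sum(cycle[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        amplitudes.append(2.0 * abs(total) / n)
    return amplitudes

N = 256
# Time-domain view: a cycle built from harmonic 1 (strength 1.0) and harmonic 3 (strength 0.5).
cycle = [math.sin(2 * math.pi * i / N) + 0.5 * math.sin(2 * math.pi * 3 * i / N)
         for i in range(N)]

# Frequency-domain view of the same cycle.
spectrum = harmonic_amplitudes(cycle, 4)
print([round(a, 3) for a in spectrum])
```

The two lists, `cycle` and `spectrum`, describe the same steady tone: one in the time domain, the other in the frequency domain.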
The waveform of a very simple sound (for example, a tone from a laboratory audio oscillator) does not change with time. Real musical sounds, however, are never steady. They constantly change as they evolve. Piano tones, for instance, begin loud and bright and then decay to silence in a complex way that's characteristic of the instrument. These variations are essential determinants of the sound's characteristic tone color. They are neither entirely random nor entirely regular, nor are they undesirable deviations from a perfectly steady tone.
In the early days of musical acoustics, the importance of the details of a tone's evolution was not generally recognized. Acoustics textbooks often showed a single cycle of a waveform and labeled it violin or oboe. Musical-instrument engineers now recognize that, in determining a sound's tone quality, or timbre, the steady-state waveform takes a back seat to the parameters that describe how the sound changes as it evolves. For this reason, many synthesis techniques are important primarily because they allow key parameters of the generated sounds to be precisely and continuously varied. The exact shapes of the sound-parameter variations may be generated explicitly by a musician's real-time control or may be determined by a set of function generators that are connected to, but separate from, the sound-waveform generator itself.
A general synthesis system can be pictured as a block diagram. The rightmost block is the audio-waveform generator, which produces the audio waveform itself. It may, for instance, be an analog synthesizer module, a hard-wired digital oscillator, or a waveform-generating routine run by a microprocessor. The block to its left represents the control-function generators, a set of time-varying function generators whose outputs continuously control the properties of the audio waveform. In general, the control functions are simpler and more slowly moving than the sound waveform and are often (but not always) produced by software routines. The leftmost box on the bottom, representing coefficients and boundary conditions, produces the commands that specify the shapes of the control functions. Generally (but not always) these commands are a brief set of time-invariant numbers that provide the initial boundary conditions and coefficients of the control functions. Finally, the box on the top, real-time control, represents time-varying functions of arbitrary shape, such as those that a musician may wish to impart by hand. Real-time control may change the coefficients of the control functions or may be added to the control functions themselves to directly modify the sound waveform.
In a simple analog example, the waveform generator is an analog voltage-controlled oscillator (VCO) coupled to a voltage-controlled amplifier (VCA). Control functions consist of a low-frequency oscillator (LFO), two exponential rise-and-fall, or envelope (ENV), generators, and slowly varying functions that control another VCA. The tone setup is a "tone-color preset" whose numbers give the time constants of the envelopes and the frequency of the LFO output. The real-time control is a keyboard controller whose outputs give the VCO's center frequency (musical pitch) and provide the trigger that starts the ENVs. The resultant tone has a frequency modulation (vibrato) that builds up at the rise time of the first ENV output; the tone itself builds up at the rise time of the second ENV output.
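The VCO, LFO, and envelope arrangement described above can be imitated in software. The following is a minimal sketch, not a model of any particular instrument: the sample rate, the parameter values, and all function names are assumptions, and the envelopes only rise (the fall portion is omitted to keep the example short).

```python
import math

SAMPLE_RATE = 8000  # assumed; kept low so the example stays small

def rising_envelope(t, rise_time):
    """Exponential rise from 0 toward 1 with the given time constant in seconds."""
    return 1.0 - math.exp(-t / rise_time)

def render_tone(pitch_hz, duration, lfo_hz=6.0, vibrato_depth_hz=5.0,
                vibrato_rise=0.5, amplitude_rise=0.05):
    """Sine 'VCO' with vibrato that fades in (first ENV) and loudness that fades in (second ENV)."""
    samples = []
    phase = 0.0
    for n in range(int(duration * SAMPLE_RATE)):
        t = n / SAMPLE_RATE
        # Control functions: simple and slow compared with the audio waveform.
        env1 = rising_envelope(t, vibrato_rise)     # scales the LFO output
        env2 = rising_envelope(t, amplitude_rise)   # scales the tone itself
        lfo = math.sin(2 * math.pi * lfo_hz * t)
        # Instantaneous frequency: keyboard pitch plus the growing vibrato.
        freq = pitch_hz + vibrato_depth_hz * env1 * lfo
        phase += 2 * math.pi * freq / SAMPLE_RATE
        samples.append(env2 * math.sin(phase))
    return samples

tone = render_tone(440.0, 1.0)
```

Here the keyword arguments play the role of the tone-color preset, while `pitch_hz` and the call itself stand in for the keyboard's pitch output and trigger.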

 

DIGITALLY CONTROLLED ANALOG CIRCUITRY

The first synthesizers were analog. In analog synthesizers, VCOs produce waveshapes such as sawtooth and square because they are easy to produce and are rich in harmonics, the spectrum components whose frequencies are whole-number multiples of the waveform's repetition frequency. One or more VCFs (voltage-controlled filters) alter the relative strengths of the harmonics, thereby modifying the overall brightness or quality of the sound. VCAs dynamically shape the amplitude of the tone as well as the amplitudes of control signals. The resultant class of tone colors from analog synthesis includes some interesting approximations of traditional instrumental sounds. More commercially important, however, were new sounds that fit into the emerging electronic pop music of the sixties and seventies. Smooth pitch glides of swept VCOs, the vocal-like "wow" sounds of swept VCFs, and the fat, rolling sound of several sawtooth waveforms at nearly the same frequency became basic weapons in the rock 'n' roll keyboardist's arsenal.
Microprocessor-controlled analog synthesizers first appeared commercially just eight years ago and continue to be popular today. The Oberheim Xpander, for example, is an advanced six-voice instrument with its own self-contained microprocessor-based programming panel. The Xpander is specifically designed for sophisticated communication with the outside world through MIDI. In fact, you can activate virtually all of the Xpander's panel features externally through Oberheim's MIDI system-exclusive code set. The panel features provide complete control over 15 analog operating parameters per voice (one of which is a 15-position filter-mode selector), as well as literally hundreds of microprocessor-computed control functions. Thus, although the actual generation and modification of the Xpander's musical tones are performed by analog circuitry, the amount of control that is accessible via MIDI gives this instrument (and many other contemporary analog synthesizers as well) the same order of programmability and versatility as many all-digital synthesizers.

PHASE DISTORTION: THE CASIO CZ SERIES

About three years ago, Casio introduced the CZ-101, the first in its line of fully digital, fully programmable synthesizers. It is a four-voice multitimbral instrument that has some similarity to analog synthesizers, both in the way it is programmed and in the sorts of sounds that result.
Like analog sound chains, the CZ algorithm has one parameter that determines the tone's pitch, a second that determines its brightness or tone color, and a third that determines its overall loudness. The main difference between the CZ and analog synthesizers lies in exactly how the tone's brightness is shaped. In the analog world, the VCF performs brightness control. Analog filtering is a frequency-domain operation, which, in analog technology, is no harder than time-domain operations.

The digital world, however, generally avoids frequency-domain operations because of the expensive hardware required to perform the many high-precision multiplications per waveform point. The time-domain operation of waveshaping, on the other hand, can achieve the same sort of spectral variation as dynamic filtering with little or no multiplication. Casio engineers designed their algorithm to produce waveforms whose shapes can be swept continuously from pure sine to one of eight user-selectable high-brightness "analog sound-alikes." The algorithm centers around a look-up table in which the instantaneous amplitude of a cycle of a sine wave is plotted against uniform increments of the sine wave's phase angle. When a pure sine-wave output is desired, the phase angle is advanced in equal increments per unit time; when a waveform of higher harmonic content (i.e., a somewhat distorted sine wave) is desired, the phase angle is incremented first more rapidly, then more slowly during each cycle. Casio calls this algorithm PD, for phase distortion.
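The idea of reading a sine table with an unevenly advancing phase can be sketched in a few lines. The two-segment "knee" mapping below is an assumption made for illustration; the text does not give the actual phase functions Casio uses, and the names `distorted_phase` and `pd_cycle` are hypothetical.

```python
import math

TABLE_SIZE = 1024
# Look-up table: one cycle of a sine wave at uniform phase increments.
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]

def distorted_phase(phase, knee):
    """Map a steadily advancing phase in [0, 1) to a warped phase in [0, 1).

    With knee = 0.5 the phase is unchanged (pure sine). With a smaller knee,
    the first half of the table is read quickly and the second half slowly.
    """
    if phase < knee:
        return 0.5 * phase / knee
    return 0.5 + 0.5 * (phase - knee) / (1.0 - knee)

def pd_cycle(knee, points=64):
    """Generate one output cycle; note that no multiplication of samples is needed."""
    cycle = []
    for n in range(points):
        phase = n / points
        index = int(distorted_phase(phase, knee) * TABLE_SIZE) % TABLE_SIZE
        cycle.append(SINE_TABLE[index])
    return cycle

pure = pd_cycle(0.5)     # equal phase increments: a plain sine wave
bright = pd_cycle(0.1)   # fast, then slow: a distorted, brighter waveform
```

Sweeping `knee` smoothly between 0.5 and a small value gives a brightness control that behaves somewhat like opening a filter, which is the effect the paragraph above describes.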
By using the concepts of analog synthesis as a starting point but employing an algorithm that is efficiently matched to the capabilities of digital technology, the Casio CZ series instruments offer the musician many stock synthesizer effects, with the versatility and accuracy of control that you associate with any well-designed microprocessor-based operating system, at a low price. The CZ-101, for instance, sells for less than $500, an amount that, just 10 years ago, would barely have bought a minimal analog synthesizer with one voice and no program memory.

 

FM SYNTHESIS: THE YAMAHA DX SERIES

Frequency modulation (FM) is the variation of the frequency of one repeating waveform, the carrier, by an amount proportional to the instantaneous amplitude of a second waveform, the modulating wave. The simplest application of FM is the modulation of one sine wave with another. The mathematical expression that describes this is W(t) = P sin(At + I sin Bt). This equation tells us that waveform W is a sine wave of peak amplitude P and frequency A that is being sped up and slowed down by a peak amount I at a frequency B. A is the carrier frequency, B is the modulating frequency, and I is called the modulation index.
The spectrum of W is a series of sidebands, or sum and difference frequencies: A ± B, A ± 2B, A ± 3B, and so on. Calculation of the amplitudes of each of the sidebands requires an understanding of Bessel functions, which are mathematical functions describing how the amplitude of a harmonic changes. This is a complex subject in itself. The general results of these calculations, however, can be stated simply.
1. As I increases, the amount of energy in A goes down, and the amount of energy in the sidebands goes up.
2. As I increases, more and more frequencies become audible. In other words, the bandwidth of the total spectrum of W increases.
If you set the modulating frequency B equal to the carrier frequency A, the sideband frequencies are then whole-number multiples, or harmonics, of the carrier frequency. Starting with two sine-wave generators and tying the instantaneous frequency of one to the instantaneous amplitude of the other, you can generate a single complex tone with a large number of harmonics. Furthermore, you can change the overall harmonic content of the tone simply by varying one parameter, the modulation index I. By invoking this simple algorithm that operates in the time domain, you gain convenient control over the sound's spectrum.
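You can watch the effect of the modulation index without any Bessel-function tables by generating one cycle of an FM tone and measuring its harmonics directly. The sketch below assumes the FM waveform has the form P sin(At + I sin Bt) with P = 1 and B equal to A times a whole-number `ratio`; the function names are hypothetical, and the harmonic measurement is a plain discrete Fourier sum.

```python
import cmath
import math

def fm_cycle(index, ratio=1, points=256):
    """One carrier cycle of sin(theta + index * sin(ratio * theta))."""
    cycle = []
    for n in range(points):
        theta = 2 * math.pi * n / points
        cycle.append(math.sin(theta + index * math.sin(ratio * theta)))
    return cycle

def harmonic_amplitude(cycle, k):
    """Amplitude of the k-th harmonic of one stored waveform cycle."""
    n = len(cycle)
    total = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(cycle))
    return 2.0 * abs(total) / n

# Print the strengths of harmonics 1 through 6 for a few modulation indexes.
for index in (0.0, 1.0, 3.0):
    cycle = fm_cycle(index)
    print(index, [round(harmonic_amplitude(cycle, k), 3) for k in range(1, 7)])
```

With an index of zero, only the first harmonic (the carrier) is present; as the index grows, the carrier weakens and the higher harmonics strengthen, which is the behavior summarized in the two numbered points above.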
The Casio PD algorithm uses time-domain processing to generate and control harmonics too. But the advantages of FM over PD lie in what you can do in FM by changing the ratio between the carrier and modulating frequencies.
Most acoustically generated musical sounds have a complex internal motion that makes them interesting and pleasant to listen to. An important part of this motion is due to the slight deviation in the frequencies of the harmonics from perfect whole-number ratios with the fundamental pitch. For instance, the harmonics of a piano tone are all slightly sharp (high). Translated into the time domain, this means that a piano waveform does not repeat exactly every cycle but changes slowly and continuously as the tone evolves. The ability to detune the harmonics by slightly shifting the modulating frequency gives FM the ability to generate a wide variety of continuously changing waveforms that musicians often describe as warm, fat, or acoustic. Serious synthesists prize this capability and spend a lot of time exploiting it.
The advantages of FM were understood by analog-synthesizer designers, but the analog technology of the sixties and seventies did not permit accurate, wide-range, and efficient production of FM waveforms. John Chowning was one of the first people to experiment with digital production of FM sounds. Using a research computer at Stanford University in the early seventies, he systematically explored the relationships between the values of the coefficients of the FM algorithm (A, B, and I) and the resultant tone colors. His work led to the development of a series of commercial keyboard instruments by Yamaha, the latest of which are the DX and TX series digital synthesizers. One of these, the Yamaha DX-7, has become enormously popular among electronic keyboardists; well over 100,000 DX-7s reportedly have been sold.
In the DX-7, the basic algorithmic element is a digital oscillator whose output is shaped by a four-segment envelope. Yamaha calls this element an "operator." Six operators are available for each voice; the complete instrument can simultaneously produce up to 16 voices. The musician may choose one of 32 preprogrammed algorithms, which are configurations of operators.

HARMONIC SYNTHESIS

The most powerful of all synthesis techniques, and the least amenable to intuitive exploration, is harmonic synthesis. This is where the musician explicitly specifies the amplitude envelope and frequency of each harmonic of the tone. In theory, harmonic synthesis is the only way to accurately synthesize arbitrarily complex, pitched tones. In order to do it, however, you have to specify as many as 100 or more harmonic amplitude envelopes for every tone color.
In the mid-seventies, Dr. Hal Alles of Bell Laboratories developed a sophisticated music system based on harmonic synthesis. The harmonics were generated by incrementing through a high-precision sine-wave lookup table at different rates. The problem became how to shape the amplitudes of all those harmonics without spending a fortune on high-speed multipliers. Alles's solution was ingenious: For every harmonic, read out two sine waves that are of the same frequency but displaced by a slowly varying phase angle. Subtract one from the other. The result is a sine wave whose amplitude is determined by the phase angle. Alles's design eventually entered the marketplace as the computer-based General Development System and later as the Synergy, a keyboard synthesizer with limited internal programming capability but with a computer interface that provides full programming access.
Few people ever met the programming challenge of these instruments. One person who did is Wendy Carlos, perhaps best known as the producer of Switched-On Bach. A few years ago, Wendy combined her programming skills with unique musical intuition to develop a set of orchestral-like tone colors for the Synergy. She spent some 3000 hours over a two-year period to develop the sounds, which have been made available to Synergy owners. These sounds can be heard on Carlos's record Digital Moonscapes (the compact disk is Columbia MK 39340).
More recently, harmonic synthesis is being used in commercially available musical instruments whose sounds are preprogrammed by the manufacturer. One example is the Kurzweil 150, a MIDI-controlled expander that produces high-quality piano and similarly complex sounds by using proprietary synthesis techniques in addition to harmonic synthesis. Another recently announced product along the same lines is the Roland MKS-20. Both instruments provide the musician with access to a few global sound parameters but not to the fine details of the envelope of the individual harmonics, which are factory-programmed.
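The multiplier-free amplitude control attributed to Alles earlier in this section rests on a trigonometric identity: sin(x) - sin(x - p) = 2 sin(p/2) cos(x - p/2), so the difference of two equal sine waves displaced by a phase angle p is a sinusoid of peak amplitude 2 sin(p/2). The sketch below checks this numerically; it illustrates only the identity, not the actual hardware design, and the function names are hypothetical.

```python
import math

def difference_of_sines(theta, phase_offset):
    """Subtract a phase-displaced sine wave from an identical undisplaced one."""
    return math.sin(theta) - math.sin(theta - phase_offset)

def peak_amplitude(phase_offset, points=3600):
    """Largest magnitude reached by the difference over one full cycle."""
    return max(abs(difference_of_sines(2 * math.pi * n / points, phase_offset))
               for n in range(points))

# Sweeping the phase offset from 0 to pi sweeps the output amplitude from 0 to 2.
for offset in (0.0, math.pi / 3, math.pi / 2, math.pi):
    print(round(offset, 3), round(peak_amplitude(offset), 3))
```

Because the phase offset is just another number added to a table address, shaping a harmonic's envelope this way needs only additions and table reads.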

SAMPLING INSTRUMENTS

A sampling instrument records, encodes, and stores one or more musical sounds from the external "real world" and then replays those sounds on command. Some sampling instruments (for example, the Kurzweil 250; see "The Kurzweil 250 Digital Synthesizer") use proprietary data-compression schemes to reduce the amount of waveform memory without degrading the quality of the sound. All, however, produce their sounds from completely general digital representations that allow any sound short enough to fit in the instrument's memory to be played back. The differences among the various sampling instruments lie in the sound quality an instrument's hardware is capable of and in the sound modification and manipulation algorithms it can perform.
Some musicians assert that sampling instruments are not really synthesizers because the waveforms are not generated by algorithms. I don't believe that algorithmic generation of waveforms is a necessary feature of synthesizers. The term synthesize means "to produce by combining separate elements." The more sophisticated sampling instruments enable the musician to mix waveforms, reverse their direction in time, displace them both in time and in frequency, and impart slow frequency modulation and complex envelopes. All of these are perceived as "separate elements" that the musician combines at his or her discretion. Ergo, sampling instruments are definitely synthesizers.

CHIP-LEVEL SYNTHESIS HARDWARE

There are also some music synthesis chips available if you would like to experiment with high-quality music synthesis but would rather build it yourself.
Complete high-performance analog functional modules exist as single chips that require a minimum of support circuitry. Voltage-controlled oscillators, filters, and amplifiers, and an assortment of other musical functions are available from Curtis Electromusic and Solid State Microtechnology. You program these chips with analog control voltages, so you will need a high-resolution (at least 12-bit) multichannel D/A converter to go between your computer and the chips. The chip outputs are high-quality audio.
In the class of digitally controlled synthesis chips, there are programmable waveform generators that accept high-level mode-select and frequency commands and deliver waveform points in real time. The Cybernetic MicroSystems CY360, for instance, generates all the stock synthesizer waveforms, and much more, over the audio and subaudio frequency range.
If you'd like to try your hand at some sampling hardware, consider the Oki MSM5218 real-time data-compression/expansion chip. Used with a conventional audio A/D converter and a modest amount of support circuitry, this chip reduces a 12-bit data stream (representing the uncompressed audio waveform) into a 3- or 4-bit data stream for efficient storage in your computer's memory and then restores the audio to its 12-bit glory upon playback.

There are many more chips that fulfill music synthesis functions. Many are proprietary designs that are used in commercial products. Generally, neither the applications data nor the chips are available to experimenters. For those of you who enjoy reverse-engineering custom LSI, that's an irresistible challenge.

SOFTWARE MUSIC SYNTHESIS ALGORITHMS

Most music synthesis devices use dedicated high-speed hardware to produce sound waveforms in real time. If you are interested in synthesizing music off-line at slower than real time, you need little more than a personal computer with plenty of memory and a high-quality D/A converter to turn the computer's waveform data into audio. This is where synthesis programs come into play.

Vosim, which stands for voice simulation, is a program that assembles waveforms from single sine cycles to create a wide variety of vocal-like tones. You specify three slowly varying control parameters: the number of sine cycles per waveform cycle, the time spacing between the sine cycles, and the rate at which the sine cycles die out within a single waveform cycle.
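One way to read the three Vosim parameters is sketched below. This is a loose illustration of the description above, not the Vosim program itself: the pulse shape (one full sine cycle), the interpretation of "spacing" as a run of silent samples after each pulse, and the names `vosim_cycle`, `pulse_len`, `gap_len`, and `decay` are all assumptions.

```python
import math

def vosim_cycle(num_pulses, pulse_len, gap_len, decay):
    """Build one waveform cycle from a train of decaying sine cycles.

    num_pulses: number of sine cycles per waveform cycle
    pulse_len:  length of each sine cycle, in samples
    gap_len:    silent samples after each sine cycle (the assumed "spacing")
    decay:      factor by which each sine cycle is scaled relative to the previous one
    """
    cycle = []
    amplitude = 1.0
    for _ in range(num_pulses):
        cycle.extend(amplitude * math.sin(2 * math.pi * i / pulse_len)
                     for i in range(pulse_len))
        cycle.extend([0.0] * gap_len)
        amplitude *= decay
    return cycle

cycle = vosim_cycle(num_pulses=3, pulse_len=16, gap_len=4, decay=0.5)
```

Repeating `cycle` end to end gives a pitched tone; slowly changing the three parameters from one cycle to the next changes its vowel-like color.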
The Karplus-Strong algorithm provides an easy way of synthesizing sounds that evolve from bright to muted, like a plucked string. You start with a wave table of a single waveform cycle, then read the numbers out to create the sound. As you take a number from the table, you replace it with an average of that number and the one that was pulled out before it. This smooths out the sound waveform as it evolves, thereby reducing the high-frequency content of the sound.
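The averaging step described above takes only a few lines of code. This is a minimal sketch: filling the starting table with random numbers is an assumption (the text says only that the table holds one waveform cycle), and the table length and function name are chosen for illustration.

```python
import random

def karplus_strong(table, num_samples):
    """Read a wave table in a loop, replacing each value as it is read
    with the average of that value and the one read just before it."""
    table = list(table)   # work on a copy so the caller's table is untouched
    output = []
    previous = 0.0
    position = 0
    for _ in range(num_samples):
        current = table[position]
        output.append(current)
        table[position] = 0.5 * (current + previous)
        previous = current
        position = (position + 1) % len(table)
    return output

rng = random.Random(0)                                   # fixed seed for repeatability
wavetable = [rng.uniform(-1.0, 1.0) for _ in range(50)]  # a bright, noisy starting cycle
samples = karplus_strong(wavetable, 5000)
```

The table length sets the pitch (roughly the sample rate divided by the length), and each pass through the table smooths the waveform a little more, so the tone starts bright and grows muted.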
FM produces a waveform by varying the rate at which numbers from a sine-wave table are read out. A wave table is a one-dimensional array. If you used a two-dimensional array of numbers instead and superimposed a closed, or almost closed, curve on that array, you could read numbers from it by following the curve. By changing the shape, size, and position of the curve you change the resulting waveform. This is called "synthesis by functions of two variables". This is a wide-open area for exploration and would certainly yield sounds that we haven't heard yet.
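Reading a two-dimensional surface along a closed curve can be sketched as follows. For brevity, the surface is computed from a formula rather than stored as an array of numbers, and the particular surface, the elliptical curve, and the names `terrain` and `orbit_cycle` are all assumptions made for illustration.

```python
import math

def terrain(x, y):
    """A hypothetical function of two variables; any smooth surface would do."""
    return math.sin(math.pi * x) * math.cos(math.pi * y)

def orbit_cycle(center_x, center_y, radius_x, radius_y, points=64):
    """One waveform cycle: the surface heights met along one trip around an ellipse."""
    cycle = []
    for n in range(points):
        angle = 2 * math.pi * n / points
        x = center_x + radius_x * math.cos(angle)
        y = center_y + radius_y * math.sin(angle)
        cycle.append(terrain(x, y))
    return cycle

small = orbit_cycle(0.0, 0.0, 0.2, 0.2)   # a small curve over a nearly flat region
large = orbit_cycle(0.0, 0.0, 0.9, 0.9)   # a large curve crossing more hills and valleys
```

Changing the center or the radii while the tone plays moves the curve across the surface and so reshapes the waveform continuously.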

TRENDS FOR THE FUTURE

As semiconductor memory prices continue to drop, it becomes feasible to increase the amount of memory devoted to waveform storage in sampling synthesizers and control-function storage in harmonic synthesis instruments. In both cases, the achievable sound quality improves.
Along with semiconductor memory, bulk data storage prices are dropping dramatically. In particular, the storage capacity of CD-ROMs (over 500 megabytes) allows instrument manufacturers to supply enormous sound libraries for any digital synthesizer, but especially for sampling synthesizers. In the near future, we can expect CD-ROMs to be common components in many types of synthesizers.
As the synthesis capability and sound quality of synthesizers continue to grow and the hardware goes down in price while traditional acoustic instruments continue to go up in price, microprocessor-based musical instruments will assume an increasingly large role in our everyday music making. The next few years will see wide acceptance of home synthesizers, complete with authentic simulations of traditional tones, user-friendly operating systems, and provision for computer interfacing through MIDI.
More and more musicians of all persuasions will come to regard computers as basic music-production tools. Music will be composed directly on the monitor screen, and publication-quality scores will be generated on a laser printer.
Finally, the proliferation of high-performance, 16-bit microprocessors enables musical instrument designers to build in sensitive, real-time performance control. More and more keyboards will be pressure-sensitive, allowing keyboardists to control every note expressively. And, as musicians accept the idea of pressure-sensitive keyboards, adventurous experimenters will design and build multi-dimensional user interfaces, allowing musicians to control several tone parameters with each finger. An example of this potential exists today in the Notebender, a keyboard on which each key moves up and down and back and forth, thus allowing the player to continuously control two parameters of each key that he or she plays.
The popular attitude is that there is something subhuman and mechanical about digital electronics. Today's musicians know that just the reverse is true: the fantastic capabilities of microprocessors and synthesizers, and all the devices they connect to, offer musicians new and exciting resources, greater human control, and heightened creative potential.