Abstract
We discuss some theoretical and practical aspects of real-time granulation of sampled
sounds, such as windowing, grain overlap, synchronicity, and control through high
level events. Our analysis of the trapezoidal window has shown that it approximates
the response of a Gaussian window, with the addition of comb-shaped spectral effects.
The spectral zeros depend on the position of the 'corners' of the window. Therefore, we
call these artifacts the 'corner effect.'
This paper discusses some of the processes involved in granular synthesis (GS), in an effort to identify relevant variables in granular temporal and spectral transformations. Windowing, AM effects, grain overlap and their interaction produce complex time-varying spectral profiles. We address these issues in relation to the implementation of MacPod [11], a real-time GS system for the Macintosh PowerPC which is based on Truax's (1988) POD system. Furthermore, we discuss some new concepts and techniques relevant to the development of ecologically-based sound resynthesis, namely, the use of local or global parameters to define granular events, the control of phase-synchronicity among streams, and the simplification of windowing by using pre-stored grains [5].
The implementation of a real-time granular synthesis (GS)
system on a personal computer presents two basic
challenges: (1) an efficient use of computational resources
to generate high grain densities, and (2) a simple and intuitive
organization of synthesis parameters to facilitate real-time
control. The first issue is directly related to the synthesis
engine of the system, that is, how the source sounds are
windowed and mixed. The second issue corresponds to the
control level of the system, which concerns how the
synthesis parameters are generated and how the user
(performer or composer) interacts with them.
The complex
The interaction among processes in asynchronous GS
generates rich sound results with fairly little source
material. We have identified four causes for the increased
complexity of granulated sound. (1) By applying an
envelope, or window, we produce a signal whose spectrum is
the convolution of the spectrum of the window with that of
the sampled sound. In other words, the window applied
to the original signal causes a resonant main lobe and
several spectral side lobes which smear the original
spectrum. (2) At subaudio grain rates, amplitude
modulation adds upper and lower components to the
granulated sound. The spectral modifications are
proportional to the spectral content of the signal and the
grain rate applied. (3) The overlap among grains in
different voices produces time-varying cancellation and
reinforcement which also modify the spectrum of the
original signal. (4) When time-stretching is applied to a
single sound file, time-delayed copies of the granulated
signal are overlaid. This process produces temporal and
spectral effects that depend on the stretch-ratio being used.
The window
Windowing effects in audio signal processing are generally well understood. Window functions used in spectral analysis, such as von Hann and Hamming, minimize unwanted artifacts but increase the computation time [7, p. 149]. To reduce computational cost, the earliest real-time granular synthesis systems [12] used simple trapezoidal windows to good aural effect.
We have focused our research on the effects of the lowly trapezoidal grain window, within the context of asynchronous granular synthesis. The trapezoidal window, in fact, resembles the popular Gaussian window, with the addition of ripples that produce an effect aurally similar to comb-filtering. The spectral zeros depend on the position of the 'corners' of the window; hence, we call these artifacts the 'corner effect' (see the figure 'Spectrum of a trapezoidal window').
While undesirable for most signal processing applications,
this filtering effect is unobtrusive in GS. As we will
discuss in 'The Overlap' section, aurally similar
modifications of the signal are inherent in the GS
technique due to the delay between overlapping grains.
Therefore, we can confidently state that complex
windowing is unwarranted for granular synthesis at
medium-to-high grain densities. We invite the reader to
compare the spectral effect of a triangular window with a
trapezoidal window using identical synthesis settings.
Both spectrograms show very similar results, with a slight
'smearing' of the spectrum when the trapezoidal window is
used.
[Figure: Spectrum of a triangular window. Horizontal axis: 0 kHz to 20 kHz; vertical axis: -96 dB to 0 dB.]
[Figure: Spectrum of a trapezoidal window. Horizontal axis: 0 kHz to 20 kHz; vertical axis: -96 dB to 0 dB.]
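The dependence of the spectral zeros on the window 'corners' can be illustrated with a minimal sketch. It assumes a discrete-time model in which the trapezoid is the convolution of two rectangular pulses, so that its spectrum is the product of two sinc-like terms. The function names, the sample rate and the ramp and body lengths are illustrative assumptions and are not taken from MacPod.

```python
import cmath
import math


def trapezoid_window(ramp, body):
    """Trapezoid built by convolving rectangles of `ramp` and `body` samples.

    The result has ramp + body - 1 samples and a peak of 1.0; with
    ramp == body it degenerates into a triangular window.
    """
    if not 0 < ramp <= body:
        raise ValueError("need 0 < ramp <= body")
    window = []
    for n in range(ramp + body - 1):
        # Count the samples where the two rectangles overlap at lag n.
        overlap = min(ramp - 1, n) - max(0, n - body + 1) + 1
        window.append(overlap / ramp)
    return window


def magnitude_at(window, frequency_hz, sample_rate):
    """Magnitude of the window's discrete-time Fourier transform at one frequency."""
    step = -2j * math.pi * frequency_hz / sample_rate
    return abs(sum(w * cmath.exp(step * n) for n, w in enumerate(window)))


SAMPLE_RATE = 48000  # assumed sample rate in Hz
window = trapezoid_window(ramp=48, body=320)

# In this model the zeros fall at multiples of SAMPLE_RATE / ramp
# and SAMPLE_RATE / body, so moving the corners moves the zeros.
ramp_zero_hz = SAMPLE_RATE / 48    # 1000 Hz
body_zero_hz = SAMPLE_RATE / 320   # 150 Hz
for f in (75.0, body_zero_hz, ramp_zero_hz):
    print(f, magnitude_at(window, f, SAMPLE_RATE))
```

Under these assumptions, shortening the ramp pushes one family of zeros upward in frequency, while the triangular window (ramp equal to body) collapses both families into one.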
Our particular focus is the application of GS to modeling
environmental sounds. High grain densities (approaching
one thousand grains per second) are needed to model
complex, time-varying sound events. The chief objection
to densities of this magnitude in real-time systems is the
inefficiency of windowing and mixing the grains [2].
Using the trapezoidal function, however, we achieve
real-time GS with the required density on a standard Macintosh
PowerPC.
The overlap
The grain overlap is defined as the time interval during which two or more grains are sounding simultaneously [4]. An average grain overlap can be estimated as the difference between the grain duration and the average grain rate, where the grain rate is expressed as the time between successive grain onsets. If the grain duration is longer than the grain rate, overlap occurs. Thus, there are three possible configurations: (1) negative overlap, where there is a delay between the end of a grain and the onset of the following grain; (2) no overlap, where a grain starts when the previous one ends; and (3) positive overlap, where the next grain starts before the current one ends. In batch implementations, there can be as many overlapping grains as memory and patience allow. Real-time constraints, on the other hand, place a limit on the number of simultaneously sounding grains. MacPod can achieve up to 20 simultaneous grain streams, with a minimum grain rate of one millisecond.
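The overlap estimate and the three configurations can be sketched as follows. This is a minimal illustration that assumes the grain rate is given as an onset interval in milliseconds; the function names are hypothetical. The last helper uses the standard estimate that the mean number of simultaneously sounding grains in one stationary stream equals duration divided by onset interval.

```python
def average_overlap_ms(grain_duration_ms, onset_interval_ms):
    """Positive when grains overlap, negative when there is a gap."""
    return grain_duration_ms - onset_interval_ms


def classify_overlap(grain_duration_ms, onset_interval_ms):
    """Name the configuration: 'negative', 'none' or 'positive'."""
    overlap = average_overlap_ms(grain_duration_ms, onset_interval_ms)
    if overlap > 0:
        return "positive"
    if overlap < 0:
        return "negative"
    return "none"


def mean_active_grains(grain_duration_ms, onset_interval_ms):
    """Average number of grains sounding at once in a stationary stream."""
    return grain_duration_ms / onset_interval_ms


for duration, interval in ((20.0, 30.0), (20.0, 20.0), (20.0, 5.0)):
    print(duration, interval,
          classify_overlap(duration, interval),
          mean_active_grains(duration, interval))
```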
Although some GS systems combine several grain streams into a single voice [1], it is conceptually clearer to conceive of each voice as a separate stream. Thus, overlap can be controlled from a unique parameter which stands for the coincidence [3], or phase-synchronicity, among grain onsets in all active voices. Following the central limit theorem [8, p. 174], it is reasonable to state that if each grain stream is defined as an independent random process, the overlap distribution will eventually approach a Gaussian probability distribution.
Careful control of phase-synchronicity among grain onsets
in different streams produces transformations in the
temporal and spectral profile of the granulated sound. With
very fast grain rates - under 5 ms. - using pitched sample
material, we obtain formants akin to those produced by
FOF synthesis. A small delay between grain onsets adds
volume (as defined in [13]) to the original signal,
producing an effect akin to early reflections in a
reverberant space. Of course, we must keep in mind that
all these processes are independent from the asynchronous
grain rate established for each stream.
[Figure panels: synchronous stream; asynchronous stream; phase-synchronous streams; phase-asynchronous streams.]
Within the context of ecologically-oriented resynthesis,
phase-synchronicity is especially meaningful in the
simulation of attacks. When a solid object is struck, most
resonant frequencies will be excited in the first fifty
milliseconds or less. By contrast, if the excitation is
produced by several small objects, each impact will excite
different frequencies at various time delays causing a
granular sound texture. This type of sound can be heard
when walking on glass pieces or on snow.
The stream
A grain stream generator produces a series of grains with a
given frequency, amplitude and duration. These parameters
can vary in time. The concept of grain generator implies
that only a single grain can be produced at a time. Thus,
when more than one simultaneous grain is desired (to
produce overlaps) several grain generators have to be used.
This introduces the need to define the phase relationship
between the grain streams. The phase-asynchronous
implementation, as found in asynchronous GS, produces
streams which are completely independent. If the timing
between grains in different streams is to be controlled, a
phase-synchronous approach is necessary. As we stated
before, in this case the grain onsets can be synchronized
across streams or a short delay may be used. Therefore,
there are three possible configurations: (1) a single stream
generator, (2) multiple phase-asynchronous stream
generators, and (3) multiple phase-synchronous stream
generators.
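The difference between the phase-asynchronous and phase-synchronous configurations can be sketched in terms of grain onset times. This is a minimal illustration and not the MacPod scheduler: the function names, the uniform jitter model and the millisecond units are assumptions. In the phase-synchronous case every stream follows one shared onset pattern, offset by a fixed delay per stream, while the grain rate within that pattern may still be asynchronous.

```python
import random


def stream_onsets(count, mean_interval_ms, jitter_ms, rng):
    """Onsets of one stream; intervals are uniform in mean +/- jitter."""
    onsets = []
    t = 0.0
    for _ in range(count):
        onsets.append(t)
        t += rng.uniform(mean_interval_ms - jitter_ms,
                         mean_interval_ms + jitter_ms)
    return onsets


def phase_asynchronous(num_streams, count, mean_interval_ms, jitter_ms, seed=0):
    """Each stream draws its own intervals, so onsets are unrelated."""
    rng = random.Random(seed)
    return [stream_onsets(count, mean_interval_ms, jitter_ms, rng)
            for _ in range(num_streams)]


def phase_synchronous(num_streams, count, mean_interval_ms, jitter_ms,
                      delay_ms=0.0, seed=0):
    """All streams share one onset pattern, shifted by delay_ms per stream."""
    rng = random.Random(seed)
    master = stream_onsets(count, mean_interval_ms, jitter_ms, rng)
    return [[t + index * delay_ms for t in master]
            for index in range(num_streams)]


independent = phase_asynchronous(3, 5, 20.0, 5.0)
locked = phase_synchronous(3, 5, 20.0, 5.0, delay_ms=2.0)
print(independent)
print(locked)
```

Setting `delay_ms` to zero makes all onsets coincide; a small positive value gives the short inter-stream delay discussed above.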
The waveform
GS techniques have used different types of source material: (1) sine waves, in FOF synthesis [10]; (2) FIR filters derived by spectral analysis, in pitch-synchronous granular synthesis; and (3) sampled sounds, in asynchronous GS [12], FOG, and pulsar synthesis [9]. Ecologically-based resynthesis adds the option of using pre-stored sample grains [6].
More specifically, in ecologically-based GS we create a
grain pool before the synthesis stage, instead of retrieving
arbitrary segments of the sound file. The samples keep the
spectro-temporal characteristics of the short original
sounds, avoiding the 'blurring' effect that occurs in
asynchronous GS [9]. These samples are placed on a
time-frequency grid according to meso-level time patterns which
are, in turn, designed to match the temporal characteristics
of naturally occurring sounds, e.g., bounce [6]. Given that
this approach simplifies the windowing process, it may
provide a good alternative to existing real-time methods.
The pointer
GS systems access the sound database contents in four different ways: (1) incremental, the file is read from beginning to end; (2) loop, the file is read repeatedly from beginning to end; (3) cycle, the file is read repeatedly from beginning to end and backwards; and (4) random, the file is read at random locations.
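The four access modes can be sketched as a function that returns successive read positions. This is a minimal illustration, assuming positions are sample indices and that the pointer advances by a fixed hop per grain; the function name and its parameters are hypothetical.

```python
import random


def pointer_positions(mode, file_length, steps, hop=1, rng=None):
    """Return up to `steps` read positions (sample indices) for one mode."""
    rng = rng or random.Random()
    positions = []
    for i in range(steps):
        raw = i * hop
        if mode == "incremental":
            if raw >= file_length:
                break  # the end of the file has been reached
            pos = raw
        elif mode == "loop":
            pos = raw % file_length  # jump back to the beginning
        elif mode == "cycle":
            # Forward to the last sample, then backwards to the first.
            period = max(1, 2 * (file_length - 1))
            phase = raw % period
            pos = phase if phase < file_length else period - phase
        elif mode == "random":
            pos = rng.randrange(file_length)
        else:
            raise ValueError("unknown mode: " + mode)
        positions.append(pos)
    return positions


for mode in ("incremental", "loop", "cycle", "random"):
    print(mode, pointer_positions(mode, file_length=4, steps=8))
```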
The current implementation of MacPod, following the
POD model, uses a single pointer to source material.
Interestingly, the effect of the overlapping grains can be
simply explained as a comb-filter delay. If one assumes a
fixed grain envelope, an asynchronous grain that starts six
milliseconds later than the original is simply a
six-millisecond delay mixed in with the original signal. By
keeping the resolution at a sample level, we are able to
explore a variety of spectral transformations - at subaudio
rates - and reverb-like effects at slower rates.
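The comb-filter reading of overlapping grains can be made concrete with a short sketch. It assumes the idealized case described above: a signal mixed with one equally weighted copy of itself at a fixed delay, so that the gain at frequency f is |1 + e^(-j 2 pi f tau)|. The function names are hypothetical; in actual asynchronous GS the delay varies from grain to grain, so the notches move over time.

```python
import math


def comb_gain(frequency_hz, delay_s, mix=1.0):
    """Gain of a signal summed with a copy delayed by delay_s and scaled by mix."""
    phase = 2.0 * math.pi * frequency_hz * delay_s
    return abs(complex(1.0 + mix * math.cos(phase), -mix * math.sin(phase)))


def notch_frequencies(delay_s, count):
    """First `count` notch frequencies: odd multiples of 1 / (2 * delay)."""
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]


DELAY_S = 0.006  # the six-millisecond delay used in the text
notches = notch_frequencies(DELAY_S, 3)
print(notches)
print([comb_gain(f, DELAY_S) for f in notches])
print(comb_gain(1.0 / DELAY_S, DELAY_S))  # a reinforcement peak
```

For a six-millisecond delay the first notch lies near 83 Hz and the first reinforcement peak near 167 Hz.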
The event
A logical implication of the ecological approach to sound
resynthesis is to establish the sound event [6] as a
high-order unit of sound generation. Resynthesis parameters are
thus directly linked to a finite time length. Rate of change
is scaled according to the length of this event. Instead of
fine-tuning unrelated parameters (such as amplitude or
frequency of a given grain stream), transformations of a
sound event are carried out along correlated variables
within ecologically valid time ranges.
We point out two possible strategies: (1) High-level
events are defined by global settings. These settings define
ranges of possible values for the local parameters. (2)
Local parameters determine the overall behavior of the
high-level event. For example, the density of an event can
be defined by two global parameters: duration and quantity
of grains. If grains with fixed duration are evenly scattered
along a predefined time span, we get an invariant average
density. But suppose that we want a dense
distribution that changes linearly to a sparse one.
If synthesis parameters vary independently, we will need
several trials to find the right number of grains and
the right rate of change in distribution. On the other hand,
by using grain overlap as the only control variable and
letting the quantity of grains and the overall duration
change accordingly, we will be dealing directly with the
relevant perceptual parameters. In this example, the only
high-level variable that needs to be defined is the rate of
change in grain overlap.
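The dense-to-sparse example can be sketched with grain overlap as the only control variable. This is a minimal illustration under stated assumptions: grains have a fixed duration, the overlap changes linearly in time at a given rate, and the event ends once the target overlap is reached. The function name, its parameters and the units (milliseconds, and milliseconds of overlap per second) are hypothetical. The number of grains and the total duration are not specified; they follow from the overlap trajectory.

```python
def event_onsets(grain_ms, start_overlap_ms, end_overlap_ms, rate_ms_per_s):
    """Grain onsets (ms) for an event whose overlap changes linearly in time."""
    if max(start_overlap_ms, end_overlap_ms) >= grain_ms:
        raise ValueError("overlap must stay below the grain duration")
    if rate_ms_per_s == 0 or (end_overlap_ms - start_overlap_ms) * rate_ms_per_s < 0:
        raise ValueError("rate must move the overlap towards its end value")
    onsets = []
    t = 0.0
    while True:
        overlap = start_overlap_ms + rate_ms_per_s * t / 1000.0
        onsets.append(t)
        if rate_ms_per_s < 0:
            finished = overlap <= end_overlap_ms
        else:
            finished = overlap >= end_overlap_ms
        if finished:
            return onsets
        # The onset interval is the grain duration minus the current overlap.
        t += grain_ms - overlap


# Dense (15 ms of overlap) to sparse (30 ms gaps) with 20 ms grains.
onsets = event_onsets(grain_ms=20.0, start_overlap_ms=15.0,
                      end_overlap_ms=-30.0, rate_ms_per_s=-45.0)
print(len(onsets), onsets[-1])
```

Changing only `rate_ms_per_s` stretches or compresses the whole event, and the grain count adjusts accordingly.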
The conclusion
We have investigated several issues involved in the implementation of a real-time granular synthesis application. The focus of our work has been the efficient use of computational resources, and a simplified method for synthesis parameter control.
Our results point to two effective approaches to windowing: (1) the use of a trapezoidal function, as suggested by Truax (1988), and (2) the use of a grain sample pool, as implemented in ecological sound resynthesis. By applying a trapezoidal window, we obtain aurally effective results with a drastic reduction of computational time. This type of window produces a spectral profile which depends on the placement of the 'corners' of the trapezoid. Thus, what has been regarded as an unwanted artifact by DSP theory becomes a useful parameter for sound synthesis.
Our current efforts are concentrated on bringing the
ecological perspective to the real-time realm. By using
events instead of low-level control parameters, we pave the
way to a more intuitive interface between user input and
sound output. At the other end, the independence in grain
rate control and the resolution of grain overlap at a sample
level make it possible not only to work on the temporal
characteristics of the sound, but also to shape its spectral
profile.
[1] Behles, G., Starke, S., & Röbel, A. (1998). Quasi-synchronous and pitch-synchronous granular sound processing with Stampede II. Computer Music Journal, 22(2), 44-51.
[2] Cook, P.R. (1997). Physically informed sonic modeling (PhISM): synthesis of percussive sounds. Computer Music Journal, 21(3), 38-49.
[3] Dziech, A. (1993). Random Pulse Streams and their Applications. Warszawa: Elsevier.
[4] Jones, D.L., & Parks, T.W. (1988). Generation and combination of grains for music synthesis. Computer Music Journal, 12(2), 27-33.
[5] Keller, D. (1998). ". . . soretes de punta." Compact disc Harangue II. Burnaby, BC: Earsay. http://earsay.com
[6] Keller, D., & Truax, B. (1998). Ecologically-based granular synthesis, Proceedings of the International Computer Music Conference. Ann Arbor, MI: University of Michigan. http://www.sfu.ca/~dkeller
[7] Lynn, P.A., & Fuerst, W. (1998). Introductory Digital Signal Processing with Computer Applications. Chichester: John Wiley.
[8] Mix, D.F. (1995). Random Signal Processing. Englewood Cliffs: Prentice Hall.
[9] Roads, C. (1997). Sound transformation by convolution, Musical Signal Processing, C. Roads, S.T. Pope, A. Piccialli, & G. De Poli (Eds.). Lisse: Swets & Zeitlinger, 411-438.
[10] Rodet, X. (1984). Time-domain formant wave function synthesis. Computer Music Journal, 8(3), 9-14.
[11] Rolfe, C. (1998). MacPod. Real-time asynchronous granular synthesis software for the Macintosh PowerPC. Vancouver, BC: Third Monk Inc. http://www3.bc.sympatico.ca/thirdmonk
[12] Truax, B. (1988). Real-time granular synthesis with a digital signal processor. Computer Music Journal, 12(2), 14-26.
[13] Truax, B. (1992). Electroacoustic music and
soundscape: the inner and outer world, Companion to
Contemporary Musical Thought, Vol. 1, J. Paynter, T.
Howell, R. Orton, & P. Seymour (Eds.). London:
Routledge, 374-398.
Reference: http://www.sfu.ca/~dkeller/CornerEffect/CornerEffect.html