In this section, we discuss commentary and criticism, by physicists and others, of Stockhausen's article "...how time passes...". His unconventional, and arguably simply incorrect, use of terms such as "phase", "quantum", and "formant" was unacceptable to the physicists, and some of his unclear or faulty assumptions need repairing before his conclusions can be accepted. But aside from the technical problems, does Stockhausen's article have any remaining scientific value? And is there, perhaps, musical value apart from scientific value, even if it lacks the latter?
To answer this question, we present commentary from the British magazine Composer, a scientific critique by physicist John Backus from the premiere issue of the American journal Perspectives of New Music, and articles from Stockhausen's journal Die Reihe by physicist Adriaan Fokker and composer Gottfried Michael Koenig.
In the British music journal "Composer", Alan Walker wrote
"The technical jargon in Die Reihe is notoriously difficult. Purporting to explain the latest developments in the theory and practice of new music, from serialism to electronic music and beyond, the pages of Die Reihe comprise a rich, terminological jungle through which I, for one, have rarely been able to hack my way. There are, of course, two rational explanations for my failure. Either I miss the point, or there is no point to miss...." ([Walker64a], p.24)
Physics professor John Backus, in his scathing criticism, wrote
"Repeated reading and persistent study of many passages [in the various issues of Die Reihe, including the one with "...how time passes..."] leave us still ignorant of their intended meanings. We are continually baffled by a technical language with which we are unfamiliar. In our frustration we may begin to wonder if perhaps the authors are as confused as their language appears to be, and if the unintelligibility is our fault or theirs....
"... We wish to see if the scientific terminology is properly used, to see if the charts, graphs and tables have any real significance, and to determine the technical competence of the material from the scientific standpoint. If it measures up creditably to these criteria, all well and good; if it does not, we will be quite justified in dismissing as worthless all of it that does not make sense by ordinary standards." ([Backus1962], p. 16)
At the very beginning of the article in question, Stockhausen states that "Time-intervals between alterations in an acoustic field are denoted as 'phases'" [p. 10]. Thus begins the first disagreement for Backus. Stockhausen is referring to the fluctuations in air pressure that we perceive, when they reach our eardrums, as sound waves, and to the time intervals between maxima or minima of this pressure.
According to Backus, in a system undergoing some sort of periodic vibration or oscillation, "the phase of the periodic quantity, for a particular value of the independent variable, is the fractional part of a period through which the independent variable has advanced, measured from an arbitrary reference" ([Backus1962] p. 18). This is attributed by Backus to the reference book American Standard Acoustical Terminology, definition 1.18.
In other words, the term "phase" is conventionally used to denote a fractional part of one vibrational period in a simple harmonic, or "periodic", motion. Consider a circle, where the total length of one revolution, or period of rotation, is conventionally subdivided and measured by 2pi "radians" or 360 "degrees". A position 1/4 of the way around from a chosen point or origin (conventionally the rightmost point on the circle, where it intersects the "x-axis" if the zero is at the circle's center) is said to have a phase of pi/2 radians, or 90 degrees (a "right angle"). No matter whether the position was reached through 1/4 of a revolution, or 1 1/4 revolutions, or 29 1/4 revolutions, the phase is still the same: 1/4 of a period, or pi/2 radians.
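The circle analogy can be made concrete in a short Python sketch (the helper name `phase` is ours, for illustration): however many full revolutions have elapsed, the phase is only the fractional remainder.

```python
import math

def phase(angle_rad: float) -> float:
    """The conventional 'phase': the fractional part of a period,
    expressed here in radians in the range [0, 2*pi)."""
    return angle_rad % (2 * math.pi)

# 1/4 revolution, 1 1/4 revolutions, and 29 1/4 revolutions all share
# the same phase of pi/2 radians (90 degrees).
for revolutions in (0.25, 1.25, 29.25):
    assert abs(phase(revolutions * 2 * math.pi) - math.pi / 2) < 1e-9
```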
Stockhausen is apparently using "phase" to denote the whole period, rather than the fractional part that the term usually refers to (and he is also trying to generalize it to denote non-periodic motions too, which further complicates matters), so in his terminology, one might refer to the "phase duration" of 2pi radians for what the rest of the world denotes as one period. Backus questions why Stockhausen is deliberately misusing standard acoustical terminology in this fashion. ([Backus1962], p. 18)
Adriaan Fokker, another physicist, writes
"The interval of time between two repetitions of the same phase is called, simply, a period. There is no need whatever for a new word. After all, as has already been said, phase is not a new word at all. It has a well defined and generally accepted meaning. Something like sepha -- with the syllables reverted -- would have been a new word.
"It seems that the author has some motives in avoiding the word 'period'. Has it the stigma of being handed down by tradition? Let us look, then, for another term. I find in my dictionary the English word 'while' for the Dutch word 'poos'. I venture this proposition: let time-intervals be called 'whiles'...." ([Fokker], p. 68)
Fokker later rephrases one of Stockhausen's complicated serial examples, using his "while" in place of Stockhausen's mis-appropriated "phase", and we discuss this in the next section.
Backus next objects to Stockhausen's statement "Proportions serve for more exact definition -- one phase is twice, thrice as long as another. In order to fix proportions, one chooses a unit-quantum, and this is usually based on time as measured by the clock; we say one phase-duration lasts one second, two seconds, a tenth of a second...." ([Stockhausen1957], p. 20) because, in Backus' words, "in physics, a quantum is an indivisible unit; there are no quanta in acoustical phenomena, and besides, Stockhausen discusses `subdividing' a quantum, which is meaningless...." ([Backus1962], p. 18)
In subatomic physics, the unit of energy representing the smallest possible jump from one level to the next higher or lower one is denoted a "quantum", and it is a fundamental, indivisible unit. Clearly Stockhausen is taking liberties when he refers to a measurement of one second of time as a "unit quantum". (We note, however, the "acoustical quantum" of [Gabor1946, 1947], which is in fact an indivisible unit of information.)
Backus [p. 18] also points out the defects in Stockhausen's first example, in which a pair of impulses, apparently (the assumptions are not very precise about exactly how many impulses there are, nor about the acoustical properties of the impulses themselves), is to be heard, with the distance between successive impulses gradually shortening from 1 second to 1/2 second, then to 1/4, 1/8, 1/16, 1/32 second, and so on. At first the impulses will be heard separately, but once the distance between them grows short enough, they will merge into the sensation of a continuous tone at a particular pitch. Stockhausen is concerned with the threshold at which the perception of duration merges into the perception of continuous pitch, which lies at approximately 1/16 second between impulses.
In acoustical theory, an "impulse" is an infinitesimally short transient sound-object, basically the shortest possible "click". Clearly this is not what Stockhausen was referring to in his article; he was trying to describe the output of a modified pulse wave oscillator, an electrical impulse generator, that was available for his use in the Cologne Radio electronic music studio ([Manning1985], p. 73), which put out short but measurable bursts of sound at definite pitches (i.e., when their duration was long enough to cause a definite pitch sensation), with control over the duration between successive impulses.
Backus shows that if two impulses are heard with a time-interval of more than 1/16 second between them, the time-interval will indeed be perceived as a duration, and the two impulses will be heard separately. But if the two impulses are heard again, with the duration shortened below 1/16 second, the sensation will be that of one single impulse; no definite pitch will be heard at all, just a single click! This is because the ear requires more than just two impulses to get a sense of a pitch. For repeated impulses spaced 1/1000 second apart, the ear needs around 12 in a row before a pitch is sensed. Backus gives, as reference for this experimental result, [Olson], p. 250.
Backus is referring to a relationship between the time-interval separating the impulses, on the one hand, and the number of successive impulses needed to define a frequency with reasonable certainty (i.e. enough to distinguish a "fundamental" frequency). Additionally, the human ear is not so precise a measuring device, and in general will need more than the theoretical minimum number of impulses to produce the sensation of a specific pitch.
This relationship and its limiting factor of unity are described more generally as an "uncertainty principle" of sound, in a direct analogy to the Heisenberg uncertainty principle and its limiting factor of Planck's Constant, in ([Gabor1946, 1947]). On a graph of time and frequency, the precision with which an isolated acoustical event, like a single impulse, can be located is limited mathematically; rather than a precise point, it can only be located in a rectangle of a minimum size depending on the representation and scaling. The limit is described by an inequality which is analogous to the inequality in one-dimensional static wave-mechanics that limits the accuracy of describing both the position and velocity of a sub-atomic particle-wave at a given instant.
Thus it actually makes no sense to speak of an event which is both completely specified as occurring at a precise instant of time, and with a precise single frequency. The more precisely you specify one attribute, the less precisely the other can be specified, according to this limiting factor.
Specifying a precise time instant, as in a single impulse, requires a broad spread of frequency components (thus an impulse click has the same frequency spectrum as "white noise"). Any real-world sound which has enough energy at a particular fundamental frequency or narrow band of frequencies to cause the sensation of a definite "pitch" has to occupy a certain minimum time-interval.
On the other hand, specifying a precise frequency, as in an ideal sine wave, with energy only at one discrete point in the frequency-spectrum, theoretically requires an infinitely long periodic signal; any interruption of the signal introduces energy in a wider band of frequency. Any real-world sound, which of course cannot last infinitely long with no variation, has energy at more than one point frequency-wise. And any real-world sound which lasts for a very short time is going to be ambiguous in pitch, i.e. is going to occupy a certain wide interval in the frequency domain. The sensation of "noise" as opposed to definite pitch is generally caused by wide intervals in the frequency domain.
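This time-frequency trade-off is easy to demonstrate numerically. The sketch below (pure Python, using a naive discrete Fourier transform for clarity rather than speed) compares a sine wave that fills the whole analysis window against the same sine truncated to a short burst; the burst's energy spreads across many more frequency bins.

```python
import cmath
import math

def dft_mag(x):
    """Magnitudes of the first half of the discrete Fourier transform."""
    N = len(x)
    return [abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N)))
            for k in range(N // 2)]

def bins_above_half_max(mags):
    """Crude bandwidth measure: number of bins at or above half the peak."""
    peak = max(mags)
    return sum(1 for m in mags if m >= 0.5 * peak)

N = 128
long_burst = [math.sin(2 * math.pi * 16 * n / N) for n in range(N)]  # fills the window
short_burst = [long_burst[n] if n < 16 else 0.0 for n in range(N)]   # same sine, cut off

# The truncated burst smears its energy across many more frequency bins.
assert bins_above_half_max(dft_mag(short_burst)) > bins_above_half_max(dft_mag(long_burst))
```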
But if we repair Stockhausen's assumptions so that we are dealing with a sufficiently long train of successive impulses to satisfy the mathematical and perceptual requirements of pitch-determination, his conclusion is basically valid. Frequencies above the threshold of roughly 16 cycles per second are perceived as pitches, while frequencies below the threshold are perceived as individual events in an overall rhythm.
Stockhausen proposes a "new morphology of musical time", which seems to mean several different things. One meaning is that all aspects of sound can be characterized by "order-relations in time", whatever that means. It is true that sound can be represented purely by successive measurements of amplitude at instants in time, although the usual representations of music involve some measurements of pitch, or frequency.
Fourier's Theorem does state that the time representation and the frequency representation of an infinitely long signal are equivalent: no information is lost by representing an infinite signal either as amplitude values at points in time, or as amplitude values at points in frequency. And the current practice of digitally sampling sound, then playing it back from the stored samples, is an application of a pure time-domain representation.
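The discrete, finite analogue of this equivalence can be sketched in a few lines of Python: transforming a sampled signal to the frequency domain and back recovers it exactly, up to floating-point rounding. (A naive DFT is used here for transparency; real systems use the FFT.)

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform: time samples -> frequency values."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse transform: frequency values -> time samples."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Any finite sampled signal survives the round trip unchanged.
signal = [math.sin(2 * math.pi * 3 * n / 32) + 0.5 * math.sin(2 * math.pi * 7 * n / 32)
          for n in range(32)]
recovered = idft(dft(signal))
assert all(abs(recovered[n].real - signal[n]) < 1e-9 for n in range(32))
```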
Pure frequency information is most useful for representing continuous periodic sounds which do not vary in pitch, like sine waves. The pure electronic music done in the studios of the Cologne Radio station had used sine waves almost exclusively, and composers there had been optimistic about synthesizing interesting forms of musical sound purely through the combination of sine waves. But sounds in the real world have transient, or rapidly-varying, characteristics, which are only poorly modelled with sine waves (unless a very large number of sine waves is used, which is practically impossible to control). The initial optimism for composition using only pure sine waves, expressed in volume 1 of Die Reihe ([Goeyvaerts1955]), tapered off after Stockhausen's Electronic Etude I.
More interesting sounds were not so easily represented or synthesized with continuous, single-frequency sine waves. This led Stockhausen to his experiments with impulses and with the notion of time-domain representation of sound that he is trying to develop in his article.
Pure time-domain representations of musical sound require different tools than frequency representations. The most powerful models involve elements of both time and frequency in the representation of musical sound.
Gabor, in his derivations of the "uncertainty principle" for acoustical information, suggested that only a mixed representation which contained both time and frequency information could be truly useful as a representation of complex musical sounds. Gabor's choice for a representation was what he called a fundamental quantum of acoustical information. This quantum took the form of an "elementary signal", a fragment of a sinusoidal waveform enclosed in a Gaussian exponential amplitude envelope, with an overall duration as short as 10 - 20 milliseconds, at the edge of the duration threshold below which the sounds are too short to even be perceived by human ears.([Gabor1946])
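A minimal Python sketch of such an elementary signal follows. The Gaussian width (sigma) and the 15-millisecond duration are our illustrative choices within the 10-20 ms range Gabor describes; the function name is hypothetical.

```python
import math

def gabor_grain(freq_hz=440.0, dur_s=0.015, sample_rate=44100):
    """A sinusoid enclosed in a Gaussian amplitude envelope."""
    n_samples = int(dur_s * sample_rate)
    center = (n_samples - 1) / 2.0
    sigma = n_samples / 6.0  # envelope nearly vanishes at both edges
    return [math.exp(-0.5 * ((n - center) / sigma) ** 2)
            * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

grain = gabor_grain()
assert abs(grain[0]) < 0.05 and abs(grain[-1]) < 0.05  # tapered at the edges
assert max(abs(s) for s in grain) <= 1.0               # unit peak envelope
```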
Gabor's principle was noted by Iannis Xenakis, who referred to Gabor's elementary signals as "grains" of sound, and who proposed a method of sound synthesis involving the synthesis of each separate grain. ([Xenakis1971]). Composers and scientists have since developed powerful theories, including "wavelet analysis" ([Kronland-Martinet1991]), which capture information in mixed frequency and time representations, and have developed more elaborate systems for controlling digital synthesis of granular sound ([Truax1990b], [Roads1991]).
Gottfried Michael Koenig, who was Stockhausen's assistant in the studio during the composition of Kontakte, later became the artistic director of the Institute for Sonology in the Hague, Netherlands. There he developed a digital sound synthesis system called SSP in which the only data that can be specified, at the lowest level, is an amplitude at a given time value. He thus carried Stockhausen's notion, that music can be represented solely as events in time, to its technical conclusion in the SSP system. ([Berg1980], p. 25)
We see that although Stockhausen is right under certain conditions, that musical signals can be completely represented purely in the time domain (again, this is the basis of digital "sampling"), his characterization is quite faulty on specific technical grounds. Furthermore, pure-time representations of musical signals are also not the most powerful ones available for sound analysis or synthesis. Few, if any, kinds of analyses or transformations of sound can be done without recourse to some kind of frequency information.
Backus continues his exposition of Stockhausen's flawed example with the other extreme. Supposing that a pure sine wave were given, instead of a train of finite impulses, with frequency 1000 cycles per second, and supposing that the frequency were gradually lowered to about 15 cycles per second, the sensation of sound would completely disappear! Instead of transforming into a sensation of rhythmic duration, our perception of the acoustic waves would vanish. A frequency would exist, and a duration between acoustic pressure maxima would exist, but our ears would simply lack the physical apparatus needed to detect them.
In order to produce the effect Stockhausen describes, the impulses would have to be finite (as they were in Cologne), perceptible in individual duration, and present in sufficient number to allow the ear to detect a pitch at the fast speeds. Then they would produce a sensation of duration if presented slowly enough, and would present a sensation of pitch if presented fast enough, as long as a sufficient number were heard to provide the definite pitch.
This phenomenon was noticed by others before Stockhausen. Even the poet Ezra Pound describes how the lowest notes of an organ could be discerned "not as a pitch but as a series of separate woof-woof's", and how "the percussion of the rhythm can enter the harmony exactly as another note would. It enters usually as a Bassus, a still deeper bassus; giving the main form to the sound" ([Pound1934], p. 301). Of course, in this same treatise on musical harmony and time, Pound also claims that his own personal copy of Mozart's "Le Nozze di Figaro" was marked, in Mozart's original handwriting, "Presto, half note equals 84; Allegro, black equals 144"... which tends to dilute the authority of his other claims.
Adriaan Fokker comments on the same example of Stockhausen's, of a train of impulses whose sensation shifts between duration and pitch. As a clearer illustration of the phenomenon, Fokker proposes a physical situation involving a steel ball dropped from some height onto a horizontal marble slab. It falls perpendicularly and rebounds; the rebounds repeat themselves, but the time lapses between them diminish. The separate impacts of the ball at first form the sound of a roll, like a drum roll; the roll then transmutes itself into a sound, a note of rising pitch: "We hear macrowhiles between the initial impacts. There are microwhiles between the impacts in the final state, which we no longer hear separately, perceiving a note instead..." ([Fokker1962], p. 69)
Fokker discusses the uncertainty principle of time versus instantaneous frequency which we already mentioned. For N impulses or vibrations, the width of the frequency band containing nonzero energy will be the ratio (N+1)/N, which gradually converges towards unity (i.e., towards a single definite frequency) as N grows large. To illustrate this, he gives the example of a doublebass and a violin playing a passage in unison, within their respective ranges. The G on the bass, at 96 cycles per second, makes no more than 12 vibrations in 1/8 second, so the frequency of the bass's tone in that interval has an uncertainty of (12+1)/12; the exact pitch of the bass in that 1/8 of a second is therefore spread across a band covering about 3/4 of a whole tone around 96 cycles per second. The G3 on the violin, however, at 384 cycles per second, makes 48 vibrations in the same 1/8 second, with an uncertainty of only about 1/5 of a whole tone around 384 cycles per second, and the ear will not even notice this small uncertainty in the violin's pitch.
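Fokker's arithmetic can be checked in a few lines. Taking a whole tone as the just ratio 9/8, the band ratio 13/12 comes out to roughly 3/4 of a whole tone and 49/48 to roughly 1/5 (a sketch of his reasoning; the function name is ours):

```python
import math

def whole_tone_fraction(n_vibrations):
    """Fokker's frequency-band ratio (N+1)/N, re-expressed as a fraction
    of a whole tone (taken here as the just ratio 9/8)."""
    band_ratio = (n_vibrations + 1) / n_vibrations
    return math.log(band_ratio) / math.log(9 / 8)

# Double bass G at 96 Hz: 12 vibrations in 1/8 second.
assert abs(whole_tone_fraction(12) - 0.75) < 0.1   # roughly 3/4 of a whole tone
# Violin G at 384 Hz: 48 vibrations in the same 1/8 second.
assert abs(whole_tone_fraction(48) - 0.2) < 0.05   # roughly 1/5 of a whole tone
```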
Reiterating what Backus said, Fokker emphasizes that a sound making only one vibration in 1/1000 second, like the crack of a whip, does not convey a sensation of pitch corresponding to 1000 cycles per second; rather, the pitch uncertainty is spread across an entire octave. Fokker concludes
"It is quite misleading to put a certain `while' in direct relation to a pitch. In the first place a single microwhile is not sufficient to determine a pitch, and in the second place, by increasing the length of the microwhile, the pitch is neither increasing nor rising, but sinking and decreasing." ([Fokker1962], p. 70)
Backus' next objection is to Stockhausen's term "subharmonic series of proportions", which is "another example of terms selected to impress rather than clarify." ([Backus1962], p. 18) This is merely a harmonic overtone series, based on the sub-division of a unit duration, so that successive higher partials refer to shorter and shorter durations, i.e. if 1 second is the fundamental duration, partials are found at 1/2 second, 1/3 second, 1/4 second, 1/5 second, etc.
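In other words, the "subharmonic series of proportions" is simply the familiar division of a unit into integer parts, applied to duration rather than frequency. A tiny sketch (the function name is ours):

```python
from fractions import Fraction

def duration_partials(partials=6):
    """Successive 'partials' of a 1-second fundamental duration:
    1, 1/2, 1/3, 1/4, ... seconds."""
    return [Fraction(1, k) for k in range(1, partials + 1)]

assert duration_partials() == [Fraction(1, 1), Fraction(1, 2), Fraction(1, 3),
                               Fraction(1, 4), Fraction(1, 5), Fraction(1, 6)]
```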
Fokker again illustrates Stockhausen's point with the example of two doublebass strings, one performing 3 vibrations against the other's 4. As they slow down below the audio frequency range, below 16 cycles per second, a rhythmic sense emerges (because their individual vibrations are complex enough that they can still be detected as individual impulses of a sort). The basic period contains 12 units, divided either into 3 bars of 4 units, or 4 bars of 3 units. Instead of starting the 3's and the 4's simultaneously, an able pianist might shift the 3's so that their first stroke falls midway between the first and second strokes of the 4's. Then the "metrum" would be 24 units in one overall period, divided either 3x8 or 4x6. ([Fokker1962], p. 71)
In a nasty jab, Backus says "His [Stockhausen's on p. 16] statement, `Even today, it is quite impossible to make a musician play a single 1/3 or 1/5 of a fundamental phase' ... makes one wonder about the calibre of the musicians of his acquaintance!" ([Backus1962], p. 18)
Hugh Davies, the British composer who soon afterwards went to work for Stockhausen in Cologne, responded to Backus by explaining that Stockhausen refers to the difficulty of playing only the first note of a triplet, followed immediately by only the first note of a quintuplet. A good musician ought to be able to sub-divide a reasonable duration into 15 underlying units, so that each triplet element counts for 5 units and each quintuplet element counts for 3 units; but in the two Stockhausen examples that Backus makes fun of, the underlying 15th-notes would be only 1/15 second long in one case and 1/105 second long in the other. It is indeed doubtful that any musician could play such a passage accurately without unreasonable amounts of preparation for such a short fraction-of-a-second passage. ([DaviesH1965], p. 17)
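Davies' arithmetic can be laid out explicitly: the smallest grid accommodating both a triplet and a quintuplet within one beat has lcm(3, 5) = 15 units, so each triplet note spans 5 grid units and each quintuplet note 3. This sketch (our own) shows the counting:

```python
from fractions import Fraction
from math import lcm

grid = lcm(3, 5)                 # 15 units per beat
triplet_note = grid // 3         # each triplet note = 5 grid units
quintuplet_note = grid // 5      # each quintuplet note = 3 grid units
assert (grid, triplet_note, quintuplet_note) == (15, 5, 3)

# If the whole beat lasts only 1 second, each grid unit is 1/15 second,
# the impractically short counting value Davies points out.
unit = Fraction(1, grid)
assert unit == Fraction(1, 15)
```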
Eventually, Backus tires of picking at specific problems:
"We conclude that Stockhausen's technical language is his own invention, using terms stolen from acoustics but without their proper acoustic meanings, and that the technical jargon he has developed is designed mostly to impress the reader and to hide the fact that he has only the most meager knowledge of acoustics." ([Backus1962], p. 20)
In retrospect, we feel that Backus, in his inability to get beyond the misused scientific jargon, has perhaps missed the interesting musical ideas that Stockhausen hints at.
A second aspect of Stockhausen's "new morphology of musical time" is the notion of a unified compositional and structural approach to both the "macro" and "micro" time intervals (referred to by Stockhausen as "the sphere of duration" and the "sphere of pitch"). Traditionally in instrumental music, one composes a "score" in the macro-time domain, consisting of "notes", which are events performed on orchestral instruments. The internal structure of the notes is not precisely specified. Each note is, in turn, composed of events on the micro-time level. The advent of electronic music made it possible to think about controlling the micro-time level events, relating them to macro-time structures, and vice versa.
Concerning the general principle of composition as a unified approach to both the macrostructure and the microstructure of sound, Otto Laske, a noted researcher in compositional theory and artificial intelligence, writes
"It would be every composer's dream, to have at his/her disposition a task environment equally suited to modular composition in the micro- and macro-time domains, and thus to be able to dispose of the dichotomy of `orchestra' and `score' entirely. On closer scrutiny, to achieve a unification of the two time domains is a tall order. The task is nothing else but to unify a composer's decision-making in four temporal dimensions, of event-time, note-time, control-time, and audio-time. Of these, the first two make up score-time or macro-time, focusing on the `note' as a primitive, while the other two make up micro-time, focusing on the sample as primitive. While... macrotime is `fractal', microtime is `quantized', there being nothing much of aesthetic interest between note- and control-time, and between control-time and audio-time. For this reason, a strict analogy between these two sets of levels is hard, or impossible, to maintain, and information-hiding, in an object-oriented style or otherwise, is a crucial method for achieving their integration." ([Laske1990], p.132)
In the same journal issue, dedicated to compositional theory in the age of computer systems, H. Vaggione writes
"A digital sound object is always a composed one; it is composed music at the microtime level of samples. This fact in no way precludes, or contradicts, principles of macrotime structuring; rather, an interaction between all possible time scales is at the heart of the process by which a musical form comes into being.
"....To summarize, an object is transparent only if it is in an open state in which one can work on its internal structure. In order to manipulate the object as an autonomous entity, it must be closed under some name. The difference between a digital and an analog sound object lies in the fact that the analog one is a black box which can never be opened, while the digital object is open or closed, depending on the level at which the composer is operating." ([Vaggione1990], p. 211)
This shows the difficulty, or the impossibility, of Stockhausen's task of unifying the macro- and micro-time domains of musical sounds, working in the 1950's with primitive analog equipment, compared with the task today of working with digital sound objects, which can be either open or closed at the composer's will.
Barry Truax, through his work with granular digital synthesis over a number of years, writes of an insight which closely parallels Stockhausen's earlier attempt to construct musical sound from successions of single analog impulses:
"...The most dramatic paradigm shift I have encountered in my software development has been that involving granular synthesis. By shifting the base unit to the microtime domain, it challenges many if not all of our previous notions about sound synthesis and musical composition." ([Truax1990a], p. 230)
Truax mentions the threshold of approximately 50 milliseconds per event, or 20 per second, which is the boundary between separately perceivable events and micro-level phenomena which fuse together perceptually (this is close enough, given variations in measurement and in individual human perception, to Stockhausen's 1/16 second threshold). The technical term for this and related phenomena, like distinguishing between single and multiple melody lines or auditory signal sources, is "auditory stream formation". The classic reference on this topic is by Stephen McAdams and Albert Bregman:
"....In sequences where the tones follow one another in quick succession, effects are observed which indicate that the tones are not processed individually by the perception system. On the one hand, we find various types of mutual interaction between successive tones, such as forward and backward masking, loudness interactions and duration interactions. On the other hand, a kind of connection is found between the successive perceived tones....
"...Consider that a repetitive cycle of tones spread over a certain frequency range may be temporally coherent, or integrated, at a particular tempo. It is possible to gradually increase the tempo until certain tones group together into separate streams on the basis of frequency.... the faster the tempo, the greater the degree of breakdown or decomposition into narrow streams until ultimately every given frequency might be beating along in its own stream...
".... These findings indicate that the perceived complexity of a moment of sound is context-dependent.... Context may be supplied by a number of alternative organizations that compete for membership of elements not yet assigned. Timbre is a perceived property of a stream organization rather than the direct result of a particular waveform, and is thus context-dependent. In other words, two frequency components whose synchronous and harmonic relationships would cause them to fuse under isolated conditions may be perceived as separate sine tones if another organization presents stronger evidence that they belong to separate sequential streams."([McAdams1979], p. 25)
The last paragraph suggests that the perception of timbre, or tone quality, what Stockhausen refers to in his article as "formant rhythm", is a much more complicated subject than is possible to treat in any simple manner; certainly it is much more than a matter of simply combining pure sine waves at different frequencies and different amplitudes into complex "note mixtures" as was being attempted in the Cologne electronic music studio before Stockhausen began using impulse generators instead.
"...There are thus attentional limits in the ability of the auditory system to track a sequence of events. When events occur too quickly in succession, the system uses the various organizational rules discussed in this article to reorganize the events into smaller groups.... In the example where the fast sequence of tones merges into a continuous `ripple', the auditory system is unable to successfully integrate all of the incoming information into a temporal structure and simplifies the situation by interpreting it as texture. Thus the auditory system, beyond certain tempi, may interpret the sequence as a single event and assign to it the texture or timbre created by its spectral and temporal characteristics." ([McAdams1979], pp. 28-42)
And thus we see that, in spite of Backus' complete dismissal of Stockhausen's ideas because of their unclear presentation, acoustic scientists are now looking deeply into the very phenomenon that interested Stockhausen: the boundary between rhythm and pitch, between separate successive events and continuous texture.