We sought to explore how ‘noise’ might provide challenging input to algorithmic listening techniques or make for a desirable, divergent output (in all the varied senses of that word; see Marie Thompson’s Beyond Unwanted Sound: Noise, Affect and Aesthetic Moralism [3]). We sought to misappropriate known techniques to uncover their limitations or the implicit assumptions built into them. Independently, we each brainstormed proposals for makes. Combined, we had a long list of 48: some expressed compactly, some at greater length, some in a standard ‘scientific’ language, some deliberately written humorously or facetiously, some with a degree of overlap and convergence with other proposals, some unique, some making reference to existing artworks but bending them to our context of interest, and so forth.
We made work in two concerted sessions of two days’ duration each, one at each of our host institutions. We worked with a light touch, doing just enough to prove the principle of our design ideas before moving on to the next. We were drawn to prioritise proposals that we both shared, but we ensured that our individual idiosyncrasies were also represented to maximise the coverage of our work. We conducted a third two-day session to combine our makes in a performable installation environment. In total, 18 of our proposals were made to some degree, with 14 having a role in the final presentation of the work.
Examples of algorithmic images from the series generated by JB for the project
The input signal is amplitude envelope-followed. When the signal drops below a given threshold, it is let through the gate, thereby performing the opposite action to a classic noise gate. At the moment the signal drops below the threshold, it is subjected to a single-frame FFT analysis, which is used to create a freeze effect that is held until the next time the gate opens. The sound through the open gate and the frozen spectral texture can be cross-faded. The cross-fade and the threshold are both variable in performance.
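The gating logic described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the one-pole envelope follower, the `attack` and `release` coefficients and the function name `anti_gate` are hypothetical and not taken from the original patch, and the FFT freeze stage is left out.

```python
def anti_gate(samples, threshold=0.1, attack=0.5, release=0.01):
    """Pass samples only while the followed envelope is BELOW the threshold.

    A classic noise gate would pass the signal when the envelope is at or
    above the threshold; here the comparison is deliberately inverted.
    """
    env = 0.0
    out = []
    for x in samples:
        level = abs(x)
        # One-pole envelope follower: fast rise, slow fall (assumed design).
        coeff = attack if level > env else release
        env += coeff * (level - env)
        out.append(x if env < threshold else 0.0)
    return out


# A loud passage followed by a long quiet tail.
signal = [1.0] * 50 + [0.01] * 2000
gated = anti_gate(signal)
```

With these settings the loud passage is silenced, and the quiet tail is let through once the envelope has decayed below the threshold.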
In the Spectral AntiGate (SAG), a carefully engineered multi-resolution spectral gate, made by Harker to showcase his new FrameLib signal processing framework, is hijacked by simply reversing the inequality at its core. Being multi-resolution means that the chirping redolent of crude spectral processing is mitigated somewhat, particularly in higher frequencies that retain a degree of texture. If a feedback loop is set up with an air microphone picking up SAG’s output, the system settles into a steady cycle that alternates between more chirpy mid-frequencies and bursts of higher-frequency noise, although the inner textures of these components do vary. This behaviour is oddly reminiscent of the change ringing of bells. The rhythmic behaviour changes if a player manipulates the microphone, for instance by shielding it.
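To make the "reversed inequality" concrete, here is a minimal per-bin sketch. It does not reproduce FrameLib or the multi-resolution design; the function `spectral_gate`, its `invert` flag and the example magnitudes are all hypothetical, and the input is assumed to be one frame of FFT bin magnitudes.

```python
def spectral_gate(magnitudes, thresholds, invert=False):
    """Zero FFT bins according to a per-bin threshold test.

    invert=False: conventional gate, keeps bins at or above the threshold.
    invert=True:  'anti-gate', keeps only bins below the threshold.
    """
    kept = []
    for mag, thr in zip(magnitudes, thresholds):
        passes = (mag < thr) if invert else (mag >= thr)
        kept.append(mag if passes else 0.0)
    return kept


frame = [0.9, 0.05, 0.4, 0.01]      # made-up bin magnitudes
thresholds = [0.1] * len(frame)
normal = spectral_gate(frame, thresholds)
hijacked = spectral_gate(frame, thresholds, invert=True)
```

The two outputs are complementary: whatever the conventional gate keeps, the inverted one discards, which is why the residue that spectral gating normally suppresses becomes the material.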
Bursts of filtered room-tone are fired back into the space with an attack-release energy profile. The timing of these bursts is dictated by a maximum-length pseudo-random sequence (see our discussion of LFSRs below) with four ‘voices’ each with different length sequences and occupying different spectral bands. The room tone is read from a 10-second delay line, so depending on the delay time, there is the possibility of sampling previous output. The overall effect depends to a large degree on how fast the sequencers are driven. High speeds and short bursts produce an impulsive kind of texture, moderate speeds a more rhythmic feel, and low speeds with long bursts can occasionally punctuate whatever else is happening in the space with dramatic impact sounds.
An inductive coil (sometimes known as a phone tap coil) is used to transduce electromagnetic fluctuation into a signal that is presented to the EFR, which tries to model its input as coloured noise. This is done using a conventional source-filter technique where noise is filtered in the Fourier domain by a spectral envelope derived by cepstral liftering of the input. This is supplemented with a very simple, single-voice sinusoidal model driven by the sigmund~ external for Max. The character of the resynthesis is largely determined by the degree of liftering and, of course, how much sense it makes to model the input with filtered noise in the first place. In variants of the EFR, a microphone has been substituted for the inductive coil.
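For readers unfamiliar with cepstral liftering, the following sketch shows one textbook way to derive a smoothed spectral envelope from a single frame. It is not the Max patch: the naive DFT, the function names and the `keep` parameter (the number of low-quefrency coefficients retained) are assumptions made for illustration.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform, adequate for a short demo frame."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(v * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t, v in enumerate(values))
        out.append(acc / n if inverse else acc)
    return out


def log_magnitude(frame):
    """Log-magnitude spectrum; the small offset avoids log(0)."""
    return [math.log(abs(c) + 1e-12) for c in dft(frame)]


def liftered_envelope(frame, keep):
    """Smooth the log spectrum by keeping only low-quefrency cepstral terms."""
    n = len(frame)
    cepstrum = [c.real for c in dft(log_magnitude(frame), inverse=True)]
    # Symmetric lifter: keep quefrencies 0..keep-1 and their mirror images.
    liftered = [c if (q < keep or q > n - keep) else 0.0
                for q, c in enumerate(cepstrum)]
    return [c.real for c in dft(liftered)]


frame = [math.sin(0.7 * i) + 0.3 * math.cos(2.1 * i) for i in range(16)]
coarse = liftered_envelope(frame, keep=1)   # flat: only the average level survives
exact = liftered_envelope(frame, keep=9)    # nothing removed: the full log spectrum
```

Between these two extremes, a small `keep` gives a broad, heavily smoothed envelope and a larger one follows the input spectrum more closely. In a source-filter resynthesis, the exponentiated envelope would then scale the magnitudes of a noise spectrum bin by bin.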
A signal is ‘resynthesised’ by sine oscillators driven by three different pitch trackers in Max (sigmund~, zsa.fund and the built-in fzero~). One can also mix in another signal of oscillators driven by the difference in frequency between each of the three trackers, ring-modulated with each other. Driven with a pitched signal and a sensible gain structure, the effect is rather like six excited slide whistles. However, introducing feedback and nonlinearity opens up a much wider range of territory. If left in a feedback loop with a suitably large delay (we use 12 seconds here), DPT1 can settle into a quite diverse range of states, especially if there is clipping or distortion somewhere in the loop. Smoothing and delaying the frequency inputs of the oscillators by different amounts can also enrich the emerging dynamics.
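As a rough sketch of how six oscillator frequencies arise from three trackers, the snippet below takes three frequency estimates, derives the three pairwise differences and ring-modulates the difference oscillators by multiplying them. The estimates are invented numbers and the function names are hypothetical; the real patch works on live signals in Max.

```python
import math
from itertools import combinations


def oscillator_frequencies(estimates):
    """Return the tracker estimates and their pairwise absolute differences."""
    differences = [abs(a - b) for a, b in combinations(estimates, 2)]
    return list(estimates), differences


def ring_modulate(frequencies, sample_rate=44100, n_samples=64):
    """Multiply unit-amplitude sine oscillators sample by sample."""
    out = []
    for i in range(n_samples):
        t = i / sample_rate
        product = 1.0
        for f in frequencies:
            product *= math.sin(2 * math.pi * f * t)
        out.append(product)
    return out


# Three trackers that nearly, but not quite, agree on a pitch around A4.
direct, differences = oscillator_frequencies([440.0, 442.0, 445.0])
difference_signal = ring_modulate(differences)
```

When the trackers agree closely the difference frequencies are very low, so the added layer moves slowly; larger disagreements push it into the audible range.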
A similar approach to pitch tracking, making the disagreement in results between algorithms palpable, was taken using the Pd-vanilla language. The sigmund~ and fiddle~ objects are used to identify pitches in the input and to set the frequencies and amplitude envelopes of two sine waves. These signals are also ring modulated to enhance the perceptibility of their disagreement. The performer can cross-fade between the sine waves and their ring modulation. The identified pitches and their absolute difference as a disagreement measure are made available from DPT2 to other patches (e.g. to parameterise the LFSR, see below).
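Why ring modulation makes the disagreement audible follows from a standard product-to-sum identity. Writing the two tracked frequencies as $f_1$ and $f_2$ (our notation, and ignoring the amplitude envelopes for simplicity):

```latex
\sin(2\pi f_1 t)\,\sin(2\pi f_2 t)
  = \tfrac{1}{2}\bigl[\cos\bigl(2\pi (f_1 - f_2)\,t\bigr) - \cos\bigl(2\pi (f_1 + f_2)\,t\bigr)\bigr]
```

The product therefore contains a component at the difference frequency $|f_1 - f_2|$, which is exactly the disagreement measure, alongside a component at the sum. When the two trackers agree, the difference term collapses to a constant offset and only the sum component at $2 f_1$ is heard.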
The ERM is a means for converting any input into a sustained noise texture. On receipt of a button-press-style event, the momentary spectrum of the sound is subject to a 4096-band FFT and used to synthesise a sustained frozen noise. Successive button presses will add partials to the sustained sound if their FFT bands are louder than in the last analysis. Button presses will also momentarily open a gate to pass the input sound to Pd’s freeverb~ set to a large room size with little damping. When the gate closes, the reverb is frozen to give an infinite reverb effect. This gives an alternative way to synthesise a spectral noise from input sound. The performer can cross-fade between the two methods and reset the analysis (which fades both kinds of noise to silence).
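One plausible reading of the accumulation rule is a per-band running maximum: each press keeps whichever is louder, the held band or the newly analysed one. The sketch below follows that reading; the class `SpectralAccumulator` and the four-band example are hypothetical, and the real patch works on 4096-band frames in Pd.

```python
class SpectralAccumulator:
    """Hold a frozen magnitude spectrum that only grows until it is reset."""

    def __init__(self, n_bands):
        self.held = [0.0] * n_bands

    def press(self, magnitudes):
        """On a button press, keep the louder of held and new value per band."""
        self.held = [max(h, m) for h, m in zip(self.held, magnitudes)]
        return self.held

    def reset(self):
        """Clear the analysis, returning the frozen texture to silence."""
        self.held = [0.0] * len(self.held)


acc = SpectralAccumulator(4)
first = acc.press([0.2, 0.0, 0.5, 0.1])
second = acc.press([0.1, 0.3, 0.4, 0.6])
```

Under this reading the frozen texture can only thicken between resets, which matches the idea of presses adding partials rather than replacing them.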
An 8-bit linear feedback shift register (LFSR) was implemented in Pd-vanilla. A flexible design was adopted where the last bit could feed back to any of the 8 positions in the register for exclusive-OR combination with the position’s contents. This creates an algorithmic system which can generate a variety of behaviour, from the digital pseudo-noises of maximal length sequences to varied periodic behaviour. The values in the register were interpreted both as 8-bit sample values to be read into a wavetable and as 8-bit specifications of the frequency with which the wavetable (or a sine or a square wave) would be played. The rate at which the LFSR is clocked and the centre and range values of frequency could be determined manually or received from other processes (e.g. the Disagreeing Pitch Trackers). In this way, pseudo-noises or pitched sequences could be generated which followed identified profiles.
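The register design described here resembles what is usually called a Galois-configuration LFSR, so a minimal sketch in that form may help. The feedback mask selects which positions the outgoing bit is XORed into. The function names and the linear frequency mapping are our assumptions, not a transcription of the Pd patch.

```python
def lfsr_step(state, feedback_mask):
    """Clock an 8-bit register once; return (new_state, output_bit)."""
    out = state & 1                     # the 'last' bit leaves the register
    state >>= 1
    if out:
        state ^= feedback_mask & 0xFF   # XOR it into the selected positions
    return state, out


def cycle_length(seed, feedback_mask):
    """Length of the cycle that the register eventually falls into."""
    seen = {}
    state, step = seed & 0xFF, 0
    while state not in seen:
        seen[state] = step
        state, _ = lfsr_step(state, feedback_mask)
        step += 1
    return step - seen[state]


def register_to_frequency(value, centre_hz, range_hz):
    """Map an 8-bit register value linearly onto centre +/- range/2 (assumed mapping)."""
    return centre_hz + range_hz * (value / 255 - 0.5)


maximal = cycle_length(1, 0xB8)   # feedback into positions 8, 6, 5 and 4
rotation = cycle_length(1, 0x80)  # feedback into the top position only
```

The mask `0xB8` corresponds to a well-known maximal-length configuration, visiting all 255 non-zero states before repeating, while `0x80` merely rotates the register and gives a short periodic pattern. Other masks fall somewhere between these two behaviours.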
We reverse-engineered a music-psychological study that aims to demonstrate a mapping between given musical ‘features’ (timbre, tempo, mode, register, articulation, dynamics) and ‘emotions’ on the basis of rating judgments given by listeners to various transformations of simple melodies. Working in parallel, we each independently came up with ways of trying to estimate these six features from an audio stream. Then, using the paper’s experimentally derived table of correlations between features and emotions, we constructed a mapping function between ‘features’ and the four ‘emotions’ examined in the paper (happy, sad, scary, peaceful). We then set about using this mapping for generative purposes. One of us made a noise/drone generator, which constructed a spectrum based on a shifting histogram of detected pitch classes that was modulated using the detected emotions and features. The other of us made a melody generator which, on the basis of the emotions recognised in the input audio stream, estimated values for the six musical features analysed in the study and played back notes synthesised with enveloped, filtered sawtooth waves.
The instantaneous digitised value of an input audio stream is sampled at random intervals and read into a wavetable, the insertion point wrapping round when the table is full. Following a fractal expansion technique used previously by JB, the wavetable is read to generate long patterns of nested amplitude modulated sound. The reference rate for reading the wavetable can be set as a linear function of the currently sampled value or from other pitch tracking processes. The range of the random sampling intervals can be set in performance. The output can vary from a noisy reconstruction of the input through a slow pattern which can variably follow the pitch content of the input to a distorted granular-sounding stream.
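The sampling stage alone (not the fractal expansion or playback) can be sketched as a circular write into a fixed-size table at randomly spaced read points. The function name, the gap parameters expressed in samples and the use of Python's `random.Random` are assumptions for illustration.

```python
import random


def sample_into_wavetable(stream, table_size, min_gap, max_gap, rng):
    """Copy stream values taken at random intervals into a wrapping wavetable."""
    table = [0.0] * table_size
    write = 0
    index = rng.randint(min_gap, max_gap)
    while index < len(stream):
        table[write] = stream[index]
        write = (write + 1) % table_size      # wrap round when the table is full
        index += rng.randint(min_gap, max_gap)
    return table


stream = [float(i) for i in range(10)]
# With min_gap == max_gap the sampling is regular, which makes the wrap easy to follow.
regular = sample_into_wavetable(stream, 4, 1, 1, random.Random(0))
# A wider range of gaps gives an irregular, sparser picture of the input.
irregular = sample_into_wavetable(stream, 4, 1, 3, random.Random(0))
```

Widening the range of gaps is the code counterpart of the performance control over the random sampling intervals: the table becomes a progressively looser snapshot of the input.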
The analog-in values from an Arduino Nano are read into wavetables and used for direct digital synthesis via nested amplitude modulation as described in the previous section. The analog terminals are left floating so they are sensitive in unpredictable and interactive ways to touch and circuit noise. This creates a lively five-oscillator digital synthesizer capable of a range of distorted, bit-reduced and granular-sounding textures which can be steered by touch but not precisely played. Two improvisations were recorded and used by us in performance as a fixed media element.
In recognition of the prominence that Mozart’s music has in the history of algorithmic composition and machine listening, we took a recording of his Eine kleine Nachtmusik and extracted its tonal component using iZotope RX. We followed this with a sinusoidal analysis using SPEAR and made various resyntheses. For example, we made a version which was reconstructed out of banks of sine waves, another which retained only the transients and yet another in which the tonal analysis was read at a slow rate to generate a 45-minute texture. To explore how machine listening techniques might react to suboptimal renderings, we also degraded the original recording by playing it back in a reverberant space, freely talking over it and recording the result using a gain structure with a tendency to distort. We selected five versions plus the original and mixed them using a good to bad (Nacht- to Schlecht-musik) crossfader. We informally calibrated the crossfader so that at extreme good/Nacht the online music recognition service Shazam would accurately recognise Eine kleine Nachtmusik while at extreme bad/Schlecht no results were returned, with an approximately 50% hit rate in the middle.
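A crossfader across six sources can be approximated by interpolating between neighbouring versions as the fader moves. The sketch below assumes a linear fade between adjacent sources and a fader position from 0.0 (good) to 1.0 (bad); the actual fade law and calibration used in the piece are not documented here, and the function name is hypothetical.

```python
def crossfade_weights(position, n_sources):
    """Gain per source for a fader position in [0, 1], fading between neighbours."""
    position = min(max(position, 0.0), 1.0)
    scaled = position * (n_sources - 1)
    lower = min(int(scaled), n_sources - 2)   # index of the nearer 'good' neighbour
    frac = scaled - lower
    weights = [0.0] * n_sources
    weights[lower] = 1.0 - frac
    weights[lower + 1] = frac
    return weights


nacht = crossfade_weights(0.0, 6)     # original recording only
middle = crossfade_weights(0.5, 6)    # equal mix of the third and fourth versions
schlecht = crossfade_weights(1.0, 6)  # most degraded version only
```

An equal-power law could be substituted for the linear one if the level dip between neighbouring versions proved audible.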
Using sigmund~ feeding an oscillator bank with a generous number of partials (100), we found that a reasonable facsimile of even a noisy environment could be rendered, but that it was a simple matter to reduce this to a sludge of artefacts by over-smoothing the frequency and/or amplitude tracks. The degree of over-smoothing was made a function of the distribution of the averaged spectral centroid in the space: we built a histogram (periodically cleared) that was occasionally sampled as if it were a PDF, and the sampled value set the amount of smoothing.
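Sampling a histogram "as if it were a PDF" usually means picking a bin with probability proportional to its count and then picking a value within that bin. A minimal sketch follows, with hypothetical bin edges in Hz and invented counts; how the drawn centroid value was scaled into a smoothing amount is not specified, so that step is omitted.

```python
import random


def sample_histogram(counts, bin_edges, rng):
    """Draw a value with probability proportional to the histogram counts."""
    total = sum(counts)
    if total == 0:
        return None                      # histogram was just cleared
    target = rng.random() * total
    cumulative = 0
    for i, count in enumerate(counts):
        cumulative += count
        if target < cumulative:
            low, high = bin_edges[i], bin_edges[i + 1]
            return low + rng.random() * (high - low)   # uniform within the bin
    return bin_edges[-1]


rng = random.Random(1)
edges = [0.0, 1000.0, 2000.0, 3000.0]           # hypothetical centroid bins in Hz
draw = sample_histogram([0, 5, 0], edges, rng)  # all observations in the middle bin
empty = sample_histogram([0, 0, 0], edges, rng)
```

Because the histogram is periodically cleared, the draws track the recent spectral character of the room rather than its long-term average.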
Following the same principle as Alvin Lucier’s I am Sitting in a Room, a prepared text was read by one of us and recirculated through Skype until its original identity had completely dissipated. This was roughly 30 iterations. In contrast to the shifting resonances of Lucier’s acoustic version, the accumulating artefacts included bursts of noise and clicks, and the appearance of a distinctive crescendo of bass-drum-like impact sounds partway through, as well as the chirpy filtering we had expected. Our text was from a deeply critical review of Abraham Moles’ Information Theory and Esthetic Perception, and the results formed a fixed-media component of the final presentation.
Using a black-box de-reverberation plugin and a reverberation pedal, we constructed a controllable feedback loop, stimulated with chirps, noise bursts and crackles programmed in Pd-vanilla, and recorded a short, two-person improvisation that was used as a source of fixed material in our presentation. One feature of the plugin is that reverberant components can be boosted as well as suppressed using a ‘focus’ parameter, and that it is easy to mistune the settings to generate plenty of artefacts. The resulting material had a drone-like character but did not tend to collapse into indistinct mush.
We made a number of other explorations which we will only briefly relate here. Many of these concern processing fixed media material using offline processes or involve recordings that for reasons of practicality could only appear in our work as fixed media. For example, one of us created a piece entitled Maximum Zero which takes a recording of David Tudor performing John Cage’s 4’ 33” and subjects it to brickwall limiting to bring out the environmental sounds around the performance at maximum intensity. One of us also made recordings using the aerial array and amplifier designed by NASA’s Radio Jove to bring recordings of the radio transmissions of Jupiter to our project. We also made experiments to see whether we could transmit the results of our machine listening analyses via non-standard means. This included an encoding of identified pitches as audible Morse messages, which we decoded and played back in a feedback loop. In this way, we sought to corrupt conventional understandings of the relationship between representation and the represented and between signal and noise.
The fruits of our labours were assembled together and explored in the University of Huddersfield’s multichannel Spatialisation and Interactive Research Lab (SPIRAL), which offers 25.4 channels to work in, arranged as three tiered rings of eight, plus a ceiling-mounted speaker dubbed the voice of god. A binaural dummy head, which we dubbed Stookie Helen, was used as the input for all listening processes (people who attended the second HAL meeting in Belfast will have already encountered JB’s partner, Stookie John).
JB and Stookie Helen have some quiet time together
Different processes were placed in different speakers, and kept stationary. In this way, the character of what emerged was driven in part by the interactions of the different processes, affected by the relative gain structure, over which we had control.
This allowed more or less scrutable relationships to emerge between processes as they interfered with each other, and it also encouraged visitors to explore the space and discover different points of focus. We had some control over each process, in the form of individual gain faders, ‘nudge’ buttons which would push an individual process into a new (possibly random) state, and a combined overall control on a boundless rotary encoder that would affect all processes. This combined control yielded 24 separate control signals internally, based on a set of transfer functions. Processes were free to use whichever of these we fancied, however we wished, the object being to generate variety with coherence. For example, parameters like the crossfades on the Anti-Gates or the ERM could be set by the values from the transfer functions.
Under the provisional title of All The Noises, our work was first presented on 18th January 2018, and occupied territory between a performance, an installation and a research presentation. We started with a brief, 15-minute performance whilst an audience composed of colleagues from Huddersfield and members of the public responding to local publicity arrived and explored the space. We then set the system into a lower-key state whilst we explained our project to the room at large. Thereafter, we had a steady trickle of guests passing through and we would alternate between talking, nudging the system, and demonstrating brief performative moves. Finally, we concluded the session with a 10-minute performance crescendo.
Here are edited highlights:
Oi Algorithm Performance, Huddersfield 18 January 2018
OG and JB, tearing it up
Going into this we had a few ambitions: one was to use the many makings approach to try and sketch out an approach for errantly-inclined, artistic algorithmic listening research that complements and challenges the engineering orthodoxy. Our thoughts on this have been submitted to NIME 2018, so we hope to be pontificating on this topic in public later this year. Another ambition was, straightforwardly, to collaborate, as we hadn’t done so before despite having been in each other’s orbit for a while. We’re encouraged by what we made, and intend to keep refining and gigging it.
As well as the support of HAL in making this possible, OG’s time and access to facilities are supported by the ERC through the Fluid Corpus Manipulation project.
1. Feenberg, Andrew (2002). Transforming Technology: A Critical Theory Revisited. Oxford University Press, p. 106.
2. Bowers, John, Simon Bowen, and Tim Shaw (2016). “Many makings: Entangling publics, participation and things in a complex collaborative context.” Proceedings of the 2016 ACM Conference on Designing Interactive Systems. ACM.
3. Thompson, Marie (2017). Beyond Unwanted Sound: Noise, Affect and Aesthetic Moralism. London: Bloomsbury.
John Bowers & Owen Green SEEDS