redmeansrecording

Transcript and Sources: Spectral Synthesis

Added 2024-10-20 23:43:15 +0000 UTC

This is the script and sources for my video on Spectral Synthesis and X-Stream by Steinberg.

THIS VIDEO WILL BE PATREON ONLY UNTIL THE VIDEO IS LIVE THIS COMING WEEK.

Video here: https://youtu.be/5g632XP7HVY
Get the free Spectral Synthesizer X-Stream here: https://www.steinberg.net/vst-instruments/x-stream/
After X-Stream is activated and installed, get my presets here: https://download.steinberg.net/assets/X-Stream/X-Stream_RMR.vstsound
Get an entirely free (and free to use anywhere) collection of music made with X-Stream here: https://jjbbllkk.itch.io/spectral-landscapes

INTRO

What is sound? No really, what is it?

Is sound music? Sometimes. Ok, what is music made of? Notes? Ok, sure, yeah, and a note is a fundamental frequency that’s played according to a scale we made up, but if we just play that single frequency it sounds like this: Sine wave

Not like this: Flute sound

Or this: Guitar sound

Or this: Meme sound

So what’s all that other stuff? If we look at this sound in an EQ we can see these little spikes, and if we look at it in something that examines the spectral content of a sound we can see these spikes as bright lines. Those bright lines are harmonics or sidebands, and I thought I understood how these made up a sound, but it turns out the entire thing is MUCH weirder and cooler than I thought. That’s where we’ll eventually get in this video, a video on the history and technique of spectral synthesis, a journey to the very heart of what makes things sound the way they do.

Hi, my name is Jeremy, this is Red Means Recording, and today we’re going to look at what led to the modern implementation of spectral synthesis, a technique that breaks apart sound into discrete components and lets us stretch, warp, and reconstitute the sound in different ways. We’ll also look at X-Stream, a free spectral software instrument from the video’s sponsor, Steinberg. They approached me to make presets and a video on the software, and I decided to turn it into more than just a software demo. I wanted to know how this whole thing worked, and now I hope to help you learn too. Let’s get started.

EARLY DEVELOPMENTS

We mentioned fundamental frequencies in our intro. That’s the root frequency of a note, what you would play as a key on your keyboard. If we play a simple sine wave, we get a pure expression of that frequency as a smooth, sinusoidal shape with no bumpies or kinks. Just a smooth null friend.

But that’s not how any sound appears in nature, and even as far back as the 1730s scientists could tell something weird was going on. If you look at a vibrating string, you can see it has a much more complex vibrational pattern than a simple sine wave. This got a guy named Daniel Bernoulli interested, and his work on The Modal Decomposition of Vibrating Strings dug into it. If you look at a guitar pluck in an EQ or spectral analyzer, you’ll see peaks above the fundamental. Each one of those is a harmonic, or sideband. These harmonics follow a certain pattern, called the harmonic series. Andrew Huang has a great video on the harmonic series you can find on YouTube if you’re interested.

So each one of these harmonics has to exist for a sound to sound the way it does. When you put them all together you get a much more complex waveform than a simple sine, but if you isolate them individually you get sines. Bernoulli figured out that each one of those individual harmonics has its own equation, and you can deconstruct a complex wave into a series of simpler ones using math. Each one of those individual bits Daniel called a “mode”, and this is the foundation for everything going forward.
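Here’s a rough sketch of that idea in Python: a handful of plain sines (the “modes”) added together into one more complex wave. The sample rate, the fundamental, and the amplitude of each mode are numbers I made up for illustration, not measurements of a real string.

```python
import math

SAMPLE_RATE = 8000    # samples per second (assumed for illustration)
FUNDAMENTAL = 100.0   # Hz (assumed)

# Hypothetical mode amplitudes: harmonic number -> amplitude
MODE_AMPLITUDES = {1: 1.0, 2: 0.5, 3: 0.33, 4: 0.25}


def mode(harmonic, amplitude, n_samples):
    """One mode on its own: a plain sine at harmonic * FUNDAMENTAL."""
    return [
        amplitude * math.sin(2 * math.pi * harmonic * FUNDAMENTAL * n / SAMPLE_RATE)
        for n in range(n_samples)
    ]


def complex_wave(n_samples):
    """Add all the modes together, sample by sample."""
    modes = [mode(h, a, n_samples) for h, a in MODE_AMPLITUDES.items()]
    return [sum(samples) for samples in zip(*modes)]


one_cycle = int(SAMPLE_RATE / FUNDAMENTAL)  # 80 samples in one cycle of the fundamental
wave = complex_wave(one_cycle)
```

Each call to mode() is a smooth sine by itself; only the sum has the bumpier shape.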

It was in 1822 that we got a mathematical foundation for this, thanks to Joseph Fourier. His theorem provided a way to represent periodic signals as a sum of sinusoids. Sinusoids is such a word, like, I know it means sine wave but it’s so much cooler to say Sinusoids.

Fourier proved that no matter how complex the original signal was, it could be broken down into a collection of discrete sine waves, each with its own amplitude and frequency. He also proved that this could go both ways, that once you’ve decompiled a signal into individual bits, you could also recompile it into its original signal with the same math.
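To make the “both ways” part concrete, here’s a minimal sketch using a naive discrete Fourier transform written out by hand. It assumes a test signal I invented (two sines that fit the analysis window exactly); real audio tools use a fast FFT, not this slow loop.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform: one complex number per frequency bin."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Rebuild the time-domain signal from its spectrum."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


N = 64
# Hypothetical test signal: 3 cycles at amplitude 1.0 plus 7 cycles at amplitude 0.5
signal = [
    1.0 * math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N)
    for t in range(N)
]

spectrum = dft(signal)
# For a real signal, the amplitude of the sine in bin k (below N/2) is 2 * |X[k]| / N
amplitudes = [2 * abs(x) / N for x in spectrum[: N // 2]]
rebuilt = inverse_dft(spectrum)
```

Bins 3 and 7 come back with amplitudes 1.0 and 0.5, every other bin is essentially zero, and rebuilt matches signal to within floating point error.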

Notice that I’m not saying sound here, just signal. That’s because the Fourier transform, as this technique is known, applies to way more than just sound. It’s useful for understanding all sorts of wave patterns in science and engineering, like sound waves, light waves, or even the vibrations of structures like bridges and buildings.

A breakthrough in this field that has everything to do with sound came in 1863 from German polymath Hermann von Helmholtz. He figured out how to decompile sound in real time for the human ear using what’s known as a Helmholtz resonator. This weird-looking thing was made to be put directly in the ear and was designed in such a way that one frequency would pass through at a time. Sort of like what happens when you blow across a bottle, but without all the fancy harmonics. When you listened to a band with one of these in your ear, you’d go from hearing the whole band to hearing only the frequency the device resonated at, with its amplitude fluctuating as the band’s sound hit that frequency.

20TH CENTURY MILESTONES

There were a few early electronic instruments that started working with the concept of sound as a collection of modes, or partials. The Telharmonium (1898) and Hammond Organ (1934) were early electromechanical instruments that performed a form of additive synthesis by combining multiple sine wave components.

The Telharmonium synthesized musical tones by mechanically generating high-voltage alternating currents of the required frequencies using rotating tonewheels and alternators. This was an early form of additive synthesis, where complex tones were built up by combining multiple pure sine wave components.
Each alternator consisted of a toothed iron wheel rotating in front of an electromagnet, producing an alternating current with a frequency proportional to the rotation speed and number of teeth.
The instrument had a total of 145 alternators, with each one generating a specific tone or frequency. This allowed it to produce multiple notes simultaneously.
The alternators were arranged on a series of 8 shafts spanning 60 feet, driven by a large motor and system of gears.
Musicians played the Telharmonium using a console with 153 keys and pedals, which controlled a complex array of switches and circuits to route the desired tones from the alternators.
The electrical tones generated were then transmitted over telephone lines to be received and converted into sound waves by speakers or telephone receivers in remote locations.
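The toothed-wheel relationship above boils down to simple arithmetic: every tooth that passes the magnet produces one cycle of current. Here’s a tiny sketch of that; the tooth count and shaft speed are invented numbers for illustration, not the Telharmonium’s actual specs.

```python
def alternator_frequency(teeth, revolutions_per_second):
    """Each tooth passing the electromagnet produces one cycle of current."""
    return teeth * revolutions_per_second


# Hypothetical wheel: 44 teeth spinning 10 times per second
a440 = alternator_frequency(44, 10.0)
```

Doubling either the tooth count or the shaft speed doubles the frequency, which is an octave up.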

The Hammond Organ used drawbars to mix the fundamentals with upper and lower harmonics at a user’s choice of amplitude, allowing a player to create brighter or bassier tones on the fly. This was a basic implementation of additive synthesis.
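Here’s a rough sketch of the drawbar idea. The nine ratios follow the standard drawbar footages (16', 5 1/3', 8', 4', 2 2/3', 2', 1 3/5', 1 1/3', 1') relative to the 8' fundamental. The function name, the pure sines, and the linear 0 to 8 volume mapping are my simplifications; a real Hammond’s tonewheels and level steps behave differently.

```python
import math

# Frequency ratio of each drawbar relative to the 8' fundamental
DRAWBAR_RATIOS = [0.5, 1.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]


def drawbar_sample(registration, note_hz, t):
    """Value of the mixed tone at time t (seconds).

    registration is a 9-digit string such as "888000000", one digit (0-8)
    per drawbar. Levels are mapped linearly to amplitude (an assumption).
    """
    total = 0.0
    for level_char, ratio in zip(registration, DRAWBAR_RATIOS):
        level = int(level_char) / 8.0
        total += level * math.sin(2 * math.pi * ratio * note_hz * t)
    return total


# First three drawbars fully out: sub-octave, fifth, and fundamental
value = drawbar_sample("888000000", 440.0, 0.0003)
```

Pulling out drawbars further to the right adds higher harmonics and brightens the tone; the first drawbar adds the octave below.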

The Vocoder (1935) and Voder (1937) were early speech analysis/synthesis systems that modeled the spectral characteristics of the human vocal tract.

The original vocoder, which Homer Dudley began developing at Bell Labs in 1928, was designed for secure voice communications over telephone lines, by encoding speech into a compressed signal that could be transmitted over narrower bandwidths.

It consisted of an analyzer that broke down speech into its frequency components and a synthesizer that reconstructed an artificial voice from those components.

The analyzer used bandpass filters to separate the speech spectrum into different frequency bands, while the synthesizer combined noise and tone generators to resynthesize the voice.

In 1966 we got the Phase Vocoder, and this is where things get super weird, at least for me. It’s also where my ability to properly articulate the math concepts kinda goes out the window, so apologies up front to any math wizards or FFT nerds watching the video.

So reading this got me thinking: what the heck does phase have to do with anything? Up until this point, I understood sound as a collection of partials with different amplitudes over time, some harmonic, some inharmonic. That made sense. What did phase have to do with anything? What the heck even is a phase?

The phase vocoder captures sound as a series of snapshots across time and frequency. Each snapshot records both the amplitude and the phase of the wave at each frequency.

When it comes to waveforms, phase indicates a point in time within a sound wave’s cycle. It’s measured in degrees, with 0° being the start of the wave, 90° the peak, 180° the midpoint, 270° the trough, and 360° completing the cycle. Any point in a repeating wave has a phase value and when you combine waves at different phases, weird things can happen. For instance, two identical waves 180° out of phase completely negate each other, as shown here.
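That cancellation is easy to check with a few lines of Python. This is a minimal sketch with an assumed 100 Hz sine at an assumed 8000 Hz sample rate; the helper name is my own.

```python
import math


def sine(freq_hz, phase_degrees, n_samples, sample_rate=8000):
    """A sine wave starting at the given phase (in degrees)."""
    phase = math.radians(phase_degrees)
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
        for n in range(n_samples)
    ]


in_phase = sine(100, 0, 80)       # one full cycle
flipped = sine(100, 180, 80)      # same wave, 180 degrees out of phase

cancelled = [a + b for a, b in zip(in_phase, flipped)]
doubled = [a + b for a, b in zip(in_phase, in_phase)]

peak_cancelled = max(abs(s) for s in cancelled)  # essentially zero
peak_doubled = max(abs(s) for s in doubled)      # twice the original peak
```

Two copies in phase add up to double the level, while the flipped copy wipes the signal out apart from tiny floating point rounding.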

What I did not understand, and I needed big smart man Venus Theory to help me with, was that to get even a simple waveform like a saw wave, the harmonics above the fundamental are actually out of phase with each other in a set way, purposefully causing specific phase relationships that lead to the edges of the saw.
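Here’s a small sketch of why those phase relationships matter. It builds one cycle from 30 harmonics with amplitudes of 1/k, first with every harmonic starting at 0° (which gives a saw-like ramp), then with the exact same amplitudes but every harmonic started at 90°. The harmonic count and sample count are arbitrary choices of mine.

```python
import math

N_SAMPLES = 512     # samples in one cycle (assumed)
N_HARMONICS = 30    # how many harmonics to stack (assumed)


def one_cycle(phase_degrees):
    """Sum harmonics k = 1..N_HARMONICS at amplitude 1/k, all with the same starting phase."""
    phase = math.radians(phase_degrees)
    return [
        sum(
            math.sin(2 * math.pi * k * n / N_SAMPLES + phase) / k
            for k in range(1, N_HARMONICS + 1)
        )
        for n in range(N_SAMPLES)
    ]


saw_like = one_cycle(0)    # harmonics line up to form the ramp and sharp edge
shifted = one_cycle(90)    # same amplitudes, different phases: a spiky shape instead

peak_saw = max(abs(s) for s in saw_like)
peak_shifted = max(abs(s) for s in shifted)
```

An amplitude-only view sees the same harmonic levels in both waves, but the shapes are completely different: the shifted version piles all 30 harmonics up at the start of the cycle into a spike more than twice as tall as the saw’s peak.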

This was wild to me, and helped me understand why a Phase Vocoder captured the phase of each partial: it’s intrinsic to the way sound itself is produced.
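As a rough illustration of what one of those snapshots holds, here’s a sketch that analyzes a single frame at a single frequency and pulls out both amplitude and phase. The frame length, the test partial, and the function name are my own assumptions. A real phase vocoder does this for every frequency bin across overlapping, windowed frames, which this sketch leaves out.

```python
import cmath
import math


def analyze_bin(frame, k):
    """Return (amplitude, phase in radians) of the partial in frequency bin k."""
    n = len(frame)
    x = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
    magnitude, phase = cmath.polar(x)
    return 2 * magnitude / n, phase


N = 128
# Hypothetical partial: 5 cycles per frame, amplitude 0.8, starting 60 degrees into a cosine
frame = [0.8 * math.cos(2 * math.pi * 5 * t / N + math.pi / 3) for t in range(N)]

amplitude, phase = analyze_bin(frame, 5)
```

The analysis returns an amplitude of 0.8 and a phase of pi/3 radians (60°), which is exactly what went in. Keep only the amplitude and you lose the information needed to put the partial back in the right place.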

MODERN SPECTRAL MODELING

During the 70s and 80s, advancements were made in Sinusoidal Modeling, which breaks down complex sounds into individual sine wave components known as partials. Those partials could be reconstructed with additive synthesis techniques to create an approximation of the original sound, but the approach still struggled with the noisy and transient aspects of audio.

That was, until the Sines+Transients+Noise model (1988-2000) came along, developed by Xavier Rodet and others at IRCAM, a French institute dedicated to the research of music and sound, especially in the fields of avant-garde and electro-acoustical art music.

The Sines+Transients+Noise (STN) model is a way to represent audio signals by breaking them down into three main components:

Sines (sinusoids): These are the time-varying sinusoidal components that model the tonal or pitched parts of the sound, like the fundamental frequency and harmonics of a musical note.

Transients: These are short bursts or impulses that represent the attack portions of sounds, like the initial strike of a drum or pluck of a string. Transients capture the abrupt changes in the waveform.

Noise: This is the remaining broadband residual after the sinusoidal and transient components are extracted. It models the non-tonal, stochastic parts of the sound like breathiness or frication noise.

It provides an efficient way to decompose complex audio signals into deterministic (sines, transients) and stochastic (noise) components.

Sinusoids model the slowly evolving tonal parts, transients represent the sharp attacks, and noise captures the remaining non-tonal textures.

The separate components allow transformations like time-stretching the sines, preserving transients, and filtering the noise residual.

It enables high-quality resynthesis by recombining the modified sines, transients, and noise components.
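Here’s a heavily simplified sketch of the “sines plus leftover residual” part of that idea. It assumes a made-up test signal (one steady sine plus a little random noise), finds the strongest sinusoid with a naive DFT, subtracts it, and keeps what’s left as the noise layer. Transient detection is skipped entirely, and none of this is the actual IRCAM algorithm.

```python
import cmath
import math
import random

N = 256
rng = random.Random(0)  # fixed seed so the example is repeatable

# Hypothetical input: a tone with 10 cycles per frame plus low-level noise
tone = [math.sin(2 * math.pi * 10 * t / N) for t in range(N)]
noise = [rng.uniform(-0.1, 0.1) for _ in range(N)]
signal = [s + r for s, r in zip(tone, noise)]


def dft_bin(x, k):
    """Naive DFT of x evaluated at a single frequency bin k."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


# Analyze bins 1 .. N/2 - 1 and pick the strongest one as "the sine"
bins = [dft_bin(signal, k) for k in range(1, N // 2)]
peak_bin = max(range(1, N // 2), key=lambda k: abs(bins[k - 1]))
magnitude, phase = cmath.polar(bins[peak_bin - 1])
amplitude = 2 * magnitude / N

# Resynthesize the sinusoidal layer, then subtract it to get the residual
sine_part = [amplitude * math.cos(2 * math.pi * peak_bin * t / N + phase) for t in range(N)]
residual = [s - p for s, p in zip(signal, sine_part)]
```

Once the layers are separate you can treat them differently, for example stretching sine_part while filtering residual, and then add them back together.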

You can see some of this technology in wavetable synthesizers like the PPG Wave, Groove Synthesis 3rd Wave, and Knobula Pianophonic, where wavetables represent the morphing sinusoidal part, and PCM samples or a dedicated second wavetable represent the inharmonic transients.

So there it is: the basis for spectral synthesis. Scientists found that complex wave patterns could be broken down into smaller, simpler sine wave components, known as modes or partials, and by analyzing these separately and combining them we were able to recreate complex waveforms.

Later, other scientists focusing specifically on the audio aspect of this Fourier transform technique realized that sound was more complex than just sine waves smooshed together, and developed an addition to the concept that accounted for the transient and noise information inherent in complex real-world sounds.

The combination of all these techniques is where we find ourselves today, with software tools at our disposal that would make Fourier blush. Before we talk about X-Stream, here’s some other software that works with additive spectral resynthesis:

Discodsp Vertigo
Tone2 Icarus
Xoxos resyn
iZotope IRIS 2
Spectra Additive ReSynthesizer for Reason
Klevgrand Tomofon
Dawesome Novum
FL Studio Harmor
VirSyn CUBE 2
Native Instruments Form