From: Mike Raiford
Subject: More accurate spectrogram
Date: 20 May 2011 11:30:38
Message: <4dd6891e$1@news.povray.org>
So,

I was watching Mythbusters the other day, and they were testing movie 
sounds against the real thing. I noticed the sound engineer they hired 
to record and analyze the sounds had an unusual-looking spectrogram of 
the sound. It looked almost like a woven tapestry of frequencies, rather 
than the usual buckets of FFT information. I had seen it before when I 
was looking into FFT filtering and analysis.

Turns out, by looking at the /phase/ of the signal in addition to the 
frequency, and differencing each FFT bucket's phase against the 
previous sample's, you can find out where within that bucket the sound 
really lies. It only works well, of course, when there's a single sound 
at a time in the bucket; several frequencies stacked on top of each 
other in one bucket during a single sample interval mean it can't do 
the approximation, and the result is scattering. But in situations 
where the harmonics don't share a space, you can get a much more 
accurate picture of what the frequencies are, rather than everything 
being snapped to the power-of-2 FFT bin grid.

http://arxiv.org/PS_cache/arxiv/pdf/0903/0903.3080v1.pdf
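
For the curious, here's a rough sketch of that phase-difference trick 
in Python/numpy -- the "phase vocoder" flavour of it, not the full 
reassignment machinery from the paper. The tone frequency, FFT size and 
hop are made-up test values, so treat it as an illustration rather than 
anything definitive:

   import numpy as np

   fs = 44100          # sample rate (Hz)
   N  = 1024           # FFT size
   H  = 256            # hop between the two analysis frames (samples)
   f0 = 1234.5         # test tone, deliberately in between FFT buckets

   t = np.arange(N + H) / fs
   x = np.sin(2 * np.pi * f0 * t)

   win = np.hanning(N)
   X1 = np.fft.rfft(x[:N] * win)          # frame 1
   X2 = np.fft.rfft(x[H:H + N] * win)     # frame 2, H samples later

   k = np.argmax(np.abs(X2))              # loudest bucket
   expected = 2 * np.pi * k * H / N       # phase advance of a bucket-centred tone
   delta = np.angle(X2[k]) - np.angle(X1[k]) - expected
   delta = (delta + np.pi) % (2 * np.pi) - np.pi    # wrap into [-pi, pi)

   f_est = (k / N + delta / (2 * np.pi * H)) * fs
   print(f"bucket centre: {k * fs / N:8.2f} Hz")    # quantised to ~43 Hz steps
   print(f"phase-refined: {f_est:8.2f} Hz")         # close to 1234.5 Hz

With these numbers the plain FFT pins the tone to the nearest ~43 Hz 
bucket, while the phase-corrected estimate should land within a 
fraction of a hertz of the true 1234.5 Hz.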

In a similar (and somewhat amusing) vein, I discovered this way back as 
well:

http://www.cerlsoundgroup.org/Kelly/soundmorphing.html

-- 
~Mike



From: Invisible
Subject: Re: More accurate spectrogram
Date: 23 May 2011 06:53:52
Message: <4dda3cc0$1@news.povray.org>
On 20/05/2011 16:30, Mike Raiford wrote:

> Turns out, by looking at the /phase/ of the signal in addition to the
> frequency, and differencing each FFT bucket's phase against the
> previous sample's, you can find out where within that bucket the sound
> really lies. It only works well, of course, when there's a single sound
> at a time in the bucket; several frequencies stacked on top of each
> other in one bucket during a single sample interval mean it can't do
> the approximation, and the result is scattering. But in situations
> where the harmonics don't share a space, you can get a much more
> accurate picture of what the frequencies are, rather than everything
> being snapped to the power-of-2 FFT bin grid.

Wikipedia has this to say:

http://en.wikipedia.org/wiki/Reassignment_method

No, I still don't understand it.



From: Invisible
Subject: Re: More accurate spectrogram
Date: 23 May 2011 11:10:51
Message: <4dda78fb@news.povray.org>
On 20/05/2011 16:30, Mike Raiford wrote:

> Turns out, by looking at the /phase/ of the signal in addition to the
> frequency, and differencing each FFT bucket's phase against the
> previous sample's, you can find out where within that bucket the sound
> really lies. It only works well, of course, when there's a single sound
> at a time in the bucket; several frequencies stacked on top of each
> other in one bucket during a single sample interval mean it can't do
> the approximation, and the result is scattering. But in situations
> where the harmonics don't share a space, you can get a much more
> accurate picture of what the frequencies are, rather than everything
> being snapped to the power-of-2 FFT bin grid.

Fundamentally, almost all of DSP boils down to this:

   sin x + sin y == 2 cos((x - y)/2) sin((x + y)/2)

Substituting variables turns the same identity around into its 
product-to-sum form:

   sin x sin y == (1/2)(cos(x - y) - cos(x + y))

In summary, the sum of any two sine waves is equivalent to the product 
of two other sine waves. And so the product of any two sine waves is 
equivalent to the sum of two other sine waves. So every sum is also a 
product, and every product is also a sum.
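
A quick numerical sanity check of both identities, in Python/numpy (the 
inputs here are entirely arbitrary):

   import numpy as np

   x = np.linspace(0, 10, 1000)
   y = np.sqrt(2) * x + 0.7      # arbitrary second argument

   # sum-to-product: sin x + sin y == 2 cos((x-y)/2) sin((x+y)/2)
   lhs = np.sin(x) + np.sin(y)
   rhs = 2 * np.cos((x - y) / 2) * np.sin((x + y) / 2)
   print(np.allclose(lhs, rhs))  # True

   # product-to-sum: sin x sin y == (1/2)(cos(x-y) - cos(x+y))
   lhs = np.sin(x) * np.sin(y)
   rhs = (np.cos(x - y) - np.cos(x + y)) / 2
   print(np.allclose(lhs, rhs))  # True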

In particular, there is no difference between

   a 600 Hz wave with an amplitude that modulates at 0.1 Hz

and

   a 600.05 Hz wave plus a 599.95 Hz wave

Either description accurately describes the exact same signal. To say 
that one is "more accurate" than the other is rather baseless.
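
You can check that equivalence directly. A small Python/numpy sketch 
(the 44.1 kHz rate and ten-second duration are arbitrary choices):

   import numpy as np

   fs = 44100
   t = np.arange(10 * fs) / fs    # ten seconds of samples

   # description 1: a 600 Hz carrier whose amplitude beats at 0.1 Hz
   # (the 0.05 Hz cosine's *magnitude* peaks every 10 s, i.e. 0.1 Hz)
   am = 2 * np.cos(2 * np.pi * 0.05 * t) * np.sin(2 * np.pi * 600 * t)

   # description 2: two steady tones at 599.95 Hz and 600.05 Hz
   pair = np.sin(2 * np.pi * 599.95 * t) + np.sin(2 * np.pi * 600.05 * t)

   print(np.allclose(am, pair))   # True: sample-for-sample the same signal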

Consequently, I can take Queen's Bohemian Rhapsody and split it into 
10-sample chunks, perform an FFT on each chunk, and I now have a 
representation with a temporal resolution of 226.8 microseconds (at the 
CD sample rate of 44.1 kHz), but a piffling half-dozen frequency 
buckets. Or I can take the entire 5:55 opus and Fourier-transform the 
whole thing as one gigantic chunk, giving me fifteen million, six 
hundred and fifty-five thousand, five hundred frequency buckets, each 
with a width of 0.00282 Hz, but no indication of temporal location at 
all.

Both of these representations are completely accurate. You can perfectly 
reconstruct the original signal from either of them. Whether they are 
*useful* is another matter.
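
The "perfectly reconstruct" claim is easy to demonstrate. A Python/numpy 
sketch, using white noise as a stand-in for the actual recording (same 
44.1 kHz and 5:55 figures as above):

   import numpy as np

   fs = 44100
   x = np.random.randn(355 * fs)   # 5:55 of noise standing in for the song

   # extreme 1: 10-sample chunks -- fine time resolution, few buckets each
   spectra = np.fft.rfft(x.reshape(-1, 10), axis=1)
   x1 = np.fft.irfft(spectra, n=10, axis=1).ravel()
   print(np.allclose(x1, x))       # True

   # extreme 2: one gigantic FFT -- no time information at all
   X = np.fft.rfft(x)
   x2 = np.fft.irfft(X, n=len(x))
   print(np.allclose(x2, x))       # True
   print(fs / len(x))              # bucket width: ~0.00282 Hz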

Generally, if I analyse the frequency spectrum of a sound, what I want 
to know is HOW THE HUMAN AUDITORY SYSTEM WILL PERCEIVE IT. The human ear 
definitely does *not* have the ability to distinguish 15,000,000 unique 
frequencies. (Wikipedia claims the Organ of Corti is serviced by 
approximately 20,000 nerve endings.) And the best information I can 
find suggests a maximum nerve firing rate of something like 200 Hz, so 
226.8 microseconds is far finer than the temporal resolution of human 
hearing.

Basically, there is more than one way to split a signal into 
time-varying frequency amplitudes. The challenge is to figure out which 
way is the most useful.

Oh, and if you want to spectrally encode image or video data, that's a 
whole *other* box of frogs...


