So, I was watching Mythbusters the other day, and they were testing
movie sounds against the real thing. I noticed that the sound engineer
they hired to record and analyze the sounds had an unusual-looking
spectrogram of the sound. It looked almost like a woven tapestry of
frequencies, rather than the usual buckets of FFT information. I had
seen it before when I was looking at FFT filtering and analysis.

Turns out, by looking at the /phase/ of the signal in addition to the
frequency, and differentiating it against the previous sample and FFT
bucket, you can work out where within that bucket the sound should
actually lie. Of course, it only works well when there's a single
sound at a time in the bucket; when several frequencies are stacked on
top of each other in one bucket during a single sample interval, the
approximation breaks down and the result is scattering. But in
situations where the harmonics don't share a space, you get a much
more accurate picture of what the frequencies are (rather than
everything being snapped to the fixed power-of-two FFT bins).
http://arxiv.org/PS_cache/arxiv/pdf/0903/0903.3080v1.pdf
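Here's a rough sketch of the idea in Python (not the algorithm from the
paper above; the sample rate, frame length and hop size are just
arbitrary numbers picked for illustration). It refines a tone's
frequency estimate using the phase change of its FFT bucket between two
overlapping frames:

import numpy as np

fs = 8000          # sample rate (Hz), arbitrary
N = 1024           # FFT frame length
H = 256            # hop between the two frames
f_true = 443.7     # a tone that falls between bucket centres

t = np.arange(N + H) / fs
x = np.sin(2 * np.pi * f_true * t)

X1 = np.fft.rfft(x[:N] * np.hanning(N))
X2 = np.fft.rfft(x[H:H + N] * np.hanning(N))

k = np.argmax(np.abs(X2))          # the bucket the tone lands in
bucket_freq = k * fs / N           # that bucket's centre frequency

# Observed phase advance between the frames, minus the advance the
# bucket centre itself would produce, wrapped into (-pi, pi].
dphi = np.angle(X2[k]) - np.angle(X1[k]) - 2 * np.pi * k * H / N
dphi = (dphi + np.pi) % (2 * np.pi) - np.pi

f_est = bucket_freq + dphi * fs / (2 * np.pi * H)
print(bucket_freq, f_est)   # bucket centre ~445.3 Hz, estimate ~443.7 Hz

As soon as two tones land in the same bucket, the phase no longer
belongs to either of them and the estimate falls apart, which is the
scattering mentioned above.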
In a similar (and somewhat amusing) vein, I discovered this way back
as well:
http://www.cerlsoundgroup.org/Kelly/soundmorphing.html
--
~Mike
On 20/05/2011 16:30, Mike Raiford wrote:
> Turns out, by looking at the /phase/ of the signal in addition to the
> frequency, and differentiating it against the previous sample and FFT
> bucket, you can work out where within that bucket the sound should
> actually lie. Of course, it only works well when there's a single
> sound at a time in the bucket; when several frequencies are stacked on
> top of each other in one bucket during a single sample interval, the
> approximation breaks down and the result is scattering. But in
> situations where the harmonics don't share a space, you get a much
> more accurate picture of what the frequencies are (rather than
> everything being snapped to the fixed power-of-two FFT bins).
Wikipedia has this to say:
http://en.wikipedia.org/wiki/Reassignment_method
No, I still don't understand it.
On 20/05/2011 16:30, Mike Raiford wrote:
> Turns out, by looking at the /phase/ of the signal in addition to the
> frequency, and differentiating it against the previous sample and FFT
> bucket, you can work out where within that bucket the sound should
> actually lie. Of course, it only works well when there's a single
> sound at a time in the bucket; when several frequencies are stacked on
> top of each other in one bucket during a single sample interval, the
> approximation breaks down and the result is scattering. But in
> situations where the harmonics don't share a space, you get a much
> more accurate picture of what the frequencies are (rather than
> everything being snapped to the fixed power-of-two FFT bins).
Fundamentally, almost all of DSP boils down to this:

  sin x + sin y == 2 sin((x + y)/2) cos((x - y)/2)

A closely related product-to-sum identity is:

  sin x sin y == 1/2 (cos(x - y) - cos(x + y))

In summary, the sum of any two equal-amplitude sine waves is equivalent
to the product of two other sinusoids, and the product of any two sine
waves is equivalent to the sum of two others. So every sum is also a
product, and every product is also a sum.
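If you want to see where those two identities come from, the derivation
from the angle-addition formulas takes only a few lines (standard
trigonometry, spelled out here in LaTeX):

\begin{align*}
\sin(u+v) &= \sin u \cos v + \cos u \sin v \\
\sin(u-v) &= \sin u \cos v - \cos u \sin v \\
\sin(u+v) + \sin(u-v) &= 2 \sin u \cos v
\end{align*}

Substituting u = (x + y)/2 and v = (x - y)/2 into the last line gives
the sum-to-product form above. Likewise, subtracting
cos(u + v) = cos u cos v - sin u sin v from
cos(u - v) = cos u cos v + sin u sin v gives
2 sin u sin v = cos(u - v) - cos(u + v), the product-to-sum form.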
In particular, there is no difference between
a 600 Hz wave with an amplitude that modulates at 0.1 Hz
and
a 600.05 Hz wave plus a 599.95 Hz wave
Either description accurately describes the exact same signal. To say
that one is "more accurate" than the other is rather baseless.
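A quick numerical check of that claim (the sample rate and duration
below are arbitrary choices of mine, purely for illustration):

import numpy as np

fs = 4000                               # sample rate (Hz), arbitrary
t = np.arange(0, 20, 1 / fs)            # 20 seconds of signal

# One description: a 600 Hz sine whose amplitude swells and fades
# (the 0.05 Hz cosine envelope peaks twice per cycle, i.e. 0.1 Hz beats).
beating = 2 * np.cos(2 * np.pi * 0.05 * t) * np.sin(2 * np.pi * 600 * t)

# The other description: two steady sines at 600.05 Hz and 599.95 Hz.
two_tones = np.sin(2 * np.pi * 600.05 * t) + np.sin(2 * np.pi * 599.95 * t)

print(np.max(np.abs(beating - two_tones)))  # ~1e-11: same signal, bar rounding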
Consequently, I can take Queen's Bohemian Rhapsody (at CD's 44.1 kHz
sampling rate) and split it into 10-sample chunks, perform an FFT on
each chunk, and I now have a representation that gives a temporal
resolution of 226.8 microseconds, but a piffling 5-bucket spectral
representation. Or I can take the entire 5:55 opus and
Fourier-transform the whole thing as one gigantic chunk, giving me
fifteen million, six hundred and fifty-five thousand, five hundred
frequency buckets, each 0.00282 Hz wide, but no indication of temporal
location at all.
Both of these representations are completely accurate. You can perfectly
reconstruct the original signal from either of them. Whether they are
*useful* is another matter.
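The arithmetic behind those two extremes, as a quick sketch (assuming
CD audio; the figures are back-of-the-envelope, not measured from the
actual track):

fs = 44100                        # CD sample rate, Hz
track_samples = 355 * fs          # 5 minutes 55 seconds = 15,655,500 samples

for chunk in (10, track_samples):
    print(f"{chunk:>10}-sample chunks: "
          f"time resolution {chunk / fs:.6f} s, "
          f"bucket width {fs / chunk:.5f} Hz")

# Prints roughly:
#         10-sample chunks: time resolution 0.000227 s, bucket width 4410 Hz
#   15655500-sample chunks: time resolution 355 s, bucket width 0.00282 Hz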
Generally, if I analyse the frequency spectrum of a sound, what I want
to know is HOW THE HUMAN AUDITORY SYSTEM WILL PERCEIVE IT. The human ear
definitely does *not* have the ability to distinguish 15,000,000 unique
frequencies. (Wikipedia claims the Organ of Corti is serviced by
approximately 20,000 nerve endings.) And the best information I can find
suggests that the maximum nerve firing rate is something like 200 Hz or
so, meaning 226.8-microsecond frames are far finer than the temporal
resolution of human hearing.
Basically, there is more than one way to split a signal into
time-varying frequency amplitudes. The challenge is to figure out which
way is the most useful.
Oh, and if you want to spectrally encode image or video data, that's a
whole *other* box of frogs...