Part of MP3 encoding involves a DFT. Is this traditionally done with an
FFT or with a direct DFT? And if a direct DFT, why? From what I gather,
an FFT can be a bit more accurate, since fewer calculations means fewer
rounding errors.
No. I'm not going to attempt to write an MP3 encoder. I'm just
exercising my curiosity.
--
~Mike
Mike Raiford wrote:
> Part of MP3 encoding involves a DFT. Traditionally is this done with an
> FFT, or a DFT? and if DFT, why? From what I gather FFT can be a bit more
> accurate due to fewer rounding errors incurred from fewer calculations.
>
> No. I'm not going to attempt to write an MP3 encoder. I'm just
> exercising my curiosity.
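- An FFT *is* a DFT, or more precisely, one fast algorithm for
computing the DFT.
- In theory, the direct correlation algorithm (correlating the signal
against each frequency's sinusoid, O(N^2) operations) and the FFT
algorithm (O(N log N) operations) produce exactly the same numbers. In
practice, the FFT is obviously faster, and it usually accumulates less
rounding error too.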
- You'll find lots of codecs use a Discrete Cosine Transform (DCT)
instead of a Fourier transform. This is a slight modification of the
algorithm that uses only cosine waves. (Not sine waves, because then
you'd have no way to record any DC offset!) There are several slightly
different variants of the DCT. I forget exactly what its advantage is...
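- MP3 doesn't actually use a plain DFT/FFT to code the signal. Layer III
uses a hybrid filterbank: a 32-band polyphase filterbank, which works
*like* a Fourier transform but faster, followed by a modified DCT (MDCT)
within each subband. (The polyphase stage is also less accurate. If you
transform, then immediately transform back, you get about 3% distortion
before you've even "done" anything to the signal. With Fourier, you
theoretically get 0% distortion.) An FFT does typically turn up in the
encoder's psychoacoustic model, which analyses the signal to decide what
matters.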
- MP3 (and most audio codecs) works like this:
  - Chop the audio into "windows".
  - Transform each window into the frequency domain using some
    Fourier-like technique. (Typically not the vanilla DFT/FFT.)
  - Measure the frequency components, and figure out which ones are the
    "most important".
  - Record the strength of each frequency component using a number of
    bits proportional to how "important" it is. In other words,
    unimportant frequencies get recorded very inaccurately, but important
    ones get high accuracy.
  - Use ordinary lossless compression (Huffman coding, in MP3's case) to
    shoehorn all this data into the smallest possible space.
- Notice that "unimportant" frequencies aren't filtered out. They're
still there, still at approximately the same loudness. It's just that
the loudness isn't recorded very accurately. (This means that the
overall "balance" of the music isn't screwed up too badly, even if the
detail is gone.)
- If you listen to heavily compressed MP3 audio, you'll hear the
loudness of different parts (especially the treble) sharply rising and
falling, rather than changing smoothly. You're hearing the quantisation
steps in the coder.
- Vorbis uses a version of the DCT, the modified DCT (MDCT), where each
chunk of data overlaps the next one by half. This reduces the "edge
effects" where one window ends and the next begins.
- Wavelets would also seem very applicable here, but there's a lot of
patent FUD surrounding that technology.
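On the point that the direct correlation DFT and the FFT give the same numbers: here is a minimal Python sketch comparing the two. The function names `dft` and `fft` and the sample signal are illustrative; the FFT is a textbook recursive radix-2 Cooley-Tukey version that assumes a power-of-two length, not what a real encoder would ship (those use heavily optimised library FFTs).

```python
import cmath


def dft(x):
    """Direct 'correlation' DFT: correlate x with each complex sinusoid. O(N^2)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT. O(N log N); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of the even-indexed samples
    odd = fft(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


signal = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.1, -0.6]
worst = max(abs(a - b) for a, b in zip(dft(signal), fft(signal)))
print(f"largest difference between DFT and FFT output: {worst:.2e}")
```

The printed difference is down at floating-point rounding level: same transform, different amount of work.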
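On the DCT point, and why a cosine-only basis can hold a DC offset while a sine-only one can't: a minimal sketch of the unnormalised DCT-II, which is one of the "several variants" and the one usually meant by "the DCT". `dct2` and `sine_row` are illustrative names, not from any particular library.

```python
import math


def dct2(x):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    size = len(x)
    return [
        sum(x[n] * math.cos(math.pi / size * (n + 0.5) * k) for n in range(size))
        for k in range(size)
    ]


def sine_row(k, size):
    """The matching basis row if cos were swapped for sin (for comparison only)."""
    return [math.sin(math.pi / size * (n + 0.5) * k) for n in range(size)]


dc = [0.5] * 8                           # a signal that is nothing but a DC offset
coeffs = dct2(dc)
print(coeffs[0])                         # 4.0: the k = 0 cosine is constant 1
print(max(abs(c) for c in coeffs[1:]))   # ~0: no energy in the other cosines
print(sine_row(0, 8))                    # all zeros: no sine row can hold DC
```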
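The list of steps for how MP3 (and most audio codecs) works can be sketched as a toy codec. Several things here are assumptions made purely for illustration: "importance" is just coefficient magnitude (a real MP3 encoder uses a psychoacoustic model instead), the power-of-two scale in `code_coeff` is a crude stand-in for per-band scale factors, the windows don't overlap, and the lossless-packing step is left out. All function names are hypothetical.

```python
import math


def dct2(x):
    size = len(x)
    return [sum(x[n] * math.cos(math.pi / size * (n + 0.5) * k) for n in range(size))
            for k in range(size)]


def idct2(c):
    """Inverse of dct2 (a scaled DCT-III)."""
    size = len(c)
    return [2 / size * (c[0] / 2 + sum(c[k] * math.cos(math.pi / size * (n + 0.5) * k)
                                       for k in range(1, size)))
            for n in range(size)]


def code_coeff(c, bits):
    """Toy quantiser: a power-of-two scale keeps the loudness roughly right,
    and `bits` decides how finely the value is recorded within that scale."""
    if c == 0:
        return 0.0
    scale = 2.0 ** math.ceil(math.log2(abs(c)))  # smallest power of two >= |c|
    levels = 2 ** bits
    step = 2.0 / levels                           # mid-rise grid over [-1, 1]
    index = math.floor(c / scale / step)
    index = max(-levels // 2, min(levels // 2 - 1, index))
    return (index + 0.5) * step * scale


def toy_codec(signal, size=8, bits_per_window=32):
    out = []
    for start in range(0, len(signal), size):
        window = signal[start:start + size]              # 1. chop into windows
        coeffs = dct2(window)                            # 2. to the frequency domain
        total = sum(abs(c) for c in coeffs) or 1.0       # 3. "importance" = magnitude
        bits = [max(1, round(bits_per_window * abs(c) / total)) for c in coeffs]
        coded = [code_coeff(c, b) for c, b in zip(coeffs, bits)]  # 4. quantise
        # 5. lossless packing (e.g. Huffman coding) omitted
        out.extend(idct2(coded))                         # decoder side
    return out


signal = [math.sin(2 * math.pi * 3 * t / 32) + 0.1 * math.sin(2 * math.pi * 11 * t / 32)
          for t in range(32)]
decoded = toy_codec(signal)
print(max(abs(a - b) for a, b in zip(signal, decoded)))
```

With one bit, `code_coeff(3.0, 1)` comes back as 2.0, while eight bits gives 3.015625. Both are roughly the right loudness and only the precision differs, which is the "not filtered out, just recorded inaccurately" behaviour described above.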
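On hearing the quantisation steps: a sketch that passes a smooth fade in one frequency component's loudness through a coarse uniform quantiser. The smooth ramp comes out as a staircase of a few discrete levels, which is the "sharply rising and falling" effect described above. The uniform quantiser and the 0.25 step are illustrative assumptions; MP3's actual quantiser is non-uniform.

```python
def quantise(value, step):
    """Uniform mid-tread quantiser: snap value to the nearest multiple of step."""
    return round(value / step) * step


# A smooth fade-out of one frequency component's loudness.
fade = [1.0 - i / 20 for i in range(21)]

# A step of 0.25 is what a 3-bit uniform quantiser over [-1, 1] gives (2 / 2**3).
coarse = [quantise(a, 0.25) for a in fade]

for original, coded in zip(fade, coarse):
    print(f"{original:4.2f} -> {coded:4.2f}  {'#' * round(coded * 20)}")
```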
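On the overlapping DCT: the transform described above for Vorbis is the modified DCT (MDCT), and MP3 Layer III applies an MDCT within each polyphase subband too. Below is a minimal sketch of an MDCT/IMDCT pair with 50% overlap, showing time-domain aliasing cancellation: each block's inverse on its own contains aliasing, but overlap-adding neighbouring blocks recovers the signal. The sine window, the direct O(N^2) loops, the zero padding and all names are assumptions for illustration; Vorbis specifies its own window shape, and real codecs use fast algorithms.

```python
import math


def mdct(block):
    """MDCT: 2N input samples -> N coefficients (direct O(N^2) form)."""
    two_n = len(block)
    n = two_n // 2
    return [
        sum(block[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(two_n))
        for k in range(n)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N samples that still contain time-domain
    aliasing. Scaled by 2/N so that, with a Princen-Bradley window applied on both
    sides, overlap-adding neighbouring blocks cancels the aliasing exactly."""
    n = len(coeffs)
    return [
        2 / n * sum(coeffs[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                    for k in range(n))
        for i in range(2 * n)
    ]


def sine_window(two_n):
    """Sine window; satisfies w[i]^2 + w[i + N]^2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (i + 0.5) / two_n) for i in range(two_n)]


def mdct_round_trip(signal, n):
    """Code `signal` with 50%-overlapping blocks of 2N samples (hop N, N even), then
    decode by windowed overlap-add. Zero padding makes every sample fall in two blocks."""
    win = sine_window(2 * n)
    padded = [0.0] * n + list(signal) + [0.0] * (2 * n)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n + 1, n):
        block = [padded[start + i] * win[i] for i in range(2 * n)]
        rebuilt = imdct(mdct(block))
        for i in range(2 * n):
            out[start + i] += rebuilt[i] * win[i]
    return out[n:n + len(signal)]


signal = [math.sin(0.3 * t) + 0.2 * math.cos(1.7 * t) for t in range(20)]
restored = mdct_round_trip(signal, 4)
print(max(abs(a - b) for a, b in zip(signal, restored)))  # rounding error only
```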
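On wavelets: the simplest example is the Haar wavelet. Here is a minimal sketch of one level of the orthonormal Haar transform and its inverse. Its detail coefficients are localised in time, so a sudden jump shows up only in the coefficient covering it. This only illustrates the idea; it is not the transform any particular codec uses, and the names are hypothetical.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)


def haar_step(x):
    """One level of the orthonormal Haar wavelet transform (len(x) must be even).
    Returns (approximation, detail): scaled pairwise sums and differences."""
    approx = [(x[i] + x[i + 1]) * SQRT_HALF for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * SQRT_HALF for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Undo haar_step exactly (up to floating-point rounding)."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
    return out


signal = [1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0]  # a sudden jump (e.g. a drum hit)
approx, detail = haar_step(signal)
print(detail)  # non-zero only for the pair that straddles the jump
restored = haar_inverse_step(approx, detail)
print(max(abs(a - b) for a, b in zip(signal, restored)))
```

Applying `haar_step` again to the approximation coefficients gives the usual multi-level decomposition.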