  Re: MP3, so then ...  
From: Invisible
Date: 19 Jan 2009 10:28:29
Message: <49749c1d$1@news.povray.org>
Mike Raiford wrote:
> Part of MP3 encoding involves a DFT. Traditionally is this done with an 
> FFT, or a DFT? and if DFT, why? From what I gather FFT can be a bit more 
> accurate due to fewer rounding errors incurred from fewer calculations.
> 
> No. I'm not going to attempt to write an MP3 encoder. I'm just 
> exercising my curiosity.

- An FFT *is* (one implementation of) the DFT.

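- Theoretically, both the direct "correlation" algorithm (multiply the 
signal by each sinusoid and sum) and the FFT algorithm produce exactly 
the same numbers. In practice, the FFT is much faster (on the order of 
N log N operations instead of N² for N samples) and usually accumulates 
less rounding error.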
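A minimal sketch of that point in Python (standard library only; the function names are my own): a direct O(N²) DFT that correlates the signal with each complex sinusoid, next to a textbook recursive radix-2 Cooley-Tukey FFT. Both give the same numbers up to floating-point rounding. It assumes power-of-two lengths and makes no attempt to measure the rounding-error difference.

```python
import cmath
import math


def dft(x):
    """Direct DFT: correlate x with each complex sinusoid. O(N^2) operations."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT. O(N log N); len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # DFT of the even-indexed samples
    odd = fft(x[1::2])    # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


if __name__ == "__main__":
    signal = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.0, 0.3]
    diff = max(abs(a - b) for a, b in zip(dft(signal), fft(signal)))
    print("largest difference between DFT and FFT outputs:", diff)
```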

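- Lots of codecs use a Discrete Cosine Transform (DCT) instead of a 
Fourier transform. This is a slight modification of the algorithm that 
uses only cosine waves. (Not sine waves: no sine wave has a DC 
component, so a constant offset would get smeared across many 
coefficients instead of landing in just one.) There are several slightly 
different variants of DCT. The usual explanation of its advantage is 
that its coefficients are real numbers and, because it implicitly 
mirrors each window at its edges, it avoids the artificial jump a DFT 
sees between the end of a window and its start, so the signal's energy 
packs into fewer coefficients.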
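To make the DC point concrete, here's an orthonormal DCT-II and its inverse as a direct O(N²) sketch (the function names are hypothetical; real codecs use fast algorithms and often a different DCT variant). A constant input lands entirely in coefficient 0, the cosine of frequency zero, and transforming back recovers the input.

```python
import math


def dct2(x):
    """Orthonormal DCT-II: X[k] = s(k) * sum_t x[t] * cos(pi * (t + 0.5) * k / N)."""
    n = len(x)
    return [_scale(k, n) * sum(v * math.cos(math.pi * (t + 0.5) * k / n)
                               for t, v in enumerate(x))
            for k in range(n)]


def idct2(coeffs):
    """Inverse of the orthonormal DCT-II (i.e. an orthonormal DCT-III)."""
    n = len(coeffs)
    return [sum(_scale(k, n) * c * math.cos(math.pi * (t + 0.5) * k / n)
                for k, c in enumerate(coeffs))
            for t in range(n)]


def _scale(k, n):
    """Normalization that makes the transform matrix orthonormal."""
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


if __name__ == "__main__":
    # A constant (pure DC) signal: everything lands in coefficient 0.
    print([round(c, 6) for c in dct2([3.0] * 8)])
```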

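- MP3 doesn't actually code the audio with a plain DFT/FFT. Layer III 
uses a "hybrid" filterbank: a 32-band polyphase filterbank, which works 
*like* a Fourier transform but faster, followed by an MDCT (a DCT 
variant) applied to each band's output. The polyphase stage is also less 
accurate: it isn't a perfect-reconstruction transform, so if you 
transform, then immediately transform back, you get a little distortion 
before you've even "done" anything to the signal (the filters are 
designed to keep that error inaudible). With Fourier, you theoretically 
get 0% distortion. (An FFT typically does appear in the encoder's 
psychoacoustic model, the part that decides what matters.)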

- MP3 (and most audio codecs) work like this:
   - Chop the audio into "windows".
   - Transform each window into the frequency domain using some 
Fourier-like technique. (Typically not the vanilla DFT/FFT.)
   - Measure the frequency components, and figure out which ones are the 
"most important".
   - Record the strength of each frequency component using a number of 
bits proportional to how "important" it is. In other words, unimportant 
frequencies get recorded very inaccurately, but important ones have 
high accuracy.
   - Use normal lossless compression (Huffman coding, in MP3's case) to 
shoehorn all this data into the smallest possible space.
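Here's a deliberately toy sketch of those steps in Python. Everything in it is an assumption for illustration: the frame length, the bit budget, the orthonormal DCT-II standing in for the transform, "bits proportional to coefficient magnitude" standing in for a real psychoacoustic model, and zlib standing in for the lossless stage. It only encodes; `dequantize` shows what a decoder would recover for each coefficient.

```python
import math
import struct
import zlib

# Hypothetical toy parameters, chosen for illustration only.
FRAME_LEN = 32        # samples per window
BITS_PER_FRAME = 96   # bit budget shared by the coefficients of one window
MAX_BITS = 16         # cap so quantized values fit in a 32-bit int


def dct2(x):
    """Orthonormal DCT-II (direct O(N^2) sum), standing in for the transform."""
    n = len(x)
    return [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * sum(v * math.cos(math.pi * (t + 0.5) * k / n) for t, v in enumerate(x))
            for k in range(n)]


def allocate_bits(coeffs, budget=BITS_PER_FRAME):
    """Toy 'importance' rule: bits proportional to each coefficient's magnitude."""
    total = sum(abs(c) for c in coeffs) or 1.0
    return [min(MAX_BITS, round(budget * abs(c) / total)) for c in coeffs]


def encode_frame(frame):
    """Transform one window, allocate bits, and quantize.

    The step for coefficient k is peak / 2**bits[k], so coefficients that
    receive more bits are stored with a finer step (more accurately)."""
    coeffs = dct2(frame)
    peak = max(abs(c) for c in coeffs) or 1.0
    bits = allocate_bits(coeffs)
    q = [round(c * 2 ** b / peak) for c, b in zip(coeffs, bits)]
    return peak, bits, q


def dequantize(peak, bits, q):
    """What a decoder would recover for each coefficient."""
    return [v * peak / 2 ** b for v, b in zip(q, bits)]


def encode(signal):
    """Chop into windows, encode each one, then pack losslessly with zlib."""
    padded = list(signal) + [0.0] * (-len(signal) % FRAME_LEN)
    raw = bytearray()
    for i in range(0, len(padded), FRAME_LEN):
        peak, bits, q = encode_frame(padded[i:i + FRAME_LEN])
        raw += struct.pack("<d", peak)         # per-window scale
        raw += bytes(bits)                     # bit allocation (0..MAX_BITS)
        raw += struct.pack(f"<{len(q)}i", *q)  # quantized coefficients
    return zlib.compress(bytes(raw), 9)


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * 5 * t / FRAME_LEN) for t in range(4 * FRAME_LEN)]
    payload = encode(tone)
    print(len(tone) * 8, "bytes as doubles ->", len(payload), "bytes encoded")
```

With this naive rule, coefficients that get 0 bits mostly round to zero; real MP3 encoders instead use per-band scale factors, so the step size follows each band's own level.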

- Notice that "unimportant" frequencies aren't filtered out. They're 
still there, still at approximately the same loudness. It's just that 
the loudness isn't recorded very accurately. (This means that the 
overall "balance" of the music isn't screwed up too badly, even if the 
detail is gone.) At very low bitrates, though, many of the quietest 
components do end up rounded all the way to zero.
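One way to picture this, as a hypothetical sketch rather than MP3's actual quantizer (which uses per-band scale factors and a non-uniform power-law quantizer): round each component's level to a fixed number of decibels, using a fine step for important components and a coarse step for unimportant ones. Either way, the component survives at roughly its original loudness.

```python
import math


def quantize_db(amplitude, step_db):
    """Round a component's level to the nearest multiple of step_db decibels.

    Hypothetical log-domain quantizer (not MP3's actual scheme): a coarse
    step records the level imprecisely, but a nonzero input never becomes
    zero, so the component stays in the mix at roughly its original level."""
    if amplitude == 0:
        return 0.0
    level_db = 20 * math.log10(abs(amplitude))
    q_db = step_db * round(level_db / step_db)
    return math.copysign(10 ** (q_db / 20), amplitude)


if __name__ == "__main__":
    # The same quiet component stored with a fine step and with a coarse step.
    for step in (0.5, 6.0):
        print(f"{step:>4} dB step: 0.037 -> {quantize_db(0.037, step):.4f}")
```

The level error is at most half a step (in dB), so the coarse setting can be off by up to 3 dB but never silences the component.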

- If you listen to heavily MP3-compressed audio, you'll hear the 
loudness of different parts (especially the treble) sharply rising and 
falling, rather than smoothly changing. You're hearing the 
quantisation steps in the coder.
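A tiny illustration of that staircase, with made-up numbers: a treble loudness that fades smoothly over a dozen windows, stored with a coarse uniform step (the 0.25 step and the helper names are arbitrary, not anything MP3 specifies). The smooth fade comes out as a few flat runs with sudden jumps between them.

```python
def quantize(value, step):
    """Uniform quantizer: snap value to the nearest multiple of step."""
    return step * round(value / step)


def quantized_envelope(envelope, step):
    """Quantize a per-window loudness envelope with a fixed (coarse) step."""
    return [quantize(v, step) for v in envelope]


if __name__ == "__main__":
    # Hypothetical treble loudness fading smoothly over 12 windows.
    smooth = [1.0 - 0.05 * i for i in range(12)]
    stepped = quantized_envelope(smooth, 0.25)
    for s, q in zip(smooth, stepped):
        print(f"{s:.2f} -> {q:.2f}")
```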

- Vorbis uses the Modified DCT (MDCT), a version of the DCT where each 
window overlaps the next one by half. This reduces the "edge effects" 
where one window ends and the next begins.
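A sketch of the overlap idea using the MDCT (which MP3's hybrid filterbank also uses): each block is 2N samples, consecutive blocks overlap by N, and with a window satisfying w[t]² + w[t+N]² = 1 the aliasing each block introduces cancels when the inverse transforms are overlap-added (time-domain aliasing cancellation). Assumptions: direct O(N²) sums, a plain sine window (Vorbis actually specifies a different power-complementary window), an inverse scaled by 2/N to suit that window, and made-up function names.

```python
import math


def mdct(block):
    """MDCT: 2N (already windowed) samples -> N coefficients, direct O(N^2) sum."""
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n))
            for k in range(n)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.

    Scaled by 2/N so that sine-windowed overlap-add reconstructs exactly."""
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for t in range(2 * n)]


def sine_window(two_n):
    """Sine window; satisfies w[t]**2 + w[t + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (t + 0.5) / two_n) for t in range(two_n)]


def analyze_synthesize(signal, n):
    """Window 2N-sample blocks with hop N (50% overlap), MDCT each one, then
    inverse-transform, window again, and overlap-add. Returns the
    reconstruction, same length as signal."""
    w = sine_window(2 * n)
    # Zero-pad both ends so every real sample is covered by two blocks.
    padded = [0.0] * n + list(signal) + [0.0] * (n + (-len(signal) % n))
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n, n):
        block = [w[t] * padded[start + t] for t in range(2 * n)]
        for t, v in enumerate(imdct(mdct(block))):
            out[start + t] += w[t] * v
    return out[n:n + len(signal)]


if __name__ == "__main__":
    sig = [math.sin(0.3 * t) for t in range(40)]
    rec = analyze_synthesize(sig, 8)
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(sig, rec)))
```

Each block on its own comes back with aliasing folded into it; only the sum of two overlapping blocks is clean, which is why there's no hard seam at the window boundaries.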

- Wavelets would also seem very applicable here, but there's a lot of 
patent FUD surrounding that technology.



Copyright 2003-2023 Persistence of Vision Raytracer Pty. Ltd.