>> The problem is when you apply a convolution to an image, you get a bigger
>> image (the size of the original image + the size of the convolution
>> window). The extra data around the edges is needed in order to do the
>> deconvolution. Obviously in a photo you don't get this information.
>
> This merely means that you can't deconvolve the edges properly.
I thought about this some more and don't agree with you. Take this 1D
example:
Convolution kernel: {1,2,1}
Source image: {a,b,c,d,e,f,g,h}
Result: {12,12,12,12,12,12,12,12}
Can you figure out *any* of the pixels in the source image?
If you write out the convolution by hand, the first pixel gives you one
equation with three unknowns - obviously you can't solve this. Each
additional pixel gives you one more equation and one more unknown. You can
never solve this!
In the mathematical world though, your source image is actually:
{0,0,...,0,0,a,b,c,d,e,f,g,h,0,0,...,0,0}
and when you apply your convolution you get edge data too.
With this extra information you can solve the deconvolution (because you
know the source pixels are zero outside of the image).
IRL the source data is not zero outside of your photo.
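A quick Octave/MATLAB check of this counting argument (my own sketch, not
part of the original post; it assumes the same {1,2,1} kernel and a
length-8 image):

n = 8;                      % image length
k = [1 2 1];                % kernel
% Same-size output, unknown samples beyond each edge: 8 equations, 10 unknowns
A = zeros(n, n + 2);
for i = 1:n
  A(i, i:i+2) = k;          % output i = x(i-1) + 2*x(i) + x(i+1)
end
fprintf('no padding: %d equations, %d unknowns, rank %d\n', n, n+2, rank(A));
% Full output of the zero-padded source: 10 equations, 8 unknowns
B = zeros(n + 2, n);
for i = 1:n
  B(i:i+2, i) = k';         % column i spreads source pixel i into the output
end
fprintf('zero padding: %d equations, %d unknowns, rank %d\n', n+2, n, rank(B));

The first system has rank 8 with 10 unknowns (underdetermined), the second
has rank 8 with 8 unknowns, so only the zero-padded version pins down a
unique source.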
Invisible wrote:
> Echo, focal blur and motion blur are all instances of convolution. That
> means, hypothetically, that by applying a suitable *deconvolution*, you
> should be able to get back the original signal.
>
> Question: Is it actually feasible to do this in the real world?
Without artifacting? No.
Now you could make a huge matrix (N x N, where N is the number of
elements in the image to be sharpened) and solve it. But for an image
of, say, typical embedded YouTube size, you are looking at a matrix that
is on the order of 80K x 80K elements. Granted, most of the elements
are zero, but it's still a headache.
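A tiny 1-D sketch of that direct approach (my own illustration, not
John's; it assumes a {1,2,1} blur and zeros beyond the signal's ends),
using a sparse matrix so the mostly-zero entries cost nothing:

n = 1000;                                % signal length (an image would be ~80K)
k = [1 2 1];                             % blur kernel
A = spdiags(ones(n,1) * k, -1:1, n, n);  % sparse tridiagonal blur matrix
x = rand(n, 1);                          % "true" signal
y = A * x;                               % blurred signal
x_rec = A \ y;                           % direct sparse solve
fprintf('max reconstruction error: %g\n', max(abs(x_rec - x)));

At this size the solve is instant; the headache is that for a real 2-D
image the matrix is vastly larger (if still banded and sparse).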
Regards,
John
John VanSickle wrote:
>
> Now you could make a huge matrix (N x N, where N is the number of
> elements in the image to be sharpened) and solve it. But for an image
> of, say, typical embedded YouTube size, you are looking an matrix that
> is on the order of 80K x 80k elements. Granted, most of the elements
> are zero, but it's still a headache.
In the particular case of deconvolution there are much more efficient
algorithms available. There are various techniques, but one insight that
is commonly exploited is that convolution in the spatial domain is
equivalent to multiplication in the Fourier domain. Thus, in the absence
of noise (which is a *big* assumption) and ignoring image boundaries, you
could compute a (non-blind) deconvolution by using the FFT on the image
and the kernel and dividing, which is a pretty efficient operation.
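As a sanity check of that equivalence (convolution in space equals
multiplication in frequency), here is a tiny Octave/MATLAB illustration
(my own, for the cyclic case):

x = rand(1, 8);                          % random 1-D "image"
k2 = [1 2 1 0 0 0 0 0];                  % kernel padded to the image size
y1 = real(ifft(fft(x) .* fft(k2)));      % multiply in the Fourier domain
y2 = zeros(1, 8);
for s = 0:7                              % cyclic convolution by definition
  y2 = y2 + x(s+1) * circshift(k2, [0 s]);
end
fprintf('max difference: %g\n', max(abs(y1 - y2)));  % ~1e-16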
IIRC that naive divide-in-frequency-space approach isn't used much in
practice, largely because of the noise assumption, but some of the more
sophisticated methods still perform operations in frequency space for
efficiency.
> Thus, in the absence of noise (which is a *big* assumption) and ignoring
> image boundaries, you could compute a (non-blind) deconvolution by using
> the FFT on the image and the kernel and dividing, which is a pretty
> efficient operation.
The problem is that the convolved version you have (the "photo") is not made
of complex numbers (which it would be if you did the convolution yourself).
You need these in order to do the deconvolution, otherwise you don't have
enough information. See my 1D example in the other post.
scott wrote:
>> Thus, in the absence of noise (which is a *big* assumption) and
>> ignoring image boundaries, you could compute a (non-blind) deconvolution
>> by using the FFT on the image and the kernel and dividing, which is a
>> pretty efficient operation.
>
> The problem is that the convolved version you have (the "photo") is not
> made of complex numbers (which it would be if you did the convolution
> yourself). You need these in order to do the deconvolution, otherwise you
> don't have enough information. See my 1D example in the other post.
I'm a bit confused by what you mean here. If both the image and the
kernel are real then complex numbers would only be used in the
frequency-space versions of the images; once you apply the IFFT, everything
should be real again. I don't see how you'd need a (non-trivial) complex
version of the image anywhere.
Your 1D example seemed to be about the image boundaries rather than the
image being represented by real/complex numbers. The point about the
image boundaries is true, and will prevent an exact deconvolution as you
pointed out. You can still often get a good approximate solution by
either assuming boundary conditions or by using an image prior. I
believe the latter of these is what's generally used in practice for
image deblurring.
In the case of the most basic FFT method for convolution, the boundary
conditions for the image would be implicitly cyclic. I should point out
that (IIRC) the naive deconvolution method of x=ifft(fft(y)./fft(k))
generally doesn't work well in practice, but the idea of using the
Fourier transform to accelerate the computations still carries across to
more sophisticated approaches.
Unless I'm missing something somewhere?
> Your 1D example seemed to be about the image boundaries rather than the
> image being represented by real/complex numbers. The point about the
> image boundaries is true, and will prevent an exact deconvolution as you
> pointed out. You can still often get a good approximate solution by
> either assuming boundary conditions or by using an image prior. I
> believe the latter of these is what's generally used in practice for
> image deblurring.
The example I was referring to was this one:
Convolution kernel: {1,2,1}
Source image: {a,b,c,d,e,f,g,h}
Result: {12,12,12,12,12,12,12,12}
What is the source image?
scott wrote:
>> Your 1D example seemed to be about the image boundaries rather than
>> the image being represented by real/complex numbers. The point about
>> the image boundaries is true, and will prevent an exact deconvolution
>> as you pointed out. You can still often get a good approximate
>> solution by either assuming boundary conditions or by using an image
>> prior. I believe the latter of these is what's generally used in
>> practice for image deblurring.
>
> The example I was referring to was this one:
>
> Convolution kernel: {1,2,1}
> Source image: {a,b,c,d,e,f,g,h}
> Result: {12,12,12,12,12,12,12,12}
>
> What is the source image?
>
I think my last post was not clear enough. I'm agreeing with you that
you can't exactly recover the source image in your example. What I'm
disagreeing with are two things:
1) That this has anything to do with complex numbers, even if you use the
FFT method to recover the source image.
2) That this prevents you from getting an approximate answer in practice.
Both of these assertions can be tested pretty easily, so I'll treat them
in turn. Also, sorry for the long post, but it seemed useful to go into
a fair bit of detail.
---------- (1) complex numbers aren't the issue
First of all, let's try the FFT method for solving for the source image.
This code should run in both matlab and octave:
n = 8; % size of images
k = [1 2 1]; % kernel
k2 = [zeros(1,ceil((n-3)/2)) k zeros(1,floor((n-3)/2))]; % padded kernel
y = 12 + zeros(1,n); % result image
disp('source image (FFT method)');
ifft(fft(y)./(fft(k2)+1e-8)) % +1e-8 is a regularization hack
>>>>>> result:
{3 3 3 3 3 3 3 3}
So despite not having any complex numbers in "the photo" (i.e. the
result image) the method worked just fine and gave us a pretty
reasonable looking result.
"But there's not enough information to derive a result!" I imagine
you're thinking. This is correct, so clearly something was happening
behind the scenes that allows the method to actually compute a source
image. The answer is that the FFT method makes an implicit assumption
about what happens beyond the boundaries of the result image by treating
the image as cyclic. Thus the source image isn't just
{a,b,c,d,e,f,g,h}, it's actually treated as
{...,f,g,h,a,b,c,d,e,f,g,h,a,b,c,...}, which is what allows this method
to work.
This is the point I was making earlier that one way to solve the problem
is to make an assumption about what happens beyond the boundaries of the
image. A natural next question to ask is what sort of errors this
(possibly wrong) assumption might introduce into the reconstructed source
image.
---------- (2) approximate solutions are possible
Let's look at the source images reconstructed under three different
"reasonable" assumptions about what happens beyond the boundaries of the
image:
cyclic: {...,f,g,h,a,b,c,d,e,f,g,h,a,b,c,...}
zero: {...,0,0,0,a,b,c,d,e,f,g,h,0,0,0,...}
full-intensity: {...,12,12,12,a,b,c,d,e,f,g,h,12,12,12,...}
For each case the resulting source images are:
cyclic-source: {3 3 3 3 3 3 3 3}
zero-source: {5.3333, 1.3333, 4.0000, 2.6667, 2.6667, 4.0000, 1.3333,
5.3333}
full-intensity-source: {4.5556, 1.8889, 3.6667, 2.7778, 2.7778, 3.6667,
1.8889, 4.5556}
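As an aside, the zero-boundary case is easy to reproduce (my own check,
reusing the n, k and y from the snippet above): with zeros assumed beyond
the edges, the deconvolution is just a small tridiagonal solve.

A = full(spdiags(ones(n,1) * k, -1:1, n, n));  % blur matrix, zero boundaries
x_zero = (A \ y')'  % -> {5.3333, 1.3333, 4.0000, 2.6667, 2.6667, 4.0000, ...}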
Clearly the different assumptions have given rise to different
reconstructed images. Nevertheless, notice that the center pixels in
each of the three cases take the values of 3.0, 2.67, and 2.78. Despite
the fact that the assumptions for what happened beyond the boundary
range from 0 to 12, the value of the center pixels only ranges from 2.67
to 3.0 -- a factor of 36 less.
Basically, an incorrect assumption about what happens beyond the edges
of the image resulted in errors concentrated at the edges of the
reconstructed image, so in the center of the source image the FFT method
can actually give a pretty accurate answer. For a larger image
(and the same kernel) the center values will be even more accurate.
---------- side notes
I want to emphasize that the deblurring method I gave is only to
illustrate the basic theoretical principles behind using FFT to
accelerate the deconvolution process. In practice inaccurate kernels
and image noise will result in a pretty poor result if you just use the
method I gave. However, people can and have developed pretty good
deblurring methods which are more sophisticated and do work well in
practice. In many (but not all) of these approaches, using the FFT and
performing some operations in frequency space is very useful for
speeding up the algorithm.
> 1) That this has anything to do with complex numbers, even if you use the
> FFT method to recover the source image.
Agreed, I incorrectly thought that if you had some complex version of the
resultant you would have more information with which to regenerate the
source, but with a real source image and a real kernel that is obviously
not possible.
> 2) That this prevents you from getting an approximate answer in practice.
> >>>>>> result:
>
> {3 3 3 3 3 3 3 3}
>
> cyclic-source: {3 3 3 3 3 3 3 3}
>
> zero-source: {5.3333, 1.3333, 4.0000, 2.6667, 2.6667, 4.0000, 1.3333,
> 5.3333}
>
> full-intensity-source: {4.5556, 1.8889, 3.6667, 2.7778, 2.7778, 3.6667,
> 1.8889, 4.5556}
My point was that the following are all valid solutions to the source image:
{3,3,3,3,3,3,3,3}
{2,4,2,4,2,4,2,4}
{1,5,1,5,1,5,1,5}
{6,0,6,0,6,0,6,0}
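For instance, one can check (my own one-liner, using the cyclic convention
from the earlier posts) that each of these convolves back to all 12s:

x = [6 0 6 0 6 0 6 0];                       % try any of the candidates above
real(ifft(fft(x) .* fft([1 2 1 0 0 0 0 0]))) % -> 12 12 12 12 12 12 12 12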
In this case even the centre pixels could be anywhere from 0 to 6, and the
"appearance" of the image is radically different in each case. Surely this
is exactly the sort of detail a "deblurring" filter needs to recover, but it
seems impossible.
scott wrote:
>
> My point was that the following are all valid solutions to the source
> image:
>
> {3,3,3,3,3,3,3,3}
> {2,4,2,4,2,4,2,4}
> {1,5,1,5,1,5,1,5}
> {6,0,6,0,6,0,6,0}
>
> In this case even the centre pixels could be anywhere from 0 to 6, and
> the "appearance" of the image is radically different in each case.
> Surely this is exactly the sort of detail a "deblurring" filter needs to
> recover, but it seems impossible.
Ok, I think I better understand what you were asking now. To answer it
I think I should get into some stuff I had alluded to earlier (i.e. "use
an image prior"). This will probably be another semi-lengthy post, and
I'll break it into two parts:
1) Looking at your example in the frequency domain to get a good
theoretical grasp of what's happening.
2) Some discussion of the practical (as opposed to theoretical) issues
and solutions.
---------- 1) theoretical discussion
Throughout this section I'll be assuming that the image is "cyclic" so
that it's easy to analyze it in the frequency domain. Fortunately your
examples above still work in a cyclic image.
As you've noted, it's possible to reconstruct multiple source images in
your example. The reason for this is that you happened to choose an
"unlucky" kernel. The reason why it's unlucky can be seen pretty easily
by looking at the kernel in frequency space.
In using the FFT on the kernel, I had to pad the kernel to make it the
same size as the image, yielding the padded kernel:
k = {0, 0, 0, 1, 2, 1, 0, 0}
If we now calculate abs(fft(k)), we get:
kf = {4.0, 3.4, 2.0, 0.6, 0.0, 0.6, 2.0, 3.4}
The key thing to notice here is that this contains an element which is
zero. This means that the convolution kernel will "kill" certain
frequencies in the source image. Basically, we'll get the same result
image for any source-image + c*ifft([0 0 0 0 1 0 0 0]). Unsurprisingly,
if we calculate 8*ifft([0 0 0 0 1 0 0 0]) we get:
{1, -1, 1, -1, 1, -1, 1, -1}
This perfectly explains the nature of the examples you gave, which all
convolve to the same result image. IIRC most kernels in practice won't
contain exact zeros like this, and it turns out that even a slightly
different version of your kernel wouldn't have had this problem:
abs(fft([0 0 0 1.01 1.98 1.01 0 0]))
= {4.0, 3.4, 1.98, 0.55, 0.04, 0.55, 1.98, 3.4}
The trick also wouldn't work on images which are an odd number of pixels
across (still assuming a cyclic image, of course), which is possibly why
you chose an image that was an even number of pixels across.
In the code snippet I gave to deconvolve your example, I dealt with this
issue in a very simple way:
ifft(fft(y)./(fft(k2)+1e-8))
It's the +1e-8 that was the key here, and in your example it's necessary
to prevent a divide by zero. The effect of this is that the
deconvolution didn't "hallucinate" any frequencies that weren't in the
result image, even if adding them would technically also give correct
deconvolutions. This is why it gave the {3, 3, 3, 3, 3, 3, 3, 3} answer
rather than one of the others you gave.
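For reference, a slightly more principled version of that hack (my own
sketch, not from the original snippet; it reuses the padded kernel and the
result image y from earlier) is Wiener-style regularization, which damps
the near-zero frequencies instead of dividing by them:

Kf = fft([0 0 0 1 2 1 0 0]);               % FFT of the padded kernel
lambda = 1e-4;                             % guessed noise-to-signal ratio
x_w = real(ifft(fft(y) .* conj(Kf) ./ (abs(Kf).^2 + lambda)))  % -> ~{3,...}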
---------- 2) practical discussion
As far as I'm aware in practice having a kernel like yours which happens
to kill certain "unlucky" frequencies isn't really an issue. What can
be an issue is that your kernel will suppress high frequencies to the
point that they fall below the accuracy of your camera sensor (or even
numerical precision). This is why undoing a depth of field blur won't
always be possible (even if everything is at the same depth), since the
blur kernel in that case heavily suppresses high frequencies. This
means that only a limited amount of deblurring is possible (as an aside,
there's also been work on designing camera apertures so as to minimize
this).
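To get a feel for the scale of the problem, here is a small check (mine,
with a crude 1-D stand-in for a focal-blur kernel):

n = 256;
k_dof = [ones(1,9) zeros(1, n-9)] / 9;     % crude 9-pixel box blur
fprintf('weakest frequency response: %g\n', min(abs(fft(k_dof))));
% prints ~0.002: content at that frequency is attenuated below typical
% 8-bit sensor precision (1/256), so it is effectively unrecoverable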
Fortunately in the case of blur from camera shake the kernel is normally
a bit more kind and doesn't destroy high frequencies quite so much.
What is a much bigger issue is that you don't know what the blur kernel
was, and getting it wrong can induce artifacts into the image.
Fortunately, we can do quite a bit by assuming a good image prior. The
artifacts produced by an incorrect deconvolution tend to pretty
obviously not look like any "real" image. There are some examples with a
good explanation here:
http://www.mathworks.com/products/demos/image/ipexblind/ipexblind.html
The best image deblurring algorithms I know use this insight pretty
heavily, and optimize a deconvolution by checking how the statistics of
the deconvolved image compare to the expected statistics of "real" images.
It turns out that even pretty weak models of what real images look
like can help quite a bit and often give surprisingly good results.
In some sense the simple deconvolution method I gave was also implicitly
using an image prior which favors smoother images (for a very particular
definition of smoothness).
Did this answer your questions?
> If we now calculate abs(fft(k)), we get:
>
> kf = {4.0, 3.4, 2.0, 0.6, 0.0, 0.6, 2.0, 3.4}
>
> The key thing to notice here is that this contains an element which is
> zero. This means that the convolution kernel will "kill" certain
> frequencies in the source image.
OK that makes sense.
> As far as I'm aware in practice having a kernel like yours which happens
> to kill certain "unlucky" frequencies isn't really an issue. What can be
> an issue is that your kernel will suppress high frequencies to the point
> that they fall below the accuracy of your camera sensor (or even numerical
> precision).
Yes, this was my expectation of what would happen in a practical situation.
I would imagine a focal-blur kernel would suppress the high frequencies,
such that the information was either impossible to recover (due to noise
levels) or what was recoverable would be heavily aliased due to the limited
precision.
> This is why undoing a depth of field blur won't always be possible (even
> if everything is at the same depth),
This was my initial thought in response to Andrew's original statement that
you could get back the original signal because it was a convolution.
> Fortunately in the case of blur from camera shake the kernel is normally a
> bit more kind and doesn't destroy high frequencies quite so much. What is
> a much bigger issue is that you don't know what the blur kernel was, and
> getting it wrong can induce artifacts into the image.
Yes, often you can even see a bright light source in the resultant image and
how it has moved (maybe in a zig-zag pattern due to camera shake). You could
probably use this as a basis for the kernel, perhaps optimising the kernel
until the light source in the deblurred image is as small as possible.
> Did this answer your questions?
Yes thanks!