Hi,
We have redone Hernan Badino's experiment. You can see it here...
	<https://www.youtube.com/watch?v=fqWdSfN9FiA> Source
The main point of interest is that the video forms a loop, and the
reconstructed result almost does too:
	<https://www.youtube.com/watch?v=0ZPJmnBh03M> Reworked
Hernan Badino moves his head while walking, so the reconstructed
trajectory does not close perfectly at the end, but we can
reconstruct the movement almost perfectly. We use OpenCV for image
processing, and POV-Ray for the 3D representation. We determine the
dominant projective motion in the video relative to a reference
image, and change the reference when correlation drops below 80%.
We have a 3D inertial model of motion, that's why POV-Ray helps =)
Best regards,
-- 
<https://eureka.atari.org/>
Hi,
Francois LE COAT writes:
> The experiment from Hernan Badino was redone. You can see it there...
> 
> 
> The main interest is that video is looping, and the result is almost:
> 
> 
> Well, Hernan Badino is moving his head when he is walking, so the
> reconstructed trajectory is not perfectly looping at the end. But
> we can reconstruct the movement almost perfectly. We use OpenCV
> for image processing, and POV-Ray for 3D representation. We have
> to determine projective dominant motion in the video with a
> reference image, and change it when correlation drops below 80%.
> 
> We have a 3D inertial model of motion, that's why POV-Ray helps =)
Three drones are flying through forests of trees. Thanks to the
optical flow (DIS, OpenCV) measured on successive images, the
"temporal disparity" reveals the forest of trees (3rd dimension)...
<https://www.youtube.com/watch?v=QP75EeFVyOI> 1st drone
<https://www.youtube.com/watch?v=fp5Z1Nu4Hko> 2nd drone
<https://www.youtube.com/watch?v=fLxE8iS7fPI> 3rd drone
The interest of the forest is that the trajectories are curved, in
order to avoid obstacles. The motion is measured with a projective
transform, and represented with <Ry,Rz,Tx,Tz> in POV-Ray.
The drone's progress is shown in front view, as seen by its camera.
Best regards,
-- 
<https://eureka.atari.org/>
Hi,
Francois LE COAT writes:
>> The experiment from Hernan Badino was redone. You can see it there...
>>
>>
>> The main interest is that video is looping, and the result is almost:
>>
>>
>> Well, Hernan Badino is moving his head when he is walking, so the
>> reconstructed trajectory is not perfectly looping at the end. But
>> we can reconstruct the movement almost perfectly. We use OpenCV
>> for image processing, and POV-Ray for 3D representation. We have
>> to determine projective dominant motion in the video with a
>> reference image, and change it when correlation drops below 80%.
>>
>> We have a 3D inertial model of motion, that's why POV-Ray helps =)
> 
> Three drones are flying between forests of trees. Thanks to the
> optical-flow (DIS OpenCV) measured on successive images, the
> "temporal disparity" reveals the forest of trees (3rd dimension)...
> 
> <https://www.youtube.com/watch?v=QP75EeFVyOI> 1st drone
> <https://www.youtube.com/watch?v=fp5Z1Nu4Hko> 2nd drone
> <https://www.youtube.com/watch?v=fLxE8iS7fPI> 3rd drone
> 
> The interest with the forest is that trajectories are curved, in
> order to avoid obstacles. It is measured thanks to a projective
> transform, and represented with <Ry,Rz,Tx,Tz> thanks to POV-Ray.
> The evolution of the drone is shown in front-view with its camera.
It is possible to perceive the relief (in depth) of a scene, when we
have at least two different viewpoints of it. Here is a new example with
a drone flying in the middle of a forest of trees, and from which we
process the video stream from the embedded camera...
<https://www.youtube.com/watch?v=WJ20EBM3PTc>
When the two views of the same scene are distant in space, we speak
of "spatial disparity". In the present case, the two viewpoints are
distant in time, and we then speak of "temporal disparity". The
difference is whether the two images of the same scene are acquired
simultaneously, or delayed in time. In this case we can perceive the
relief in depth with a single camera and its continuous video stream.
Best regards,
-- 
<https://eureka.atari.org/>
Francois LE COAT <lec### [at] atari org> wrote:
> It is possible to perceive the relief (in depth) of a scene, when we
> have at least two different viewpoints of it. Here is a new example with
> a drone flying in the middle of a forest of trees, and from which we
> process the video stream from the embedded camera...
>
> <https://www.youtube.com/watch?v=WJ20EBM3PTc>
>
> When the two views of the same scene are distant in space, we speak
> of "spatial disparity". In the present case, the two viewpoints are
> distant in time, and we then speak of "temporal disparity". This
> involves knowing whether the two images of the same scene are acquired
> simultaneously, or delayed in time. We can perceive the relief in depth
> in this case, with a single camera and its continuous video stream.
Francois,
Although I believe that I understand the general idea of what you're doing in
your work, it's a bit difficult to fully grasp the details from the video.
I'm assuming that you're taking the first frame as a reference image, and then
reorienting the second frame to optimize the registration.
Then you're using the OpenCV to do some sort of photogrammetry to generate a
projection matrix from the 2D image, and extract/back-calculate 3D data from
that matrix for everything in the frame.
Then you move to the pair of 2nd frame + 3rd frame and repeat the process.
Is this correct?
I'm also wondering if you could create a 3D rendering with the data you're
extracting, and maybe a 2D orthographic overhead map of the scene that the
drones are flying through, mapping the position of the drones in the forest.
As always, I am very interested in understanding the details of what you're
doing.
This is great work, and I hope you are properly recognized for your
achievements.
- BW
Hi,
Bald Eagle writes:
> Francois LE COAT wrote:
>> It is possible to perceive the relief (in depth) of a scene, when we
>> have at least two different viewpoints of it. Here is a new example with
>> a drone flying in the middle of a forest of trees, and from which we
>> process the video stream from the embedded camera...
>>
>> <https://www.youtube.com/watch?v=WJ20EBM3PTc>
>>
>> When the two views of the same scene are distant in space, we speak
>> of "spatial disparity". In the present case, the two viewpoints are
>> distant in time, and we then speak of "temporal disparity". This
>> involves knowing whether the two images of the same scene are acquired
>> simultaneously, or delayed in time. We can perceive the relief in depth
>> in this case, with a single camera and its continuous video stream.
> 
> Francois,
> 
> Although I believe that I understand the general idea of what you're doing in
> your work, it's a bit difficult to fully grasp the details from the video.
> 
> I'm assuming that you're taking the first frame as a reference image, and then
> reorienting the second frame to optimize the registration.
> Then you're using the OpenCV to do some sort of photogrammetry to generate a
> projection matrix from the 2D image, and extract/back-calculate 3D data from
> that matrix for everything in the frame.
> 
> Then you move to the pair of 2nd frame + 3rd frame and repeat the process.
> 
> Is this correct?
I take the first image as reference, and the second one, and determine
the optical flow between them. That means the integer displacement of
every pixel from one image to the other. That gives a vector field that
can be approximated by a global projective transformation. That means
eight parameters in rotation and translation, that match the two images
best. The quality of the match is measured with the correlation
between the first image and the (projectively) transformed second one.
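To make the "eight parameters" concrete: a projective transform can be written as a 3x3 matrix defined up to scale, which leaves eight free values once the bottom-right entry is fixed to 1. The Python sketch below is only an illustration under that assumption. The names apply_homography, model_flow and H_SHIFT are made up for this example and are not taken from the actual program. It shows the displacement that such a global model predicts at a given pixel, which is the quantity a measured optical-flow field would be approximated by.

```python
def apply_homography(h, x, y):
    """Map pixel (x, y) through a 3x3 projective transform.

    h is a row-major nested list; it is assumed to be normalised so that
    h[2][2] == 1, which leaves eight free parameters.
    """
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ZeroDivisionError("point is mapped to infinity")
    xp = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    yp = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return xp, yp


def model_flow(h, x, y):
    """Displacement (dx, dy) that the global model predicts at pixel (x, y)."""
    xp, yp = apply_homography(h, x, y)
    return xp - x, yp - y


# Hypothetical example: a pure image translation of (+3, -2) pixels.
H_SHIFT = [[1.0, 0.0, 3.0],
           [0.0, 1.0, -2.0],
           [0.0, 0.0, 1.0]]

print(model_flow(H_SHIFT, 10, 20))
```

For a pure translation the predicted flow is the same at every pixel. Non-zero entries in the last row make the flow depend on position, which is what allows the model to follow perspective effects.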
I repeat the processing with the third image, the fourth, and so on,
which gives a decreasing correlation value... until correlation drops
below 60%. When correlation is too low, I take a new reference image
(the last good match), let's say the fourth, and continue to look for
good matches (>60% correlation) with the fifth, sixth, and so forth...
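The reference-switching rule described above can be sketched as follows. This is a minimal Python illustration, not the actual implementation: the thread does not say which correlation formula is used, so a Pearson correlation on grey levels is assumed, and register() is a hypothetical callback standing for "estimate the projective transform and warp the frame onto the reference". Restarting from the current frame when not even the next frame matches is also an assumption.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equally sized grey-level images (flat lists)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant image carries no structure to correlate
    return cov / math.sqrt(var_a * var_b)


def select_references(frames, register, threshold=0.60):
    """Return the indices of the frames that end up being used as reference.

    register(ref, frame) is a hypothetical callback returning `frame`
    warped onto `ref` by the estimated projective transform.
    """
    refs = [0]
    ref = 0        # index of the current reference image
    last_good = 0  # index of the last frame that matched the reference
    i = 1
    while i < len(frames):
        warped = register(frames[ref], frames[i])
        if correlation(frames[ref], warped) >= threshold:
            last_good = i
            i += 1
        elif last_good != ref:
            # Correlation too low: the last good match becomes the new
            # reference, and frame i is tried again against it.
            ref = last_good
            refs.append(ref)
        else:
            # Assumption: nothing matched this reference, restart from frame i.
            ref = last_good = i
            refs.append(ref)
            i += 1
    return refs
```

With frames 1 to 4 matching the first reference and frame 5 failing, the function switches the reference to frame 4 and carries on from there, as in the description.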
That gives me 8 reliable projective transform parameters between the
1st and 4th images, because of the good correlation. It also gives me
a reliable optical-flow vector field, which can be called horizontal
and vertical "temporal disparity". That's what you're seeing in the
grey-level images.
The disparity field, which is an integer displacement between images,
is linked to depth in the image (3rd dimension) by an inverse relation
(depth=base/disparity). That means we can evaluate the image's depth
from a continuous video stream.
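As a small worked example of the inverse relation, here is a hedged Python sketch. The function name and the base value are illustrative. With a single moving camera the base (the distance travelled between the two images) is generally not known, so the result should be read as relative depth, up to scale. Pixels with zero disparity are returned as None, since the relation gives no finite depth there.

```python
def depth_from_disparity(disparity, base=1.0):
    """Apply depth = base / |disparity| to a 2D list of integer disparities.

    `base` is an assumed scale factor; zero disparity yields None.
    """
    return [
        [base / abs(d) if d != 0 else None for d in row]
        for row in disparity
    ]


# Larger displacement between the two images means a closer object.
print(depth_from_disparity([[1, 2], [4, 0]], base=8.0))
```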
> I'm also wondering if you could create a 3D rendering with the data you're
> extracting, and maybe a 2D orthographic overhead map of the scene that the
> drones are flying through, mapping the position of the drones in the forest.
The disparity measurements come from a physical approximation (optical
flow) that is imperfect, and that research is constantly improving.
I'm always interested in progress in this domain. But for the moment
we must compromise between quality and computing time. The state of
the art is not yet good enough to do what you're wishing for...
> As always, I am very interested in understanding the details of what you're
> doing.
I hope you understand it better now. I may be a little confusing...
> This is great work, and I hope you are properly recognized for your
> achievements.
> 
> - BW
It took a long time before I could show some promising results. Now
I'm happy to share them with a larger audience. Unfortunately, the
good people who worked with me are not here to appreciate them. That's
why I'm surprised by your interest =)
Thanks for your attention.
Best regards,
-- 
<https://eureka.atari.org/>
Francois LE COAT <lec### [at] atari org> wrote:
> I have the first reference image, and the second one, from which I
> determine the optical-flow. That means every pixel integer displacement
> between an image, to the other. That gives a vector field that can
> be approximated by a global projective transformation. That means eight
> parameters in rotation and translation, that match the two images the
> best. The quality of the images' matching is measured with correlation,
> between the first image and the second (projectively) transformed.
So, you have this vector field that - sort of maps the pixels in the second
frame to the pixels in the first frame.
And then somehow the projection matrix is calculated from that.
> Disparity field which is an integer displacement between images, is
> linked to depth in the image (3rd dimension) with an inverse relation
> (depth=base/disparity). That means we can evaluate image's depth from
> a continuous video stream.
Presumably when an object gets closer, the image gets "scaled up" in the frame,
and you use that to calculate the distance of the object from the camera.
> > I'm also wondering if you could create a 3D rendering with the data you're
> > extracting, and maybe a 2D orthographic overhead map of the scene that the
> > drones are flying through, mapping the position of the drones in the forest.
> All is not so perfect, that we can imagine what
> you're wishing, from the state of the art...
Well, I'm just thinking that you must have an approximate idea of where each
tree is, given that you calculate a projection matrix and know something about
the depth.  So I was just wondering if, given that information, you could simply
place a cylinder of the approximate diameter and at the right depth.
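As a rough illustration of the idea, the Python sketch below back-projects an image column and a depth into a ground position with a simple pinhole model, and writes a POV-Ray cylinder at that position. Everything here is an assumption made for the example: the focal length in pixels, the principal point cx, the trunk radius and height, and above all the availability of a metric depth, which the monocular method discussed in this thread does not claim to provide.

```python
def tree_cylinder(u, depth, focal_px, cx, radius=0.2, height=10.0):
    """Return a POV-Ray cylinder statement for a tree seen at image column u.

    Pinhole model (assumed): x = (u - cx) * depth / focal_px, with the
    camera at the origin looking along +z and y pointing up.
    """
    x = (u - cx) * depth / focal_px
    return (
        f"cylinder {{ <{x:.2f}, 0, {depth:.2f}>, "
        f"<{x:.2f}, {height:.2f}, {depth:.2f}>, {radius:.2f} }}"
    )


# Hypothetical values: 640-pixel-wide image, 500-pixel focal length.
print(tree_cylinder(420, 10.0, focal_px=500.0, cx=320.0))
```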
> It's a long work since I can show some promising results. Now I'm
> happy to share it with a larger audience. Unfortunately all the
> good people that worked with me, are not here to appreciate. That's
> why I'm surprised with your interest =)
Well, I have been interested in the fundamentals of photogrammetry for quite
some time.  And I have been following your work for the last several years,
hoping to learn how to create such projection matrices and apply them.
https://news.povray.org/povray.advanced-users/thread/%3C5be592ea%241%40news.povray.org%3E/
I don't work in academia or the graphics industry, so I only have what free time
that I can devote to independently learning this stuff on my own.
Even if I were to simply use a POV-Ray scene, where I rendered two images with
different camera locations - then I'm assuming that I could calculate a vector
field and a projection matrix. (something simple like cubes, spheres, and
cylinders)
Given the projection matrix and one of the two renders, would I then have the
necessary and sufficient information to write a .pov scene to recreate the
render from scratch?
- BW
Hi,
Bald Eagle writes:
> Francois LE COAT wrote:
>> I have the first reference image, and the second one, from which I
>> determine the optical-flow. That means every pixel integer displacement
>> between an image, to the other. That gives a vector field that can
>> be approximated by a global projective transformation. That means eight
>> parameters in rotation and translation, that match the two images the
>> best. The quality of the images' matching is measured with correlation,
>> between the first image and the second (projectively) transformed.
> 
> So, you have this vector field that - sort of maps the pixels in the second
> frame to the pixels in the first frame.
> And then somehow the projection matrix is calculated from that.
> 
>> Disparity field which is an integer displacement between images, is
>> linked to depth in the image (3rd dimension) with an inverse relation
>> (depth=base/disparity). That means we can evaluate image's depth from
>> a continuous video stream.
> 
> Presumably when an object gets closer, the image gets "scaled up" in the frame,
> and you use that to calculate the distance of the object from the camera.
> 
>>> I'm also wondering if you could create a 3D rendering with the data you're
>>> extracting, and maybe a 2D orthographic overhead map of the scene that the
>>> drones are flying through, mapping the position of the drones in the forest.
> 
>> All is not so perfect, that we can imagine what
>> you're wishing, from the state of the art...
> 
> Well, I'm just thinking that you must have an approximate idea of where each
> tree is, given that you calculate a projection matrix and know something about
> the depth.  So I was just wondering if, given that information, you could simply
> place a cylinder of the approximate diameter and at the right depth.
> 
>> It's a long work since I can show some promising results. Now I'm
>> happy to share it with a larger audience. Unfortunately all the
>> good people that worked with me, are not here to appreciate. That's
>> why I'm surprised with your interest =)
> 
> Well, I have been interested in the fundamentals of photogrammetry for quite
> some time.  And I have been following your work for the last several years,
> hoping to learn how to create such projection matrices and apply them.
> 
> https://news.povray.org/povray.advanced-users/thread/%3C5be592ea%241%40news.povray.org%3E/
> 
> I don't work in academia or the graphics industry, so I only have what free time
> that I can devote to independently learning this stuff on my own.
> 
> Even if I were to simply use a POV-Ray scene, where I rendered two images with
> different camera locations - then I'm assuming that I could calculate a vector
> field and a projection matrix. (something simple like cubes, spheres, and
> cylinders)
> 
> Given the projection matrix and one of the two renders, would I then have the
> necessary and sufficient information to write a .pov scene to recreate the
> render from scratch?
> 
> - BW
I understand your question. The problem is that I'm far from
reconstructing a 3D scene from the monocular information I have at the
moment. I know a company that does this sort of application, called
Stereolabs...
	<https://www.stereolabs.com/>
I'm far from their perfect 3D acquisition process, and it is obtained
with two cameras. I only have one camera, and a video stream that I
didn't acquire myself. Is there interest, and are there applications?
I'm not at that stage in my work.
I know that similar monocular image processing has been used on planet
Mars, because the helicopter only had one piloting camera, for weight
and on-board integration reasons.
The main goal at this point of the work is to show that we could
eventually do the same job with many cameras, or with only one.
But I'm far from obtaining results similar to those of elaborate
systems such as the Stereolabs ZED camera, for instance.
That is already being done perfectly with a stereoscopic system...
Do you understand? Thanks for your attention.
Best regards,
-- 
<https://eureka.atari.org/>
Hi,
Bald Eagle writes:
> Even if I were to simply use a POV-Ray scene, where I rendered two images with
> different camera locations - then I'm assuming that I could calculate a vector
> field and a projection matrix. (something simple like cubes, spheres, and
> cylinders)
> 
> Given the projection matrix and one of the two renders, would I then have the
> necessary and sufficient information to write a .pov scene to recreate the
> render from scratch?
> 
> - BW
For the moment, the work on depth from monocular vision is not advanced
enough for us to recreate the visible scene. Vision with two cameras
or more gives a much more advanced result for 3D reconstruction of scenes.
Let's recall the starting point of this thread... We've redone the
experiment from Hernan Badino, who is walking with a camera on his head:
	<https://www.youtube.com/watch?v=GeVJMamDFXE>
Hernan determines his 2D ego-motion in the x-y plane, from corresponding
interest points that persist in the video stream. That means he
calculates the projection matrix of the movement to deduce translations
in the ground plane. With time integration, it gives him the trajectory.
We're doing almost the same, but I work with OpenCV's optical flow, and
not interest points. And my motion model is 3D, to obtain 8 parameters
in rotation and translation, that I can use in Persistence Of Vision.
I hope you understand... I'm reconstructing the 3D movement, and I
found that it gives "temporal disparity", that is, depth from motion.
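The "time integration" step can be sketched as follows. This is a minimal Python illustration under assumed conventions, not the code used for the videos: each step is a per-frame increment (tx, tz, ry) expressed in the camera frame, with tz forward, tx to the right, and ry the yaw in degrees, and only the ground-plane part of the motion is accumulated.

```python
import math


def integrate_trajectory(steps):
    """Accumulate per-frame increments into a ground-plane path.

    steps: iterable of (tx, tz, ry_deg) in the camera frame (assumed
    convention: tz forward, tx to the right, ry_deg yaw in degrees).
    Returns the list of (x, z) positions, starting at the origin.
    """
    x = z = 0.0
    heading = 0.0  # radians, 0 means facing the initial +z direction
    path = [(x, z)]
    for tx, tz, ry_deg in steps:
        heading += math.radians(ry_deg)
        # Rotate the camera-frame increment into the fixed ground frame.
        x += tz * math.sin(heading) + tx * math.cos(heading)
        z += tz * math.cos(heading) - tx * math.sin(heading)
        path.append((x, z))
    return path


# Hypothetical closed loop: four unit steps, turning 90 degrees each time.
square = integrate_trajectory([(0.0, 1.0, 90.0)] * 4)
print(square[-1])
```

Four unit steps with a 90 degree turn each bring the path back to its starting point, up to rounding. A looping video gives the same kind of check: any gap between the first and last positions shows the error accumulated by the integration.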
Best regards,
-- 
<https://eureka.atari.org/>
Hi,
Here is another result...
> Bald Eagle writes:
>> Even if I were to simply use a POV-Ray scene, where I rendered two images with
>> different camera locations - then I'm assuming that I could calculate a vector
>> field and a projection matrix. (something simple like cubes, spheres, and cylinders)
>>
>> Given the projection matrix and one of the two renders, would I then have the
>> necessary and sufficient information to write a .pov scene to recreate the
>> render from scratch?
>>
>> - BW
> 
> For the moment, the work with depth from monocular vision is not enough
> advanced that we can recreate the visible scene. Vision with two cameras
> or more, gives a much advanced result for 3D reconstruction of scenes.
> 
> Let remind us the starting point from this thread... We've redone the
> experiment from Hernan Badino, who is walking with a camera on his head:
> 
> 
> Hernan determines his 2D ego-motion in the x-y plane, from corresponding
> interest points that persist in the video stream. That means he is
> calculating the projection matrix of the movement to deduce translations
> in the ground plane. With time integration, it gives him the trajectory.
> 
> We're doing almost the same, but I work with OpenCV's optical-flow, and
> not interest points. And my motion model is 3D, to obtain 8 parameters
> in rotation and translation, that I can use in Persistence Of Vision.
> 
> I hope you're understanding... I'm reconstituting the 3D movement, and I
> discover it's giving "temporal disparity", that is depth from motion.
An instrumented motorcycle rides around a racing circuit. By
approximating the optical flow (DIS, OpenCV) with the dominant
projective motion, we determine translations in the ground plane,
roll and yaw. That is to say, the trajectory given by the projective
parameters (Tx,Tz,Ry,Rz).
<https://www.youtube.com/watch?v=-QLJ2ke9mN8>
Image data comes from the publication:
Bastien Vincke, Pauline Michel, Abdelhafid El Ouardi, Bruno Larnaudie,
o
Rodriguez, Abderrahmane Boubezoul. (Dec. 2024). Real Track Experiment
Dataset for Motorcycle Rider Behavior and Trajectory Reconstruction.
Data in Brief, Vol. 57, 111026.
The instrumented motorcycle makes a complete lap of the track. The
correlation threshold between successive images is set at 90%; below
it, the calculation of the projective dynamic model is reset.
Best regards,
-- 
<https://eureka.atari.org/>