How does photogrammetry work?

Can one of you bright 3D programmers out there enlighten me on how photogrammetry works internally? I know how to do it, how to use the programs, but what about the math behind the magic? You have many photos taken at different angles, and somehow you can extract information from each photo to build a 3D model? How is this done?

At a very basic level, you mark common points that appear in multiple images. The software then calculates a vector from each camera view towards each point, and solves for camera positions relative to each other such that the vectors from the different cameras intersect at the same point in space. Automatic methods use pattern recognition to figure out which points correspond. Once the camera views are matched, it can triangulate the position of any point in the images.
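
To make the triangulation step concrete, here's a minimal NumPy sketch of two-view linear (DLT) triangulation. The names and the two-view setup are my own illustration, and it assumes the camera matrices have already been solved by the matching step described above:

```python
import numpy as np

def triangulate_point(P1, P2, pt1, pt2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2   : 3x4 camera projection matrices (assumed already solved
               by the camera-matching step).
    pt1, pt2 : (x, y) pixel coordinates of the same marked feature.
    """
    x1, y1 = pt1
    x2, y2 = pt2
    # Each view contributes two linear constraints: the ray from that
    # camera through the pixel must pass through the 3D point X.
    A = np.array([
        x1 * P1[2] - P1[0],
        y1 * P1[2] - P1[1],
        x2 * P2[2] - P2[0],
        y2 * P2[2] - P2[1],
    ])
    # The homogeneous point X is the null vector of A, found via SVD.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize to (X, Y, Z)
```

Real pipelines triangulate each point from many views and then refine the cameras and points together (bundle adjustment), but the underlying linear algebra is this same intersection idea.
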
The big problem with photogrammetry, though, is that it requires some visible mark that can be tracked. If you put a plain whiteboard in front of the camera, for example, it won't be able to detect any depth at all. That's why other solutions project a laser or an infrared grid onto the surface to create points that can be tracked (that's how the Kinect works).

Hmm, okay, I think I understand. I ask because I wanted to try and put something together. I'm not sure if it will work, or how similar it is to photogrammetry in general. It is different from anything I have seen so far, though.

I was thinking of using an image processing library to detect shapes in an image and draw them on screen (2D). That is an easy task. However, I thought: if you can easily (and accurately) extract shapes from a single image, could you take a video recording of an object, process the video file frame by frame to gather shape information, and eventually collect enough data to build a 3D model?
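
For the 2D step, something like this OpenCV sketch is what I have in mind (the file names and thresholds are just placeholders):

```python
import cv2

frame = cv2.imread("frame_0001.png")  # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)      # edge map; thresholds are a guess
# Trace the outlines of the detected edges as 2D shapes.
contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
vis = frame.copy()
cv2.drawContours(vis, contours, -1, (0, 255, 0), 2)
cv2.imwrite("contours.png", vis)
```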

With a video file you would likely have enough information to rebuild shapes and objects. Of course, the recording would have to cover a sufficient range of view angles, but if you had enough shots of the object(s) from different angles, it might work.
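
Roughly, I imagine pulling candidate frames out of the video like this (the file names and sampling rate are just guesses):

```python
import os
import cv2

os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("turntable.mp4")  # hypothetical recording
step = 15                                # keep ~2 frames/sec of 30 fps video
kept = index = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Consecutive frames are nearly identical, so sample every Nth one.
    if index % step == 0:
        cv2.imwrite(f"frames/frame_{kept:04d}.png", frame)
        kept += 1
    index += 1
cap.release()
```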

Does this already exist by any chance? I have seen it done with a series of images, but never with a video recording, which would likely provide more data.

I’m pretty sure I’ve seen something before that can use a video. Another method of doing 3D photogrammetry, instead of analyzing points in the image, just looks at the silhouette of the object to get the 3D shape. It still needs some calibration of the different views, though.
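
That silhouette approach is usually called shape-from-silhouette, or building a "visual hull". Here's a toy voxel-carving sketch in NumPy; it assumes the calibrated 3x4 camera matrices and binary silhouette masks are already given, which is the calibration step mentioned above:

```python
import numpy as np

def carve_visual_hull(masks, projections, bounds, resolution=64):
    """Toy shape-from-silhouette: carve a voxel grid down to the
    volume consistent with every silhouette.

    masks       : list of HxW boolean silhouette images, one per view
    projections : list of 3x4 camera matrices matching those views
    bounds      : ((xmin, xmax), (ymin, ymax), (zmin, zmax)) of the volume
    """
    axes = [np.linspace(lo, hi, resolution) for lo, hi in bounds]
    X, Y, Z = np.meshgrid(*axes, indexing="ij")
    # Homogeneous coordinates of every voxel centre, shape (4, N).
    pts = np.stack([X.ravel(), Y.ravel(), Z.ravel(), np.ones(X.size)])
    occupied = np.ones(X.size, dtype=bool)
    for mask, P in zip(masks, projections):
        h, w = mask.shape
        proj = P @ pts  # project all voxel centres into this view
        u = (proj[0] / proj[2]).round().astype(int)  # assumes voxels sit
        v = (proj[1] / proj[2]).round().astype(int)  # in front of the camera
        inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(X.size, dtype=bool)
        hit[inside] = mask[v[inside], u[inside]]
        occupied &= hit  # carve away anything outside this silhouette
    return occupied.reshape(X.shape)
```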

I have seen demonstrations using video, but in general, no, and there are a few reasons for it. Firstly, you’re dealing with a lower-resolution image, and photogrammetry works by analysing detail. I once tried being lazy and filming a spin-around of an object, then capturing screenshots from VLC, but it didn’t work very well. You also have motion blur to deal with, because blurred features are useless. The main problem is that it’s just too much data to analyse accurately: you want to feed the program the most efficient data possible to limit errors and increase speed.
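
If you do go the video route, one mitigation is to score each frame for sharpness and only keep the crisp ones, so the program gets less but better data. A common cheap metric is the variance of the Laplacian (the file name and threshold below are just guesses; the threshold is scene-dependent):

```python
import cv2

def sharpness(image_bgr):
    # Variance of the Laplacian: low values indicate motion blur.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

cap = cv2.VideoCapture("spin_around.mp4")  # hypothetical footage
sharp_frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if sharpness(frame) > 100.0:  # keep only frames above the threshold
        sharp_frames.append(frame)
cap.release()
```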

The maths used in 3D trackers like PFTrack (used for compositing VFX into live-action footage) is similar, and I think PFTrack even generates a point cloud from the footage. It can certainly be done, but it’s less efficient.

Edit: Here’s a tutorial showing how PFTrack works and can do this kind of thing, sort of. Note the fairly low density of tracked points (white), which keeps computation time down.