AI That Creates A 3D Model Of Humans

Today, a variety of techniques exists that can take an image that contains humans, and perform pose estimation on it. This gives us these interesting skeletons that show us the current posture of the subjects shown in these images.



Having this skeleton opens up the possibility for many cool applications, for instance, it's great for fall detection and generally many kinds of activity recognition, analyzing athletic performance, and much more. But that would require us to do that for not only still images but for animations. So, the question arises that can we perform pose estimation for people in motion? Yes, we already can. You can see it in this video.

But what if wish for more? Let’s take a step ahead. Can we reconstruct not only the pose of the model, but the entire 3D geometry of the model itself including the body shape, face, clothes, face, and more? That sounds like science fiction, right? But with today’s power learning algorithms, it’s possible. Let’s look at the three experiments.

Experiment number 1, still images.

Look at these. Isn’t it amazing? I am sure if you knew these people, you would have recognized them solely from the 3D reconstruction. And not only that, but you can also see some details in the clothes, a suit can be recognized and you can also notice the wrinkles in the jeans. This new method uses a different geometry representation that enables higher-resolution outputs and it immediately shows them. So, you can see in the image that it is working quite well on still images.  

Our Experiment number 2 shows that it can not only deal with the still images of the front side only, but it can also reconstruct the backside of the person. Amazing, but wait, that part of the data is completely unobserved. We haven’t seen the backside. So, how is that even possible? Well, let’s look at it the other way. An intelligent person would be able to infer some of these details, for instance, if we know that it’s a suit then we can roughly estimate how its backside would look like. And it goes the same for the boots or any other thing. This new method leans on an earlier technique by the name image to image translation to estimate this data. And it’s truly works like a magic! If you take a closer look, you see that its backside has fewer details than the front, but the fact that we can do it is itself a miracle. 

 

With our third experiment, we can go even further. What about video reconstruction? Let’s have a look. Don’t expect miracles, at least not yet, there is obviously still quite a bit of flickering left but the preliminary results are quite encouraging and it’s just a matter of two papers down the line and these videos will be as nearly as good as the ones were for the still images. The key idea is here that the new methods perform these new reconstructions in a way that is consistent, or in other words, if there is a small change in the input model, there will be a small change in the output model. These are the properties that open up the possibility to extend this method to videos.

So, how does it compare to the previous methods? All of these competing techniques are quite recent as they are from 2019. They appear to be missing a lot of detail, and I don’t think we would have a chance of recognizing the target subject from the reconstruction. 

And it’s not have been even two years and we have such incredible progress. It truly feels like we are living in a science-fiction world.