What if all it took to create a realistic digital avatar of a person was a single image? In a paper accepted to the Conference on Computer Vision and Pattern Recognition (CVPR) 2020, researchers at Imperial College London and FaceSoft.io, a startup leveraging AI and machine learning for facial analysis, describe AvatarMe, a technique that’s able to reconstruct photorealistic 3D busts from “in-the-wild” photos. They claim it outperforms existing systems by a “significant margin” and generates authentic, 4K-by-6K-resolution 3D faces from low-resolution targets with detailed reflections.
Rendering 3D faces has countless applications in domains from videoconferencing to virtual reality. Although geometry can be inferred without AI, rendering a face in arbitrary scenes requires much more information.
To extract this information, the researchers captured pore-level reflectance maps of 200 people’s faces using an LED sphere rig with 168 lights and 9 DSLR cameras. They then used the captured data to train an AI model, GANFIT, to synthesize realistic maps from the textures while optimizing for the “identity match” between the rendering and output.
Like other generative adversarial networks (GANs), GANFIT is a two-part model consisting of a generator that creates samples and a discriminator that attempts to differentiate between the generated samples and real-world samples. Both the generator and discriminator improve in their respective abilities until the discriminator is unable to tell the real examples from the synthesized examples with better than the 50% accuracy expected of chance.
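To make the 50% figure concrete, here is a minimal sketch in Python with NumPy. It is not GANFIT's training code: the function names and the sample scores are hypothetical, chosen only for illustration. It computes the standard cross-entropy losses used to train GANs and shows that a discriminator that scores every sample at 0.5 is right exactly half the time.

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Cross-entropy the discriminator minimizes.

    d_real / d_fake are the discriminator's "this is real" probabilities
    for real samples and for generated samples, respectively.
    """
    return float(-(np.log(d_real).mean() + np.log(1.0 - d_fake).mean()))

def generator_loss(d_fake):
    """A common (non-saturating) generator loss: low when fakes are scored as real."""
    return float(-np.log(d_fake).mean())

def discriminator_accuracy(d_real, d_fake, threshold=0.5):
    """Fraction of samples labeled correctly (real if score > threshold)."""
    correct = (d_real > threshold).sum() + (d_fake <= threshold).sum()
    return correct / (d_real.size + d_fake.size)

# Hypothetical scores early in training: the discriminator separates the two sets easily.
early_real = np.array([0.90, 0.80, 0.95])
early_fake = np.array([0.10, 0.20, 0.05])
early_acc = discriminator_accuracy(early_real, early_fake)
early_loss = discriminator_loss(early_real, early_fake)

# Hypothetical scores at the ideal end point: every sample gets 0.5.
chance_real = np.full(3, 0.5)
chance_fake = np.full(3, 0.5)
chance_acc = discriminator_accuracy(chance_real, chance_fake)
chance_loss = discriminator_loss(chance_real, chance_fake)

print(f"early:  accuracy={early_acc:.2f}, discriminator loss={early_loss:.3f}")
print(f"chance: accuracy={chance_acc:.2f}, discriminator loss={chance_loss:.3f}")
```

In the second case the discriminator loss equals 2 ln 2, about 1.386, which is the value it takes when the discriminator can do no better than guessing.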
Another component of the AvatarMe pipeline enhanced the textures’ resolutions, while a third removed the “baked” lighting GANFIT introduced. A separate module then predicted the per-pixel reflectivity of skin structures such as pores, wrinkles, and hair from the illuminated texture, capturing surface details as fine as small wrinkles and scars.
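Why per-pixel reflectivity matters for relighting can be sketched with a textbook shading model. The Python snippet below uses Blinn-Phong shading, which is a generic stand-in and not the renderer described in the paper. The map names, the tiny 2-by-2 texture, and the shininess value are all assumptions made for illustration. Given diffuse color, specular strength, and a surface normal for each pixel, the same face data can be lit from any direction.

```python
import numpy as np

def normalize(v):
    """Scale vectors along the last axis to unit length."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def shade(diffuse_albedo, specular_albedo, normals, light_dir, view_dir, shininess=32.0):
    """Blinn-Phong shading from per-pixel maps (illustrative only).

    diffuse_albedo:  (H, W, 3) base color per pixel
    specular_albedo: (H, W)    highlight strength per pixel
    normals:         (H, W, 3) surface orientation per pixel
    """
    n = normalize(normals)
    l = normalize(np.asarray(light_dir, dtype=float))
    v = normalize(np.asarray(view_dir, dtype=float))
    h = normalize(l + v)                      # half vector between light and viewer
    n_dot_l = np.clip(n @ l, 0.0, 1.0)        # (H, W) diffuse falloff
    n_dot_h = np.clip(n @ h, 0.0, 1.0)        # (H, W) highlight alignment
    diffuse = diffuse_albedo * n_dot_l[..., None]
    lit = (n_dot_l > 0)[..., None]            # no highlight on unlit pixels
    specular = (specular_albedo * n_dot_h ** shininess)[..., None] * lit
    return np.clip(diffuse + specular, 0.0, 1.0)

# Hypothetical 2x2 patch: mid-gray skin, mild highlights, facing the camera (+z).
diffuse_map = np.full((2, 2, 3), 0.5)
specular_map = np.full((2, 2), 0.2)
normal_map = np.zeros((2, 2, 3))
normal_map[..., 2] = 1.0

view = (0.0, 0.0, 1.0)
front = shade(diffuse_map, specular_map, normal_map, (0.0, 0.0, 1.0), view)
angled = shade(diffuse_map, specular_map, normal_map, (1.0, 0.0, 1.0), view)
side = shade(diffuse_map, specular_map, normal_map, (1.0, 0.0, 0.0), view)

print(front.mean(), angled.mean(), side.mean())
```

The patch is brightest when lit head-on, dimmer under the angled light, and black when the light grazes it from the side. A texture with lighting baked in could not respond this way, which is why the pipeline strips that lighting out before predicting reflectivity.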
The researchers say that in experiments, AvatarMe didn’t produce any artifacts in the final renderings and successfully handled “extreme” poses and occlusions like sunglasses. The reflectivity was consistent, and even in different environments, the system “realistically” illuminated subjects.
AvatarMe isn’t without its limitations. The training data set doesn’t contain many examples of subjects from certain ethnicities, leading to poor performance when it attempts to reconstruct the faces of darker-skinned subjects. And the facial reconstruction isn’t completely independent of the input photograph — well-lit, higher-resolution images produce more accurate results. Nevertheless, the coauthors assert it’s the first method to achieve “rendering-ready” faces with any portrait image, including black-and-white and hand-drawn sketches.
AvatarMe is only the latest art-generating AI system to automate what was previously a manual process. Startup Promethean AI employs machine learning to help human artists create art for video games. Nvidia researchers recently demonstrated a generative model that can create virtual environments using video snippets. Elsewhere, machine learning has been used to rescue old game textures in retro titles like Final Fantasy VII and The Legend of Zelda: Twilight Princess and generate thousands of levels in games like Doom from scratch.