
NVIDIA — Advancing Digital Media Manipulation & Flying into the Sun

Mike Flanagan


Researchers at NVIDIA just developed a new technique that takes video-stream compression, manipulation, and deepfakes to a new level. Dubbed One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing, the GAN-based model will be presented at the 2021 Conference on Computer Vision and Pattern Recognition (CVPR) in June.

The model improves video compression quality and enables manipulation of head rotation.

Applied to video conferencing, the new technique requires far less information to be transmitted than previous compression techniques. Compared with H.264, the compression format that is standard for HD web video, the new approach sends the source image once and afterwards only a small, fixed amount of pose and expression data per frame, while H.264 must encode pixel data for the entire duration of the stream; according to the whitepaper, that costs roughly ten times the bandwidth for comparable visual quality.
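To put that ten-to-one figure in perspective, here is a quick back-of-envelope calculation; the 2 Mbps H.264 baseline is an assumed number for illustration, not a figure from the paper.

```python
# Back-of-envelope bandwidth comparison. The H.264 bitrate is an assumed,
# illustrative figure; only the roughly 10x ratio comes from the paper.
H264_BITRATE_BPS = 2_000_000                  # assumed ~720p video call
NEURAL_BITRATE_BPS = H264_BITRATE_BPS / 10    # the paper's claimed reduction

def megabytes(bitrate_bps: float, minutes: int) -> float:
    """Total transfer for a call of the given length, in megabytes."""
    return bitrate_bps * minutes * 60 / 8 / 1_000_000

print(f"H.264:  {megabytes(H264_BITRATE_BPS, 30):.0f} MB for a 30-minute call")
print(f"Neural: {megabytes(NEURAL_BITRATE_BPS, 30):.0f} MB for a 30-minute call")
```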

How it Works

Using a GAN framework, the model takes the first image of a video, along with the facial expressions and head-pose changes that occur over the duration of the video, then discards everything except that first still image and the compact motion data. The head pose may be adjusted, and the transmitted source image may be replaced with a photograph of another person entirely, allowing for motion transfer, where the captured motion data is applied to whoever is in that photograph.
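As a rough illustration of that flow, here is a minimal, runnable sketch of the sender and receiver sides; extract_keypoints and generate_frame are trivial stand-ins for the paper's keypoint detector and GAN generator, not NVIDIA's actual API.

```python
# Minimal sketch of the one-shot talking-head pipeline. The two helpers
# are trivial stand-ins for the paper's keypoint detector and GAN
# generator, included only so the data flow below is runnable.

def extract_keypoints(frame):
    # Stand-in: a real detector returns a handful of learned 3-D keypoints.
    return [0.0] * 60

def generate_frame(source_image, keypoints):
    # Stand-in: the real generator warps the source image to the new pose.
    return source_image

def sender(video_frames):
    """Transmit the source image once, then only compact per-frame keypoints."""
    yield ("image", video_frames[0])                   # full image, sent once
    for frame in video_frames[1:]:
        yield ("keypoints", extract_keypoints(frame))  # a few dozen floats

def receiver(stream):
    """Reconstruct the video from the source image plus the keypoint stream."""
    _, source_image = next(stream)
    for _, keypoints in stream:
        yield generate_frame(source_image, keypoints)

frames = ["frame0", "frame1", "frame2"]                # toy stand-in video
reconstructed = list(receiver(sender(frames)))
```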


A GAN, or generative adversarial network, is a class of machine learning framework in which two neural networks contest each other in a zero-sum game, which allows the model to learn to generate new data with the same statistics as a training dataset. GANs have proven useful for many applications in image and digital-media creation and detection.
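For readers new to the idea, here is a minimal toy adversarial training loop in PyTorch; the data, networks, and hyperparameters are stand-ins chosen for brevity and have nothing to do with the talking-head model itself.

```python
import torch
import torch.nn as nn

# Toy GAN on 2-D points: illustrates the generic two-network zero-sum
# game, not the talking-head architecture from the paper.
latent_dim, data_dim, batch = 8, 2, 64
G = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(), nn.Linear(32, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(batch, data_dim) + 3.0   # stand-in "real" data
    fake = G(torch.randn(batch, latent_dim))

    # Discriminator step: label real samples 1 and generated samples 0.
    d_loss = bce(D(real), torch.ones(batch, 1)) + \
             bce(D(fake.detach()), torch.zeros(batch, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator label fakes as real.
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```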

This is not the first GAN-based system that can synthesize talking-head video from a single still picture. In 2019, Samsung researchers demonstrated similar technology.

Samsung researchers demonstrated GAN-based still-photo video synthesis in 2019.

Improving video compression through video reconstruction is not the only application. The technology also supports manually editing head rotation, redirecting the filmed subject's attention towards the camera, and motion transfer. Let's break these down a little:

Head Rotation

The technology enables manually editing the rotation of a filmed or photographed subject's head. Manipulating the pitch, yaw, and roll is as simple as adjusting the brightness, contrast, and saturation of an image in any photo-editing software: click and drag a slider in a straightforward user interface.

A remarkably straightforward and effective head rotation tool
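For some intuition about what those sliders do: the paper represents the face with learned 3-D keypoints, so changing the head pose amounts to applying a rotation matrix to them. The NumPy sketch below shows that geometry under one common Euler-angle convention, with random stand-in keypoints.

```python
import numpy as np

def rotation_matrix(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Compose a 3-D rotation from Euler angles in radians (one common
    convention; axis assignments vary between implementations)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    R_yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])    # turning
    R_pitch = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])  # nodding
    R_roll = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])   # tilting
    return R_yaw @ R_pitch @ R_roll

# Random stand-in keypoints; in the model these come from a learned detector.
keypoints = np.random.rand(20, 3)
# "Drag the yaw slider": turn the head 15 degrees to the side.
rotated = keypoints @ rotation_matrix(np.radians(15), 0.0, 0.0).T
```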

Face Frontalization

Building on head rotation, the model can correct the subject's eyeline, or gaze, so that it is directed towards the viewer, simulating face-to-face conversation. Put simply, if the video subject is engaging with a questioner off-camera, or looking at a monitor that is not in line with the camera recording them on a Zoom call, face frontalization makes it appear that the subject is speaking directly to the viewer.

The model allows for superior face frontalization compared to other methods
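Seen this way, frontalization is just the head-rotation edit with the target pose pinned to zero. A minimal sketch, with estimate_pose and synthesize as stand-ins for the real networks:

```python
def estimate_pose(frame):
    # Stand-in: a real estimator returns expression keypoints plus the
    # head's yaw, pitch, and roll.
    return [0.0] * 60, (0.4, -0.1, 0.0)

def synthesize(source_image, expression, yaw, pitch, roll):
    # Stand-in for the generator, which re-renders the face at the
    # requested head rotation.
    return source_image

def frontalize(source_image, driving_frame):
    """Keep the expression, but zero the head pose so the subject
    appears to look straight into the camera."""
    expression, _original_pose = estimate_pose(driving_frame)
    return synthesize(source_image, expression, yaw=0.0, pitch=0.0, roll=0.0)
```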

There are some issues with artifacts in the example results, where distortion can be seen, along with some motion that translates to jumpiness; the resulting video does not feel 100% authentic. These artifacts, however, mostly occur around features that obstruct the subject's head, and they are minor compared with earlier deep learning methods for face frontalization.

Motion Transfer

Comparison of methods for transferring movement onto a photo subject

Hello, deepfakes. The technology allows an individual to take a single photo of another person's bust and transfer their own movement and speaking gestures onto the photographed person. The method is not perfect for this application, with some distortion, a touch of uncanny valley, and a restriction to static backgrounds, but it is so simply effective that, to my eye, it offers a very good alternative method of deepfaking (is that a verb yet?) another human. It is certainly more effective than the other motion-transfer methods used as comparisons in the paper, FOMM and few-shot vid2vid.
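In code terms, motion transfer is a one-line change to the receiver side of the earlier pipeline sketch: drive the generator with one person's keypoint stream but another person's photo. A sketch reusing those same stand-in helpers:

```python
# Motion transfer: identity comes from a single photo, while pose and
# expression come from the driving video. Reuses the extract_keypoints()
# and generate_frame() stand-ins from the pipeline sketch above.

def transfer_motion(source_photo, driving_frames):
    for frame in driving_frames:
        keypoints = extract_keypoints(frame)           # the driver's motion
        yield generate_frame(source_photo, keypoints)  # the photo's identity
```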

While there are certainly deficiencies in this technology, in most instances you may not catch them on first view. As the comparisons with other tools that attempt to produce the same result show, the new model appears to be the most convincing, across all attributes, by a wide margin.

But What of Icarus?

The ethical implications of deepfake technology have been a concern since its genesis. In the era of post-truth, it is easy to feel a collective sense of doom from technology that enhances the ability to deceive. Fortunately, deepfake technology has not yet caused severe societal damage, and other technologies being developed simultaneously may limit the consequences of deceptive media that could be put to nefarious ends.

NFTs (non-fungible tokens) use blockchain technology to represent unique digital items, files such as art, audio, video, or in-game assets, and can be used as a seal of authenticity. Imagine a future where broadcasting and media companies use NFTs as an assurance of the authenticity of their video. This would create a new standard for confirming claims by checking sources, and it is already happening: Jack Dorsey just sold his first tweet.
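Whatever ledger is used, the basic building block of such an authenticity seal is a cryptographic fingerprint of the media file that anyone can recompute and compare against the published record. A minimal sketch (the file name and published digest are hypothetical):

```python
import hashlib

def media_fingerprint(path: str) -> str:
    """Return the SHA-256 digest of a media file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Verification: recompute the digest and compare it with the one the
# broadcaster published (e.g. inside an NFT's metadata). Names hypothetical.
# assert media_fingerprint("broadcast.mp4") == published_digest
```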

Resources & Further Reading

Ayush Thakur, Two Minute Papers

Research Paper Electronic Pre-print

One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing Whitepaper

Citation

Ting-Chun Wang, Arun Mallya, and Ming-Yu Liu. "One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing." In CVPR, 2021.


Written by Mike Flanagan

Data Scientist. Cyclist. Michaelist.
