John DiMarco on Computing (and occasionally other things)
I welcome comments by email to jdd at cs.toronto.edu.

Mon 26 Aug 2024 14:42

Social implications of AI-generated deepfakes

[AI-generated photo-realistic portrait of a brown-haired, brown-eyed young man looking forward. Image by Juan Agustín Correa Torrealba (Noes_Cucho) from Pixabay.]
The ability of modern AI techniques to create artificial pieces of digital media (images, video, audio) that are almost indistinguishable from actual photographs and recordings is already remarkable, and it continues to improve. Modern AI software can create images, text, video and audio of people who do not actually exist but who look and sound completely real. Or it can create media of existing people saying and doing things they never said or did. Digital media of places, people and circumstances that are not real can be made to seem as if they were recordings. While these can still be distinguished from true recordings, it increasingly takes an expert to do so: the signs that a piece of digital media is AI-generated rather than a recording grow ever more subtle.

There are many useful consequences of this new AI capability, of course. While it has long been possible for a single person to write a novel alone, with AI-powered software a person can now make a film single-handedly, using AI-generated actors, scenes, settings, and dialogue. AI generation can also be used to create recording substitutes for circumstances too dangerous or difficult to create and record in real life, such as virtual training for emergency situations. Moreover, situations of historical interest from the past (before recording was possible) can be simulated through AI, for education and study. The ability to recreate and examine scenarios through AI simulation can also be useful for investigative and legal work. Many other useful consequences of high-quality AI-generated digital media can be imagined, and increasingly realized.

But the ability to create AI-generated facsimiles indistinguishable from actual recordings is also socially destructive, because it makes it very easy to lie in a way that cannot easily be distinguished from truth. An AI-created or AI-edited facsimile intended to look real is called a deepfake. This term was originally used to describe AI-modified media (often pornography) in which one person's face (typically that of a celebrity) is AI-edited onto another's body. But a deepfake need not merely be an AI edit of a real recording: any AI-generated piece of media that can pass for a recording is a deepfake.

While lies have always been possible, creating and maintaining a convincing lie is difficult, and so convincing lies are relatively rare. This is a good thing, because the general functioning of society depends on the ability to tell true from false. The economy depends on being able to detect fraud and scams. The legal system relies on discerning true from false claims in order to deliver justice. Good decision-making relies on being able to distinguish falsehoods from facts. Accountability for persons in authority relies on being able to examine and verify their integrity, to establish trustworthiness. The same is true for institutions. Medicine requires the ability to distinguish true from false claims about health, disease, treatments, and medicines. And of course the democratic process relies upon voters being generally able to make good and well-informed voting decisions, something not possible if truth cannot be distinguished from fiction.

To get an idea of the progression of AI technology for creating deepfakes, consider video conferencing. Normally a video conference isn't fake: it's a useful way of communicating over a network using camera and microphone, and few would normally wonder whether what they are hearing and seeing from remote participants is genuine. But AI technology for modifying what people hear and see has been advancing, and the era of a live deepfake video conference is not far off. Let's take a look.

One practical issue with a video-conferencing camera feed has been the fact that the camera picks up more than just the person's face: it picks up the background too, which may not present a professional image. It has long been possible to use a static image for one's video-conferencing background, typically as a convenient social fiction to maintain a degree of professionalism. In 2020, Zoom, one of the most popular videoconferencing platforms, introduced video backgrounds, more plausible than static images because things in them can be seen moving in natural ways. AI powers the real-time processing that stitches together the moving background and the person's camera feed. This video background feature is often used in creative and fun ways: to pretend to be at a tropical beach with waving palm fronds and gentle waves; to be in a summer cabin with open windows, curtains blowing in the breeze; or even to be in outer space, complete with shooting stars. Yet this technology makes it possible to create a convincing misrepresentation of where one is, and no doubt an enterprising tele-worker or two, expected to be at the office, has used a video of their office as a video-conferencing background while they were elsewhere.
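
To get a feel for the mechanics, here is a minimal sketch of real-time background replacement in Python, using OpenCV and the open-source MediaPipe selfie-segmentation model. This illustrates the general technique, not Zoom's proprietary implementation, and the file name beach.mp4 is a placeholder:

    # Sketch: a "tropical beach" virtual background, in the spirit of
    # Zoom's feature. MediaPipe segments the person from the background;
    # OpenCV composites the camera feed over a looping background video.
    import cv2
    import mediapipe as mp
    import numpy as np

    segmenter = mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1)
    camera = cv2.VideoCapture(0)                 # the real webcam feed
    background = cv2.VideoCapture("beach.mp4")   # placeholder background clip

    while True:
        ok, frame = camera.read()
        if not ok:
            break
        ok_bg, bg_frame = background.read()
        if not ok_bg:                            # loop the clip when it ends
            background.set(cv2.CAP_PROP_POS_FRAMES, 0)
            _, bg_frame = background.read()
        bg_frame = cv2.resize(bg_frame, (frame.shape[1], frame.shape[0]))

        # The model yields a per-pixel probability that the pixel is the person.
        result = segmenter.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        person = result.segmentation_mask > 0.5

        # Composite: person pixels from the camera, the rest from the clip.
        output = np.where(person[..., None], frame, bg_frame)
        cv2.imshow("virtual background", output)
        if cv2.waitKey(1) == 27:                 # Esc quits
            break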

A significant new AI-generated video-conferencing capability became generally available in early 2023, when Nvidia released video-conferencing software with an eye contact effect: a person's video feed is AI-edited in real time to make it look as if the person is always looking directly at the camera, even when they are looking somewhere else. It is strikingly convincing. While the purpose of this software is to help a speaker maintain eye contact when they are in fact reading what they are saying (e.g. from a teleprompter), it turns out to be quite useful for disguising the fact that one is reading one's email while appearing to give the speaker one's full attention. Even though this technology edits only the eyes in real time, that is often quite sufficient to misrepresent in a video conference what one is doing.

A little over a year later, in August 2024, a downloadable AI deepfake software package, Deep-Live-Cam, received considerable attention on social media. This AI software allows the video-conferencing user to replace their own face on their video feed with the face of another. While this has been used by video bloggers as a sort of "fun demo", where they vlog themselves with the face of Elon Musk, for example, the effect can be surprisingly convincing. It is AI-driven technology that makes it possible to misrepresent in a video conference who one is. In fact, one can use it to "become" someone who does not even exist, because realistic AI-generated faces of non-existent people are readily available, and one can use this software to project such a face onto one's own.
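
To appreciate how little code the core of such a face swap now requires, here is a rough sketch using the open-source insightface library, whose "inswapper" model underlies several of these tools. The file names are placeholders, the model weights must be obtained separately, and real tools like Deep-Live-Cam wrap this in a live camera loop with added face enhancement:

    # Sketch: replace every face in one video frame with a source face.
    # Assumes the inswapper_128.onnx weights are available locally.
    import cv2
    import insightface
    from insightface.app import FaceAnalysis

    detector = FaceAnalysis(name="buffalo_l")    # face detection + identity embeddings
    detector.prepare(ctx_id=0, det_size=(640, 640))
    swapper = insightface.model_zoo.get_model("inswapper_128.onnx")

    source = cv2.imread("source_face.jpg")       # the face to impersonate
    frame = cv2.imread("video_frame.jpg")        # one frame of a video feed

    source_face = detector.get(source)[0]        # identity of the source face
    for target_face in detector.get(frame):      # each face found in the frame
        # Re-render the target face with the source identity, blended back in.
        frame = swapper.get(frame, target_face, source_face, paste_back=True)

    cv2.imwrite("swapped_frame.jpg", frame)

Run on every frame of a webcam feed, this is essentially what lets a vlogger appear with the face of Elon Musk in real time.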

This is still video conferencing, though. Perhaps the person can appear to be somewhere other than where they really are, or appear to be looking at you when they are not, or even appear to be a different person; but there is still a human being behind the camera. With a large language model and suitable AI software, however, it will soon be possible, if it is not already, to create an entirely deep-faked real-time video-conference attendee: AI-generated audio and video, driven by a model such as GPT to simulate conversation. Let's put aside for the moment how such a thing might be useful. Consider instead the possibility that an AI-generated simulacrum might not easily be distinguishable from an actual person. That raises a general question: if deepfakes become so good that they cannot be told apart from the real thing, what happens to society?
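
To see why this is plausible, consider how the audio half of such a simulacrum could be assembled today from off-the-shelf pieces. The sketch below uses OpenAI's public API for transcription, conversation, and speech synthesis; the model names reflect that API at the time of writing, the "Alex" persona is invented, and a complete attendee would also need a lip-synced AI-generated video feed:

    # Sketch: the audio loop of a fully synthetic meeting attendee.
    # Hear what was said, ask an LLM for an in-character reply, speak it.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    history = [{"role": "system",
                "content": "You are Alex, a project manager attending a "
                           "status meeting. Answer briefly and naturally."}]

    def respond_to(audio_path: str) -> bytes:
        # 1. Speech to text: what did the other participants just say?
        with open(audio_path, "rb") as f:
            heard = client.audio.transcriptions.create(model="whisper-1", file=f)
        history.append({"role": "user", "content": heard.text})

        # 2. LLM: compose a plausible reply in character.
        reply = client.chat.completions.create(model="gpt-4o", messages=history)
        text = reply.choices[0].message.content
        history.append({"role": "assistant", "content": text})

        # 3. Text to speech: say it in a consistent synthetic voice.
        speech = client.audio.speech.create(model="tts-1", voice="alloy", input=text)
        return speech.read()  # audio bytes for a virtual microphone device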

A set of possible consequences to society of AI-generated deepfakes that are too difficult to tell from the real thing is articulated in a 2018 paper by Robert Chesney and Danielle Keats Citron, circulated as a University of Texas School of Law research paper and later published in the California Law Review. Essentially, deepfakes make good lies: too good! If deep-faked falsehoods can be successfully misrepresented as genuine, they will be believed. Moreover, if they are difficult to distinguish from the truth, even genuine content will be more readily disbelieved. This is called the liar's dividend: the ability of a liar to misrepresent true content as false. Liars can use convincing lies to make it much less likely that the truth, when it appears, will be believed. If such lies become abundant, people may well become generally unable to tell true from false.

In economics, a situation in which customers cannot tell the difference between a good and a bad product is called a market for lemons. This concept comes from George Akerlof's seminal 1970 paper, "The Market for 'Lemons'", in which he studied the used-car market. A used car that is unreliable is called a lemon. Akerlof showed that if used-car purchasers cannot tell whether a used car is a lemon, all the used cars offered for sale will tend to be lemons. The reasoning is that a reliable used car is worth more than a lemon, but if purchasers cannot tell the difference, they will not pay more for it. If a seller tries to sell a reliable used car, they will not be able to receive full fair value for it. The only sort of used car for which the seller can receive fair value is a lemon. So sellers will keep their reliable used cars, and sell their lemons. Thus only lemons will be generally available for sale.
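
Akerlof's unraveling argument can be made concrete with a little arithmetic; the numbers below are invented for illustration:

    # Toy illustration of Akerlof's market for lemons.
    good_value, lemon_value = 10_000, 4_000  # what each kind of car is truly worth
    share_good = 0.5                         # half the cars start out reliable

    # Buyers can't tell the cars apart, so they offer the expected value.
    offer = share_good * good_value + (1 - share_good) * lemon_value
    print(f"Initial offer: ${offer:,.0f}")   # $7,000

    # No owner accepts $7,000 for a $10,000 car, so good cars are withdrawn,
    # only lemons remain for sale, and rational buyers revise downward.
    offer = lemon_value
    print(f"Offer once good cars withdraw: ${offer:,.0f}")  # $4,000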

A world in which people cannot tell the difference between digital media that is a true recording and media that is an AI-generated fabrication is a market for lemons, or rather, a market for fabrications. Just as sellers will not sell a reliable used car because they would not get fair value for it, so truth-tellers will not speak if they will not be believed. Nobody wants to be the Cassandra of Greek myth, blessed with the gift of prophecy yet cursed never to be believed. The drying-up of true content will have the effect that digital media channels, even ones generally considered trustworthy today, will become increasingly dominated by deepfakes, so much so that they will no longer be useful for disseminating true and genuine things. While it is not yet clear whether things will go so far as to make even video conferencing generally untrustworthy, the ready availability of powerful AI software to create convincing fakes will be consequential. The social disruption it will create will no doubt be significant. As this AI technology progresses, it is a good bet, I think, that we will see an increasing reliance on unfakeable in-person interactions in situations where authentication, and authenticity, are important.

/it permanent link


Blosxom