Mon 26 Aug 2024 14:42
Social implications of AI-generated deepfakes
There are many useful consequences of this new AI capability, of course. While it has long been possible for a single person to write a novel alone, with AI-powered software a person can now also make a film single-handedly, using AI-generated actors, scenes, settings, and dialogue. AI generation can also stand in for recordings of circumstances too dangerous or difficult to stage and record in real life, such as virtual training for emergency situations. Moreover, situations of historical interest from before recording was possible can be simulated through AI, for education and study. The ability to recreate and examine scenarios through AI simulation can also be useful for investigative and legal work. Many other useful consequences of high-quality AI-generated digital media can be imagined, and increasingly realized.
But the ability to create AI-generated facsimiles indistinguishable from actual recordings is also socially destructive, because it makes it easy to produce lies that cannot readily be distinguished from the truth. An AI-created or AI-edited facsimile intended to look real is called a deepfake. This term was originally used to describe AI-modified media (often pornography) where one person's face (typically that of a celebrity) is AI-edited onto another's body. But a deepfake need not merely be an AI edit of a real recording: any AI-generated piece of media that can pass for a recording is a deepfake.
While lies have always been possible, creating and maintaining a convincing lie is difficult, and so convincing lies are relatively rare. This is a good thing, because the general functioning of society depends on the ability to tell true from false. The economy depends on being able to detect fraud and scams. The legal system relies on discerning true from false claims in order to deliver justice. Good decision-making relies on being able to distinguish falsehoods from facts. Accountability for persons in authority relies on being able to examine and verify their integrity, to establish trustworthiness. The same is true for institutions. Medicine requires the ability to distinguish true from false claims about health, disease, treatments, and medicines. And of course the democratic process relies upon voters being generally able to make good and well-informed voting decisions, something not possible if truth cannot be distinguished from fiction.
To get an idea of the progression of AI technology for creating deepfakes, let's look at video conferencing. Normally a video conference isn't fake: it's a useful way of communicating over a network using camera and microphone, and few participants would stop to wonder whether what they are hearing and seeing from remote users is actually genuine. But AI technology for modifying what people hear and see has been advancing, and the era of a live deepfake video conference is not far off. Let's take a look.
One practical issue with a video-conferencing camera feed has been that the camera picks up more than just the person's face: it picks up the background too, which may not present a professional image. It has long been possible to use a static image as one's video-conferencing background, typically as a convenient social fiction to maintain a degree of professionalism. In 2020, Zoom, one of the most popular video-conferencing platforms, introduced video backgrounds, which make the illusion more plausible because things in the scene can be seen moving in natural ways. AI powers the real-time processing that separates the person from their surroundings and stitches the moving background into the camera feed. The feature is often used in creative and fun ways: to pretend to be at a tropical beach with waving palm fronds and gentle waves; to be in a summer cabin with open windows, curtains blowing in the breeze; or even to be in outer space, complete with shooting stars. Yet the same technology makes it possible to create a convincing misrepresentation of where one is, and no doubt an enterprising tele-worker or two, expected to be at the office, has recorded a video of their office and used it as a background while working from somewhere else.
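As a rough illustration of the underlying idea, the sketch below composites a live camera feed over a moving background clip using person segmentation. It assumes the open-source MediaPipe and OpenCV Python libraries and an arbitrary background video file ("beach.mp4" is a placeholder); it is, of course, not Zoom's own implementation.

    # Minimal sketch of real-time virtual-background compositing.
    # Assumes MediaPipe's selfie-segmentation solution and OpenCV.
    import cv2
    import numpy as np
    import mediapipe as mp

    segmenter = mp.solutions.selfie_segmentation.SelfieSegmentation(model_selection=1)
    camera = cv2.VideoCapture(0)                  # live camera feed
    background = cv2.VideoCapture("beach.mp4")    # placeholder background clip

    while True:
        ok, frame = camera.read()
        if not ok:
            break
        ok_bg, bg_frame = background.read()
        if not ok_bg:                             # loop the background clip
            background.set(cv2.CAP_PROP_POS_FRAMES, 0)
            _, bg_frame = background.read()
        bg_frame = cv2.resize(bg_frame, (frame.shape[1], frame.shape[0]))

        # Per-pixel estimate of whether the pixel belongs to the person.
        result = segmenter.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        mask = result.segmentation_mask[..., None]   # (H, W, 1), values in [0, 1]

        # Keep the person, replace everything else with the moving background.
        composite = (mask * frame + (1.0 - mask) * bg_frame).astype(np.uint8)
        cv2.imshow("virtual background", composite)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    camera.release()
    background.release()
    cv2.destroyAllWindows()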
A significant new AI video-conferencing capability became generally available in early 2023, when Nvidia added an eye contact effect to its Broadcast streaming and video-conferencing software. This is a feature whereby a person's video feed is AI-edited in real time to make it look as if the person is always looking directly at the camera, even if the person is looking somewhere else. It is strikingly convincing. While the purpose of this software is to help a speaker maintain eye contact when they are in fact reading what they are saying (e.g. using a teleprompter), it turns out to be quite useful for disguising the fact that one is reading one's email while appearing to give one's full attention to the speaker. Even though this technology only edits the eyes in real time, that is often quite sufficient to misrepresent, in a video conference, what one is doing.
About a year and a half later, in August 2024, a downloadable AI deepfake software package, Deep-Live-Cam, received considerable
attention on social media. This AI software allows the video-conferencing
user to replace their own face on their video feed with the face of
another. While this has been used by video bloggers as a sort of "fun
demo", where they vlog themselves with the face of Elon Musk, for example,
the effect can be surprisingly convincing. It is AI-driven technology that
makes it possible to misrepresent in a video conference who one is. In
fact, one can use it to "become" someone who does not even exist: realistic AI-generated faces of non-existent people are readily available, and the software can project such a face onto one's own.
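To give a sense of how such a live face swap can be assembled from openly available parts, here is a rough sketch of the general approach (not Deep-Live-Cam's own code). It assumes the open-source insightface library, its "buffalo_l" face-analysis models, its "inswapper_128" face-swapping model (which must already be downloaded), and OpenCV; the image file name is a placeholder.

    # Rough sketch of a live face swap on a webcam feed.
    import cv2
    import insightface
    from insightface.app import FaceAnalysis

    analyzer = FaceAnalysis(name="buffalo_l")          # face detection + embedding
    analyzer.prepare(ctx_id=0, det_size=(640, 640))
    swapper = insightface.model_zoo.get_model("inswapper_128.onnx")

    # The face to project onto one's own (assumes exactly one face in the photo).
    source_img = cv2.imread("someone_else.jpg")        # placeholder file
    source_face = analyzer.get(source_img)[0]

    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for face in analyzer.get(frame):               # every face in the camera feed
            frame = swapper.get(frame, face, source_face, paste_back=True)
        cv2.imshow("face swap", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()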
This is still video-conferencing, though. Perhaps the person can appear to be somewhere other than where they really are, or they can appear to be looking at you when they are not, or they can even appear to be a different person. But there is still a human being behind the camera. With a large language model and suitable AI software, however, it will soon be possible, if it is not already, to create an entirely deep-faked real-time video-conference attendee, with AI-generated audio and video and a model such as GPT simulating the conversation. Let's put aside
for the moment thinking about how such a thing might be useful. Consider
instead the possibility that an AI-generated simulacrum might not easily
be distinguishable from an actual person. That raises a general question:
if deepfakes become so good that they cannot be told apart from the real
thing, what happens to society?
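For concreteness, here is the bare-bones shape such an attendee could take, as a conceptual skeleton only: the listening and speaking/animating functions are purely hypothetical placeholders, and only the language-model call assumes a real library (the OpenAI Python SDK; the model name is an assumption).

    # Conceptual skeleton of a fully AI-generated meeting attendee.
    from openai import OpenAI

    client = OpenAI()
    history = [{"role": "system",
                "content": "You are 'Alex', a colleague in a routine status meeting."}]

    def transcribe_from_meeting() -> str:
        """Hypothetical: capture the meeting audio and return it as text."""
        raise NotImplementedError

    def speak_and_animate(reply_text: str) -> None:
        """Hypothetical: synthesize a cloned voice, drive a talking-head video,
        and feed both into a virtual microphone and camera."""
        raise NotImplementedError

    while True:
        heard = transcribe_from_meeting()             # speech-to-text
        history.append({"role": "user", "content": heard})
        reply = client.chat.completions.create(       # LLM decides what to say
            model="gpt-4o", messages=history)
        text = reply.choices[0].message.content
        history.append({"role": "assistant", "content": text})
        speak_and_animate(text)                       # text-to-speech + fake video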
A set of possible consequences for society of AI-generated deepfakes that are too difficult to tell from the real thing is articulated in a paper by Robert Chesney and Danielle Keats Citron, "Deep Fakes: A Looming Challenge for Privacy, Democracy, and National Security," circulated in 2018 as a University of Texas law research paper. Essentially, deepfakes make good lies: too good! If deep-faked falsehoods can be successfully misrepresented as genuine, they will be believed. Moreover, if they are difficult to distinguish from the truth, even genuine content will be more readily disbelieved. This is called the liar's dividend: the ability of a liar to dismiss genuine content as fake. Liars can use convincing lies to make it much less likely that
the truth, when it appears, will be believed. If such lies become abundant,
people may well become generally unable to tell true from false.
In economics, a situation in which customers cannot tell the difference
between a good and a bad product is called a market for lemons. This
concept comes from George Akerlof's seminal 1970 paper, "The Market for 'Lemons': Quality Uncertainty and the Market Mechanism," in which he studied the used car market. A used car that turns out to be unreliable is called a lemon. Akerlof showed
that if used car purchasers cannot tell if a used car is a lemon, all the
used cars offered for sale will tend to be lemons. The reasoning is that
a reliable used car is worth more than a lemon, but if purchasers cannot
tell the difference, they will not pay more for it. If a seller tries to
sell a reliable used car, they will not be able to receive full fair value
for it. The only sort of used car for which the seller can receive fair
value is a lemon. So sellers will keep their reliable used cars, and sell
their lemons. Thus only lemons will be generally available for sale.
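To see the adverse-selection logic in miniature, here is a toy numerical sketch; the dollar figures are invented purely for illustration.

    # Toy illustration of Akerlof's adverse-selection spiral; all numbers invented.
    good_value = 10_000   # what a reliable used car is really worth
    lemon_value = 4_000   # what a lemon is really worth
    share_good = 0.5      # buyers' initial belief about the share of reliable cars

    # Buyers who cannot tell the two apart will only pay the expected value.
    offer = share_good * good_value + (1 - share_good) * lemon_value
    print(offer)          # 7000.0: less than a reliable car is worth

    # At that price, owners of reliable cars refuse to sell, so the share of
    # reliable cars on the market falls, buyers lower their offer further, and
    # in the limit only lemons remain for sale, at the lemon price.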
A world in which people cannot tell the difference between digital
media that is a true recording and media that is an AI-generated
fabrication is a market for lemons, or rather, a market
for fabrications. Just as sellers will not sell a reliable used
car because they would not get fair value for it, so truth-tellers
will not speak if they will not be believed. Nobody wants to be the Cassandra of Greek myth, blessed with the gift of prophecy yet cursed never to be believed. The drying-up of true content will have the effect that
digital media channels, even ones generally considered trustworthy today,
will become increasingly dominated by deepfakes, so much so that they will no longer be useful for disseminating true and genuine things. While it is not
yet clear whether things will go so far as to make even video-conferencing
generally untrustworthy, the ready availability of powerful AI software to
create convincing fakes will be consequential. The social disruption it will
create will no doubt be significant. As this AI technology progresses,
it is a good bet, I think, that we will see an increasing reliance on
unfakeable in-person interactions in situations where authentication and authenticity are important.