Deepfakes

Deepfakes, Pose Detection, and the Death of “Seeing is Believing”

We say seeing is believing, but actually, we are much better at believing than at seeing. In fact, we are seeing what we believe all the time and occasionally seeing what we can’t believe.
– Robert Anton Wilson

If you change the image, you change history.
– Hany Farid

Introduction

The idea of photo manipulation, and even video footage manipulation, is nothing new. In Forrest Gump, Tom Hanks is shown meeting President John F. Kennedy 31 years after the President was assassinated. In the 1999 movie eXistenZ, two characters move through various layers of virtual reality, losing their ability to distinguish what is in the game and what is not. Today, the gap between mere cinematic imagination and reality is narrower than ever.

Photos and footage can have objects cut and pasted into them or removed; faces can be swapped; audio can be replaced; the time and place of where an image or footage was taken can be made to look like they are from another time or place entirely. In the past, credible photo editing required a high level of skill and was a time-consuming process, but “[n]ow the technology is democratizing,” and nearly anyone with enough time, patience, and electricity can do it from their home computer. It is already difficult to tell a real video from a “deepfake,” and that gap is shrinking every day. This creates major problems for government entities, who are not only slow to react to new technology, but are also limited in response level by international legal conventions.

What is a Deepfake and How Does It Work?

A deepfake is a video in which artificial intelligence (AI) is used to replace one or more faces, voices, or even body movements in film footage with a desired alternative, known as the “targeted replacement.” This process is mostly automated and requires only basic coding skills. A hypothetical deepfake creator has several options for downloading free software, which functions as a training algorithm for the AI. The training algorithm tells the AI not only how to replace or swap faces, but also which part of the image is the face, how to adjust for the face moving around the frame within a video, and how to fix some problems with blurred images. Currently, one of the most up-to-date options is the DeepFaceLab program, available via GitHub. FaceSwap and FaceIt are other freely available options.

The deepfake programs run on many operating systems, including Windows, Linux, and MacOS, with the disclaimer that a “modern [graphics processing unit or “GPU”] with CUDA [(Compute Unified Device Architecture)] support” is required for “best performance.” Reddit user “derpfakes” recommends, in their YouTube tutorial on creating deepfakes, that creators use an Nvidia GPU, which is commercially available for as low as $88.69 from Walmart.

Once a creator has downloaded the software, there are many YouTube tutorials available for how to get started in addition to the guided setup available on the download page. First, a number of photos, or still frames from available video, must be loaded into two data set files. One set tells the program which face is the target, and the other the face the target will replace. Depending on the length of the resulting video, the number of facial angles, and types of shot (wide, close up, etc.), different numbers of photos for the target will be required, but a fairly convincing deepfake can be done with as few as 300 images.

The easiest option is to load video footage of the replacement target into the program, which will automatically extract the frames into images. If the video is at a standard frame rate of 24 frames per second, each minute of footage will produce 1,440 still frame images once extracted. Given that a mere 300 images are required for a reasonable fake, each minute of standard video footage provides nearly five times the necessary data. Newer video standards can support as many as 300 frames per second, or 18,000 frames per minute, which makes creating deepfakes of public figures extremely easy. Alternatively, a face-finding program can “automatically [download] training images from Google” and “[serve] as a web engine to perform searches for faces over a user-defined image data set.” Once downloaded, these search results could then theoretically serve as the data set needed for deepfake creation.
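
The frame-rate arithmetic above is easy to verify. A minimal sketch (the 300-image figure is the article’s cited estimate for a passable fake, not a hard requirement):

```python
# Sanity-checking the frame-extraction arithmetic: frames produced per
# minute of footage at a given frame rate, and the footage needed to
# assemble a training data set.

def frames_per_minute(fps):
    """Still frames extracted from one minute of footage at a given frame rate."""
    return fps * 60

def seconds_needed(images, fps):
    """Seconds of footage required to extract a given number of still frames."""
    return images / fps
```

At 24 frames per second, one minute yields 1,440 frames, nearly five times the 300-image minimum, and those 300 images can be pulled from just 12.5 seconds of footage.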

After loading the two data sets into the program files, the automated training process can begin. For a shorter video with a good data set, this process can be accomplished within 48 to 72 hours. Once the data sets have been processed, the next step is to run the facial extraction part of the program on both data sets. The facial extraction tells the AI exactly which elements of each image to replace.

During training, the AI creates a mask of the target face from the facial extraction elements. This mask is then anchored to several points on the face being replaced, in a way that mimics methods used with motion capture and CGI effects for Hollywood blockbuster movies. Once the training is complete, the batch files are converted within another subpart of the program into usable merged frames that can be reassembled into the new deepfake video. For a user without the time or inclination to extract their own data, some face sets are available online, including Elon Musk, Nicolas Cage, and Vladimir Putin.
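
The landmark-anchoring step can be illustrated with a little geometry. The sketch below is illustrative only (real tools fit many more landmark points and non-rigid warps): it estimates the similarity transform — rotation, uniform scale, and translation — that maps a mask’s reference landmarks onto detected landmarks, using the classic trick of treating 2D points as complex numbers so the least-squares fit is a few lines:

```python
# Anchoring a mask to facial landmarks, in miniature: fit the similarity
# transform z -> a*z + b that best maps reference landmarks (src) onto
# detected landmarks (dst). Points are complex numbers (x + y*1j).

def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src landmarks onto dst."""
    n = len(src)
    ms = sum(src) / n
    md = sum(dst) / n
    var = sum(abs(z - ms) ** 2 for z in src)
    cov = sum((z - ms).conjugate() * (w - md) for z, w in zip(src, dst))
    a = cov / var        # one complex number encodes rotation angle and scale
    b = md - a * ms      # translation
    return a, b

def apply_transform(a, b, pts):
    """Warp any mask points into the target frame with the fitted transform."""
    return [a * z + b for z in pts]
```

Once `a` and `b` are fitted from a handful of landmark correspondences, every pixel coordinate of the mask can be warped into the target frame with `apply_transform`.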

There are already different training models available for different types of footage. H64 is “good for straight faces as a demo” and so is H128, with H128 offering “higher resolution and details.” The LIAE model “can partially fix dissimilar face shapes, but results in a less recognizable face.” As of early 2019, the best freely available training model is the flexible SAE, which combines all other models. The different models offer varying options based on available VRAM (the type of RAM used in computer display cards), which allows end-users with different systems to achieve comparable results.

Once the models are trained, the possibilities are limited only by a creator’s imagination. Deepfakes are no longer confined to fakes of existing videos. Pose-detection software creates a wireframe of the desired body movements and renders it into a new video where the target body is synced to match those movements. Combined with systems like ImageNet, a system “which brings together fourteen million photographs of ordinary places and objects . . . posted to Flickr, eBay, and other Web sites,” it is entirely possible to create a video of a person doing or saying anything the creator wants, in a place that person has never been. “This vast archive of the uninteresting has made a new level of synthetic realism possible.”

There is another program that is able to take a still image of someone and have that person appear to speak in lip-sync with some provided audio. This method does not require AI training time on the specific target face or voice and is able to run in real-time. Another option for syncing audio and video is a program called “Out of Time” which claims to “[remove] temporal lags between the audio and visual streams in a video” and “[determine] who is speaking amongst multiple faces in a video.” Adobe’s Project Voco could allow for even better faking of vocal recordings, or be combined with a program like Lyrebird for creation of false dialogue. If the goal is to create a seemingly realistic conversation, then a combination of these programs could theoretically correct syncing issues that might otherwise clue a listener in that the video has been altered, and layer in an artificial approximation of a specific voice.
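
The underlying idea of measuring an audio-visual lag can be sketched with a classical baseline (not the neural embedding approach the quoted program uses): slide one feature track against the other and keep the offset with the highest correlation. The two toy “tracks” below stand in for a mouth-motion signal and an audio-energy signal:

```python
# Estimating audio-visual lag by cross-correlation: try every candidate
# offset within +/- max_lag and return the one where the two feature
# tracks line up best. Positive result means track b lags behind track a.

def best_offset(a, b, max_lag):
    """Delay (in samples) of track b relative to track a."""
    def score(lag):
        return sum(a[i] * b[i + lag]
                   for i in range(len(a)) if 0 <= i + lag < len(b))
    return max(range(-max_lag, max_lag + 1), key=score)

video_track = [0, 1, 0, 0, 0]   # toy mouth-motion signal
audio_track = [0, 0, 0, 1, 0]   # same event, arriving two samples later
```

For these toy tracks the function reports a two-sample delay; a sync-correction tool would then shift the audio by that amount before merging.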

Emerging Online Market

If a user’s system cannot run the program, or the user lacks the Python programming skills to create a deepfake, it is easy to find someone online willing to create one. While many creators in public online spaces are only willing to create comedic replacements of one celebrity for another, it would not be difficult to find someone willing to make less savory videos. There are Reddit forums and private Discord servers for deepfake creators, with nearly 2,000 users subscribed to the “safe for work” deepfake Reddit page.

Outside of forums, some deepfake creators have begun offering their services for minimal fees via various websites. Derpfakes has a Patreon.com profile, where users are able to pledge certain monthly amounts in exchange for tiers of rewards. These reward tiers include tutorials, hands-on assistance, pre-made face sets or data sets, and even monthly deepfake creations. On another “gig” website, Derpfakes offered a personalized deepfake creation of up to one minute, using a minimum of 100 photos, for $20.00.

While internet user Derpfakes’ work was featured by CNN, BBC, and various other news outlets, others offer the same services for comparable prices, without the veneer of respectability. One user offered deepfake creation for $30, with no limit on video length. For $10, another user will put a person’s face into the Shia LaBeouf “Do It” video. Depending on what end result is desired, and how much a person has to spend, it is clear that there are deepfake creators out there willing to make it for them.

Next Stages in Deepfake Technology

Just as fast as the government and academia develop technology to detect deepfakes and protect machine learning for beneficial purposes, the other side works just as quickly to develop more sophisticated fakes that avoid this detection. “While many manipulations are benign, performed for fun or artistic value, others are for adversarial purposes, such as propaganda or misinformation campaigns.” With allegations of Russian interference in the 2016 election still unresolved, the development of deepfake technology in the interim casts a grim shadow on the upcoming 2020 election.

The next stage in this arms race is the development of real-time deepfakes. Researchers developed an auto-encoder neural architecture which works by converting the “code” of the first person’s face and translating it into a live output of a second person’s face. The initial steps for training the AI for a real-time fake are the same as for a more “traditional” deepfake. With 48 hours of training, a programmer was able to render a real-time video of himself as John Oliver in a live video conference via simple webcam. While there is still a noticeable “off” quality in the final video, this particular version of the program is already nearly a year old, ancient by the standards of cutting-edge technology. Combined with Adobe’s Project Voco or Lyrebird, it would be possible to have a “live” videoconference with any public figure, using the voice replacement software to input any desired dialogue.
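
The shared-encoder, twin-decoder idea behind these architectures can be shown in miniature. Everything below is a toy assumption — the “faces” are four numbers and the “weights” are hand-picked rather than learned — but the routing is the same as in a real face-swap autoencoder: both identities share one encoder, each has its own decoder, and the swap decodes A’s latent code with B’s decoder:

```python
# Toy sketch of a face-swap autoencoder's structure. A real system learns
# the encoder and decoders from thousands of frames; here they are tiny
# hand-written functions so the data flow is visible.

def encode(face):
    """Shared encoder: compress a 4-value 'face' to a 2-value latent code."""
    return [(face[0] + face[1]) / 2, (face[2] + face[3]) / 2]

def make_decoder(style):
    """Each identity gets its own decoder; 'style' stands in for the
    identity-specific weights a real decoder would learn."""
    def decode(code):
        return [code[0] + style, code[0] - style,
                code[1] + style, code[1] - style]
    return decode

decode_a = make_decoder(style=0.1)   # reconstructs identity A
decode_b = make_decoder(style=0.5)   # reconstructs identity B

face_a = [0.2, 0.4, 0.6, 0.8]
# The swap: encode A's expression, then decode with B's decoder.
swapped = decode_b(encode(face_a))
```

In a real-time system this encode-then-decode-as-the-other-identity step runs on every webcam frame, which is why the training phase is identical to a “traditional” deepfake.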

While not as widely known, or at least not as widely publicized, there is a complementary set of software to deepfake technology called “pose detection.” Pose detection software pairs specific data points with various joints and body parts to create a wireframe puppet, which can then be used to control the body movements of a person in a video. The pose detection program OpenPose is also freely available for download, with run times of under three seconds for images containing 30 or fewer people. This means the pose estimation software works on real-time videos, and it can be used with a simple webcam. One example on YouTube uses footage from a debate between Hillary Clinton and Donald Trump but applies the pose models from a dance video to create a reasonably realistic “dance battle” between the two then-candidates. The GPU used for the published example is commercially available for $699. The program is “the first open-source [real time] system for multi-person 2D pose detection,” and the developers have already published several updates to the software for improved accuracy and utility.
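
The “wireframe puppet” can be sketched as a simple retargeting rule: keep the target body’s limb lengths, but point each limb in the direction taken from the source pose. The three-joint skeleton and the coordinates below are illustrative assumptions, not OpenPose’s actual keypoint output format:

```python
# Toy pose retargeting: rebuild the target skeleton joint by joint,
# following the source pose's bone directions but the target body's
# bone lengths.
import math

BONES = [("hip", "shoulder"), ("shoulder", "hand")]  # tiny 3-joint skeleton

def bone_dir(pose, a, b):
    """Unit direction of the bone from joint a to joint b."""
    ax, ay = pose[a]; bx, by = pose[b]
    length = math.hypot(bx - ax, by - ay)
    return (bx - ax) / length, (by - ay) / length

def bone_len(pose, a, b):
    ax, ay = pose[a]; bx, by = pose[b]
    return math.hypot(bx - ax, by - ay)

def retarget(source, target):
    """Pose the target body like the source: source directions, target lengths."""
    out = {"hip": target["hip"]}           # anchor the root joint
    for a, b in BONES:
        dx, dy = bone_dir(source, a, b)    # direction from the source pose
        length = bone_len(target, a, b)    # length from the target body
        out[b] = (out[a][0] + dx * length, out[a][1] + dy * length)
    return out
```

Running this per frame over a detected pose sequence is, in caricature, how a dance video’s movements can be mapped onto a debate clip’s bodies.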

Other pose detection software is available, with more limited models offering better results for videos with only a certain kind of shot. Similar to the publicly available face set data for creating a deepfake, software models for tracking upper bodies in videos have already been developed and released as training models for pose detection algorithms. With most news broadcasts showing only the upper half of various celebrities, politicians, and reporters, the capacity to train the algorithms on only the necessary upper half will likely cut down rendering time for pose detection in news broadcasts and result in faster creation of faked videos. With large amounts of footage available for popular reporters, it would be straightforward to encode someone like Anderson Cooper into a deepfake reporting on a false news story.

One article speculates that as GPU-capable smartphones are developed, this technology could eventually be run by anyone with a powerful enough phone. Smartphones are already capable of digital manipulation of photos via neural networks, with the iPhone’s “portrait mode” being a prime example. Progress is already being made on translating the current OpenPose software to a format that a smartphone can run. Another alternative which could allow for deepfake creation via smartphone is the development of outsourcing technologies which allow the programs to process through a central server with a more powerful GPU than the user’s own system.

These technologies could be further combined with existing gaze tracking technology to ensure that fakes are most convincing in the areas where viewers’ eyes are expected to focus or linger. Visage Technologies offers programs for face tracking and analysis, and claims that their “easy-to-use API” can “easily integrate the eye and gaze tracking technology into any application.” If a deepfake creator knows where someone is going to focus when looking at the deepfake, then the creator knows which area to spend the most effort on for an overall more convincing result.

Detecting and Combating Deepfakes

Everyday people, whether aware of it or not, are also contributing to the arms race between detection and deepfake creators. By uploading vacation snapshots to public Instagram accounts, or playing around with websites like Ganbreeder, each internet user is contributing to the data sets and training of neural networks. With over 300,000,000 photos uploaded to Facebook every day, and 46,740 photos uploaded to Instagram every minute, it is not surprising that fakes can seem so realistic.

Some companies are trying to head off this use of their product. One of Lyrebird’s stated goals is to raise public awareness of the availability of its AI voice replacement technology to ensure it is used only for ethical purposes. The deepfake Reddit page banned non-consensual use of body parts in porn videos after this particular use of deepfakes exploded in the technology’s early days. How these efforts will affect actual uses of the technology remains to be seen.

Academics and government researchers are the main combatants in the arms race against the deepfake creators. Research leader Hany Farid says that they are not trying to “win” the race, but are instead just trying to make it so the average person cannot create a compelling fake which escapes detection. DARPA’s program MediFor (standing for Media Forensics) attempts to “develop technologies for the automated assessment of the integrity of an image or video and integrating these into an end-to-end media forensics platform.” Two of the best publicly known methods in early 2019 include amplifying the color saturation in a person’s face to show the tiny changes in time with their heartbeat and tracking for an unnatural blinking rate. Another method for detection traces tiny color variations, known as “chromatic aberrations,” around shapes in images; a break in the pattern of aberrations can suggest that different photographs have been merged.
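
The blink cue can be made concrete. One published approach scores each frame with an “eye aspect ratio” (EAR) computed from six eye-contour landmarks: the ratio collapses when the eye closes, so a face that never blinks at a natural rate stands out. The landmark coordinates below are made-up illustrative values:

```python
# Eye aspect ratio (EAR) blink cue: vertical eyelid distances divided by
# the horizontal eye width. An open eye scores high; a closed eye near zero.
import math

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def eye_aspect_ratio(eye):
    """eye = six (x, y) landmarks: outer corner, two upper-lid points,
    inner corner, two lower-lid points."""
    vertical = dist(eye[1], eye[5]) + dist(eye[2], eye[4])
    horizontal = dist(eye[0], eye[3])
    return vertical / (2.0 * horizontal)

open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
```

Here the open eye scores about 0.67 and the nearly closed eye about 0.07; real detectors threshold the ratio per frame and count how often it dips over time.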

One potentially realistic suggestion for how to stem the tide of deepfakes, at least to a small degree, would be to remove publicly uploaded face sets for public figures. A ready-to-use face set of Vladimir Putin, for example, hands a creator roughly half of the work of making a deepfake. Where before it might have taken 48 hours to create a deepfake involving Putin, in theory, the time could be reduced to 12 hours if the target’s face set were another publicly available data set. With the speed of the online news cycle, and the strength of viral media, a convincing faked video could be seen, and believed, by millions of people before it was revealed as false.

In an attempt to maintain a level of veracity in digital media, new companies like Truepic have begun to emerge. Using “patented Controlled Capture technology and image forensic tools, Truepic aims to provide verifiable digital images and videos.” The company uses blockchain technology as part of this process, which theoretically checks each access of a particular file for veracity and logs each verified data point as part of the chain. However, it is possible to cheat blockchain validation checks or fool them into accepting false data. A determined media manipulator could, at least theoretically, get an altered image verified as accurate even with the blockchain safeguard in place. Even with blockchain verification, whether an image’s metadata is accurate or not is unlikely to affect its impact on an average viewer.
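
The verification idea can be illustrated with a minimal hash chain — a sketch of the general mechanism, not Truepic’s actual system: each log entry’s hash covers the previous entry’s hash, so altering any earlier record invalidates every link after it.

```python
# Minimal hash chain over an image's custody log. Tampering with any
# record changes its hash, which breaks every subsequent link.
import hashlib

def link(prev_hash, record):
    """Hash a record together with the previous entry's hash."""
    return hashlib.sha256((prev_hash + record).encode()).hexdigest()

def build_chain(records):
    chain, h = [], "genesis"
    for record in records:
        h = link(h, record)
        chain.append((record, h))
    return chain

def verify(chain):
    """Recompute every link; any mismatch means the log was altered."""
    h = "genesis"
    for record, stored in chain:
        h = link(h, record)
        if h != stored:
            return False
    return True
```

Rewriting an early record (say, swapping which image was captured) while keeping the stored hashes makes `verify` fail — which is exactly the property such services rely on, and why the weak point is getting false data accepted at capture time rather than altering it afterward.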

A major difficulty in combating deepfakes is the speed of technological development compared to the law. When attempting to deal with rapidly developing technological issues, the law is often too slow to turn and face the threat. In a case where a deepfake is being used to influence an election or incite widespread panic, the military may be the most appropriate responder, but under international legal conventions it is restricted in the responses it may take. What is less clear is who should respond in cases where a private citizen has made a reputation-damaging deepfake of an ex-partner or private employer. Many local law enforcement departments would be unable to handle such a case, and national law enforcement agencies such as the FBI are unlikely to get involved in a case with those facts.

Legal Problems of Deepfakes

Everyone knows that it is dangerous to shout “Fire!” in a crowded theater, and the internet is the largest, most crowded theater in the world. There is no centralized fact-checking authority for information published online, and viral media can be viewed millions of times in a single day. Trust in the news cycle and media has been eroded significantly in recent years, and as deepfakes proliferate, the issue is likely to escalate. “In the absence of an agreed-upon reality, efforts to solve national and global problems will become enmeshed in needless . . . questions.” Fact verification will become a difficult, tedious process where there are multiple “credible” sources for any point of view, and where average citizens become mired in a host of contradicting stories.

Politics is not the only social arena where deepfakes could cause major upheaval. Depending on the content of a deepfake, another financial crisis like the 2008 crash could easily be triggered. Misleading or incorrect news stories already cause dips and spikes in the stock market. It is entirely plausible that a deepfake would magnify these effects on the market, especially if released in conjunction with other market news, or at a time when stock prices were already unsteady.

Terrorist organizations widely use social media as a recruitment tool. Logically, they will employ deepfakes as part of these efforts, and will manufacture materials to further radicalize their members against the West. For troops still on the ground in Afghanistan, this could have deadly consequences. As deepfakes allow for more convincing and glamorized propaganda, terrorist efforts to recruit from within America would also likely increase, creating an embedded danger that could prove almost impossible to track. The impact of this long-distance conversion has already hit home in both the San Bernardino mass shooting in 2015 and the Orlando night club shooting in 2016.

This use of social media tactics also means that terrorist organizations have the required footage and image sets to create deepfakes of high ranking members. If Western intelligence forces are unable to detect that a video has been faked, lives could be endangered as part of unnecessary missions. Additionally, the credibility of reported troop activity could be attacked, with terrorists releasing videos claiming to show targets that Western forces have killed or captured.

Deepfake Creators

The federal government is far from indifferent to the risks posed by deepfake technology and creators. One of DARPA program MediFor’s goals is to trace deepfake creators and, through attribution, try to discover why a particular creator made a certain deepfake. Private companies also try to limit the spread of bots and propaganda materials, with mixed results.

Even after detection, deepfakes still pose many problems. Those who viewed one as a legitimate video may not see, or may not believe, a statement that it was faked. Further, even if authorities are able to track down a specific deepfake creator, there are issues of what, if any, charges can be brought against that person. It is not technically a crime to create most deepfakes, unless they qualify as defamation or are considered “revenge porn” in a jurisdiction with a relevant statute.

Beyond what a particular creator could, or could not, be charged with, a major question that rapidly looms before the courts is the question of deepfakes’ impact on evidence. Generally speaking, under FRE Rule 801, a computer is not a declarant and computer-generated records are not excluded as hearsay. In a case against a deepfake creator, it is likely that any computer records the prosecution sought to introduce would require further authentication and foundational support before they could be entered into evidence.

This is not the only area of impact to a theoretical trial. If a deepfake creator altered surveillance footage or anonymously published falsely incriminating footage online, this could lead to serious issues during a trial. With computer-generated evidence considered self-authenticating in most cases, a false video could be introduced into evidence and have a major impact on a jury’s verdict. With many companies running on outdated, and therefore vulnerable, systems, a deepfake creator framing someone for a crime would likely face few obstacles to planting falsified electronic evidence.

The idea of using a deepfake to frame someone is not mere speculation. A South Carolina woman made a video in response to an online request for her services in promoting a book, and that video was then used to implicate her in a Canadian anthrax hoax. Canadian police did not believe the video, but if they had, the South Carolina woman could have faced 83 charges related to the false anthrax packages and multiple false bomb threats.

Where a deepfake creator is based outside of the United States, the issues become more complicated, especially if that creator is based in the European Union. Article 1 of the General Data Protection Regulation (GDPR) protects “the fundamental rights and freedoms” of European Union citizens with regard to their personal data. Under Article 4 of the GDPR, any entity that controls, analyzes, or handles data which includes personally identifiable information is subject to the privacy requirements of the Regulation, and under Article 2, any file containing a person’s name or identifying information is generally covered by GDPR protection. Articles 6 and 7 add consent and proof-of-consent requirements, which include consent to the usage of personal data by controllers.

If the data involved in the creation of the deepfake is located in Europe, there is a good chance that the GDPR would make accessing that data challenging, especially if the owner of that data could be definitively identified through it. While Article 2 does limit the material scope of the GDPR, stating that the Regulation does not apply to personal data processed “by competent authorities for the… investigation, … or prosecution of criminal offences,” this circles back around to the issue of which offenses a deepfake creator could be charged with.

Privacy issues are not restricted to international creators. The Fourth Amendment guarantees privacy in a person’s “papers, and effects,” and case law applies this protection to personal data. Riley v. California, 573 U.S. 373 (2014), explicitly applied Fourth Amendment protection to cell phone data, focusing on the reasonableness of the privacy expectation for personal data. Putting aside the issues in what criminal charges could apply to a deepfake creator, the warrant requirements under the Fourth Amendment would be yet another necessary hurdle for any possible prosecution.

For a creator located within the United States, there are also First Amendment freedom of expression considerations. In the case of more creative uses, and possibly in some nonconsensual use cases, it is likely that the deepfake would be considered protected. Some deepfakes are parodies or other transformative works which are specifically protected under both First Amendment and copyright law. However, if a deepfake is “directed to inciting or producing imminent lawless action and is likely to incite or produce such action,” then it would not be protected under the First Amendment.

If a creator cannot be located, then could the online platform hosting the deepfake be sued? After the San Bernardino and Orlando attacks, the victims’ families tried just that. Twitter regularly suspends accounts suspected of posting certain kinds of terrorist content, but also routinely fails to block other accounts calling for violence. Unfortunately for victims’ families, because social media accounts publish content without the host website editing it prior to publication, the host is protected against liability for that content. Without a clear-cut federal sanction for spreading this type of content, companies will continue to claim that they are exerting their “best efforts” to ban it without actually ensuring that it gets removed from the platform. As long as the company makes a show of banning some accounts, others will continue to slip through the cracks and spread propaganda to a wide online audience.

National Security Concerns

Spreading propaganda, even when done by one sovereign State against the interests of another, does not constitute an armed attack or use of force that can easily or simply be answered with a proportionate counterattack. Instead, cyber-attacks tend to fall in the murky area between interference and intervention, assuming they fall within the framework of sovereign actions at all. In the cases where a State could be definitively found to have deployed a deepfake, it would likely be considered interference, and the response would be restricted to countermeasures.

This is not to say that any level of interference is harmless. The U.S. itself has been known to use propaganda as part of international campaigns to further U.S. interests abroad, often to great effect. Today, the spread of propaganda and false information often cannot be tracked to a State actor, and instead leads investigators to civilians and other private actors. “The very structure of the modern Internet… creates incentives to spread fake news as much as possible, not by focused efforts of a foreign nation with an agenda, but by offering the chance at easy money.” It is easy to understand why seeing the propaganda tactics the U.S. has used so effectively turned back on it would make the government more than a little concerned, especially with such a high proportion being attributed to private actors.

Conclusion

In the movie eXistenZ, one of the characters says, “We’re both stumbling around together in this unformed world, whose rules and objectives are largely unknown, seemingly indecipherable or even possibly nonexistent, always on the verge of being killed by forces that we don’t understand.” While deepfakes alone are unlikely to kill anyone, the rest of this quote applies in a worryingly prescient way. The world of deepfakes and related technology is growing at a rapid pace, and it is increasingly unlikely that an average person will be able to differentiate a faked video from a real one. The “rules” are based around what a user’s computer can accomplish. The “objectives” of deepfake creators are their own, and largely unknowable until after the deepfake is published to the web. Even after publication, a creator’s true motives may be hidden or ulterior to what they appear to be on the surface, and government response options are limited in cases where the creator is a private actor, rather than a State-sponsored one.

On the one hand, there exist many entertaining and harmless uses for the various technologies once confined to big-budget Hollywood studios. On the other hand, having them readily available to average citizens opens up a whole new world of possible abuses. Experts worry that the real danger becomes one of plausible deniability in a world where no one trusts the media, and anyone is able to deny that a recording is accurate. In the political arena especially, attack ads featuring allegations of misconduct could be doctored with faked footage, or an opponent could deny credible accusations.

Current legal regimes are unable to deal with the deepfake threat effectively. Social media companies whose platforms host user content are protected from liability by a statutory shield. There needs to be a new assessment for application of First Amendment protection of free speech to online spaces, as this may prove the most effective way to maintain information integrity in a time where more and more news is disseminated online.

Wider proliferation of deepfake technology can also serve to conceal those who would use it for “bad” ends. The creators of deepfakes will only become more anonymous as use of the technology spreads, especially if it is fully translated to use via smartphone. Even if the government is able to keep ahead in developing new detection methods, it becomes increasingly unlikely that any creator would be caught or charged with any criminal offense, assuming an applicable one could be found. As the U.S. government’s own propaganda tactics are turned against it by international actors, it becomes very clear that a response to this emergent technology is needed sooner rather than later.

About Ashley Dean

A. Fiona Dean is a graduate of Penn State Law who studied cyberlaw and national security law. She has written several academic papers on the topics of deepfakes and use of electronic evidence, and the application of existing law to new technology. Fiona also studied law at the University of Cardiff and brings her international experience to her writing.
