Leaked FTC Civil Investigative Demand to OpenAI Provides a Rare Preliminary View of the Future of AI Enforcement

On July 13, 2023, The Washington Post broke the news that the Federal Trade Commission (FTC) had issued a Civil Investigative Demand (CID) — a sort of pre-litigation subpoena issued as part of what is supposed to be a nonpublic investigation — to OpenAI, LLC, the maker of ChatGPT and DALL-E, asking questions and seeking documents in an effort to determine whether the company is in compliance with FTC standards for privacy, data security, and advertising.

The Post published a leaked, redacted version of the CID the same day. How the FTC proceeds once the investigation is complete could set new consumer protection guardrails for the nascent generative AI industry, and do so remarkably soon after the industry burst into public consciousness in late 2022.

By itself, this leak is extraordinary. There has historically been no serious debate about the FTC’s record of keeping nonpublic consumer protection investigations nonpublic. We rarely learn even of the existence of a nonpublic investigation before the FTC closes it (in which case the FTC sometimes publishes closing letters) or initiates an enforcement action. Here, however, the availability of the CID gives us a rare glimpse into the FTC’s thinking at the outset of an investigation into a nascent industry.

It’s important to note that many FTC investigations do not lead to enforcement, and in this case, OpenAI may satisfy the FTC’s concerns, in which case the FTC would close the investigation without acting. But, at the very least, this CID shows the areas of enforcement interest to the FTC as they relate to the generative AI industry. Advertising, privacy, safety, and data security are clearly top concerns.

Advertising

The CID asks how OpenAI advertises its products and asks for copies of all such advertisements regarding its Large Language Model (LLM) products. Specifically, the FTC is trying to understand how OpenAI’s advertising conveyed information about the capabilities, accuracy, and reliability of AI outputs. The FTC previewed this line of questioning in its blog posts here and here, making clear that AI companies must advertise their products truthfully. We covered the potential for generative AI to create “dark patterns” – user interfaces that can manipulate users into taking actions they did not intend – here. AI product advertising is clearly an FTC priority, and this CID drives that point home.

Privacy

Training Data Sets, Data Scraping, and Secondary Use of Publicly Available Personal Information

Following on the heels of recent class action activity alleging privacy law violations in connection with data scraping used to train LLMs, the CID goes on to ask a number of questions about how OpenAI obtained the data sets used to train its products – specifically, whether the data were obtained by means of data scraping or purchased from third parties, whether the information came from publicly available websites, what types of data comprise the data sets, and how OpenAI vetted and assessed these data sets before using them for LLM training or for other development purposes. The CID then asks the company to describe all steps it takes to remove personal information (or information that may become personal information when combined with other information) from its LLM training data sets.

This suggests that the FTC may take issue with widespread data scraping for LLM training purposes, at least where those data include individuals’ personal information or sensitive personal information. Even OpenAI’s own GPT-4 System Card, which says that the company “remov[ed] personal information from the training dataset where feasible,”[1] suggests that at least some personal information was included in training data sets.

The FTC’s focus on personal data in training sets may be based on the “inconsistent secondary use” concept, one element of the data minimization principle we covered here. Specifically, the theory would be that even when consumers have provided their personal information on a publicly accessible website – say, for example, a social media service – Section 5 of the FTC Act (which prohibits deceptive and unfair practices in commerce) prohibits the use of that information for purposes inconsistent with those for which consumers initially disclosed it (here, LLM training). If the FTC does make this type of allegation, it would have serious implications for LLM training going forward and could even add fuel to a growing demand for training sets that do not contain personal information (or, for that matter, information subject to intellectual property law protection).

Consumer Controls

The CID goes on to ask about user data controls, including how the company handles consumers’ requests to opt out of collection, retention, use, and transfer, or to delete their personal information, and the circumstances in which these requests are not honored. Here, the FTC is less likely to try to establish new consumer rights under Section 5 of the FTC Act (although these issues are addressed in the Biden Administration’s nonbinding Blueprint for an AI Bill of Rights), and more likely to identify the controls the company provides and look for failures to honor them, as offered and as described.

Enforcing Privacy Promises

In an apparent effort to assess the accuracy and completeness of OpenAI’s privacy policy, the CID also asks a number of questions about the personal information the company collects: its source and type, how long it is stored (including when a user opts out of data retention or requests its deletion), the purposes for which it is used, and to whom it is disclosed and why. The FTC has long enforced public-facing privacy representations, bringing actions both for affirmative misrepresentations and for failures to adequately disclose material facts. If the FTC finds any such misrepresentation or failure to disclose here, that may lead to an enforcement action.

False, Misleading or Disparaging Statements about Individuals Leading to Harm

The CID goes on to ask the company what steps it has taken to address or mitigate the risk that its LLMs generate outputs containing personal information. More specifically, the FTC seeks information on any complaints or reports that the LLMs generate statements about individuals that are false, misleading, disparaging, or harmful; any procedures for addressing those complaints and reports; and any policies and procedures for excluding such outputs. This likely goes to whether the LLMs operate “unfairly” under Section 5 of the FTC Act, which prohibits acts or practices that cause substantial consumer harm that consumers cannot reasonably avoid and that is not outweighed by the benefits of the product.

Safety

The CID asks for extensive safety information – most interestingly, information on complaints the company has received regarding specific instances of “safety challenges” caused by the company’s LLMs. Based on the GPT-4 System Card, these include risks associated with:

  • Hallucinations (“producing content that is nonsensical or untruthful in relation to certain sources”);
  • Content that is harmful to individuals, groups, or to society, including “hate speech, discriminatory language, incitements to violence, or content that is then used to either spread false narratives or to exploit an individual”;
  • Harms of representation, allocation, and quality of service, including perpetuating or amplifying bias;
  • Disinformation and influence operations;
  • Proliferation of conventional and unconventional weapons;
  • Cybersecurity, including vulnerability discovery and exploitation (e.g., data breaches) and social engineering;
  • Economic impacts, including job displacement and wage rate reductions; and
  • User overreliance on inaccurate information that appears to be true and believable.[2]

Here, the FTC seems to have two purposes. First, answers to these questions will be educational for the FTC as it tries to get its arms around this new technology; this is an opportunity to gather real-world information on the safety landscape of the leading LLMs now on the market. Second, the answers may provide enforcement material if they reveal information inconsistent with OpenAI’s public statements about its LLMs, or information showing that the LLMs cause harm within the meaning of Section 5. While it is easy to imagine hallucinations that do not rise to the level of a Section 5 violation, providing a tool that makes it easy to cause data breaches at scale could lead to expanded use of the FTC’s “means and instrumentalities” theory of liability under Section 5.

Data Security

The FTC asks about data security at OpenAI itself, and with respect to its LLMs when made available by others through an API or plugin. This is tried and true territory for the FTC, which has brought scores of enforcement actions alleging inadequate security protections of consumers’ personal information.

The CID first addresses a specific security incident from March 2023 involving a “bug … which allowed some users to see titles from another user’s chat history” and payment-related information. The CID calls for the number of users affected, the categories of information exposed, and information regarding the company’s response to the incident. The CID also asks for information on any other security incidents, specifically calling for information on any “prompt injection” attacks – unauthorized attempts to bypass filters or to manipulate an LLM into ignoring prior instructions or performing actions its developers did not intend.

The CID also calls for the company policies and procedures used to assess risks to users’ personal information in connection with API integrations and plugins, including testing and auditing of plugins and API integrations, oversight of third parties that use the company’s API or plugins, restrictions imposed on third parties’ use of user data, the means by which the company ensures that those third parties comply with OpenAI’s own policies, and policies limiting API users’ ability to fine-tune the company’s LLMs in ways that increase data security risks for user data. This line of questioning suggests that the FTC expects companies in the AI industry to conduct due diligence on their partners and to hold them, through contractual provisions and monitoring, to commitments not to misuse the technology. This is also well-trod ground, and not surprising, as the implications of misuse in this context are probably greater than in most other contexts.

Conclusion

The FTC’s CID to OpenAI offers a very rare glimpse into the FTC’s enforcement policy as it develops in connection with an explosively growing new industry. Expect to see more activity in this area as the FTC, like so many other policymakers around the world, tries to get its arms around, and establish sensible guardrails for, the new generative AI industry.


[1] GPT-4 System Card, OpenAI (Mar. 23, 2023), available at https://cdn.openai.com/papers/gpt-4-system-card.pdf (emphasis added).

[2] Id.
