SDNY Order Renews Possibility of Digital Millennium Copyright Act as Legal Recourse for News Organizations in the Age of AI

In a pending lawsuit in the US District Court for the Southern District of New York (SDNY), OpenAI Inc. recently failed to convince the court to dismiss allegations that it improperly removed copyright management information from news articles used to train its large language models. While the case is still at an early stage, the court's ruling in favor of The Intercept Media, Inc. could serve as a model for other news organizations and rights holders seeking to discourage the unauthorized use of their works by artificial intelligence (AI) developers.

On November 21, the court denied OpenAI's motion to dismiss The Intercept's claim under Section 1202(b)(1) of the Digital Millennium Copyright Act (DMCA).

Shortly after the decision, OpenAI informed the court of its intention to seek consolidation of all eight copyright infringement and DMCA suits currently pending against the AI developer into a single multidistrict litigation in the Northern District of California.

Background

While a seemingly ever-growing number of companies have filed lawsuits accusing AI developers and deployers of copyright infringement, The Intercept's lawsuit is unique in that it is based entirely on alleged violations of the DMCA. Specifically, The Intercept alleged that OpenAI violated Sections 1202(b)(1) and 1202(b)(3) of the DMCA. Section 1202(b)(1) prohibits intentionally removing or altering copyright management information (CMI) without authorization, knowing or having reasonable grounds to know that doing so will enable or conceal infringement. Section 1202(b)(3) prohibits distributing or publicly performing works knowing that their CMI has been removed or altered without authorization. The DMCA defines CMI to include the title of the work, the name of the author, and the terms and conditions for use of the work, as well as other identifying information conveyed with a copyrighted work.

According to The Intercept’s complaint, OpenAI violated Section 1202(b)(1) by training ChatGPT on datasets that included The Intercept’s copyrighted works of journalism and, in the process, knowingly removing the author, title, copyright, or terms-of-use information from those works. The Intercept bases its claim on the fact that, in several instances, ChatGPT allegedly reproduced The Intercept’s works verbatim without including CMI. Further, The Intercept alleges that OpenAI violated Section 1202(b)(3) by distributing The Intercept’s works to Microsoft knowing that the CMI had been removed. For both DMCA claims, The Intercept alleges that OpenAI acted without its authorization.

Court Allows The Intercept’s DMCA Claim to Proceed

On November 21, the court dismissed the 1202(b)(3) distribution claim against OpenAI but allowed the 1202(b)(1) claim to proceed. Perhaps the most significant consequence of the court’s decision to allow the 1202(b)(1) claim to proceed is that OpenAI could be compelled to begin discovery and provide information to The Intercept about its datasets and training process. To date, OpenAI has provided only limited information about the specific content it used in its datasets to train ChatGPT. The discovery process will therefore be critical for both parties in establishing whether OpenAI knowingly removed CMI from The Intercept’s works without authorization and the extent to which OpenAI knew that doing so would enable or conceal an infringement.

Key Takeaway: The Intercept’s Case Against OpenAI Will Clarify the Future of DMCA Protection Against AI Developers

Until now, other DMCA claims against AI developers have largely failed, with most not proceeding past the motion-to-dismiss stage. The order allowing The Intercept's claim to proceed renews the possibility that a DMCA claim may be viable against AI developers. For rights holders, Section 1202(b) provides distinct causes of action against AI developers with evidentiary requirements different from those of traditional copyright infringement claims. For developers, Section 1202(b) is another legal risk to be managed, particularly in the wake of the order in The Intercept case.
