When Sam Altman, CEO of the company behind ChatGPT, gave Congressional testimony about the dangers of AI, many believed Altman had genuinely altruistic intentions. More cynically, we wondered if Altman was embarking on a preemptive apology tour, aiming to foster goodwill before damning revelations about his company could come to light.
OpenAI started in 2015, promising to be an idealistic, transparent nonprofit. But over time, it has broken these promises. After reorganizing to add a for-profit entity in 2019, OpenAI accepted a $1 billion investment from Microsoft. Then, OpenAI stopped sharing the underlying code for its Large Language Models (LLMs), despite its previous commitment to the open-source software community.
Today, there’s little at OpenAI that remains open, except its name. OpenAI won’t even say what data it used to train ChatGPT, the LLM it released for public use on November 30, 2022.
There’s something hypocritical – and deeply unsettling – about OpenAI’s closed doors. Of course, we wouldn’t feel any better had OpenAI remained an idealistic nonprofit that was fully transparent about how it was using others’ work without permission. We simply wouldn’t have to worry that we were being paranoid.
As it happens, we suspect we’re not being paranoid, because OpenAI has a mounting number of legal troubles on the horizon. Here are some lawsuits and U.S. regulatory actions that we’ll be watching during the coming months. With luck, they may help us all learn the truth about OpenAI.
In November 2022, software developers sued OpenAI, Microsoft, and GitHub for violating open-source licenses for computer code. Founded in 2008, GitHub is a website where programmers share their work and allow it to be reused for free, so long as borrowers acknowledge the original authors and preserve notices of copyright and licensing terms. After acquiring GitHub in 2018, Microsoft collaborated with OpenAI to develop Copilot, an AI tool trained on code from GitHub (and possibly other sources). Released in June 2021, Copilot generates code in much the same way that ChatGPT generates text. According to the pending lawsuit, Copilot reproduces thousands or even millions of people’s copyrighted work without proper attribution, which, if proven, would make its corporate designers liable for “software piracy on an unprecedented scale.”
In early June 2023, radio host Mark Walters sued OpenAI because ChatGPT had allegedly published libelous claims about him. According to the lawsuit, ChatGPT incorrectly identified Walters as a former financial officer for the Second Amendment Foundation (SAF), a real nonprofit. ChatGPT said that Walters had abused the position – which, remember, he had never held – by embezzling more than $5 million and falsifying records to cover his tracks. ChatGPT made these false claims by producing a complete yet fabricated legal filing, which looked like a lawsuit against Walters. OpenAI had previously warned that ChatGPT sometimes “hallucinates,” presenting falsehoods as though they were facts. A legal expert told Ars Technica that such warnings were unlikely to shield OpenAI from a defamation claim.
In June 2023, internet users sued OpenAI, arguing that the company infringed copyrights and violated the privacy of millions of people when it scraped their online writings and personal data. According to the lawsuit, the writings included – but were not limited to – posts on Facebook, Instagram, Reddit, Snapchat, TikTok, YouTube, and Wikipedia. Additionally, OpenAI collected a staggering amount and variety of personal information about users of ChatGPT or its plug-ins (interfacing apps): contact details, financial information, geolocation, medical records, every keystroke … The account of materials and data that OpenAI exploited without permission to develop AI products like ChatGPT is as shocking as it is mind-boggling.
In June and July 2023, authors sued OpenAI, as well as Meta, for infringing copyrights on their published books. The original plaintiffs were Mona Awad and Paul Tremblay, but they were followed by Christopher Golden, Richard Kadrey, and Sarah Silverman. According to the lawsuits, OpenAI trained ChatGPT on books from online sources that included not only Project Gutenberg, which archives books in the public domain, but also piracy sites like Bibliotik, Library Genesis, and Z-Library, which illegally distribute hundreds of thousands of books still under copyright. To demonstrate that OpenAI had pirated their books, the authors asked ChatGPT to produce summaries. ChatGPT responded with lengthy, detailed, and mostly accurate descriptions. (The shocking evidence appears in exhibits filed with the complaints.) In addition to calling chatbots by OpenAI and Meta “industrial-strength plagiarists,” lawyers for the plaintiffs emphasized that books “were copied by OpenAI without consent, without credit, and without compensation.”
In July 2023, the Federal Trade Commission (FTC) contacted OpenAI to demand information about possible privacy violations and consumer harms resulting from its LLMs, including ChatGPT. The official document was leaked to The Washington Post. The FTC posed 49 questions, many of which have numerous subparts, and requested 17 types of documents, such as business contracts, internal studies, and employee communications. To judge by the questions, the FTC appears particularly concerned by OpenAI’s collection and storage of personal data, its advertising practices, and its release of products that appear to make false statements and defame individuals. Of course, the FTC expects OpenAI to retain all records or documents related to matters under investigation. We hope OpenAI won’t delete them, which is what Google would do.
Looking over this summary of pending lawsuits and government investigations, contemplating the magnitude of the allegations against OpenAI, we find ourselves wondering: Have we only just begun our OpenAI Timeline of Scandal and Strife?
We hope a timeline won’t prove necessary. If OpenAI cleans up its act and compensates people for past harms, then perhaps it can recommit to its mission: “to ensure that artificial general intelligence benefits all of humanity.”
If not, then OpenAI will join Google and Facebook on our list of internet companies that profit at people’s expense – and that appear to enjoy it, more or less openly.