Data Trawling to Train LLMs: A Lawful Catch?

Activity: Talk or presentation types: Invited talk, Scientific

Description

Since its launch in late 2022, OpenAI’s ChatGPT has become a focal point of global conversation, as evidenced by its record as the fastest-growing consumer application in history: it reached 100 million monthly active users just two months after its debut. Since then, large language models (LLMs), which are trained on, among other sources, publicly accessible online personal data, have dominated discussions in the EU data protection domain. Data protection authorities have opened several investigations concerning, among other things, the processing activities involved in training these models. The establishment of the EDPB’s special task force on ChatGPT in early 2023 underscored the significance of these issues for the EU data protection framework. At the heart of these discussions lies the question of whether, and to what extent, developers have a “legitimate interest” in training their models on publicly accessible online personal data, and therefore whether Article 6(1)(f) GDPR can be relied on as the legal basis for this processing activity. After providing an overview of these developments, this talk will explore why this processing activity should be subject to the Article 9 GDPR regime in light of the relevant jurisprudence of the Court of Justice of the European Union (CJEU). It will then explain the potential implications of this interpretation and consider a possible way forward for ensuring the lawfulness of these processing activities under the EU data protection framework, together with the opportunities and challenges it presents for all stakeholders.
Period: 7 Feb 2025
Event title: CeBIL Seminar: Data Trawling to Train LLMs: A Lawful Catch?
Event type: Seminar
Location: Copenhagen, Denmark
Degree of Recognition: Local