GEMA vs. OpenAI: Limits on the Use of Song Lyrics in AI Language Models

Published on 17 November 2025

Last week, the Regional Court of Munich (Landgericht München I) drew a clear line for generative AI. In the case of GEMA versus OpenAI, the court ruled that OpenAI infringed the copyrights of German musicians and lyricists by using their song lyrics in its language models.

What was the issue?
GEMA had sued OpenAI because ChatGPT and other models from the company could reproduce recognizable fragments of German song lyrics, including “Atemlos” by Kristina Bach and “Wie schön, dass du geboren bist” by Rolf Zuckowski. According to GEMA, this amounted to copying without permission and therefore to copyright infringement.

OpenAI defended itself with the familiar argument that AI models only “learn” linguistic patterns and that they neither store nor reproduce the underlying texts. Moreover, the company argued that the training process fell under the legal exception for text and data mining. This exception allows the reproduction of texts and data for scientific research, provided there is lawful access and the reproductions are kept only as long as necessary.

Reproduction
The core of the judgment lies in the concept of reproduction. According to the court, the song lyrics were not merely present in the model as abstract data: they were technically reproducible and factually present in the model parameters. In other words, the disputed lyrics are recorded in OpenAI’s models in a retrievable form. The court referred to this phenomenon as “memorization.” In its view, this makes the texts a reproduction within the meaning of Article 2 of the InfoSoc Directive.

The court applies the concept of reproduction broadly: numerical storage (such as weights or probabilities) is also covered, as long as the original text can be retrieved. This reasoning deals a blow to the claim that AI models do not remember texts verbatim. Memorization is therefore not a mere technical side effect but a fixation of protected works. Without the creator’s permission, this may constitute copyright infringement.
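To make the point about retrieval from numerical parameters concrete, here is a deliberately simplified sketch. It is nothing like the architecture of ChatGPT and uses an invented placeholder sentence rather than any actual lyric, but it illustrates the principle the court relies on: a model that stores nothing but word-pair probabilities can still hand back its training text verbatim from a short prompt.

```python
from collections import defaultdict

# Placeholder training text (an invented sentence, not an actual lyric).
text = "every little step we take along this winding road leads somewhere new tonight"
words = text.split()

# "Training": count which word follows each pair of words, then convert
# the counts into probabilities. The result is purely numerical data
# keyed by word pairs; no span of the text is stored as such.
counts = defaultdict(lambda: defaultdict(int))
for a, b, c in zip(words, words[1:], words[2:]):
    counts[(a, b)][c] += 1

params = {
    ctx: {w: n / sum(nxt.values()) for w, n in nxt.items()}
    for ctx, nxt in counts.items()
}

# "Prompting": starting from a short prompt, repeatedly pick the most
# probable next word. Because the probabilities have memorized the
# training text, the continuation comes back word for word.
def generate(prompt: str, max_words: int = 20) -> str:
    out = prompt.split()
    for _ in range(max_words):
        ctx = (out[-2], out[-1])
        if ctx not in params:
            break
        out.append(max(params[ctx], key=params[ctx].get))
    return " ".join(out)

print(generate("every little"))
# -> "every little step we take along this winding road leads somewhere new tonight"
```

Real language models are vastly larger and usually generalize rather than memorize, but where verbatim retrieval is possible, the court’s reasoning is that the numerical form of storage does not change the legal analysis.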

No free pass for text and data mining
The exception for text and data mining from the DSM Directive offered OpenAI no relief. According to the court, this exception covers reproductions for scientific research that are preparatory in nature and serve to analyze statistical relationships or patterns. Such reproductions do not affect the economic interests of the author, because the work itself is not taken over. According to the court, this is different with ChatGPT: the texts are permanently present in the model parameters and thus become part of the model itself. The exception therefore cannot be applied. The court also rejected an “analogous application”, that is, interpreting the law broadly in favor of technological innovation: the legal text is clear, and extending it would be to the detriment of authors.

OpenAI also relied on the rule in the German Copyright Act for “insignificant ancillary works”: when a copyrighted work plays only a minor, incidental role within a larger work, it may be used without permission. The court found this rule inapplicable here. The song lyrics were not a small detail within a larger copyrighted whole but specific parts of the training corpus. For them to count as insignificant ancillary works, the entire training corpus would itself have to qualify as a copyrighted work, which, according to the court, it does not.

Liability for output
Interestingly, the court also holds OpenAI itself liable for what the chatbot reproduces, not its users. According to the court, the output is not the result of the user’s input but of OpenAI’s choices regarding the design, the datasets, and the output mechanism. Because the chatbot can reproduce recognizable parts of the song lyrics in response to simple prompts, the court finds that OpenAI makes the works available to the public without permission, which constitutes copyright infringement.

Conclusion
The German court sees memorization in language models as a real, lasting copy of copyrighted material. Both training on protected texts and reproducing those texts via a chatbot can infringe the creators’ copyrights. The text and data mining exception offers no protection here, because the model does not merely analyze information but also stores the texts themselves in its system.

For tech companies, this ruling means that training AI on copyrighted content without permission poses a high risk of infringement, even when the storage appears indirect or statistical. This significantly limits the scope of the data mining exception in commercial AI applications, possibly so much that it leaves insufficient room for the training process of contemporary language models. For creators, this means their copyright interests are more strongly protected, giving them a firmer position to demand compensation and control over how their works are used by AI providers.
