1. Introduction

    In today's digital age, data is often referred to as the new oil, and AI products are increasingly becoming the drills that extract this valuable resource. Data scraping and text and data mining ("TDM") are techniques used by AI systems to gather vast amounts of information from the web, fuelling advancements in machine learning and predictive analytics. However, these practices raise significant legal and ethical questions, particularly concerning copyright law. This blog explores the intricacies of data scraping, including recent legal developments and an interesting judgment from the German courts.

    2. Recent German Case of Robert Kneschke v. LAION e.V.

    One of the most notable recent cases in this area is the German case of Robert Kneschke v. LAION e.V., decided by the Hamburg Regional Court in September 2024. The case concerned a photographer, Robert Kneschke, who discovered that one of his images, uploaded to a stock photo site, had been included without his consent in a dataset created by LAION e.V. (a non-profit organisation) for training AI models.

    Facts of the Case:

    • Kneschke uploaded his photographs to a stock photo site, which explicitly prohibited the use of images by automated programs.
    • LAION e.V. downloaded one of Kneschke's photographs and included it in a large dataset of image-text pairs.
    • Kneschke claimed copyright infringement, arguing that LAION's use of his image did not fall under any copyright exceptions.

    Decision of the Court:

    • The Hamburg Regional Court dismissed Kneschke's claim, ruling that LAION could benefit from a limitation to copyright under Section 60d of the German Copyright Act and Article 3 of the EU Digital Single Market Directive (DSM-Dir).
    • The court found that creating datasets for training AI systems could constitute scientific research, and so qualified for the TDM exception for scientific research purposes.
    • Importantly, the court also considered (but did not rule specifically on) the issue of machine-readable opt-outs from TDM, as permitted under Section 44b(3) of the German Copyright Act and Article 4 of the DSM-Dir. These provisions allow rights holders to reserve their rights and so disapply the exception which permits TDM for commercial purposes. The stock photo site's terms of use contained a clear opt-out from TDM, as they explicitly prohibited the use of bots for any purpose. This suggests that general statements such as "for any purpose" or "all rights reserved" could be enough to disapply the TDM exception. It also raises questions about what constitutes "machine-readable", and whether natural-language opt-outs in outward-facing terms and conditions are enough to satisfy the machine-readable opt-out requirement under EU law. The court said that "machine-readable" should be interpreted as "machine-understandable" and, contrary to the prevailing opinion among legal specialists, was inclined to treat an opt-out in natural language as machine-understandable.

    Significance:

    • The court's recognition of the TDM exception for scientific purposes may prove influential in future cases involving AI training datasets, although it is a first-instance decision and it remains to be seen whether the UK courts would take a similar approach.
    • The court's comments on natural-language and machine-readable opt-outs from TDM have left the position unclear and will inevitably lead to further cases at EU level on this point, as AI businesses will want clarity sooner rather than later.
    • The fact that LAION e.V. is a non-profit organisation is also likely to have influenced the court's decision, since the scientific research exception is largely confined to research carried out for non-commercial purposes. Had LAION e.V. been a commercial organisation, it is arguable that the court might have ruled differently.

    3. UK Position

    The legal framework for TDM in the UK is set out in Section 29A of the Copyright, Designs and Patents Act 1988. This provision permits TDM for non-commercial research purposes without the need for explicit permission from rights holders. Researchers may copy copyrighted material for computational analysis, provided they have lawful access to it. Rights holders cannot opt out of, or contract out of, the TDM exception for non-commercial purposes. They can, however, control access to their works, so the lawful access requirement acts as a legitimate gateway to their content. This means that while the exception facilitates TDM for non-commercial research, it also ensures that rights holders retain some control over their intellectual property.

    Developments:

    • The previous UK government considered extending this exception to cover commercial purposes, but the proposal faced significant opposition from the creative industries and was ultimately withdrawn.
    • The previous government then agreed to begin developing a code of practice on copyright and AI to clarify the relationship between intellectual property and generative AI. This code aimed to make licences for data mining more accessible while ensuring protections for rights holders. However, the initiative was abandoned in February 2024 because stakeholders could not reach a consensus, which added to the regulatory uncertainty surrounding AI and copyright.
    • The new Labour government has suggested that legislative changes may be forthcoming to address this issue, potentially reintroducing the possibility of broader TDM exceptions under UK law. If the Labour government moves forward with these changes, AI businesses may gain more flexibility to use TDM for commercial purposes, provided they comply with the new rules. Copyright holders, however, will no doubt be concerned about how any new legislation will protect their works.

    4. Joint Statement from Data Protection Authorities

    In October 2024, the UK Information Commissioner's Office (ICO), along with data protection authorities from other jurisdictions, issued a joint statement on data scraping and data protection. The statement highlights the privacy risks associated with mass data scraping, particularly on social media platforms.

    Key Points:

    • Privacy Risks: Even where personal information is publicly available, scraping it can expose individuals to misuse of that information and to privacy breaches.
    • Compliance: Organisations must comply with data protection laws when using personal information, including for AI model development.
    • Safeguarding Measures: The statement calls on social media companies to implement technical barriers, such as CAPTCHAs, rate limiting and monitoring, to prevent unauthorised scraping.
    • Lawful Basis for Scraping: Businesses must have a lawful basis for collecting and using personal data, so that their activities comply with data protection regulations.

    Recommendations:

    • Enhanced Safeguards: Implement robust measures to protect against unlawful data scraping, and update these measures regularly to keep pace with advances in scraping techniques.
    • Collaboration and Education: Foster ongoing collaboration between regulators, industry stakeholders and technology developers to address the risks associated with data scraping, and increase public awareness of data protection issues.

    5. Conclusion

    The legal landscape of data scraping and TDM by AI products is complex and rapidly changing. Recent developments in Germany and the UK, as well as the joint statement by data protection authorities, underscore the need for businesses to stay informed and compliant with evolving regulations. As AI continues to advance, it is crucial for companies to navigate these legal challenges carefully so that they can continue to innovate while respecting intellectual property rights and data protection laws.

    We are increasingly advising clients from all sectors in relation to the use, development and/or procurement of AI products and tools. Please get in touch with any of the authors of this blog if you would like to discuss any of the topics covered.

    You can access all of Brodies' AI-related content and insights via our centralised AI Hub.

    Contributors

    Ally Burr

    Associate

    Alison Bryce

    Partner

    Steven Pears

    Trainee Solicitor