Risks of GenAI - Safeguarding intellectual property in the AI era

What is the issue / risk?

Training

GenAI models are created by training artificial neural networks (a particular type of machine learning model) on large volumes of data, such as text, images, videos and code. These models can then be used to generate new content.

There is a strong correlation between the amount of training data used to train neural networks and the performance of these models. As a result, developers of GenAI models require vast quantities of training data in order to train models that produce the best results in response to a user’s requests.

So how do operators of such systems obtain this sheer volume of materials?

Training data and materials for GenAI systems often include materials available on the Internet. Although developers often filter the training data to remove things such as spam and erotic content, many models have been trained without regard to the copyright status of the training materials, raising a number of copyright issues.

We are already seeing this issue play out in the courts. As we have previously considered, Getty Images is currently suing Stability AI, claiming that Stability AI has infringed the copyright of more than 12 million photographs, captions and metadata in training its Stable Diffusion and DreamStudio products. Similarly, in the United States, a class action has been brought against GitHub, Microsoft and OpenAI, with anonymous plaintiffs alleging that the defendants utilised their copyrighted materials to create Codex and Copilot, and claiming that the creation of the AI-powered coding assistant, GitHub Copilot, constitutes ‘software piracy on an unprecedented scale.’

Closer to home, no lawsuits have yet been filed in Australia in relation to claims of copyright infringement. However, Australia will be watching closely as cases like the ones mentioned above play out in the UK and US courts. Fundamentally, the cases argue that copying works protected by copyright for use as ‘training data’ may infringe copyright, since these materials are being used without the copyright owners’ permission. In defence, an argument has been raised in the US case that the use of copyrighted materials as training data falls within the ‘fair use’ doctrine. We must wait to see how effective that argument proves. However, as Australia’s ‘fair dealing’ exceptions are much narrower than the US ‘fair use’ exception, a successful defence to copyright infringement in the US will not mean the same will be true in Australia. In fact, the differences in law between jurisdictions mean that the territorial reach of each jurisdiction’s legislation may become a significant factor in copyright disputes, and in how developers train and deploy these models.

However, AI service providers and users are beginning to consider this copyright risk. For example, Microsoft has commenced “filtering out” protected materials (i.e., those protected by copyright) from its training data. OpenAI has also begun to enter into licensing agreements with copyright owners, as can be seen from the News Corp and OpenAI global partnership agreement, under which OpenAI is able to use current and archived news content from The Wall Street Journal, The New York Post, MarketWatch and Barron’s (among others).

But will we run out of data?

A big concern for tech companies is running out of data. While entering into licensing arrangements has the potential to solve copyright worries, the huge demand for data might make it impractical to obtain licences for all the materials required, so licensing might not be a good solution after all.

To feed this ever-growing demand, some companies are turning to synthetic data, which is data produced by AI models. The use of synthetic data has issues of its own: opinion is divided as to whether synthetic data is useful or will eventually lead to “model collapse.” In fact, a recent study by researchers in the UK and Canada found that where systems are trained on model-generated content, their outputs become increasingly wrong and homogeneous, and that even in the best learning conditions, model collapse was inevitable.

Clearly, it is quite the balancing act to ensure a GenAI system has enough data to be able to train itself to produce effective responses and content, and to ensure that copyrighted material is not being used improperly.

Produced Materials in Outputs

A further consideration is whether copyright subsists in materials which are produced using GenAI systems. In Australian law, copyright materials must have originated from a human ‘author’ who has applied a sufficient amount of ‘independent intellectual effort’ to authoring the work. As GenAI systems are trained to produce outputs which reflect training data, such produced materials in output are arguably not novel or inventive. Thus, there is ambiguity about how this originality threshold is satisfied.

We expect courts will question the knowledge, complexity or skill used to “prompt” a GenAI system to generate a work, and the level of human intervention used to augment any output. This question is relevant not only for developers but also for organisations who utilise or monetise works created by GenAI. If copyright doesn’t subsist in a work, such organisations are unable to adequately protect the work and prevent others from copying it. As advances in the use and capability of GenAI continue, this issue in relation to produced materials in outputs and copyright will become increasingly important.

Music deepfakes

A song purportedly by Drake and The Weeknd created a flurry of internet discussion and commentary when it was posted on TikTok and Spotify in April 2023. Within days of its posting, the song was removed from all platforms as a result of copyright claims by the artists’ record label. The rapid advancement of GenAI, which enables the creation of music deepfakes, also presents copyright risks.

Rights holders^[1] believe that unauthorised datasets are being used to produce these imitations of artists. This imitation has the potential to directly dilute and damage the artist’s brand and livelihood. Not only is there this risk associated with the imitation, but more broadly, the ability to create these deepfakes also poses risk to the music industry. As deepfakes can be cheap and royalty-free, since no compensation needs to be paid to the writers, publishers, performers and record labels, music streaming platforms may be incentivised to allow deepfake music on their platforms. In their view, this has the real potential to diminish greatly the richness and diversity of Australian music available online.

Utilising Outputs

As seen in the New York Times case against OpenAI and Microsoft, where millions of its articles were utilised to train chatbots, there is also a risk that where a GenAI output contains a substantial part of an existing work in which copyright subsists, an end user could unknowingly infringe a third party’s copyright. There are some protections that already exist in Australian law, known as ‘fair dealing’ exceptions. There are also technical exceptions that may apply, which include the temporary ‘copying’ of works whilst one views them (for example, downloading a movie on Netflix to watch later). However, the applicability of such exceptions to GenAI is questionable.

To address this risk and respond to user and customer concern, companies like Microsoft, Adobe and Google are offering IP indemnities in relation to the generated outputs. Essentially, this contractual promise applies where a user is challenged or sued on grounds of copyright infringement for an output: Microsoft, Adobe and Google will then assume responsibility, subject to certain conditions and limitations. This is intended to put end users at ease and is increasingly common.

So what?

As discussed above, the use of a GenAI system poses legal risks, not only for owners and operators of these systems but for users as well. As GenAI systems invariably need data to function, the risk of breaching copyright is potentially quite high: developers need to consider the copyright status of their training data, and users need to consider their own use. Utilising newly generated content and outputs also creates anxiety over breaching copyright.

Now what?

There are important considerations and actions that both users and operators of GenAI systems can take to mitigate potential copyright issues.

USER	OPERATOR / DEVELOPER
Carefully consider the risk of using GenAI outputs, and review any copyright commitments, including any limitations.	Be fully informed of where data is sourced from and, where required, seek permission from owners of materials protected by copyright.
Consider the impact that use of GenAI will have on your own IP rights in your own materials.	Consider filtering measures on training data to remove well-known copyright material, or allowing copyright owners to request that materials containing their IP are removed from the training data set (i.e., opt-out mechanisms).
Consider whether your own practices should take into account jurisdictional differences. For example, the availability of the broader fair use defence in the US.	Consider traceable links to the copyright owners. For example, GenAI systems could have known IP information coded into the data source and surfaced as a summary to the user. The user can then make a decision in relation to the content, or the GenAI system could prevent particular uses.
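
By way of illustration only, the filtering, opt-out and traceability measures listed for operators above could be sketched along the following lines. This is a minimal sketch, not a description of how any provider actually works: the `TrainingItem` record, its fields, the licence labels and both function names are hypothetical, and a real system would need a far more reliable way of establishing who owns a work and on what terms it may be used.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class TrainingItem:
    """One candidate item for a training set (hypothetical schema)."""
    source_url: str
    owner: str    # known rights holder, or "" if unknown
    licence: str  # hypothetical labels, e.g. "licensed", "public-domain", "unknown"
    text: str


def filter_training_items(
    items: Iterable[TrainingItem],
    opted_out_owners: Set[str],
    allowed_licences: Set[str],
) -> Tuple[List[TrainingItem], List[Tuple[TrainingItem, str]]]:
    """Split items into those kept and those removed, with a reason for each removal."""
    kept: List[TrainingItem] = []
    removed: List[Tuple[TrainingItem, str]] = []
    for item in items:
        if item.owner in opted_out_owners:
            # Opt-out mechanism: the owner has asked for removal.
            removed.append((item, "owner opted out"))
        elif item.licence not in allowed_licences:
            # Filtering measure: copyright status has not been cleared.
            removed.append((item, "licence not cleared"))
        else:
            kept.append(item)
    return kept, removed


def provenance_summary(item: TrainingItem) -> str:
    """Return a short, user-facing summary of the known IP information for an item."""
    owner = item.owner or "unknown owner"
    return f"{item.source_url} ({owner}, {item.licence})"
```

Recording a reason for every removal, and keeping the owner and licence attached to each retained item, is what would allow the traceable link back to copyright owners to be surfaced to a user later.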
