Seven dataset licensors of music, image, text, video and other content used to train AI systems banded together to form the Dataset Providers Alliance (DPA) to foster ethical data sourcing and usage practices.

The group’s goals include promoting transparency and standardisation for the licensing of intellectual property (IP) content for AI and ML datasets while also ensuring the protection of rights.

Founding members include music licensing company Rightsify, image licensing service vAIsual, Japanese stock photo provider Pixta, AI music generation company Global Copyright Exchange and data marketplace Datarade.

Alex Bestall, CEO of Rightsify, stated the DPA “will serve as a powerful voice for dataset providers, ensuring that the rights of content creators are protected while AI developers get access to large amounts of high-quality AI training data”.

The DPA’s first initiative will be a whitepaper outlining dataset licensing standards.

One of the concerns over generative AI chatbots is a lack of clarity about which IP, source code or openly available text is being used to train and develop the platforms.

Generative AI companies such as OpenAI are accused of mining the internet, without paying, for data to train their large language models, which has resulted in lawsuits over alleged copyright infringement.

Google and Microsoft are offering customers protection from copyright infringement claims covering finished products or the use of their AI training data.