RunPod is excited to announce its partnership with Defined.ai, the world's largest marketplace of ethically sourced training datasets for AI models. This collaboration seeks to give AI developers who work with text-to-speech and speech-to-text models, or who fine-tune LLMs, access to enterprise-grade conversational speech and text datasets.
Defined.ai's Marketplace of Ethically Sourced Datasets
Fortune 500 companies commonly seek out Defined.ai's datasets to train robust natural language processing models. Defined.ai provides a wide range of spontaneous and scripted speech datasets, alongside transcribed text, in a number of languages, including English, Tagalog, Italian, Dutch, and Mandarin. These datasets are GDPR-compliant and ethically sourced: all participants are made aware during the data collection process that the data they provide will be used for natural language processing models.
Introducing the Pilot Program
RunPod will run a pilot program with a select number of developers within the community to learn how to optimize the developer experience before opening dataset access to additional applicants. Pilot users will have access to one of 12 conversational speech datasets and will share feedback with both the RunPod and Defined.ai teams as they train their models and run inference on them.
As the collaboration with Defined.ai evolves, RunPod aims to expand dataset availability to a larger portion of users, alongside a variety of open-source text and image datasets, to accelerate machine learning model development.
This partnership is a step forward in providing a broader range of AI resources to the RunPod community. With Defined.ai’s support, RunPod aims to democratize AI development by providing developers and startups access to enterprise-grade datasets that they otherwise wouldn’t be able to leverage.