OpenAI Secures An Agreement To Use Reddit Data To Train AI


OpenAI recently reached an agreement with Reddit that gives it access to the platform's large collection of data for AI model development. The collaboration, announced on the OpenAI blog, aims to leverage Reddit's wealth of dynamic, structured, and original data, such as posts and comments. This data will help improve and refine OpenAI's AI capabilities, including ChatGPT, the organization's advanced conversational AI. In addition, the partnership will explore new AI features for Reddit's community and moderators.


As part of the collaboration, OpenAI will also become an advertising partner of Reddit. OpenAI expressed its intentions in a statement, saying, "Through the use of our AI technologies, Reddit can significantly improve the site's interaction, providing a great experience for all users."


OpenAI has negotiated various content licensing agreements with organizations ranging from media repositories to news publications. This deal, however, is particularly notable because of OpenAI CEO Sam Altman's significant personal investment in Reddit, of which he is the third-largest shareholder and a former board member. Addressing the potential conflict of interest, OpenAI clarified that the deal was led by its chief operating officer, Brad Lightcap, and was approved by independent board members.


As Reddit establishes itself as a public company, it views data licensing deals like this one as key to its strategic development. In its IPO filings, Reddit disclosed data licensing contracts with several major firms, including Google, that are collectively valued at more than $200 million. In its first public earnings report, Reddit noted a 450% year-over-year increase in non-advertising revenue, largely due to these licensing deals. Reddit shares jumped 11% in after-hours trading after the OpenAI partnership was announced.


During a recent earnings call, Reddit CEO Steve Huffman commented on the changing dynamics of internet content, highlighting the growing value of genuine human input amid the rise of machine-generated content. The remark underscores Reddit's importance: with more than a billion posts and more than 16 billion comments, the platform is a treasure trove for AI firms looking to train their models on rich, diverse datasets.


Reddit may face resistance from its users over the use of their data. In a similar case, Stack Overflow, a popular developer question-and-answer site, faced backlash from users after partnering with OpenAI to share its data, which led to the removal and subsequent reinstatement of some of the community's most popular posts under controversial circumstances.


Reddit itself has resisted attempts by third parties to give users more control over their data. In a recent example, Vana, a blockchain startup, aimed to create a "DAO" (decentralized autonomous organization) through which Reddit users could collectively manage their data. Reddit responded by shutting down Vana's discussion forum and criticizing its approach to managing user data.