Google reportedly let OpenAI transcribe a million hours of YouTube videos to train GPT-4

According to a new report, AI companies such as OpenAI, Google, and Meta have resorted to shady tactics in a bid to secure high-quality data to train their AI models. A New York Times report states that OpenAI purportedly transcribed over a million hours of YouTube videos to gather data to train its most advanced large language model (LLM), GPT-4.

Reportedly, OpenAI developed the Whisper audio transcription model, which helped the company scrape data from YouTube videos. The NY Times reports that OpenAI knew this method could come under scrutiny but went ahead with it because the company believed it to be fair use. Interestingly, Google, which owns YouTube, has allegedly done the same for its own AI models, thereby violating its creators' copyrights.

The NY Times report is in line with a report from The Information, which highlighted that OpenAI allegedly scraped data from YouTube videos and podcasts to train two of its AI systems. The report also suggests that OpenAI's president, Greg Brockman, was on the team.

When YouTube CEO Neal Mohan was interviewed by Bloomberg, he said that the company's policies "do not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service." However, when asked whether OpenAI had used YouTube data, Mohan gave an ambiguous answer, saying, "I have seen reports that it may or may not have been used. I have no information myself."

The NY Times report further claims that some people at Google knew about OpenAI's practice of transcribing YouTube data but could not act on it, since Google had resorted to the same practice to train its own AI models. Google, though, told The NY Times that it scrapes data from videos only after the creator of the video has given their consent.

The report also claims that Google asked a team to "tweak its privacy policy" in June 2023, "to allow Google to be able to tap publicly available Google Docs, restaurant reviews on Google Maps, and other online material for more of its A.I. products."
