For data-hungry tech companies, YouTube is a gold mine

Marketplace Tech

30-07-2024 • 11 minutes

Companies competing in the chatbot wars are using something known in the industry as “the Pile” to train their large language models. It’s a trove of open-source data made up of text scraped from all around the internet, including Wikipedia and records from the European Parliament. Annie Gilbertson, investigative reporter for Proof News, recently took a deep dive into the Pile and discovered something else: a dataset called “YouTube Subtitles.” Marketplace’s Lily Jamali spoke with Gilbertson about her investigation and how YouTube creators feel about their content being used without their consent.
