For data-hungry tech companies, YouTube is a gold mine

Marketplace Tech

30-07-2024 • 11 minutes

Companies competing in the chatbot wars are using something known in the industry as “the Pile” to train their large language models. It’s a trove of open-source data made up of text scraped from across the internet, including Wikipedia and European Parliament proceedings. Annie Gilbertson, investigative reporter for Proof News, recently took a deep dive into the Pile and discovered something else: a dataset called “YouTube Subtitles.” Marketplace’s Lily Jamali spoke with Gilbertson about her investigation and how YouTube creators feel about their content being used without their consent.
