Wednesday, April 15, 2026
Yonkers Observer

Four Takeaways on the Race to Amass Data for A.I.

by Yonkers Observer Report
April 6, 2024
in Technology

Online data has long been a valuable commodity. For years, Meta and Google have used data to target their online advertising. Netflix and Spotify have used it to recommend more movies and music. Political candidates have turned to data to learn which groups of voters to train their sights on.

Over the last 18 months, it has become increasingly clear that digital data is also crucial in the development of artificial intelligence. Here’s what to know.

The more data, the better.

The success of A.I. depends on data. That’s because A.I. models become more accurate and more humanlike with more data.

In the same way that a student learns by reading more books, essays and other information, large language models — the systems that are the basis of chatbots — also become more accurate and more powerful if they are fed more data.

Some large language models, such as OpenAI’s GPT-3, released in 2020, were trained on hundreds of billions of “tokens,” which are essentially words or pieces of words. More recent large language models were trained on more than three trillion tokens.
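To make "pieces of words" concrete, here is a toy sketch of splitting text into tokens. It assumes a tiny hand-picked vocabulary (`VOCAB`, hypothetical) and a greedy longest-match rule. Production models use much larger vocabularies learned from data, so this is an illustration rather than any company's actual tokenizer.

```python
def tokenize(text, vocab):
    """Greedy longest-match split of each word into vocabulary pieces."""
    tokens = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest remaining substring first, then shrink it.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                # Fall back to a single character so the loop always advances.
                if piece in vocab or end == start + 1:
                    tokens.append(piece)
                    start = end
                    break
    return tokens


# Hypothetical vocabulary, chosen only for this illustration.
VOCAB = {"chat", "bot", "s", "learn", "ing", "from", "data"}

example = tokenize("Chatbots learning from data", VOCAB)
print(example)
print(len(example), "tokens from 4 words")
```

In this sketch four words become seven tokens, which is why token counts run higher than word counts.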

Online data is a precious and finite resource.

Tech companies are using up publicly available online data to develop their A.I. models faster than new data is being produced. According to one prediction, high-quality digital data will be exhausted by 2026.

Tech companies are going to great lengths to obtain more data.

In the race for more data, OpenAI, Google and Meta are turning to new tools, changing their terms of service and engaging in internal debates.

At OpenAI, researchers created a program in 2021 that converted the audio of YouTube videos into text and then fed the transcripts into one of its A.I. models, going against YouTube’s terms of service, people with knowledge of the matter said.

(The New York Times has sued OpenAI and Microsoft for using copyrighted news articles without permission for A.I. development. OpenAI and Microsoft have said they used news articles in transformative ways that did not violate copyright law.)

Google, which owns YouTube, also used YouTube data to develop its A.I. models, wading into a legal gray area of copyright, people with knowledge of the action said. And Google revised its privacy policy last year so it could use publicly available material to develop more of its A.I. products.

At Meta, executives and lawyers last year debated how to get more data for A.I. development and discussed buying a major publisher like Simon & Schuster. In private meetings, they weighed the possibility of putting copyrighted works into their A.I. model, even if it meant they would be sued later, according to recordings of the meetings, which were obtained by The Times.

One solution may be ‘synthetic’ data.

OpenAI, Google and other companies are exploring using their A.I. to create more data. The result would be what is known as “synthetic” data. The idea is that A.I. models generate new text that can then be used to build better A.I.

Synthetic data is risky because A.I. models can make errors. Relying on such data can compound those mistakes.
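A rough way to see how such mistakes could compound: assume, purely for illustration, that each round of training on synthetic data corrupts a fixed share of the material that was still correct and never repairs earlier errors. The 5 percent rate below is a made-up figure, not a measurement.

```python
def accurate_fraction(error_rate, generations):
    """Share of data still correct after repeated rounds of synthetic generation.

    Assumes each round corrupts `error_rate` of the still-correct data
    and that no earlier error is ever fixed (a simplifying assumption).
    """
    return (1.0 - error_rate) ** generations


# Hypothetical 5 percent error rate per round.
for rounds in (1, 5, 10):
    print(rounds, round(accurate_fraction(0.05, rounds), 3))
```

Under these assumptions, roughly 60 percent of the data would remain accurate after 10 rounds.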


© 2025 Yonkers Observer or its affiliated companies.
