Yonkers Observer

OpenAI Says DeepSeek May Have Improperly Harvested Its Data

by Yonkers Observer Report
January 29, 2025
in Technology

OpenAI says it is reviewing evidence that the Chinese start-up DeepSeek broke its terms of service by harvesting large amounts of data from its A.I. technologies.

The San Francisco-based start-up, which is now valued at $157 billion, said that DeepSeek may have used data generated by OpenAI technologies to teach similar skills to its own systems.

This process, called distillation, is common across the A.I. field. But OpenAI’s terms of service say that the company does not allow anyone to use data generated by its systems to build technologies that compete in the same market.

“We know that groups in the P.R.C. are actively working to use methods, including what’s known as distillation, to replicate advanced U.S. A.I. models,” OpenAI spokeswoman Liz Bourgeois said in a statement emailed to The New York Times, referring to the People’s Republic of China.

“We are aware of and reviewing indications that DeepSeek may have inappropriately distilled our models, and will share information as we know more,” she said. “We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the U.S. government to protect the most capable models being built here.”

DeepSeek did not immediately respond to a request for comment.

DeepSeek spooked Silicon Valley tech companies and sent the U.S. financial markets into a tailspin earlier this week after releasing A.I. technologies that matched the performance of anything else on the market.

The prevailing wisdom had been that the most powerful systems could not be built without billions of dollars in specialized computer chips, but DeepSeek said it had created its technologies using far fewer resources.

Like any other A.I. company, DeepSeek built its technologies using computer code and data corralled from across the internet. A.I. companies lean heavily on a practice called open sourcing, freely sharing the code that underpins their technologies and reusing code shared by others. They see this as a way of accelerating technological development.

They also need massive amounts of online data to train their A.I. systems. These systems learn their skills by pinpointing patterns in text, computer programs, images, sounds and videos. The leading systems do this by analyzing just about all of the text on the internet.

Distillation is often used to train new systems. If a company takes data from proprietary technology, the practice may be legally problematic. But it is often permitted when the data comes from open source technologies.

OpenAI is now facing more than a dozen lawsuits accusing it of illegally using copyrighted internet data to train its systems. This includes a lawsuit brought by The New York Times against OpenAI and its partner Microsoft.

The suit contends that millions of articles published by The Times were used to train automated chatbots that now compete with the news outlet as a source of reliable information. Both OpenAI and Microsoft deny the claims.

A Times report also showed that OpenAI has used speech recognition technology to transcribe the audio from YouTube videos, yielding new conversational text that would make an A.I. system smarter. Some OpenAI employees discussed how such a move might go against YouTube’s rules, three people with knowledge of the conversations said.

An OpenAI team, including the company’s president, Greg Brockman, transcribed more than one million hours of YouTube videos, the people said. The texts were then fed into a system called GPT-4, which was widely considered one of the world’s most powerful A.I. models and was the basis of the latest version of the ChatGPT chatbot.

© 2025 Yonkers Observer or its affiliated companies.
