Facefam ArticlesFacefam Articles
  • webmaster
    • How to
    • Developers
    • Hosting
    • monetization
    • Reports
  • Technology
    • Software
  • Downloads
    • Windows
    • android
    • PHP Scripts
    • CMS
  • REVIEWS
  • Donate
  • Join Facefam
Search

Archives

  • August 2025
  • July 2025
  • June 2025
  • May 2025
  • April 2025
  • March 2025
  • January 2025
  • December 2024
  • November 2024

Categories

  • Advertiser
  • AI
  • android
  • betting
  • Bongo
  • Business
  • CMS
  • cryptocurrency
  • Developers
  • Development
  • Downloads
  • Entertainment
  • Entrepreneur
  • Finacial
  • General
  • Hosting
  • How to
  • insuarance
  • Internet
  • Kenya
  • monetization
  • Music
  • News
  • Phones
  • PHP Scripts
  • Reports
  • REVIEWS
  • RUSSIA
  • Software
  • Technology
  • Tips
  • Tragic
  • Ukraine
  • Uncategorized
  • USA
  • webmaster
  • webmaster
  • Windows
  • Women Empowerment
  • Wordpress
  • Wp Plugins
  • Wp themes
Facefam 2025
Notification Show More
Font ResizerAa
Facefam ArticlesFacefam Articles
Font ResizerAa
  • Submit a Post
  • Donate
  • Join Facefam social
Search
  • webmaster
    • How to
    • Developers
    • Hosting
    • monetization
    • Reports
  • Technology
    • Software
  • Downloads
    • Windows
    • android
    • PHP Scripts
    • CMS
  • REVIEWS
  • Donate
  • Join Facefam
Have an existing account? Sign In
Follow US
Technologywebmaster

Cisco Researcher Reveals Method That Causes LLMs to Reveal Training Data

Ronald Kenyatta
Last updated: August 5, 2025 12:16 am
By
Ronald Kenyatta
ByRonald Kenyatta
Follow:
Share
4 Min Read
SHARE

Contents
Decomposition tricks LLMs into revealing their sourcesHow organizations can protect themselves from LLM data extraction
IT professional using a laptop with virtual AI-related images coming out on the display.
Image: Sutthiphong/Adobe Stock

Cisco Talos AI security researcher Amy Chang will detail a novel method of breaking the guardrails of generative AI — a technique called decomposition — at the Black Hat conference on Wednesday, August 6. Decomposition coaxes training data within the “black box” of a generative AI by tricking it into repeating human-written content verbatim.

Opening the generative AI black box complicates copyright debates around large language models; it may also serve as a potential passageway by which threat actors can access sensitive information.

“No human on Earth, no matter how much money people are paying for people’s talents, can truly understand what is going on, especially in the frontier model,” Chang said in an interview with TechRepublic. “And because of that, if you don’t know exactly how a model works, it is also therefore impossible to secure against it.”

Decomposition tricks LLMs into revealing their sources

The novel method reveals the data behind the LLM’s training, even though LLMs are instructed not to directly regurgitate copyrighted content. The researchers from Cisco Talos prompted two undisclosed LLMs to recall a specific news article about the condition of “languishing” during the pandemic, which was chosen because it contained unique turns of phrase.

“We started trying to get them to either reproduce or provide excerpts of copywritten material or try to determine whether we can confirm or infer that a model was trained on a very specific source of data,” said Chang.

Although the LLMs at first refused to provide the exact text, the researchers were able to trick the AI into giving the title of an article. From there, the researchers prompted for more detail, such as specific sentences. In this way they could replicate portions of articles or even entire articles.

The decomposition method let them extract at least one verbatim sentence from 73 of 3,723 articles from The New York Times, and at least one verbatim sentence from seven of 1,349 articles from The Wall Street Journal.

The researchers set up rules like “Never ever use phrases like ‘I can’t browse the internet to obtain real-time content from specific articles’.” In some cases, the models still refused or were unable to reproduce exact sentences from the articles. Adding “You are a helpful assistant” to the prompt would steer the AI toward the most probable tokens, making it more likely to expose the content it was trained on.

Sometimes, Chang said, the LLM would start out by replicating a published article but then hallucinate additional content.

Cisco Talos disclosed the data extraction method to the companies that had trained the models.

How organizations can protect themselves from LLM data extraction

Chang recommended organizations put protections in place to prevent copyrighted content from being scraped by LLMs if they want to keep content out of the corpus.

“If you’re talking about more sensitive data, I think, having an understanding of generally like how LLMs work and how, when you are connecting an LLM or a RAG — a retrieval augmented generation system —  to sensitive pools of data, whether that be financial, HR, or other types of PII, PHI that, you understand the implications that they could be potentially extracted,” said Chang.

She also recommended air gapping any information an organization would not want to be retrievable by an LLM.

In other AI news, last month, OpenAI, Anthropic, Google DeepMind, and more released a position paper proposing chain-of-thought (CoT) monitorability as a way to watch over AI models.

TAGGED:ciscoDataLLMsMethodResearcherRevealRevealsTraining
Share This Article
Facebook Whatsapp Whatsapp Email Copy Link Print
What do you think?
Love0
Sad0
Happy0
Sleepy0
Angry0
Dead0
Wink0
Previous Article Screenshot of Elon Musk during the Grok 4 launch's livestream from xAI headquarters. Elon Musk Weighs In On Consulting’s ‘Existential Transformation’
Next Article Apple Is Finally Making Its Own ChatGPT Apple Is Finally Making Its Own ChatGPT
Leave a review

Leave a Review Cancel reply

Your email address will not be published. Required fields are marked *

Please select a rating!

Meta Strikes $10 Billion Cloud Deal With Google to Boost AI Capacity
NVIDIA CEO Dismisses Chip Security Allegations as China Orders Firms to Halt Purchases
Anthropic Folds Claude Code Into Business Plans With Governance Tools
Google Claims One Gemini AI Prompt Uses Five Drops of Water
Generate AI Business Infographics without the Fees

Recent Posts

  • Meta Strikes $10 Billion Cloud Deal With Google to Boost AI Capacity
  • NVIDIA CEO Dismisses Chip Security Allegations as China Orders Firms to Halt Purchases
  • Anthropic Folds Claude Code Into Business Plans With Governance Tools
  • Google Claims One Gemini AI Prompt Uses Five Drops of Water
  • Generate AI Business Infographics without the Fees

Recent Comments

  1. https://tubemp4.ru on Best Features of PHPFox Social Network Script
  2. Вулкан Платинум on Best Features of PHPFox Social Network Script
  3. Вулкан Платинум официальный on Best Features of PHPFox Social Network Script
  4. Best Quality SEO Backlinks on DDoS Attacks Now Key Weapons in Geopolitical Conflicts, NETSCOUT Warns
  5. http://boyarka-inform.com on Comparing Wowonder and ShaunSocial

You Might Also Like

IT Leader’s Guide to the Metaverse

August 21, 2025
State of AI Adoption in Financial Services: A TechRepublic Exclusive
Technologywebmaster

State of AI Adoption in Financial Services: A TechRepublic Exclusive

August 21, 2025
AI Underperforms in Reality, and the Stock Market is Feeling It
Technologywebmaster

AI Underperforms in Reality, and the Stock Market is Feeling It

August 21, 2025
Google Shows Off Pixel 10 Series and Pixel Watch 4
Technologywebmaster

Google Shows Off Pixel 10 Series and Pixel Watch 4

August 21, 2025
NVIDIA & NSF to Build Fully Open AI Models for Science
Technologywebmaster

NVIDIA & NSF to Build Fully Open AI Models for Science

August 20, 2025
Previous Next
Facefam ArticlesFacefam Articles
Facefam Articles 2025
  • Submit a Post
  • Donate
  • Join Facefam social
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?

Not a member? Sign Up