
12K hardcoded API keys and passwords found in public LLM training data


Roughly 12,000 hardcoded, live API keys and passwords were found in Common Crawl, a massive web archive used to train large language models (LLMs) such as DeepSeek.

Security pros say hardcoded credentials are dangerous because hackers can more easily exploit them to gain access to sensitive data, systems, and networks. 

Exposed credentials like these fuel LLMjacking, in which cybercriminals abuse stolen API access to generative AI (GenAI) services, often selling that access to third parties.

A Feb. 27 blog post by Truffle Security reported that the live secrets were uncovered in a scan of 2.76 billion web pages from Common Crawl. The researchers also found a high reuse rate: 63% of the secrets appeared on more than one web page. In one extreme case, a single API key appeared 57,029 times across 1,871 subdomains.
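
Truffle Security's findings came from running secret-scanning detectors over Common Crawl data. The sketch below is a greatly simplified, hypothetical illustration of that approach: the regex patterns, function names, and sample input are invented for this example and do not reflect the firm's actual detectors, which number in the hundreds and include live verification against the issuing services.

```python
import re

# Hypothetical patterns loosely modeled on common credential formats;
# production scanners use far more detectors plus live verification.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "generic_api_key": re.compile(
        r"(?i)\b(?:api[_-]?key|secret)\s*[:=]\s*['\"]([A-Za-z0-9_\-]{20,})['\"]"
    ),
}

def scan_text(text: str) -> list[tuple[str, str]]:
    """Return (pattern_name, matched_string) pairs for anything that looks like a secret."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits

if __name__ == "__main__":
    # Fabricated example of front-end code shipped with a credential baked in.
    sample_html = """
    <script>
      const apiKey = "sk_live_EXAMPLEexampleEXAMPLE1234";
    </script>
    """
    for name, value in scan_text(sample_html):
        print(f"possible {name}: {value}")
```

Scanning only finds candidate strings; whether a key is "live," as in the Truffle Security research, requires checking it against the service that issued it.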

“LLMJacking is a growing trend that we see which involves threat actors targeting machine identities with access to LLMs, and either abusing this access themselves, or selling it to third parties,” explained Danny Brickman, co-founder and CEO of Oasis Security. “This threat will continue to escalate in the year ahead, amplifying the need for robust NHI [non-human identity] security measures.”

Stephen Kowski, Field CTO at SlashNext Email Security, said LLMjacking creates a domino effect in which initial credential theft leads to widespread abuse by multiple bad actors who purchase access to compromised AI systems. Beyond the significant financial impact from unauthorized AI usage charges, Kowski said, these attacks enable the creation of harmful content, including sexually explicit material that bypasses the safety controls built into these systems.

“The most concerning aspect is that once credentials are sold on illicit marketplaces, there’s no predicting what damage will follow as various criminals with different motives can use the victim’s AI infrastructure without their knowledge,” said Kowski.

Kowski said security teams should implement strong authentication methods like multi-factor authentication for all AI service access points while establishing strict role-based permissions that follow least-privilege principles. Teams must also enable comprehensive logging and analytics for AI model usage, monitor for unusual API calls or configuration changes, and set up alerts for billing spikes that could indicate unauthorized AI consumption, said Kowski.
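
Kowski's last recommendation, watching for unusual API calls and billing spikes, can be approximated with basic usage monitoring. The sketch below is a hypothetical illustration only: it assumes a simple export of per-day token counts per API key, and the field names, threshold, and sample data are all invented for the example rather than drawn from any specific provider's tooling.

```python
from statistics import mean

# Hypothetical daily usage records; in practice these would come from the
# AI provider's billing export or audit logs.
usage_log = [
    {"api_key": "team-alpha", "date": "2025-02-20", "tokens": 120_000},
    {"api_key": "team-alpha", "date": "2025-02-21", "tokens": 135_000},
    {"api_key": "team-alpha", "date": "2025-02-22", "tokens": 118_000},
    {"api_key": "team-alpha", "date": "2025-02-23", "tokens": 2_400_000},  # suspicious spike
]

SPIKE_FACTOR = 5  # alert when a day exceeds 5x the average of prior days (arbitrary choice)

def find_spikes(records: list[dict]) -> list[dict]:
    """Flag records whose token usage far exceeds the average of earlier days."""
    alerts = []
    for i, record in enumerate(records):
        prior = [r["tokens"] for r in records[:i]]
        if prior and record["tokens"] > SPIKE_FACTOR * mean(prior):
            alerts.append(record)
    return alerts

if __name__ == "__main__":
    for alert in find_spikes(usage_log):
        print(f"ALERT: {alert['api_key']} used {alert['tokens']:,} tokens on {alert['date']}")
```

A real deployment would feed alerts like these into existing SIEM or billing-anomaly workflows rather than a standalone script, but the underlying signal, consumption far above an account's baseline, is the same one Kowski describes.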
