‘AI package hallucination’ can spread malicious code into developer environments

Attackers can easily use ChatGPT to help them spread malicious packages into developer environments, according to new research from Vulcan Cyber.

In a June 6 blog post, Vulcan Cyber researchers described a new technique for spreading malicious packages that they call “AI package hallucination.” The technique stems from ChatGPT and other generative AI platforms sometimes answering user queries with hallucinated sources, links, blogs and statistics.

Large language models (LLMs) such as ChatGPT can generate these “hallucinations,” which are URLs, references, and even entire code libraries and functions that do not actually exist. The researchers said ChatGPT will even generate questionable fixes for CVEs and, in this specific case, offer links to coding libraries that don’t exist, either.

When ChatGPT creates fake code libraries (packages), the Vulcan Cyber researchers said, attackers can use these hallucinations to spread malicious packages without resorting to familiar techniques such as typosquatting or masquerading.

“Those techniques are suspicious and already detectable,” the researchers said. “But if an attacker can create a package to replace the ‘fake’ packages recommended by ChatGPT, they might be able to get a victim to download and use it.”
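To illustrate the exposure, here is a minimal sketch (not part of the Vulcan Cyber research, and the package name is purely hypothetical) that uses PyPI's public JSON API to check whether a chatbot-recommended package has ever been published. A name that has never been published is exactly the kind of unclaimed slot an attacker could register with a malicious payload.

```python
import requests

def exists_on_pypi(name: str) -> bool:
    """Return True if `name` has ever been published to PyPI."""
    # PyPI's JSON API answers 404 for names that were never registered.
    resp = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return resp.status_code == 200

# Hypothetical package name a chatbot might recommend; illustrative only.
suggested = "example-chatbot-suggested-package"
if not exists_on_pypi(suggested):
    print(f"'{suggested}' is unclaimed on PyPI; an attacker could publish it first.")
```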

The technique underscores how easy it’s become for threat actors to use ChatGPT as a tool to execute an attack, said Bar Lanyado, a security researcher on the Voyager18 research team at Vulcan Cyber.

“It’s very concerning how repetitive its answers are and how easily it responds with hallucinations,” said Lanyado. “We should expect to continue to see risks like this associated with generative AI and that similar attack techniques could be used in the wild. It’s just the beginning, generative AI tech is still pretty new. From a research perspective, it’s likely that we'll see many new security findings in the coming months and years. That said, virtually all generative AI providers are working hard to decrease hallucinations and ensure that their products do not create cyber risks, and that’s reassuring.”

Melissa Bischoping, director of endpoint security research at Tanium, said companies should never download and execute code they don’t understand and haven’t tested, whether it comes from open-source GitHub repos or, now, ChatGPT recommendations. Teams should evaluate any code they intend to run for security, she said, and keep private copies of it.

“Do not import directly from public repositories such as those used in the example attack,” said Bischoping. “In this case, attackers are using ChatGPT as a delivery mechanism. However, the technique of compromising the supply chain through the use of shared/imported third-party libraries is not novel.

"Use of this strategy will continue, and the best defense is to employ secure coding practices, and thoroughly test and review code intended for use in production environments," she continued. "Don’t blindly trust every library or package you find on the internet, or in a chat with an AI.”

Bud Broomhead, chief executive officer at Viakoo, added that this case is yet another chapter in the arms race between threat actors and defenders.

Much of the dialogue at conferences this year, such as the RSA Conference in April, was around generative AI, Broomhead said, and now Vulcan Cyber has delivered a great example of how threat actors can use it.

“Ideally, security researchers and software publishers can also leverage generative AI to make software distribution more secure,” said Broomhead.

The industry is in the early stages of generative AI being used for cyber offense and defense, continued Broomhead, who credited Vulcan Cyber and other organizations with detecting new threats in time to prevent similar exploits.

“Remember, it was only a few months ago I could ask ChatGPT to create a new piece of malware and it would,” he said. “Now, it takes very specific and directed guidance for it to inadvertently create it — and hopefully soon even that approach will be prevented by the AI engines.”
