Gemini for Workspace susceptible to indirect prompt injection, researchers say

Google’s Gemini for Workspace, which integrates its Gemini large-language model (LLM) assistant across its Workspace suite of tools, is susceptible to indirect prompt injection, HiddenLayer researchers said in a blog post Wednesday.

Indirect prompt injection is a method of manipulating an AI model’s output by inserting malicious instructions into a data source the AI relies on to form its responses, such as a document or email. This differs from direct prompt injection, which involves sending malicious instructions directly to the AI through its user interface.

Gemini for Workspace integrates the Gemini AI assistant directly into Google Workspace applications like Gmail, Google Slides and Google Drive to help the user quickly summarize and create emails and documents.

The HiddenLayer researchers tested various indirect prompt injections across different tools to determine whether they could manipulate Gemini’s output using potentially malicious instructions hidden in emails or shared documents.

Their first test involved injecting instructions into emails sent to the target’s Gmail, which were hidden by setting the font color of the injected text to match the Gmail interface background. The researchers used control tokens <eos> (end of sequence) and <bos> (beginning of sequence) to strengthen the injection, attempting to trick the LLM into believing the injection was part of their system instructions.

When the injected email is sent, and the user asks Gemini to summarize the email, the assistant follows the hidden instructions by, for example, sending the user a poem instead of a summary, the researchers found.

In a proof-of-concept more closely mimicking a malicious phishing attack, the researchers successfully used instructions hidden in an email to get Gemini to tell the user their password was compromised and they needed reset it at www[.]g00gle[.]com/reset. In this case, they also replaced the periods in the URL with similar-looking Arabic unicode to prevent a hyperlink from rendering in the email body.

In Google Slides, the researchers hid their injected instructions in the speaker notes of a presentation slide to get Gemini to generate a message similar to a “Rickroll” instead of a proper summary of the slide. They also noted that Gemini automatically attempts to generate a summary of a slide when the Gemini sidebar is opened, without further user prompting.

Lastly, the researchers showed how Gemini in Google Drive can pull context from any file on the Google account, including shared documents, making it possible for a third party to perform an indirect prompt injection by sharing a file with the target. They successfully performed the “Rickroll” injection, in which an attempt to summarize one document caused Gemini to follow instructions hidden in a separate document in a shared folder.

The HiddenLayer researchers disclosed the Gmail and Slides issues to Google, which classified them as intended behaviors, according to the blog post.

HiddenLayer previously reported on similar vulnerabilities in Gemini that enabled both direct “jailbreaking” and indirect prompt injection via Gemini Advanced Google Workspace extension.