
Google’s Gemini chatbot is vulnerable to a prompt-injection exploit that could trick users into falling for phishing scams, without them ever seeing it coming.
The flaw allows attackers to embed hidden instructions in seemingly benign emails. When a user clicks Summarize This Email using Gemini for Google Workspace, the chatbot can be manipulated into generating fake security alerts, prompting victims to click malicious links or call scam phone numbers.
According to the anonymous researcher who originally discovered and reported the vulnerability, the technique “involves clever and unorthodox tactics designed to deceive the model, often requiring an understanding of its operational mechanics to achieve desired outcomes.”
Understanding the prompt-injection flaw
Since the malicious email doesn’t include any attachments, it’s not always viewed as a red flag—either by users or their SPAM filters. Moreover, since it exploits HTML and CSS code, it’s easily hidden within the body of the email itself. Once it’s been embedded, Gemini for Google Workspace will process it just like any other set of instructions.
“Because the injected text is rendered in white-on-white (or otherwise hidden), the victim never sees the instruction in the original message, only the fabricated ‘security alert’ in the AI generated summary,” Marco Figueroa, researcher with 0DIN, said.
It’s important to note that neither Google, the anonymous researcher, nor the team at 0DIN has seen any verified reports of this happening to any Gemini users; however, this specific prompt-injection attack was demonstrated as a proof of concept by 0DIN’s researchers.
Exploring Google’s layered security
Google has gone to great lengths to secure its Gemini platform. Some of these security controls include:
- Cataloging vulnerabilities and adversarial data within the current line of generative AI platforms.
- Introducing models that can detect hidden or malicious prompts while classifying them with greater accuracy.
- Using hardcoded reminders to ensure their large language models (LLMs) perform user-directed tasks and ignore any requests that could be deemed harmful or malicious.
- Identifying suspicious URLs and blocking the rendering of images from external URLs.
- Requesting more information or confirmation directly from the user in some instances, also known as “human-in-the-loop.”
- Providing notifications to users, with contextual information whenever Gemini’s internal controls mitigate a security issue.
Dark Reading reported that some of these safeguards have yet to be fully implemented. Google has also confirmed that it will be introducing additional safeguards for Gemini in the coming months.
Protecting the integrity of Gemini
Even though this particular flaw hasn’t been exploited yet, AI developers need to be aware that their tools could be used as delivery mechanisms by cunning hackers and other malicious actors. This prompt-injection method is specific to Gemini for Google Workspace, but it’s easy to see how a hacker could apply similar techniques to other AI platforms, such as ChatGPT and Grok.