Defending AI Agents Against Persuasion Attacks — Google ADK Callback Guardrails
How three independent callback layers stop the "treasure hunt" credential exfiltration technique — and where they still fall short.

Prompt injection is the AI vulnerability everyone talks about. But there's a related attack that gets less attention: persuasion. No hidden text, no encoding tricks. Just a story that convinces the agent to hand over your credentials.

I built a demo of this using Google ADK, using Module 7 of the Google ADK hands-on course as the template. The full project — agents, demo files, and experimental branches — is on GitHub at andyrat33/AI-Persuasion. The post covers the attack end-to-end, three layers of defence, and two real bypasses. All prompts are included so you can run through it without needing to dig into the repo.

The attack: redefining what credentials mean

Standard prompt injection sneaks instructions into the AI's input: hidden text in a PDF, instructions embedded in a web page the agent browses. Persuasion attacks don...
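The defence the title refers to is a callback guardrail: a hook that runs before the agent executes a tool and can refuse the call. The block below is an added illustration of that idea and is not code from the AI-Persuasion repository. It is written in the shape of ADK's before_tool_callback (tool, args, tool_context; return a dict to short-circuit the tool, or None to let it run), but it has no ADK dependency so it runs on its own. The tool names (send_email, http_post), the allowlisted destinations and the credential regexes are constructed assumptions; wiring it into an agent via Agent(before_tool_callback=before_tool_guardrail) is the assumed integration point. The point it demonstrates is that a persuaded model still has to go through a tool to exfiltrate anything, and the tool boundary does not read the story.

```python
import re
from typing import Any, Optional

# Credential-shaped patterns. Constructed for this example; tune for your environment.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "bearer_token": re.compile(r"\bBearer\s+[A-Za-z0-9._\-]{20,}", re.IGNORECASE),
    "password_assignment": re.compile(
        r"\b(?:password|passwd|api[_-]?key|secret)\s*[:=]\s*\S+", re.IGNORECASE
    ),
}

# Tools that move data out of the agent's boundary, mapped to the argument
# that names the destination. Assumed tool names for this example.
EGRESS_TOOLS = {"send_email": "to", "http_post": "url"}

# Destinations the agent may send to. Email suffixes start with "@", URLs are prefixes.
ALLOWED_DESTINATIONS = ("@example-corp.internal", "https://api.example-corp.internal/")


def find_secrets(value: Any) -> list[str]:
    """Walk nested tool arguments and return the names of every pattern that matches."""
    hits: list[str] = []
    if isinstance(value, str):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(value):
                hits.append(name)
    elif isinstance(value, dict):
        for inner in value.values():
            hits.extend(find_secrets(inner))
    elif isinstance(value, (list, tuple)):
        for inner in value:
            hits.extend(find_secrets(inner))
    return hits


def destination_allowed(tool_name: str, args: dict) -> bool:
    """For egress tools, require the destination to be on the allowlist."""
    key = EGRESS_TOOLS.get(tool_name)
    if key is None:
        return True  # not an egress tool; nothing to check here
    dest = str(args.get(key, ""))
    return any(
        dest.endswith(allowed) if allowed.startswith("@") else dest.startswith(allowed)
        for allowed in ALLOWED_DESTINATIONS
    )


def before_tool_guardrail(tool: Any, args: dict[str, Any], tool_context: Any = None) -> Optional[dict]:
    """Callback in ADK's before_tool_callback shape (assumed).

    Returning a dict replaces the tool result and skips the real call;
    returning None lets the tool run. `tool` may be an object with a
    .name attribute or a plain string, so the check is easy to unit test.
    """
    name = tool if isinstance(tool, str) else getattr(tool, "name", str(tool))
    if not destination_allowed(name, args):
        return {"error": f"blocked: {name} destination not on the allowlist"}
    secrets = sorted(set(find_secrets(args)))
    if secrets:
        return {"error": f"blocked: arguments to {name} contain {', '.join(secrets)}"}
    return None


if __name__ == "__main__":
    # Constructed sample calls: one benign, two that a persuaded agent might attempt.
    print(before_tool_guardrail("send_email", {"to": "ops@example-corp.internal", "body": "Meeting at 3pm"}))
    print(before_tool_guardrail("send_email", {"to": "attacker@evil.example", "body": "AKIAABCDEFGHIJKLMNOP"}))
    print(before_tool_guardrail("http_post", {"url": "https://api.example-corp.internal/report",
                                              "data": {"note": "password=hunter2"}}))
```

The two checks are independent on purpose: the allowlist catches a leak to an outside destination even when the payload is not recognisable as a secret, and the pattern scan catches a secret being sent to an internal address it was never meant for. Neither check needs to understand why the model decided to make the call, which is exactly what makes it robust against a convincing story.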