Fighting the Unfixable: The State of Prompt Injection Defense
·2193 words·11 mins
Prompt injection is architecturally unfixable in current LLMs, but defense-in-depth works. Training-time defenses like Instruction Hierarchy, inference-time techniques like Spotlighting, and architectural isolation create practical systems. Microsoft’s LLMail-Inject showed thatadaptive attacks succeed at 32% against single defenses, 0% against layered approaches. Real failures like GitHub Actions compromise prove that securing obvious surfaces isn’t enough. Like SQL injection, it’s manageable with layering.