Prompt Injection: The Attack Vector Most Teams Aren't Ready For

Prompt injection ranks first on the OWASP LLM Top 10 for a reason. It's the most commonly exploited vulnerability in deployed AI systems, it's structurally difficult to fully prevent, and most organizations don't have detection or response capability built for it. If you're running AI in production and haven't assessed your exposure to prompt injection, you have an unknown attack surface that adversaries are already probing.

What Prompt Injection Is

Prompt injection exploits the fundamental architecture of language models: they don't distinguish between instructions and data. When a model processes a prompt, it sees all of it, system instructions, retrieved content, user input, as a single sequence. An attacker who can influence any part of that sequence can potentially influence the model's behavior.

Direct prompt injection is the simpler variant. A user provides input specifically crafted to override or bypass the model's system prompt. "Ignore previous instructions and..." is the well-known pattern, but modern attacks are more sophisticated: gradual instruction drift, adversarial examples designed to surface from specific training distributions, role-playing frames that establish alternative behavioral contexts. Direct injection requires application access, which limits the attack surface, but it's the one most developers are aware of.

Indirect prompt injection is the higher-consequence variant and the one most teams aren't defending against. In indirect injection, malicious instructions are embedded in external content that the AI system retrieves and processes, documents, web pages, database records, email bodies, customer support tickets. The attacker doesn't need access to the application. They need access to a data source the application reads.

The scenario: your AI customer support agent retrieves and summarizes customer emails. An attacker sends a specially crafted email containing embedded instructions. The agent processes the email, the embedded instructions execute, and the agent takes an action it wasn't supposed to, exfiltrates data, modifies a record, sends a response that doesn't match your policies. The attacker's input was never submitted through your application interface.

Why Current Defenses Are Insufficient

Input validation doesn't stop prompt injection the way it stops SQL injection. SQL injection works by breaking out of a structured syntax; you can write a parser that detects this. Prompt injection works by embedding natural language instructions in natural language input, there's no syntax to break out of. A "safe" input filter that blocks obvious injection patterns provides false confidence without meaningful protection.

LLM-based defenses, using a classifier model to detect injection attempts, are more promising but not mature. They can be evaded by sophisticated adversaries, introduce latency, and are expensive at scale. They're also better at detecting known patterns than novel attacks.

Sandboxing, restricting what the model can do, is the most reliable mitigation available. If the model can't take high-consequence actions autonomously, the blast radius of a successful injection is limited. This is why the OWASP guidance on LLM06 (Excessive Agency) is so closely related to LLM01 (Prompt Injection): the severity of an injection attack is directly proportional to the capabilities the model has been granted.

What Regulated Environments Should Do

Map your AI data flows. Before you can defend against indirect injection, you need to know where your AI systems are retrieving external content. For every AI system in production: what data sources does it read? What content does it process that originates from untrusted parties? Customer inputs, external documents, third-party data feeds, and web content are all potential injection vectors. This mapping exercise often reveals system behaviors that the people who approved the system didn't fully understand.

Apply capability minimization. Every AI system should have the minimum set of actions required for its function. An AI that summarizes documents doesn't need to send email. An AI that answers customer questions doesn't need to modify account records. Design for least privilege from the start, not as an afterthought when something goes wrong. Document what each system is permitted to do and why, so that additions require explicit approval.

Require human confirmation for consequential actions. For any action with real-world consequences, sending a communication, modifying a record, initiating a transaction, accessing sensitive data, require explicit human confirmation rather than autonomous model execution. This design pattern significantly reduces the damage a successful injection can cause, because the injected instructions can't complete the consequential action without human involvement.

Build detection, not just prevention. No prevention control is complete. Build logging and monitoring for model outputs that deviate from expected patterns, responses that include unusual instructions, data that appears inconsistent with the user's legitimate request, actions taken outside of normal parameters. This is immature territory, most vendors don't offer good tooling for it yet, but the absence of off-the-shelf solutions doesn't mean you can skip detection. Custom monitoring on high-risk AI systems is the current state of the art.

Test your systems before adversaries do. Red-teaming AI systems specifically for prompt injection is a distinct discipline from traditional penetration testing. It requires people who understand the attack mechanics and can design injection payloads for your specific architecture and data flows. This testing should happen before production deployment and on a regular cadence, not once, because the attack surface evolves as your system evolves.

"The organizations that will be caught off guard by prompt injection attacks aren't the ones who didn't know about it. They're the ones who knew, acknowledged it was a risk, and decided detection could wait."

The Regulatory Angle

For regulated industries, prompt injection isn't just a technical risk. It's a control failure. If an AI system processing regulated data, PHI, PII, financial data, regulated manufacturing records, can be manipulated to exfiltrate or corrupt that data through prompt injection, you have a gap that regulators will view as a fundamental control failure, not a technical incident.

The question your CISO needs to be able to answer: for every AI system that touches regulated data, has prompt injection been assessed, and what controls are in place? "We didn't think anyone would try that" is not an answer that survives regulatory scrutiny. The vector is documented, publicly known, and actively exploited. Your control environment needs to account for it.

What Prompt Injection Is

Why Current Defenses Are Insufficient

What Regulated Environments Should Do

The Regulatory Angle

Questions about what this means for your organization?