Indirect prompt injection. RAG poisoning. Tool-use chains that chain. We deliver reproducible exploitation, a threat model you can act on, and remediation PRs — not a CVSS spreadsheet.
Most “AI red team” reports are a spreadsheet of direct jailbreaks the model family already patched. That isn't the threat model. The threat model is an attacker you never see — contaminating the context your agent reads on its way to a tool call it should never have been allowed to make.
We treat agentic systems the way offensive security treats networks: map the attack surface, chain primitives, reach a meaningful impact. You get a reproducible exploit chain, a threat model, and remediation PRs — not a severity rubric.
They leave instructions inside a document the agent will later read: a Jira ticket, a Markdown KB page, a support email, a scraped web page, a PDF attachment. Unicode tag characters, HTML comments, zero-width joiners — whatever your sanitizer does not strip.
Embedders are content-agnostic. They happily encode an attacker's instruction with high cosine similarity to whatever query the attacker anticipated. We measure retrieval rank across a test query set and show you exactly which queries pull the poison.
Without structured prompt fencing (signed system blocks, content-role separation), the model cannot distinguish “your instructions” from “content it retrieved.” The attacker's payload is now indistinguishable from the developer's intent.
The model invokes a tool — but the arguments come from an attacker. MCP servers with over-broad scopes, agents with no human-in-the-loop on state-changing calls, and missing outbound allowlists turn a language problem into an action problem.
The attacker does not need to exfiltrate from your network — your agent does it for them, over legitimate SMTP, with your SPF signature. We catalog every outbound channel the agent can reach and show which are auditable.
Not a CVSS spreadsheet. A report you can hand to engineering and ship fixes from on Monday morning.
Data flow, trust boundaries, tool surface, context sources. Maps every way untrusted content reaches the model and every tool the model can reach from there.
Each finding is a replayable chain: ingestion vector, retrieval proof, context trace, tool call, impact. No “theoretical” findings.
Systematic evaluation across OWASP LLM01 variants, unicode tag injection, multi-turn context smuggling, and cross-domain pivoting. Coverage percentages, not vibes.
Prompt fencing, output guards, tool scope restriction, outbound allowlists, MCP gateway policy. Merged to main during the engagement where feasible.
Your exploit chains become eval cases. Run in CI on every model change. When a future update reintroduces the bug, you know the same day.
30-minute readout for leadership. No jargon, one chart, three decisions needed. Separate technical walkthrough for the engineering team.
Architecture walkthrough, data flow mapping, tool surface enumeration. We freeze the scope to one application and one bounded dataset.
Sanitizer bypasses, embedding abuse, context smuggling, tool misuse, MCP scope confusion. We catalog what works on your specific system, not what worked on someone else's blog post.
Primitives are wired into reproducible end-to-end chains that reach a real impact: exfiltration, unauthorized action, privilege escalation, or policy violation.
Written report, executive briefing, regression harness. Remediation pairing with your team to close the top findings before we leave.
Excerpt from a real engagement report, anonymized. This is the level of detail every finding ships at — no black boxes, no “trust us.”
# engagement: customer-support-rag / scope: prod · severity: HIGH
# finding R-01 · indirect prompt injection via KB ingestion
stage_1_inject:
vector: markdown file submitted via /kb/upload
payload: hidden instruction in footnote · unicode tag U+E0041..
sanitizer: BYPASSED — regex strips <script>, not tags
stage_2_retrieve:
embedder: text-embedding-3-large
chunk: 512 tokens, overlap 64
poisoned_chunk_rank: top-3 for 71% of support queries
stage_3_contaminate:
context_window_share: 18% (poisoned / total)
llm: claude-3.5-sonnet · temperature 0.2
system_prompt_integrity: LOST — no structured prompt fencing
stage_4_execute:
tool_calls: [search_tickets, get_customer, send_email]
mcp_permission: scope: "customer:read" — insufficient isolation
observed_behavior: called send_email with attacker-controlled body
stage_5_exfiltrate:
channel: outbound email to attacker domain
payload: last 20 ticket bodies, base64-encoded in signature block
detection_lag: 11 days until flagged by SIEM
remediation:
- reject non-ASCII tag-characters at upload
- prompt fencing with signed system block + content_tags
- drop send_email from agent toolset · route through human approval
- outbound-email allowlist on MCP gatewaySupport copilots, sales agents, research agents, RAG systems. Anything with untrusted content flowing into a model with tool access.
Your app is shipping a GenAI feature. Appsec is asking questions they've never asked before. We speak both languages.
Your devs run Claude Code / Cursor / Devin against private repos. You want to know what happens when a crafted PR or README exploits the agent.
Evals measure model capability. We measure your system's attack surface. A perfectly aligned model still enables exfiltration if the agent wrapping it has over-broad tool scopes and no fencing.
Preferred: a staging environment with prod-like data. Required: read access to the architecture, prompt templates, tool definitions, and retrieval corpus. We operate under a rules-of-engagement document signed before kickoff.
We don't try to. Model jailbreaks are a race you cannot win. Our job is to find the system-level failure modes that remain regardless of which model family you use — indirect injection, context smuggling, tool misuse, retrieval abuse, MCP scope confusion.
Two-week focused red team of one application, one dataset, one tool surface. Larger engagements scope per additional surface. We never run multi-month engagements — attention decays, quality drops.
No. We run in staging or a dedicated instance with synthetic data. Exploitation is reproduced; not operationalized. Rules of engagement include explicit no-touch lists.
2-week engagements, fixed price, written report. Scoping call is free and takes 30 minutes.
Request a red team →