letting LLMs hold the shell

2026-04-21 · llm · offsec · seval

The first time I let an LLM run nmap unsupervised, it found a misconfigured SMB share I’d missed during manual enumeration. It also ran rm -rf /tmp/seval_* to “clean up after itself” without asking. Both of these things happened in the same session.

This is the core tension in agentic offensive tooling: the model is simultaneously better and worse than you at the job. Better because it doesn’t get bored scanning 500 ports. Worse because it has no intuition for when it’s about to do something irreversible.

the guardrail problem

Most agent frameworks solve this with allowlists. You pre-approve commands, the agent picks from the menu. This is safe and also useless for offensive work — half the value is the model improvising a one-liner you wouldn’t have thought of.

seval takes a different approach: block the shapes of dangerous commands (shell metachar injection, filesystem destruction, shutdown sequences) but leave the offensive surface wide open. If sqlmap wants to dump a database, that’s the point. If it wants to rm -rf /, that’s not.

# blocked patterns (regex)
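# rm with any -r/-R/-f flag groups, aimed at an absolute path
\brm\s+(-[rRf]+\s+)*/
# power-state commands, whole words only
\b(shutdown|reboot|halt|poweroff)\b
# rm chained after ;, &&, ||, or |
(;|&&|\|\|?)\s*rm\b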
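To make the denylist idea concrete, here is a minimal Python sketch of the check an agent loop might run before handing a command to the shell. It is not seval's code: `check_command` and `guard` are hypothetical names, and the patterns follow the list above with a few tightenings (word boundaries, repeated flag groups such as `rm -r -f /`, a bare pipe as a chaining operator).

```python
import re
from typing import Optional

# Hypothetical denylist in the spirit of seval's blocked patterns (not its real config).
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+(-[rRf]+\s+)*/"),                # rm [-r -f ...] /absolute/path
    re.compile(r"\b(shutdown|reboot|halt|poweroff)\b"),  # power-state commands, whole words
    re.compile(r"(;|&&|\|\|?)\s*rm\b"),                  # rm chained after ; && || or |
]


def check_command(cmd: str) -> Optional[str]:
    """Return the first blocked pattern that matches cmd, or None if it is allowed."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(cmd):
            return pattern.pattern
    return None


def guard(cmd: str) -> bool:
    """True if the agent may run cmd as-is; everything not denied is allowed."""
    return check_command(cmd) is None


if __name__ == "__main__":
    for cmd in ["nmap -sV -p- 10.0.0.5", "rm -r -f /", "sudo reboot"]:
        print(cmd, "->", "allowed" if guard(cmd) else f"blocked by {check_command(cmd)}")
```

The design keeps the default open: anything that doesn't match a dangerous shape goes through, so sqlmap's `--dump` is untouched. The first pattern also catches `rm -rf /tmp/seval_*`, the cleanup from the opening anecdote, because it flags any `rm` whose first argument after the flags is an absolute path.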

what actually works

After a few hundred hours of seval sessions, the patterns that produce good results:

Constrain the goal, not the tooling. “Find SQL injection in this app” is better than “run sqlmap with these flags.” The model picks better flags than I would half the time.
Let it fail fast. Timeouts on every tool call. If hydra hangs for 60 seconds, kill it and let the model try something else.
Persistent memory matters more than context window size. The model forgets what it tried 20 messages ago; SQLite with FTS5 means it can search its own history.
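The last two habits fit together in one small tool runner: every call gets a hard timeout, and every result lands in an SQLite FTS5 table the model can query later. This is a hedged sketch, not seval's implementation. `open_history`, `run_tool`, and `search_history` are hypothetical names, the 60-second default only echoes the hydra example, and it assumes the SQLite library linked into Python was built with FTS5 (most are).

```python
import sqlite3
import subprocess
import sys
from typing import List, Tuple


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a full-text-searchable log of every tool call."""
    conn = sqlite3.connect(path)
    # Assumption: this Python's SQLite build includes the FTS5 extension.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS history USING fts5(command, output, status)"
    )
    return conn


def run_tool(conn: sqlite3.Connection, argv: List[str],
             timeout_s: float = 60.0) -> Tuple[str, str]:
    """Run one tool call under a hard timeout, record it, and return (status, output)."""
    command = " ".join(argv)
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        status = f"exit {proc.returncode}"
        output = proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child; report it so the model moves on.
        status = "timeout"
        output = f"killed after {timeout_s:g}s"
    except OSError as exc:
        # e.g. the binary isn't installed on this box.
        status = "error"
        output = str(exc)
    conn.execute(
        "INSERT INTO history (command, output, status) VALUES (?, ?, ?)",
        (command, output, status),
    )
    conn.commit()
    return status, output


def search_history(conn: sqlite3.Connection, query: str,
                   limit: int = 5) -> List[Tuple[str, str]]:
    """Full-text search over past calls (all columns), best matches first."""
    rows = conn.execute(
        "SELECT command, status FROM history WHERE history MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = open_history()
    run_tool(db, [sys.executable, "-c", "print('open port 445')"])
    run_tool(db, [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=1)
    print(search_history(db, "sleep"))
```

Storing the status next to the command and output means a query for `timeout` surfaces the calls that hung, which answers the "what did I already try" question without keeping the whole transcript in context.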

what’s next

eugene takes this further — fully autonomous, no human in the loop at all. Run it on a Pi, point it at a network, come back later. That’s a different post.