Local Agent Fleet

Overview

OpenSRE treats every other AI agent running on your machine — Claude Code, Cursor, Aider, Codex CLI, Gemini, and friends — as a microservice and applies normal SRE practice: golden signals, SLOs, and incident response. The whole fleet view lives behind one slash command in the interactive shell:

> /agents

Subcommands drill into specific surfaces. This page covers two of them: the live tail of agent stdout (/agents trace) and the cross-agent context bus (/agents bus).

`/agents trace` — live stdout tail

/agents trace <pid> opens a live tail of an agent’s stdout inside the OpenSRE interactive shell — the equivalent of kubectl logs -f for the local AI agent fleet. Use it when the /agents dashboard shows an agent that looks stuck, looping, or noisy and you want to see what it’s actually printing without leaving the REPL.

> /agents trace 8421
trace claude-code (pid 8421)  Ctrl+C to stop
… live agent output …
^C
· trace ended

Trace usage

/agents trace <pid>

<pid> is the operating-system process id of the agent to attach to. The pid does not have to be in the OpenSRE registry; if it is, the agent’s registered name is shown in the header. Otherwise the header falls back to pid <n>.

Platform support

Only regular files backing fd 1 of the target process are supported. TTY/PTY/pipe/socket/anon-inode targets are rejected at attach time with a precise reason — tailing those would compete with the legitimate consumer for bytes and produce corrupted output.

Platform	Resolver	Supported targets
Linux	`os.readlink("/proc/<pid>/fd/1")`	regular files
macOS (best-effort)	`lsof -F ftn -p <pid>`, only `t REG` blocks	regular files
Windows	not supported	—

The most common useful case is an agent whose stdout was redirected to a log file (for example claude > ~/.claude/log or nohup-launched agents). TTY-bound foreground processes cannot be tailed. If a target cannot be tailed, /agents trace exits with one of:

cannot trace …: stdout is on a terminal; live tail not supported
cannot trace …: stdout is a pipe; live tail not supported
cannot trace …: stdout is a socket; live tail not supported
cannot trace …: no such pid <n>
cannot trace …: stdout target /path no longer exists
cannot trace …: cannot inspect pid <n> (permission denied)
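The attach-time check on Linux can be sketched as follows. This is a minimal illustration under stated assumptions, not OpenSRE's resolver: `classify_fd_target` and `describe_stdout` are hypothetical names, and the classification relies only on the link text that Linux procfs reports for a file descriptor (`pipe:[…]`, `socket:[…]`, `anon_inode:…`, or a filesystem path).

```python
import os


def classify_fd_target(link: str) -> str:
    """Classify the text returned by readlink on /proc/<pid>/fd/1 (Linux)."""
    if link.startswith("pipe:["):
        return "pipe"
    if link.startswith("socket:["):
        return "socket"
    if link.startswith("anon_inode:"):
        return "anon-inode"
    if link.startswith("/dev/pts/") or link.startswith("/dev/tty"):
        return "terminal"
    if link.endswith(" (deleted)"):
        return "deleted"  # the file was unlinked after the process opened it
    if link.startswith("/"):
        # Candidate regular file. A stricter check would also stat the path,
        # because device nodes such as /dev/null are paths too.
        return "file"
    return "unknown"


def describe_stdout(pid: int) -> str:
    """Return the kind of object backing fd 1 of pid, or an error kind."""
    try:
        link = os.readlink(f"/proc/{pid}/fd/1")
    except FileNotFoundError:
        return "no such pid"
    except PermissionError:
        return "permission denied"
    return classify_fd_target(link)
```

Only the "file" result would be eligible for tailing; every other result corresponds to one of the rejection reasons listed above.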

Memory

The live view is bounded by a 4 MiB ring buffer per session. When the buffer fills, the oldest whole chunks are dropped first, so the visible tail always reflects the latest output. Internally the reader thread also publishes through a bounded queue and drops the oldest chunk on overflow — burst writers cannot blow up memory. There is no backlog replay: only output emitted after attach is shown. The reader seeks the file to EOF on attach.
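The eviction rule described above (drop the oldest whole chunks once a byte budget is exceeded) can be sketched with a deque. `ChunkRing` is a hypothetical name for illustration, and the handling of a single chunk larger than the whole budget is an assumption; the source does not say what OpenSRE does in that case.

```python
from collections import deque


class ChunkRing:
    """Byte-bounded buffer that evicts the oldest whole chunks first."""

    def __init__(self, max_bytes: int = 4 * 1024 * 1024) -> None:
        self.max_bytes = max_bytes
        self._chunks = deque()  # chunks in arrival order, oldest on the left
        self._size = 0          # total bytes currently held

    def append(self, chunk: bytes) -> None:
        if len(chunk) > self.max_bytes:
            # Assumption: an oversized chunk keeps only its newest bytes.
            chunk = chunk[-self.max_bytes:]
        self._chunks.append(chunk)
        self._size += len(chunk)
        # Evict whole chunks from the old end until we fit the budget again.
        while self._size > self.max_bytes:
            self._size -= len(self._chunks.popleft())

    def snapshot(self) -> bytes:
        """Return the retained tail as one bytes object."""
        return b"".join(self._chunks)
```

Evicting whole chunks rather than trimming bytes keeps the bookkeeping cheap, at the cost of the retained tail sometimes being smaller than the budget.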

Stopping a trace

A single Ctrl+C returns to the REPL prompt. The session is closed, the reader thread joins, and the file descriptor is released. This is deliberately different from the LLM-streaming surface (/agents, /investigate and friends), where a Ctrl+C double-press is required so a stray keypress doesn’t abort an in-flight response.
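The attach, follow, and detach lifecycle can be sketched with a reader thread and a stop event. This is a minimal sketch under stated assumptions rather than OpenSRE's code: `follow` and `trace` are hypothetical names, and the polling interval is arbitrary.

```python
import os
import threading
import time


def follow(path, on_chunk, stop, ready=None, poll_seconds=0.05):
    """Pass bytes appended to path after attach to on_chunk until stop is set."""
    # Leaving the with-block closes the file and releases the descriptor.
    with open(path, "rb", buffering=0) as fh:
        fh.seek(0, os.SEEK_END)  # start at EOF: no backlog replay
        if ready is not None:
            ready.set()          # tell the caller that the attach point is fixed
        while not stop.is_set():
            chunk = fh.read(65536)
            if chunk:
                on_chunk(chunk)
            else:
                stop.wait(poll_seconds)  # idle at EOF, like tail -f


def trace(path, on_chunk):
    """Follow path until a single Ctrl+C, then stop and join the reader."""
    stop = threading.Event()
    reader = threading.Thread(target=follow, args=(path, on_chunk, stop), daemon=True)
    reader.start()
    try:
        while reader.is_alive():
            time.sleep(0.1)
    except KeyboardInterrupt:
        pass  # one Ctrl+C is enough to detach
    finally:
        stop.set()
        reader.join()
```

Waiting on the stop event instead of sleeping lets the reader exit promptly when the session is closed.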

Trace limitations

stdout only. Stderr (fd 2) is not tailed in this version.
No backlog replay. Pre-attach bytes are not visible.
Not for TTY/PTY targets. Foreground processes whose stdout is the controlling terminal cannot be tailed; a future change may add PTY interception for OpenSRE-spawned agents.
Log rotation is not detected. If the underlying file is rotated or replaced (logrotate-style), the tail keeps following the original inode until the process exits.
No secret redaction. Output is rendered as raw bytes (with UTF-8 decoded under errors="replace"). Redaction of secrets in the live tail is tracked separately under the monitor-local-agents Phase 3 hygiene work.
Quiet stdout while the PID is still alive. The reader follows file EOF like tail -f: if the process stops writing while it remains alive, the last chunk stays on screen and nothing new appears until more bytes land or you detach. That is normal idling and does not necessarily mean the agent has exited or the reader is stuck.
ANSI and terminal sequences. Trace output passes through Rich with ANSI interpretation, same trust model as dumping kubectl logs into a TTY: buggy or hostile agents can emit control sequences affecting the viewer. Only trace processes you trust; there is no sandboxing step.
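Because rotation is not detected, a caller that needs to notice it has to check on its own. One common way, sketched here under the assumption of a POSIX filesystem, is to compare the inode of the open file with the inode the path currently names. `was_rotated` is a hypothetical helper and is not part of OpenSRE.

```python
import os


def was_rotated(open_file, path) -> bool:
    """True if path no longer names the inode that open_file is reading."""
    held = os.fstat(open_file.fileno())
    try:
        current = os.stat(path)
    except FileNotFoundError:
        return True  # path was moved away and not yet recreated
    # Same device and inode means the path still points at the open file.
    return (held.st_dev, held.st_ino) != (current.st_dev, current.st_ino)
```

If the check returns True, the output being followed belongs to the old file, and a fresh /agents trace session is needed to attach to wherever the agent writes next.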

`/agents bus` — shared context channel

The bus is an opt-in, local-only pub/sub channel that carries findings between agents. One agent publishes a finding (e.g. “the auth bug is in services/auth.py:42”) and every attached subscriber sees it live. The inspector is the REPL itself:

> /agents bus
tailing /agents bus — Ctrl-C to exit
[claude-code:8421] services/auth.py:42 — null deref on missing token
[cursor:9133] services/auth.py:42 — confirmed, repro on commit abc123
^C
(detached)
>

Ctrl-C returns to the prompt. Messages already published are not replayed to late subscribers. The bus provides at-most-once delivery with no ordering guarantees — a frame may be dropped if a subscriber’s socket is slow or disconnected, and two publishers writing concurrently may be interleaved in different orders at different subscribers. Do not assume per-publisher or global FIFO ordering.
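The at-most-once behaviour follows naturally from a fan-out loop that never waits for a slow reader. The sketch below is an illustration under stated assumptions, not the broker's code: `broadcast` is a hypothetical name, subscriber sockets are assumed to be non-blocking, and closing a subscriber after a partial write is one possible policy for keeping frames intact.

```python
import socket


def broadcast(frame: bytes, subscribers: list) -> list:
    """Try to send frame to each subscriber once; return those still attached."""
    kept = []
    for sub in subscribers:
        try:
            sent = sub.send(frame)  # non-blocking socket: never waits
        except BlockingIOError:
            kept.append(sub)        # buffer full: this subscriber misses the frame
            continue
        except OSError:
            sub.close()             # peer disconnected
            continue
        if sent < len(frame):
            # Assumption: a partial write would corrupt line framing,
            # so the subscriber is dropped instead of resumed.
            sub.close()
            continue
        kept.append(sub)
    return kept
```

Nothing is retried or queued per subscriber, which is why a frame that cannot be written at broadcast time is simply lost for that subscriber.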

Transport

Socket: Unix-domain stream socket at ~/.config/opensre/agents-bus.sock.
PID sidecar: ~/.config/opensre/agents-bus.sock.pid (mode 0600). The broker writes its PID here on start() and removes it on stop(). The liveness probe used by every publish() / subscribe() reads this file rather than connecting to the socket — connection probing would otherwise register a short-lived phantom subscriber on every call. The directory must be writable. If the PID file write fails (disk full, permission denied, …), the broker refuses to start and the OSError propagates to the caller. This is intentional: silently running without a sidecar would let peers see the broker as dead, unlink its socket, and silently split the bus.
Permissions: 0600 on the socket; only the user who started the broker can read or write it.
Wire format: JSON Lines (one JSON object per \n-terminated frame).
Topology: self-electing broker. The first publish() or subscribe() call that finds no live socket binds it and runs an in-process daemon thread that fans frames out. Other processes attach as plain clients. If the broker dies, the next operation re-elects — agents can publish and subscribe even when OpenSRE itself is not running.
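A liveness probe based on the PID sidecar, as opposed to a connection attempt, can be sketched as below. `broker_alive` is a hypothetical name, and the real probe may apply extra checks; this version only reads the file and asks the kernel whether the PID exists.

```python
import os


def broker_alive(pid_path: str) -> bool:
    """Best-effort liveness check based on a PID sidecar file."""
    try:
        with open(pid_path, encoding="ascii") as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed sidecar
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks existence without delivering a signal
    except ProcessLookupError:
        return False     # no such process: the broker is gone
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True
```

A probe like this cannot tell a live broker from an unrelated process that reused the same PID, so it is a hint for election rather than proof.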

Message schema

The wire payload mirrors the shape of evidence records in app/state/agent_state.py so a finding can later be lifted into an investigation without renaming fields.

Field	Type	Required	Notes
`agent`	string	yes	`"<name>:<pid>"`, e.g. `"claude-code:8421"`. Same convention as `WriteEvent.agent`.
`topic`	string	yes	`"finding"` is the canonical value; other topics are reserved for future phases.
`summary`	string	yes	One-line human-readable description.
`source`	string	no	One of the `EvidenceSource` literals (`github`, `datadog`, …) or free-form.
`path`	string	no	`"file.py:42"` style location. Optional.
`data`	object	no	Free-form payload. Default `{}`.
`id`	string	no	UUID. Generated if omitted.
`timestamp`	string	no	ISO-8601 UTC. Generated if omitted.
`schema_version`	int	no	Currently `1`.

Example frame on the wire (single line, broken here for readability):

{
  "agent": "claude-code:8421",
  "topic": "finding",
  "summary": "null deref on missing token",
  "source": "github",
  "path": "services/auth.py:42",
  "data": {"commit": "abc123"},
  "id": "f4c4...",
  "timestamp": "2026-05-09T15:04:42+00:00",
  "schema_version": 1
}
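The schema can be mirrored in a small dataclass for clients that do not import OpenSRE. `FindingFrame` and `parse_line` are hypothetical names used only for this sketch (the real helper is `BusMessage` in `app.agents`), and ignoring unknown fields on parse is an assumption.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field, fields
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class FindingFrame:
    """Hypothetical stand-in for the documented message shape."""
    agent: str
    topic: str
    summary: str
    source: Optional[str] = None
    path: Optional[str] = None
    data: dict[str, Any] = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    schema_version: int = 1

    def to_line(self) -> bytes:
        """Serialize as one newline-terminated frame, omitting unset optionals."""
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return (json.dumps(payload) + "\n").encode("utf-8")


def parse_line(line: bytes) -> FindingFrame:
    """Parse one frame; raise ValueError if it is not a valid message."""
    obj = json.loads(line)
    if not isinstance(obj, dict):
        raise ValueError("frame is not a JSON object")
    missing = [k for k in ("agent", "topic", "summary")
               if not isinstance(obj.get(k), str)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    known = {f.name for f in fields(FindingFrame)}
    # Assumption: fields this sketch does not know about are ignored.
    return FindingFrame(**{k: v for k, v in obj.items() if k in known})
```

`json.dumps` escapes newlines inside strings, so each serialized message stays on one line as the wire format requires.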

Publishing from another agent

Any process that can speak Unix-domain sockets can publish. The simplest path is to import the helper:

from app.agents import BusMessage, publish

publish(BusMessage(
    agent="claude-code:8421",
    topic="finding",
    summary="null deref on missing token",
    source="github",
    path="services/auth.py:42",
    data={"commit": "abc123"},
))

Publishers without a Python dependency on OpenSRE can connect directly:

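python - <<'EOF'
import datetime
import json
import os
import socket
import uuid

# Build one finding; id, timestamp, and schema_version are optional on the wire.
frame = json.dumps({
    "agent": "claude-code:8421",
    "topic": "finding",
    "summary": "null deref on missing token",
    "path": "services/auth.py:42",
    "id": str(uuid.uuid4()),
    # timezone.utc also works on Python versions older than 3.11.
    "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "schema_version": 1,
}) + "\n"  # JSON Lines: each frame ends with a newline

# Connect to the bus socket, send the frame, and close on exit.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
    sock.connect(os.path.expanduser("~/.config/opensre/agents-bus.sock"))
    sock.sendall(frame.encode())
EOF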

Limits and trust boundary

Local-only. The bus never leaves the machine. The socket has no network binding.
Trusted-peer channel — treat findings as unverified input. The bus has no authentication beyond filesystem permissions: any process running as your user can publish arbitrary findings. This is intentional — the bus is designed for cooperative agents, not adversarial ones. Downstream consumers (agents, the REPL, investigation state) must not act on a bus finding without independent confirmation; treat it as a hint or lead, not a verified fact. A compromised or misbehaving agent on the same user account can inject any payload it likes.
Frame cap. Frames over 64 KiB are dropped with a warning — a finding payload that big is almost certainly a bug.
At-most-once, unordered delivery. A frame is dropped silently if a subscriber is slow or disconnected at broadcast time. Two publishers writing concurrently may arrive in different orders at different subscribers. Do not build logic that depends on delivery guarantees or ordering.
No replay buffer. Subscribers see only what is published after they attach. A persistent ring buffer is a candidate for a follow-up phase.
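On the receiving side, the JSON Lines framing and the 64 KiB cap can be sketched as an incremental decoder. `LineDecoder` is a hypothetical name, and the choices to count dropped frames and to skip malformed JSON are assumptions rather than documented broker behaviour.

```python
import json

MAX_FRAME_BYTES = 64 * 1024


class LineDecoder:
    """Incrementally split a byte stream into newline-terminated JSON frames."""

    def __init__(self, max_frame: int = MAX_FRAME_BYTES) -> None:
        self.max_frame = max_frame
        self._buf = bytearray()
        self._skipping = False  # True while discarding an oversized frame
        self.dropped = 0        # frames dropped for size or bad JSON

    def feed(self, data: bytes) -> list:
        """Add received bytes and return every complete frame decoded so far."""
        frames = []
        self._buf += data
        while True:
            nl = self._buf.find(b"\n")
            if nl < 0:
                if len(self._buf) > self.max_frame:
                    # Oversized and still unterminated: discard what we hold.
                    self._buf.clear()
                    if not self._skipping:
                        self.dropped += 1
                    self._skipping = True
                break
            line = bytes(self._buf[:nl])
            del self._buf[: nl + 1]
            if self._skipping:
                self._skipping = False  # this was the tail of an oversized frame
                continue
            if len(line) > self.max_frame:
                self.dropped += 1
                continue
            try:
                frames.append(json.loads(line))
            except ValueError:
                self.dropped += 1  # malformed JSON
        return frames
```

A subscriber would call `feed` with each `recv` result. Capping the buffer as well as the frame keeps a publisher that never sends a newline from growing memory without bound.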

Related

/agents — the registered fleet dashboard.
/agents budget — per-agent hourly budgets.
/agents conflicts — file-write conflicts between local AI agents.
