
Investigation pipeline architecture

Contributor guide to how a single investigation runs end to end: the six-stage pipeline, the ReAct evidence-gathering loop, and the guardrails that keep it bounded. It is a companion to investigation-tool-calling.md, which covers tool schema and LLM invoke payload mechanics specifically; this doc covers the pipeline and loop control flow around those mechanics.

Where code lives

| Concern | Location |
|---|---|
| Stage ordering | tools/investigation/lifecycle.py (run_connected_investigation) |
| Public runner entrypoint | tools/investigation/capability.py |
| Integration discovery | tools/investigation/stages/resolve_integrations/node.py |
| Alert classification/extraction | tools/investigation/stages/intake/node.py |
| Pre-loop tool planning | tools/investigation/stages/plan_evidence/node.py, core/domain/alerts/tool_planning.py |
| ReAct loop (the agent) | tools/investigation/stages/gather_evidence/{agent,loop,tools,prompt}.py |
| Diagnosis parsing | tools/investigation/stages/diagnose/node.py, core/domain/diagnosis |
| Report delivery | tools/investigation/reporting/ |
| Shared state contract | core/context/state/ (AgentState, InvestigationState, EvidenceEntry) |
| Context budget enforcement | core/context_budget.py |

Pipeline overview

Each stage is a function of the form (state) -> dict of updates: rather than mutating the shared AgentState directly, it returns the fields to change, and apply_state_updates merges them into the state. If a stage raises, the exception is reported to Sentry and then re-raised; the pipeline never silently swallows a failure.
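The stage contract can be illustrated with a minimal runner. This is a sketch under stated assumptions, not the code in lifecycle.py: it treats AgentState as a plain dict, models apply_state_updates as a shallow merge (apply_updates here), and takes a report_exception callback in place of the real Sentry call. run_pipeline, apply_updates and report_exception are hypothetical names.

```python
from typing import Any, Callable

State = dict[str, Any]
Stage = Callable[[State], dict[str, Any]]


def apply_updates(state: State, updates: dict[str, Any]) -> State:
    # Assumption: a shallow merge where keys returned by a stage overwrite
    # the same keys in the shared state.
    return {**state, **updates}


def run_pipeline(
    stages: list[Stage],
    state: State,
    report_exception: Callable[[BaseException], None],  # stand-in for Sentry
) -> State:
    for stage in stages:
        try:
            updates = stage(state)  # stages return updates, never mutate state
        except Exception as exc:
            report_exception(exc)   # report first...
            raise                   # ...then re-raise; never swallow the failure
        state = apply_updates(state, updates)
    return state
```

Because stages only return updates, each one can be unit-tested by calling it with a hand-built state and asserting on the returned dict.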

Stage by stage

1. resolve_integrations — what tools exist

Looks up which vendor integrations (Datadog, Grafana, EKS, …) this org has connected and credentialed. Not alert-specific — establishes the universe of tools everything downstream can draw from.
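A hypothetical sketch of what "connected and credentialed" means for the tool universe. The Integration record, its fields and resolve_available_tools are illustrative assumptions, not the shapes used in resolve_integrations/node.py. Note that the function never sees the alert, matching the point that this stage is not alert-specific.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Integration:
    vendor: str                  # e.g. "datadog", "grafana", "eks"
    connected: bool              # org has set up the integration
    has_credentials: bool        # usable credentials are stored for it
    tools: tuple[str, ...] = ()  # tool names this integration exposes


def resolve_available_tools(integrations: list[Integration]) -> dict[str, list[str]]:
    """Map each usable vendor to its tools; drop anything not both connected and credentialed."""
    return {
        i.vendor: list(i.tools)
        for i in integrations
        if i.connected and i.has_credentials
    }
```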

2. extract_alert — is this worth investigating

One LLM call classifies the raw alert: noise (chat, greetings, replies in an existing thread) short-circuits the pipeline immediately with no tools run. A real alert gets structured fields extracted — alert_name, severity, alert_source, namespace, error message — plus a computed incident_window.
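The noise short-circuit can be sketched as follows. The single LLM call is stubbed as a classify callable, and the is_noise flag, the result shape and continue_pipeline are hypothetical; only the field names alert_name and severity come from the description above.

```python
from typing import Any, Callable

# Hypothetical classifier output: {"is_noise": bool, "fields": {...}}
Classifier = Callable[[str], dict[str, Any]]
Rest = Callable[[dict[str, Any]], dict[str, Any]]


def extract_alert(raw_alert: str, classify: Classifier) -> dict[str, Any]:
    result = classify(raw_alert)  # the one LLM call, stubbed here
    if result["is_noise"]:
        return {"is_noise": True}  # no structured fields for chat/greetings
    return {"is_noise": False, **result["fields"]}


def continue_pipeline(updates: dict[str, Any], rest: Rest) -> dict[str, Any]:
    # Noise stops here: no later stage runs, so no tools are ever called.
    if updates["is_noise"]:
        return updates
    return rest(updates)
```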

3. plan_actions — what to check first

Scores every available tool against the alert (score_tools, using source match plus tool metadata) and keeps the top tool_budget tools (default 10) as planned_actions, along with a written rationale. The plan is advisory: if nothing scores confidently, the loop falls back to its own relevance ranking instead of starting from an empty plan.
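A minimal sketch of score-then-truncate planning. The weights, the min_score "confident" threshold, the Tool record and score_tool are all assumptions for illustration; the real scoring lives in score_tools (core/domain/alerts/tool_planning.py). An empty planned_actions list is what lets the loop fall back to its own ranking.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    source: str             # vendor the tool queries, e.g. "datadog"
    tags: frozenset[str]    # hypothetical metadata keywords


def score_tool(tool: Tool, alert_source: str, alert_terms: set[str]) -> float:
    # Illustrative weights (assumption): a source match dominates,
    # overlap between tool metadata and alert terms breaks ties.
    score = 1.0 if tool.source == alert_source else 0.0
    return score + 0.1 * len(tool.tags & alert_terms)


def plan_actions(
    tools: list[Tool],
    alert_source: str,
    alert_terms: set[str],
    tool_budget: int = 10,
    min_score: float = 0.5,  # hypothetical "confident" threshold
) -> dict[str, list[str]]:
    scored = [(score_tool(t, alert_source, alert_terms), t.name) for t in tools]
    top = heapq.nlargest(tool_budget, scored)  # highest score first
    confident = [(s, n) for s, n in top if s >= min_score]
    # An empty plan means "no confident pick": the loop ranks tools itself.
    return {
        "planned_actions": [n for _, n in confident],
        "rationale": [f"{n}: score {s:.2f}" for s, n in confident],
    }
```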

4. The ReAct loop — the core evidence-gathering agent

ConnectedInvestigationAgent.run() in tools/investigation/stages/gather_evidence/agent.py. Before the model’s first turn, select_investigation_tools narrows the tool set to a hard cap (MAX_AGENT_TOOL_SCHEMAS = 32), using the plan from stage 3 if present and alert-source relevance ranking otherwise. A handful of “obviously needed” tools may also fire as deterministic seed calls before the LLM gets a turn at all, so the loop starts with some evidence already in hand. Guardrails inside the loop:
  • Duplicate detection (InvestigationToolCallCache) — identical tool name + args is served from cache instead of re-executed, and the LLM is told explicitly it already has that result.
  • Stagnation breaker — any iteration where every tool call was a replayed duplicate (no fresh evidence) appends a nudge telling the model to stop repeating itself and try something different. After MAX_STAGNANT_ITERATIONS = 2 such iterations in a row (two nudges), tool access is stripped on the next turn to force a text-only conclusion rather than burning the rest of the loop budget.
  • CLI-backed models (Codex, Claude Code CLI) use a subclass, CLIBackedInvestigationAgent, that overrides conclusion acceptance to refuse an early stop until every planned tool has been called, because these models tend to write a final answer as soon as they see some results.
  • LLM invoke failures degrade to a partial “investigation failed” state (degraded_investigation_from_llm_failure) instead of crashing, preserving whatever evidence was already gathered.
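The duplicate cache and stagnation breaker described above can be modeled in a few dozen lines. This is a hypothetical sketch, not the agent code: ToolCallCache stands in for InvestigationToolCallCache, run_loop for ConnectedInvestigationAgent.run(), and next_turn/execute for the LLM invoke and tool execution. Messages are plain strings, and seed calls, the context budget, failure degradation and the CLI-backed subclass are left out.

```python
import json
from typing import Any, Callable, Optional, Union

MAX_STAGNANT_ITERATIONS = 2
MAX_INVESTIGATION_LOOPS = 20

ToolCall = tuple[str, dict[str, Any]]
Turn = Union[str, list[ToolCall]]  # a str is a text-only conclusion


class ToolCallCache:
    """Serve identical (tool name, args) calls from memory instead of re-executing."""

    def __init__(self) -> None:
        self._results: dict[tuple[str, str], Any] = {}

    def call(self, name: str, args: dict[str, Any],
             execute: Callable[[str, dict[str, Any]], Any]) -> tuple[Any, bool]:
        # Canonical key: args must be JSON-serializable; key order is ignored.
        key = (name, json.dumps(args, sort_keys=True))
        if key in self._results:
            return self._results[key], True  # replayed duplicate
        result = execute(name, args)
        self._results[key] = result
        return result, False


def run_loop(
    next_turn: Callable[[list[str], bool], Turn],
    execute: Callable[[str, dict[str, Any]], Any],
    max_loops: int = MAX_INVESTIGATION_LOOPS,
) -> Optional[str]:
    cache = ToolCallCache()
    messages: list[str] = []
    stagnant = 0  # consecutive duplicate-only iterations
    for _ in range(max_loops):
        # After MAX_STAGNANT_ITERATIONS duplicate-only iterations, strip tools.
        tools_enabled = stagnant < MAX_STAGNANT_ITERATIONS
        turn = next_turn(messages, tools_enabled)
        if isinstance(turn, str):
            return turn  # the model wrote its conclusion
        if not tools_enabled or not turn:
            continue  # nothing executable this turn
        fresh = False
        for name, args in turn:
            result, replayed = cache.call(name, args, execute)
            if replayed:
                messages.append(f"You already have the result of {name}{args}: {result}")
            else:
                fresh = True
                messages.append(f"{name}{args}: {result}")
        if fresh:
            stagnant = 0
        else:
            stagnant += 1  # every call was a replay: nudge the model
            messages.append("Every call this turn was a repeat; try something different.")
    return None  # loop budget exhausted without a conclusion
```

With a model that keeps repeating one call, the first iteration executes it, the next two are served from cache and each adds a nudge, and the fourth turn has tools stripped, so the only way forward is a text answer.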

5. diagnose — structure the conclusion

The loop’s final free-text answer is unstructured. A separate LLM call (structured output) parses it into root_cause, root_cause_category, causal_chain, validated_claims / non_validated_claims, remediation_steps, and a validity_score, with a legacy regex-based fallback (parse_root_cause) if structured parsing fails.
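A minimal sketch of "structured first, regex fallback". The required keys are a subset of the fields named above; structured_call stubs the structured-output LLM call, and fallback_parse with its "Root cause:" pattern is a hypothetical stand-in, since the patterns the real parse_root_cause uses are not described here.

```python
import json
import re
from typing import Any, Callable

REQUIRED_KEYS = ("root_cause", "root_cause_category", "remediation_steps")

# Hypothetical pattern: a line such as "Root cause: disk full on node-3".
_ROOT_CAUSE_RE = re.compile(r"^[ \t]*root cause:[ \t]*(.+)$", re.IGNORECASE | re.MULTILINE)


def parse_structured(raw_json: str) -> dict[str, Any]:
    data = json.loads(raw_json)  # raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("structured output is not an object")
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise ValueError(f"structured output missing {missing}")
    return data


def fallback_parse(final_answer: str) -> dict[str, Any]:
    m = _ROOT_CAUSE_RE.search(final_answer)
    root_cause = m.group(1).strip() if m else final_answer.strip()
    return {"root_cause": root_cause, "parsed_by": "regex_fallback"}


def diagnose(final_answer: str, structured_call: Callable[[str], str]) -> dict[str, Any]:
    try:
        return parse_structured(structured_call(final_answer))
    except ValueError:  # malformed or incomplete structured output
        return fallback_parse(final_answer)
```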

6. deliver — publish it

Formats and ships the report to the destinations configured in state — Slack, GitLab writeback, local report.md, etc. See tools/investigation/reporting/.
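One way to picture "destinations configured in state" is a registry of publisher callables keyed by destination name. This is a hypothetical sketch: deliver, write_local_report and the registry shape are illustrative, and the real Slack and GitLab senders in tools/investigation/reporting/ are not modeled.

```python
from pathlib import Path
from typing import Callable

Publisher = Callable[[str], None]


def write_local_report(path: Path) -> Publisher:
    # Hypothetical local destination, e.g. a report.md on disk.
    def publish(report: str) -> None:
        path.write_text(report, encoding="utf-8")
    return publish


def deliver(report: str, destinations: list[str], publishers: dict[str, Publisher]) -> list[str]:
    delivered = []
    for dest in destinations:
        # An unknown destination raises KeyError, surfacing misconfiguration.
        publishers[dest](report)
        delivered.append(dest)
    return delivered
```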

Guardrails at a glance

| Guardrail | Constant | Defined in | Purpose |
|---|---|---|---|
| Tool schema cap | MAX_AGENT_TOOL_SCHEMAS = 32 | tools/investigation/stages/gather_evidence/tools.py | Bounds the per-turn schema payload regardless of registry size. |
| Secondary tool reserve | MAX_SECONDARY_FALLBACK_TOOLS = 3 | tools/investigation/stages/gather_evidence/tools.py | Guarantees cheap reasoning/knowledge tools survive the cap. |
| Loop iteration cap | MAX_INVESTIGATION_LOOPS = 20 | config/constants/investigation.py | Worst-case runtime bound for the ReAct loop. |
| Stagnation breaker | MAX_STAGNANT_ITERATIONS = 2 | tools/investigation/stages/gather_evidence/tools.py | Stops the loop from spinning on duplicate-only iterations. |
| Context budget | context_budget_ceiling_for_model() | core/context_budget.py | Evicts/truncates the lowest-value evidence before the model’s context limit is reached. |
| Pre-loop plan size | tool_budget (default 10) | tools/investigation/stages/plan_evidence/node.py | Shortlist size the plan hands the loop before it even starts. |
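The interaction between the schema cap and the secondary reserve can be sketched as below. This is an assumption-laden model, not select_investigation_tools itself: select_tools is a hypothetical name, tools are plain strings, and it takes the plan literally ("the plan if present, otherwise relevance ranking") rather than guessing whether the real code pads a short plan with ranked tools.

```python
MAX_AGENT_TOOL_SCHEMAS = 32
MAX_SECONDARY_FALLBACK_TOOLS = 3


def select_tools(
    planned: list[str],    # stage-3 plan; may be empty
    ranked: list[str],     # alert-source relevance ranking, best first
    secondary: list[str],  # cheap reasoning/knowledge tools
    cap: int = MAX_AGENT_TOOL_SCHEMAS,
    reserve: int = MAX_SECONDARY_FALLBACK_TOOLS,
) -> list[str]:
    # Reserve slots first so secondary tools always survive the cap.
    reserved = list(dict.fromkeys(secondary))[:reserve]
    primary_slots = cap - len(reserved)
    base = planned if planned else ranked  # plan wins when present
    primary = [t for t in dict.fromkeys(base) if t not in reserved][:primary_slots]
    return primary + reserved
```

Reserving before truncating is what makes the guarantee hold: however large the registry, at most cap schemas are sent, and the reserved tools are never the ones cut.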

Related docs

  • investigation-tool-calling.md: tool schema and LLM invoke payload mechanics, per provider.
  • AGENTS.md: the “Changing the investigation pipeline” entry point and checklist for making changes here.