Closed-Loop Learning
OpenSRE captures accuracy feedback after every investigation. When you mark a result as partial or inaccurate, the feedback is classified into a triage taxonomy and recorded as a miss. The opensre misses command surface lets you review trends, track recurrence, and convert top misses into reproducible benchmark scenarios, closing the loop from production usage back into the eval suite.
Quick reference
| Command | What it does |
|---|---|
| opensre misses list | Show recent misses with alert, taxonomy, rating, and root cause. |
| opensre misses stats | Taxonomy breakdown plus recurring (alert, taxonomy) pairs. |
| opensre misses export --out PATH | Write per-case alert.json files the benchmark runner can consume. |
| opensre misses convert MISS_ID | Convert a single miss into a scenario payload (stdout or --out FILE). |
How a miss is captured
After every investigation the CLI shows the accuracy prompt. If you pick partial or inaccurate, you’ll be asked for a short note and a taxonomy bucket:
- Retrieval gap: the agent did not fetch the evidence it needed.
- Reasoning gap: it had the evidence but drew the wrong conclusion.
- Tool failure: a tool errored, timed out, or returned bad data.
- Routing/prompt failure: the wrong tools or plan were selected.
- Unknown: choose this only when none of the above clearly fit.
The classified miss is appended to ~/.opensre/misses.jsonl, and an investigation_miss_classified event is emitted to PostHog with the run provenance, taxonomy, and (when available) user_id / org_id. The original feedback record in ~/.opensre/feedback.jsonl is left untouched.
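To make the append-only JSONL store concrete, here is a minimal sketch of how a miss could be written and read back. It is not OpenSRE's implementation: the helper names (append_miss, load_misses), the taxonomy string keys, and the record layout are all assumptions for illustration, and the demo writes to a temporary directory rather than ~/.opensre/.

```python
import json
import tempfile
from pathlib import Path

# Assumed string keys for the five buckets listed above (not OpenSRE's actual values).
TAXONOMIES = {
    "retrieval_gap",
    "reasoning_gap",
    "tool_failure",
    "routing_prompt_failure",
    "unknown",
}


def append_miss(store: Path, record: dict) -> None:
    """Append one miss as a single JSON line; earlier lines are never rewritten."""
    if record.get("taxonomy") not in TAXONOMIES:
        raise ValueError(f"unknown taxonomy bucket: {record.get('taxonomy')!r}")
    store.parent.mkdir(parents=True, exist_ok=True)
    with store.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def load_misses(store: Path) -> list:
    """Read every miss back; a missing file simply means no misses yet."""
    if not store.exists():
        return []
    with store.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Demo against a throwaway directory so the real store is never touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_store = Path(tmp) / "misses.jsonl"
    append_miss(demo_store, {
        "miss_id": "m-1",              # hypothetical identifier format
        "alert_name": "HighLatency",   # illustrative alert name
        "taxonomy": "retrieval_gap",
        "rating": "partial",
    })
    print(len(load_misses(demo_store)))
```

Because each miss is one line, deleting the file removes every record at once, and no separate index needs to be kept in sync.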
Reviewing trends
opensre misses stats reports the count per taxonomy and the recurring (alert_name, taxonomy) pairs, meaning pairs seen more than once. Recurring pairs are the strongest signal that a regression scenario is overdue.
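The aggregation described here can be sketched in a few lines. This is an illustration of the counting logic only, assuming each miss record is a dict with alert_name and taxonomy keys; the summarize function and the sample records are hypothetical and not part of the CLI.

```python
from collections import Counter


def summarize(misses):
    """Return (count per taxonomy, recurring (alert_name, taxonomy) pairs)."""
    by_taxonomy = Counter(m["taxonomy"] for m in misses)
    pair_counts = Counter((m["alert_name"], m["taxonomy"]) for m in misses)
    # "Recurring" means seen more than once; most_common() orders by frequency.
    recurring = [(pair, n) for pair, n in pair_counts.most_common() if n > 1]
    return by_taxonomy, recurring


# Illustrative records, not real data.
sample = [
    {"alert_name": "HighLatency", "taxonomy": "retrieval_gap"},
    {"alert_name": "HighLatency", "taxonomy": "retrieval_gap"},
    {"alert_name": "DiskFull", "taxonomy": "tool_failure"},
    {"alert_name": "HighLatency", "taxonomy": "reasoning_gap"},
]

by_taxonomy, recurring = summarize(sample)
print(dict(by_taxonomy))
print(recurring)
```

Note that the same alert under two different taxonomies counts as two distinct pairs, so HighLatency with a reasoning gap does not add to the HighLatency retrieval-gap total.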
Converting misses to regressions
opensre misses export writes one scenario per recurring (alert, taxonomy) pair, ordered by how often it has recurred. The output mirrors the existing tests/benchmarks/openrca_scenarios/*/alert.json shape, so the benchmark runner consumes it without any adapter changes.
Each exported case is an alert.json whose commonAnnotations.scoring_points dict (expected_root_cause, expected_category, miss_notes) carries the rubric for grading. This is the same location opensre investigate --evaluate already reads from, and the same one strip_scoring_points_from_alert removes before the agent sees the alert. The _meta block carries non-rubric provenance (miss_id, original_run_id, taxonomy). Commit the directory under tests/benchmarks/ and the next benchmark run will include the new regressions.
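A rough sketch of the scenario shape may help when reading an exported alert.json. The field names under scoring_points and _meta come from the description above; everything else is an assumption: build_scenario and strip_rubric are hypothetical helpers (strip_rubric only imitates what strip_scoring_points_from_alert is described as doing), placing _meta at the top level is a guess, and the sample alert and values are invented.

```python
import copy
import json


def build_scenario(alert, rubric, meta):
    """Attach the grading rubric and provenance to a copy of the alert payload."""
    scenario = copy.deepcopy(alert)
    annotations = scenario.setdefault("commonAnnotations", {})
    annotations["scoring_points"] = {
        "expected_root_cause": rubric["expected_root_cause"],
        "expected_category": rubric["expected_category"],
        "miss_notes": rubric.get("miss_notes", ""),
    }
    # Assumption: _meta sits at the top level of alert.json.
    scenario["_meta"] = {
        "miss_id": meta["miss_id"],
        "original_run_id": meta["original_run_id"],
        "taxonomy": meta["taxonomy"],
    }
    return scenario


def strip_rubric(scenario):
    """Hypothetical stand-in: hide the rubric from the copy the agent sees."""
    visible = copy.deepcopy(scenario)
    visible.get("commonAnnotations", {}).pop("scoring_points", None)
    return visible


# Illustrative values only.
alert = {
    "commonLabels": {"alertname": "HighLatency"},
    "commonAnnotations": {"summary": "p99 latency above threshold"},
}
scenario = build_scenario(
    alert,
    rubric={
        "expected_root_cause": "connection pool exhausted",
        "expected_category": "resource_exhaustion",
        "miss_notes": "agent never queried pool metrics",
    },
    meta={"miss_id": "m-1", "original_run_id": "r-42", "taxonomy": "retrieval_gap"},
)
print(json.dumps(strip_rubric(scenario), indent=2))
```

Keeping the rubric inside the alert file means one file per case is enough for both running and grading, at the cost of having to strip it reliably before the agent reads the alert.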
Weekly triage workflow
| Step | Owner | SLA |
|---|---|---|
| Run opensre misses stats --since 7d and review top recurring pairs | On-call engineer | Monday morning |
| Run opensre misses export --since 7d --top 10 --out tests/benchmarks/production_misses/ | On-call engineer | Monday |
| Open a PR adding the new scenarios with a benchmark label | On-call engineer | Tuesday |
| Run the benchmark workflow against the PR branch | Reviewer | Wednesday |
| Track fix-rate week-over-week using PostHog investigation_miss_classified trends | Eng lead | Ongoing |
PostHog trends on investigation_miss_classified (grouped by taxonomy and alert_name) provide the week-over-week trend view referenced by the SLAs.
Privacy
Miss records live entirely on the engineer’s machine in ~/.opensre/misses.jsonl. To delete everything captured locally, remove the file.
The investigation_miss_classified PostHog event carries identifiers and structured metadata only:
- miss_id, feedback_id, run_id
- taxonomy, rating, has_detail (boolean: whether a note was provided, never the note itself)
- alert_name, pipeline_name, root_cause_category
- Optional user_id, org_id when running on a hosted/JWT path
The free-text note (taxonomy_detail) and the captured root_cause string are never sent to PostHog. They exist only in the local JSONL store, so removing ~/.opensre/misses.jsonl removes them entirely.
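One way to enforce this kind of boundary is an allow-list: the event is built only from named fields, so free text cannot leak by accident. The sketch below illustrates that idea under assumptions. The to_event_properties helper and the local record layout are hypothetical, and this is not necessarily how OpenSRE constructs the event.

```python
# Fields named in the list above; anything not listed here is never copied.
EVENT_FIELDS = (
    "miss_id", "feedback_id", "run_id",
    "taxonomy", "rating",
    "alert_name", "pipeline_name", "root_cause_category",
)
OPTIONAL_FIELDS = ("user_id", "org_id")


def to_event_properties(record):
    """Build event properties from a local miss record using an allow-list."""
    props = {key: record.get(key) for key in EVENT_FIELDS}
    # Only the presence of a note is reported, never its text.
    props["has_detail"] = bool(record.get("taxonomy_detail"))
    for key in OPTIONAL_FIELDS:
        if record.get(key) is not None:
            props[key] = record[key]
    return props


# Illustrative local record, not real data.
local_record = {
    "miss_id": "m-1",
    "feedback_id": "f-1",
    "run_id": "r-42",
    "taxonomy": "retrieval_gap",
    "rating": "partial",
    "alert_name": "HighLatency",
    "pipeline_name": "checkout",
    "root_cause_category": "resource_exhaustion",
    "taxonomy_detail": "agent never queried pool metrics",
    "root_cause": "connection pool exhausted",
}

props = to_event_properties(local_record)
print(sorted(props))
```

An allow-list fails safe: a new free-text field added to the local record stays local until someone deliberately adds it to the event.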