New Announcement: OpenSRE’s SRE Agent is now Open Source

Changelog

A running log of what ships each day, built by the community, for the community.


Thursday, 2 April

Vaibhav Upreti, paul, Tan Wee Joe, Vysakh Ramakrishnan, Rohit Rajan

  • 01
    Test

    Synthetic test suite 2x'd — 10 new RDS Postgres scenarios (now 14), restructured into tests/e2e/, with new trajectory scoring (measures whether the agent investigates in the right order and stops on time) and adversarial Axis 2 tests (which force the agent to reason about red herrings instead of pattern-matching).

  • 02
    Test

    Fixtures hardened — baseline metrics, decoy CloudWatch series, adversarial confounders, and staggered fault onsets injected across all scenarios to match production noise levels.

  • 03
    Feature

    Dual LLM routing — split into reasoning vs tool-call model clients so cheaper models handle routing while full models handle diagnosis.

  • 04
    Refactor

    Codebase flattened — app/agent/* uplifted to app/, demo folder removed, unused code deleted.

  • 05
    Refactor

    Tool refactor — all tool actions renamed and restructured to follow Claude Code-style conventions.

  • 06
    Fix

    56 code quality/security fixes — 45 CodeQL warnings + 11 quality findings resolved (silent exceptions, unsafe access, signature mismatches).

  • 07
    Infra

    Release pipeline — installation script, release CI fixes, metrics outpack publishing.

  • 08
    Test

    Test coverage boost — Auth node 0→100%, AWS URL generators 32→100%.

  • 09
    Feature

    Analytics feature shipped — 42 unique users used opensre in the last 8 hours.

  • 10
    Fix

    Model defaults stabilized — replaced hardcoded/non-existent model IDs with stable aliases.
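
Item 01 above describes trajectory scoring as checking whether the agent investigates in the right order and stops on time. The changelog does not say how the score is computed, so the following is only a minimal sketch under stated assumptions: order is measured with a longest-common-subsequence ratio, "on time" is a simple step budget, and every name here (`TrajectoryScore`, `score_trajectory`, the step labels) is hypothetical rather than taken from the repo.

```python
from dataclasses import dataclass


@dataclass
class TrajectoryScore:
    order_score: float      # fraction of expected steps hit in the expected order
    stopped_on_time: bool   # True if the agent stayed within the step budget


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two step lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def score_trajectory(expected: list[str], actual: list[str], max_steps: int) -> TrajectoryScore:
    """Score an agent run against an expected investigation order (assumed metric)."""
    order = lcs_length(expected, actual) / len(expected) if expected else 1.0
    return TrajectoryScore(order_score=order, stopped_on_time=len(actual) <= max_steps)


# Hypothetical step labels, for illustration only.
expected = ["check_alarms", "query_metrics", "read_logs"]
detour = ["check_alarms", "read_logs", "query_metrics", "read_logs"]
result = score_trajectory(expected, detour, max_steps=3)
```

In this example the detour still visits all expected steps in order, so `order_score` is 1.0, but the run takes four steps against a budget of three, so `stopped_on_time` is False.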


Friday, 3 April

Ceren Camkiran, Ebrahim Sameh, edgarmb14, Kalio, Luke Gimza, Tan Wee Joe, Vaibhav Upreti, davincio

  • 01
    Feature

    Multi-provider LLM support — added OpenRouter, Google Gemini, and NVIDIA NIM alongside existing Anthropic/OpenAI, with automatic provider detection via LLM_PROVIDER env var; chat nodes now respect provider selection instead of hardcoding Anthropic, with provider-keyed caches replacing global sentinels.

  • 02
    Refactor

    Pydantic config rollout — replaced ad-hoc config handling across integrations, tools, state, and CLI with a unified Pydantic-based strict config layer touching ~25 files.

  • 03
    Feature

    Simulation engine overhaul — large refactor of the synthetic test engine (~1,470 additions) covering root-cause diagnosis nodes, Datadog logs tool, and mock Grafana backends, plus expanded per-metric CloudWatch fixture files for the RDS Postgres healthy scenario.

  • 04
    Feature

    Analytics opt-out — analytics provider now detects common CI environments and skips PostHog events and state-file creation in automated pipelines.

  • 05
    Fix

    CLI improvements — better error message for unknown test IDs (hints opensre tests list), simplified JSON output handling in args helper, and lint fixes across CLI modules.

  • 06
    Refactor

    Local Grafana demo moved to wizard CLI — Grafana stack and seeding now live under app/cli/wizard/, Make targets updated, obsolete bundled demo removed, and docs updated to point at make local-grafana-live.

  • 07
    Fix

    Wizard onboarding cleanup — CI space-select logic simplified and the provider prompt is no longer re-asked during opensre onboard.

  • 08
    Infra

    Tooling hygiene — added .editorconfig (4-space Python, 100-char line length), added .ruff_cache/ to .gitignore, removed unused mypy.ini kubernetes stubs, and moved outbound_telemetry files into app/utils/.
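
Item 01 above mentions provider detection via the LLM_PROVIDER env var and provider-keyed caches replacing global sentinels. A rough sketch of that idea follows. It is not the project's actual code: the provider key strings, the placeholder factories, and the fallback to Anthropic when the variable is unset are all assumptions made for illustration.

```python
import os
from collections.abc import Mapping
from typing import Callable, Optional

# Hypothetical factories: real ones would build SDK clients; these return
# placeholder dicts so the sketch stays self-contained.
_FACTORIES: dict[str, Callable[[], dict]] = {
    name: (lambda name=name: {"provider": name})
    for name in ("anthropic", "openai", "openrouter", "gemini", "nvidia")
}

# One cached client per provider, instead of a single global sentinel.
_CLIENTS: dict[str, dict] = {}


def get_client(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the cached client for the provider named in LLM_PROVIDER."""
    env = os.environ if env is None else env
    # Assumption: fall back to Anthropic when the variable is unset.
    provider = env.get("LLM_PROVIDER", "anthropic").strip().lower()
    if provider not in _FACTORIES:
        raise ValueError(f"unknown LLM_PROVIDER: {provider!r}")
    if provider not in _CLIENTS:
        _CLIENTS[provider] = _FACTORIES[provider]()
    return _CLIENTS[provider]
```

Keying the cache by provider means switching LLM_PROVIDER between calls yields a different client rather than silently reusing whichever one was built first.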


Saturday, 4 April

Andrew Van Dyke, Ceren Camkiran, Ebrahim Sameh, Shoaib Ansari, shoaib050326, Vaibhav Upreti, venturevd, vincenthus, Yeoreum Song

  • 01
    Feature

    New opensre update CLI command — hits the GitHub releases API, compares versions, and runs a pip upgrade or prints a re-install command for frozen binaries; supports --check and --yes flags.

  • 02
    Feature

    New opensre health CLI command — reuses the existing integration verification flow to print a compact local health summary including environment and integration store path.

  • 03
    Feature

    MCP server added — a minimal opensre_mcp.py exposes a single run_rca tool wrapping the existing investigation workflow, enabling use from MCP clients such as Copilot and Claude Desktop, with tests covering happy path and malformed input.

  • 04
    Feature

    Google Docs integration shipped — adds a GoogleDocsIntegrationConfig, a full client (create, insert, share, retrieve, validate), and a GoogleDocsCreateReportTool that generates structured postmortem reports with Executive Summary, Root Cause, Timeline, and Remediation sections.

  • 05
    Refactor

    Tool-decorating refactor merged — large rework (~+4927/-2606) touching integrations clients (Coralogix, Honeycomb), wizard flow, integration health, alert templates, and node processing.

  • 06
    Docs

    Mintlify docs moved into repo — docs-mintlify/ directory added (~12k lines) including comparisons, quickstart pages, and CI spellcheck configuration; Vale vocabulary wired up via .vale.ini to eliminate 320 false-positive spellcheck findings.

  • 07
    Fix

    CodeQL quality alerts resolved — consolidated duplicate import / from import statements in jwt_auth.py, flow.py, and a test file, and removed unused JS variables in radar-chart.js and toc-actions.js.
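
Item 01 above says opensre update compares the installed version against the latest GitHub release. The comparison step could look roughly like the sketch below. It assumes plain dotted numeric tags with an optional leading "v" (pre-release suffixes are not handled), and the function names are hypothetical; a production implementation might prefer `packaging.version.Version` instead.

```python
def parse_version(tag: str) -> tuple[int, ...]:
    """Turn a tag such as 'v0.10.0' into a comparable tuple of ints."""
    parts = [int(p) for p in tag.strip().lstrip("vV").split(".")]
    # Drop trailing zeros so that '1.2' and '1.2.0' compare as equal.
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def update_available(installed: str, latest: str) -> bool:
    """True when the latest release is strictly newer than the installed one."""
    return parse_version(latest) > parse_version(installed)
```

Comparing integer tuples rather than raw strings matters here: as strings, "0.9.3" sorts after "0.10.0", which would hide a real update.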


Sunday, 5 April

Ceren Camkiran, Shoaib Ansari, shoaib050326, Shriyash soni, vincenthus

  • 01
    Refactor

    Daily update workflow simplified — Slack delivery removed so the workflow now only generates and commits markdown archives under docs/daily-updates, with telemetry debug redaction broadened to catch sensitive key substrings like oauth_refresh_token.

  • 02
    Infra

    Production Dockerfile shipped — new repo-root Dockerfile uses Python 3.11 slim, installs deps from pyproject.toml, includes langgraph-cli, exposes port 2024, runs as non-root, and adds a HEALTHCHECK against the /ok endpoint; comprehensive structural tests added in app/dockerfile_test.py.

  • 03
    Infra

    MCP server wired up — opensre-mcp CLI entrypoint added to pyproject.toml and setup/OpenClaw configuration documented in docs/SETUP.md.

  • 04
    Fix

    Contributors workflow hardened — retry logic and explicit HTTP/network error handling added to API calls, broad exception swallowing removed from first-commit lookup, and sort order aligned to earliest-first as documented (fixes #212).

  • 05
    Refactor

    TracerClientBase._get typing tightened — input changed to Mapping[str, Any] and return typed as JSONDict; runtime behaviour unchanged and focused unit tests added for URL construction, param passing, and empty-param defaults.

  • 06
    Fix

    Docs links fixed — all opensre.com references replaced with tracer.mintlify.app across README, CONTRIBUTING, SECURITY, and Mintlify config files to unblock access while the custom domain misconfiguration is resolved upstream.

  • 07
    Test

    Test quality improved — MemoryKeyring._entries moved from class-level to instance attribute to prevent state bleed between tests, and client_test.py assertions strengthened with identity checks and raise_for_status verification.
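
The state bleed described in item 07 is a general Python pitfall: a mutable attribute defined on the class is shared by every instance. The sketch below reproduces the pattern with two hypothetical classes (`SharedKeyring` and `IsolatedKeyring`); it is not the project's MemoryKeyring, only an illustration of the class-level versus instance-level difference.

```python
from typing import Optional


class SharedKeyring:
    # Class-level dict: one object shared by all instances (the bug pattern).
    _entries: dict[tuple[str, str], str] = {}

    def set_password(self, service: str, username: str, password: str) -> None:
        self._entries[(service, username)] = password

    def get_password(self, service: str, username: str) -> Optional[str]:
        return self._entries.get((service, username))


class IsolatedKeyring(SharedKeyring):
    def __init__(self) -> None:
        # Instance-level dict: each keyring starts empty (the fix pattern).
        self._entries = {}


first = SharedKeyring()
first.set_password("svc", "alice", "secret")
leaked = SharedKeyring().get_password("svc", "alice")    # sees the first instance's entry

clean = IsolatedKeyring()
clean.set_password("svc", "bob", "hunter2")
fresh = IsolatedKeyring().get_password("svc", "bob")     # a new instance sees nothing
```

With the class-level dict, a value written in one test is still visible to a keyring created in the next test; the instance attribute gives each test its own empty store.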


Monday, 6 April

Abhinnavverma, Aniruddha Khandare, Ankit Juneja, Devesh, Ebrahim Sameh, James, Jayant Singh Bisht, qorex, Rohit Rajan, Shoaib Ansari, shoaib050326, vincenthus, Vysakh Ramakrishnan, Yash Kapure

  • 01
    Refactor

    Tool creation reduced to a single file (#275).

  • 02
    Docs

    Add guidelines for creating single-file tools (#354).

  • 03
    Feature

    Issue/140 Elasticsearch integration (#343).

  • 04
    Feature

    OpsGenie integration added for alert intake and investigation (#353).

  • 05
    Feature

    Vercel integration added for deployment monitoring (#351).

  • 06
    Feature

    Jira integration added for incident ticket management (#298).

  • 07
    Feature

    MongoDB integration added for database RCA (#348).

  • 08
    Docs

    MongoDB integration docs consolidated from opensre docs-mintlify (#366).

  • 09
    Feature

    Prefect integration added for workflow and worker investigation.

  • 10
    Feature

    opensre onboard local_llm added — zero-config Ollama setup (#356).

  • 11
    Feature

    Added the health check FastAPI endpoint for hosted deployments (#340).

  • 12
    Feature

    v0.1 benchmark generation script created (#342), with results auto-appended to the README after generation (#396).

  • 13
    Feature

    Add opensre version subcommand (#385).

  • 14
    Fix

    JWT token no longer persisted in LangGraph state (#395).

  • 15
    Feature

    Enhance UI for terminal command interface (#389).

  • 16
    Feature

    Interactive input picker added for opensre investigate (#386).

  • 17
    Refactor

    Semantic restructure — services, types, constants, and entrypoints reorganised (#371).

  • 18
    Fix

    All 6 open CodeQL security/quality alerts resolved (#399).

  • 19
    Test

    Full unit test coverage added for all investigation tools (#397).


Tuesday, 7 April

abhishek-marathe04, Ankit Juneja, Matt Van Horn, Raman, Vaibhav Upreti, vincenthus

  • 01
    Feature

    Real-time streaming UI, interactive deploy, and multi-remote support — shipped by vincenthus.

  • 02
    Feature

    GitLab integration — added with investigation tools and optional MR commenting (fixes issue #318).

  • 03
    Feature

    Remote LangGraph agent connection — health + trigger support added.

  • 04
    Feature

    Bitbucket integration — added for dev-tools investigation.

  • 05
    Feature

    Kafka integration — added for streaming and consumer-lag RCA.

  • 06
    Feature

    ClickHouse integration — added for OLAP database RCA.

  • 07
    Feature

    SSE streaming — added to remote investigate endpoint.

  • 08
    Refactor

    CLI refactored — cleanup contributed by Vaibhav Upreti.

  • 09
    Perf

    CI pipeline speed-up — adopted uv and pytest-xdist and consolidated jobs, cutting build times significantly.

  • 10
    Fix

    Ollama tag matching fix — resolved critical issue causing redundant downloads.

  • 11
    Fix

    Contributors workflow URL typo fixed — fixes issue #212.
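
Item 10 above does not describe the Ollama tag bug itself. One plausible cause of redundant downloads is comparing a requested name such as "llama3" against an installed name such as "llama3:latest" as raw strings. The sketch below assumes that an untagged Ollama model name is equivalent to the same name with the "latest" tag; the function names are hypothetical and this is not the project's actual fix.

```python
def normalize_model(name: str) -> str:
    """Append ':latest' when the model name carries no explicit tag."""
    name = name.strip()
    # Only inspect the last path segment, so a registry host with a port
    # (e.g. 'localhost:5000/llama3') is not mistaken for a tagged name.
    last_segment = name.rsplit("/", 1)[-1]
    return name if ":" in last_segment else f"{name}:latest"


def needs_pull(requested: str, installed: list[str]) -> bool:
    """True when the requested model is not already present locally."""
    present = {normalize_model(m) for m in installed}
    return normalize_model(requested) not in present
```

Normalizing both sides before comparing means "llama3" matches an installed "llama3:latest", while a differently tagged request such as "llama3:8b" still triggers a pull.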