Overview
opensre investigate --service <name> kicks off a runtime investigation for a deployed service. Instead of passing an alert payload, OpenSRE gathers live signals from the service (deployment status, recent logs, health probe) and feeds them into the existing investigation pipeline as evidence.
Prerequisites
You must have deployed a service via opensre deploy (or registered a named remote) and configured a remote ops provider.
- Deploy or register a named remote:
- Configure the remote ops provider (once):
Usage
- Resolve <name> against your named-remote registry
- Fetch deployment status via the configured ops provider (e.g. Railway)
- Fetch the most recent ~100 log lines
- Probe the service’s /health or /ok endpoint
- Package all of this into an alert payload
- Run the standard RCA pipeline against it
opensre investigate -i <alert-file>.
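The gathering and packaging steps listed above can be sketched roughly as follows. This is a minimal illustration, not OpenSRE's actual code: the ServiceSignals class, the payload keys, and the choice to try /health before /ok are all assumptions made for the example, and the probe is passed in as a callable so the sketch runs without a network.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ServiceSignals:
    """Live signals for one service (hypothetical field names)."""
    service: str
    deployment_status: str
    recent_logs: list = field(default_factory=list)
    health: dict = field(default_factory=dict)


def probe_health(probe: Callable[[str], int]) -> dict:
    """Try /health, then /ok (assumed order); stop at the first 2xx answer."""
    result: dict = {"ok": False}
    for path in ("/health", "/ok"):
        try:
            code = probe(path)  # probe returns an HTTP status code
        except OSError as exc:
            result = {"path": path, "ok": False, "error": str(exc)}
            continue
        result = {"path": path, "ok": 200 <= code < 300, "status_code": code}
        if result["ok"]:
            break
    return result


def build_alert_payload(signals: ServiceSignals) -> dict:
    """Package the signals into an alert-shaped dict (hypothetical layout)."""
    return {
        "service": signals.service,
        "deployment_status": signals.deployment_status,
        "logs": signals.recent_logs[-100:],  # keep only the newest 100 lines
        "health": signals.health,
    }
```

A payload built this way would then be handed to the same pipeline that processes an alert file.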
Incorporating Slack thread context
Pass --slack-thread CHANNEL/TS to also pull the messages from a specific Slack thread as investigation context. This is useful when an incident originated in a Slack conversation.
- SLACK_BOT_TOKEN must be set in the environment. The bot must have the channels:history and groups:history OAuth scopes for the channel you’re reading.
- The CHANNEL/TS reference can be obtained from Slack’s “Copy link to message” option: it’s the last two path segments of the link.
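As an illustration of taking the last two path segments, here is a small helper that derives the reference from a copied message link. The function name slack_thread_ref and the example URL are hypothetical. The helper returns the segments exactly as they appear in the link and does not convert the timestamp into any other format.

```python
from urllib.parse import urlparse


def slack_thread_ref(permalink: str) -> str:
    """Return the last two path segments of a Slack message link as CHANNEL/TS."""
    # urlparse().path excludes any query string such as ?thread_ts=...
    segments = [s for s in urlparse(permalink).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"not a Slack message link: {permalink!r}")
    return f"{segments[-2]}/{segments[-1]}"


# Example with a made-up link:
ref = slack_thread_ref(
    "https://example.slack.com/archives/C0123456789/p1700000000123456"
)
print(ref)  # C0123456789/p1700000000123456
```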
The thread’s messages are fetched via Slack’s conversations.replies API and included under the slack_thread key in the alert payload. If fetching fails (bad token, wrong channel, network error), the investigation still proceeds, with the error recorded in the payload.
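The fail-soft behavior described here might look like the sketch below. Only the slack_thread key comes from this page. The attach_slack_thread function, the inner messages and error fields, and the injected fetch_replies callable are assumptions for illustration.

```python
from typing import Callable


def attach_slack_thread(payload: dict, fetch_replies: Callable[[], list]) -> dict:
    """Add thread messages under 'slack_thread', or record the failure instead."""
    try:
        payload["slack_thread"] = {"messages": fetch_replies()}
    except Exception as exc:  # bad token, wrong channel, network error, ...
        # Do not abort: keep the investigation going and note what went wrong.
        payload["slack_thread"] = {"messages": [], "error": str(exc)}
    return payload
```

Catching broadly is deliberate in this sketch, since the stated goal is that no Slack failure blocks the investigation.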
Mutual exclusion
--service cannot be combined with --input, --input-json, --interactive, or --print-template. Use --service on its own.
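One way such a rule could be enforced is an explicit check after argument parsing. The sketch below uses Python's argparse purely as an assumption; it is not a statement about how the opensre CLI is implemented. The flag names are the ones listed above, while build_parser and conflicting_flags are hypothetical helpers.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="investigate-sketch")
    parser.add_argument("--service")
    parser.add_argument("-i", "--input")
    parser.add_argument("--input-json")
    parser.add_argument("--interactive", action="store_true")
    parser.add_argument("--print-template", action="store_true")
    return parser


def conflicting_flags(args: argparse.Namespace) -> list:
    """Return the flags that clash with --service (empty if there is no clash)."""
    if not args.service:
        return []
    candidates = {
        "--input": args.input,
        "--input-json": args.input_json,
        "--interactive": args.interactive,
        "--print-template": args.print_template,
    }
    return [flag for flag, value in candidates.items() if value]


args = build_parser().parse_args(["--service", "api", "--interactive"])
print(conflicting_flags(args))  # ['--interactive']
```

An explicit check is used instead of a single argparse mutually exclusive group because the rule stated here only concerns combinations with --service, not combinations among the other flags.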
Extending to other providers
The RemoteOpsProvider abstract class (in app/remote/ops.py) defines the provider interface. To add support for another provider (EC2, ECS, Vercel, etc.), implement a new subclass with status(), logs(), fetch_logs(), and restart() methods, then register it in resolve_remote_ops_provider().
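To give a feel for the shape of such a subclass, the sketch below defines a stand-in abstract class with the four method names mentioned above and a toy in-memory provider. The method signatures, return types, the InMemoryProvider class, and the resolve_provider_sketch registry are all assumptions; the real definitions in app/remote/ops.py may differ and should be checked before implementing a provider.

```python
from abc import ABC, abstractmethod


class RemoteOpsProvider(ABC):
    """Stand-in for the real interface; signatures here are guesses."""

    @abstractmethod
    def status(self, service: str) -> dict: ...

    @abstractmethod
    def logs(self, service: str) -> None: ...

    @abstractmethod
    def fetch_logs(self, service: str, lines: int = 100) -> list: ...

    @abstractmethod
    def restart(self, service: str) -> None: ...


class InMemoryProvider(RemoteOpsProvider):
    """Toy provider backed by a dict of log lines, for illustration only."""

    def __init__(self, log_store: dict):
        self._logs = log_store
        self.restarted = []

    def status(self, service: str) -> dict:
        return {"service": service, "deployed": service in self._logs}

    def logs(self, service: str) -> None:
        # Print logs to the terminal (assumed purpose of logs()).
        print("\n".join(self._logs.get(service, [])))

    def fetch_logs(self, service: str, lines: int = 100) -> list:
        # Return the newest `lines` entries for programmatic use.
        return self._logs.get(service, [])[-lines:]

    def restart(self, service: str) -> None:
        self.restarted.append(service)


# Hypothetical registry playing the role of resolve_remote_ops_provider().
PROVIDERS = {"memory": InMemoryProvider}


def resolve_provider_sketch(name: str, **kwargs) -> RemoteOpsProvider:
    try:
        return PROVIDERS[name](**kwargs)
    except KeyError:
        raise ValueError(f"unknown remote ops provider: {name!r}") from None
```

Because all four methods are abstract in this sketch, a subclass that omits any of them cannot be instantiated, which surfaces an incomplete provider early.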
Known limitations
- Currently supports only Railway: other providers have status/logs hooks but no fetch_logs implementation yet.
- Slack context is thread-scoped: this initial version pulls a specific thread via --slack-thread. It does not search Slack history or resolve linked runbooks.
- alert_source is re-inferred by the LLM: the LLM in the extract-alert step may infer an alert_source from the log text (e.g. “datadog” if the logs mention Datadog), which routes to provider-specific tools. This is the intended behavior.