OpenSRE uses the Dagster GraphQL API to investigate data-pipeline incidents. It fetches recent runs and their status, the full event log and root-cause exception for a failed run, asset materialization history, and sensor or schedule tick history. The integration works against both Dagster OSS (dagster dev and self-hosted dagster-webserver) and Dagster+ (the SaaS).

Prerequisites

  • A reachable dagster-webserver instance:
    • Dagster OSS: run dagster dev -f jobs.py locally or deploy dagster-webserver to your infra. Default port 3000.
    • Dagster+: an active deployment, e.g. https://<org>.dagster.cloud/<deployment> or https://<org>.<region>.dagster.cloud/<deployment>.
  • Network access from the OpenSRE environment to the webserver
  • For Dagster+: a User Token generated under Organization Settings → Tokens → User Tokens (not an Agent Token; Agent Tokens authenticate Hybrid agents and are rejected by the GraphQL endpoint)

Setup

Option 1: Onboarding wizard

opensre onboard
Pick Dagster from the integration menu. The wizard asks for:
  • Dagster webserver URL: http://localhost:3000 for OSS local dev, or https://<org>.dagster.cloud/<deployment> for Dagster+ (the client appends /graphql itself, so the URL may be given with or without that suffix)
  • Dagster API token — required for Dagster+; leave blank for unauthenticated OSS
The wizard validates the endpoint with a GraphQL version probe before saving, writes DAGSTER_ENDPOINT to your .env, and persists the API token (when provided) to your system keychain.

Option 2: Legacy CLI

opensre integrations setup dagster
You will be prompted for the GraphQL endpoint and (optional) API token.

Option 3: Manual configuration

Add to your .env:
DAGSTER_ENDPOINT=https://your-org.dagster.cloud/prod
DAGSTER_API_TOKEN=...
| Variable | Default | Description |
| --- | --- | --- |
| DAGSTER_ENDPOINT | | Required. Base URL of the dagster-webserver. The client appends /graphql itself, so you can paste any of https://host/deployment, https://host/deployment/, or https://host/deployment/graphql; all collapse to the same canonical base. |
| DAGSTER_API_TOKEN | (empty) | Required for Dagster+ deployments. Leave empty for unauthenticated local OSS Dagster. Sent as the Dagster-Cloud-Api-Token header. |
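
The collapsing behaviour described for DAGSTER_ENDPOINT can be pictured with a small sketch. This is not OpenSRE's actual client code; the function names normalize_endpoint and graphql_url are hypothetical and only illustrate how the three accepted forms could reduce to one base URL.

```python
def normalize_endpoint(raw: str) -> str:
    """Return the base URL without a trailing slash or /graphql suffix.

    Hypothetical helper: illustrates the documented behaviour, not the
    real OpenSRE implementation.
    """
    base = raw.strip().rstrip("/")
    if base.endswith("/graphql"):
        base = base[: -len("/graphql")]
    return base.rstrip("/")


def graphql_url(raw: str) -> str:
    """Build the URL a client would POST GraphQL queries to."""
    return normalize_endpoint(raw) + "/graphql"
```

With this sketch, https://host/deployment, https://host/deployment/ and https://host/deployment/graphql all yield the same graphql_url result.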
Credentials configured via Options 1 and 2 are also persisted to ~/.opensre/integrations.json with 0o600 permissions:
{
  "version": 1,
  "integrations": [
    {
      "id": "dagster-prod",
      "service": "dagster",
      "status": "active",
      "credentials": {
        "endpoint": "https://your-org.dagster.cloud/prod",
        "api_token": "..."
      }
    }
  ]
}
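
If you want to inspect the persisted store from a script, something along these lines may help. It is a sketch that assumes only the JSON layout shown above; the helper names dagster_credentials and is_owner_only are hypothetical and are not part of OpenSRE.

```python
import json
import os
import stat
from pathlib import Path
from typing import Optional


def is_owner_only(path: Path) -> bool:
    """True when no group or other permission bits are set (e.g. mode 0o600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


def dagster_credentials(store: dict) -> Optional[dict]:
    """Return the credentials of the first active Dagster entry, if any."""
    for entry in store.get("integrations", []):
        if entry.get("service") == "dagster" and entry.get("status") == "active":
            return entry.get("credentials")
    return None


if __name__ == "__main__":
    store_path = Path.home() / ".opensre" / "integrations.json"
    if store_path.exists():
        creds = dagster_credentials(json.loads(store_path.read_text()))
        # Print only the endpoint; never echo the token itself.
        print("endpoint:", creds["endpoint"] if creds else None)
        print("owner-only permissions:", is_owner_only(store_path))
```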

Where to find your Dagster+ token and endpoint

Endpoint: look at the URL in your browser when logged into Dagster+. It is the part up through the deployment name, e.g. https://acme.dagster.cloud/prod if the address bar shows https://acme.dagster.cloud/prod/runs. EU accounts use a regional subdomain such as https://acme.eu.dagster.cloud/prod. Trailing /graphql is accepted and stripped automatically.

API token:
  1. Click the user menu (your icon) → Organization Settings
  2. Open the Tokens tab
  3. Click + Create user token, give it a name like opensre-integration
  4. Copy the token immediately (Dagster+ shows it once and never again)
User Tokens inherit the user’s per-deployment role. A user account that has at least the Viewer role on the target deployment is sufficient for read-only investigation queries.
Token type matters. Use a User Token, not an Agent Token. Agent Tokens authenticate Hybrid agents talking to the Agents API and are rejected (HTTP 401) by the GraphQL endpoint.

Investigation tools

When OpenSRE investigates a Dagster-related alert, five diagnostic tools are available:
  • List runs — recent pipeline/job runs with status, job name, timestamps, and pre-computed duration; filterable by status and job name
  • Get run logs — event log for a specific run with ExecutionStepFailureEvent and RunFailureEvent entries; surfaces user-code exceptions from error.cause (e.g. the ValueError underlying Dagster’s DagsterExecutionStepExecutionError wrapper) and pre-counts multi-step failures
  • List assets with materialization — Dagster assets with their latest materialization timestamp + run id; useful for spotting stale or never-materialized assets
  • List sensor ticks — recent tick history for a sensor (identified by full SensorSelector triplet: repository location, repository, sensor name)
  • List schedule ticks — recent tick history for a schedule (identified by full ScheduleSelector triplet: repository location, repository, schedule name)
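
To make the "Get run logs" behaviour concrete, here is a rough sketch of how a wrapped exception could be unwound by following error.cause. It assumes event dictionaries shaped like Dagster's GraphQL failure events (a __typename, an optional stepKey, and an error object with className, message, and cause); the sample data and the function names are illustrative, not OpenSRE's implementation.

```python
from typing import List, Optional, Tuple

FAILURE_TYPES = ("ExecutionStepFailureEvent", "RunFailureEvent")

# Illustrative sample only: a step failure whose wrapper hides a ValueError.
SAMPLE_EVENTS = [
    {
        "__typename": "ExecutionStepFailureEvent",
        "stepKey": "load_orders",
        "error": {
            "className": "DagsterExecutionStepExecutionError",
            "message": "Error occurred while executing op",
            "cause": {"className": "ValueError", "message": "bad row", "cause": None},
        },
    },
    {"__typename": "RunFailureEvent", "error": None},
]


def root_cause(error: Optional[dict], max_depth: int = 32) -> Optional[dict]:
    """Follow error["cause"] links and return the innermost error."""
    current = error
    for _ in range(max_depth):  # bounded, in case of an unexpectedly deep chain
        if not current or not current.get("cause"):
            break
        current = current["cause"]
    return current


def summarize_failures(events: List[dict]) -> List[Tuple[Optional[str], str, str]]:
    """Return (step key, exception class, message) for each failure with an error."""
    summaries = []
    for event in events:
        if event.get("__typename") not in FAILURE_TYPES:
            continue
        cause = root_cause(event.get("error"))
        if cause is None:
            continue
        summaries.append((event.get("stepKey"), cause.get("className"), cause.get("message")))
    return summaries
```

For SAMPLE_EVENTS, summarize_failures reports the ValueError for the load_orders step rather than the Dagster wrapper, and the length of the returned list corresponds to the pre-counted failures.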

Verify

opensre integrations verify dagster
Expected output:
SERVICE    SOURCE       STATUS    DETAIL
dagster    local env    passed    Connected to Dagster version 1.13.6.
The verifier issues a query { version } probe against the configured endpoint and reports the running Dagster version on success.
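
An equivalent probe can be reproduced by hand when debugging connectivity. The sketch below uses only the Python standard library and assumes the endpoint and header conventions described on this page; build_version_probe and probe_version are hypothetical names, not OpenSRE functions.

```python
import json
import urllib.request


def build_version_probe(endpoint: str, api_token: str = "") -> urllib.request.Request:
    """Prepare a POST of `query { version }` to <endpoint>/graphql."""
    base = endpoint.strip().rstrip("/")
    if base.endswith("/graphql"):
        base = base[: -len("/graphql")]
    headers = {"Content-Type": "application/json"}
    if api_token:
        # Only Dagster+ needs a token; unauthenticated OSS sends none.
        headers["Dagster-Cloud-Api-Token"] = api_token
    body = json.dumps({"query": "query { version }"}).encode("utf-8")
    return urllib.request.Request(base + "/graphql", data=body, headers=headers, method="POST")


def probe_version(endpoint: str, api_token: str = "", timeout: float = 10.0) -> str:
    """Send the probe and return the reported Dagster version string."""
    request = build_version_probe(endpoint, api_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["data"]["version"]
```

Calling probe_version("http://localhost:3000") against a running dagster dev instance should return the version string. A non-JSON reply surfaces as a json.JSONDecodeError, and a rejected token as urllib.error.HTTPError.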

Troubleshooting

| Symptom | Fix |
| --- | --- |
| HTTP 401 with HTML body | The Dagster+ edge proxy rejected the request. Most likely causes: (1) the token is an Agent Token, not a User Token; (2) the user owning the token lacks a role on the target deployment; (3) the token was revoked or regenerated. Verify under Organization Settings → Tokens → User Tokens and confirm the user has access to the deployment in the URL. |
| Invalid JSON in response: Expecting value | The endpoint was reached but did not respond with JSON. This usually means the URL is wrong (e.g. you pasted a path that hits the Dagster+ UI instead of the GraphQL endpoint). The client appends /graphql automatically; paste only the base URL through the deployment name. |
| Request to Dagster failed: Connection refused | dagster-webserver is not running at the configured endpoint. Start it with dagster dev -f jobs.py for local OSS, or check the Dagster+ deployment status. |
| runsOrError.__typename == InvalidPipelineRunsFilterError | The status filter contained an unrecognized RunStatus value. Valid values: QUEUED, NOT_STARTED, MANAGED, STARTING, STARTED, SUCCESS, FAILURE, CANCELING, CANCELED. |
| logsForRun returns RunNotFoundError | The run id does not exist on this deployment. Confirm that the run id is correct and that the deployment slug in the endpoint matches the deployment the run belongs to. |
| Sensor query returns SensorNotFoundError | The SensorSelector triplet (repository_location_name, repository_name, sensor_name) did not match a sensor in the deployment. List sensors in the Dagster UI to confirm the exact names. |
| Schedule query returns ScheduleNotFoundError | The ScheduleSelector triplet (repository_location_name, repository_name, schedule_name) did not match a schedule in the deployment. List schedules in the Dagster UI to confirm the exact names. |
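
The InvalidPipelineRunsFilterError case can be caught before a request is sent by checking status values locally. The following is a minimal sketch using the RunStatus values listed above; normalize_statuses is a hypothetical helper and not part of OpenSRE.

```python
from typing import Iterable, List

VALID_RUN_STATUSES = (
    "QUEUED", "NOT_STARTED", "MANAGED", "STARTING", "STARTED",
    "SUCCESS", "FAILURE", "CANCELING", "CANCELED",
)


def normalize_statuses(values: Iterable[str]) -> List[str]:
    """Upper-case and validate status names, raising on unknown ones."""
    cleaned = []
    for value in values:
        status = value.strip().upper()
        if status not in VALID_RUN_STATUSES:
            raise ValueError(
                f"Unknown RunStatus {value!r}; expected one of: "
                + ", ".join(VALID_RUN_STATUSES)
            )
        cleaned.append(status)
    return cleaned
```

A common slip such as FAILED (instead of FAILURE) then fails fast with a readable message instead of a GraphQL error.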

Security best practices

  • Use a dedicated User Token scoped to a service-style user account when possible. Dagster+ does not have first-class service accounts; the community pattern is a separate user whose token you use.
  • Keep tokens out of source control — use .env (gitignored) or the persistent store at ~/.opensre/integrations.json.
  • The GraphQL queries OpenSRE issues are read-only: list runs, fetch event logs, list assets, and fetch sensor and schedule ticks. No mutations are sent.
  • Rotate tokens periodically. Tokens can be revoked from the same Organization Settings → Tokens page.
  • For local OSS Dagster without auth, restrict the webserver to localhost or your private network. Do not expose dagster dev’s default port to the internet.