
Overview

Better Stack comprises two MCP servers: Uptime (monitoring, incidents, on-call, status pages) and Telemetry (logs, metrics, dashboards, error tracking). Both share the same team/org context but require separate auth tokens.

How to Add Better Stack

1. Add to Civic

Add the Better Stack Uptime server, the Better Stack Telemetry server, or both to your Civic environment through the server directory. They are separate servers, so add only the ones you need.

2. Authorize

On first use, you will be redirected to Better Stack to authorize the connection. There are no API keys or secrets to manage manually.
Two separate authorizations: Uptime and Telemetry are independent products, so you need to authorize each one separately the first time you use it.

3. Test Connection

  • For Uptime: try "List all monitors" or "Who is currently on-call?"
  • For Telemetry: try "List all log sources" or "List all dashboards"

What You Can Do

Uptime Monitoring

Manage monitors, view response times and availability, and track incidents and heartbeats

Incident Management

List, acknowledge, resolve, escalate, and comment on incidents with full timeline access

On-Call & Escalation

View on-call schedules, escalation policies, and severity notification channels

Status Pages

List status pages, manage resources, create incident reports, and post status updates

Log Querying

Query logs via ClickHouse, explore sources, extract JSON fields, and group logs by pattern

Metrics & Dashboards

List metrics, check cardinality, build queries, and create and arrange dashboard charts

Error Tracking

List applications, query errors, resolve or ignore error patterns, and view session replays

Heartbeat Monitoring

Monitor cron jobs and scheduled tasks — incidents trigger automatically on missed pings

Use Cases

Uptime & Incidents

  • Monitor Health: "Show current status of all monitors"
  • Availability Reports: "Get availability for the API monitor for March 2026"
  • Incident Triage: "List all open incidents", then "Acknowledge incident 1234", then "Resolve incident 1234"
  • Incident Timeline: "Show the timeline for incident 1234"
  • On-Call Check: "Who is currently on-call?"
  • Status Page Updates: "Create an incident report: title 'API Degradation', message 'Investigating elevated error rates'"

Telemetry & Logs

  • Log Search: "Show the last 50 error logs from my-app"
  • Error Patterns: "Find the most common error patterns in the last 24 hours"
  • Slow Requests: "Show slow API requests over 1 second from today"
  • Metrics Exploration: "List available metrics for source 12345"
  • Dashboard Management: "List all dashboards", then "Add a line chart showing log volume per hour"
  • Error Tracking: "List recent unresolved errors", then "Resolve error pattern NullPointerException in UserService"

Available Tools

Uptime — Monitors & Heartbeats

list_monitors — List all monitors and their current status
get_monitor_response_times — Get response times broken down by region and phase (DNS, connect, TLS, transfer). Returns ~24h of data in 15-min buckets.
get_monitor_availability — Get availability percentage for a monitor over a date range
list_heartbeats — List all heartbeat monitors for cron jobs and scheduled tasks
get_heartbeat_availability — Get availability for a heartbeat over a date range

Uptime — Incidents

list_incidents — List incidents with filtering by status, monitor, heartbeat, and date range
acknowledge_incident — Acknowledge an open incident
resolve_incident — Resolve an incident
reopen_incident — Reopen a resolved incident (within 24 hours of resolution only)
escalate_incident — Escalate to a user, team, schedule, or policy. Call get_incident_escalation_options first to discover valid targets.
add_incident_comment — Add a comment to an incident timeline
get_incident_timeline — Full audit trail of state changes, notifications, and acknowledgements
get_incident_escalation_options — Discover valid escalation targets (User, Team, Schedule, Policy)
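Because reopen_incident only works within 24 hours of resolution, a client that automates triage may want to check that window before calling it. The following is a minimal sketch, assuming you already have the incident's resolution time as a timezone-aware datetime (how list_incidents or get_incident_timeline exposes that timestamp is an assumption here). The helper name can_reopen is hypothetical, and treating exactly 24 hours as expired is a conservative choice, not documented behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# reopen_incident is documented to work only within 24 hours of resolution.
REOPEN_WINDOW = timedelta(hours=24)


def can_reopen(resolved_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if an incident resolved at `resolved_at` can still be reopened.

    Hypothetical client-side check; the server remains the source of truth.
    Both datetimes must be timezone-aware. Exactly 24 hours is treated as
    expired, which is a conservative assumption.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    elapsed = now - resolved_at
    return timedelta(0) <= elapsed < REOPEN_WINDOW


if __name__ == "__main__":
    resolved = datetime(2026, 3, 10, 9, 0, tzinfo=timezone.utc)
    print(can_reopen(resolved, resolved + timedelta(hours=2)))   # True
    print(can_reopen(resolved, resolved + timedelta(hours=30)))  # False
```

If the check fails, the incident is outside the reopen window, and a new incident or a comment via add_incident_comment may be the more appropriate action.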

Uptime — On-Call & Status Pages

list_on_call_calendars — List on-call rotations and events
list_escalation_policies — List all escalation policies
list_status_pages — List all status pages and their resources
get_status_page_resources — Show monitors and resources on a status page
create_incident_report — Create a public incident report with affected resources
post_status_update — Post a status update to an incident report with optional subscriber notification

Telemetry — Sources & Logs

list_sources — List all log sources
get_source_details — Get ingestion token, host URL, table name, and retention settings for a source
get_source_fields — Get available fields for a source (returns nothing if source has no recent data)
telemetry_build_explore_query_tool — Generate a ClickHouse query from plain English
telemetry_query — Execute a direct ClickHouse query (requires cloud connection credentials, created with create_cloud_connection)
create_source — Create a new log source with a specified platform type

Telemetry — Metrics & Dashboards

list_metrics — List available metrics for a source
get_metrics_and_cardinality — Show cardinality for all metrics
get_metric_query_instructions — Get correct aggregation functions and example queries for a metric
list_dashboards — List all dashboards
get_dashboard_layout — Show the layout and charts of a dashboard
add_chart — Add a chart to a dashboard
edit_chart — Edit an existing chart
move_charts — Rearrange chart positions on a dashboard (validates layout atomically)
export_dashboard — Export a dashboard as JSON
import_dashboard — Import a dashboard from JSON
get_chart_building_instructions — Documentation for supported chart types, units, axes, and layout rules
telemetry_chart — Preview chart queries with automatic error surfacing
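move_charts validates a layout atomically on the server, but catching obvious mistakes before submitting a rearrangement can still help. The sketch below checks a proposed layout against the 12-column dashboard grid and flags overlapping charts. The coordinate model (0-based column x, row y, and width w and height h in grid units) and the names ChartPos and layout_errors are assumptions for illustration, not Better Stack's actual layout schema; get_chart_building_instructions documents the real layout rules.

```python
from dataclasses import dataclass
from typing import List

GRID_COLUMNS = 12  # dashboard grid width


@dataclass(frozen=True)
class ChartPos:
    """Hypothetical chart placement in grid units (not Better Stack's schema)."""
    name: str
    x: int  # left-most column, 0-based
    y: int  # top row, 0-based
    w: int  # width in columns
    h: int  # height in rows


def _overlaps(a: ChartPos, b: ChartPos) -> bool:
    # Two rectangles overlap when both their column ranges and row ranges intersect.
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def layout_errors(charts: List[ChartPos]) -> List[str]:
    """Return human-readable problems with a proposed layout (empty list if none)."""
    errors = []
    for c in charts:
        if c.w <= 0 or c.h <= 0:
            errors.append(f"{c.name}: width and height must be positive")
        if c.x < 0 or c.y < 0 or c.x + c.w > GRID_COLUMNS:
            errors.append(f"{c.name}: does not fit in the {GRID_COLUMNS}-column grid")
    for i, a in enumerate(charts):
        for b in charts[i + 1:]:
            if _overlaps(a, b):
                errors.append(f"{a.name} overlaps {b.name}")
    return errors


if __name__ == "__main__":
    side_by_side = [ChartPos("errors", 0, 0, 6, 4), ChartPos("latency", 6, 0, 6, 4)]
    print(layout_errors(side_by_side))  # []
    too_wide = [ChartPos("volume", 8, 0, 6, 4)]
    print(layout_errors(too_wide))  # ['volume: does not fit in the 12-column grid']
```

Checking every pair is quadratic, which is fine for the handful of charts a dashboard typically holds.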

Telemetry — Error Tracking

list_applications — List all error-tracking applications
create_application — Create a new error-tracking application with a platform type (e.g. javascript_errors, python_errors)
get_errors_query_instructions — Get the query schema for error tracking (different from logs)
update_error_state — Set error state to resolved, ignored, or unresolved. When ignoring, ignore_next_count must be 10, 100, or 1000.
create_cloud_connection — Create direct ClickHouse credentials for raw SQL queries. Connections expire after 1 hour.
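Two constraints in this list are easy to check before calling a tool. When ignoring an error, update_error_state accepts only 10, 100, or 1000 for ignore_next_count. Credentials from create_cloud_connection also expire after 1 hour. A minimal sketch of both checks follows. The argument key "state", the helper names, and the choice to reject ignore_next_count for states other than ignored are assumptions, not the tool's documented schema. Whether ignore_next_count is required when ignoring is not stated, so the sketch treats it as optional. The connection creation time is assumed to be recorded by your client.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ERROR_STATES = {"resolved", "ignored", "unresolved"}
IGNORE_COUNTS = {10, 100, 1000}       # only values accepted when ignoring
CONNECTION_TTL = timedelta(hours=1)   # cloud connections expire after 1 hour


def error_state_args(state: str, ignore_next_count: Optional[int] = None) -> dict:
    """Build arguments for update_error_state and reject values the tool won't accept.

    Key names other than ignore_next_count are hypothetical.
    """
    if state not in ERROR_STATES:
        raise ValueError(f"state must be one of {sorted(ERROR_STATES)}")
    args = {"state": state}
    if ignore_next_count is not None:
        # Assumption: the count is only meaningful when ignoring an error.
        if state != "ignored":
            raise ValueError("ignore_next_count applies only when state is 'ignored'")
        if ignore_next_count not in IGNORE_COUNTS:
            raise ValueError("ignore_next_count must be 10, 100, or 1000")
        args["ignore_next_count"] = ignore_next_count
    return args


def connection_expired(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once a cloud connection created at `created_at` is at least 1 hour old."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at >= CONNECTION_TTL
```

An expired connection means calling create_cloud_connection again before running raw SQL with telemetry_query.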

Two separate auth tokens: Uptime and Telemetry are independent products, and each requires its own authorization.
Team ID required: most create and list operations need a team ID. Use list_teams to find it.
Log query tips:
  • Use remote(<table_name>) for recent data (less than 30 minutes old); for historical data, use s3Cluster(primary, <table_name_s3>) and filter on _row_type = 1.
  • All log fields live inside the raw JSON column. Extract with JSONExtract(raw, 'field', 'Nullable(String)').
  • Group noisy logs by _pattern to surface recurring message structures.
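The log query tips above can be combined into a small query builder. This sketch generates the ClickHouse text for a "most common patterns" query and switches between remote() and s3Cluster() depending on whether you need recent or historical data. The real table names come from get_source_details. The example table names, the helper names, and the "level" JSON field used for filtering are hypothetical. The generated SQL could then be passed to telemetry_query.

```python
from typing import Optional


def json_field(field: str) -> str:
    """Extract a field from the raw JSON column as a nullable string."""
    # No escaping is done here; only pass trusted, simple field names.
    return f"JSONExtract(raw, '{field}', 'Nullable(String)')"


def log_source(table_name: str, s3_table_name: str, historical: bool) -> str:
    """remote() for recent data (under ~30 minutes old), s3Cluster() for older data."""
    if historical:
        return f"s3Cluster(primary, {s3_table_name})"
    return f"remote({table_name})"


def top_patterns_query(
    table_name: str,
    s3_table_name: str,
    historical: bool,
    level: Optional[str] = None,
    limit: int = 10,
) -> str:
    """Count log lines per _pattern, most frequent first."""
    conditions = []
    if historical:
        # Historical rows are filtered with _row_type = 1.
        conditions.append("_row_type = 1")
    if level is not None:
        # 'level' is a hypothetical JSON field; get_source_fields lists the real ones.
        conditions.append(f"{json_field('level')} = '{level}'")
    where = f"WHERE {' AND '.join(conditions)}\n" if conditions else ""
    return (
        "SELECT _pattern, count(*) AS occurrences\n"
        f"FROM {log_source(table_name, s3_table_name, historical)}\n"
        f"{where}"
        "GROUP BY _pattern\n"
        "ORDER BY occurrences DESC\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    # Hypothetical table names; real ones come from get_source_details.
    print(top_patterns_query("t12345_my_app_logs", "t12345_my_app_s3",
                             historical=True, level="error"))
```

Values are interpolated directly into the SQL text, so this is only suitable for trusted inputs.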
The dashboard grid is 12 columns wide. Use the {{source}}, {{start_time}}, {{end_time}}, and {{time}} variables in dashboard queries.
No cross-tool linking: Uptime incidents and Telemetry logs are not automatically correlated.
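To illustrate the dashboard query variables, the sketch below holds a chart query for the "log volume per hour" use case and reports which of the four variables a query text does not reference. This catches, for example, a hard-coded time range used instead of {{start_time}} and {{end_time}}. The SQL shape (aliasing {{time}} as time and returning a value column) is an assumption. Confirm it with get_chart_building_instructions and preview it with telemetry_chart before calling add_chart. The names LOG_VOLUME_QUERY and missing_variables are hypothetical.

```python
from typing import List

DASHBOARD_VARIABLES = ("{{source}}", "{{start_time}}", "{{end_time}}", "{{time}}")

# Hypothetical chart query: number of log lines per time bucket in the selected range.
# The column aliases (time, value) are assumptions; check get_chart_building_instructions.
LOG_VOLUME_QUERY = """\
SELECT {{time}} AS time, count(*) AS value
FROM {{source}}
WHERE time BETWEEN {{start_time}} AND {{end_time}}
GROUP BY time
ORDER BY time"""


def missing_variables(query: str) -> List[str]:
    """Return the dashboard variables that `query` does not reference."""
    return [v for v in DASHBOARD_VARIABLES if v not in query]


if __name__ == "__main__":
    print(missing_variables(LOG_VOLUME_QUERY))  # []
    print(missing_variables("SELECT count(*) FROM {{source}}"))
    # ['{{start_time}}', '{{end_time}}', '{{time}}']
```

Not every chart necessarily needs all four variables, so treat a non-empty result as a prompt to double-check rather than as an error.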

Guardrails

This server is covered by the 14 universal guardrails. Server-specific guardrails are coming soon.
Configure guardrails via the Civic UI or ask the Configurator Agent: “Add guardrails to my Better Stack server.”