Better Stack
Overview
Better Stack comprises two MCP servers: Uptime (monitoring, incidents, on-call, status pages) and Telemetry (logs, metrics, dashboards, error tracking). Both share the same team/org context but require separate auth tokens.
How to Add Better Stack
1. Add to Civic
Add the Better Stack Uptime and/or Better Stack Telemetry servers to your Civic environment through the server directory. These are two separate servers — add whichever you need (or both).
2. Authorize
On first use, you will be redirected to Better Stack to authorize the connection. No API keys or secrets to manage manually.
Two separate authorizations: Uptime and Telemetry are independent products. You will need to authorize each one separately the first time you use it.
3. Test Connection
- For Uptime: try "List all monitors" or "Who is currently on-call?"
- For Telemetry: try "List all log sources" or "List all dashboards"
What You Can Do
Manage monitors, view response times and availability, track incidents and heartbeats
List, acknowledge, resolve, escalate, and comment on incidents with full timeline access
View on-call schedules, escalation policies, and severity notification channels
List status pages, manage resources, create incident reports, and post status updates
Query logs via ClickHouse, explore sources, extract JSON fields, and group by patterns
List metrics, check cardinality, build queries, create and arrange dashboard charts
List applications, query errors, resolve or ignore error patterns, and view session replays
Monitor cron jobs and scheduled tasks — incidents trigger automatically on missed pings
Use Cases
Uptime & Incidents
- Monitor Health: "Show current status of all monitors"
- Availability Reports: "Get availability for the API monitor for March 2026"
- Incident Triage: "List all open incidents" → "Acknowledge incident 1234" → "Resolve incident 1234"
- Incident Timeline: "Show the timeline for incident 1234"
- On-Call Check: "Who is currently on-call?"
- Status Page Updates: "Create an incident report: title 'API Degradation', message 'Investigating elevated error rates'"
Telemetry & Logs
- Log Search: "Show the last 50 error logs from my-app"
- Error Patterns: "Find the most common error patterns in the last 24 hours"
- Slow Requests: "Show slow API requests over 1 second from today"
- Metrics Exploration: "List available metrics for source 12345"
- Dashboard Management: "List all dashboards" → "Add a line chart showing log volume per hour"
- Error Tracking: "List recent unresolved errors" → "Resolve error pattern NullPointerException in UserService"
Available Tools
Uptime — Monitors & Heartbeats
list_monitors — List all monitors and their current status
get_monitor_response_times — Get response times broken down by region and phase (DNS, connect, TLS, transfer). Returns ~24h of data in 15-min buckets.
get_monitor_availability — Get availability percentage for a monitor over a date range
list_heartbeats — List all heartbeat monitors for cron jobs and scheduled tasks
get_heartbeat_availability — Get availability for a heartbeat over a date range
Uptime — Incidents
list_incidents — List incidents with filtering by status, monitor, heartbeat, and date range
acknowledge_incident — Acknowledge an open incident
resolve_incident — Resolve an incident
reopen_incident — Reopen a resolved incident (within 24 hours of resolution only)
escalate_incident — Escalate to a user, team, schedule, or policy. Call get_incident_escalation_options first to discover valid targets.
add_incident_comment — Add a comment to an incident timeline
get_incident_timeline — Full audit trail of state changes, notifications, and acknowledgements
get_incident_escalation_options — Discover valid escalation targets (User, Team, Schedule, Policy)
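The escalation note in this list (discover targets first, then escalate) can be sketched as a small guard. This is only an illustration: call_tool stands in for whatever MCP client you use, and the argument names (incident_id, target_type, target_id) and the shape of the options response are assumptions, not the documented schema.

```python
from typing import Any, Callable, Dict

# Hypothetical client hook: call_tool(tool_name, arguments) -> parsed result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def escalate_with_check(call_tool: ToolCaller, incident_id: str,
                        target_type: str, target_id: str) -> Any:
    """Escalate only if the target appears in the discovered options."""
    # Step 1: discover valid targets, as the tool description recommends.
    options = call_tool("get_incident_escalation_options",
                        {"incident_id": incident_id})

    # Assumed response shape: {"User": [ids], "Team": [ids], ...}
    valid_ids = options.get(target_type, [])
    if target_id not in valid_ids:
        raise ValueError(
            f"{target_type} {target_id!r} is not a valid escalation target")

    # Step 2: escalate to the validated target.
    return call_tool("escalate_incident", {
        "incident_id": incident_id,
        "target_type": target_type,
        "target_id": target_id,
    })
```

Failing before the second call keeps an invalid target from reaching escalate_incident at all.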
Uptime — On-Call & Status Pages
list_on_call_calendars — List on-call rotations and events
list_escalation_policies — List all escalation policies
list_status_pages — List all status pages and their resources
get_status_page_resources — Show monitors and resources on a status page
create_incident_report — Create a public incident report with affected resources
post_status_update — Post a status update to an incident report with optional subscriber notification
Telemetry — Sources & Logs
list_sources — List all log sources
get_source_details — Get ingestion token, host URL, table name, and retention settings for a source
get_source_fields — Get available fields for a source (returns nothing if source has no recent data)
telemetry_build_explore_query_tool — Generate a ClickHouse query from plain English
telemetry_query — Execute a direct ClickHouse query (requires cloud connection credentials)
create_source — Create a new log source with a specified platform type
Telemetry — Metrics & Dashboards
list_metrics — List available metrics for a source
get_metrics_and_cardinality — Show cardinality for all metrics
get_metric_query_instructions — Get correct aggregation functions and example queries for a metric
list_dashboards — List all dashboards
get_dashboard_layout — Show the layout and charts of a dashboard
add_chart — Add a chart to a dashboard
edit_chart — Edit an existing chart
move_charts — Rearrange chart positions on a dashboard (validates layout atomically)
export_dashboard — Export a dashboard as JSON
import_dashboard — Import a dashboard from JSON
get_chart_building_instructions — Documentation for supported chart types, units, axes, and layout rules
telemetry_chart — Preview chart queries with automatic error surfacing
Telemetry — Error Tracking
list_applications — List all error-tracking applications
create_application — Create a new error-tracking application with a platform type (e.g. javascript_errors, python_errors)
get_errors_query_instructions — Get the query schema for error tracking (different from logs)
update_error_state — Set error state to resolved, ignored, or unresolved. When ignoring, ignore_next_count must be 10, 100, or 1000.
create_cloud_connection — Create direct ClickHouse credentials for raw SQL queries. Connections expire after 1 hour.
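The constraint on update_error_state (three allowed states, and ignore_next_count limited to 10, 100, or 1000 when ignoring) can be checked before the tool is called. The following is a minimal sketch: the "state" key name is hypothetical, the error identifier argument is omitted, and treating ignore_next_count as required when ignoring is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Optional

VALID_STATES = {"resolved", "ignored", "unresolved"}
VALID_IGNORE_COUNTS = {10, 100, 1000}


def build_error_state_args(state: str,
                           ignore_next_count: Optional[int] = None) -> Dict[str, Any]:
    """Validate and assemble arguments for update_error_state."""
    if state not in VALID_STATES:
        raise ValueError(f"state must be one of {sorted(VALID_STATES)}")

    # "state" is a hypothetical key name used for illustration.
    args: Dict[str, Any] = {"state": state}

    if state == "ignored":
        # Assumption: the count is mandatory when ignoring.
        if ignore_next_count not in VALID_IGNORE_COUNTS:
            raise ValueError("ignore_next_count must be 10, 100, or 1000")
        args["ignore_next_count"] = ignore_next_count
    elif ignore_next_count is not None:
        raise ValueError("ignore_next_count only applies when ignoring")

    return args
```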
Two separate auth tokens — Uptime and Telemetry are independent products, each requiring separate authorization.
Team ID required for most create/list operations. Use list_teams to find it.
Log query tips:
- Use remote(<table_name>) for recent data (<30 min); use s3Cluster(primary, <table_name_s3>) with _row_type = 1 for historical data.
- All log fields live inside the raw JSON column. Extract them with JSONExtract(raw, 'field', 'Nullable(String)').
- Group noisy logs by _pattern to surface recurring message structures.
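The tips above can be combined into a small query builder. This sketch only assembles the ClickHouse SQL text; it does not run it. The function name, the identifier check, and the "value" alias are illustrative choices, and the table names in the usage lines are placeholders (the real ones come from get_source_details).

```python
import re

# Conservative identifier check so names can be interpolated into SQL text.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_log_query(table: str, field: str, *,
                    historical: bool = False, limit: int = 50) -> str:
    """Build a SELECT that extracts one field from the raw JSON column."""
    for name in (table, field):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    if historical:
        # Historical data: pass the S3 table name and filter on _row_type = 1.
        source = f"s3Cluster(primary, {table})"
        where = " WHERE _row_type = 1"
    else:
        # Recent data (under 30 minutes): read through remote().
        source = f"remote({table})"
        where = ""

    return (
        f"SELECT JSONExtract(raw, '{field}', 'Nullable(String)') AS value "
        f"FROM {source}{where} LIMIT {int(limit)}"
    )


# Placeholder table names for illustration only.
recent_sql = build_log_query("my_app_logs", "level")
historical_sql = build_log_query("my_app_s3", "level", historical=True)
```

The resulting string could then be passed to telemetry_query, which per the tool list requires cloud connection credentials.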
Dashboard grid is 12 columns wide. Use {{source}}, {{start_time}}, {{end_time}}, and {{time}} variables in dashboard queries.
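As a rough illustration of the 12-column grid, the sketch below checks a proposed chart layout before it is sent to add_chart or move_charts. The (x, y, w, h) tuple is a hypothetical representation of a chart's position and size in grid units, and rejecting overlapping charts is an assumption about what layout validation involves; get_chart_building_instructions is the place to confirm the actual rules.

```python
from typing import List, Tuple

GRID_COLUMNS = 12  # dashboard grid width stated above

# Hypothetical placement: (x, y, w, h) in grid units.
Rect = Tuple[int, int, int, int]


def layout_errors(charts: List[Rect]) -> List[str]:
    """Return human-readable problems with a proposed layout."""
    errors: List[str] = []

    # Each chart must have positive size and stay inside the grid width.
    for i, (x, y, w, h) in enumerate(charts):
        if w <= 0 or h <= 0:
            errors.append(f"chart {i} has a non-positive size")
        elif x < 0 or y < 0 or x + w > GRID_COLUMNS:
            errors.append(f"chart {i} exceeds the {GRID_COLUMNS}-column grid")

    # Assumed rule: no two charts may occupy the same grid cells.
    for i in range(len(charts)):
        ax, ay, aw, ah = charts[i]
        for j in range(i + 1, len(charts)):
            bx, by, bw, bh = charts[j]
            if ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah:
                errors.append(f"charts {i} and {j} overlap")

    return errors
```

Collecting every problem before returning mirrors an all-or-nothing check: the layout is only submitted when the list comes back empty.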
No cross-tool linking — Uptime incidents and Telemetry logs are not automatically correlated.
Guardrails
This server is covered by the 14 universal guardrails. Server-specific guardrails are coming soon.
Configure guardrails via the Civic UI or ask the Configurator Agent: "Add guardrails to my Better Stack server."