
Observability

FraiseQL provides comprehensive observability through metrics, distributed tracing, and structured logging.

FraiseQL exposes a /metrics endpoint in Prometheus format. The metric names and label schema are:

# No TOML config needed — metrics endpoint is always available at /metrics
curl http://localhost:8080/metrics

HTTP metrics:

| Metric | Type | Description |
| --- | --- | --- |
| fraiseql_http_requests_total | Counter | Total HTTP requests |
| fraiseql_http_responses_2xx | Counter | Total 2xx HTTP responses |
| fraiseql_http_responses_4xx | Counter | Total 4xx HTTP responses |
| fraiseql_http_responses_5xx | Counter | Total 5xx HTTP responses |

GraphQL metrics:

| Metric | Type | Description |
| --- | --- | --- |
| fraiseql_graphql_queries_total | Counter | Total GraphQL queries executed |
| fraiseql_graphql_queries_success | Counter | Total successful GraphQL queries |
| fraiseql_graphql_queries_error | Counter | Total failed GraphQL queries |
| fraiseql_graphql_query_duration_ms | Gauge | Average query execution time in milliseconds |
| fraiseql_validation_errors_total | Counter | Total validation errors |
| fraiseql_parse_errors_total | Counter | Total parse errors |
| fraiseql_execution_errors_total | Counter | Total execution errors |

Database metrics:

| Metric | Type | Description |
| --- | --- | --- |
| fraiseql_database_queries_total | Counter | Total database queries executed |
| fraiseql_database_query_duration_ms | Gauge | Average database query time in milliseconds |

FraiseQL monitors connection pool pressure and emits scaling recommendations. The pool does not resize itself at runtime (the underlying library has no resize API) — use these metrics to decide when to raise pool_max in fraiseql.toml and restart.

| Metric | Type | Description |
| --- | --- | --- |
| fraiseql_pool_tuning_size | Gauge | Current configured pool size |
| fraiseql_pool_tuning_queue_depth | Gauge | Pending connection requests in the pool queue |
| fraiseql_pool_tuning_adjustments_total | Counter | Scaling recommendations emitted (not actual resizes) |

Cache metrics:

| Metric | Type | Description |
| --- | --- | --- |
| fraiseql_cache_hits | Counter | Total cache hits |
| fraiseql_cache_misses | Counter | Total cache misses |
| fraiseql_cache_hit_ratio | Gauge | Cache hit ratio (0–1) |

Feature metrics (APQ, WebSocket, trusted documents, and more):

| Metric | Type | Description |
| --- | --- | --- |
| fraiseql_apq_hits_total | Counter | Automatic Persisted Query cache hits |
| fraiseql_apq_misses_total | Counter | Automatic Persisted Query cache misses |
| fraiseql_apq_stored_total | Counter | Automatic Persisted Queries stored |
| fraiseql_apq_redis_errors_total | Counter | Redis errors in APQ store (fail-open; only present with redis-apq feature) |
| fraiseql_ws_connections_total | Counter | WebSocket connection attempts (labeled by result) |
| fraiseql_ws_subscriptions_total | Counter | WebSocket subscription attempts (labeled by result) |
| fraiseql_trusted_documents_hits_total | Counter | Trusted document cache hits |
| fraiseql_trusted_documents_misses_total | Counter | Trusted document cache misses |
| fraiseql_trusted_documents_rejected_total | Counter | Rejected untrusted documents |
| fraiseql_pkce_redis_errors_total | Counter | Redis errors in PKCE state store (fail-open; only present with redis-pkce feature) |
| fraiseql_rate_limit_redis_errors_total | Counter | Redis errors in rate limiter (fail-open; only present with redis-rate-limiting feature) |
| fraiseql_multi_root_queries_total | Counter | Multi-root GraphQL queries executed in parallel |
| fraiseql_observer_dlq_overflow_total | Counter | Observer DLQ entries dropped due to max_dlq_size cap |
| fraiseql_schema_reloads_total | Counter | Successful schema hot-reloads (via admin API or SIGUSR1) |
| fraiseql_schema_reload_errors_total | Counter | Failed schema reload attempts |

Transport-level metrics are emitted alongside per-query metrics:

  • fraiseql_rest_requests_total — REST request count
  • fraiseql_grpc_requests_total — gRPC request count
  • All fraiseql_query_* metrics include a transport label (graphql, rest, grpc, websocket)

Common labels across metrics:

| Label | Description |
| --- | --- |
| operation | GraphQL operation name |
| type | query, mutation, subscription |
| status | success, error |
| error_code | Error code if failed |
| transport | graphql, rest, grpc, websocket |

FraiseQL ships a pre-built 12-panel Grafana 10+ dashboard covering latency percentiles, connection pool health, cache stats, and error rates. Fetch it at runtime:

curl http://localhost:8080/api/v1/admin/grafana-dashboard > fraiseql-dashboard.json

Import the JSON into Grafana (Dashboards → Import). The dashboard wires directly to your Prometheus datasource with no manual panel configuration.

Example PromQL queries for custom panels:

# Request rate
rate(fraiseql_http_requests_total[5m])
# Average query duration (milliseconds gauge)
fraiseql_graphql_query_duration_ms
# Error rate
rate(fraiseql_graphql_queries_error[5m]) / rate(fraiseql_graphql_queries_total[5m])
# Cache hit ratio (use the pre-computed gauge)
fraiseql_cache_hit_ratio
# 5xx server error rate
rate(fraiseql_http_responses_5xx[5m])
# Connection pool queue depth (alert when sustained > 0 → increase pool_max)
fraiseql_pool_tuning_queue_depth

FraiseQL exposes admin endpoints under /api/v1/admin/ for operational tooling. These require an authenticated request (admin role or equivalent policy configured via [security]).

POST /api/v1/admin/explain — runs EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) for a named query against the connected database and returns the full plan alongside the generated SQL.

Request body:

{
  "query_name": "posts",
  "parameters": { "limit": 10, "status": "published" }
}

Response:

{
  "query_name": "posts",
  "sql_source": "v_post",
  "generated_sql": "SELECT data FROM v_post WHERE ...",
  "parameters": ["published", 10],
  "explain_output": [ ... ]
}

Use this endpoint to understand query plans, verify index usage, and diagnose slow queries without needing direct database access.
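A minimal sketch of building that request body. The bearer-token header in the commented send step is an assumption: use whatever admin auth your [security] policy actually configures.

```python
# Sketch: build the request for POST /api/v1/admin/explain as documented above.
import json

ADMIN_EXPLAIN_URL = "http://localhost:8080/api/v1/admin/explain"

def build_explain_request(query_name: str, parameters: dict) -> tuple[str, bytes]:
    """Return the URL and JSON body for the admin explain endpoint."""
    body = json.dumps({"query_name": query_name, "parameters": parameters})
    return ADMIN_EXPLAIN_URL, body.encode()

url, body = build_explain_request("posts", {"limit": 10, "status": "published"})
print(body.decode())

# To actually send it (admin auth required; the Authorization header is an
# assumption, not a documented FraiseQL requirement):
# import urllib.request
# req = urllib.request.Request(url, data=body, method="POST",
#                              headers={"Content-Type": "application/json",
#                                       "Authorization": "Bearer <admin-token>"})
```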

GET /api/v1/admin/grafana-dashboard — returns the pre-built Grafana dashboard JSON described above.

OpenTelemetry tracing is compiled into the server by default. When no endpoint is configured, there is zero overhead — no gRPC connection attempt occurs.

Configure via [tracing] in fraiseql.toml or standard OpenTelemetry environment variables (env vars act as fallbacks when TOML fields are omitted):

[tracing]
service_name = "fraiseql-api"
otlp_endpoint = "http://otel-collector:4317"
otlp_export_timeout_secs = 10

FraiseQL creates spans for each request with attributes including:

| Attribute | Description |
| --- | --- |
| graphql.operation.name | Operation name |
| graphql.operation.type | query/mutation/subscription |
| graphql.document | Query document (if enabled) |
| db.system | Database type |
| db.statement | SQL query (if enabled) |
| db.operation | SELECT/INSERT/UPDATE/DELETE |
| user.id | Authenticated user ID |
| tenant.id | Tenant ID |

Spans include a transport attribute (graphql, rest, grpc, websocket) to distinguish traffic sources in your tracing backend.

FraiseQL propagates trace context via headers:

traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01
tracestate: fraiseql=user:123
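The traceparent value follows the W3C Trace Context layout (version-traceid-spanid-flags); a minimal parser for the example above:

```python
# Sketch: split a W3C traceparent header into its four fields.
def parse_traceparent(header: str) -> dict:
    version, trace_id, span_id, flags = header.split("-")
    return {
        "version": version,
        "trace_id": trace_id,       # 16 bytes, lowercase hex
        "parent_span_id": span_id,  # 8 bytes, lowercase hex
        "sampled": int(flags, 16) & 0x01 == 1,
    }

ctx = parse_traceparent("00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01")
print(ctx["trace_id"])  # 0af7651916cd43dd8448eb211c80319c
print(ctx["sampled"])   # True
```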
The equivalent OpenTelemetry environment variables:
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_EXPORTER_OTLP_PROTOCOL=grpc
OTEL_SERVICE_NAME=fraiseql-api

Logging is configured via environment variables:

# Log level
RUST_LOG=info # error | warn | info | debug | trace
RUST_LOG=fraiseql=debug,info # per-crate level (fraiseql at debug, everything else at info)
# Log format (JSON output for production log aggregators)
FRAISEQL_LOG_FORMAT=json # json | pretty (default: pretty in dev, json in prod)

Example JSON log line:

{
  "timestamp": "2024-01-15T10:30:00.123Z",
  "level": "INFO",
  "target": "fraiseql_server::graphql",
  "message": "Query executed",
  "span": {
    "request_id": "abc-123",
    "user_id": "user-456"
  },
  "fields": {
    "operation": "getUser",
    "duration_ms": 45,
    "cache_hit": true
  }
}
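To exercise a log-aggregation pipeline before pointing FraiseQL at it, a sketch that emits lines with the same shape (field names taken from the example above; the helper itself is hypothetical):

```python
# Sketch: emit one JSON log line matching the documented schema.
import json
from datetime import datetime, timezone

def log_line(level, target, message, span, fields):
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "level": level,
        "target": target,
        "message": message,
        "span": span,
        "fields": fields,
    })

line = log_line("INFO", "fraiseql_server::graphql", "Query executed",
                span={"request_id": "abc-123", "user_id": "user-456"},
                fields={"operation": "getUser", "duration_ms": 45, "cache_hit": True})
print(json.loads(line)["fields"]["operation"])  # getUser
```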

Use the standard RUST_LOG directive syntax for per-crate levels:

RUST_LOG=fraiseql_server=info,fraiseql_core::cache=debug,fraiseql_core::db=warn,tower_http=debug,sqlx=warn

FraiseQL exposes health endpoints automatically — no configuration required:

Basic health:

curl http://localhost:8080/health
# {"status": "ok"}

Detailed health:

curl http://localhost:8080/health/detailed
{
  "status": "ok",
  "checks": {
    "database": {
      "status": "ok",
      "latency_ms": 2
    },
    "cache": {
      "status": "ok",
      "size": 1500,
      "max_size": 10000
    },
    "schema": {
      "status": "ok",
      "version": "1.0.0",
      "loaded_at": "2024-01-15T10:00:00Z"
    }
  },
  "version": "2.0.0",
  "uptime_seconds": 3600
}
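A sketch of a deploy-gate script built on this response; it passes only when the overall status and every individual check report ok. The helper name is illustrative.

```python
# Sketch: fail a deployment gate unless every /health/detailed check is ok.
import json

def healthy(body: str) -> bool:
    doc = json.loads(body)
    return doc["status"] == "ok" and all(
        check["status"] == "ok" for check in doc["checks"].values()
    )

sample = '{"status": "ok", "checks": {"database": {"status": "ok"}, "cache": {"status": "ok"}}}'
print(healthy(sample))  # True
```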

Example Kubernetes probes:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health/detailed
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5

Example Prometheus alerting rules:

groups:
  - name: fraiseql
    rules:
      - alert: HighErrorRate
        expr: |
          rate(fraiseql_graphql_queries_error[5m]) /
          rate(fraiseql_graphql_queries_total[5m]) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High GraphQL error rate"
      - alert: HighQueryDuration
        expr: |
          fraiseql_graphql_query_duration_ms > 1000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average query duration above 1 second"
      - alert: LowCacheHitRate
        expr: |
          fraiseql_cache_hit_ratio < 0.5
        for: 15m
        labels:
          severity: info
        annotations:
          summary: "Cache hit rate below 50%"
      - alert: High5xxRate
        expr: |
          rate(fraiseql_http_responses_5xx[5m]) > 1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Elevated 5xx server error rate"

Trace sampling is configured through standard OpenTelemetry environment variables:
OTEL_TRACES_SAMPLER=traceidratio
OTEL_TRACES_SAMPLER_ARG=0.1 # sample 10% of requests
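traceidratio decides deterministically from the trace ID itself, so every service in a call chain makes the same keep/drop choice. A simplified sketch of the semantics (the real sampler's exact arithmetic may differ):

```python
# Sketch: deterministic ratio sampling keyed on the trace ID, so the same
# trace gets the same decision everywhere. Illustrative, not the OTel SDK.
def sampled(trace_id_hex: str, ratio: float) -> bool:
    value = int(trace_id_hex[:16], 16)  # first 8 bytes as an integer
    return value < ratio * (1 << 64)    # keep iff it falls under the cutoff

trace_id = "0af7651916cd43dd8448eb211c80319c"
print(sampled(trace_id, 0.01))  # False: this ID falls above a 1% cutoff
print(sampled(trace_id, 0.1))   # True: but below a 10% cutoff
```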

For error-focused sampling, configure your OTel collector or use a tail-based sampler (e.g., Grafana Tempo or the OpenTelemetry Collector’s tail_sampling processor) to keep 100% of error traces while downsampling success traces.

Log rotation is handled by your container runtime or log aggregator (Fluent Bit, Logstash, Loki, etc.) — not by FraiseQL. Write logs to stdout and let your infrastructure handle rotation and retention.

FraiseQL does not include user_id or request_id as metric labels by default — these high-cardinality values are kept in structured logs and traces instead, where cardinality is not a concern.

If metrics are missing:

  1. Check that the /metrics endpoint is reachable from your Prometheus instance
  2. Verify the Prometheus scrape config points to the correct port and path
  3. Ensure FraiseQL started successfully (fraiseql run with no errors)

If traces are missing:

  1. Verify OTEL_EXPORTER_OTLP_ENDPOINT is set and reachable
  2. Check OTEL_TRACES_SAMPLER_ARG — a low value (e.g., 0.001) may drop traces in low-traffic tests
  3. Check trace context propagation headers

If logs are too noisy:

  1. Adjust the RUST_LOG level per component (e.g., RUST_LOG=warn,fraiseql=info)
  2. Filter in your log aggregator (Loki, Elasticsearch, etc.)
To verify the observability setup end to end:

  1. Start FraiseQL — metrics are always available, no config needed:

    fraiseql run
  2. Test metrics endpoint:

    curl http://localhost:8080/metrics

    Expected output (partial):

    # HELP fraiseql_http_requests_total Total HTTP requests
    # TYPE fraiseql_http_requests_total counter
    fraiseql_http_requests_total 42
    # HELP fraiseql_graphql_queries_total Total GraphQL queries executed
    # TYPE fraiseql_graphql_queries_total counter
    fraiseql_graphql_queries_total 38
    # HELP fraiseql_cache_hit_ratio Cache hit ratio (0-1)
    # TYPE fraiseql_cache_hit_ratio gauge
    fraiseql_cache_hit_ratio 0.92
  3. Execute some queries to generate metrics:

    # Run a few queries
    for i in {1..10}; do
      curl -s -X POST http://localhost:8080/graphql \
        -H "Content-Type: application/json" \
        -d '{"query": "{ __typename }"}' > /dev/null
    done
  4. Verify metrics updated:

    curl http://localhost:8080/metrics | grep fraiseql_graphql_queries_total

    Expected output:

    fraiseql_graphql_queries_total 48
  5. Test health endpoints:

    # Basic health
    curl http://localhost:8080/health

    Expected output:

    {"status": "ok"}
    # Detailed health
    curl http://localhost:8080/health/detailed

    Expected output:

    {
      "status": "ok",
      "checks": {
        "database": {
          "status": "ok",
          "latency_ms": 2
        },
        "schema": {
          "status": "ok",
          "version": "1.0.0"
        }
      },
      "version": "2.0.0",
      "uptime_seconds": 3600
    }
  6. Verify structured logging:

    # Make a request and check logs
    curl -s -X POST http://localhost:8080/graphql \
      -H "Content-Type: application/json" \
      -d '{"query": "{ me { id } }"}'

    Check FraiseQL stdout for JSON log lines:

    {
      "timestamp": "2024-01-15T10:30:00.123Z",
      "level": "INFO",
      "target": "fraiseql_server::graphql",
      "message": "Query executed",
      "span": {
        "request_id": "abc-123",
        "user_id": "user-456"
      },
      "fields": {
        "operation": "me",
        "duration_ms": 12,
        "cache_hit": false
      }
    }
  7. Test OpenTelemetry tracing (if enabled):

    OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
    OTEL_SERVICE_NAME=fraiseql-api \
    fraiseql run

    After making requests, check your tracing backend (Jaeger, Zipkin, etc.) for spans.
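The /metrics output scraped in the steps above is the Prometheus text exposition format; a minimal parser for spot-checking values in scripts (labeled samples are skipped for brevity):

```python
# Sketch: parse unlabeled samples from a Prometheus text-format scrape.
def parse_metrics(text: str) -> dict[str, float]:
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "{" in line:
            continue  # skip comments, blanks, and labeled samples
        name, value = line.rsplit(" ", 1)
        values[name] = float(value)
    return values

scrape = """\
# HELP fraiseql_http_requests_total Total HTTP requests
# TYPE fraiseql_http_requests_total counter
fraiseql_http_requests_total 42
fraiseql_cache_hit_ratio 0.92
"""
metrics = parse_metrics(scrape)
print(metrics["fraiseql_http_requests_total"])  # 42.0
print(metrics["fraiseql_cache_hit_ratio"])      # 0.92
```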

If /metrics returns empty or connection refused:

  1. Check FraiseQL started successfully:

    fraiseql run 2>&1 | head -20
  2. Verify the metrics endpoint is on port 8080 (same port as GraphQL):

    curl http://localhost:8080/metrics

If database connection metrics are absent:

  1. Verify database connectivity:

    curl http://localhost:8080/health/detailed | jq .checks.database
  2. Check [database] pool configuration in fraiseql.toml

FraiseQL does not include user_id or request_id as metric labels by default — no configuration needed.

If traces don’t show up in your backend:

  1. Verify OTLP endpoint is reachable from the FraiseQL process:

    curl http://otel-collector:4317
  2. Set OTEL_TRACES_SAMPLER_ARG=1.0 temporarily to sample 100% for testing

  3. Ensure trace context propagation:

    curl -H "traceparent: 00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01" \
      http://localhost:8080/graphql

See also:

  • Deployment — Production monitoring setup
  • Performance — Using metrics to optimize