Federation Reference
Complete reference documentation for FraiseQL’s federation capabilities. Federation Guide
This guide covers solutions for common issues when using FraiseQL’s federation and NATS capabilities.
Symptoms: GatewayError: Failed to fetch SDL from subgraph 'orders' at startup, or circuit breaker opening during operation.
Causes:
gateway.toml
Solutions:
# 1. Verify subgraph is reachable from the gateway host
curl http://order-service:4002/health

# 2. Check gateway.toml subgraph URLs
cat gateway.toml | grep url

# 3. Test the GraphQL endpoint directly
curl http://order-service:4002/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ _service { sdl } }"}'

Symptoms: GatewayError: Type 'User' is defined in multiple subgraphs: users, orders
Causes: The same type name appears in multiple subgraphs. The built-in gateway requires each type to be owned by exactly one subgraph.
Solutions: Ensure each type is defined in only one subgraph. If both services need a User type, have the non-owning service reference it via @key entity resolution instead of redefining it.
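Ownership conflicts of this kind can be caught before the gateway starts. The following is a minimal sketch of a pre-flight check and is not part of FraiseQL: it assumes each subgraph's SDL has already been exported as a string, it uses a simple regular expression rather than a real GraphQL parser, and it assumes root types such as `Query` may legitimately appear in every subgraph. The names `find_duplicate_types` and `ignore` are illustrative.

```python
import re
from collections import defaultdict

# Matches object type definitions at the start of a line, e.g. "type User {"
# or "type User @key(fields: \"id\") {". Lines starting with "extend type"
# are not matched, and neither are fields named "type".
TYPE_DEF = re.compile(r"^[ \t]*type[ \t]+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE)

# Assumption: root operation types are shared by all subgraphs.
ROOT_TYPES = frozenset({"Query", "Mutation", "Subscription"})


def find_duplicate_types(sdl_by_subgraph, ignore=ROOT_TYPES):
    """Map each type name defined in more than one subgraph to its subgraphs."""
    owners = defaultdict(list)
    for subgraph, sdl in sdl_by_subgraph.items():
        # set() so a type repeated inside one SDL is only counted once.
        for name in sorted(set(TYPE_DEF.findall(sdl))):
            if name not in ignore:
                owners[name].append(subgraph)
    return {name: subs for name, subs in owners.items() if len(subs) > 1}


if __name__ == "__main__":
    sdls = {
        "users": 'type User @key(fields: "id") {\n  id: ID!\n}\ntype Query { me: User }',
        "orders": "type Order { id: ID! }\ntype User { id: ID! }\ntype Query { orders: [Order] }",
    }
    for type_name, subgraphs in find_duplicate_types(sdls).items():
        print(f"Type '{type_name}' is defined in multiple subgraphs: {', '.join(subgraphs)}")
```

Running a check like this in CI against the exported SDL files would surface the conflict before deployment rather than at gateway startup.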
Symptoms: Queries that span multiple subgraphs time out or return partial results.
Solutions:
# Increase circuit breaker recovery window
[gateway.circuit_breaker]
failure_threshold = 10
recovery_timeout_secs = 60

# Check subgraph latency for _entities queries
# (_entities returns a union, so fields are selected through an inline fragment)
curl -w "@curl-format.txt" http://order-service:4002/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ _entities(representations: [{__typename: \"Order\", id: \"...\"}]) { ... on Order { id } } }"}'

ERROR: INVALID_GRAPHQL subgraph "my-service" SDL is not valid GraphQL

The SDL contains issues like:
str instead of String, int instead of Int
'Order' | None instead of Order
WhereInput types

Use fraiseql compile --sdl to export the schema as SDL directly from the compiler (bypasses the runtime endpoint):

fraiseql compile schema.py --sdl > subgraph.graphql

Alternatively, use __schema introspection instead of _service { sdl }:

curl localhost:4001/graphql \
  -H "Content-Type: application/json" \
  -d '{"query":"{ __schema { types { name fields { name type { name kind ofType { name } } } } } }"}' \
  | jq > schema-introspection.json

Symptoms: DatabaseConnectionError: Failed to connect to database 'inventory'
Causes:
Solutions:
# 1. Test database connectivity
curl http://localhost:8080/health

# 2. Check configuration
cat fraiseql.toml | grep -A 5 "\[database\]"

# 3. Verify credentials
echo $INVENTORY_DATABASE_URL  # Check URL format

# 4. Test manual connection
psql "$INVENTORY_DATABASE_URL" -c "SELECT 1"

# 5. Enable connection debug logging by setting an environment variable:
# RUST_LOG=debug
#
# Or configure the database in fraiseql.toml:
[database]
url = "${INVENTORY_DATABASE_URL}"
pool_max = 10

Symptoms: PoolExhaustedError: No available connections in pool for database 'inventory'
Causes:
Solutions:
# 1. Increase pool size
[database]
pool_min = 2
pool_max = 50                # Increased from default 20
connect_timeout_ms = 30000   # Timeout waiting for connection
idle_timeout_ms = 3600000    # Close idle connections after 1 hour

# 2. Monitor connection pool via the metrics endpoint:
# GET /metrics → fraiseql_database_pool_active, fraiseql_database_pool_idle

# 3. Identify slow queries: set the slow query threshold in fraiseql.toml:
# [database]
# log_slow_queries_ms = 1000  # Log queries slower than 1 second
# Or use RUST_LOG=debug for full query logging.

Symptoms: FederationTimeoutError: Federated query to 'inventory' timed out after 5000ms
Causes:
Solutions:
# 1. Increase federation timeout
[federation]
default_timeout = 10000  # Increased from 5000ms
batch_size = 50          # Smaller batches = faster queries

# 2. Per-database timeout
[federation.database_timeouts]
inventory = 10000
payments = 15000  # Slower database needs more time

# 3. Configure federation batch size and timeout in fraiseql.toml:
[federation]
batch_size = 100
default_timeout = 10000

# The Python type declares the shape only; there is no database routing in the decorator:
@fraiseql.type
class Order:
    id: ID
    items: list[OrderItem]  # Federated from inventory DB; see fraiseql.toml

-- 4. Add database indexes on the inventory database:
CREATE INDEX idx_tb_order_item_fk_order ON tb_order_item(fk_order);

Symptoms: CircularReferenceError when loading schema or executing deeply nested queries.
Causes:
Solutions:
# BAD: Circular reference; avoid federating back to the originating type
@fraiseql.type
class Order:
    items: list[OrderItem]  # Federated from inventory DB

@fraiseql.type
class OrderItem:
    order: Order  # Circular! OrderItem federates back to Order
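A loop like the one above between `Order` and `OrderItem` can be found mechanically before the schema is loaded. The following is a minimal sketch, assuming you can describe the federated references as a plain dictionary from type name to the type names it federates to; it is a standard depth-first search and not FraiseQL's own detection logic. `find_cycle` is an illustrative name.

```python
def find_cycle(references):
    """Return one reference cycle as a list of type names, or None if there is none.

    `references` maps a type name to the type names it federates to.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []  # types on the current depth-first path

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for target in references.get(node, ()):
            target_state = state.get(target, UNVISITED)
            if target_state == IN_PROGRESS:
                # target is already on the current path, so we closed a loop.
                return path[path.index(target):] + [target]
            if target_state == UNVISITED:
                cycle = visit(target)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in references:
        if state.get(node, UNVISITED) == UNVISITED:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    bad = {"Order": ["OrderItem"], "OrderItem": ["Order"]}
    good = {"Order": ["OrderItem"], "OrderItem": []}
    print(find_cycle(bad))   # ['Order', 'OrderItem', 'Order']
    print(find_cycle(good))  # None
```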
# GOOD: Only federate in one direction
@fraiseql.type
class Order:
    items: list[OrderItem]  # Federated from inventory DB (one way only)

@fraiseql.type
class OrderItem:
    order_id: ID  # Just store the ID; don't federate back to Order

Symptoms: ForeignKeyError: Order references Product ID that doesn't exist
Causes:
Solutions:
# The Python type declares the shape only; there is no runtime DB access in Python
@fraiseql.mutation(sql_source="fn_add_item_to_order", operation="CREATE")
def add_item_to_order(order_id: ID, product_id: ID) -> OrderItem:
    """Add item to order. Validation is handled in the SQL function."""
    pass

-- Validation belongs in the SQL function (fn_add_item_to_order).
-- The function can return a mutation_response with status 'failed:not_found'
-- if the product doesn't exist:
--
-- CREATE FUNCTION fn_add_item_to_order(p_order_id UUID, p_product_id UUID)
-- RETURNS mutation_response AS $$
-- DECLARE
--   v_product_exists BOOLEAN;
--   v_result mutation_response;
-- BEGIN
--   SELECT EXISTS(SELECT 1 FROM tb_product WHERE id = p_product_id)
--   INTO v_product_exists;
--   IF NOT v_product_exists THEN
--     v_result.status := 'failed:not_found';
--     v_result.message := 'Product not found';
--     RETURN v_result;
--   END IF;
--   -- ... insert logic
-- END;
-- $$ LANGUAGE plpgsql;

-- Enforce referential integrity at the database level:
ALTER TABLE tb_order_item
  ADD CONSTRAINT fk_tb_order_item_fk_order
    FOREIGN KEY (fk_order) REFERENCES tb_order(pk_order),
  ADD CONSTRAINT fk_tb_order_item_fk_product
    FOREIGN KEY (fk_product) REFERENCES tb_product(pk_product);

Symptoms: SagaCompensationError: Compensation step 'reserve_inventory' failed
Causes:
Solutions:
-- Saga compensation is implemented as SQL functions.
-- The fn_ function should be idempotent and return a mutation_response.
-- Example: make fn_release_reservation idempotent:
CREATE OR REPLACE FUNCTION fn_release_reservation(p_id UUID)
RETURNS mutation_response AS $$
DECLARE
  v_result mutation_response;
BEGIN
  -- Idempotent: only update if not already released
  UPDATE tb_reservation
  SET status = 'released'
  WHERE id = p_id AND status != 'released';

  v_result.status := 'success';
  v_result.message := 'Reservation released';
  RETURN v_result;
END;
$$ LANGUAGE plpgsql;

# Configure saga/observer timeout in fraiseql.toml:
[observers]
backend = "nats"
nats_url = "nats://localhost:4222"

Symptoms: Order created but saga never completes; stuck in pending status.
Causes:
Solutions:
# 1. Monitor saga/mutation progress via structured logs.
# Set RUST_LOG=debug to see each request with its requestId.
# Use pg_notify observers in fraiseql.toml to track step completions:
# [observers]
# backend = "nats"
# nats_url = "nats://localhost:4222"

-- 2. Check stuck sagas in database
SELECT *
FROM tb_saga_execution
WHERE status = 'pending'
  AND created_at < NOW() - INTERVAL '5 minutes'
ORDER BY created_at DESC;

-- 3. Manual cleanup of stuck sagas (as a SQL function called via mutation)
-- Define a fn_cleanup_stuck_saga SQL function that:
-- - Validates the saga is in pending state
-- - Marks it as compensated or triggers compensation steps
-- - Returns a mutation_response with the outcome

# Expose the cleanup as a FraiseQL mutation (compile-time definition only):
@fraiseql.mutation(sql_source="fn_cleanup_stuck_saga", operation="CUSTOM")
def cleanup_stuck_saga(saga_id: ID) -> bool:
    """Manually trigger compensation for stuck saga."""
    pass

Symptoms: Query with federated field takes 10+ seconds
Causes:
Solutions:
# 1. Check if federation is batching correctly
@fraiseql.query
def orders_with_items(limit: int = 100) -> list[Order]:
    """
    With batching: Should be 2 queries total
    - 1 query: SELECT id, data FROM v_order LIMIT 100
    - 1 query: SELECT id, data FROM v_order_item WHERE fk_order IN (...)
    """
    return fraiseql.config(sql_source="v_order")
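To make the "2 queries total" expectation concrete, here is a minimal sketch of batched resolution. It is only an illustration of the access pattern, not FraiseQL's runtime: it uses Python's built-in `sqlite3` with two throwaway tables that stand in for `v_order` and `v_order_item`, and `load_orders_with_items` and `demo_connection` are hypothetical helper names.

```python
import sqlite3
from collections import defaultdict


def load_orders_with_items(conn, limit=100):
    """Resolve orders and their items with two queries, regardless of order count."""
    # Query 1: the parent rows.
    order_ids = [row[0] for row in conn.execute(
        "SELECT id FROM v_order ORDER BY id LIMIT ?", (limit,))]
    if not order_ids:
        return {}

    # Query 2: all child rows for the whole batch in a single IN (...) lookup,
    # instead of one query per order (the N+1 pattern).
    placeholders = ", ".join("?" for _ in order_ids)
    rows = conn.execute(
        f"SELECT fk_order, id FROM v_order_item "
        f"WHERE fk_order IN ({placeholders}) ORDER BY id",
        order_ids,
    )

    items_by_order = defaultdict(list)
    for fk_order, item_id in rows:
        items_by_order[fk_order].append(item_id)
    # Orders without items map to an empty list.
    return {order_id: items_by_order[order_id] for order_id in order_ids}


def demo_connection():
    """Build an in-memory database with the two stand-in tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE v_order (id INTEGER PRIMARY KEY);
        CREATE TABLE v_order_item (id INTEGER PRIMARY KEY, fk_order INTEGER);
        INSERT INTO v_order (id) VALUES (1), (2), (3);
        INSERT INTO v_order_item (id, fk_order) VALUES (10, 1), (11, 1), (12, 3);
    """)
    return conn


if __name__ == "__main__":
    print(load_orders_with_items(demo_connection()))
```

If debug logging shows one `v_order_item` query per order instead of a single `IN (...)` query, batching is not taking effect.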
# 2. Denormalize to reduce federated queries
@fraiseql.type
class Order:
    id: ID
    item_count: int  # Denormalized count from SQL view; avoids federation

    # Federated from inventory DB; configured in fraiseql.toml
    items: list[OrderItem]

# 3. Use selective queries to avoid full federation
@fraiseql.query
def order_summary(id: ID) -> OrderSummary | None:
    """
    Query a summary view instead of the full federated type
    when only aggregate fields are needed.
    """
    return fraiseql.config(sql_source="v_order_summary")

# Enable query logging to verify batching; set via environment variable:
# RUST_LOG=debug fraiseql run

-- 4. Add indexes on the inventory database:
CREATE INDEX idx_tb_order_item_fk_order_fk_product
  ON tb_order_item(fk_order, fk_product);

Symptoms: NatsConnectionError: Failed to connect to NATS server
Causes:
Solutions:
# 1. Check NATS server status
nats server info

# 2. Test connection
nats ping

# 3. Check configuration
cat fraiseql.toml | grep -A 3 "\[nats\]"

# 4. Verify URL format
echo $NATS_URL  # Should be: nats://host:4222
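The URL format check can also be scripted. This is a minimal sketch using only the Python standard library; it checks for the `nats://host:4222` shape mentioned above and nothing more (it does not contact the server). `check_nats_url` is an illustrative helper name, not a FraiseQL or NATS API.

```python
from urllib.parse import urlsplit


def check_nats_url(url):
    """Return a list of problems found in the URL; an empty list means it looks fine."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "nats":
        problems.append(f"scheme is {parts.scheme!r}, expected 'nats'")
    if not parts.hostname:
        problems.append("host is missing")
    try:
        port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    except ValueError:
        problems.append("port is not a valid number")
    else:
        if port is None:
            problems.append("port is missing (expected format: nats://host:4222)")
    return problems


if __name__ == "__main__":
    import os

    url = os.environ.get("NATS_URL", "")
    issues = check_nats_url(url)
    print("OK" if not issues else "; ".join(issues))
```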
# 5. Start NATS if not running
docker run -it --rm -p 4222:4222 nats

Symptoms: AuthorizationError: NATS authentication failed
Causes:
Solutions:
# 1. Update credentials
[nats.auth]
type = "token"
token = "${NATS_TOKEN}"  # Ensure env var is set

# 2. Verify token
echo $NATS_TOKEN

# 3. Use NKey authentication (more secure)
[nats.auth]
type = "nkey"
nkey = "${NATS_NKEY}"

# 4. Generate new credentials
nats user create fraiseql-user
nats nkey gen user -o fraiseql.nk  # NKey

Symptoms: StreamNotFoundError: Stream 'orders' not found
Causes:
Solutions:
# 1. List existing streams
nats stream list

# 2. Check stream configuration
nats stream info orders

# 3. Create missing stream
nats stream add orders \
  --subjects "fraiseql.order.>" \
  --max-msgs 1000000 \
  --max-bytes 10GB \
  --retention limits

# 4. Ensure stream is configured
[nats.jetstream.streams.orders]
subjects = ["fraiseql.order.>"]
replicas = 3
max_msgs = 1000000
max_bytes = 10737418240

Symptoms: Consumer far behind in processing; queue backs up
Causes:
Solutions:
# 1. Check consumer status
nats consumer info orders order-processor

# Output shows:
# Pending: 50000    # Many messages waiting
# Delivered: 1000
# Acked: 800

# 2. Increase processing capacity
# Scale up consumer service: 1 instance -> 3 instances

# 3. Check consumer queue group
nats consumer info orders order-processor

# 4. Increase ack wait if processing is slow
[nats.jetstream.consumers.order-processor]
ack_wait = "60s"  # Increased from 30s

# 5. Configure NATS observers in fraiseql.toml.
# FraiseQL's Rust runtime handles NATS event dispatch, not Python.
# Use the [observers] section to configure subjects and topics:
[observers]
backend = "nats"
nats_url = "nats://localhost:4222"

Symptoms: Event published but subscribers don’t receive it
Causes:
Solutions:
# 1. Check consumer status
nats consumer info orders order-processor

# Look for:
# - NumPending (messages waiting)
# - NumAckPending (unacked messages)
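If the consumer counters look healthy but nothing arrives, a subject mismatch is worth ruling out. NATS subjects are dot-separated tokens; in a subscription, `*` matches exactly one token and `>` matches one or more trailing tokens. The following is a minimal sketch of those wildcard rules for offline checking; `subject_matches` is an illustrative helper, not part of the NATS client libraries, and it does not validate that the pattern itself is well formed beyond requiring `>` to be the last token.

```python
def subject_matches(pattern, subject):
    """Return True if a NATS subscription pattern would receive the given subject."""
    pattern_tokens = pattern.split(".")
    subject_tokens = subject.split(".")
    for i, token in enumerate(pattern_tokens):
        if token == ">":
            # '>' is only valid as the final token and needs at least one
            # remaining subject token to match.
            return i == len(pattern_tokens) - 1 and len(subject_tokens) > i
        if i >= len(subject_tokens):
            return False  # pattern is longer than the subject
        if token != "*" and token != subject_tokens[i]:
            return False  # literal token differs
    # Without a trailing '>', both sides must have the same number of tokens.
    return len(pattern_tokens) == len(subject_tokens)


if __name__ == "__main__":
    published = "fraiseql.order.created"
    for pattern in ("fraiseql.order.>", "fraiseql.*.created", "orders.created"):
        verdict = "match" if subject_matches(pattern, published) else "no match"
        print(f"{pattern}: {verdict}")
```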
# 2. Check subject matches
# Published to: fraiseql.order.created
# Subscribed to: fraiseql.order.> (match)
# Subscribed to: orders.created (no match)

# 3. Verify FraiseQL is running and observers are configured in fraiseql.toml.
# RUST_LOG=debug will log received NATS events.
# Look for "nats: received message on subject fraiseql.order.created" in the output.

# 4. Check max deliver limit
# If message is redelivered more than max_deliver times,
# it goes to dead letter queue
[nats.jetstream.consumers.order-processor]
max_deliver = 3  # Redelivered max 3 times

# Check dead letter queue via NATS CLI:
nats consumer info orders fraiseql-dlq
nats stream view fraiseql-dlq

Symptoms: Status changed events arrive before creation event
Causes:
Solutions:
# 1. Configure NATS partitioning in fraiseql.toml so the same order ID
# always routes to the same partition, preserving ordering:
[nats.partitions]
enabled = true
key = "order_id"  # Same order always goes to same partition
count = 8

# 2. Use a single-consumer durable subscriber (no queue group)
# so events are processed strictly in sequence per subject:
nats consumer add orders order-seq \
  --deliver all \
  --ack explicit \
  --wait 30s \
  --max-deliver 3

# 3. Use durable consumer with explicit ack
[nats.jetstream.consumers.order-processor]
deliver_policy = "all"   # Start from beginning
ack_policy = "explicit"  # Must explicitly ACK
ack_wait = "30s"         # Timeout if not ACKed
max_deliver = 3          # Retry 3 times

Symptoms: Same event processed multiple times; duplicate orders created
Causes:
Solutions:
-- 1. Implement idempotency at the database level.
-- The SQL function that processes the event should be idempotent.
-- Use INSERT ... ON CONFLICT DO NOTHING with a unique event_id column:
CREATE TABLE tb_processed_event (
  pk_processed_event BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  id UUID DEFAULT gen_random_uuid() UNIQUE NOT NULL,
  identifier TEXT UNIQUE NOT NULL,  -- event_id
  processed_at TIMESTAMPTZ DEFAULT now()
);

-- In the processing function:
-- INSERT INTO tb_processed_event (identifier) VALUES (p_event_id)
-- ON CONFLICT (identifier) DO NOTHING;
-- IF NOT FOUND THEN RETURN; END IF;  -- Already processed

# 2. Use NATS JetStream's built-in message deduplication.
# Set a deduplication window when creating the stream:
nats stream add orders \
  --subjects "fraiseql.order.>" \
  --dedup-window 24h \
  --max-msgs 1000000

# Publishers include a Nats-Msg-Id header for deduplication:
nats pub fraiseql.order.created '{"order_id":"..."}' \
  --header Nats-Msg-Id:evt_550e8400

Symptoms: Order created successfully but notification service doesn’t receive event
Causes:
Solutions:
-- 1. Use the transactional outbox pattern for guaranteed delivery.
-- Write events to a tb_pending_event table inside the same transaction
-- as the mutation, then a separate process (or pg_notify) publishes to NATS:
CREATE TABLE tb_pending_event (
  pk_pending_event BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  id UUID DEFAULT gen_random_uuid() UNIQUE NOT NULL,
  identifier TEXT UNIQUE NOT NULL,  -- idempotency key
  subject TEXT NOT NULL,            -- NATS subject
  payload JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now(),
  published_at TIMESTAMPTZ
);

-- Inside fn_create_order, after inserting the order, also insert the event:
-- INSERT INTO tb_pending_event (identifier, subject, payload)
-- VALUES (
--   'order.created.' || v_order_id,
--   'fraiseql.order.created',
--   jsonb_build_object('order_id', v_order_id, ...)
-- );

# 2. Configure FraiseQL observers to publish tb_pending_event rows to NATS.
# The Rust runtime handles polling/pg_notify and publishing:
[observers]
backend = "nats"
nats_url = "nats://localhost:4222"

# 3. Monitor NATS publish errors via the FraiseQL metrics endpoint:
curl http://localhost:8080/metrics | grep nats_publish
# fraiseql_nats_publish_total{status="success"} 1234
# fraiseql_nats_publish_total{status="error"} 0

Symptoms: Notification service processes event but queries return empty
Causes:
Solutions:
-- 1. Publish only after transaction commits.
-- Use the transactional outbox pattern (see above): insert the event row
-- in the same transaction as the mutation. The outbox publisher only
-- dispatches to NATS after the PostgreSQL transaction is durably committed.

# 2. Include all necessary data in the event payload so downstream services
# do not need to query the API before the data is replicated.
# Publish from the SQL function via the outbox, including a full snapshot:
#
# INSERT INTO tb_pending_event (identifier, subject, payload)
# VALUES (
#   'order.confirmed.' || v_order_id,
#   'fraiseql.order.confirmed',
#   jsonb_build_object(
#     'order_id', v_order_id,
#     'customer_id', v_customer_id,
#     'total', v_total::text,
#     'items', v_items_json  -- full snapshot, no follow-up query needed
#   )
# );

# 3. Configure retry/backoff for the NATS observer in fraiseql.toml:
[observers]
backend = "nats"
nats_url = "nats://localhost:4222"

-- Problem: Saga holds lock while federation waits
-- Solution: Use lower isolation level
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

# Use cursor-based (Relay) pagination to avoid loading large result sets at once.
# Define the query with relay=True in fraiseql.config():
@fraiseql.query
def orders(limit: int = 100) -> list[Order]:
    """Use cursor pagination; configured in fraiseql.toml federation section."""
    return fraiseql.config(sql_source="v_order", relay=True)

# Then query with cursor-based pagination from the client:
query {
  orders(first: 100, after: "cursor-from-previous-page") {
    edges {
      node { id }
      cursor
    }
    pageInfo {
      hasNextPage
      endCursor
    }
  }
}

# SQLite has a single writer. Configure a smaller pool size in fraiseql.toml
# to reduce write contention:
[database]
pool_max = 1  # SQLite: single writer
pool_min = 1

# Enable WAL mode on the SQLite database for better read concurrency:
sqlite3 database.db "PRAGMA journal_mode=WAL;"

Federation Reference
Complete reference documentation for FraiseQL’s federation capabilities. Federation Guide
NATS Reference
Reference documentation for NATS integration and JetStream configuration. NATS Guide
Error Handling
Patterns for handling errors in federated and event-driven applications. Error Handling Guide
General Troubleshooting
Diagnose connection, query, and infrastructure issues. Troubleshooting Index