
Performance Issues

Solutions for FraiseQL performance problems.

Symptoms:

  • Requests taking > 200ms (p95)
  • Inconsistent latency
  • Occasionally slow responses mixed with fast ones
  • Load hasn’t changed but latency increased
  1. Check logs for slow queries:

LOG_LEVEL=debug fraiseql run

  2. Monitor during the slow period:

docker stats fraiseql
curl http://localhost:9000/metrics | grep fraiseql_request_duration

  3. Check the database for slow queries:

-- Slowest queries by mean execution time
SELECT mean_exec_time, calls, mean_exec_time * calls AS total_time, query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 5;

-- Slowest queries by total impact (mean time × calls)
SELECT mean_exec_time, calls, query FROM pg_stat_statements
ORDER BY mean_exec_time * calls DESC LIMIT 5;

  4. Analyze a slow query:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM posts WHERE user_id = 123;
-- Look for: "Seq Scan" → add an index; "Hash Aggregate" → check whether it's necessary

Add Missing Index (Most common):

-- Check EXPLAIN output for "Seq Scan"
EXPLAIN ANALYZE SELECT * FROM posts WHERE user_id = 123;
-- If the planner chooses a sequential scan, add an index:
CREATE INDEX idx_posts_user_id ON posts(user_id);

Optimize N+1 Queries:

  • FraiseQL should batch lookups automatically.
  • If you see ~100 queries for 10 items, batching isn't happening; check the debug logs.
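If batching is not happening, the underlying fix is always the same shape: collect the keys first, then resolve them in one query. A minimal sketch of the pattern (the `query_fn` stand-in and data shapes are hypothetical, not FraiseQL's API):

```python
# Manual batching sketch: resolve authors for all posts with ONE query
# instead of one query per post (the N+1 problem).

def fetch_authors_batched(posts, query_fn):
    """query_fn takes a list of ids and returns {id: author} in one call."""
    author_ids = {p["author_id"] for p in posts}   # deduplicate keys
    authors = query_fn(sorted(author_ids))         # one query, not len(posts)
    return [authors[p["author_id"]] for p in posts]

calls = []
def fake_query(ids):                               # pretend database call
    calls.append(ids)
    return {i: f"author-{i}" for i in ids}

posts = [{"author_id": 1}, {"author_id": 2}, {"author_id": 1}]
print(fetch_authors_batched(posts, fake_query))    # ['author-1', 'author-2', 'author-1']
print(len(calls))                                  # 1 query for 3 posts
```

The logs make this visible: with batching you should see one query per entity type per request, not one per item.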

Query Complexity:

Deeply nested queries multiply work at every level:

users { posts { comments { author { posts { ... } } } } }

Solution: limit query depth, or split the request into multiple smaller queries.
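Depth is what makes these queries expensive: each level multiplies the number of fetches. A toy illustration of counting and capping depth (not FraiseQL's actual validation API):

```python
# Depth of a nested selection, modeled as nested dicts.

def depth(selection: dict) -> int:
    """e.g. {"users": {"posts": {}}} has depth 2."""
    if not selection:
        return 0
    return 1 + max(depth(child) for child in selection.values())

query = {"users": {"posts": {"comments": {"author": {"posts": {}}}}}}
print(depth(query))               # 5 levels deep

MAX_DEPTH = 4
print(depth(query) > MAX_DEPTH)   # True -> reject or split the query
```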

Symptoms:

  • Usually fast (50ms)
  • Occasionally very slow (2-5 seconds)
  • Happens randomly or under load
  • Correlates with database load
Possible causes:

  1. Database Lock Contention

    • Multiple writers competing
    • Long transactions holding locks
    • Deadlocks being retried
  2. Connection Pool Saturation

    • All connections in use
    • New requests waiting for free connection
    • Leads to cascading delays
  3. Memory Pressure

    • Swapping to disk
    • Garbage collection pauses
    • Buffer cache thrashing
  4. Network Latency

    • To database
    • Load balancer/DNS lookup
    • Network packet loss

Identify Root Cause:

# 1. Check database connection pool utilization
curl http://localhost:9000/metrics | grep db_connections
# If near the maximum, increase the pool size:
# PGBOUNCER_MAX_POOL_SIZE=50

# 2. Check memory usage during the spike
docker stats fraiseql --no-stream

# 3. Check for GC pauses (Python)
LOG_LEVEL=debug fraiseql run

# 4. Check for ungranted database locks (PostgreSQL)
psql -c "SELECT * FROM pg_locks WHERE NOT granted;"

# 5. Check network latency to the database
ping database.example.com

Fix Connection Pool:

# Current: 3 instances × 5 pool size = 15 connections
# If utilization is regularly at 100%, increase capacity:

# Option 1: Increase pool size
PGBOUNCER_MAX_POOL_SIZE=30

# Option 2: Add more instances (e.g. scale from 3 to 5)

# Option 3: Use read replicas
# Route reads to a replica instead of the primary

Fix Memory Pressure:

# Increase container memory in docker-compose.yml:
fraiseql:
  deploy:
    resources:
      limits:
        memory: 2G # Increase from 1G

# Or reduce the batch size; smaller batches use less memory
BATCH_SIZE=100

Symptoms:

  • Database CPU high even though the application isn’t busy
  • Database becoming bottleneck
  • Slow queries affecting all users
  • Auto-scaling of app doesn’t help
Possible causes:

  1. Too Many Queries

    • N+1 problem
    • No query caching
    • Inefficient queries
  2. Missing Indexes

    • Full table scans
    • Unnecessary I/O
  3. Suboptimal Queries

    • Complex joins
    • Inefficient grouping
    • Unnecessary subqueries
  4. Hot Spots

    • One table/query causing load
    • Contention on popular record

Find Slow Query:

SELECT mean_exec_time, calls, query FROM pg_stat_statements
ORDER BY mean_exec_time * calls DESC LIMIT 5;
-- Shows: slowest × most-called = most impact

Optimize with EXPLAIN:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM posts WHERE user_id = 123;
-- Look for:
-- - "Seq Scan" → Add index
-- - "Hash Aggregate" → Check if necessary
-- - "Nested Loop" → Might need better index

Add Strategic Indexes:

-- On WHERE columns
CREATE INDEX idx_posts_user_id ON posts(user_id);
-- Composite index for common filters
CREATE INDEX idx_posts_user_published ON posts(user_id, published);
-- Partial index for specific values
CREATE INDEX idx_posts_published ON posts(id) WHERE published = true;

Cache Expensive Queries:

@cached(ttl=3600)
@fraiseql.query
def user_stats(user_id: ID) -> UserStats:
    # Expensive aggregation, computed at most once per hour
    pass

Symptoms:

  • Memory usage growing over time
  • Container killed with OOM
  • Memory usage > 90%
  • Slowdowns due to swap/GC
Possible causes:

  1. Memory Leak

    • Objects not being garbage collected
    • Circular references
    • Large caches growing unbounded
  2. Large Query Results

    • Fetching massive datasets
    • No pagination
    • Loading entire table into memory
  3. Caching Issues

    • Cache growing unbounded
    • Old cache entries not evicted

Identify Leak:

# Monitor memory growth over an hour
watch -n 60 'docker stats fraiseql --no-stream'
# If memory grows consistently, there's a leak

# Check memory cost per query
for i in {1..100}; do
  curl http://localhost:8000/graphql ...
done
docker stats fraiseql --no-stream # Did memory increase?

Fix Large Query Results:

# BAD: Fetches entire table
@fraiseql.query
def all_posts() -> list[Post]:
    pass

# GOOD: Paginate
@fraiseql.query
def posts(first: int = 10, after: str | None = None) -> PostConnection:
    pass

# Usage
query {
  posts(first: 10) {            # Only 10 at a time
    edges { node { id title } }
    pageInfo { hasNextPage }
  }
}

Fix Unbounded Cache:

# Set a TTL to prevent indefinite growth
@cached(ttl=3600) # Expires after 1 hour
def expensive_query():
    pass

# Or use an LRU cache with a maximum size
from functools import lru_cache

@lru_cache(maxsize=1000)
def expensive_computation():
    pass

Increase Memory Limit:

# Temporary: increase container memory
docker update --memory 2G fraiseql

# Permanent: update docker-compose.yml
fraiseql:
  deploy:
    resources:
      limits:
        memory: 2G

Symptoms:

  • Requests timeout
  • “Cannot get database connection” errors
  • Gets worse under load
  • Fixed by restarting
Possible causes:

  1. Connections Not Released

    • Long transactions
    • Unhandled errors
    • Middleware not closing connections
  2. Pool Too Small

    • More concurrent requests than pool size
    • Each request holds connection
  3. Connection Leak

    • Connections created but never returned
    • Growing over time

Check Current State:

-- Get connection counts
SELECT COUNT(*) AS total_connections,
       COUNT(CASE WHEN state != 'idle' THEN 1 END) AS active
FROM pg_stat_activity;

-- Get the configured maximum
SHOW max_connections;
-- If near the max (e.g. 100/100), the pool is exhausted

Immediate Relief:

# Increase pool size (from 20)
PGBOUNCER_MAX_POOL_SIZE=50

# Restart application
docker-compose restart fraiseql

-- Kill connections that have been idle for more than 5 minutes (PostgreSQL)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle' AND state_change < NOW() - INTERVAL '5 minutes';

Long-term Fix:

  1. Find why connections aren’t returning:

    # Check logs for exceptions that don't close connections
    LOG_LEVEL=debug fraiseql run
  2. Reduce transaction time:

    # BAD: Long transaction
    async def mutation(input):
        db.begin()
        save_to_db(input)
        send_email()  # 5 seconds inside the transaction
        db.commit()

    # GOOD: Short transaction
    async def mutation(input):
        db.begin()
        save_to_db(input)
        db.commit()
        await send_email()  # Outside the transaction
  3. Add connection retry:

    # Application auto-retries if no connection available
    # FraiseQL handles this
  4. Scale with more instances:

    If you need 100 concurrent connections:
    5 instances × 20 pool size = 100 connections
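The sizing arithmetic generalizes via Little's law: connections in use ≈ request rate × time each request holds a connection. A quick sketch with illustrative numbers:

```python
import math

def required_connections(requests_per_second: float, hold_seconds: float) -> int:
    """Little's law: concurrent connections = arrival rate * hold time."""
    return math.ceil(requests_per_second * hold_seconds)

print(required_connections(500, 0.05))   # 500 rps, each holding 50 ms -> 25
# Double it for burst headroom: 50 connections,
# e.g. 5 instances * 10 pool size.
```

This also shows why shortening transactions helps: halving the hold time halves the pool you need.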

Symptoms:

  • Requests return after the timeout period
  • “Query execution timeout” errors
  • Random, not specific to one query
  • Gets worse under load
Possible causes:

  1. Query Too Slow

    • Needs index
    • Query complexity
  2. Database Overloaded

    • Can’t start executing query in time
    • Waiting for free resources
  3. Timeout Too Low

    • 5 second timeout not enough
    • Query naturally takes 6 seconds

Check Query Speed:

# Run the slow query directly
time psql -c "SELECT ..."
# If it takes 10 seconds, the timeout must be > 10 seconds

Increase Timeout:

# Application level (milliseconds)
STATEMENT_TIMEOUT=30000 # 30 seconds

# Connection level (PostgreSQL)
DATABASE_URL=postgresql://...?statement_timeout=30000

Optimize Slow Query:

EXPLAIN ANALYZE SELECT ...;
-- Add missing indexes; simplify the query

Symptoms:

  • Requests return 429 Too Many Requests
  • Valid traffic getting rate limited
  • Limits too strict for actual usage

Check Current Limits:

echo $RATE_LIMIT_REQUESTS
echo $RATE_LIMIT_WINDOW_SECONDS

Adjust Limits (if too strict):

# From: 100 requests per minute
# To: 1000 requests per minute
RATE_LIMIT_REQUESTS=1000
RATE_LIMIT_WINDOW_SECONDS=60
# Per-user limit (requires auth)
PER_USER_RATE_LIMIT=100
PER_USER_RATE_LIMIT_WINDOW_SECONDS=60
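Conceptually these settings describe a token-bucket style limiter: `RATE_LIMIT_REQUESTS` tokens refill over `RATE_LIMIT_WINDOW_SECONDS`. A minimal sketch of the idea (not FraiseQL's implementation):

```python
import time

class TokenBucket:
    """Minimal token bucket: `rate` requests allowed per `window` seconds."""
    def __init__(self, rate: int, window: float):
        self.capacity = rate
        self.tokens = float(rate)
        self.refill_per_sec = rate / window
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should return 429

bucket = TokenBucket(rate=2, window=60)
print([bucket.allow() for _ in range(3)])   # [True, True, False]
```

Raising `RATE_LIMIT_REQUESTS` is equivalent to a larger bucket; a longer window smooths bursts.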

Monitor Rate Limit Usage:

# Check Prometheus metrics
curl http://localhost:9000/metrics | grep rate_limit
# If consistently hitting limit, increase it

Symptoms:

  • Same query executed multiple times
  • Cache hit rate < 50%
  • Memory not being used efficiently

Check if Caching Enabled:

redis-cli ping
# Should return: PONG
# If error, Redis not running
docker-compose logs redis

Increase TTL:

# From 1 hour to 24 hours
@cached(ttl=86400)
def stable_data():
    pass

Check Cache Hit Rate:

# Prometheus metric
curl http://localhost:9000/metrics | grep cache_hit_ratio
# Should be > 80% for production
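If that metric is unavailable, the same ratio can be derived from Redis's own counters (`keyspace_hits` and `keyspace_misses` from `redis-cli INFO stats`):

```python
def hit_ratio(keyspace_hits: int, keyspace_misses: int) -> float:
    """Cache hit rate from Redis INFO stats counters."""
    total = keyspace_hits + keyspace_misses
    return keyspace_hits / total if total else 0.0

print(hit_ratio(8200, 1800))   # 0.82 -> just above the 80% production target
```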

Monitor Cache Size:

# If too much data cached, memory pressure
redis-cli INFO memory
# If used_memory > available_memory
# Either: increase memory or reduce cache size

Symptoms:

  • One slow request blocks others
  • Timeouts cascade across system
  • Gets worse under load
  • System recovers when load drops
Possible causes:

  1. Shared Resource Contention

    • Database connection pool shared
    • One slow request holds connections
    • Other requests can’t get connections
  2. Synchronous Calls

    • Long operation blocks request
    • Holds database connection
    • Other requests timeout

Make I/O Async:

# BAD: Blocks the request until the email is sent
@fraiseql.mutation
def create_user(email: str) -> User:
    user = db.create(email)
    send_email(user.email)  # Blocks!
    return user

# GOOD: Async; don't wait for the email
@fraiseql.mutation
async def create_user(email: str) -> User:
    user = await db.create(email)
    # Send the email in the background
    asyncio.create_task(send_email(user.email))
    return user

Use Circuit Breaker:

# Fail fast if the external service is down
from circuitbreaker import circuit

@circuit(failure_threshold=5, recovery_timeout=60)
def call_external_service():
    # After 5 consecutive failures, stop trying for 60 seconds and
    # return an error immediately instead of waiting
    pass

Implement Bulkheads:

Separate connection pools per workload:

  • User pool: 20 connections
  • Admin pool: 10 connections
  • Batch pool: 10 connections

This prevents one workload type from exhausting all connections.
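The bulkhead pattern above can be sketched with one semaphore per workload class (the pool names and sizes are illustrative, not FraiseQL configuration):

```python
import asyncio

async def main():
    # One semaphore per workload class; a burst in one class only
    # queues behind its own semaphore, never the others'.
    pools = {"user": asyncio.Semaphore(20),
             "admin": asyncio.Semaphore(10),
             "batch": asyncio.Semaphore(10)}

    async def run_in_bulkhead(kind, coro):
        async with pools[kind]:          # waits only behind same-kind work
            return await coro

    async def work(i):
        await asyncio.sleep(0.01)        # stand-in for a database call
        return i

    return await asyncio.gather(
        *(run_in_bulkhead("user", work(i)) for i in range(5)))

print(asyncio.run(main()))               # [0, 1, 2, 3, 4]
```

In production the semaphores would wrap connection acquisition, so a flood of batch jobs can exhaust only the batch pool.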

Key metrics to watch:

Response Latency (p50, p95, p99)
├── p50: Should be < 100ms
├── p95: Should be < 200ms
└── p99: Should be < 500ms
Database Queries
├── Count: < 5 per request
├── Latency: < 50ms average
└── Connection pool: < 80% utilized
Memory
├── Usage: Should be stable
├── Growth: < 1MB/minute
└── GC: No long pause times
Error Rate
├── Target: < 0.1%
└── Alerts: > 1% for 5 minutes
Cache
├── Hit rate: > 80%
├── Size: < 80% of available memory
└── Evictions: Normal (TTL expiry)
# Prometheus scrape configuration
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'fraiseql'
    static_configs:
      - targets: ['localhost:9000']

# Alerting rules (these belong in a separate rules file)
groups:
  - name: fraiseql-alerts
    rules:
      # Alert when p95 > 500ms
      - alert: HighLatency
        expr: fraiseql_request_duration_seconds{quantile="0.95"} > 0.5
      # Alert when error rate > 1%
      - alert: HighErrorRate
        expr: rate(fraiseql_errors_total[5m]) > 0.01
      # Alert when connection pool > 90% utilized
      - alert: ConnectionPoolNearFull
        expr: fraiseql_db_connections_used / fraiseql_db_connections_max > 0.9
      # Alert when cache hit rate < 70%
      - alert: LowCacheHitRate
        expr: fraiseql_cache_hit_ratio < 0.7

# Before optimization
curl -s http://localhost:8000/graphql -d '{"query":"..."}' | jq '.timing'

# After optimization: expect lower latency and fewer queries per request

# Load test and compare p95 latency, error rate, and throughput
k6 run benchmark.js