FraiseQL exposes an Apache Arrow Flight DataPlane alongside the standard GraphQL endpoint, enabling analytics clients to stream columnar data at wire speed — bypassing row-by-row serialization for aggregate and reporting queries that return millions of rows.
The Arrow Flight DataPlane is ideal for dashboards, BI tools, data pipelines, and any workload where throughput matters more than the flexibility of a GraphQL query.
## Prerequisites

Before enabling the Arrow Flight DataPlane, make sure you have:

- A configured `DATABASE_URL` pointing at your PostgreSQL database
- For Python clients: `pyarrow` (`pip install pyarrow`)
- For JavaScript/TypeScript clients: `apache-arrow` (`npm install apache-arrow`)
- For Rust clients: the `arrow` and `arrow-flight` crates

Enable the DataPlane in `fraiseql.toml`:
```toml
[analytics]
enabled = true
```

The Arrow Flight service starts on port 8815 by default, in addition to the standard GraphQL port (8080).
All options:

```toml
[analytics]
enabled = true
# flight_port = 8815      # gRPC Flight service port (optional)
# max_batch_size = 65536  # Maximum rows per Arrow record batch (optional)
# compression = "lz4"     # Compression: "lz4", "zstd", or "none" (optional)
```

| Option | Default | Description |
|---|---|---|
| `enabled` | `false` | Enable the Arrow Flight DataPlane |
| `flight_port` | `8815` | Port for the gRPC Flight service |
| `max_batch_size` | `65536` | Maximum rows per Arrow record batch |
| `compression` | `"lz4"` | Compression codec for data transfer |
## Quick start

### Enable the DataPlane
Add the `[analytics]` block to `fraiseql.toml`:
```toml
[analytics]
enabled = true
```

### Start FraiseQL
```shell
fraiseql run
```

You should see a log line confirming the Flight service has started:
```
INFO fraiseql::flight: Arrow Flight DataPlane listening on 0.0.0.0:8815
```

### Connect a client
Point your Arrow Flight client at `grpc://localhost:8815`. The service accepts unauthenticated connections on development instances; in production, configure TLS via `[analytics.tls]` in `fraiseql.toml`.
## Defining analytics views

The Flight DataPlane reads any `v_*` view exposed by FraiseQL. Define your aggregation logic in SQL and register the query in your schema.
### Create the SQL view
```sql
CREATE VIEW v_analytics_sales AS
SELECT
    product_id,
    SUM(revenue) AS revenue,
    COUNT(*) AS orders,
    ROUND(AVG(revenue), 2) AS avg_order_value
FROM tf_sales
GROUP BY product_id
ORDER BY revenue DESC;
```

### Register in your schema
Python:

```python
import fraiseql
from dataclasses import dataclass
from decimal import Decimal

@fraiseql.type
class SalesAggregate:
    """Aggregated sales metrics by product."""
    product_id: str
    revenue: Decimal
    orders: int
    avg_order_value: Decimal

@fraiseql.query(sql_source="v_analytics_sales")
def analytics_sales() -> list[SalesAggregate]:
    """Sales aggregates — available via both GraphQL and Arrow Flight."""
    pass
```

TypeScript:

```typescript
import { type, query } from 'fraiseql';

@type()
class SalesAggregate {
  productId!: string;
  revenue!: number;
  orders!: number;
  avgOrderValue!: number;
}

@query({ sqlSource: 'v_analytics_sales' })
async analyticsSales(): Promise<SalesAggregate[]> {
  return [];
}
```

### Query via Flight
Any view registered with `@fraiseql.query` or `@query()` is automatically accessible through both the GraphQL endpoint and the Arrow Flight DataPlane.
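For contrast with the Flight path below, the same registered view can also be fetched through the standard GraphQL endpoint with an ordinary HTTP POST. A minimal sketch of such a request body, assuming the camelCase field names from the TypeScript schema above; the exact query shape FraiseQL generates is not confirmed here:

```python
import json

# Hypothetical GraphQL request body for the analyticsSales query.
# Field names are assumptions based on the schema sketch above.
payload = json.dumps({
    "query": "{ analyticsSales { productId revenue orders avgOrderValue } }"
})

# Send with any HTTP client, e.g.:
#   POST http://localhost:8080/graphql
print(payload)
```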
```python
import pyarrow.flight as flight

# Connect to the Flight service
client = flight.connect("grpc://localhost:8815")

# Create a Flight descriptor for your query
descriptor = flight.FlightDescriptor.for_command(
    b'{"query": "SELECT data FROM v_analytics_sales"}'
)

# Get flight info and read the data
info = client.get_flight_info(descriptor)
reader = client.do_get(info.endpoints[0].ticket)
table = reader.read_all()

# Convert to pandas for analysis
print(table.to_pandas())
```

Expected output:
```
  product_id  revenue  orders  avg_order_value
0   prod_001  45231.5     312            145.0
1   prod_002  38920.0     287            135.6
2   prod_003  29118.0     201            144.9
```

The returned object is a PyArrow Table. From there you can:

- convert to pandas: `table.to_pandas()`
- convert to Polars: `pl.from_arrow(table)`
- write it to Parquet: `pq.write_table(table, 'output.parquet')`

The same query from TypeScript:

```typescript
import { tableFromIPC } from 'apache-arrow';
import { FlightClient } from '@apache-arrow/flight';

const client = new FlightClient({ host: 'localhost', port: 8815 });

const descriptor = {
  query: 'SELECT data FROM v_analytics_sales'
};

const info = await client.getFlightInfo(descriptor);
const reader = await client.doGet(info.endpoints[0].ticket);

const batches = [];
for await (const batch of reader) {
  batches.push(batch);
}

const table = tableFromIPC(batches);
console.table(table.toArray().slice(0, 5));
```

Pass filter parameters in the query JSON to push predicates down to PostgreSQL:
```python
import json
import pyarrow.flight as flight

client = flight.connect("grpc://localhost:8815")

command = json.dumps({
    "query": "SELECT data FROM v_analytics_sales",
    "params": {
        "start_date": "2024-01-01",
        "end_date": "2024-03-31",
        "region": "US"
    }
}).encode()

info = client.get_flight_info(flight.FlightDescriptor.for_command(command))
reader = client.do_get(info.endpoints[0].ticket)
table = reader.read_all()
```

Arrow Flight sends data in Apache Arrow’s columnar binary format with optional LZ4 or Zstd compression; GraphQL sends JSON. For analytical payloads, the difference is substantial:
| Workload | GraphQL (JSON) | Arrow Flight (LZ4) | Speedup |
|---|---|---|---|
| 100k rows, 6 columns | ~2,800 ms | ~180 ms | ~15x |
| 1M rows, 6 columns | ~28,000 ms | ~1,400 ms | ~20x |
| 10M rows, 6 columns | Timeout | ~12,000 ms | — |
| Pandas load (1M rows) | ~9,200 ms | ~420 ms | ~22x |
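Much of the gap comes from encoding rather than transport: JSON must be rendered to text and re-parsed value by value, while a columnar buffer is decoded in what amounts to a single copy. A stdlib-only sketch of that contrast (no Arrow involved, just row-text versus packed-column decoding on synthetic data):

```python
import json
import time
from array import array

values = [float(i) for i in range(1_000_000)]

# Row-style: render to JSON text, then parse it back value by value.
t0 = time.perf_counter()
decoded_json = json.loads(json.dumps(values))
t_json = time.perf_counter() - t0

# Column-style: one packed float64 buffer, decoded in a single copy.
buf = array("d", values).tobytes()
t0 = time.perf_counter()
col = array("d")
col.frombytes(buf)
t_col = time.perf_counter() - t0

assert list(col) == decoded_json == values
print(f"JSON round-trip: {t_json:.3f}s  binary decode: {t_col:.4f}s")
```

The absolute numbers depend on the machine, but the binary decode is consistently orders of magnitude faster, which is the same effect the benchmark table above measures end to end.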
When to choose each endpoint:

| Use Case | Recommendation |
|---|---|
| Transactional reads | GraphQL — ACID semantics, mutations, subscriptions |
| Real-time updates | GraphQL — Subscriptions over WebSocket |
| Small result sets (less than 10k rows) | GraphQL — Simpler client libraries, HTTP caching |
| Large aggregations | Arrow Flight — Columnar format, compression |
| Data pipelines | Arrow Flight — Stream processing, Parquet export |
| BI tool integration | Arrow Flight — Standard Arrow ecosystem |
For continuously updated dashboards, use the streaming endpoint. The Flight service keeps the gRPC stream open and pushes new record batches as the underlying view refreshes:
```python
import pyarrow.flight as flight

client = flight.connect("grpc://localhost:8815")

command = b'{"query": "SELECT data FROM v_realtime_counters", "stream": true}'
descriptor = flight.FlightDescriptor.for_command(command)

# Blocks and yields batches as data arrives
for batch in client.do_get(client.get_flight_info(descriptor).endpoints[0].ticket):
    df = batch.data.to_pandas()
    print(f"[{batch.data.num_rows} rows] {df['event_count'].sum()} total events")
```

In production, enable TLS for the Flight service:
```toml
[analytics]
enabled = true

[analytics.tls]
enabled = true
cert_file = "/certs/server.pem"
key_file = "/certs/server.key"
ca_file = "/certs/ca.pem"  # Optional: for client cert verification
```

Client connection with TLS:
```python
import pyarrow.flight as flight

# TLS enabled — use the grpcs scheme
client = flight.connect("grpcs://analytics.example.com:8815")
```

The Arrow Flight service exposes Prometheus metrics:
| Metric | Description |
|---|---|
| `fraiseql_flight_requests_total` | Total Flight requests |
| `fraiseql_flight_rows_streamed_total` | Total rows streamed |
| `fraiseql_flight_latency_seconds` | Request latency histogram |
| `fraiseql_flight_errors_total` | Error count by type |
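A scrape of these counters is plain Prometheus text. A stdlib sketch of reading two of them; the metric names come from the table above, but the sample values are made up:

```python
# Parse a (made-up) Prometheus scrape for the Flight counters.
sample = """\
fraiseql_flight_requests_total 1024
fraiseql_flight_rows_streamed_total 8388608
"""

metrics = {}
for line in sample.splitlines():
    # Skip comments (# HELP / # TYPE) and blank lines.
    if line and not line.startswith("#"):
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)

print(int(metrics["fraiseql_flight_requests_total"]))  # 1024
```

A real scrape also carries `# HELP`/`# TYPE` comment lines and label sets; production setups should use a proper Prometheus client library instead of hand parsing.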
Health check endpoint:
```shell
curl http://localhost:8080/health
```

The response includes Flight service status:
```json
{
  "status": "ok",
  "analytics": {
    "enabled": true,
    "port": 8815,
    "connections": 12
  }
}
```

The current Arrow Flight implementation has some limitations.

## Troubleshooting
### “Connection refused” on port 8815
1. Verify the DataPlane is enabled in `fraiseql.toml`:

   ```toml
   [analytics]
   enabled = true
   ```

2. Check FraiseQL logs for the startup message:

   ```
   INFO fraiseql::flight: Arrow Flight DataPlane listening on 0.0.0.0:8815
   ```

3. Confirm the port is not blocked by a firewall.
### “View not found” errors
Ensure the view name in your Flight query matches the `sql_source` in your schema definition exactly (case-sensitive).
### Slow query performance
Arrow Flight is only as fast as the underlying PostgreSQL query. Check:
```sql
EXPLAIN ANALYZE SELECT * FROM v_analytics_sales WHERE ...;
```

Missing indexes on filter columns are the most common cause of slow analytics queries.
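The usual culprit shows up in the plan as a sequential scan with a filter on a large relation. A stdlib sketch that flags those lines, run over a made-up plan text rather than a live database:

```python
# Flag sequential scans in (made-up) EXPLAIN ANALYZE output.
plan = """\
Seq Scan on tf_sales  (cost=0.00..35811.00 rows=1000000 width=16)
  Filter: (region = 'US'::text)
"""

# A "Seq Scan" node combined with a Filter usually wants an index
# on the filtered column.
seq_scans = [line.strip() for line in plan.splitlines()
             if line.lstrip().startswith("Seq Scan")]
print(seq_scans)
```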
### Large memory usage
The `max_batch_size` setting (default 65,536 rows) controls how many rows are buffered in memory. Reduce it for memory-constrained environments:
```toml
[analytics]
enabled = true
max_batch_size = 10000
```
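A rough back-of-envelope helps pick a value: per-batch memory is approximately rows times bytes per row. The 48-byte row width below is an assumption for a six-column, 8-bytes-per-column view like the sales example, not a measured figure:

```python
# Rough per-batch memory estimate (assumed 6 x 8-byte columns = 48 B/row).
def batch_bytes(rows: int, bytes_per_row: int = 48) -> int:
    return rows * bytes_per_row

default = batch_bytes(65_536)   # 3,145,728 bytes, about 3 MiB per batch
reduced = batch_bytes(10_000)   # 480,000 bytes, about 0.5 MiB per batch
print(default, reduced)
```

Real batches also carry Arrow metadata and validity bitmaps, and wide string columns raise the per-row cost considerably, so treat this as a lower bound.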
## See also

- Performance Guide — Query optimization and indexes for analytics views
- Caching — Cache heavy aggregations
- Federation — Combine analytics from multiple databases