
Analytics & Arrow Flight DataPlane

FraiseQL exposes an Apache Arrow Flight DataPlane alongside the standard GraphQL endpoint, enabling analytics clients to stream columnar data at wire speed — bypassing row-by-row serialization for aggregate and reporting queries that return millions of rows.

The Arrow Flight DataPlane is ideal for dashboards, BI tools, data pipelines, and any workload where throughput matters more than the flexibility of a GraphQL query.

Before enabling the Arrow Flight DataPlane:

  • PostgreSQL 14 or later
  • FraiseQL running with a PostgreSQL DATABASE_URL
  • Client library for your language:
    • Python: pyarrow (pip install pyarrow)
    • TypeScript/Node: apache-arrow (npm install apache-arrow)
    • Rust: arrow and arrow-flight crates

Enable the DataPlane in fraiseql.toml:

[analytics]
enabled = true

The Arrow Flight service starts on port 8815 by default. This port is in addition to the standard GraphQL port (8080).

[analytics]
enabled = true
# flight_port = 8815 # gRPC Flight service port (optional)
# max_batch_size = 65536 # Maximum rows per Arrow record batch (optional)
# compression = "lz4" # Compression: "lz4", "zstd", or "none" (optional)

Option           Default   Description
enabled          false     Enable the Arrow Flight DataPlane
flight_port      8815      Port for the gRPC Flight service
max_batch_size   65536     Maximum rows per Arrow record batch
compression      "lz4"     Compression codec for data transfer

  1. Enable the DataPlane

    Add the [analytics] block to fraiseql.toml:

    [analytics]
    enabled = true
  2. Start FraiseQL

    fraiseql run

    You should see a log line confirming the Flight service has started:

    INFO fraiseql::flight: Arrow Flight DataPlane listening on 0.0.0.0:8815
  3. Connect a client

    Point your Arrow Flight client at grpc://localhost:8815. The service accepts unauthenticated connections on development instances. In production, configure TLS via [analytics.tls] in fraiseql.toml.
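Since the scheme switches from grpc:// to grpcs:// once TLS is configured, it can be handy to build the connection URL from the configured host and port. A minimal stdlib-only sketch; the `flight_location` helper is illustrative, not part of FraiseQL:

```python
def flight_location(host: str, port: int = 8815, tls: bool = False) -> str:
    """Build a Flight connection URL: grpc:// for plaintext, grpcs:// with TLS."""
    scheme = "grpcs" if tls else "grpc"
    return f"{scheme}://{host}:{port}"

# Development instance on the default port
print(flight_location("localhost"))  # grpc://localhost:8815
# Production instance behind [analytics.tls]
print(flight_location("analytics.example.com", tls=True))  # grpcs://analytics.example.com:8815
```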

The Flight DataPlane reads any v_* view exposed by FraiseQL. Define your aggregation logic in SQL and register the query in your schema.

  1. Create the SQL view

    CREATE VIEW v_analytics_sales AS
    SELECT
        product_id,
        SUM(revenue) AS revenue,
        COUNT(*) AS orders,
        ROUND(AVG(revenue), 2) AS avg_order_value
    FROM tf_sales
    GROUP BY product_id
    ORDER BY revenue DESC;
  2. Register in your schema

    import fraiseql
    from decimal import Decimal

    @fraiseql.type
    class SalesAggregate:
        """Aggregated sales metrics by product."""
        product_id: str
        revenue: Decimal
        orders: int
        avg_order_value: Decimal

    @fraiseql.query(sql_source="v_analytics_sales")
    def analytics_sales() -> list[SalesAggregate]:
        """Sales aggregates — available via both GraphQL and Arrow Flight."""
        pass
  3. Query via Flight

    Any view registered with @fraiseql.query or @query() is automatically accessible through both the GraphQL endpoint and the Arrow Flight DataPlane.

import pyarrow.flight as flight

# Connect to the Flight service
client = flight.connect("grpc://localhost:8815")

# Create a Flight descriptor for your query
descriptor = flight.FlightDescriptor.for_command(
    b'{"query": "SELECT data FROM v_analytics_sales"}'
)

# Get flight info and read the data
info = client.get_flight_info(descriptor)
reader = client.do_get(info.endpoints[0].ticket)
table = reader.read_all()

# Convert to pandas for analysis
print(table.to_pandas())

Expected output:

  product_id  revenue  orders  avg_order_value
0   prod_001  45231.5     312            145.0
1   prod_002  38920.0     287            135.6
2   prod_003  29118.0     201            144.9

The returned object is a PyArrow Table. From there you can:

  • Convert to pandas: table.to_pandas()
  • Convert to polars: pl.from_arrow(table)
  • Write to Parquet: pq.write_table(table, 'output.parquet')
  • Stream to DuckDB: Direct Arrow integration

import { tableFromIPC } from 'apache-arrow';
import { FlightClient } from '@apache-arrow/flight';

const client = new FlightClient({ host: 'localhost', port: 8815 });
const descriptor = {
  query: 'SELECT data FROM v_analytics_sales'
};

const info = await client.getFlightInfo(descriptor);
const reader = await client.doGet(info.endpoints[0].ticket);

const batches = [];
for await (const batch of reader) {
  batches.push(batch);
}
const table = tableFromIPC(batches);
console.table(table.toArray().slice(0, 5));

Pass filter parameters in the query JSON to push predicates down to PostgreSQL:

import json
import pyarrow.flight as flight

client = flight.connect("grpc://localhost:8815")
command = json.dumps({
    "query": "SELECT data FROM v_analytics_sales",
    "params": {
        "start_date": "2024-01-01",
        "end_date": "2024-03-31",
        "region": "US"
    }
}).encode()

info = client.get_flight_info(flight.FlightDescriptor.for_command(command))
reader = client.do_get(info.endpoints[0].ticket)
table = reader.read_all()

Arrow Flight sends data in Apache Arrow’s columnar binary format with optional LZ4 or Zstd compression. GraphQL sends JSON. For analytical payloads, the difference is substantial:

Workload                GraphQL (JSON)   Arrow Flight (LZ4)   Speedup
100k rows, 6 columns    ~2,800 ms        ~180 ms              ~15x
1M rows, 6 columns      ~28,000 ms       ~1,400 ms            ~20x
10M rows, 6 columns     Timeout          ~12,000 ms
Pandas load (1M rows)   ~9,200 ms        ~420 ms              ~22x

Use Case                                 Recommendation
Transactional reads                      GraphQL — ACID semantics, mutations, subscriptions
Real-time updates                        GraphQL — Subscriptions over WebSocket
Small result sets (less than 10k rows)   GraphQL — Simpler client libraries, HTTP caching
Large aggregations                       Arrow Flight — Columnar format, compression
Data pipelines                           Arrow Flight — Stream processing, Parquet export
BI tool integration                      Arrow Flight — Standard Arrow ecosystem

For continuously updated dashboards, use the streaming endpoint. The Flight service keeps the gRPC stream open and pushes new record batches as the underlying view refreshes:

import pyarrow.flight as flight

client = flight.connect("grpc://localhost:8815")
command = b'{"query": "SELECT data FROM v_realtime_counters", "stream": true}'
descriptor = flight.FlightDescriptor.for_command(command)

# Blocks and yields chunks as data arrives; each chunk wraps one record batch
for chunk in client.do_get(client.get_flight_info(descriptor).endpoints[0].ticket):
    df = chunk.data.to_pandas()
    print(f"[{chunk.data.num_rows} rows] {df['event_count'].sum()} total events")
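Long-lived gRPC streams can drop, so dashboard clients typically reconnect with exponential backoff rather than hammering the service. A stdlib-only sketch of the retry schedule; the base delay and cap are arbitrary choices, not FraiseQL defaults:

```python
import itertools

def backoff_delays(base: float = 1.0, cap: float = 30.0):
    """Yield exponentially growing reconnect delays, capped at `cap` seconds."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)

# First five reconnect attempts wait 1, 2, 4, 8, 16 seconds
print(list(itertools.islice(backoff_delays(), 5)))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

In practice the consuming loop above would sit inside a `try`/`except flight.FlightError` block, sleeping for the next delay before re-issuing `do_get`.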

In production, enable TLS for the Flight service:

[analytics]
enabled = true

[analytics.tls]
enabled = true
cert_file = "/certs/server.pem"
key_file = "/certs/server.key"
ca_file = "/certs/ca.pem" # Optional: for client cert verification

Client connection with TLS:

import pyarrow.flight as flight
# TLS enabled — use grpcs scheme
client = flight.connect("grpcs://analytics.example.com:8815")

The Arrow Flight service exposes Prometheus metrics:

Metric                                Description
fraiseql_flight_requests_total        Total Flight requests
fraiseql_flight_rows_streamed_total   Total rows streamed
fraiseql_flight_latency_seconds       Request latency histogram
fraiseql_flight_errors_total          Error count by type

Health check endpoint:

curl http://localhost:8080/health

Includes Flight service status:

{
  "status": "ok",
  "analytics": {
    "enabled": true,
    "port": 8815,
    "connections": 12
  }
}
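Since the health payload is plain JSON, a readiness probe can key off the analytics block. A minimal sketch using the response shape shown above:

```python
import json

# Example payload, matching the /health response shape documented above
payload = '{"status": "ok", "analytics": {"enabled": true, "port": 8815, "connections": 12}}'

health = json.loads(payload)
flight_ready = health["status"] == "ok" and health.get("analytics", {}).get("enabled", False)
print(flight_ready)  # True
```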

Current Arrow Flight implementation has these limitations:

  1. PostgreSQL only: MySQL, SQLite, and SQL Server are not supported
  2. No Flight SQL: Uses custom JSON descriptors, not standard Flight SQL protocol
  3. No authentication: Unauthenticated in development; use TLS and network policies in production
  4. Read-only: Mutations must use the GraphQL endpoint
  5. No subscriptions: Streaming is pull-based, not push-based like GraphQL subscriptions

“Connection refused” on port 8815

  1. Verify the DataPlane is enabled in fraiseql.toml:

    [analytics]
    enabled = true
  2. Check FraiseQL logs for the startup message:

    INFO fraiseql::flight: Arrow Flight DataPlane listening on 0.0.0.0:8815
  3. Confirm the port is not blocked by a firewall

“View not found” errors

Ensure the view name in your Flight query matches the sql_source in your schema definition exactly (case-sensitive).

Slow query performance

Arrow Flight is only as fast as the underlying PostgreSQL query. Check:

EXPLAIN ANALYZE SELECT * FROM v_analytics_sales WHERE ...;

Missing indexes on filter columns are the most common cause of slow analytics queries.
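For example, if Flight queries filter v_analytics_sales by date and region, the fact table behind the view usually needs a matching index. A sketch with illustrative column names; adjust them to your actual filter predicates:

```sql
-- Hypothetical composite index on the fact table behind v_analytics_sales
CREATE INDEX idx_tf_sales_date_region ON tf_sales (sale_date, region);
```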

Large memory usage

The max_batch_size setting (default 65,536 rows) controls how many rows are buffered in memory. Reduce it for memory-constrained environments:

[analytics]
enabled = true
max_batch_size = 10000
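A rough way to size max_batch_size is rows × columns × bytes per value. Assuming 8-byte numeric columns (an assumption; string columns cost more), the default batch of 65,536 rows with 6 columns buffers about 3 MiB before compression:

```python
rows, cols, bytes_per_value = 65536, 6, 8  # 8 bytes assumes int64/float64 columns
batch_bytes = rows * cols * bytes_per_value
print(f"{batch_bytes / 1024 / 1024:.1f} MiB per batch")  # 3.0 MiB per batch
```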

  • Performance Guide — Query optimization and indexes for analytics views
  • Caching — Cache heavy aggregations
  • Federation — Combine analytics from multiple databases