For Data Engineers
Two planes, one schema — transactional GraphQL and columnar Arrow Flight from the same SQL definitions
Your Challenge
Data engineers face two workloads that traditionally require separate systems:
Transactional Reads
- Dashboard charts and KPIs
- Per-entity reads with filters
- Flexible GraphQL queries from the app
- Low-latency, per-request payloads
Analytical Bulk Access
- ML feature extraction to Pandas/Polars
- Full table scans to DuckDB or data lake
- OLAP drill-down with measures/dimensions
- Zero-copy columnar streaming
FraiseQL's two-plane architecture handles both from the same compiled schema — no separate ETL pipeline, no data duplication.
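The split is easiest to see from the client side: the same compiled schema answers a per-request GraphQL query on the JSON plane and a bulk Flight ticket on the Arrow plane. A minimal sketch (the query text and ticket fields are illustrative, patterned on the examples later on this page):

```python
import json

# JSON plane: a per-request GraphQL query for dashboard data
graphql_request = {
    "query": '{ dailyMetrics(fromDate: "2025-01-01") { date totalUsers avgRevenue } }'
}

# Arrow plane: a Flight ticket requesting bulk columnar batches of the same data
flight_ticket = json.dumps(
    {"type": "OptimizedView", "table": "ta_events", "from_date": "2025-01-01"}
).encode()

print(graphql_request["query"])
print(flight_ticket)
```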
Two Planes, One Schema
✅ JSON Plane — v_* views → GraphQL
Transactional reads and dashboard data. JSONB composition in the database. One SQL query per GraphQL field.
-- v_* views return JSONB for the GraphQL layer
CREATE VIEW v_daily_metrics AS
SELECT
date_trunc('day', created_at)::date AS date,
jsonb_build_object(
'date', date_trunc('day', created_at)::date,
'total_users', count(DISTINCT user_id),
'avg_revenue', avg(revenue),
'p95_latency', percentile_cont(0.95)
WITHIN GROUP (ORDER BY latency)
) AS data
FROM tb_events
GROUP BY date_trunc('day', created_at);
-- Complex aggregations. Window functions. CTEs. All supported.
-- No magic. Just SQL.
✅ Arrow Plane — ta_* tables → Arrow Flight (port 50051)
Columnar bulk access for analytics pipelines. ta_* tables are exposed over Apache Arrow Flight — a gRPC protocol for zero-copy columnar transfer. No per-row serialization.
-- ta_* tables: flat, wide schemas — columnar storage is the goal
CREATE TABLE ta_events (
event_id UUID NOT NULL,
occurred_at TIMESTAMPTZ NOT NULL,
user_id UUID NOT NULL,
event_type TEXT NOT NULL,
value NUMERIC,
region TEXT,
plan TEXT
);
-- FraiseQL exposes ta_* over Arrow Flight on port 50051
-- Arrow Flight ticket types:
-- OptimizedView → query with filters, returns columnar Arrow
-- BulkExport → full table scan in Arrow record batches
-- BatchedQueries → multiple queries in one Flight call
Fact Tables (tf_*) — OLAP in SQL
For OLAP queries and drill-down analytics, FraiseQL introduces fact tables — denormalized tables with explicit measures, dimensions, and auto-generated temporal buckets.
Structure
-- tf_* fact table: denormalized OLAP
CREATE TABLE tf_sales (
id UUID DEFAULT gen_random_uuid(),
occurred_at TIMESTAMPTZ NOT NULL, -- temporal column → auto-buckets
-- Denormalized filters (indexed SQL columns)
user_id UUID NOT NULL,
product_id UUID NOT NULL,
-- Measures (numeric, aggregatable)
revenue NUMERIC(10, 2) NOT NULL,
quantity INTEGER NOT NULL,
discount NUMERIC(5, 2) DEFAULT 0,
-- Dimensions (JSONB grouping attributes)
data JSONB NOT NULL DEFAULT '{}'
-- data: { "region": "eu", "channel": "web", "plan": "pro" }
);
Python Declaration
@fraiseql.fact_table(
table_name="tf_sales",
measures=["revenue", "quantity", "discount"],
dimension_paths=["data.region", "data.channel", "data.plan"],
temporal_column="occurred_at", # auto-generates hour/day/week/month/quarter
denormalized_filters=["user_id", "product_id"],
)
class SalesFact:
pass
@fraiseql.aggregate_query(
fact_table="tf_sales",
auto_group_by=True,
auto_aggregates=True,
)
def sales_aggregates(
from_date: str | None = None,
to_date: str | None = None,
region: str | None = None,
bucket: str = "day", # hour | day | week | month | quarter
) -> list[SalesAggregate]:
pass
The Rust compiler generates optimized SQL with GROUP BY, partial-index-aware WHERE, and temporal bucket expressions. No runtime interpretation.
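To make "temporal bucket expressions" concrete, here is a hedged sketch of the kind of SQL fragment such a compiler could emit for the bucket parameter. The helper name and generated text are illustrative, not FraiseQL's actual codegen:

```python
# Hypothetical sketch of temporal-bucket codegen — not FraiseQL's real output.
ALLOWED_BUCKETS = {"hour", "day", "week", "month", "quarter"}

def bucket_expr(bucket: str, column: str) -> str:
    """Map a bucket name to a PostgreSQL date_trunc() expression."""
    if bucket not in ALLOWED_BUCKETS:
        raise ValueError(f"unsupported bucket: {bucket!r}")
    return f"date_trunc('{bucket}', {column})"

# e.g. the 'day' bucket over tf_sales.occurred_at
print(bucket_expr("day", "occurred_at"))  # date_trunc('day', occurred_at)
```

Because the bucket names are validated at compile time, the generated GROUP BY can safely interpolate them into SQL text.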
Naming Conventions at a Glance
Every FraiseQL object has a prefix signalling its role and plane:
| Prefix | Type | Plane | Purpose |
|---|---|---|---|
| tb_* | Base table | Storage | Raw transactional data |
| v_* | JSON view | JSON / GraphQL | JSONB composition for application reads |
| fn_* | SQL function | Write | Mutations, business logic in the database |
| tf_* | Fact table | OLAP | Denormalized OLAP with measures & dimensions |
| ta_* | Arrow table | Arrow Flight | Columnar bulk export, ML pipelines |
| va_* | Vector Arrow view | Arrow Flight | Embedding / vector-native Arrow export |
| tv_* | Materialized vector view | Arrow Flight | Pre-materialized Table Vector views |
The ta_*, va_*, and tv_* rows belong to the Arrow analytics plane.
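Because the prefix fully determines an object's plane, a convention check can be mechanical. A small sketch (the mapping mirrors the table above; the helper is illustrative, not part of FraiseQL):

```python
# Prefix → plane, mirroring the convention table (illustrative helper, not FraiseQL API)
PLANES = {
    "tb_": "storage",
    "v_": "json",
    "fn_": "write",
    "tf_": "olap",
    "ta_": "arrow",
    "va_": "arrow",
    "tv_": "arrow",
}

def plane_of(name: str) -> str:
    # Check longer prefixes first so e.g. "va_" is matched before "v_"
    for prefix in sorted(PLANES, key=len, reverse=True):
        if name.startswith(prefix):
            return PLANES[prefix]
    raise ValueError(f"no FraiseQL prefix on {name!r}")

print(plane_of("ta_events"))        # arrow
print(plane_of("v_daily_metrics"))  # json
```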
Data Tools Integration via Arrow Flight
Polars — Zero-Copy DataFrames
from pyarrow import flight
import polars as pl
client = flight.connect("grpc://localhost:50051")
# Pull ta_events as Arrow (no JSON deserialization)
ticket = flight.Ticket(b'{"type":"OptimizedView","table":"ta_events","from_date":"2025-01-01"}')
df = pl.from_arrow(client.do_get(ticket).read_all())
# Native Polars operations on columnar data
print(df.group_by("event_type").agg(pl.col("value").sum()))
DuckDB — In-Process OLAP
from pyarrow import flight
import duckdb
client = flight.connect("grpc://localhost:50051")
ticket = flight.Ticket(b'{"type":"OptimizedView","table":"tf_sales","region":"eu"}')
arrow_table = client.do_get(ticket).read_all()
# DuckDB operates directly on Arrow memory — no copy
result = duckdb.execute(
"SELECT data->>'channel' AS channel, SUM(revenue), COUNT(*) "
"FROM arrow_table GROUP BY 1 ORDER BY 2 DESC"
).df()
Pandas — ML Feature Extraction
from pyarrow import flight
client = flight.connect("grpc://localhost:50051")
ticket = flight.Ticket(b'{"type":"BulkExport","table":"ta_events"}')
# Arrow → Pandas (fast zero-copy conversion)
df = client.do_get(ticket).read_all().to_pandas()
# Train directly on the exported frame ("model" is any fit()-style estimator, e.g. scikit-learn)
model.fit(df[['feature_a', 'feature_b']], df['label'])
BI Tools via GraphQL (JSON Plane)
Dashboard tools consume the JSON plane directly through GraphQL:
- ✅ Tableau — Direct connection to GraphQL
- ✅ Looker — SQL views exposed as models
- ✅ Metabase — Self-service BI on v_* views
- ✅ Superset — Explore views with SQL editor
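Under the hood, each of these tools speaks plain GraphQL over HTTP. A minimal sketch of the request a connector issues — the endpoint path and field names are assumptions patterned on the v_daily_metrics view above, not a documented FraiseQL URL:

```python
import json
import urllib.request

# Hypothetical endpoint — adjust to your deployment
GRAPHQL_URL = "http://localhost:8000/graphql"

payload = json.dumps({
    "query": "{ dailyMetrics { date totalUsers avgRevenue p95Latency } }"
}).encode()

req = urllib.request.Request(
    GRAPHQL_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)
# Sending requires a running FraiseQL server:
# with urllib.request.urlopen(req) as resp:
#     rows = json.load(resp)["data"]["dailyMetrics"]
```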
Analytics SQL Patterns
Pattern 1: Rolling Windows
CREATE VIEW v_rolling_metrics AS
SELECT
date,
jsonb_build_object(
'date', date,
'revenue_30d', SUM(revenue) OVER (
ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
),
'users_7d_avg', AVG(users) OVER (
ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
)
) AS data
FROM tb_daily_metrics;
Pattern 2: Retention Cohorts (Arrow Export)
-- ta_* for Arrow Flight bulk export to Polars heatmap
CREATE TABLE ta_retention_cohorts AS
SELECT
date_trunc('month', u.created_at)::date AS cohort_month,
date_trunc('month', a.activity_date)::date AS activity_month,
u.user_id,
count(*) AS event_count
FROM tb_users u
JOIN tb_activity a ON a.user_id = u.user_id
GROUP BY 1, 2, u.user_id;
-- Polars builds the heatmap from Arrow Flight:
# df = pl.from_arrow(client.do_get(retention_ticket).read_all())
# cohort_matrix = df.pivot(values="event_count", index="cohort_month", columns="activity_month")
Pattern 3: Funnel Analysis
CREATE VIEW v_conversion_funnel AS
SELECT
date_trunc('day', created_at)::date AS day,
jsonb_build_object(
'day', date_trunc('day', created_at)::date,
'signups', COUNT(DISTINCT CASE WHEN event = 'signup' THEN user_id END),
'verified', COUNT(DISTINCT CASE WHEN event = 'verified_email' THEN user_id END),
'paying', COUNT(DISTINCT CASE WHEN event = 'payment' THEN user_id END),
'churned', COUNT(DISTINCT CASE WHEN event = 'churn' THEN user_id END)
) AS data
FROM tb_events
GROUP BY date_trunc('day', created_at);
Pattern 4: dbt + FraiseQL
-- dbt manages view definitions; FraiseQL exposes them as GraphQL
models/
├── staging/
│ └── stg_events.sql
├── marts/
│ ├── v_daily_metrics.sql -- v_* JSON view → GraphQL query
│ ├── tf_sales.sql -- tf_* fact table → aggregate query
│ └── ta_retention.sql -- ta_* Arrow table → Arrow Flight
-- dbt generates + maintains the SQL
-- FraiseQL compiles types + exposes both planes
-- Both use the database as source of truth
Performance Optimization
Indexing for Dimension Queries
-- Partial index on tf_* dimension for fast GROUP BY
CREATE INDEX idx_tf_sales_region
ON tf_sales ((data->>'region'))
WHERE data->>'region' IS NOT NULL;
-- BRIN index for time-series range scans
CREATE INDEX idx_tf_sales_occurred
ON tf_sales USING BRIN (occurred_at);
-- Partial index for a frequent time range. Index predicates must be immutable,
-- so NOW() is rejected here — use a fixed cutoff and recreate the index periodically
CREATE INDEX idx_ta_events_recent
ON ta_events(occurred_at)
WHERE occurred_at > DATE '2025-01-01';
Materialized Views for Expensive Aggregations
-- Pre-compute expensive aggregation once per hour
CREATE MATERIALIZED VIEW tv_user_cohorts AS
SELECT
date_trunc('month', created_at)::date AS cohort_date,
COUNT(DISTINCT user_id) AS cohort_size,
SUM(ltv) AS cohort_ltv
FROM tb_events
GROUP BY date_trunc('month', created_at);
-- Refresh on a schedule via pg_cron or your orchestrator
SELECT cron.schedule('0 * * * *', 'REFRESH MATERIALIZED VIEW tv_user_cohorts');
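If pg_cron is unavailable, the same refresh can come from any orchestrator. A hedged stdlib sketch of an orchestrator-side helper — the view list and the run_sql callback are placeholders for your own connection code (psycopg, an Airflow hook, etc.):

```python
# Illustrative refresh helper — connection handling is left to the caller;
# run_sql is a placeholder for e.g. cursor.execute.
MATERIALIZED_VIEWS = ["tv_user_cohorts"]  # views on the hourly refresh schedule

def refresh_statements(views: list[str], concurrently: bool = False) -> list[str]:
    """Build REFRESH statements; CONCURRENTLY requires a unique index on the view."""
    kw = "CONCURRENTLY " if concurrently else ""
    return [f"REFRESH MATERIALIZED VIEW {kw}{v};" for v in views]

def refresh_all(run_sql, views=MATERIALIZED_VIEWS):
    for stmt in refresh_statements(views):
        run_sql(stmt)

print(refresh_statements(MATERIALIZED_VIEWS))
```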