Analytics & BI Platforms with FraiseQL

Pre-compute metrics in SQL. Serve them with GraphQL. Stream columnar data with Arrow Flight.

Analytics Challenge: Two Different Workloads

Analytics and BI platforms require two fundamentally different data access patterns:

Transactional Reads (Dashboards)

  • Complex aggregations over normalized data
  • Time-based slicing (day / week / month)
  • Consistent metrics across every chart
  • Sub-second dashboard load time

Analytical Bulk Access (Data Pipelines)

  • Full table scans for ML feature extraction
  • Zero-copy export to Polars/DuckDB/Pandas
  • OLAP cube queries with drill-down dimensions
  • Real-time columnar streaming

Most stacks use two completely separate systems for these. FraiseQL solves both from the same compiled schema with its two-plane architecture.

Learn about the two-plane architecture →

JSON Plane: GraphQL Dashboards (v_*)

For dashboard widgets and time-series charts, pre-compute metrics in SQL views. Each view returns everything a widget needs in a single query.

Strategy: SQL Aggregation, GraphQL Delivery

Pre-compute metrics in views instead of calculating them in dashboard code:

-- Daily sales metrics (v_* JSON plane)
CREATE VIEW v_daily_sales_metrics AS
SELECT
  date_trunc('day', created_at)::date AS date,
  jsonb_build_object(
    'date', date_trunc('day', created_at)::date,
    'total_revenue', sum(amount)::numeric,
    'order_count', count(*),
    'average_order_value', avg(amount)::numeric
  ) AS data
FROM tb_order
GROUP BY date_trunc('day', created_at);
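The view's per-day aggregation can be sketched in plain Python to show the shape of each row's `data` payload (the sample orders are hypothetical; the real values come from `tb_order`):

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample orders: (created_at day, amount) — illustrative only.
orders = [
    (date(2025, 1, 1), 100.0),
    (date(2025, 1, 1), 50.0),
    (date(2025, 1, 2), 75.0),
]

# Mirror the view: group by day, emit the same keys as the jsonb payload.
by_day = defaultdict(list)
for day, amount in orders:
    by_day[day].append(amount)

rows = [
    {
        "date": day.isoformat(),
        "total_revenue": sum(amounts),
        "order_count": len(amounts),
        "average_order_value": sum(amounts) / len(amounts),
    }
    for day, amounts in sorted(by_day.items())
]

print(rows[0]["total_revenue"])  # 150.0
```

Because the view emits one ready-made JSON object per day, each dashboard widget resolves with a single indexed query instead of aggregating at request time.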

Define Types and Query

@fraiseql.type
class DailySalesMetrics:
    date: str
    total_revenue: float
    order_count: int
    average_order_value: float

@fraiseql.query(sql_source="v_daily_sales_metrics")
def daily_sales(
    from_date: str,
    to_date: str
) -> list[DailySalesMetrics]:
    pass
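A dashboard widget would then issue a standard GraphQL request against this query. The camelCase field names and variable names below are assumptions inferred from the definitions above, not confirmed FraiseQL output:

```python
import json

# Hypothetical GraphQL request a dashboard widget would send
# (field casing assumed from the Python definitions above).
query = """
query DailySales($from: String!, $to: String!) {
  dailySales(fromDate: $from, toDate: $to) {
    date
    totalRevenue
    orderCount
    averageOrderValue
  }
}
"""
payload = json.dumps({
    "query": query,
    "variables": {"from": "2025-01-01", "to": "2025-01-31"},
})
# POST `payload` to the server's GraphQL endpoint with any HTTP client.
print(json.loads(payload)["variables"]["from"])  # 2025-01-01
```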

Advanced JSON Patterns

Window functions, CTEs, and multi-table aggregations all work inside v_* views:

-- 30-day rolling average
CREATE VIEW v_rolling_metrics AS
SELECT
  date,
  jsonb_build_object(
    'date', date,
    'daily_revenue', amount,
    'rolling_30d_avg', avg(amount) OVER (
      ORDER BY date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
    ),
    'rolling_7d_sum', sum(amount) OVER (
      ORDER BY date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    )
  ) AS data
FROM (
  SELECT date_trunc('day', created_at)::date AS date, sum(amount) AS amount
  FROM tb_order
  GROUP BY date_trunc('day', created_at)
) daily_sales;
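The two window frames above reduce to trailing windows over a daily series. A minimal sketch with hypothetical revenue values (frame semantics mirror `ROWS BETWEEN n-1 PRECEDING AND CURRENT ROW`):

```python
# Hypothetical daily revenue series — illustrative only.
daily = [100.0, 200.0, 150.0, 300.0, 250.0, 100.0, 400.0, 350.0]

def trailing_windows(values, window):
    """One trailing window per row, including the current row."""
    return [values[max(0, i - window + 1): i + 1] for i in range(len(values))]

rolling_7d_sum = [sum(w) for w in trailing_windows(daily, 7)]
rolling_30d_avg = [sum(w) / len(w) for w in trailing_windows(daily, 30)]

print(rolling_7d_sum[6])  # 1500.0 — sum of the first seven days
```

As in SQL, early rows use a shorter frame (there is nothing preceding them), so `rolling_30d_avg[0]` is just the first day's value.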

OLAP Plane: Fact Tables (tf_*)

For complex drill-down analytics and OLAP queries, use fact tables — denormalized tables with explicit measures, dimensions, and auto-generated temporal buckets.

Fact Table Structure

-- tf_* fact table: denormalized for OLAP
CREATE TABLE tf_sales (
  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  occurred_at  TIMESTAMPTZ NOT NULL,   -- temporal bucket source
  user_id      UUID NOT NULL,          -- denormalized filter (indexed)
  product_id   UUID NOT NULL,

  -- Measures (aggregatable numeric columns)
  revenue      NUMERIC(10, 2) NOT NULL,
  quantity     INTEGER NOT NULL,
  discount     NUMERIC(5, 2) DEFAULT 0,

  -- Dimensions (flexible JSONB grouping attributes)
  data         JSONB NOT NULL DEFAULT '{}'
  -- data: { "region": "eu", "channel": "web", "plan": "pro" }
);

Declare with Python Decorators

@fraiseql.fact_table(
    table_name="tf_sales",
    measures=["revenue", "quantity", "discount"],
    dimension_paths=["data.region", "data.channel", "data.plan"],
    temporal_column="occurred_at",    # auto-generates day/week/month/quarter buckets
    denormalized_filters=["user_id", "product_id"],
)
class SalesFact:
    pass

@fraiseql.aggregate_query(
    fact_table="tf_sales",
    auto_group_by=True,
    auto_aggregates=True,
)
def sales_aggregates(
    from_date: str | None = None,
    to_date: str | None = None,
    region: str | None = None,
    bucket: str = "day",   # day | week | month | quarter
) -> list[SalesAggregate]:
    pass

The Rust engine compiles this into an optimized SQL query with GROUP BY, partial-index-aware WHERE clauses, and temporal bucket expressions. No runtime interpretation.
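Illustratively, a call like `sales_aggregates(region="eu", bucket="day")` might compile to SQL of roughly this shape. Column aliases and parameter placeholders here are assumptions for illustration, not FraiseQL's actual generated output:

```sql
-- Sketch of the compiled aggregate query (illustrative)
SELECT
  date_trunc('day', occurred_at)::date AS bucket,   -- temporal bucket
  data->>'region' AS region,                        -- JSONB dimension path
  sum(revenue)  AS revenue_sum,                     -- declared measures
  sum(quantity) AS quantity_sum,
  count(*)      AS row_count
FROM tf_sales
WHERE occurred_at >= $1 AND occurred_at < $2        -- from_date / to_date
  AND data->>'region' = $3                          -- region filter
GROUP BY 1, 2;
```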

Arrow Flight: Columnar Streaming (ta_*)

For bulk data access — ML feature extraction, data lake sync, Polars/DuckDB pipelines — FraiseQL exposes ta_* tables over Apache Arrow Flight on port 50051.

Zero-Copy Analytics with Polars

from pyarrow import flight
import polars as pl

client = flight.connect("grpc://localhost:50051")

# Fetch ta_* table as columnar Arrow (zero-copy)
ticket = flight.Ticket(b'{"type":"OptimizedView","table":"ta_events","from_date":"2025-01-01"}')
df = pl.from_arrow(client.do_get(ticket).read_all())

# Native Polars operations on columnar data — no JSON overhead
print(df.group_by("event_type").agg(pl.col("value").sum()))

DuckDB In-Process Analytics

import duckdb
from pyarrow import flight

client = flight.connect("grpc://localhost:50051")
ticket = flight.Ticket(b'{"type":"BulkExport","table":"tf_sales"}')
arrow_table = client.do_get(ticket).read_all()

# DuckDB operates directly on Arrow memory — no deserialization
result = duckdb.execute(
  "SELECT region, SUM(revenue) AS revenue, COUNT(*) AS orders "
  "FROM arrow_table GROUP BY region"
).df()

10-100x Faster Export

Arrow columnar transfer eliminates per-row JSON serialization for bulk reads.

Zero-Copy

Polars, DuckDB, and PyArrow operate directly on the Arrow buffer — no copy step.

Same Schema

Both planes — GraphQL JSON and Arrow Flight — come from the same compiled schema.json.

Streaming Batches

BulkExport streams Arrow record batches (~64k rows each) with minimal memory overhead.
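The batching pattern itself is simple to sketch in plain Python (the 64k batch size mirrors the figure above; a real client would iterate the Arrow Flight stream reader rather than a list):

```python
def stream_batches(rows, batch_size=65_536):
    """Yield fixed-size batches so only one batch is resident at a time."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# Hypothetical 150k-row export consumed in ~64k-row batches.
sizes = [len(b) for b in stream_batches(range(150_000))]
print(sizes)  # [65536, 65536, 18928]
```

Peak memory is bounded by one batch, not the full table, which is what makes multi-gigabyte exports practical.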

Real-World Example: SaaS Metrics Dashboard

Dashboard Widgets Needed:

  • Monthly recurring revenue (MRR) — chart over time
  • Churn rate by plan — grouped bar chart
  • Customer acquisition cost — KPI card
  • Retention cohorts — heatmap (bulk Polars)

FraiseQL Approach:

-- MRR + churn in one v_* view (JSON plane → dashboard charts)
CREATE VIEW v_saas_metrics AS
SELECT
  c.period,
  jsonb_build_object(
    'period', c.period,
    'mrr', (
      SELECT sum(CASE plan WHEN 'pro' THEN 99 WHEN 'enterprise' THEN 299 ELSE 0 END)
      FROM tb_subscription s
      WHERE s.created_at <= c.period
        AND (s.cancelled_at IS NULL OR s.cancelled_at > c.period)
    ),
    'churn_rate', (
      SELECT count(*) FILTER (WHERE cancelled_at IS NOT NULL) * 100.0
        / NULLIF(count(*), 0)
      FROM tb_subscription
      WHERE date_trunc('month', created_at)::date = c.period
    )
  ) AS data
FROM (SELECT DISTINCT date_trunc('month', created_at)::date AS period FROM tb_subscription) c;
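The MRR and churn arithmetic in the view reduces to the sketch below, using hypothetical subscriptions and the same plan prices as the `CASE` expression:

```python
# Hypothetical cohort: subscriptions created in one month.
subscriptions = [
    {"plan": "pro", "cancelled": False},
    {"plan": "pro", "cancelled": True},
    {"plan": "enterprise", "cancelled": False},
    {"plan": "free", "cancelled": False},
]

PLAN_PRICE = {"pro": 99, "enterprise": 299}  # mirrors the CASE expression

# MRR: sum plan prices over still-active subscriptions.
active = [s for s in subscriptions if not s["cancelled"]]
mrr = sum(PLAN_PRICE.get(s["plan"], 0) for s in active)

# Churn rate: cancelled / total, as a percentage.
churn_rate = 100.0 * sum(s["cancelled"] for s in subscriptions) / len(subscriptions)

print(mrr, churn_rate)  # 398 25.0
```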

-- Retention cohort data exported via Arrow Flight (ta_* → Polars heatmap)
CREATE TABLE ta_retention_cohorts (
  cohort_month   DATE NOT NULL,
  activity_month DATE NOT NULL,
  user_id        UUID NOT NULL,
  events_count   INTEGER
);
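Turning `ta_retention_cohorts` rows into a heatmap matrix is a pivot: distinct active users per (cohort, activity month), normalized by the cohort's first-month size. A minimal pure-Python sketch with hypothetical rows (in practice this would be a Polars group_by/pivot on the Arrow table):

```python
from collections import defaultdict

# Hypothetical rows: (cohort_month, activity_month, user_id)
rows = [
    ("2025-01", "2025-01", "u1"), ("2025-01", "2025-01", "u2"),
    ("2025-01", "2025-02", "u1"),
    ("2025-02", "2025-02", "u3"),
]

# cohort -> activity month -> distinct active users
cohorts = defaultdict(lambda: defaultdict(set))
for cohort, month, user in rows:
    cohorts[cohort][month].add(user)

# Retention = active users in month N / users active in the cohort's first month
heatmap = {}
for cohort, by_month in cohorts.items():
    base = len(by_month[cohort])
    heatmap[cohort] = {m: len(users) / base for m, users in sorted(by_month.items())}

print(heatmap["2025-01"])  # {'2025-01': 1.0, '2025-02': 0.5}
```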

Result: Dashboard charts use GraphQL with the JSON plane. The cohort heatmap uses Arrow Flight to pull a Polars DataFrame. One schema handles both.

Related Resources

⚡ Analytics Dataplane

Full reference: two-plane architecture, tf_*, ta_*, va_*, naming conventions.

Read →

📊 Data Engineers

Arrow Flight integration, OLAP patterns, and dbt workflows.

Read →

🏢 Enterprise SaaS

Analytics with multi-tenant isolation.

Read →

Build Your Analytics Platform

Get started with FraiseQL analytics patterns today.