
Architecture

STELLAR is three layers: a fixed-convention storage layer, a FastAPI app factory that mounts modules dynamically, and a single compiled React SPA whose tabs are gated at runtime by /api/config. Each layer is deliberately small.

flowchart TB
    subgraph "User project"
        Y["stellar.yaml"]
        H["data/raw/*.h5ad"]
        E["data/external/{de,hdwgcna,...}"]
    end

    Y -->|stellar ingest| ING
    H --> ING
    E --> ING

    subgraph "Storage layer  (fixed convention)"
        ING[ingest orchestrator]
        L["data/lance/expression_*.lance<br/>(gene-major)"]
        D["data/duckdb/atlas.duckdb<br/>(metadata + module tables)"]
        S["data/static/coords_*.arrow<br/>(pre-baked UMAP)"]
        P["data/parquet/&lt;module&gt;/*.parquet"]
        ING --> L
        ING --> D
        ING --> S
        ING --> P
        P --> D
    end

    L -.read.-> API
    D -.read.-> API
    S -.read.-> API

    subgraph "Serving layer"
        API["FastAPI app<br/>core routes + module routers"]
        SPA["React SPA<br/>(pre-built bundle, tabs gated by /api/config)"]
        API --> SPA
    end

    SPA -->|stellar deploy| WEB[Static webserver / nginx]

Storage layer — LanceDB + DuckDB

Every project gets the same two stores; no project may bypass the convention.

LanceDB — the cell × gene matrix in gene-major layout

The cell × gene matrix lives in LanceDB as one row per gene with sparse (cell_idx, value) columns. The critical access pattern in a single-cell atlas is "colour the UMAP by gene X". Against a gene-major store that is a single keyed row read, whose cost scales with the number of cells expressing the gene, instead of a column slice through millions of cells. The same Lance files hold IVF-PQ vector indexes for cell- and gene-similarity k-NN queries.
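To make the access pattern concrete, here is a minimal in-memory sketch of a gene-major lookup. It does not use LanceDB: a plain dict stands in for the table, and the names GeneRow and expression_for_gene, along with the toy values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRow:
    """One row of a gene-major store: the non-zero entries for a single gene."""
    gene: str
    cell_idx: list[int]   # indices of cells with non-zero expression
    value: list[float]    # expression values, aligned with cell_idx


def expression_for_gene(store: dict[str, GeneRow], gene: str, n_cells: int) -> list[float]:
    """Expand one sparse gene row into a dense per-cell vector for colouring."""
    row = store[gene]                 # single keyed row read
    dense = [0.0] * n_cells           # cells absent from the row are zero
    for idx, val in zip(row.cell_idx, row.value):
        dense[idx] = val
    return dense


# Toy store: two genes across five cells (made-up values).
store = {
    "GENE_A": GeneRow("GENE_A", cell_idx=[0, 3], value=[1.5, 0.25]),
    "GENE_B": GeneRow("GENE_B", cell_idx=[2], value=[4.0]),
}

colours = expression_for_gene(store, "GENE_A", n_cells=5)
```

Only the row for the requested gene is touched; no other gene's data is read to build the colour vector.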

If your project ships a wide matrix alongside the primary matrix (see input.matrices), the expression routes fall back to the wider matrix when the user queries a gene that's missing from the primary panel.
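The fallback can be pictured as an ordered lookup across matrices. The sketch below is an assumption about the shape of that logic, not the package's code; resolve_matrix and the matrix names "primary" and "wide" are hypothetical.

```python
from typing import Optional


def resolve_matrix(gene: str, panels: dict[str, set[str]], order: list[str]) -> Optional[str]:
    """Return the first matrix (in priority order) whose gene panel contains `gene`."""
    for name in order:
        if gene in panels.get(name, set()):
            return name
    return None  # gene is absent from every matrix


# Hypothetical panels: a small primary panel and a wider secondary matrix.
panels = {
    "primary": {"GENE_A", "GENE_B"},
    "wide": {"GENE_A", "GENE_B", "GENE_C"},
}
order = ["primary", "wide"]  # primary first, wide as the fallback

hit_primary = resolve_matrix("GENE_A", panels, order)
hit_fallback = resolve_matrix("GENE_C", panels, order)
missing = resolve_matrix("GENE_Z", panels, order)
```

A gene present in both matrices resolves to the primary one, so the wider matrix is only consulted when the primary panel has no entry.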

DuckDB — every non-expression table

Everything that isn't expression — cells, donors, genes, DE comparisons + results, hdWGCNA modules, CellChat networks, Milo neighbourhoods, pseudobulk pre-compute — lives in a single embedded DuckDB file: data/duckdb/atlas.duckdb.

DuckDB gives us:

  • Sub-second columnar SQL aggregates against millions of rows.
  • No separate database server to manage, no port to expose.
  • A single file you can rsync to the deploy target alongside the Lance directories.

Module ingest writes parquet under data/parquet/<module>/; the orchestrator then runs a CREATE OR REPLACE TABLE … AS SELECT * FROM read_parquet(...) per module. The DuckDB file is rebuilt every ingest (no in-place migrations).
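As a rough illustration of the rebuild step, the sketch below generates one statement per parquet file. The table naming scheme (`<module>_<file stem>`) and the helper name module_table_statements are assumptions made for this example; the source does not say how tables are named.

```python
from pathlib import Path


def module_table_statements(parquet_root: Path) -> list[str]:
    """Build one CREATE OR REPLACE TABLE statement per parquet file.

    Assumption: tables are named "<module>_<file stem>". Paths containing
    single quotes are not escaped in this sketch.
    """
    statements = []
    for parquet_file in sorted(parquet_root.glob("*/*.parquet")):
        module = parquet_file.parent.name
        table = f"{module}_{parquet_file.stem}"
        statements.append(
            f'CREATE OR REPLACE TABLE "{table}" AS '
            f"SELECT * FROM read_parquet('{parquet_file.as_posix()}')"
        )
    return statements
```

Each statement could then be run against the atlas file with the duckdb Python package, for example via `duckdb.connect("data/duckdb/atlas.duckdb").execute(stmt)`. Because every table is replaced from parquet, rerunning ingest always yields the same database for the same inputs.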

Static coords — pre-baked Arrow

data/static/coords_*.arrow holds the UMAP projection as Apache Arrow IPC, one file per matrix. The frontend pulls the file once at boot — no round-trip per zoom or pan.

Serving layer — FastAPI app factory + module mounting

stellar.backend.app.create_app(config, project_root, modules) is a factory that returns a fresh FastAPI instance:

from fastapi import FastAPI

app = FastAPI(title=config.project.title, openapi_url=None)

# Always-on core routes (mounted under /api):
#   /api/config, /api/describe, /api/genes, /api/embedding/coords,
#   /api/embedding/expression, /api/embedding/colorby, …
app.include_router(core_routes.router, prefix="/api")

# For every module enabled in stellar.yaml:
for module in enabled_modules(config):
    router = module.routes()
    if router is not None:
        app.include_router(router, prefix="/api")

# The SPA is mounted at config.project.base_url, with /assets/ for hashed
# bundle output.
_mount_frontend(app)

A module that isn't enabled isn't mounted: no runtime branching, no unused routes on the API surface, no import of its dependencies.

The factory is the only place that knows the difference between core and module routes. Anything else operates on the assembled FastAPI instance.
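The mounting rule can be sketched without FastAPI at all. In the example below, plain strings stand in for router objects, and Module, collect_routers, and the explicit registry argument are hypothetical; the real enabled_modules(config) takes only the config, and how it discovers modules is not shown in this document.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Module:
    """Hypothetical stand-in for a STELLAR module."""
    name: str
    routes: Callable[[], Optional[object]]  # returns a router, or None if there are no routes


def enabled_modules(config: dict, registry: list[Module]) -> list[Module]:
    """Keep only the modules whose modules.<name>.enabled flag is true."""
    flags = config.get("modules", {})
    return [m for m in registry if flags.get(m.name, {}).get("enabled", False)]


def collect_routers(config: dict, registry: list[Module]) -> list[object]:
    """Mirror the factory loop: ask each enabled module for a router, skip None."""
    routers = []
    for module in enabled_modules(config, registry):
        router = module.routes()
        if router is not None:
            routers.append(router)
    return routers


registry = [
    Module("de", routes=lambda: "de-router"),
    Module("cellchat", routes=lambda: "cellchat-router"),
    Module("static_only", routes=lambda: None),
]
config = {"modules": {"de": {"enabled": True}, "static_only": {"enabled": True}}}

mounted = collect_routers(config, registry)
```

Here "cellchat" is never asked for its router because it is not enabled, and "static_only" is enabled but contributes nothing because routes() returns None.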

Frontend — one compiled bundle, config-driven nav

The SPA ships pre-built in the wheel. There is no per-project React build at deploy time — stellar build-frontend is needed only when you want to rebuild against a different base_url or change the SPA source.

At boot the SPA fetches GET /api/config and reads:

  • project.title, branding.primary_color, branding.logo — header
  • theme
  • modules.<name>.enabled — which tabs to render
  • tabs[*] — the frontend_tabs() output from every enabled module, merged + sorted by order

Modules that aren't enabled contribute no tabs and no nav entries. Adding a new module is therefore a server-side change only — the SPA needs no rebuild as long as the module's tab definition points at an existing frontend route (or a stock module-tab component).
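A minimal sketch of how the tabs[*] list might be assembled on the server follows. Only the `order` key and the enabled flag come from the description above; build_tabs, the `id` and `label` fields, and the sample values are hypothetical.

```python
def build_tabs(modules_cfg: dict, module_tabs: dict[str, list[dict]]) -> list[dict]:
    """Merge tab definitions from enabled modules and sort them by `order`."""
    tabs = []
    for name, tab_defs in module_tabs.items():
        if modules_cfg.get(name, {}).get("enabled", False):
            tabs.extend(tab_defs)
    return sorted(tabs, key=lambda tab: tab["order"])


# Hypothetical per-module flags and frontend_tabs() outputs.
modules_cfg = {
    "de": {"enabled": True},
    "hdwgcna": {"enabled": False},
    "milo": {"enabled": True},
}
module_tabs = {
    "de": [{"id": "de", "label": "Differential expression", "order": 20}],
    "hdwgcna": [{"id": "hdwgcna", "label": "Co-expression", "order": 10}],
    "milo": [{"id": "milo", "label": "Neighbourhoods", "order": 5}],
}

tabs = build_tabs(modules_cfg, module_tabs)
```

The disabled module's tab never reaches the payload, so the SPA has nothing to render for it and needs no knowledge of which modules exist.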

Lifecycle summary

CLI command               What it does
stellar init <slug>       Scaffold stellar.yaml + data/raw/ skeleton
stellar ingest            Orchestrator runs core ingest + every enabled module's ingest()
stellar build-frontend    Vite build of the React SPA (only needed for a non-default base_url or SPA source changes)
stellar serve             Uvicorn on :18901 against the app factory
stellar deploy            rsync dist/ + static artefacts to the deploy target
stellar doctor            Validate stellar.yaml against the data directory

The contract is: the same package code serves every project. No per-project Python forks; no per-project frontend forks; differences live in stellar.yaml and the contents of data/.