Recipe — PBMC 3K, end-to-end
A full walkthrough of STELLAR against the 10X Genomics public PBMC 3K
dataset (~2,700 cells × ~13k genes, scanpy-processed). This is the example
shipped under examples/pbmc_3k/ and the same one CI ingests on every PR.
By the end of this recipe you'll have:
- a pbmc_3k.h5ad with HVGs + leiden clustering + UMAP,
- the LanceDB + DuckDB stores under data/,
- uvicorn serving the API on 127.0.0.1:18901,
- the SPA available at http://localhost:18901/pbmc_3k/.
No screenshots in this first pass — they'll land in v0.2 once the SPA's layout has stabilised.
Step 0 — Install
You need:
- stellar-atlas itself (any extras you want);
- [dev] to pull in httpx so the bootstrap.py script can call scanpy's loader cleanly;
- scanpy for sc.datasets.pbmc3k_processed() and the standard preprocessing steps.
A bare pip install stellar-atlas is sufficient if you already have a
processed h5ad — the dev extras above are only for fetching + building
the example data.
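Before building the example data, it can help to confirm that the modules this recipe leans on are importable. The following is a minimal sketch using only the standard library. It checks scanpy and httpx by their usual import names; STELLAR's own import name is not stated in this recipe, so it is deliberately left out, and the helper name `missing_modules` is made up for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules the example-data build relies on, per the list above.
missing = missing_modules(["scanpy", "httpx"])
if missing:
    print("not importable:", ", ".join(missing))
else:
    print("example-data dependencies look importable")
```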
Step 1 — Build the example h5ad
What the bootstrap script does (read the script for the gory details):
- Calls sc.datasets.pbmc3k_processed(), which downloads the cached 10X PBMC 3K dataset (~6 MB).
- Picks 2,000 highly-variable genes.
- Runs sc.pp.neighbors, sc.tl.leiden, sc.tl.umap with the tutorial defaults.
- Rewrites the leiden labels into friendlier names (CD14+ Mono, Memory T, …) and stores them in obs["leiden_label"].
- Stamps every cell with condition="healthy" and donor_id="PBMC_3K" so the STELLAR cohort schema has values to read.
- Writes data/raw/pbmc_3k.h5ad.
You should see ~30 seconds of scanpy output, after which a ~6 MB h5ad sits on disk at data/raw/pbmc_3k.h5ad.
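The clustering, relabelling and stamping steps can be sketched roughly as follows. This is not the shipped bootstrap.py: the cluster-to-name mapping is hypothetical (the real assignment depends on the clustering result), the "Cluster <id>" fallback is an illustrative choice, and the highly-variable-gene selection is left out because this recipe does not say how the script performs it. The function names `friendly_label` and `build_example` are made up for this sketch.

```python
from pathlib import Path

# Hypothetical mapping from leiden cluster id to a readable name.
# The names come from this recipe; the id assignment is an assumption.
FRIENDLY_NAMES = {"0": "Memory T", "1": "CD14+ Mono"}


def friendly_label(cluster_id, names=FRIENDLY_NAMES):
    """Map a leiden cluster id to a readable name, else 'Cluster <id>'."""
    return names.get(str(cluster_id), f"Cluster {cluster_id}")


def build_example(out_path="data/raw/pbmc_3k.h5ad"):
    """Cluster, relabel, stamp cohort columns, and write the h5ad."""
    import scanpy as sc  # lazy import: the helper above works without scanpy

    adata = sc.datasets.pbmc3k_processed()
    sc.pp.neighbors(adata)
    sc.tl.leiden(adata)
    sc.tl.umap(adata)

    # Friendly names for the cohort schema's cell-type column.
    adata.obs["leiden_label"] = [friendly_label(c) for c in adata.obs["leiden"]]
    # Constant cohort columns so the schema has values to read.
    adata.obs["condition"] = "healthy"
    adata.obs["donor_id"] = "PBMC_3K"

    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    adata.write_h5ad(out_path)
    return adata
```

Calling `build_example()` needs network access for the dataset download, and sc.tl.leiden typically needs a Leiden backend such as leidenalg installed alongside scanpy.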
Step 2 — Inspect the example stellar.yaml
The interesting bits:
project:
  slug: pbmc_3k
  title: "PBMC 3K — STELLAR Example"
  base_url: "/pbmc_3k/"
input:
  matrices:
    - { name: primary, path: data/raw/pbmc_3k.h5ad, role: primary }
cohort:
  cell_type_column: leiden_label   # bootstrap stored friendly names here
  condition_column: condition      # bootstrap set every cell to "healthy"
  donor_column: donor_id           # bootstrap set every cell to "PBMC_3K"
umap:
  obsm_key: X_umap
Every optional module is enabled: false for this example — we're
exercising the core path.
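Since the cohort block only names columns, a quick consistency check between the config and the h5ad can catch a mismatch before ingest. The sketch below hardcodes the column names from the excerpt above rather than parsing stellar.yaml, and reads the file with anndata; the helper names `missing_obs_columns` and `check_h5ad` are hypothetical and not part of STELLAR.

```python
# Column names copied from the cohort and umap blocks shown above.
COHORT_COLUMNS = {
    "cell_type_column": "leiden_label",
    "condition_column": "condition",
    "donor_column": "donor_id",
}
UMAP_KEY = "X_umap"


def missing_obs_columns(obs_columns, cohort=COHORT_COLUMNS):
    """Return the configured cohort columns that are absent from obs, sorted."""
    present = set(obs_columns)
    return sorted(col for col in cohort.values() if col not in present)


def check_h5ad(path="data/raw/pbmc_3k.h5ad"):
    """Report missing cohort columns and whether the UMAP embedding exists."""
    import anndata as ad  # lazy import: third-party dependency

    adata = ad.read_h5ad(path)
    return {
        "missing_obs": missing_obs_columns(adata.obs.columns),
        "has_umap": UMAP_KEY in adata.obsm.keys(),
    }
```

For the example file, `check_h5ad()` would be expected to report no missing columns and a present X_umap, assuming the bootstrap step ran as described.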
Step 3 — Ingest
The orchestrator:
- Validates stellar.yaml against the pydantic schema.
- Reads the h5ad, builds the gene-major Lance store at data/lance/expression_primary.lance/.
- Writes per-cell metadata (cells_v view) and per-gene metadata into data/duckdb/atlas.duckdb.
- Bakes the UMAP coordinates to data/static/coords_primary.arrow.
- Iterates the enabled modules' ingest() (none for this example).
The step is idempotent: re-running drops + recreates the stores without leftover state.
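A small post-ingest check can confirm that the three artifacts named above exist. This is a sketch using only the standard library; the paths are the ones listed in this step, while the helper name `missing_outputs` is invented for illustration and says nothing about the contents of the stores.

```python
from pathlib import Path

# Paths relative to data/, taken from the ingest steps listed above.
EXPECTED_OUTPUTS = [
    "lance/expression_primary.lance",
    "duckdb/atlas.duckdb",
    "static/coords_primary.arrow",
]


def missing_outputs(data_dir="data", expected=EXPECTED_OUTPUTS):
    """Return the expected ingest outputs that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

An empty list from `missing_outputs()` after ingest suggests the core path completed; a non-empty list points at the stage that did not write its output.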
Step 4 — Serve
uvicorn comes up on 127.0.0.1:18901. Expect log lines like:
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:18901 (Press CTRL+C to quit)
Leave it running and open another terminal for the next step.
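When scripting this recipe (in CI, for instance) rather than watching the log by hand, one option is to poll until the port accepts TCP connections before sending requests. The sketch below uses only the standard library. The helper name `wait_for_port` and its default timeouts are assumptions, and an open port only shows that something is listening, not that every endpoint is healthy.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=18901, timeout=30.0, interval=0.5):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a server is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```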
Step 5 — Hit the API
Start with the runtime config that the SPA pulls at boot. The response
should contain your project title, branding color, module map (all
enabled: false), and the matrix descriptors.
A coords request for the first 500 cells — the same call the SPA's
UMAP renderer makes at boot, modulo limit:
curl -sX POST http://localhost:18901/api/embedding/coords \
-H 'content-type: application/json' \
-d '{"limit": 500}' \
-o /tmp/coords.arrow
ls -lh /tmp/coords.arrow
The response is Apache Arrow IPC (content-type:
application/vnd.apache.arrow.stream); use pyarrow.ipc.open_stream
in Python to read it.
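The same request and decode can be done from Python. This is a sketch under a few assumptions: the server from Step 4 is running on localhost:18901, pyarrow is installed, and the helper names `build_coords_request` and `fetch_coords` are made up here. The column names in the response are not documented in this recipe, so the sketch only inspects the schema rather than assuming any.

```python
import json
import urllib.request


def build_coords_request(base_url="http://localhost:18901", limit=500):
    """Build the POST request equivalent to the curl call above."""
    body = json.dumps({"limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embedding/coords",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def fetch_coords(base_url="http://localhost:18901", limit=500):
    """POST the request and decode the Arrow IPC stream into a pyarrow Table."""
    import pyarrow.ipc as ipc  # lazy import: third-party dependency

    with urllib.request.urlopen(build_coords_request(base_url, limit)) as resp:
        payload = resp.read()
    # open_stream accepts the raw bytes of an Arrow IPC stream.
    return ipc.open_stream(payload).read_all()


# Usage, with the server running:
#   table = fetch_coords()
#   print(table.schema)
#   print(table.num_rows)
```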
The per-cell-type roster lists every distinct leiden_label with its cell count.
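Conceptually, the roster is a count of cells per label. The sketch below shows that computation on a plain list of labels with the standard library; it is not the API's implementation, and the ordering (most frequent first, ties alphabetical) and the helper name `cell_type_roster` are illustrative assumptions.

```python
from collections import Counter


def cell_type_roster(labels):
    """Return (label, count) pairs, most frequent first, ties alphabetical."""
    counts = Counter(labels)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Toy input using label names from this recipe.
roster = cell_type_roster(["Memory T", "CD14+ Mono", "Memory T"])
```

Comparing such a local count (computed from obs["leiden_label"] in the h5ad) against the API response is one way to sanity-check the ingest.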
Step 6 — Open the SPA
In a browser:
http://localhost:18901/pbmc_3k/
You should see the STELLAR header (teal accent, project title from
stellar.yaml), the UMAP coloured by cell type, and the Intro +
UMAP tabs. Other tabs (DE, Network, …) are absent — they're gated
on the module being enabled in stellar.yaml.
Step 7 — Tear down
Ctrl-C in the uvicorn terminal stops the server. Re-running
stellar serve picks up the exact same state — nothing on disk is
ephemeral aside from the Python process itself.
To start fresh:
rm -rf data/lance data/duckdb data/static data/parquet
stellar ingest --config examples/pbmc_3k/stellar.yaml