Milo — neighborhood differential abundance¶
Precomputed Milo results served as an interactive viewer: a beeswarm-on-UMAP of every neighbourhood coloured by log2 fold change, plus a sortable table of the most-shifted neighbourhoods. The results are also callable as tools from the copilot when both the milo and copilot modules are enabled.
Milo (Dann et al., Nat. Biotechnol. 2022) tests differential abundance on overlapping cell neighbourhoods rather than discrete clusters, so it catches shifts that don't line up neatly with a cluster boundary. The graph construction and DA testing are done upstream with milopy or miloR; STELLAR only consumes the resulting per-neighbourhood tables.
| extras_key | milo |
| config_key | milo |
| install | pip install 'stellar-atlas[milo]' |
| frontend tab | Milo |
Enable¶
Input format¶
Three parquet files under source_dir/.
milo_runs.parquet — one row per Milo run¶
A "run" is one neighborhood-DA test — a specific cohort × contrast. You'll typically have one run per condition pair you care about.
| column | type | required | notes |
|---|---|---|---|
| run_id | string | yes | Unique handle; short slug or SHA. |
| label | string | yes | Human-readable, e.g. "Disease vs Control". |
| comparison | string | yes | Contrast string used in the test, e.g. "disease-control". |
| n_neighborhoods | int64 | no | Hint only; STELLAR re-derives the count from the neighborhood table. |
milo_neighborhoods.parquet — one row per neighborhood per run¶
| column | type | required | notes |
|---|---|---|---|
| run_id | string | yes | FK → milo_runs.run_id. |
| nh_id | int64 | yes | Integer id, unique within a run. |
| n_cells | int64 | yes | Cells in this neighborhood. |
| x | float32 | yes | UMAP x-centroid for plotting. |
| y | float32 | yes | UMAP y-centroid for plotting. |
milo_da_results.parquet — DA stats per neighborhood per run¶
| column | type | required | notes |
|---|---|---|---|
| run_id | string | yes | FK → milo_runs.run_id. |
| nh_id | int64 | yes | FK → milo_neighborhoods (run_id, nh_id). |
| log2fc | float32 | yes | DA log2 fold change for this neighborhood. |
| pval | float64 | yes | Raw p-value. |
| spatial_fdr | float64 | yes | Spatial-FDR-adjusted q-value. |
Note
Extra columns are ignored — feel free to keep upstream artefacts
like SpatialFDR, NhoodGroup, etc. alongside.
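Before pointing STELLAR at the files, it can help to check the three frames against the tables above. The following is a minimal sketch and not part of STELLAR: `check_milo_frames` is a hypothetical helper that verifies the required columns, key uniqueness, and the two foreign-key relationships. It does not check dtypes.

```python
import pandas as pd

# Required columns, taken from the three tables above.
REQUIRED = {
    "runs": ["run_id", "label", "comparison"],
    "nhoods": ["run_id", "nh_id", "n_cells", "x", "y"],
    "da": ["run_id", "nh_id", "log2fc", "pval", "spatial_fdr"],
}


def check_milo_frames(runs, nhoods, da):
    """Return a list of problems; an empty list means the frames look consistent."""
    problems = []
    frames = {"runs": runs, "nhoods": nhoods, "da": da}
    for name, frame in frames.items():
        missing = [c for c in REQUIRED[name] if c not in frame.columns]
        if missing:
            problems.append(f"{name}: missing columns {missing}")
    if problems:
        return problems  # the key checks below need the columns to exist

    if runs["run_id"].duplicated().any():
        problems.append("runs: run_id is not unique")
    if nhoods.duplicated(["run_id", "nh_id"]).any():
        problems.append("nhoods: (run_id, nh_id) is not unique")

    # FK: every neighborhood row must point at a known run.
    unknown_runs = set(nhoods["run_id"]) - set(runs["run_id"])
    if unknown_runs:
        problems.append(f"nhoods: unknown run_id values {sorted(unknown_runs)}")

    # FK: every DA row must point at a known (run_id, nh_id) pair.
    nh_keys = set(zip(nhoods["run_id"], nhoods["nh_id"]))
    da_keys = set(zip(da["run_id"], da["nh_id"]))
    orphans = da_keys - nh_keys
    if orphans:
        problems.append(f"da: {len(orphans)} keys reference missing neighborhoods")
    return problems
```

Calling it on the frames just before they are written to parquet should return an empty list.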
Producing the input¶
A milopy recipe; the equivalent miloR pipeline (buildGraph →
makeNhoods → testNhoods) produces the same per-neighborhood table.
import anndata as ad
import milopy
import milopy.core as milo
import numpy as np
import pandas as pd
adata = ad.read_h5ad("processed.h5ad")
# 1. Build neighborhoods on the existing kNN graph (uses adata.obsp).
milo.make_nhoods(adata, prop=0.1)
milo.count_nhoods(adata, sample_col="donor_id")
# 2. Run the GLM contrast — anything DESeq2-style works.
milo.DA_nhoods(adata,
               design="~ 0 + condition",
               model_contrasts="conditiondisease-conditioncontrol")
nh_adata = adata.uns["nhood_adata"] # one obs row per neighborhood
res = nh_adata.obs.copy() # logFC, PValue, SpatialFDR, …
# 3. Reshape to STELLAR's canonical layout.
RUN_ID = "disease_vs_control"
runs = pd.DataFrame([{
    "run_id": RUN_ID,
    "label": "Disease vs Control",
    "comparison": "disease-control",
    "n_neighborhoods": len(res),
}])
# Neighborhood centroid = mean UMAP coordinate over all member cells of the nhood.
# adata.obsm["nhoods"] is the cells × neighborhoods membership matrix.
idx_cells = adata.obsm["nhoods"].toarray().astype(bool)  # (n_cells, n_nhoods)
umap = adata.obsm["X_umap"]
centroids = np.vstack([umap[idx_cells[:, j]].mean(axis=0)
                       for j in range(idx_cells.shape[1])])
nhoods = pd.DataFrame({
    "run_id": RUN_ID,
    "nh_id": np.arange(len(res), dtype=np.int64),
    "n_cells": idx_cells.sum(axis=0).astype(np.int64),
    "x": centroids[:, 0].astype(np.float32),
    "y": centroids[:, 1].astype(np.float32),
})
da_results = pd.DataFrame({
    "run_id": RUN_ID,
    "nh_id": np.arange(len(res), dtype=np.int64),
    "log2fc": res["logFC"].to_numpy(np.float32),
    "pval": res["PValue"].to_numpy(np.float64),
    "spatial_fdr": res["SpatialFDR"].to_numpy(np.float64),
})
runs.to_parquet("data/external/milo/milo_runs.parquet", index=False)
nhoods.to_parquet("data/external/milo/milo_neighborhoods.parquet", index=False)
da_results.to_parquet("data/external/milo/milo_da_results.parquet", index=False)
Need multiple contrasts? Append rows with new run_id values to each
of the three frames before writing.
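One way to do that is sketched below. It assumes each contrast has already been reshaped into its own (runs, nhoods, da_results) triple as in the recipe above; `stack_runs` is a hypothetical helper name, not a STELLAR function.

```python
import pandas as pd


def stack_runs(per_run):
    """Concatenate per-contrast (runs, nhoods, da_results) triples into three frames.

    `per_run` is a list of 3-tuples of DataFrames, one tuple per contrast.
    """
    runs = pd.concat([t[0] for t in per_run], ignore_index=True)
    nhoods = pd.concat([t[1] for t in per_run], ignore_index=True)
    da_results = pd.concat([t[2] for t in per_run], ignore_index=True)
    # run_id must stay unique across contrasts, otherwise rows from
    # different tests would be indistinguishable.
    if runs["run_id"].duplicated().any():
        raise ValueError("duplicate run_id across contrasts")
    return runs, nhoods, da_results
```

Because nh_id only has to be unique within a run, each contrast can keep numbering its neighborhoods from 0.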
API surface¶
| route | what |
|---|---|
| GET /api/milo/runs | list runs, with live neighborhood counts |
| POST /api/milo/da | body {run_id, top_n?, fdr_max?, log2fc_min?}; Arrow IPC of (nh_id, n_cells, x, y, log2fc, pval, spatial_fdr) ordered by \|log2fc\| DESC |
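To make the request parameters concrete, here is a minimal in-memory sketch of the filtering and ordering that the POST route describes. It is not STELLAR's implementation. In particular, treating log2fc_min as a threshold on the absolute fold change and fdr_max as an inclusive cap on spatial_fdr are assumptions, and `select_da_rows` is a hypothetical name.

```python
def select_da_rows(rows, top_n=None, fdr_max=None, log2fc_min=None):
    """Filter per-neighborhood DA rows and order them by |log2fc|, largest first.

    `rows` is a list of dicts with at least `log2fc` and `spatial_fdr` keys.
    """
    kept = [
        r for r in rows
        if (fdr_max is None or r["spatial_fdr"] <= fdr_max)
        and (log2fc_min is None or abs(r["log2fc"]) >= log2fc_min)
    ]
    kept.sort(key=lambda r: abs(r["log2fc"]), reverse=True)
    return kept if top_n is None else kept[:top_n]


rows = [
    {"nh_id": 0, "log2fc": 0.4, "spatial_fdr": 0.30},
    {"nh_id": 1, "log2fc": -2.1, "spatial_fdr": 0.01},
    {"nh_id": 2, "log2fc": 1.3, "spatial_fdr": 0.08},
]
top = select_da_rows(rows, top_n=2, fdr_max=0.1)
print([r["nh_id"] for r in top])  # [1, 2]
```

Ranking on the absolute value means strongly depleted neighborhoods (negative log2fc) surface alongside strongly enriched ones.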
Example calls¶
List the runs in this atlas:
curl -s http://localhost:18901/api/milo/runs | python -m json.tool
# {"runs": [{"run_id": "disease_vs_control",
# "label": "Disease vs Control",
# "comparison": "disease-control",
# "n_neighborhoods": 1247}]}
Pull the 25 most-shifted neighborhoods with spatial FDR ≤ 0.1 (the response is Arrow IPC binary):
curl -sX POST http://localhost:18901/api/milo/da \
-H 'content-type: application/json' \
-d '{"run_id": "disease_vs_control",
"top_n": 25,
"fdr_max": 0.1}' \
-o /tmp/milo_da.arrow
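The same call can be made from Python. The sketch below uses only the standard library to build the request and imports pyarrow lazily to decode the response. `build_da_request` and `fetch_da` are hypothetical helper names, and reading the payload with the IPC streaming reader is an assumption; if the server writes the IPC file format instead, `pyarrow.ipc.open_file` would be the reader to use.

```python
import json
import urllib.request

BASE_URL = "http://localhost:18901"  # same host and port as the curl examples


def build_da_request(run_id, top_n=None, fdr_max=None, base_url=BASE_URL):
    """Build (but do not send) the POST /api/milo/da request."""
    body = {"run_id": run_id}
    if top_n is not None:
        body["top_n"] = top_n
    if fdr_max is not None:
        body["fdr_max"] = fdr_max
    return urllib.request.Request(
        f"{base_url}/api/milo/da",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def fetch_da(run_id, **filters):
    """Send the request and decode the Arrow payload into a pandas DataFrame."""
    # Imported here so that building requests works without pyarrow installed.
    import pyarrow.ipc as ipc

    with urllib.request.urlopen(build_da_request(run_id, **filters)) as resp:
        payload = resp.read()
    # Assumption: the payload is in the Arrow IPC streaming format.
    return ipc.open_stream(payload).read_all().to_pandas()
```

With a server running, `fetch_da("disease_vs_control", top_n=25, fdr_max=0.1)` would mirror the curl call above.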
Copilot tools¶
When both milo and copilot are enabled the module contributes two
tools to the Claude agent loop.
list_milo_runs¶
{
  "name": "list_milo_runs",
  "description": "List Milo neighborhood-DA runs available in this atlas, with neighborhood counts and comparison labels.",
  "input_schema": {"type": "object", "properties": {}, "required": []}
}
get_milo_da¶
{
  "name": "get_milo_da",
  "description": "Pull top neighborhoods for one Milo run, ranked by |log2fc|. Optional spatial-FDR cap.",
  "input_schema": {
    "type": "object",
    "properties": {
      "run_id": {"type": "string"},
      "top_n": {"type": "integer", "default": 25},
      "fdr_max": {"type": "number", "default": 0.1}
    },
    "required": ["run_id"]
  }
}
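To illustrate how a get_milo_da tool input relates to its schema, the sketch below checks the required field, rejects unknown fields, and fills in the declared defaults. How STELLAR actually dispatches tool calls is not shown on this page, so `tool_input_to_body` is a hypothetical helper and applying defaults on the caller's side is an assumption.

```python
# Copy of the get_milo_da input_schema shown above.
GET_MILO_DA_SCHEMA = {
    "type": "object",
    "properties": {
        "run_id": {"type": "string"},
        "top_n": {"type": "integer", "default": 25},
        "fdr_max": {"type": "number", "default": 0.1},
    },
    "required": ["run_id"],
}


def tool_input_to_body(tool_input, schema=GET_MILO_DA_SCHEMA):
    """Check required keys, reject unknown ones, and apply schema defaults."""
    props = schema["properties"]
    missing = [k for k in schema["required"] if k not in tool_input]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    unknown = [k for k in tool_input if k not in props]
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    # Start from the declared defaults, then let explicit values override them.
    body = {k: spec["default"] for k, spec in props.items() if "default" in spec}
    body.update(tool_input)
    return body


print(tool_input_to_body({"run_id": "disease_vs_control", "fdr_max": 0.05}))
# {'top_n': 25, 'fdr_max': 0.05, 'run_id': 'disease_vs_control'}
```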
System prompt fragment¶
Milo neighborhood-DA results are precomputed. A run is a single comparison; discover run_ids with
list_milo_runs, then pull DA hits with get_milo_da. log2fc + spatial_fdr are per-neighborhood (not per-cluster).
Implementation at stellar/modules/milo/__init__.py — see
Extending for the pattern.
Frontend tab¶
The Milo tab appears in the SPA nav when this module is enabled:
run picker + threshold sliders along the top; a Plotly scattergl plot
of each neighbourhood's UMAP centroid (sized by n_cells, coloured by
log2fc on a diverging palette) takes the main pane; the most-shifted
neighbourhoods land in a sortable table below.