
Milo — neighborhood differential abundance

Precomputed Milo results served as an interactive viewer: a beeswarm-on-UMAP of every neighborhood coloured by log2 fold change, plus a sortable table of the most-shifted neighborhoods. Callable as a tool from the copilot when both the milo and copilot modules are enabled.

Milo (Dann et al., Nat. Biotechnol. 2022) tests differential abundance on overlapping cell neighborhoods rather than discrete clusters, so it catches shifts that don't line up neatly with a cluster boundary. Graph construction and DA testing are done upstream with milopy or miloR; STELLAR only consumes the per-neighborhood table.

extras_key: milo
config_key: milo
install: pip install 'stellar-atlas[milo]'
frontend tab: Milo

Enable

modules:
  milo:
    enabled: true
    source_dir: data/external/milo      # relative to project root

Input format

Three parquet files under source_dir/.
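
Before starting the server, it can help to confirm that all three files are in place. The sketch below is not part of STELLAR: `missing_milo_inputs` is a hypothetical helper name, and the directory passed in is whatever you configured as source_dir.

```python
from pathlib import Path

# File names as documented for the milo module.
EXPECTED_FILES = (
    "milo_runs.parquet",
    "milo_neighborhoods.parquet",
    "milo_da_results.parquet",
)


def missing_milo_inputs(source_dir):
    """Return the expected parquet file names that are absent from source_dir."""
    root = Path(source_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Path taken from the example config; adjust to your own source_dir.
    missing = missing_milo_inputs("data/external/milo")
    print("missing:", missing or "none")
```

An empty list means every expected file was found; it says nothing about whether the contents match the schemas.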

milo_runs.parquet — one row per Milo run

A "run" is one neighborhood-DA test — a specific cohort × contrast. You'll typically have one run per condition pair you care about.

| column | type | required | notes |
| --- | --- | --- | --- |
| run_id | string | yes | Unique handle; short slug or sha. |
| label | string | yes | Human-readable, e.g. "Disease vs Control". |
| comparison | string | yes | Contrast string used in the test, e.g. "disease-control". |
| n_neighborhoods | int64 | no | Hint only; STELLAR re-derives it from the neighborhood table. |

milo_neighborhoods.parquet — one row per neighborhood per run

| column | type | required | notes |
| --- | --- | --- | --- |
| run_id | string | yes | FK → milo_runs.run_id. |
| nh_id | int64 | yes | Integer id, unique within a run. |
| n_cells | int64 | yes | Cells in this neighborhood. |
| x | float32 | yes | UMAP x-centroid for plotting. |
| y | float32 | yes | UMAP y-centroid for plotting. |

milo_da_results.parquet — DA stats per neighborhood per run

| column | type | required | notes |
| --- | --- | --- | --- |
| run_id | string | yes | FK → milo_runs.run_id. |
| nh_id | int64 | yes | FK → milo_neighborhoods (run_id, nh_id). |
| log2fc | float32 | yes | DA log2 fold change for this neighborhood. |
| pval | float64 | yes | Raw p-value. |
| spatial_fdr | float64 | yes | Spatial-FDR-adjusted q-value. |

Note

Extra columns are ignored — feel free to keep upstream artefacts like SpatialFDR, NhoodGroup, etc. alongside.
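
The required columns and foreign keys listed above can be checked before the files are handed to STELLAR. The following is a minimal sketch using pandas; `check_milo_tables` is a hypothetical helper, and whether STELLAR itself enforces these constraints at load time is not something this page states.

```python
import pandas as pd

# Required columns per table, as listed in the schema tables above.
REQUIRED = {
    "runs": ["run_id", "label", "comparison"],
    "nhoods": ["run_id", "nh_id", "n_cells", "x", "y"],
    "da": ["run_id", "nh_id", "log2fc", "pval", "spatial_fdr"],
}


def check_milo_tables(runs, nhoods, da):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for name, frame in (("runs", runs), ("nhoods", nhoods), ("da", da)):
        absent = [c for c in REQUIRED[name] if c not in frame.columns]
        if absent:
            problems.append(f"{name}: missing columns {absent}")
    if problems:
        return problems  # the key checks below need these columns to exist

    if runs["run_id"].duplicated().any():
        problems.append("runs: run_id is not unique")
    if nhoods.duplicated(["run_id", "nh_id"]).any():
        problems.append("nhoods: nh_id is not unique within a run")
    if not nhoods["run_id"].isin(runs["run_id"]).all():
        problems.append("nhoods: run_id not present in runs")

    # Every DA row must point at an existing (run_id, nh_id) pair.
    known = pd.MultiIndex.from_frame(nhoods[["run_id", "nh_id"]])
    seen = pd.MultiIndex.from_frame(da[["run_id", "nh_id"]])
    if not seen.isin(known).all():
        problems.append("da: (run_id, nh_id) not present in nhoods")
    return problems
```

In practice the three arguments would come from `pd.read_parquet` on the files in source_dir. Extra columns pass through untouched, in line with the note above.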

Producing the input

A milopy recipe; the equivalent miloR pipeline (buildGraph → makeNhoods → testNhoods) produces the same per-neighborhood table.

import anndata as ad
import milopy
import milopy.core as milo
import numpy as np
import pandas as pd

adata = ad.read_h5ad("processed.h5ad")

# 1. Build neighborhoods on the existing kNN graph (uses adata.obsp).
milo.make_nhoods(adata, prop=0.1)
milo.count_nhoods(adata, sample_col="donor_id")

# 2. Run the GLM contrast — anything DESeq2-style works.
milo.DA_nhoods(adata,
               design="~ 0 + condition",
               model_contrasts="conditiondisease-conditioncontrol")

nh_adata = adata.uns["nhood_adata"]    # one obs row per neighborhood
res      = nh_adata.obs.copy()         # logFC, PValue, SpatialFDR, …

# 3. Reshape to STELLAR's canonical layout.
RUN_ID = "disease_vs_control"

runs = pd.DataFrame([{
    "run_id":          RUN_ID,
    "label":           "Disease vs Control",
    "comparison":      "disease-control",
    "n_neighborhoods": len(res),
}])

# Neighborhood centroid = mean UMAP position over each neighborhood's member cells
# (idx_cells below is a boolean cell-by-neighborhood membership mask).
idx_cells = adata.obsm["nhoods"].toarray().astype(bool)   # (n_cells, n_nhoods)
umap      = adata.obsm["X_umap"]
centroids = np.vstack([umap[idx_cells[:, j]].mean(axis=0)
                       for j in range(idx_cells.shape[1])])

nhoods = pd.DataFrame({
    "run_id":  RUN_ID,
    "nh_id":   np.arange(len(res), dtype=np.int64),
    "n_cells": idx_cells.sum(axis=0).astype(np.int64),
    "x":       centroids[:, 0].astype(np.float32),
    "y":       centroids[:, 1].astype(np.float32),
})

da_results = pd.DataFrame({
    "run_id":      RUN_ID,
    "nh_id":       np.arange(len(res), dtype=np.int64),
    "log2fc":      res["logFC"].to_numpy(np.float32),
    "pval":        res["PValue"].to_numpy(np.float64),
    "spatial_fdr": res["SpatialFDR"].to_numpy(np.float64),
})

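# Create the output directory if it does not exist yet, then write the three tables.
import os
os.makedirs("data/external/milo", exist_ok=True)
runs.to_parquet("data/external/milo/milo_runs.parquet", index=False)
nhoods.to_parquet("data/external/milo/milo_neighborhoods.parquet", index=False)
da_results.to_parquet("data/external/milo/milo_da_results.parquet", index=False)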

Need multiple contrasts? Append rows with new run_id values to each of the three frames before writing.
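
One way to do the appending is to build the three frames once per contrast and concatenate them. The sketch below assumes that approach; `stack_runs` and `toy_run` are hypothetical names, and `toy_run` only stands in for the frames the recipe above produces.

```python
import pandas as pd


def stack_runs(per_run):
    """Concatenate (runs, nhoods, da_results) triples, one triple per contrast."""
    runs, nhoods, da = (pd.concat(frames, ignore_index=True)
                        for frames in zip(*per_run))
    if runs["run_id"].duplicated().any():
        raise ValueError("run_id values must be unique across contrasts")
    return runs, nhoods, da


def toy_run(run_id, n=2):
    """Tiny stand-in for the three frames built by the recipe above."""
    ids = list(range(n))
    runs = pd.DataFrame([{"run_id": run_id, "label": run_id, "comparison": run_id}])
    nhoods = pd.DataFrame({"run_id": run_id, "nh_id": ids,
                           "n_cells": 10, "x": 0.0, "y": 0.0})
    da = pd.DataFrame({"run_id": run_id, "nh_id": ids,
                       "log2fc": 0.0, "pval": 1.0, "spatial_fdr": 1.0})
    return runs, nhoods, da


runs, nhoods, da_results = stack_runs([toy_run("a_vs_b"), toy_run("a_vs_c", n=3)])
```

Because nh_id only has to be unique within a run, each contrast can restart its ids at 0.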

API surface

route what
GET /api/milo/runs list runs, with live neighborhood counts
POST /api/milo/da body {run_id, top_n?, fdr_max?, log2fc_min?}; Arrow IPC of (nh_id, n_cells, x, y, log2fc, pval, spatial_fdr) ordered by |log2fc| DESC
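
To make the filtering and ordering of the DA route concrete, here is an offline approximation in pandas that works on the neighborhood and DA tables directly. It is a sketch, not the server's implementation: `top_da` is a hypothetical name, the defaults are illustrative, and applying log2fc_min to the absolute fold change is an assumption.

```python
import pandas as pd


def top_da(nhoods, da, run_id, top_n=25, fdr_max=None, log2fc_min=None):
    """Join DA stats to neighborhoods for one run and rank by |log2fc| descending."""
    hits = da[da["run_id"] == run_id].merge(
        nhoods[nhoods["run_id"] == run_id], on=["run_id", "nh_id"])
    if fdr_max is not None:
        hits = hits[hits["spatial_fdr"] <= fdr_max]
    if log2fc_min is not None:
        # Assumption: the threshold applies to the absolute fold change.
        hits = hits[hits["log2fc"].abs() >= log2fc_min]
    # Largest absolute fold change first; stable sort keeps ties in input order.
    order = hits["log2fc"].abs().sort_values(ascending=False, kind="stable").index
    cols = ["nh_id", "n_cells", "x", "y", "log2fc", "pval", "spatial_fdr"]
    return hits.loc[order, cols].head(top_n).reset_index(drop=True)
```

The returned columns mirror the tuple listed for the route, so the result can be compared against a decoded API response.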

Example calls

List the runs in this atlas:

curl -s http://localhost:18901/api/milo/runs | python -m json.tool
# {"runs": [{"run_id":          "disease_vs_control",
#            "label":           "Disease vs Control",
#            "comparison":      "disease-control",
#            "n_neighborhoods": 1247}]}

Pull the top 25 most-shifted neighborhoods with FDR ≤ 0.1 (the response is Arrow IPC binary):

curl -sX POST http://localhost:18901/api/milo/da \
     -H 'content-type: application/json' \
     -d '{"run_id":  "disease_vs_control",
          "top_n":   25,
          "fdr_max": 0.1}' \
     -o /tmp/milo_da.arrow
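
The same call can be made from Python. This is a sketch under a few assumptions: `build_da_request` and `fetch_da` are hypothetical helpers, pyarrow is installed for decoding, and the code checks for the Arrow file-format magic bytes because this page does not say whether the server emits the IPC stream or file format.

```python
import json
import urllib.request

BASE_URL = "http://localhost:18901"  # same host and port as the curl examples


def build_da_request(run_id, top_n=None, fdr_max=None, log2fc_min=None):
    """Build the POST request for /api/milo/da, omitting unset optional fields."""
    body = {"run_id": run_id, "top_n": top_n,
            "fdr_max": fdr_max, "log2fc_min": log2fc_min}
    body = {k: v for k, v in body.items() if v is not None}
    return urllib.request.Request(
        f"{BASE_URL}/api/milo/da",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_da(run_id, **filters):
    """Send the request and decode the Arrow IPC payload into a pandas DataFrame."""
    import pyarrow as pa  # imported lazily; only needed for decoding

    with urllib.request.urlopen(build_da_request(run_id, **filters)) as resp:
        payload = resp.read()
    source = pa.BufferReader(payload)
    if payload[:6] == b"ARROW1":
        # The Arrow IPC file format starts with this magic string.
        reader = pa.ipc.open_file(source)
    else:
        # Otherwise assume the IPC streaming format.
        reader = pa.ipc.open_stream(source)
    return reader.read_all().to_pandas()
```

With the server running, `fetch_da("disease_vs_control", top_n=25, fdr_max=0.1)` would correspond to the curl call above.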

Copilot tools

When both milo and copilot are enabled, the module contributes two tools to the Claude agent loop.

list_milo_runs

{
  "name": "list_milo_runs",
  "description": "List Milo neighborhood-DA runs available in this atlas, with neighborhood counts and comparison labels.",
  "input_schema": {"type": "object", "properties": {}, "required": []}
}

get_milo_da

{
  "name": "get_milo_da",
  "description": "Pull top neighborhoods for one Milo run, ranked by |log2fc|. Optional spatial-FDR cap.",
  "input_schema": {
    "type": "object",
    "properties": {
      "run_id":  {"type": "string"},
      "top_n":   {"type": "integer", "default": 25},
      "fdr_max": {"type": "number",  "default": 0.1}
    },
    "required": ["run_id"]
  }
}
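
To illustrate how the schema's `required` and `default` entries play out, the sketch below resolves a tool input such as `{"run_id": "disease_vs_control"}` into a full argument set. `resolve_tool_input` is a hypothetical helper written for this example; how STELLAR's own tool handler applies defaults is not described here.

```python
# Copied from the get_milo_da tool definition above.
GET_MILO_DA_SCHEMA = {
    "type": "object",
    "properties": {
        "run_id": {"type": "string"},
        "top_n": {"type": "integer", "default": 25},
        "fdr_max": {"type": "number", "default": 0.1},
    },
    "required": ["run_id"],
}


def resolve_tool_input(schema, tool_input):
    """Check required keys and fill in schema defaults for omitted optional keys."""
    missing = [k for k in schema["required"] if k not in tool_input]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    resolved = {k: spec["default"]
                for k, spec in schema["properties"].items() if "default" in spec}
    resolved.update(tool_input)  # explicit arguments override defaults
    return resolved


args = resolve_tool_input(GET_MILO_DA_SCHEMA, {"run_id": "disease_vs_control"})
```

Under these assumptions, a call that supplies only run_id would ask for the top 25 neighborhoods with a spatial-FDR cap of 0.1.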

System prompt fragment

Milo neighborhood-DA results are precomputed. A run is a single comparison; discover run_ids with list_milo_runs, then pull DA hits with get_milo_da. log2fc + spatial_fdr are per-neighborhood (not per-cluster).

Implementation at stellar/modules/milo/__init__.py — see Extending for the pattern.

Frontend tab

The Milo tab appears in the SPA nav when this module is enabled. A run picker and threshold sliders sit along the top; a Plotly scattergl plot of each neighborhood's UMAP centroid (sized by n_cells, coloured by log2fc on a diverging palette) takes the main pane; and the most-shifted neighborhoods land in a sortable table below.