Milo: neighborhood differential abundance

Precomputed Milo results served as an interactive viewer: a beeswarm-on-UMAP of every neighborhood colored by log2 fold change, plus a sortable table of the most-shifted neighborhoods. The results are also tool-callable from the copilot when both the milo and copilot modules are enabled.

Milo (Dann et al., Nat. Biotechnol. 2022) tests differential abundance on overlapping cell neighborhoods rather than discrete clusters, so it catches shifts that don't line up neatly with a cluster boundary. Graph construction and DA testing are done upstream with milopy or miloR; STELLAR only consumes the per-neighborhood table.


extras_key	`milo`
config_key	`milo`
install	`pip install 'stellar-atlas[milo]'`
frontend tab	`Milo`

Enable

modules:
  milo:
    enabled: true
    source_dir: data/external/milo      # relative to project root

Input format

Three Parquet files under source_dir/.

`milo_runs.parquet`: one row per Milo run

A "run" is one neighborhood-DA test — a specific cohort × contrast. You'll typically have one run per condition pair you care about.

column	type	required	notes
`run_id`	string	yes	Unique handle; short slug or sha.
`label`	string	yes	Human-readable, e.g. `"Disease vs Control"`.
`comparison`	string	yes	Contrast string used in the test, e.g. `"disease-control"`.
`n_neighborhoods`	int64	no	Hint only — STELLAR re-derives from the neighborhood table.

`milo_neighborhoods.parquet`: one row per neighborhood per run

column	type	required	notes
`run_id`	string	yes	FK → `milo_runs.run_id`.
`nh_id`	int64	yes	Integer id, unique within a run.
`n_cells`	int64	yes	Cells in this neighborhood.
`x`	float32	yes	UMAP x-centroid for plotting.
`y`	float32	yes	UMAP y-centroid for plotting.

`milo_da_results.parquet`: DA stats per neighborhood per run

column	type	required	notes
`run_id`	string	yes	FK → `milo_runs.run_id`.
`nh_id`	int64	yes	FK → `milo_neighborhoods` (run_id, nh_id).
`log2fc`	float32	yes	DA log2 fold change for this neighborhood.
`pval`	float64	yes	Raw p-value.
`spatial_fdr`	float64	yes	Spatial-FDR-adjusted q-value.

Note

Extra columns are ignored — feel free to keep upstream artefacts like SpatialFDR, NhoodGroup, etc. alongside.
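
Before pointing STELLAR at the files, it can help to check them against the tables above. The following is a minimal sketch, assuming only pandas; `check_milo_frames` and `REQUIRED` are hypothetical names, not part of STELLAR, and the checks cover just what the tables state (required columns, unique keys, and the foreign-key relationships).

```python
import pandas as pd

# Required columns per file, copied from the tables above.
REQUIRED = {
    "runs": ["run_id", "label", "comparison"],
    "nhoods": ["run_id", "nh_id", "n_cells", "x", "y"],
    "da": ["run_id", "nh_id", "log2fc", "pval", "spatial_fdr"],
}


def check_milo_frames(runs, nhoods, da):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, df in (("runs", runs), ("nhoods", nhoods), ("da", da)):
        missing = [c for c in REQUIRED[name] if c not in df.columns]
        if missing:
            problems.append(f"{name}: missing columns {missing}")
    if problems:
        return problems  # the key checks below need these columns to exist

    if runs["run_id"].duplicated().any():
        problems.append("runs: run_id is not unique")
    if nhoods.duplicated(["run_id", "nh_id"]).any():
        problems.append("nhoods: (run_id, nh_id) is not unique")
    if not nhoods["run_id"].isin(runs["run_id"]).all():
        problems.append("nhoods: run_id not found in runs")

    # Every DA row must point at an existing neighborhood of the same run.
    keys = nhoods[["run_id", "nh_id"]].drop_duplicates()
    merged = da.merge(keys, on=["run_id", "nh_id"], how="left", indicator=True)
    if (merged["_merge"] == "left_only").any():
        problems.append("da: (run_id, nh_id) not found in nhoods")
    return problems
```

Load each file with `pd.read_parquet` and pass the three frames in; extra columns are left alone, matching the note above.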

Producing the input

A milopy recipe; the equivalent miloR pipeline (buildGraph → makeNhoods → countCells → testNhoods) produces the same per-neighborhood table.

import anndata as ad
import milopy
import milopy.core as milo
import numpy as np
import pandas as pd

adata = ad.read_h5ad("processed.h5ad")

# 1. Build neighborhoods on the existing kNN graph (uses adata.obsp).
milo.make_nhoods(adata, prop=0.1)
milo.count_nhoods(adata, sample_col="donor_id")

# 2. Run the GLM contrast; milopy fits it with edgeR (via rpy2), so use an R-style design formula.
milo.DA_nhoods(adata,
               design="~ 0 + condition",
               model_contrasts="conditiondisease-conditioncontrol")

nh_adata = adata.uns["nhood_adata"]    # one obs row per neighborhood
res      = nh_adata.obs.copy()         # logFC, PValue, SpatialFDR, …

# 3. Reshape to STELLAR's canonical layout.
RUN_ID = "disease_vs_control"

runs = pd.DataFrame([{
    "run_id":          RUN_ID,
    "label":           "Disease vs Control",
    "comparison":      "disease-control",
    "n_neighborhoods": len(res),
}])

# Neighborhood centroid = mean UMAP over each neighborhood's member cells.
# obsm["nhoods"] is a sparse (n_cells, n_nhoods) membership matrix; binarize
# it and keep it sparse rather than densifying with .toarray().
member    = (adata.obsm["nhoods"] != 0).astype(np.float64)
umap      = np.asarray(adata.obsm["X_umap"])
nh_sizes  = np.asarray(member.sum(axis=0)).ravel()        # cells per nhood
centroids = np.asarray(member.T @ umap) / nh_sizes[:, None]

nhoods = pd.DataFrame({
    "run_id":  RUN_ID,
    "nh_id":   np.arange(len(res), dtype=np.int64),
    "n_cells": nh_sizes.astype(np.int64),
    "x":       centroids[:, 0].astype(np.float32),
    "y":       centroids[:, 1].astype(np.float32),
})

da_results = pd.DataFrame({
    "run_id":      RUN_ID,
    "nh_id":       np.arange(len(res), dtype=np.int64),
    "log2fc":      res["logFC"].to_numpy(np.float32),
    "pval":        res["PValue"].to_numpy(np.float64),
    "spatial_fdr": res["SpatialFDR"].to_numpy(np.float64),
})

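# to_parquet does not create missing directories, so make sure source_dir exists.
from pathlib import Path
out = Path("data/external/milo")
out.mkdir(parents=True, exist_ok=True)
runs.to_parquet(out / "milo_runs.parquet", index=False)
nhoods.to_parquet(out / "milo_neighborhoods.parquet", index=False)
da_results.to_parquet(out / "milo_da_results.parquet", index=False)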

Need multiple contrasts? Append rows with new run_id values to each of the three frames before writing.
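
One way to do the appending is sketched below, assuming each contrast has already been reshaped into its own `(runs, nhoods, da_results)` triple as in the recipe. `stack_runs` is a hypothetical helper, not part of STELLAR or milopy.

```python
import pandas as pd


def stack_runs(per_run):
    """Concatenate per-run (runs, nhoods, da_results) triples row-wise.

    `per_run` is a list of 3-tuples of DataFrames, one tuple per contrast.
    """
    runs, nhoods, da = (
        pd.concat(list(frames), ignore_index=True) for frames in zip(*per_run)
    )
    # nh_id only has to be unique within a run, so ids may repeat across
    # runs; run_id itself must not repeat.
    if runs["run_id"].duplicated().any():
        raise ValueError("run_id values must be unique across contrasts")
    return runs, nhoods, da
```

The three returned frames can then be written with `to_parquet` exactly as in the single-run recipe.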

API surface

route	what
`GET /api/milo/runs`	list runs, with live neighborhood counts
`POST /api/milo/da`	body `{run_id, top_n?, fdr_max?, log2fc_min?}`; Arrow IPC of `(nh_id, n_cells, x, y, log2fc, pval, spatial_fdr)` ordered by `|log2fc| DESC`
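
As a rough illustration of what the `POST /api/milo/da` parameters imply, the sketch below reproduces the join, filter, and ranking locally with pandas. The function name `top_neighborhoods` is hypothetical, and the filter semantics (`spatial_fdr <= fdr_max` and `|log2fc| >= log2fc_min`, both inclusive) are assumptions; the route's actual behavior at the boundaries may differ.

```python
import pandas as pd


def top_neighborhoods(nhoods, da, run_id, top_n=None, fdr_max=None, log2fc_min=None):
    """Join, filter, and rank one run's neighborhoods by |log2fc| descending."""
    df = da[da["run_id"] == run_id].merge(nhoods, on=["run_id", "nh_id"])
    if fdr_max is not None:        # assumed: keep spatial_fdr <= fdr_max
        df = df[df["spatial_fdr"] <= fdr_max]
    if log2fc_min is not None:     # assumed: keep |log2fc| >= log2fc_min
        df = df[df["log2fc"].abs() >= log2fc_min]
    # Rank on the absolute effect size, largest shift first.
    order = df["log2fc"].abs().sort_values(ascending=False, kind="stable").index
    cols = ["nh_id", "n_cells", "x", "y", "log2fc", "pval", "spatial_fdr"]
    df = df.loc[order, cols]
    if top_n is not None:
        df = df.head(top_n)
    return df.reset_index(drop=True)
```

Ranking on the absolute value keeps strongly depleted neighborhoods (negative log2fc) alongside strongly enriched ones.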

Example calls

List the runs in this atlas:

curl -s http://localhost:18901/api/milo/runs | python -m json.tool
# {"runs": [{"run_id":          "disease_vs_control",
#            "label":           "Disease vs Control",
#            "comparison":      "disease-control",
#            "n_neighborhoods": 1247}]}

Pull top 25 most-shifted neighborhoods, FDR ≤ 0.1 (Arrow IPC binary):

curl -sX POST http://localhost:18901/api/milo/da \
     -H 'content-type: application/json' \
     -d '{"run_id":  "disease_vs_control",
          "top_n":   25,
          "fdr_max": 0.1}' \
     -o /tmp/milo_da.arrow

Copilot tools

When both milo and copilot are enabled, the module contributes two tools to the Claude agent loop.

`list_milo_runs`

{
  "name": "list_milo_runs",
  "description": "List Milo neighborhood-DA runs available in this atlas, with neighborhood counts and comparison labels.",
  "input_schema": {"type": "object", "properties": {}, "required": []}
}

`get_milo_da`

{
  "name": "get_milo_da",
  "description": "Pull top neighborhoods for one Milo run, ranked by |log2fc|. Optional spatial-FDR cap.",
  "input_schema": {
    "type": "object",
    "properties": {
      "run_id":  {"type": "string"},
      "top_n":   {"type": "integer", "default": 25},
      "fdr_max": {"type": "number",  "default": 0.1}
    },
    "required": ["run_id"]
  }
}
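
In JSON Schema, `default` is an annotation only: a validator does not fill in missing values, so whoever handles the tool call has to apply `top_n` and `fdr_max` itself. The sketch below shows one way to do that. `apply_schema_defaults` is a hypothetical helper; whether STELLAR fills defaults like this is not stated here.

```python
# Input schema for get_milo_da, copied from the tool definition above.
GET_MILO_DA_SCHEMA = {
    "type": "object",
    "properties": {
        "run_id": {"type": "string"},
        "top_n": {"type": "integer", "default": 25},
        "fdr_max": {"type": "number", "default": 0.1},
    },
    "required": ["run_id"],
}


def apply_schema_defaults(schema: dict, args: dict) -> dict:
    """Check `required` keys, then fill missing arguments from `default`."""
    missing = [k for k in schema.get("required", []) if k not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    filled = dict(args)  # copy so the caller's dict is left untouched
    for name, spec in schema.get("properties", {}).items():
        if name not in filled and "default" in spec:
            filled[name] = spec["default"]
    return filled
```

A tool call carrying only `{"run_id": "disease_vs_control"}` would then resolve to the top 25 neighborhoods with `fdr_max` 0.1.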

System prompt fragment

Milo neighborhood-DA results are precomputed. A run is a single comparison; discover run_ids with list_milo_runs, then pull DA hits with get_milo_da. log2fc + spatial_fdr are per-neighborhood (not per-cluster).

Implementation at stellar/modules/milo/__init__.py — see Extending for the pattern.

Frontend tab

The Milo tab appears in the SPA nav when this module is enabled. A run picker and threshold sliders sit along the top; the main pane is a Plotly scattergl plot of each neighborhood's UMAP centroid, sized by n_cells and colored by log2fc on a diverging palette; the most-shifted neighborhoods land in a sortable table below.