DE — differential expression

Precomputed differential expression viewer: an interactive volcano plot and a sortable, filterable table over your DE results. Its tools are callable from the copilot when both the de and copilot modules are enabled.

extras_key: de
config_key: de
install: pip install 'stellar-atlas[de]'
frontend tab: DE

Enable

modules:
  de:
    enabled: true
    source_dir: data/external/de_results   # relative to project root

Input format

Two parquet files under source_dir/.

comparisons.parquet — one row per DE contrast

| column | type | required | notes |
| --- | --- | --- | --- |
| comparison_id | string | yes | Unique handle; use a short slug or sha. |
| family | string | yes | Grouping (e.g. "conditions", "subtypes"). |
| label | string | yes | Human-readable, e.g. "disease vs control · T-cell". |
| cell_type | string | yes | Cell type the comparison was run within. |
| group_a | string | yes | Numerator group label. |
| group_b | string | yes | Denominator / reference group label. |
| n_a | int64 | no | Cell count in group A. |
| n_b | int64 | no | Cell count in group B. |
| method | string | no | "MAST", "wilcoxon", "deseq2", … |

results.parquet — long table of (comparison × gene)

| column | type | required |
| --- | --- | --- |
| comparison_id | string | yes |
| gene | string | yes |
| log2fc | float32 | yes |
| pval | float64 | yes |
| padj | float64 | yes |

Extra columns are preserved

STELLAR requires the columns above and ignores any others. You can keep pass-through columns such as pct_expressed_a, stat, or tstat in the parquet files; they are simply not surfaced by the API.
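Because the column names are strict, it can help to check both frames before pointing source_dir at them. The following is a minimal sketch, not part of STELLAR: the function name check_de_inputs is hypothetical, and the rule that every comparison_id in results must also appear in comparisons is an assumption rather than something the schema above states.

```python
import pandas as pd

# Required columns, copied from the two schema tables above.
COMPARISONS_REQUIRED = ["comparison_id", "family", "label",
                        "cell_type", "group_a", "group_b"]
RESULTS_REQUIRED = ["comparison_id", "gene", "log2fc", "pval", "padj"]


def check_de_inputs(comparisons: pd.DataFrame, results: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means both frames look usable.

    Hypothetical helper: STELLAR does not ship this function.
    """
    problems = []
    for name, df, required in [
        ("comparisons", comparisons, COMPARISONS_REQUIRED),
        ("results", results, RESULTS_REQUIRED),
    ]:
        missing = [c for c in required if c not in df.columns]
        if missing:
            problems.append(f"{name}: missing columns {missing}")
    if problems:
        return problems  # later checks need the columns to exist

    # comparison_id is documented as a unique handle.
    if comparisons["comparison_id"].duplicated().any():
        problems.append("comparisons: comparison_id is not unique")

    # Assumption: results rows should reference a known comparison.
    unknown = set(results["comparison_id"]) - set(comparisons["comparison_id"])
    if unknown:
        problems.append(f"results: unknown comparison_id values {sorted(unknown)}")
    return problems

# Usage, assuming the default source_dir from the config above:
#   comparisons = pd.read_parquet("data/external/de_results/comparisons.parquet")
#   results = pd.read_parquet("data/external/de_results/results.parquet")
#   print(check_de_inputs(comparisons, results))
```

Extra columns pass through this check untouched, which matches the behaviour described above.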

Producing the input

Any DE tool works; the only contract is the two parquet files above.

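import scanpy as sc
import pandas as pd

# adata: an AnnData object with cluster labels in adata.obs["leiden"].
# Rank genes for each cluster against all remaining cells.
sc.tl.rank_genes_groups(adata, "leiden", method="wilcoxon")
rgg = adata.uns["rank_genes_groups"]

# Build one long-format frame per cluster, using the canonical column names.
rows = []
for group in rgg["names"].dtype.names:
    rows.append(pd.DataFrame({
        "comparison_id": f"leiden:{group}_vs_rest",
        "gene":           rgg["names"][group],
        "log2fc":         rgg["logfoldchanges"][group],
        "pval":           rgg["pvals"][group],
        "padj":           rgg["pvals_adj"][group],
    }))

# Reset the index so no stray index column is written alongside the schema.
pd.concat(rows, ignore_index=True).to_parquet(
    "data/external/de_results/results.parquet", index=False
)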

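See the scanpy rank_genes_groups docs for the underlying tool. Note that this snippet writes results.parquet only; comparisons.parquet still needs one row for each comparison_id used here.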
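A matching comparisons.parquet for the cluster-vs-rest example could be built from the cluster labels alone. This is a sketch under stated assumptions: build_comparisons is a hypothetical helper, and the values chosen for family ("subtypes"), cell_type ("all"), group_b ("rest"), and label are illustrative choices, not values STELLAR prescribes.

```python
import pandas as pd


def build_comparisons(cluster_labels: pd.Series,
                      key: str = "leiden",
                      method: str = "wilcoxon") -> pd.DataFrame:
    """One row per cluster-vs-rest contrast (hypothetical helper)."""
    counts = cluster_labels.astype(str).value_counts().sort_index()
    total = int(counts.sum())
    rows = []
    for group, n in counts.items():
        rows.append({
            # Must match the comparison_id written to results.parquet.
            "comparison_id": f"{key}:{group}_vs_rest",
            "family":        "subtypes",   # illustrative grouping
            "label":         f"{key} {group} vs rest",
            "cell_type":     "all",        # assumption: contrast spans all cells
            "group_a":       group,
            "group_b":       "rest",
            "n_a":           int(n),
            "n_b":           total - int(n),
            "method":        method,
        })
    return pd.DataFrame(rows)

# Usage with the AnnData object from the snippet above:
#   build_comparisons(adata.obs["leiden"]).to_parquet(
#       "data/external/de_results/comparisons.parquet", index=False)
```

The optional n_a, n_b, and method columns are filled in here because they come for free from the labels; they can be dropped if not needed.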

A built-in CSV converter is not in v1.0. Convert manually: pandas.read_csv, then rename columns to the canonical schema (comparison_id, gene, log2fc, pval, padj), then to_parquet. A stellar.modules.de.helpers namespace is reserved for future conversion utilities; we'll fill it in during a 1.x release once a few users have asked.
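The manual conversion can be sketched as follows. The helper name to_canonical and the input file name in the usage comment are hypothetical; the alias list only covers the column names mentioned in the FAQ on this page and would need extending for other tools. Renaming does not rescale values, so a fold change that is not already log2 has to be converted separately.

```python
import pandas as pd

# Source-tool column names mapped to the canonical schema.
COLUMN_ALIASES = {
    "avg_log2FC": "log2fc",
    "logFC":      "log2fc",
    "coef":       "log2fc",
    "p_val":      "pval",
    "p_val_adj":  "padj",
}
CANONICAL = ["comparison_id", "gene", "log2fc", "pval", "padj"]
DTYPES = {"comparison_id": "string", "gene": "string",
          "log2fc": "float32", "pval": "float64", "padj": "float64"}


def to_canonical(df: pd.DataFrame, comparison_id: str,
                 gene_column: str = "gene") -> pd.DataFrame:
    """Rename, tag, and type one tool's DE table (hypothetical helper)."""
    out = df.rename(columns={**COLUMN_ALIASES, gene_column: "gene"})
    out["comparison_id"] = comparison_id
    missing = [c for c in CANONICAL if c not in out.columns]
    if missing:
        raise ValueError(f"cannot find columns for {missing}")
    # Keep only the canonical columns, in a fixed order and with schema dtypes.
    return out[CANONICAL].astype(DTYPES)

# Usage, with a hypothetical CSV export from another DE tool:
#   raw = pd.read_csv("markers.csv")
#   to_canonical(raw, "T_disease_vs_healthy").to_parquet(
#       "data/external/de_results/results.parquet", index=False)
```

Extra columns are dropped here for tidiness; since STELLAR ignores extras, keeping them would also work.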

API surface

When enabled:

| route | what |
| --- | --- |
| GET /api/de/families | distinct family values |
| GET /api/de/comparisons?family=&cell_type= | list comparisons, filterable |
| GET /api/de/comparison/{id} | single comparison metadata |
| POST /api/de/results | body {comparison_id, top_n, padj_max, log2fc_min}; returns Arrow IPC |

Example calls

List comparison families:

curl -s http://localhost:18901/api/de/families | python -m json.tool
# {"families": ["conditions", "subtypes"]}

List comparisons within a family + cell type:

curl -s 'http://localhost:18901/api/de/comparisons?family=conditions&cell_type=T' \
     | python -m json.tool
# {"comparisons": [
#   {"comparison_id": "T_disease_vs_healthy",
#    "family":        "conditions",
#    "label":         "Disease vs Healthy · T",
#    "cell_type":     "T",
#    "group_a":       "disease",
#    "group_b":       "healthy",
#    "n_a":           1024,
#    "n_b":           987,
#    "method":        "wilcoxon"}]}

Pull top 25 DE results (Arrow IPC binary stream):

curl -sX POST http://localhost:18901/api/de/results \
     -H 'content-type: application/json' \
     -d '{"comparison_id": "T_disease_vs_healthy",
          "top_n":         25,
          "padj_max":      0.05}' \
     -o /tmp/de_results.arrow

The response is an Apache Arrow IPC stream (content-type: application/vnd.apache.arrow.stream) with columns gene, log2fc, pval, padj — the frontend reads it via @apache-arrow/esnext-esm. Read in Python:

import pyarrow.ipc as ipc
with open("/tmp/de_results.arrow", "rb") as f:
    table = ipc.open_stream(f).read_all()
print(table.to_pandas().head())
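The curl call and the Arrow read can also be combined in one Python client. This is a rough sketch using only the standard library for the HTTP part; the function names are hypothetical, the host and port are taken from the curl examples, and omitting log2fc_min when it is unset mirrors the curl example rather than any documented rule.

```python
import json
import urllib.request

BASE_URL = "http://localhost:18901"  # same host and port as the curl examples


def build_results_request(comparison_id, top_n=25, padj_max=0.05,
                          log2fc_min=None, base_url=BASE_URL):
    """Build the POST request for /api/de/results without sending it."""
    body = {"comparison_id": comparison_id, "top_n": top_n, "padj_max": padj_max}
    if log2fc_min is not None:
        body["log2fc_min"] = log2fc_min
    return urllib.request.Request(
        f"{base_url}/api/de/results",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_results(comparison_id, **filters):
    """Send the request and decode the Arrow IPC stream into a DataFrame."""
    import pyarrow.ipc as ipc  # imported lazily so building requests needs no pyarrow
    request = build_results_request(comparison_id, **filters)
    with urllib.request.urlopen(request) as response:
        return ipc.open_stream(response.read()).read_all().to_pandas()

# Usage, with a running server and a comparison_id discovered via the API:
#   print(fetch_results("T_disease_vs_healthy", top_n=25).head())
```

Splitting request construction from the network call keeps the payload easy to inspect before anything is sent.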

Copilot tools

When both the de and copilot modules are enabled, this module contributes two tools to the Claude agent loop.

list_de_comparisons

Discover comparison IDs (which are project-specific opaque strings).

{
  "name": "list_de_comparisons",
  "description": "List precomputed DE comparisons, optionally filtered by family or cell type. Returns each comparison's id, family, label, cell_type, group_a, group_b.",
  "input_schema": {
    "type": "object",
    "properties": {
      "family":    {"type": "string"},
      "cell_type": {"type": "string"}
    },
    "required": []
  }
}

compare_groups

Pull top genes for one comparison.

{
  "name": "compare_groups",
  "description": "Pull top genes for a precomputed DE comparison. Discover comparison_ids first via list_de_comparisons. Returns gene / log2fc / pval / padj rows sorted by padj.",
  "input_schema": {
    "type": "object",
    "properties": {
      "comparison_id": {"type": "string"},
      "top_n":         {"type": "integer", "default": 25},
      "padj_max":      {"type": "number",  "default": 0.05}
    },
    "required": ["comparison_id"]
  }
}
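To sanity-check what compare_groups returns, the described behaviour (filter by padj_max, sort by padj, keep top_n) can be reproduced offline against results.parquet. This is only an approximation under stated assumptions, not the module's implementation: the function name top_hits is hypothetical, and treating padj_max as an inclusive bound is a guess.

```python
import pandas as pd


def top_hits(results: pd.DataFrame, comparison_id: str,
             top_n: int = 25, padj_max: float = 0.05) -> pd.DataFrame:
    """Approximate the compare_groups output for one comparison."""
    mask = (results["comparison_id"] == comparison_id) & (
        results["padj"] <= padj_max  # assumption: the bound is inclusive
    )
    hits = results.loc[mask].sort_values("padj").head(top_n)
    # The tool description lists these four columns in its output.
    return hits[["gene", "log2fc", "pval", "padj"]].reset_index(drop=True)

# Usage, assuming the default source_dir:
#   results = pd.read_parquet("data/external/de_results/results.parquet")
#   print(top_hits(results, "T_disease_vs_healthy"))
```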

System prompt fragment

The module contributes this paragraph to the copilot system prompt:

Differential expression (DE) comparisons are precomputed. Discover available comparison_ids with list_de_comparisons (filterable by family / cell_type), then pull top hits with compare_groups. Never invent a comparison_id — they're project-specific strings that must be discovered.

The implementation of these tools lives in stellar/modules/de/__init__.py; see Extending for the pattern to mimic in your own module.

Frontend tab

The DE tab appears in the SPA nav when this module is enabled: comparison picker → volcano (Plotly scattergl) → sortable table → "Send up-genes to Enrichment" (active once the enrichment module is on).

FAQ

My DE tool reports avg_log2FC / logFC / coef — what column name does STELLAR want?

Rename it to log2fc. The same goes for p_val_adj → padj and p_val → pval. The strict column names are deliberate: every downstream SQL query in the routes assumes them.

Can I have multiple comparison families?

Yes — the family column is a free-form string and the API filters on it. Common splits: conditions (Treatment vs Control), subtypes (one cell-subtype vs the rest), donors (per-donor).