DE: differential expression

Viewer for precomputed differential expression: an interactive volcano plot plus a sortable, filterable table over your DE results. Its tools are callable from the copilot when both the de and copilot modules are enabled.


extras_key	`de`
config_key	`de`
install	`pip install 'stellar-atlas[de]'`
frontend tab	`DE`

Enable

modules:
  de:
    enabled: true
    source_dir: data/external/de_results   # relative to project root

Input format

The module reads two parquet files under source_dir/: comparisons.parquet and results.parquet.

`comparisons.parquet`: one row per DE contrast

column	type	required	notes
`comparison_id`	string	yes	Unique handle; use a short slug or sha.
`family`	string	yes	Grouping (e.g. `"conditions"`, `"subtypes"`).
`label`	string	yes	Human-readable, e.g. `"disease vs control · T-cell"`.
`cell_type`	string	yes	Cell type the comparison was run within.
`group_a`	string	yes	Numerator group label.
`group_b`	string	yes	Denominator / reference group label.
`n_a`	int64	no	Cell count in group A.
`n_b`	int64	no	Cell count in group B.
`method`	string	no	`"MAST"`, `"wilcoxon"`, `"deseq2"`, …

`results.parquet`: long table of (comparison × gene)

column	type	required
`comparison_id`	string	yes
`gene`	string	yes
`log2fc`	float32	yes
`pval`	float64	yes
`padj`	float64	yes

Extra columns are preserved

STELLAR enforces the columns above and ignores extras. You can keep pass-through columns such as pct_expressed_a, stat, or tstat in the parquet; they just aren't surfaced by the API.
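The column contract above can be checked before starting the server. The following is a minimal sketch, not STELLAR's own validation: `REQUIRED` simply transcribes the two tables, and `missing_columns` and `check_source_dir` are hypothetical helper names. It reads only the parquet schemas via pyarrow, so large result files are not loaded into memory.

```python
from pathlib import Path

# Required columns, transcribed from the two tables above.
REQUIRED = {
    "comparisons.parquet": [
        "comparison_id", "family", "label", "cell_type", "group_a", "group_b",
    ],
    "results.parquet": ["comparison_id", "gene", "log2fc", "pval", "padj"],
}


def missing_columns(present, required):
    """Return the required column names absent from `present`, in order."""
    have = set(present)
    return [name for name in required if name not in have]


def check_source_dir(source_dir):
    """Map each file name to its missing required columns (hypothetical helper)."""
    import pyarrow.parquet as pq  # imported lazily; only needed to read schemas

    problems = {}
    for filename, required in REQUIRED.items():
        schema = pq.read_schema(str(Path(source_dir) / filename))
        missing = missing_columns(schema.names, required)
        if missing:
            problems[filename] = missing
    return problems
```

Calling `check_source_dir("data/external/de_results")` would return an empty dict when both files satisfy the contract. Extra columns are deliberately not reported, since they are allowed.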

Producing the input

Any DE tool works; the only contract is the two parquet files above.

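scanpy rank_genes_groups

import scanpy as sc
import pandas as pd

# adata: an AnnData object with a "leiden" column in .obs.
# Run a one-vs-rest Wilcoxon test for every leiden cluster.
sc.tl.rank_genes_groups(adata, "leiden", method="wilcoxon")
rgg = adata.uns["rank_genes_groups"]
rows = []
# The result arrays are structured arrays with one field per group.
for group in rgg["names"].dtype.names:
    rows.append(pd.DataFrame({
        "comparison_id": f"leiden:{group}_vs_rest",
        "gene":           rgg["names"][group],
        "log2fc":         rgg["logfoldchanges"][group],
        "pval":           rgg["pvals"][group],
        "padj":           rgg["pvals_adj"][group],
    }))
# Reset the repeated per-group index so it is not written as an extra column.
pd.concat(rows, ignore_index=True).to_parquet(
    "data/external/de_results/results.parquet", index=False
)

This writes only results.parquet; a matching comparisons.parquet with one row per comparison_id is still required. See the scanpy rank_genes_groups docs for the underlying tool.

MAST / Wilcoxon CSV trees

Not in v1.0. Convert manually: load each file with pandas.read_csv, rename the columns to the canonical schema (comparison_id, gene, log2fc, pval, padj), then write the result with to_parquet. A stellar.modules.de.helpers namespace is reserved for future conversion utilities; we'll fill it in a 1.x release once a few users have asked.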
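The manual conversion described above might look like the following sketch. The input column names in `ALIASES` and the helper name `to_canonical` are assumptions for illustration; adjust the mapping to whatever your DE tool writes. The in-memory CSV stands in for one file of a CSV tree.

```python
import io

import pandas as pd

# Assumed source-column aliases; extend this mapping for your tool.
ALIASES = {
    "avg_log2FC": "log2fc",
    "logFC": "log2fc",
    "p_val": "pval",
    "p_val_adj": "padj",
}
CANONICAL = ["comparison_id", "gene", "log2fc", "pval", "padj"]


def to_canonical(df, comparison_id):
    """Rename aliased columns, tag rows with comparison_id, keep canonical columns."""
    out = df.rename(columns=ALIASES)
    out["comparison_id"] = comparison_id
    missing = [col for col in CANONICAL if col not in out.columns]
    if missing:
        raise ValueError(f"missing columns after rename: {missing}")
    # Extras are dropped here for simplicity; keeping them would also be valid.
    out = out[CANONICAL]
    # Match the dtypes listed in the results.parquet table.
    return out.astype({"log2fc": "float32", "pval": "float64", "padj": "float64"})


csv_text = (
    "gene,avg_log2FC,p_val,p_val_adj\n"
    "CD3E,1.5,0.001,0.01\n"
    "MS4A1,-2.0,0.0001,0.002\n"
)
results = to_canonical(pd.read_csv(io.StringIO(csv_text)), "T_disease_vs_healthy")
```

For a real tree, apply `to_canonical` to each CSV with its own comparison_id, concatenate the frames with `pd.concat(..., ignore_index=True)`, and write them once with `to_parquet(..., index=False)`.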

API surface

When enabled:

route	what
`GET /api/de/families`	distinct family values
`GET /api/de/comparisons?family=&cell_type=`	list comparisons, filterable
`GET /api/de/comparison/{id}`	single comparison metadata
`POST /api/de/results`	body `{comparison_id, top_n, padj_max, log2fc_min}` — returns Arrow IPC

Example calls

List comparison families:

curl -s http://localhost:18901/api/de/families | python -m json.tool
# {"families": ["conditions", "subtypes"]}

List comparisons within a family + cell type:

curl -s 'http://localhost:18901/api/de/comparisons?family=conditions&cell_type=T' \
     | python -m json.tool
# {"comparisons": [
#   {"comparison_id": "T_disease_vs_healthy",
#    "family":        "conditions",
#    "label":         "Disease vs Healthy · T",
#    "cell_type":     "T",
#    "group_a":       "disease",
#    "group_b":       "healthy",
#    "n_a":           1024,
#    "n_b":           987,
#    "method":        "wilcoxon"}]}

Pull top 25 DE results (Arrow IPC binary stream):

curl -sX POST http://localhost:18901/api/de/results \
     -H 'content-type: application/json' \
     -d '{"comparison_id": "T_disease_vs_healthy",
          "top_n":         25,
          "padj_max":      0.05}' \
     -o /tmp/de_results.arrow

The response is an Apache Arrow IPC stream (content-type: application/vnd.apache.arrow.stream) with columns gene, log2fc, pval, padj. The frontend reads it via @apache-arrow/esnext-esm. To read the saved file in Python:

import pyarrow.ipc as ipc
with open("/tmp/de_results.arrow", "rb") as f:
    table = ipc.open_stream(f).read_all()
print(table.to_pandas().head())
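The same request can be made from Python without a temporary file. This is a sketch under assumptions: the host and port are taken from the curl examples, `build_results_body` and `fetch_results` are hypothetical helper names, and unset filters are simply left out of the body (how the server treats omitted fields is not specified here).

```python
import json
import urllib.request

BASE_URL = "http://localhost:18901"  # host and port as used in the curl examples


def build_results_body(comparison_id, top_n=None, padj_max=None, log2fc_min=None):
    """Build the JSON body for POST /api/de/results, dropping unset filters."""
    body = {
        "comparison_id": comparison_id,
        "top_n": top_n,
        "padj_max": padj_max,
        "log2fc_min": log2fc_min,
    }
    return {key: value for key, value in body.items() if value is not None}


def fetch_results(comparison_id, **filters):
    """POST the query and decode the Arrow IPC stream into a pyarrow Table."""
    import pyarrow.ipc as ipc  # imported lazily; the body builder needs no extras

    request = urllib.request.Request(
        f"{BASE_URL}/api/de/results",
        data=json.dumps(build_results_body(comparison_id, **filters)).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = response.read()
    # open_stream accepts the raw bytes of the IPC stream directly.
    return ipc.open_stream(payload).read_all()
```

With a running server, `fetch_results("T_disease_vs_healthy", top_n=25, padj_max=0.05).to_pandas()` would give the same rows as the curl call followed by the file read.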

Copilot tools

When both de and copilot are enabled, the module contributes two tools to the Claude agent loop.

`list_de_comparisons`

Discover comparison IDs (which are project-specific opaque strings).

{
  "name": "list_de_comparisons",
  "description": "List precomputed DE comparisons, optionally filtered by family or cell type. Returns each comparison's id, family, label, cell_type, group_a, group_b.",
  "input_schema": {
    "type": "object",
    "properties": {
      "family":    {"type": "string"},
      "cell_type": {"type": "string"}
    },
    "required": []
  }
}

`compare_groups`

Pull top genes for one comparison.

{
  "name": "compare_groups",
  "description": "Pull top genes for a precomputed DE comparison. Discover comparison_ids first via list_de_comparisons. Returns gene / log2fc / pval / padj rows sorted by padj.",
  "input_schema": {
    "type": "object",
    "properties": {
      "comparison_id": {"type": "string"},
      "top_n":         {"type": "integer", "default": 25},
      "padj_max":      {"type": "number",  "default": 0.05}
    },
    "required": ["comparison_id"]
  }
}
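To illustrate how the `input_schema` above constrains a call, the following sketch fills in the declared defaults and rejects calls that omit a required field. It is not the code in stellar/modules/de/__init__.py: `resolve_tool_input` is a hypothetical helper, and whether the real agent loop applies defaults this way is an assumption.

```python
# Input schema for compare_groups, copied from the tool definition above.
COMPARE_GROUPS_SCHEMA = {
    "type": "object",
    "properties": {
        "comparison_id": {"type": "string"},
        "top_n": {"type": "integer", "default": 25},
        "padj_max": {"type": "number", "default": 0.05},
    },
    "required": ["comparison_id"],
}


def resolve_tool_input(schema, tool_input):
    """Check required and unknown keys, then fill in schema defaults."""
    properties = schema["properties"]
    missing = [key for key in schema.get("required", []) if key not in tool_input]
    if missing:
        raise ValueError(f"missing required field(s): {missing}")
    unknown = [key for key in tool_input if key not in properties]
    if unknown:
        raise ValueError(f"unknown field(s): {unknown}")
    resolved = {
        key: spec["default"] for key, spec in properties.items() if "default" in spec
    }
    resolved.update(tool_input)  # explicit values override defaults
    return resolved
```

A call carrying only `comparison_id` resolves to `top_n=25` and `padj_max=0.05`, which matches the defaults declared in the schema.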

System prompt fragment

The module contributes this paragraph to the copilot system prompt:

Differential expression (DE) comparisons are precomputed. Discover available comparison_ids with list_de_comparisons (filterable by family / cell_type), then pull top hits with compare_groups. Never invent a comparison_id — they're project-specific strings that must be discovered.

The implementation of these tools lives in stellar/modules/de/__init__.py; see Extending for the pattern to mimic in your own module.

Frontend tab

The DE tab appears in the SPA nav when this module is enabled: comparison picker → volcano (Plotly scattergl) → sortable table → "Send up-genes to Enrichment" (active once the enrichment module is on).

FAQ

My DE tool reports avg_log2FC / logFC / coef — what column name does STELLAR want?

Rename it to log2fc. Likewise, rename p_val_adj to padj and p_val to pval. The strict column names are deliberate: every downstream SQL query in the routes assumes them.

Can I have multiple comparison families?

Yes — the family column is a free-form string and the API filters on it. Common splits: conditions (Treatment vs Control), subtypes (one cell-subtype vs the rest), donors (per-donor).
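As an illustration, a comparisons table mixing two families could be assembled as follows. Every ID, label, and group name is a made-up placeholder, `by_family` and `write_comparisons` are hypothetical helpers, and the exact-match filter only mimics the `family=` query parameter; it is not the server's implementation.

```python
# Placeholder rows for comparisons.parquet; all values are illustrative.
comparisons = [
    {
        "comparison_id": "T_disease_vs_healthy",
        "family": "conditions",
        "label": "Disease vs Healthy · T",
        "cell_type": "T",
        "group_a": "disease",
        "group_b": "healthy",
    },
    {
        "comparison_id": "T_subtype1_vs_rest",
        "family": "subtypes",
        "label": "Subtype 1 vs rest · T",
        "cell_type": "T",
        "group_a": "subtype1",
        "group_b": "rest",
    },
]


def by_family(rows, family):
    """Keep the rows whose family matches exactly."""
    return [row for row in rows if row["family"] == family]


def write_comparisons(rows, path):
    """Write the rows as a parquet file (requires pandas and pyarrow)."""
    import pandas as pd

    pd.DataFrame(rows).to_parquet(path, index=False)
```

Both rows live in the same file; the family value alone decides which group a comparison appears under.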