DE — differential expression¶
Precomputed differential expression viewer: an interactive volcano plot plus a sortable, filterable table over your DE results. Tool-callable from the copilot when both the de and copilot modules are enabled.
| extras_key | de |
| config_key | de |
| install | pip install 'stellar-atlas[de]' |
| frontend tab | DE |
Enable¶
Input format¶
Two parquet files under source_dir/.
comparisons.parquet — one row per DE contrast¶
| column | type | required | notes |
|---|---|---|---|
| comparison_id | string | yes | Unique handle; use a short slug or sha. |
| family | string | yes | Grouping (e.g. "conditions", "subtypes"). |
| label | string | yes | Human-readable, e.g. "disease vs control · T-cell". |
| cell_type | string | yes | Cell type the comparison was run within. |
| group_a | string | yes | Numerator group label. |
| group_b | string | yes | Denominator / reference group label. |
| n_a | int64 | no | Cell count in group A. |
| n_b | int64 | no | Cell count in group B. |
| method | string | no | "MAST", "wilcoxon", "deseq2", … |
results.parquet — long table of (comparison × gene)¶
| column | type | required |
|---|---|---|
| comparison_id | string | yes |
| gene | string | yes |
| log2fc | float32 | yes |
| pval | float64 | yes |
| padj | float64 | yes |
Extra columns are preserved
STELLAR enforces the columns above and ignores any extras. You can
keep pass-through columns such as pct_expressed_a, stat, or tstat
in the parquet; they just aren't surfaced by the API.
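Before writing results.parquet, it can help to check a DataFrame against the schema above. The following is a minimal sketch assuming pandas; `validate_results` and `RESULTS_SCHEMA` are hypothetical names for illustration and are not part of STELLAR.

```python
import pandas as pd

# Required columns and dtypes, copied from the results.parquet table above.
RESULTS_SCHEMA = {
    "comparison_id": "string",
    "gene": "string",
    "log2fc": "float32",
    "pval": "float64",
    "padj": "float64",
}


def validate_results(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: check required columns, then cast to the documented dtypes."""
    missing = [col for col in RESULTS_SCHEMA if col not in df.columns]
    if missing:
        raise ValueError(f"results table is missing required columns: {missing}")
    # astype with a dict only touches the listed columns; extras pass through unchanged.
    return df.astype(RESULTS_SCHEMA)
```

Calling `validate_results(df)` just before `to_parquet` surfaces a missing or misnamed column early, rather than when the module first reads the file.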
Producing the input¶
Any DE tool works; the only contract is the two parquet files above.
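import pandas as pd
import scanpy as sc

# Assumes `adata` is an AnnData object that already has a "leiden" clustering.
sc.tl.rank_genes_groups(adata, "leiden", method="wilcoxon")
rgg = adata.uns["rank_genes_groups"]

# One block of rows per cluster-vs-rest contrast.
rows = []
for group in rgg["names"].dtype.names:
    rows.append(pd.DataFrame({
        "comparison_id": f"leiden:{group}_vs_rest",
        "gene": rgg["names"][group],
        "log2fc": rgg["logfoldchanges"][group],
        "pval": rgg["pvals"][group],
        "padj": rgg["pvals_adj"][group],
    }))

# Drop the per-group row index so the parquet holds only the schema columns.
pd.concat(rows, ignore_index=True).to_parquet(
    "data/external/de_results/results.parquet", index=False
)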
See the scanpy rank_genes_groups docs for the underlying tool.
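The scanpy snippet writes only results.parquet; the module also expects a matching comparisons.parquet. One way to produce it is sketched below, assuming the same `leiden:<group>_vs_rest` ids. `build_comparisons` is a hypothetical helper, and the `family`, `label`, and `cell_type` values are illustrative choices rather than anything STELLAR prescribes.

```python
import pandas as pd


def build_comparisons(groups, family="subtypes", cell_type="all", method="wilcoxon"):
    """Hypothetical helper: one metadata row per '<group> vs rest' contrast."""
    return pd.DataFrame(
        [
            {
                "comparison_id": f"leiden:{g}_vs_rest",  # must match results.parquet
                "family": family,
                "label": f"cluster {g} vs rest",
                "cell_type": cell_type,  # illustrative placeholder value
                "group_a": str(g),
                "group_b": "rest",
                "method": method,
            }
            for g in groups
        ]
    )


# With the scanpy example, `groups` would be rgg["names"].dtype.names.
comparisons = build_comparisons(["0", "1", "2"])
# comparisons.to_parquet("data/external/de_results/comparisons.parquet", index=False)
```

The optional n_a and n_b columns are left out here; they could be filled from the cluster sizes if you want cell counts shown.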
Conversion helpers are not included in v1.0. Convert manually:
pandas.read_csv, rename columns to the canonical schema
(comparison_id, gene, log2fc, pval, padj), then to_parquet. A
stellar.modules.de.helpers namespace is reserved for future
conversion utilities; it will be filled in a 1.x release once a few
users have asked.
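The manual conversion can be sketched roughly as follows. This assumes a CSV with a `gene` column and Seurat-style names (`avg_log2FC`, `p_val`, `p_val_adj`); `csv_to_results` and the `RENAME` mapping are hypothetical and should be adapted to whatever your DE tool exports.

```python
import io

import pandas as pd

# Source names on the left are an example (Seurat-style); adjust to your tool's output.
RENAME = {"avg_log2FC": "log2fc", "p_val": "pval", "p_val_adj": "padj"}


def csv_to_results(csv_source, comparison_id: str) -> pd.DataFrame:
    """Read one DE table from CSV and reshape it to the canonical results schema."""
    df = pd.read_csv(csv_source).rename(columns=RENAME)
    df["comparison_id"] = comparison_id
    canonical = ["comparison_id", "gene", "log2fc", "pval", "padj"]
    return df[canonical].astype({"log2fc": "float32"})


# Tiny in-memory CSV standing in for a real export.
demo = io.StringIO("gene,avg_log2FC,p_val,p_val_adj\nCD3E,1.8,0.0001,0.002\n")
results = csv_to_results(demo, "T_disease_vs_healthy")
# results.to_parquet("results.parquet", index=False)
```

For several contrasts, call the function once per CSV with a distinct comparison_id and concatenate the frames before writing a single results.parquet.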
API surface¶
When enabled:
| route | what |
|---|---|
| GET /api/de/families | distinct family values |
| GET /api/de/comparisons?family=&cell_type= | list comparisons, filterable |
| GET /api/de/comparison/{id} | single comparison metadata |
| POST /api/de/results | body {comparison_id, top_n, padj_max, log2fc_min}; returns Arrow IPC |
Example calls¶
List comparison families:
curl -s http://localhost:18901/api/de/families | python -m json.tool
# {"families": ["conditions", "subtypes"]}
List comparisons within a family + cell type:
curl -s 'http://localhost:18901/api/de/comparisons?family=conditions&cell_type=T' \
  | python -m json.tool
# {"comparisons": [
#   {"comparison_id": "T_disease_vs_healthy",
#    "family": "conditions",
#    "label": "Disease vs Healthy · T",
#    "cell_type": "T",
#    "group_a": "disease",
#    "group_b": "healthy",
#    "n_a": 1024,
#    "n_b": 987,
#    "method": "wilcoxon"}]}
Pull top 25 DE results (Arrow IPC binary stream):
curl -sX POST http://localhost:18901/api/de/results \
  -H 'content-type: application/json' \
  -d '{"comparison_id": "T_disease_vs_healthy",
       "top_n": 25,
       "padj_max": 0.05}' \
  -o /tmp/de_results.arrow
The response is an Apache Arrow IPC stream
(content-type: application/vnd.apache.arrow.stream) with columns
gene, log2fc, pval, padj. The frontend reads it via
@apache-arrow/esnext-esm. To read it in Python:
import pyarrow.ipc as ipc

# Read the whole IPC stream saved by the curl call into one Arrow table.
with open("/tmp/de_results.arrow", "rb") as f:
    table = ipc.open_stream(f).read_all()

print(table.to_pandas().head())
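The same POST can be issued from Python without curl. The sketch below uses only the standard library; the function names are hypothetical, and the default `top_n` and `padj_max` values simply mirror the curl example rather than any documented server default.

```python
import json
import urllib.request


def build_results_request(base_url, comparison_id, top_n=25, padj_max=0.05):
    """Build the POST /api/de/results request without sending it."""
    body = {"comparison_id": comparison_id, "top_n": top_n, "padj_max": padj_max}
    return urllib.request.Request(
        f"{base_url}/api/de/results",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_results_bytes(base_url, comparison_id, **filters):
    """Send the request and return the raw Arrow IPC stream bytes."""
    req = build_results_request(base_url, comparison_id, **filters)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

The bytes returned by `fetch_results_bytes("http://localhost:18901", "T_disease_vs_healthy")` can be passed directly to `pyarrow.ipc.open_stream`, which accepts a bytes object as well as a file handle.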
Copilot tools¶
When both de and copilot are enabled, the module contributes two
tools to the Claude agent loop.
list_de_comparisons¶
Discover comparison IDs (which are project-specific opaque strings).
{
  "name": "list_de_comparisons",
  "description": "List precomputed DE comparisons, optionally filtered by family or cell type. Returns each comparison's id, family, label, cell_type, group_a, group_b.",
  "input_schema": {
    "type": "object",
    "properties": {
      "family": {"type": "string"},
      "cell_type": {"type": "string"}
    },
    "required": []
  }
}
compare_groups¶
Pull top genes for one comparison.
{
  "name": "compare_groups",
  "description": "Pull top genes for a precomputed DE comparison. Discover comparison_ids first via list_de_comparisons. Returns gene / log2fc / pval / padj rows sorted by padj.",
  "input_schema": {
    "type": "object",
    "properties": {
      "comparison_id": {"type": "string"},
      "top_n": {"type": "integer", "default": 25},
      "padj_max": {"type": "number", "default": 0.05}
    },
    "required": ["comparison_id"]
  }
}
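To sanity-check what compare_groups should return for your data, the selection it describes can be approximated offline in pandas. This is only a sketch of the documented behaviour (filter by padj_max, sort by padj, keep top_n), not the module's implementation; `top_hits` is a hypothetical name, and treating the cutoff as inclusive is an assumption.

```python
import pandas as pd


def top_hits(results: pd.DataFrame, comparison_id: str,
             top_n: int = 25, padj_max: float = 0.05) -> pd.DataFrame:
    """Approximate compare_groups: filter by padj, sort by padj, keep top_n rows."""
    subset = results[
        (results["comparison_id"] == comparison_id)
        & (results["padj"] <= padj_max)  # inclusive cutoff is an assumption
    ]
    return (
        subset.sort_values("padj")
        .head(top_n)[["gene", "log2fc", "pval", "padj"]]
        .reset_index(drop=True)
    )
```

Running this over the DataFrame you wrote to results.parquet gives a quick expectation to compare against the copilot's answer.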
System prompt fragment¶
The module contributes this paragraph to the copilot system prompt:
Differential expression (DE) comparisons are precomputed. Discover available comparison_ids with
list_de_comparisons (filterable by family / cell_type), then pull top hits with compare_groups. Never invent a comparison_id — they're project-specific strings that must be discovered.
The actual implementation of these tools lives at
stellar/modules/de/__init__.py — see Extending for
the pattern to mimic in your own module.
Frontend tab¶
The DE tab appears in the SPA nav when this module is enabled: comparison picker → volcano (Plotly scattergl) → sortable table → "Send up-genes to Enrichment" (active once the enrichment module is on).
FAQ¶
My DE tool reports avg_log2FC / logFC / coef — what column name does STELLAR want?
Rename it to log2fc. Same for p_val_adj → padj, p_val →
pval. The strict column names are deliberate — every downstream
SQL query in the routes assumes them.
Can I have multiple comparison families?
Yes — the family column is a free-form string and the API filters
on it. Common splits: conditions (Treatment vs Control),
subtypes (one cell-subtype vs the rest), donors (per-donor).