Metrics¶
Every metric returns a tidy pandas.DataFrame indexed by path (the concept's full DFS path joined with /). This makes them composable, easy to test, and trivially serialisable to CSV / Parquet for downstream dashboards.
Layered overview¶
| Family | Functions | Question |
|---|---|---|
| Counts | `feature_counts` | How many features under each concept? |
| Importance | `importance_sum` | How much importance does each concept aggregate? |
| Utilization | `utilization` | Which concepts does the model actually use? |
| Ablation | `auc_drop` | How much performance drops when a concept's data is missing? |
| Correlation | `feature_correlation`, `nullity_correlation`, `shap_correlation` | Do concepts cluster as the tree says they should? |
| Missingness | `column_missing_rate`, `joint_missing_rate` | How often does a feature / a whole concept go missing? |
| Coherence | `coherence_importance` | Are concepts well designed (coherent and important)? |
Standard input shape¶
Every metric that takes per-feature signal accepts the same canonical shape:
- `feature_names`: `Sequence[str]` of length `F`;
- `importances`: `np.ndarray` of shape `(F,)` (already aggregated) or `(N, F)` (per-sample, SHAP-style).
`(N, F)` input is collapsed by the metric layer using `np.abs(arr).mean(axis=0)` by default. Switch to signed aggregation with `signed=True` or to summed aggregation with `agg="sum"`.
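A plain-NumPy sketch of the two aggregation modes (the array values are made up for illustration):

```python
import numpy as np

# Hypothetical SHAP-style matrix: N=4 samples, F=3 features.
per_sample = np.array([
    [ 0.5, -0.2, 0.0],
    [-0.5,  0.4, 0.1],
    [ 0.5, -0.2, 0.0],
    [-0.5,  0.4, 0.1],
])

# Default collapse: mean absolute contribution per feature.
collapsed = np.abs(per_sample).mean(axis=0)   # [0.5, 0.3, 0.05]

# Signed aggregation keeps cancellation: the first feature nets to zero.
signed = per_sample.mean(axis=0)              # [0.0, 0.1, 0.05]
```

Note how the first feature looks important under the absolute default but washes out under signed aggregation — exactly the distinction `signed=True` exposes.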
Standard output shape¶
Every metric returns a DataFrame with at least:
| Column | Type | Meaning |
|---|---|---|
| `name` | `str` | Bare node name. |
| `kind` | `"concept"` \| `"feature"` | Node kind. |
| `depth` | `int` | Distance from the root. |
| `parent` | `str` | Parent name (empty string for root). |
…plus metric-specific columns. The DataFrame index is the `/`-joined concept path (so two different `Age` nodes in two different sub-trees never collide).
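A minimal sketch of why the path index matters, using a hand-built frame with made-up concept names:

```python
import pandas as pd

# Hypothetical tree with two distinct "Age" features under different concepts.
df = pd.DataFrame(
    {
        "name": ["Age", "Age"],
        "kind": ["feature", "feature"],
        "depth": [2, 2],
        "parent": ["Patient", "Policy"],
    },
    index=["root/Patient/Age", "root/Policy/Age"],
)
df.index.name = "path"

# The bare names collide, but the /-joined paths stay unique,
# so joins and lookups keyed on "path" are unambiguous.
assert df["name"].duplicated().any()
assert df.index.is_unique
```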
Choosing between metrics¶
"How big is each branch?"¶
Use `feature_counts`. Purely structural; it does not need a model.
"Which concept does the model rely on the most?"¶
Use `importance_sum` with SHAP values. Pair with the sunburst plot.
"Are there parts of my tree the model ignores?"¶
Use `utilization`; pair with `utilization_map`.
"How much performance is at risk if a whole branch's data goes missing?"¶
Use `auc_drop`. Pick a strategy (see Ablation Strategies).
"Is the concept tree itself any good?"¶
Use the §H trio: `feature_correlation` + `coherence_importance` + `coherence_importance_scatter`. See Concept-Design Diagnostics.
"Do whole concepts go missing together?"¶
Use `nullity_correlation` and `joint_missing_rate`; pair with `correlation_block` and `joint_missing_map`.
"Does the model treat two concepts as substitutes?"¶
Use `shap_correlation`; pair with `correlation_block`.
Composing metrics¶
Because every output is keyed by path, joining is trivial:
```python
import pandas as pd
from concept_graph_xai import (
    auc_drop,
    coherence_importance,
    column_missing_rate,
    importance_sum,
    joint_missing_rate,
)

imp = importance_sum(graph, names, shap_values).reset_index()
jmr = joint_missing_rate(graph, X_test).reset_index()
adp = auc_drop(
    graph, model, X_test, y_test,
    feature_names=names,
    strategy="permutation",
).reset_index()

summary = (
    imp.merge(jmr[["path", "joint_missing_rate"]], on="path")
    .merge(adp[["path", "auc_drop_mean"]], on="path")
)
```
The "realism-weighted AUC drop" view (`auc_drop_mean` × `joint_missing_rate`) is intentionally not computed by the package; it is one line in the join above.
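As a self-contained sketch with dummy stand-ins for the joined output (the paths, values, and the `weighted_risk` column name are all made up):

```python
import pandas as pd

# Dummy stand-ins for the joined metric output; in practice these columns
# come from merging importance_sum, joint_missing_rate and auc_drop by path.
summary = pd.DataFrame({
    "path": ["root/Demographics", "root/Billing"],
    "auc_drop_mean": [0.10, 0.02],
    "joint_missing_rate": [0.5, 0.9],
})

# Realism-weighted AUC drop: weight the ablation damage by how often
# the whole concept actually goes missing in the data.
summary["weighted_risk"] = summary["auc_drop_mean"] * summary["joint_missing_rate"]
```

A small drop on a concept that is missing nine times out of ten can outrank a large drop on a concept that is almost always present.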
Defaults that bite¶
- `auc_drop` skips the root concept by default (`skip_root=True`). The root represents "everything is missing", which trivially zeros the score and is rarely useful.
- `importance_sum` uses mean(|SHAP|) by default. Use `signed=True` for direction-aware aggregation (concepts that consistently push the prediction up vs. down).
- `utilization` flags a feature as used when |importance| > threshold, default `0.0`. Raise the threshold to mask noise; lower it to be inclusive.
- `feature_correlation` and friends default to Spearman rank correlation, which is robust to monotonic non-linearities and closer to what tree models exploit.
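For the utilization threshold, a plain-NumPy sketch (the importance values are made up) of what raising it changes:

```python
import numpy as np

# Hypothetical aggregated importances; 0.002 is near-zero noise.
imps = np.array([0.30, 0.002, 0.0, 0.15])

# Default threshold of 0.0: any non-zero importance counts as "used",
# so the noisy second feature is flagged too.
used_default = np.abs(imps) > 0.0    # [True, True, False, True]

# Raised threshold: the noise is masked, exact zeros stay unused.
used_strict = np.abs(imps) > 0.01    # [True, False, False, True]
```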