Distance
distance
Stage 4: Wasserstein distance on SHAP-transformed distributions.
Computes the Wasserstein distance (W₁ or W₂) between two 1-D empirical distributions of SHAP-transformed feature values. The transformation σ_j is computed ONCE from D_ref (Stage 3) and applied identically to both reference and monitoring samples.
Functions:
    wasserstein_1d: W_p distance between two 1-D arrays.
    compute_swift_scores: Per-feature SWIFT scores for reference vs. monitoring data.
wasserstein_1d
Compute the p-th Wasserstein distance between two 1-D empirical distributions.
For order=1 this is the Earth Mover's Distance, equal to the L₁ area between the two CDFs. For order=2 this is the square root of the integral of squared differences between the two quantile functions (inverse CDFs).
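Both are computed via the sorted-quantile formula W_p^p = (1/N) Σ |F⁻¹_u(i/N) - F⁻¹_v(i/N)|^p, summed over the N points of a common grid. The quantile functions F⁻¹_u and F⁻¹_v are linearly interpolated onto that grid so that unequal sample sizes are handled correctly. The returned value is W_p, the p-th root of this average.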
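For reference, the sorted-quantile formula above is a discretization of the standard closed form for one-dimensional Wasserstein distances. The symbols F_P and F_Q (the CDFs of P and Q) and t (the quantile level) are notation introduced here for illustration, not taken from the source.

```latex
W_p(P, Q) = \left( \int_0^1 \left| F_P^{-1}(t) - F_Q^{-1}(t) \right|^p \, dt \right)^{1/p},
\qquad
W_1(P, Q) = \int_{-\infty}^{\infty} \left| F_P(x) - F_Q(x) \right| \, dx .
```

The second identity holds only for p = 1, which is why the CDF-area interpretation applies to order=1 but not to order=2.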
Args:
    u: 1-D array of samples from distribution P.
    v: 1-D array of samples from distribution Q.
    order: Wasserstein order (1 or 2).
Returns:
    Non-negative float W_p(P, Q).
Raises:
    ValueError: If order is not 1 or 2, or if either array is empty.
Source code in src/swift/distance.py
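The quantile-grid computation described above might be sketched as follows. This is a minimal illustration, not the code in src/swift/distance.py: the name `wasserstein_1d_sketch`, the `n_grid` parameter, the default grid size, and the midpoint grid placement are all assumptions of this sketch (the docstring writes the grid as i/N).

```python
import numpy as np


def wasserstein_1d_sketch(u, v, order=1, n_grid=None):
    """Illustrative W_p between two 1-D samples via a common quantile grid."""
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    u = np.asarray(u, dtype=float).ravel()
    v = np.asarray(v, dtype=float).ravel()
    if u.size == 0 or v.size == 0:
        raise ValueError("input arrays must be non-empty")

    # Assumption: use as many grid points as the larger sample, placed at
    # interval midpoints so that the levels 0 and 1 are never evaluated.
    n = n_grid if n_grid is not None else max(u.size, v.size)
    levels = (np.arange(n) + 0.5) / n

    # np.quantile interpolates linearly between order statistics by default,
    # so samples of different sizes are compared at the same quantile levels.
    q_u = np.quantile(u, levels)
    q_v = np.quantile(v, levels)

    # Average |difference|^p over the grid, then take the p-th root.
    return float(np.mean(np.abs(q_u - q_v) ** order) ** (1.0 / order))
```

As a sanity check, two identical samples give a distance of zero, and shifting every value of a sample by a constant c gives a distance of |c| for both orders, because every quantile moves by the same amount.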
compute_swift_scores
compute_swift_scores(X_ref: DataFrame, X_mon: DataFrame, bucket_sets: dict[str, BucketSet], order: int = 1) -> dict[str, float]
Compute per-feature SWIFT scores: W_p on SHAP-transformed distributions.
For each feature j:
    1. Apply σ_j (from bucket_sets) to both the ref and mon columns.
    2. Compute W_p between the two transformed arrays.
The transformation σ_j was fitted on D_ref (Stage 3) and is not recomputed here — it is applied identically to both samples.
Args:
    X_ref: Reference DataFrame (n_ref × p).
    X_mon: Monitoring DataFrame (n_mon × p).
    bucket_sets: Dict of feature_name → BucketSet with mean_shap already computed (output of compute_bucket_shap).
    order: Wasserstein order (1 or 2).
Returns:
    Dict of feature_name → SWIFT score (non-negative float).
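The two-step loop above might look roughly like the sketch below. The BucketSet interface is not shown in this section, so the sketch replaces it with a hypothetical `step_transform` helper that maps a raw value to the mean SHAP of the bucket it falls into. The names `step_transform`, `_w_p`, and `swift_scores_sketch`, the bucket edges, and the SHAP values are all made up for illustration. The inputs only need to support column access by name, so a pandas DataFrame or a plain dict of arrays both work here.

```python
import numpy as np


def step_transform(edges, mean_shap):
    """Hypothetical stand-in for sigma_j: raw value -> mean SHAP of its bucket."""
    edges = np.asarray(edges, dtype=float)
    mean_shap = np.asarray(mean_shap, dtype=float)
    if mean_shap.size != edges.size + 1:
        raise ValueError("need exactly one mean SHAP value per bucket")

    def sigma(x):
        # Bucket k covers [edges[k-1], edges[k]); the outer buckets are open-ended.
        idx = np.searchsorted(edges, np.asarray(x, dtype=float), side="right")
        return mean_shap[idx]

    return sigma


def _w_p(a, b, order):
    # Compact quantile-grid W_p (midpoint grid is an assumption of this sketch).
    n = max(a.size, b.size)
    levels = (np.arange(n) + 0.5) / n
    diff = np.abs(np.quantile(a, levels) - np.quantile(b, levels))
    return float(np.mean(diff ** order) ** (1.0 / order))


def swift_scores_sketch(X_ref, X_mon, transforms, order=1):
    """Per-feature W_p between transformed reference and monitoring columns."""
    scores = {}
    for name, sigma in transforms.items():
        # The same pre-fitted transform is applied to both samples.
        ref_t = sigma(np.asarray(X_ref[name]))
        mon_t = sigma(np.asarray(X_mon[name]))
        scores[name] = _w_p(ref_t, mon_t, order)
    return scores


# Made-up example: "age" drifts across buckets, "income" stays inside one bucket.
transforms = {
    "age": step_transform([30.0, 50.0], [-0.2, 0.0, 0.3]),
    "income": step_transform([10.0], [0.1, 0.5]),
}
X_ref = {"age": [20, 25, 40, 60], "income": [1, 2, 3, 4]}
X_mon = {"age": [60, 65, 70, 80], "income": [5, 6, 7, 8]}
scores = swift_scores_sketch(X_ref, X_mon, transforms)
```

In this toy setup the score for "income" is zero even though its raw values moved, because every value maps to the same bucket and therefore the same mean SHAP. The score for "age" is positive because the drift crosses bucket boundaries.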