SWIFTMonitor¶

SWIFTMonitor ¶

SWIFTMonitor(model: object = None, order: int = 1, n_permutations: int = 1000, alpha: float = 0.05, correction: Union[CorrectionMethod, str] = BH, n_synthetic: int = 10, max_samples: int | None = None, random_state: int = 42)

Bases: BaseEstimator, TransformerMixin

SHAP-Weighted Impact Feature Testing monitor.

Orchestrates the 5-stage SWIFT pipeline:

1. Extract decision points from trained model
2. Build buckets from decision points
3. Compute bucket-level mean SHAP (SHAP normalization σ_j)
4. Compute Wasserstein distance on SHAP-transformed distributions
5. Permutation-based significance testing with MTC

The transformation σ_j is computed ONCE during fit() and applied identically to all monitoring samples via transform().

PARAMETER	DESCRIPTION
`model`	Trained tree-ensemble model (LightGBM `Booster` or XGBoost `Booster`). Required — passed as a constructor dependency (analogous to `sklearn.feature_selection.SelectFromModel`). TYPE: `object` DEFAULT: `None`
`order`	Wasserstein order (1 → W₁, 2 → W₂). TYPE: `int` DEFAULT: `1`
`n_permutations`	Number of permutations for p-value estimation in `test()`. TYPE: `int` DEFAULT: `1000`
`alpha`	Significance level for multiple testing correction. TYPE: `float` DEFAULT: `0.05`
`correction`	Multiple testing correction method. Accepts enum members or strings (`"bonferroni"`, `"benjamini-hochberg"`, `"bh"`). TYPE: `CorrectionMethod or str` DEFAULT: `"benjamini-hochberg"`
`n_synthetic`	Number of synthetic observations to create for empty buckets during `fit()`. TYPE: `int` DEFAULT: `10`
`max_samples`	Maximum total pool size (n_ref + n_mon) for the permutation test. If exceeded, subsample proportionally. `None` = no limit. TYPE: `int or None` DEFAULT: `None`
`random_state`	Seed for the random number generator, ensuring reproducibility. TYPE: `int` DEFAULT: `42`

ATTRIBUTE	DESCRIPTION
`bucket_sets_`	Per-feature bucket sets with `mean_shap` populated (set by `fit`). TYPE: `dict[str, BucketSet]`
`X_ref_`	Copy of the reference DataFrame (stored for permutation testing). TYPE: `DataFrame`
`shap_values_`	SHAP values computed on the reference data. TYPE: `ndarray`
`feature_names_in_`	Feature names inferred from `X.columns` during `fit`. TYPE: `ndarray`
`n_features_in_`	Number of features seen during `fit`. TYPE: `int`

Examples:

>>> monitor = SWIFTMonitor(model=lgb_model, n_permutations=200)
>>> monitor.fit(X_ref)
SWIFTMonitor(...)
>>> result = monitor.test(X_mon)
>>> result.drifted_features
('feature_3',)

Source code in src/swift/pipeline.py

def __init__(
    self,
    model: object = None,
    order: int = 1,
    n_permutations: int = 1000,
    alpha: float = 0.05,
    correction: Union[CorrectionMethod, str] = CorrectionMethod.BH,
    n_synthetic: int = 10,
    max_samples: int | None = None,
    random_state: int = 42,
) -> None:
    self.model = model
    self.order = order
    self.n_permutations = n_permutations
    self.alpha = alpha
    self.correction = correction
    self.n_synthetic = n_synthetic
    self.max_samples = max_samples
    self.random_state = random_state

fit ¶

fit(X: DataFrame, y: None = None) -> SWIFTMonitor

Fit the SWIFT monitor on reference data.

Executes stages 1–3: extraction → bucketing → SHAP normalization.

PARAMETER	DESCRIPTION
`X`	Reference DataFrame. Feature names are inferred from `X.columns`. TYPE: `pd.DataFrame of shape (n_ref, n_features)`
`y`	Not used; present for API compatibility. TYPE: `ignored` DEFAULT: `None`

RETURNS	DESCRIPTION
`self`

Source code in src/swift/pipeline.py

def fit(
    self,
    X: pd.DataFrame,
    y: None = None,
) -> SWIFTMonitor:
    """Fit the SWIFT monitor on reference data.

    Executes stages 1–3: extraction → bucketing → SHAP normalization.

    Parameters
    ----------
    X : pd.DataFrame of shape (n_ref, n_features)
        Reference DataFrame.  Feature names are inferred from
        ``X.columns``.
    y : ignored
        Not used; present for API compatibility.

    Returns
    -------
    self
    """
    if self.model is None:
        raise ValueError(
            "SWIFTMonitor requires a trained model.  "
            "Pass it via the constructor: SWIFTMonitor(model=my_model)."
        )

    rng = np.random.default_rng(self.random_state)

    # Infer feature names (sklearn convention).
    self.feature_names_in_ = np.asarray(X.columns)
    self.n_features_in_ = len(self.feature_names_in_)
    feature_names = list(self.feature_names_in_)

    self.X_ref_ = X.copy()

    # Stage 1: Extract decision points
    logger.info("Stage 1: Extracting decision points...")
    decision_points = extract_decision_points(self.model, feature_names)

    # Stage 2: Build buckets
    logger.info("Stage 2: Building buckets...")
    bucket_sets = build_all_buckets(decision_points)

    # Compute SHAP values on reference data
    logger.info("Computing SHAP values on reference data...")
    explainer = shap.TreeExplainer(self.model)
    shap_values = explainer.shap_values(X)
    shap_values = np.asarray(shap_values)
    self.shap_values_ = shap_values

    # Stage 3: SHAP normalization
    logger.info("Stage 3: Computing bucket-level mean SHAP...")
    self.bucket_sets_ = compute_bucket_shap(
        bucket_sets,
        X,
        shap_values,
        model=self.model,
        n_synthetic=self.n_synthetic,
        rng=rng,
    )

    n_buckets_total = sum(
        bs.num_buckets for bs in self.bucket_sets_.values()
    )
    logger.info(
        "SWIFT monitor fitted: %d features, %d total buckets.",
        len(feature_names),
        n_buckets_total,
    )

    return self

transform ¶

transform(X: DataFrame) -> DataFrame

Apply the SHAP transformation σ_j to every feature.

Each value x_ij is mapped to the mean SHAP of its bucket: σ_j(x_ij) = mean_shap_j^{bucket(x_ij)}.

PARAMETER	DESCRIPTION
`X`	Input data. TYPE: `pd.DataFrame of shape (n_samples, n_features)`

RETURNS	DESCRIPTION
`DataFrame`	SHAP-transformed DataFrame (same shape and column names).

Source code in src/swift/pipeline.py

def transform(self, X: pd.DataFrame) -> pd.DataFrame:
    """Apply the SHAP transformation σ_j to every feature.

    Each value ``x_ij`` is mapped to the mean SHAP of its bucket:
    ``σ_j(x_ij) = mean_shap_j^{bucket(x_ij)}``.

    Parameters
    ----------
    X : pd.DataFrame of shape (n_samples, n_features)
        Input data.

    Returns
    -------
    pd.DataFrame
        SHAP-transformed DataFrame (same shape and column names).
    """
    check_is_fitted(self)
    result = pd.DataFrame(index=X.index)
    for fname in self.feature_names_in_:
        result[fname] = transform_feature(
            X[fname].values, self.bucket_sets_[fname]
        )
    return result

score ¶

score(X: DataFrame, X_compare: DataFrame | None = None) -> dict[str, float]

Compute per-feature SWIFT scores (stage 4 only, no testing).

PARAMETER	DESCRIPTION
`X`	Monitoring DataFrame. When X_compare is `None`, this is compared against the fitted reference `X_ref_`. TYPE: `pd.DataFrame of shape (n_samples, n_features)`
`X_compare`	Optional second sample. When provided, SWIFT scores are computed between X and X_compare instead of between `X_ref_` and X. The SHAP transformation σ_j is always the one fitted on `X_ref_`. TYPE: `DataFrame or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict[str, float]`	Feature name → SWIFT score (Wasserstein distance on SHAP-transformed distributions).

Source code in src/swift/pipeline.py

def score(
    self,
    X: pd.DataFrame,
    X_compare: pd.DataFrame | None = None,
) -> dict[str, float]:
    """Compute per-feature SWIFT scores (stage 4 only, no testing).

    Parameters
    ----------
    X : pd.DataFrame of shape (n_samples, n_features)
        Monitoring DataFrame.  When *X_compare* is ``None``, this is
        compared against the fitted reference ``X_ref_``.
    X_compare : pd.DataFrame or None
        Optional second sample.  When provided, SWIFT scores are
        computed between *X* and *X_compare* instead of between
        ``X_ref_`` and *X*.  The SHAP transformation σ_j is always
        the one fitted on ``X_ref_``.

    Returns
    -------
    dict[str, float]
        Feature name → SWIFT score (Wasserstein distance on
        SHAP-transformed distributions).
    """
    check_is_fitted(self)

    if X_compare is not None:
        return compute_swift_scores(
            X, X_compare, self.bucket_sets_, order=self.order
        )

    return compute_swift_scores(
        self.X_ref_, X, self.bucket_sets_, order=self.order
    )

test ¶

test(X: DataFrame, X_compare: DataFrame | None = None) -> SWIFTResult

Run the full SWIFT pipeline: score + test + aggregate.

Stages 4–5 + aggregation:

Compute per-feature SWIFT scores (Wasserstein on SHAP-transformed).
Permutation test for p-values.
Multiple testing correction.
Model-level aggregation.

All hyperparameters (order, n_permutations, alpha, correction, max_samples) are taken from instance attributes set in the constructor. Use set_params() to override for individual calls (e.g. different random_state per experiment repetition).

PARAMETER	DESCRIPTION
`X`	Monitoring DataFrame. When X_compare is `None`, this is compared against the fitted reference `X_ref_`. TYPE: `pd.DataFrame of shape (n_samples, n_features)`
`X_compare`	Optional second sample. When provided, the test compares X against X_compare instead of `X_ref_` against X. The SHAP transformation σ_j is always the one fitted on `X_ref_`. TYPE: `DataFrame or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`SWIFTResult`	Per-feature and model-level results.

Source code in src/swift/pipeline.py

def test(
    self,
    X: pd.DataFrame,
    X_compare: pd.DataFrame | None = None,
) -> SWIFTResult:
    """Run the full SWIFT pipeline: score + test + aggregate.

    Stages 4–5 + aggregation:

    - Compute per-feature SWIFT scores (Wasserstein on SHAP-transformed).
    - Permutation test for p-values.
    - Multiple testing correction.
    - Model-level aggregation.

    All hyperparameters (``order``, ``n_permutations``, ``alpha``,
    ``correction``, ``max_samples``) are taken from instance attributes
    set in the constructor.  Use ``set_params()`` to override for
    individual calls (e.g. different ``random_state`` per experiment
    repetition).

    Parameters
    ----------
    X : pd.DataFrame of shape (n_samples, n_features)
        Monitoring DataFrame.  When *X_compare* is ``None``, this is
        compared against the fitted reference ``X_ref_``.
    X_compare : pd.DataFrame or None
        Optional second sample.  When provided, the test compares
        *X* against *X_compare* instead of ``X_ref_`` against *X*.
        The SHAP transformation σ_j is always the one fitted on
        ``X_ref_``.

    Returns
    -------
    SWIFTResult
        Per-feature and model-level results.
    """
    check_is_fitted(self)

    rng = np.random.default_rng(self.random_state)
    correction = CorrectionMethod.resolve(self.correction)

    # Determine the two samples to compare
    if X_compare is not None:
        X_a, X_b = X, X_compare
    else:
        X_a, X_b = self.X_ref_, X

    # Stage 4: Compute SWIFT scores
    logger.info("Stage 4: Computing SWIFT scores...")
    scores = compute_swift_scores(
        X_a, X_b, self.bucket_sets_, order=self.order
    )

    # Stage 5: Permutation test + MTC
    logger.info(
        "Stage 5: Permutation test (B=%d)...", self.n_permutations
    )
    pvalues = permutation_test(
        X_a,
        X_b,
        self.bucket_sets_,
        order=self.order,
        n_permutations=self.n_permutations,
        max_samples=self.max_samples,
        rng=rng,
    )

    decisions = correct_pvalues(pvalues, correction, self.alpha)

    # Build per-feature results
    w_order = (
        WassersteinOrder.W1 if self.order == 1 else WassersteinOrder.W2
    )
    feature_results: list[FeatureSWIFTResult] = []
    for fname in self.feature_names_in_:
        feature_results.append(
            FeatureSWIFTResult(
                feature_name=fname,
                swift_score=scores[fname],
                wasserstein_order=w_order,
                p_value=pvalues[fname],
                is_drifted=decisions[fname],
                num_buckets=self.bucket_sets_[fname].num_buckets,
            )
        )

    # Aggregation
    agg = aggregate_scores(scores)

    result = SWIFTResult(
        feature_results=tuple(feature_results),
        swift_max=agg.swift_max,
        swift_mean=agg.swift_mean,
        alpha=self.alpha,
        correction_method=correction,
    )

    logger.info(
        "SWIFT test complete: %d/%d features drifted (α=%.3f, %s). "
        "SWIFT_max=%.6f, SWIFT_mean=%.6f",
        result.num_drifted,
        result.num_features,
        self.alpha,
        correction.value,
        result.swift_max,
        result.swift_mean,
    )

    return result

plot_buckets ¶

plot_buckets(feature_name: str, X: DataFrame | None = None, X_compare: DataFrame | None = None, labels: tuple[str, str] = ('Reference', 'Comparison'), figsize: tuple[float, float] = (10, 5), title: str | None = None, max_label_buckets: int = 20, x_axis: str = 'bucket') -> tuple

Plot the bucketing profile for a single feature.

Shows mean SHAP per bucket (line + 95 % error band) on the left y-axis and observation density (filled line) on the right y-axis.

PARAMETER	DESCRIPTION
`feature_name`	Feature to visualise. Must be in `feature_names_in_`. TYPE: `str`
`X`	Sample whose density is shown as the primary line. When `None` (default), the fitted reference `X_ref_` is used. TYPE: `DataFrame or None` DEFAULT: `None`
`X_compare`	Optional second sample for density comparison. Must contain feature_name as a column. Each sample's density is normalised to 1.0 independently. TYPE: `DataFrame or None` DEFAULT: `None`
`labels`	Legend labels `(primary_label, comparison_label)`. TYPE: `tuple of str` DEFAULT: `('Reference', 'Comparison')`
`figsize`	Figure size in inches. TYPE: `tuple` DEFAULT: `(10, 5)`
`title`	Custom title. Defaults to `"Bucketing Profile: {feature_name}"`. TYPE: `str or None` DEFAULT: `None`
`max_label_buckets`	Use compact index labels (`B0, B1, …`) when the number of buckets exceeds this threshold. TYPE: `int` DEFAULT: `20`
`x_axis`	`"bucket"` (default) uses integer bucket indices. `"natural"` uses actual feature-value positions (bucket midpoints). TYPE: `('bucket', 'natural')` DEFAULT: `"bucket"`

RETURNS	DESCRIPTION
`(Figure, Axes)`	Matplotlib figure and primary (SHAP) axes.

RAISES	DESCRIPTION
`NotFittedError`	If the monitor has not been fitted.
`ValueError`	If feature_name is not in `feature_names_in_` or X_compare / X do not contain the column.

Source code in src/swift/pipeline.py

def plot_buckets(
    self,
    feature_name: str,
    X: pd.DataFrame | None = None,
    X_compare: pd.DataFrame | None = None,
    labels: tuple[str, str] = ("Reference", "Comparison"),
    figsize: tuple[float, float] = (10, 5),
    title: str | None = None,
    max_label_buckets: int = 20,
    x_axis: str = "bucket",
) -> tuple:
    """Plot the bucketing profile for a single feature.

    Shows mean SHAP per bucket (line + 95 % error band) on the left
    y-axis and observation density (filled line) on the right y-axis.

    Parameters
    ----------
    feature_name : str
        Feature to visualise.  Must be in ``feature_names_in_``.
    X : pd.DataFrame or None
        Sample whose density is shown as the *primary* line.  When
        ``None`` (default), the fitted reference ``X_ref_`` is used.
    X_compare : pd.DataFrame or None
        Optional second sample for density comparison.
        Must contain *feature_name* as a column.  Each sample's
        density is normalised to 1.0 independently.
    labels : tuple of str
        Legend labels ``(primary_label, comparison_label)``.
    figsize : tuple, default (10, 5)
        Figure size in inches.
    title : str or None
        Custom title.  Defaults to
        ``"Bucketing Profile: {feature_name}"``.
    max_label_buckets : int, default 20
        Use compact index labels (``B0, B1, …``) when the number
        of buckets exceeds this threshold.
    x_axis : {"bucket", "natural"}
        ``"bucket"`` (default) uses integer bucket indices.
        ``"natural"`` uses actual feature-value positions (bucket
        midpoints).

    Returns
    -------
    (Figure, Axes)
        Matplotlib figure and primary (SHAP) axes.

    Raises
    ------
    sklearn.exceptions.NotFittedError
        If the monitor has not been fitted.
    ValueError
        If *feature_name* is not in ``feature_names_in_`` or
        *X_compare* / *X* do not contain the column.
    """
    check_is_fitted(self)

    feature_list = list(self.feature_names_in_)
    if feature_name not in feature_list:
        raise ValueError(
            f"Unknown feature '{feature_name}'.  "
            f"Available: {feature_list}"
        )

    feat_idx = feature_list.index(feature_name)

    # Primary density values (None → use ref)
    primary_values = None
    if X is not None:
        if feature_name not in X.columns:
            raise ValueError(
                f"X is missing column '{feature_name}'."
            )
        primary_values = X[feature_name].to_numpy()

    # Comparison density values
    compare_values = None
    if X_compare is not None:
        if feature_name not in X_compare.columns:
            raise ValueError(
                f"X_compare is missing column '{feature_name}'."
            )
        compare_values = X_compare[feature_name].to_numpy()

    return plot_bucket_profile(
        bucket_set=self.bucket_sets_[feature_name],
        feature_values=self.X_ref_[feature_name].to_numpy(),
        shap_values=self.shap_values_[:, feat_idx],
        compare_values=compare_values,
        primary_values=primary_values,
        labels=labels,
        figsize=figsize,
        title=title,
        max_label_buckets=max_label_buckets,
        x_axis=x_axis,
    )

plot_swift_scores ¶

plot_swift_scores(result: SWIFTResult, result_compare: SWIFTResult | None = None, labels: tuple[str, str] = ('Result A', 'Result B'), threshold: float | None = None, sort_by: str = 'score', figsize: tuple[float, float] = (12, 5), title: str | None = None) -> tuple

Plot SWIFT scores per feature from a test() result.

Draws one bar per feature, colored red (drifted) or blue (not drifted), with horizontal reference lines for SWIFT_max, SWIFT_mean, and an optional user-provided threshold.

In comparison mode (when result_compare is given), draws grouped side-by-side bars with neutral coloring.

PARAMETER	DESCRIPTION
`result`	Primary result from `test()`. TYPE: `SWIFTResult`
`result_compare`	Optional second result for grouped comparison. TYPE: `SWIFTResult or None` DEFAULT: `None`
`labels`	Legend labels `(result_label, compare_label)`. TYPE: `tuple of str` DEFAULT: `('Result A', 'Result B')`
`threshold`	Optional detection threshold drawn as a dotted black line. TYPE: `float or None` DEFAULT: `None`
`sort_by`	Feature ordering on the x-axis. `"original"` preserves the order from `feature_names_in_`. TYPE: `('score', 'name', 'original')` DEFAULT: `"score"`
`figsize`	Figure size in inches. TYPE: `tuple` DEFAULT: `(12, 5)`
`title`	Custom title. TYPE: `str or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`(Figure, Axes)`

RAISES	DESCRIPTION
`NotFittedError`	If the monitor has not been fitted.

Source code in src/swift/pipeline.py

def plot_swift_scores(
    self,
    result: SWIFTResult,
    result_compare: SWIFTResult | None = None,
    labels: tuple[str, str] = ("Result A", "Result B"),
    threshold: float | None = None,
    sort_by: str = "score",
    figsize: tuple[float, float] = (12, 5),
    title: str | None = None,
) -> tuple:
    """Plot SWIFT scores per feature from a ``test()`` result.

    Draws one bar per feature, colored red (drifted) or blue (not
    drifted), with horizontal reference lines for ``SWIFT_max``,
    ``SWIFT_mean``, and an optional user-provided *threshold*.

    In comparison mode (when *result_compare* is given), draws
    grouped side-by-side bars with neutral coloring.

    Parameters
    ----------
    result : SWIFTResult
        Primary result from ``test()``.
    result_compare : SWIFTResult or None
        Optional second result for grouped comparison.
    labels : tuple of str
        Legend labels ``(result_label, compare_label)``.
    threshold : float or None
        Optional detection threshold drawn as a dotted black line.
    sort_by : {"score", "name", "original"}
        Feature ordering on the x-axis.  ``"original"`` preserves
        the order from ``feature_names_in_``.
    figsize : tuple, default (12, 5)
        Figure size in inches.
    title : str or None
        Custom title.

    Returns
    -------
    (Figure, Axes)

    Raises
    ------
    sklearn.exceptions.NotFittedError
        If the monitor has not been fitted.
    """
    check_is_fitted(self)
    return plot_feature_swift_scores(
        result=result,
        result_compare=result_compare,
        labels=labels,
        threshold=threshold,
        sort_by=sort_by,
        feature_order=list(self.feature_names_in_),
        figsize=figsize,
        title=title,
    )