
Configuration

All SWIFTMonitor parameters can be set at construction time or modified later via set_params().

Parameters Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | LightGBM / XGBoost | (required) | Trained tree-ensemble model. Must expose its tree structure (e.g., model.booster_.dump_model() for LightGBM). |
| order | int | 1 | Wasserstein order: 1 for W1 (earth mover's distance), 2 for W2 (more sensitive to changes in variance). |
| n_permutations | int | 1000 | Number of permutations for the permutation test. More permutations give more precise p-values but take longer. |
| alpha | float | 0.05 | Significance level for the multiple testing correction. Features with a corrected p-value < alpha are flagged as drifted. |
| correction | str | "benjamini-hochberg" | Multiple testing correction method. See MTC Methods below. |
| n_synthetic | int | 10 | Number of synthetic observations to generate for empty buckets during fit(). |
| max_samples | int \| None | None | Maximum pool size for the permutation test. If the pooled data exceeds this size, it is subsampled. None means no limit. |
| random_state | int | 42 | Seed for the random number generator, used for reproducibility of the permutation test and the synthetic observations. |

MTC Methods

The correction parameter accepts the following values:

| Value | Aliases | Controls | Description |
|---|---|---|---|
| "benjamini-hochberg" | "bh", "fdr" | FDR | Controls the false discovery rate. Recommended default: a good balance between power and false positive control. |
| "bonferroni" | "bonf" | FWER | Controls the family-wise error rate. More conservative; use it when false positives are costly. |
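For intuition, the two corrections can be sketched in a few lines of NumPy. This is a minimal, hypothetical implementation of the textbook Bonferroni and Benjamini-Hochberg adjustments, including the aliases listed above; the helper name correct_pvalues and the example p-values are illustrative assumptions, not part of the SWIFTMonitor API.

```python
import numpy as np

# Hypothetical alias map mirroring the MTC table; not SWIFTMonitor's own code.
_ALIASES = {
    "benjamini-hochberg": "benjamini-hochberg",
    "bh": "benjamini-hochberg",
    "fdr": "benjamini-hochberg",
    "bonferroni": "bonferroni",
    "bonf": "bonferroni",
}


def correct_pvalues(pvalues, method="benjamini-hochberg"):
    """Return corrected p-values in the same order as the input."""
    method = _ALIASES[method.lower()]
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    if method == "bonferroni":
        # FWER control: multiply every p-value by the number of tests.
        return np.minimum(p * m, 1.0)
    # Benjamini-Hochberg (FDR control): scale the i-th smallest p-value by m / i.
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


pvals = [0.001, 0.008, 0.012, 0.030, 0.20]  # one raw p-value per feature
alpha = 0.05
bh = correct_pvalues(pvals, "bh")
bonf = correct_pvalues(pvals, "bonf")
print("BH flags:", int((bh < alpha).sum()))            # 4
print("Bonferroni flags:", int((bonf < alpha).sum()))  # 2
```

With these p-values, Benjamini-Hochberg flags four of the five features while Bonferroni flags only two, which is the power versus conservativeness trade-off described in the table.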

Wasserstein Order

The order parameter controls how the Wasserstein distance is computed:

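  • W1 (order=1): The mean absolute difference between matched quantiles of the two distributions. Less sensitive to outliers than W2 and easy to interpret as the average "shift" in SHAP space.
  • W2 (order=2): The root-mean-square difference between matched quantiles. Squaring amplifies large deviations, so it reacts more strongly to changes in variance and to extreme values.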

For most use cases, W1 is recommended. Use W2 when you care specifically about changes in the spread of the distribution.
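The difference between the two orders is easy to see on synthetic data. The sketch below uses the standard 1-D formula for two samples of equal size, where the optimal coupling pairs sorted values; the helper wasserstein_1d and the normal test data are illustrative assumptions, and SWIFTMonitor's own computation (for example with unequal sample sizes) may differ. For unequal sizes, scipy.stats.wasserstein_distance computes W1 from the empirical distributions.

```python
import numpy as np


def wasserstein_1d(x, y, order=1):
    """W_order between two equal-size 1-D samples (e.g. one feature's SHAP values).

    In 1-D the optimal coupling pairs sorted values, so
    W_p = (mean |x_(i) - y_(i)|**p) ** (1/p).
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y) ** order) ** (1.0 / order))


rng = np.random.default_rng(42)
reference = rng.normal(0.0, 1.0, size=2000)
shifted = reference + 0.5  # pure location shift
spread = reference * 2.0   # doubled spread around roughly the same center

for name, other in [("shift", shifted), ("spread", spread)]:
    w1 = wasserstein_1d(reference, other, order=1)
    w2 = wasserstein_1d(reference, other, order=2)
    print(f"{name}: W1={w1:.3f}  W2={w2:.3f}")
```

For the pure shift both orders give 0.5. For the doubled spread, W2 (about 1.0) exceeds W1 (about 0.8), reflecting the extra weight W2 places on large deviations.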

Permutation Test Tuning

The n_permutations parameter controls the precision of the p-value estimate; the resolution is roughly 1 / n_permutations:

  • Low (100-200): Fast, suitable for exploratory analysis. P-value resolution: ~0.005-0.01.
  • Medium (1000): Default. Good balance of speed and precision. P-value resolution: ~0.001.
  • High (5000-10000): High precision. P-value resolution: ~0.0001-0.0002. Use for final results or when distinguishing between similar p-values matters.
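The resolution figures above follow from how a permutation p-value is estimated. The sketch below is a generic two-sample permutation test, not SWIFTMonitor's internal code; the helper names and the (b + 1) / (n + 1) estimator are assumptions (a common convention that keeps p-values strictly positive). Its smallest attainable p-value is 1 / (n_permutations + 1), which is why the resolution scales as roughly 1 / n_permutations.

```python
import numpy as np


def w1_equal_size(a, b):
    # W1 between two equal-size 1-D samples: mean gap between sorted values.
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


def permutation_pvalue(reference, monitoring, statistic,
                       n_permutations=1000, random_state=42):
    """Two-sample permutation test using the (b + 1) / (n + 1) p-value form."""
    rng = np.random.default_rng(random_state)
    reference = np.asarray(reference, dtype=float)
    monitoring = np.asarray(monitoring, dtype=float)
    observed = statistic(reference, monitoring)
    pool = np.concatenate([reference, monitoring])
    n_ref = reference.size
    exceed = 0
    for _ in range(n_permutations):
        # Shuffle the pooled sample and split it back into two groups.
        perm = rng.permutation(pool)
        if statistic(perm[:n_ref], perm[n_ref:]) >= observed:
            exceed += 1
    # The smallest attainable p-value is 1 / (n_permutations + 1).
    return (exceed + 1) / (n_permutations + 1)


rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, size=300)
mon = rng.normal(1.0, 1.0, size=300)  # strongly drifted sample (same size)

for n in (200, 1000):
    p = permutation_pvalue(ref, mon, w1_equal_size, n_permutations=n)
    print(f"n_permutations={n}: p={p:.4f} (floor {1 / (n + 1):.4f})")
```

For a strongly drifted feature the estimate sits at its floor, so 200 permutations cannot report anything below about 0.005, while 1000 permutations can go down to about 0.001. Each permutation recomputes the statistic, so runtime grows linearly with n_permutations.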

Performance

The permutation test is the most compute-intensive part of the pipeline. If runtime is a concern, start with n_permutations=200 for fast iteration, then increase for final results.

The max_samples parameter can limit the pool size for the permutation test. This is useful when reference and monitoring datasets are very large:

```python
monitor = SWIFTMonitor(
    model=model,
    n_permutations=1000,
    max_samples=5000,  # Subsample if pool > 5000
)
```
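This section states only that the pool is subsampled when it exceeds max_samples, not how. One plausible approach, sketched here as an assumption, draws rows from each sample without replacement in proportion to its share of the pool, so the reference/monitoring ratio is roughly preserved. The helper subsample_pool is hypothetical and is not part of SWIFTMonitor.

```python
import numpy as np


def subsample_pool(reference, monitoring, max_samples=None, random_state=42):
    """Hypothetical helper: cap the pooled size at max_samples (None = no limit)."""
    reference = np.asarray(reference)
    monitoring = np.asarray(monitoring)
    total = len(reference) + len(monitoring)
    if max_samples is None or total <= max_samples:
        return reference, monitoring
    rng = np.random.default_rng(random_state)
    # Split the budget proportionally to each sample's share of the pool.
    n_ref = round(max_samples * len(reference) / total)
    n_mon = max_samples - n_ref
    # Sample row indices without replacement from each group.
    ref_idx = rng.choice(len(reference), size=n_ref, replace=False)
    mon_idx = rng.choice(len(monitoring), size=n_mon, replace=False)
    return reference[ref_idx], monitoring[mon_idx]


rng = np.random.default_rng(0)
ref = rng.normal(size=8000)
mon = rng.normal(size=4000)
sub_ref, sub_mon = subsample_pool(ref, mon, max_samples=5000)
print(len(sub_ref), len(sub_mon))  # 3333 1667
```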

Modifying Parameters

Since SWIFTMonitor follows the scikit-learn API, parameters can be inspected and modified:

```python
# Inspect current parameters
params = monitor.get_params()

# Modify parameters (re-fit is required after changing model, n_synthetic, or order)
monitor.set_params(alpha=0.01, correction="bonferroni")
```

Warning

Changing model, n_synthetic, or order after fit() invalidates the fitted state. Call fit() again after modifying these parameters.
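The refit rule can be illustrated with a small, self-contained toy class that follows the scikit-learn get_params()/set_params() convention. ToyMonitor is hypothetical: it mirrors only a subset of the parameter names documented above and drops its fitted state when model, n_synthetic, or order is passed to set_params(). Whether SWIFTMonitor itself discards or keeps stale state is not specified here, so calling fit() again remains the safe rule.

```python
class ToyMonitor:
    """Hypothetical stand-in illustrating the refit rule; not SWIFTMonitor."""

    _PARAM_NAMES = ("model", "order", "alpha", "correction", "n_synthetic")
    # Parameters that invalidate the fitted state, per the warning above.
    _REFIT_PARAMS = frozenset({"model", "n_synthetic", "order"})

    def __init__(self, model=None, order=1, alpha=0.05,
                 correction="benjamini-hochberg", n_synthetic=10):
        self.model = model
        self.order = order
        self.alpha = alpha
        self.correction = correction
        self.n_synthetic = n_synthetic

    def get_params(self, deep=True):
        # `deep` is accepted only for signature parity with scikit-learn.
        return {name: getattr(self, name) for name in self._PARAM_NAMES}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self._PARAM_NAMES:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        # Discard the fitted state if any refit-sensitive parameter was passed,
        # so a stale fit cannot be used silently.
        if self._REFIT_PARAMS.intersection(params) and hasattr(self, "fitted_"):
            del self.fitted_
        return self

    def fit(self):
        # Placeholder for whatever state a real fit() would store.
        self.fitted_ = True
        return self


monitor = ToyMonitor().fit()
monitor.set_params(alpha=0.01, correction="bonferroni")
print(hasattr(monitor, "fitted_"))  # True: alpha and correction keep the fit
monitor.set_params(order=2)
print(hasattr(monitor, "fitted_"))  # False: fit() must be called again
```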