Extraction
extraction
Stage 1: Decision point extraction from trained models.
Extracts the set of unique split thresholds (decision points) per feature from a trained model. These decision points define the bucket boundaries for the SWIFT pipeline.
Supported model types:
- LightGBM Booster (extract_decision_points_lgb)
- XGBoost Booster (extract_decision_points_xgb)
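To make "bucket boundaries" concrete, here is a minimal sketch of how a sorted list of decision points could partition a feature's range. The helper name `bucket_index` and the tie-breaking rule (a value equal to a threshold goes to the lower bucket) are assumptions for illustration; the SWIFT pipeline may define its buckets differently.

```python
from bisect import bisect_left


def bucket_index(value, thresholds):
    """Return the index of the bucket that `value` falls into.

    `thresholds` must be sorted in ascending order. With k thresholds there
    are k + 1 buckets. A value equal to a threshold lands in the lower
    bucket (an assumption made for this sketch).
    """
    return bisect_left(thresholds, value)


# Hypothetical decision points for a single feature.
thresholds = [0.5, 1.5, 3.0]

indices = [bucket_index(v, thresholds) for v in (0.2, 0.5, 2.0, 10.0)]
print(indices)  # [0, 0, 2, 3]
```

Swapping `bisect_left` for `bisect_right` would send a value equal to a threshold to the upper bucket instead.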
extract_decision_points
Auto-dispatch extraction based on model type.
Args:
model: A trained LightGBM Booster or XGBoost Booster.
feature_names: List of feature names (must match model's feature order).
Returns: Dict mapping feature name -> sorted 1-D array of unique split thresholds.
Raises: TypeError: If the model type is not supported.
Source code in src/swift/extraction.py
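The dispatch-or-raise behaviour described above could look roughly like the following. This is only a sketch: `pick_extractor` and the `EXTRACTORS` registry are hypothetical names, and matching on the class's module and name (rather than using `isinstance`) is an assumption chosen so the example runs without LightGBM or XGBoost installed. The actual implementation in src/swift/extraction.py may work differently.

```python
def pick_extractor(model, extractors):
    """Look up the extractor registered for the model's type.

    The key is (top-level package name, class name), for example
    ("lightgbm", "Booster"). Raises TypeError for unregistered types.
    """
    root_module = type(model).__module__.split(".")[0]
    key = (root_module, type(model).__name__)
    if key not in extractors:
        raise TypeError(f"Unsupported model type: {type(model).__name__}")
    return extractors[key]


# Hypothetical registry; the strings stand in for the real functions.
EXTRACTORS = {
    ("lightgbm", "Booster"): "extract_decision_points_lgb",
    ("xgboost", "Booster"): "extract_decision_points_xgb",
}

# An unsupported object triggers the TypeError path.
try:
    pick_extractor(object(), EXTRACTORS)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```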
extract_decision_points_lgb
Extract unique, sorted split thresholds per feature from a LightGBM Booster.
For each feature, collects every split threshold used across all trees in the ensemble, deduplicates, and returns them in ascending order.
Args:
model: A trained LightGBM Booster.
feature_names: List of feature names (must match model's feature order).
Returns: Dict mapping feature name -> sorted 1-D array of unique split thresholds. Features never used in any split get an empty array.
Source code in src/swift/extraction.py
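As an illustration of the collect, deduplicate, and sort steps, the sketch below walks a dictionary shaped like the output of LightGBM's `Booster.dump_model()`. The function name `collect_thresholds_lgb` is hypothetical, the use of `dump_model()` is an assumption about how the trees are read, and the result uses plain sorted lists instead of arrays so that the example needs only the standard library.

```python
def collect_thresholds_lgb(dump, feature_names):
    """Collect unique, sorted numeric split thresholds per feature.

    `dump` is assumed to have the shape returned by Booster.dump_model():
    a "tree_info" list whose entries hold a nested "tree_structure".
    """
    points = {name: set() for name in feature_names}

    def walk(node):
        if "split_feature" not in node:  # leaf node, nothing to collect
            return
        threshold = node["threshold"]
        # Categorical splits store a string threshold; skip those here.
        if isinstance(threshold, (int, float)):
            name = feature_names[node["split_feature"]]
            points[name].add(float(threshold))
        walk(node["left_child"])
        walk(node["right_child"])

    for tree in dump["tree_info"]:
        walk(tree["tree_structure"])
    return {name: sorted(values) for name, values in points.items()}


# Hand-written stand-in for a two-tree model dump.
dump = {
    "tree_info": [
        {"tree_structure": {
            "split_feature": 0, "threshold": 2.5, "decision_type": "<=",
            "left_child": {"leaf_value": 0.1},
            "right_child": {
                "split_feature": 1, "threshold": 0.7, "decision_type": "<=",
                "left_child": {"leaf_value": -0.2},
                "right_child": {"leaf_value": 0.3},
            },
        }},
        {"tree_structure": {
            "split_feature": 0, "threshold": 2.5, "decision_type": "<=",
            "left_child": {"leaf_value": 0.0},
            "right_child": {"leaf_value": 0.05},
        }},
    ]
}

result = collect_thresholds_lgb(dump, ["age", "income", "unused"])
print(result)  # {'age': [2.5], 'income': [0.7], 'unused': []}
```

The threshold 2.5 appears in both trees but is reported once, and the feature that is never split on maps to an empty list.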
extract_decision_points_xgb
Extract unique, sorted split thresholds per feature from an XGBoost Booster.
Uses get_dump(dump_format='json') to obtain JSON-serialised trees,
then recursively collects every numeric split threshold per feature.
Args:
model: A trained xgboost.Booster.
feature_names: List of feature names (must match model's feature order).
Returns: Dict mapping feature name -> sorted 1-D array of unique split thresholds. Features never used in any split get an empty array.
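The recursive collection over JSON-serialised trees can be sketched as follows. The input mimics what `get_dump(dump_format='json')` returns: one JSON string per tree, with split nodes carrying `split`, `split_condition`, and `children`, and leaves carrying `leaf`. The function name `collect_thresholds_xgb` is hypothetical, the sketch assumes the `split` field holds the feature name (which requires the Booster to know its feature names), and it returns sorted lists instead of arrays to stay within the standard library.

```python
import json


def collect_thresholds_xgb(tree_dumps, feature_names):
    """Collect unique, sorted numeric split thresholds per feature.

    `tree_dumps` is a list of JSON strings, one per tree, in the layout
    produced by get_dump(dump_format='json').
    """
    points = {name: set() for name in feature_names}

    def walk(node):
        if "leaf" in node:  # leaf node, nothing to collect
            return
        name = node["split"]
        # Only record splits on known features that have a numeric condition.
        if name in points and "split_condition" in node:
            points[name].add(float(node["split_condition"]))
        for child in node.get("children", []):
            walk(child)

    for tree_json in tree_dumps:
        walk(json.loads(tree_json))
    return {name: sorted(values) for name, values in points.items()}


# Hand-written stand-in for a two-tree JSON dump.
tree_dumps = [
    json.dumps({
        "nodeid": 0, "split": "age", "split_condition": 30.5,
        "yes": 1, "no": 2, "missing": 1,
        "children": [
            {"nodeid": 1, "leaf": 0.1},
            {"nodeid": 2, "split": "income", "split_condition": 1000.0,
             "yes": 3, "no": 4, "missing": 3,
             "children": [
                 {"nodeid": 3, "leaf": -0.2},
                 {"nodeid": 4, "leaf": 0.3},
             ]},
        ],
    }),
    json.dumps({
        "nodeid": 0, "split": "age", "split_condition": 18.0,
        "yes": 1, "no": 2, "missing": 1,
        "children": [
            {"nodeid": 1, "leaf": 0.0},
            {"nodeid": 2, "leaf": 0.05},
        ],
    }),
]

result = collect_thresholds_xgb(tree_dumps, ["age", "income", "unused"])
print(result)  # {'age': [18.0, 30.5], 'income': [1000.0], 'unused': []}
```

With a real model, `tree_dumps` would come from `model.get_dump(dump_format='json')`.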