Model_A API

Package iabm

Reusable package components for Model_A industrial-state identification.

class iabm.CrossValidationResult(scores)[source]

Bases: object

Summarize fold-wise validation scores produced by the classifier API.

Exposing the result as a dataclass makes downstream reporting clearer and keeps aggregate statistics close to the original fold-level scores.

Parameters:

scores (ndarray)

scores: ndarray

property mean: float

Return the average score across all folds.

property std: float

Return the standard deviation across all folds.
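As a rough illustration of how such a result object might behave, the following dependency-free sketch mirrors the documented attributes with a plain dataclass. The class name FoldScores, the tuple used in place of an ndarray, the sample scores, and the choice of the population standard deviation are all assumptions made for the example; this reference does not state which estimator of spread std uses.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Tuple


@dataclass(frozen=True)
class FoldScores:
    """Hypothetical stand-in for CrossValidationResult (not the iabm class)."""

    scores: Tuple[float, ...]

    @property
    def mean(self) -> float:
        # Average score across all folds.
        return fmean(self.scores)

    @property
    def std(self) -> float:
        # Assumption: population standard deviation (the numpy.std default).
        return pstdev(self.scores)


# Made-up fold scores, used only to exercise the properties.
result = FoldScores(scores=(0.90, 0.92, 0.88, 0.94, 0.91))
summary = f"accuracy = {result.mean:.3f} +/- {result.std:.3f}"
```

Keeping the raw scores on the object means a report can print both the aggregate line and the per-fold values from a single return value.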

class iabm.InferenceDataset(features, active_mask, source_frame)[source]

Bases: object

Bundle inference-ready features together with activity bookkeeping.

The source frame and activity mask allow the CLI to reconstruct outputs aligned with the original timestamps, including optional inactive rows.

Parameters:
  • features (DataFrame)

  • active_mask (Series)

  • source_frame (DataFrame)

features: DataFrame
active_mask: Series
source_frame: DataFrame
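One way to picture how the activity mask lets predictions be realigned with the original timestamps is the sketch below. It uses plain lists instead of pandas objects, and the function name align_predictions, the state names, the timestamps, and the "inactive" filler value are hypothetical; the actual CLI may reconstruct its output differently.

```python
from typing import List, Optional, Sequence, Tuple


def align_predictions(
    timestamps: Sequence[str],
    active_mask: Sequence[bool],
    active_predictions: Sequence[str],
    inactive_label: Optional[str] = None,
) -> List[Tuple[str, Optional[str]]]:
    """Spread predictions made on active rows back over every source timestamp."""
    if len(timestamps) != len(active_mask):
        raise ValueError("the mask must cover every source row")
    if sum(active_mask) != len(active_predictions):
        raise ValueError("one prediction is required per active row")
    predictions = iter(active_predictions)
    # Active rows consume the next prediction; inactive rows get the filler.
    return [
        (ts, next(predictions) if is_active else inactive_label)
        for ts, is_active in zip(timestamps, active_mask)
    ]


rows = align_predictions(
    timestamps=["08:00", "08:01", "08:02", "08:03"],
    active_mask=[True, False, True, True],
    active_predictions=["cutting", "idle_spin", "cutting"],
    inactive_label="inactive",
)
```

Because the mask spans the full window, the output has one row per source timestamp even though the classifier only saw the active rows.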
class iabm.IndustrialDataProcessor(analog_path, digital_path=None, *, threshold=50.0, feature_columns=None)[source]

Bases: object

Prepare industrial analog and digital signals for Model_A workflows.

The processor encapsulates the study-specific preprocessing rules so the rest of the package can work with clean training and inference datasets through a stable, object-oriented interface.

Parameters:
  • analog_path (str)

  • digital_path (Optional[str])

  • threshold (float)

  • feature_columns (Optional[Sequence[str]])

DEFAULT_FEATURE_COLUMNS = ['Vrms1', 'Vrms2', 'Vrms3', 'Irms1', 'Irms2', 'Irms3', 'PF1', 'PF2', 'PF3']
POWER_COLUMNS = ['RP1', 'RP2', 'RP3']
THREE_PHASE_BLOCKS = [['Vrms1', 'Vrms2', 'Vrms3'], ['RP1', 'RP2', 'RP3'], ['Irms1', 'Irms2', 'Irms3'], ['PF1', 'PF2', 'PF3']]
SINGLE_PHASE_BLOCKS = [['Vrms4'], ['RP4'], ['Irms4'], ['PF4']]

prepare_training_data(start, end)[source]

Return supervised features and labels for the requested time range.

Parameters:
  • start (str) – Inclusive lower timestamp bound.

  • end (str) – Inclusive upper timestamp bound.

Returns:

A TrainingDataset containing active rows only, with labels synchronized from the digital signal stream.

Return type:

TrainingDataset
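This reference does not describe how labels are synchronized with the digital stream. Purely as an illustration, the sketch below assumes a backward "as-of" rule in which each analog sample takes the most recent digital state recorded at or before its timestamp. The function name labels_as_of, the numeric timestamps, and the state names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def labels_as_of(
    analog_times: Sequence[float],
    digital_times: Sequence[float],
    digital_states: Sequence[str],
) -> List[Optional[str]]:
    """Give each analog sample the latest digital state at or before its time."""
    labels: List[Optional[str]] = []
    for t in analog_times:
        # Index of the last digital event whose timestamp is <= t.
        position = bisect_right(digital_times, t) - 1
        labels.append(digital_states[position] if position >= 0 else None)
    return labels


labels = labels_as_of(
    analog_times=[0.5, 1.0, 2.5, 4.0],
    digital_times=[1.0, 3.0],  # must be sorted in ascending order
    digital_states=["running", "stopped"],
)
```

Samples that precede the first digital event receive None here; a real pipeline would have to decide whether to drop or impute such rows.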

prepare_inference_data(start, end, *, drop_inactive=True)[source]

Return inference-ready analog features without requiring digital labels.

Parameters:
  • start (str) – Inclusive lower timestamp bound.

  • end (str) – Inclusive upper timestamp bound.

  • drop_inactive (bool) – Whether to keep only rows above the activity threshold.

Returns:

An InferenceDataset with the feature matrix, a boolean mask identifying active rows, and the imputed source analog window.

Return type:

InferenceDataset
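The relationship between threshold, the activity mask, and drop_inactive can be sketched as follows. The activity rule used here (total real power over POWER_COLUMNS strictly above the threshold) is an assumption, since this reference does not spell out how activity is computed; split_by_activity and the sample values are likewise hypothetical.

```python
from typing import Dict, List, Tuple

POWER_COLUMNS = ["RP1", "RP2", "RP3"]  # copied from the documented class constant


def split_by_activity(
    rows: List[Dict[str, float]],
    threshold: float = 50.0,
    drop_inactive: bool = True,
) -> Tuple[List[Dict[str, float]], List[bool]]:
    """Return (feature rows, active mask) for a list of analog samples."""
    # Assumption: a row is active when its total real power exceeds the threshold.
    active_mask = [sum(row[c] for c in POWER_COLUMNS) > threshold for row in rows]
    if drop_inactive:
        features = [row for row, active in zip(rows, active_mask) if active]
    else:
        features = list(rows)
    return features, active_mask


samples = [
    {"RP1": 30.0, "RP2": 25.0, "RP3": 20.0},  # total 75 -> active
    {"RP1": 5.0, "RP2": 4.0, "RP3": 3.0},     # total 12 -> inactive
]
kept, mask = split_by_activity(samples)
everything, same_mask = split_by_activity(samples, drop_inactive=False)
```

In this sketch the mask always covers the full window, so it stays usable for realigning outputs whether or not inactive rows were dropped from the features.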

prepare_evaluation_data(start, end)[source]

Return aligned features and optional labels for model evaluation.

Parameters:
  • start (str) – Inclusive lower timestamp bound.

  • end (str) – Inclusive upper timestamp bound.

Returns:

An EvaluationDataset containing the active feature matrix, optional ground-truth labels aligned to the full analog window, the activity mask, and the imputed source analog frame.

Return type:

EvaluationDataset

class iabm.StateClassifier(model_type='rf', params=None, translator=None)[source]

Bases: object

High-level wrapper around the estimator lifecycle used by Model_A.

The class keeps scaling, label encoding, validation, persistence, and inference in one cohesive object so command-line orchestration stays thin and future model variants can share the same interface.

Parameters:
  • model_type (str)

  • params (Optional[Dict[str, Any]])

  • translator (Optional[Callable[[str], str]])

fit(X, y)[source]

Fit the scaler and estimator and return the training accuracy.

Parameters:
  • X (DataFrame) – Training feature matrix.

  • y (Series | ndarray) – Original training labels.

Returns:

In-sample accuracy measured on the fitted training data.

Return type:

float

cross_validate(X, y, *, splits=5, shuffle=True, random_state=42)[source]

Evaluate the configured estimator with stratified cross-validation.

Parameters:
  • X (DataFrame) – Feature matrix.

  • y (Series | ndarray) – Original state labels before encoding.

  • splits (int) – Number of folds in the validation scheme.

  • shuffle (bool) – Whether to shuffle the samples before splitting them into folds.

  • random_state (int) – Seed used when shuffling the samples.

Returns:

A CrossValidationResult with per-fold scores and summary statistics.

Return type:

CrossValidationResult
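To make the idea of stratified splitting concrete, here is a simplified standard-library sketch that assigns sample indices to test folds while keeping class proportions roughly equal. It is not scikit-learn's algorithm and not necessarily what cross_validate does internally; the function name stratified_folds and the labels are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def stratified_folds(
    labels: Sequence[Hashable],
    splits: int = 5,
    shuffle: bool = True,
    random_state: int = 42,
) -> List[List[int]]:
    """Assign sample indices to folds so each fold mirrors the class balance."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(random_state)
    folds: List[List[int]] = [[] for _ in range(splits)]
    for indices in by_class.values():
        if shuffle:
            rng.shuffle(indices)  # shuffle within a class before dealing
        # Deal class members round-robin so per-class counts differ by at most one.
        for position, index in enumerate(indices):
            folds[position % splits].append(index)
    return folds


y = ["on"] * 10 + ["off"] * 5
test_folds = stratified_folds(y, splits=5)
```

With ten "on" and five "off" samples, every fold receives two "on" and one "off" index, and reusing the same seed reproduces the same assignment.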

predict(X)[source]

Predict original-state labels for new analog observations.

Parameters:

X (DataFrame) – Inference feature matrix.

Returns:

Predicted labels mapped back to the original state identifiers.

Return type:

ndarray

predict_proba(X)[source]

Return class probabilities aligned with the original label space.

Parameters:

X (DataFrame) – Inference feature matrix.

Returns:

A two-dimensional array whose columns follow self.label_encoder.classes_.

Return type:

ndarray
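The column ordering can be illustrated without the package itself. The sketch below assumes the classes are the sorted unique training labels, which is how scikit-learn's LabelEncoder populates classes_; the labels, the probability values, and the helper most_likely_states are made up for the example.

```python
from typing import List, Sequence

# Assumption: classes are the sorted unique training labels, as with
# scikit-learn's LabelEncoder.classes_.
training_labels = ["stopped", "running", "idle", "running"]
classes = sorted(set(training_labels))  # column order: idle, running, stopped


def most_likely_states(probabilities: Sequence[Sequence[float]]) -> List[str]:
    """Pick, for each row, the label of the column with the highest probability."""
    winners = []
    for row in probabilities:
        best_column = max(range(len(row)), key=row.__getitem__)
        winners.append(classes[best_column])
    return winners


# Hypothetical probabilities, one row per observation.
proba = [
    [0.10, 0.70, 0.20],
    [0.05, 0.15, 0.80],
]
states = most_likely_states(proba)
```

Reading the columns through the class list, rather than assuming an order, keeps downstream reports correct if the set of trained states changes.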

save(file_path)[source]

Persist the full inference artifact required for later reuse.

The saved payload contains every object needed to run predictions on unseen data without retraining: estimator, scaler, label encoder, and feature ordering metadata.

Parameters:

file_path (str)

Return type:

None

classmethod load(file_path, translator=None)[source]

Restore a persisted classifier artifact from disk.

Parameters:
  • file_path (str) – Serialized artifact path created with save().

  • translator (Callable[[str], str] | None) – Optional translation function for user-facing errors.

Returns:

A ready-to-use StateClassifier instance.

Return type:

StateClassifier
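The save and load pair can be pictured as a round trip of a single payload. The sketch below is only an illustration: the dictionary keys, the placeholder objects standing in for the fitted estimator, scaler, and label encoder, the file name, and the use of pickle are all assumptions, since this reference does not name the serialization format.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical payload layout; plain Python objects stand in for the
# fitted estimator, scaler, and label encoder.
artifact = {
    "estimator": {"kind": "rf"},
    "scaler": {"mean": [230.0, 5.0], "scale": [2.0, 1.5]},
    "label_encoder": ["idle", "running", "stopped"],
    "feature_columns": ["Vrms1", "Irms1"],
}

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "model_a.pkl"
    # Assumption: pickle is used here for illustration only.
    with path.open("wb") as handle:
        pickle.dump(artifact, handle)
    with path.open("rb") as handle:
        restored = pickle.load(handle)

# Reorder an incoming record to the stored feature order before predicting.
record = {"Irms1": 4.2, "Vrms1": 231.0}
ordered = [record[name] for name in restored["feature_columns"]]
```

Storing the feature order alongside the model is what allows inference inputs with a different column order to be rearranged safely before scaling.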

class iabm.TrainingDataset(features, labels)[source]

Bases: object

Bundle supervised features and labels for classifier training.

The dataclass keeps the public API explicit and avoids passing loosely coupled tuples around the codebase when training workflows evolve.

Parameters:
  • features (DataFrame)

  • labels (Series)

features: DataFrame
labels: Series

Data Processing

Data preparation utilities for Model_A industrial-state classifiers.

class iabm.data_processor.TrainingDataset(features, labels)[source]

Bases: object

Bundle supervised features and labels for classifier training.

The dataclass keeps the public API explicit and avoids passing loosely coupled tuples around the codebase when training workflows evolve.

Parameters:
  • features (DataFrame)

  • labels (Series)

features: DataFrame
labels: Series

class iabm.data_processor.InferenceDataset(features, active_mask, source_frame)[source]

Bases: object

Bundle inference-ready features together with activity bookkeeping.

The source frame and activity mask allow the CLI to reconstruct outputs aligned with the original timestamps, including optional inactive rows.

Parameters:
  • features (DataFrame)

  • active_mask (Series)

  • source_frame (DataFrame)

features: DataFrame
active_mask: Series
source_frame: DataFrame

class iabm.data_processor.EvaluationDataset(features, labels, active_mask, source_frame)[source]

Bases: object

Bundle features, labels, and alignment data for quality assessment.

Parameters:
  • features (DataFrame)

  • labels (Series | None)

  • active_mask (Series)

  • source_frame (DataFrame)

features: DataFrame
labels: Series | None
active_mask: Series
source_frame: DataFrame

class iabm.data_processor.IndustrialDataProcessor(analog_path, digital_path=None, *, threshold=50.0, feature_columns=None)[source]

Bases: object

Prepare industrial analog and digital signals for Model_A workflows.

The processor encapsulates the study-specific preprocessing rules so the rest of the package can work with clean training and inference datasets through a stable, object-oriented interface.

Parameters:
  • analog_path (str)

  • digital_path (Optional[str])

  • threshold (float)

  • feature_columns (Optional[Sequence[str]])

DEFAULT_FEATURE_COLUMNS = ['Vrms1', 'Vrms2', 'Vrms3', 'Irms1', 'Irms2', 'Irms3', 'PF1', 'PF2', 'PF3']
POWER_COLUMNS = ['RP1', 'RP2', 'RP3']
THREE_PHASE_BLOCKS = [['Vrms1', 'Vrms2', 'Vrms3'], ['RP1', 'RP2', 'RP3'], ['Irms1', 'Irms2', 'Irms3'], ['PF1', 'PF2', 'PF3']]
SINGLE_PHASE_BLOCKS = [['Vrms4'], ['RP4'], ['Irms4'], ['PF4']]

prepare_training_data(start, end)[source]

Return supervised features and labels for the requested time range.

Parameters:
  • start (str) – Inclusive lower timestamp bound.

  • end (str) – Inclusive upper timestamp bound.

Returns:

A TrainingDataset containing active rows only, with labels synchronized from the digital signal stream.

Return type:

TrainingDataset

prepare_inference_data(start, end, *, drop_inactive=True)[source]

Return inference-ready analog features without requiring digital labels.

Parameters:
  • start (str) – Inclusive lower timestamp bound.

  • end (str) – Inclusive upper timestamp bound.

  • drop_inactive (bool) – Whether to keep only rows above the activity threshold.

Returns:

An InferenceDataset with the feature matrix, a boolean mask identifying active rows, and the imputed source analog window.

Return type:

InferenceDataset

prepare_evaluation_data(start, end)[source]

Return aligned features and optional labels for model evaluation.

Parameters:
  • start (str) – Inclusive lower timestamp bound.

  • end (str) – Inclusive upper timestamp bound.

Returns:

An EvaluationDataset containing the active feature matrix, optional ground-truth labels aligned to the full analog window, the activity mask, and the imputed source analog frame.

Return type:

EvaluationDataset

Models

Model abstractions for Model_A industrial-state classifiers.

class iabm.models.CrossValidationResult(scores)[source]

Bases: object

Summarize fold-wise validation scores produced by the classifier API.

Exposing the result as a dataclass makes downstream reporting clearer and keeps aggregate statistics close to the original fold-level scores.

Parameters:

scores (ndarray)

scores: ndarray

property mean: float

Return the average score across all folds.

property std: float

Return the standard deviation across all folds.

class iabm.models.FoldLabelEncoderClassifier(estimator)[source]

Bases: BaseEstimator, ClassifierMixin

Wrap an estimator so each fit uses fold-local contiguous class labels.

XGBoost expects class labels presented during fit to be contiguous integers starting at zero. During cross-validation, some training folds may not contain every class present in the global dataset, which makes a globally encoded target vector invalid for that fold. This wrapper applies a fresh label encoding on every fit and maps predictions back to the original labels expected by scikit-learn scorers.

Parameters:

estimator (BaseEstimator)

fit(X, y)[source]

Fit the wrapped estimator with a fold-local label encoding.

Parameters:
  • X (DataFrame | ndarray) – Fold-local feature matrix.

  • y (Series | ndarray) – Fold-local label vector.

Returns:

The fitted wrapper instance.

Return type:

FoldLabelEncoderClassifier

predict(X)[source]

Predict labels and map them back to the original fold label space.

Parameters:

X (DataFrame | ndarray) – Fold-local feature matrix.

Returns:

Predictions expressed in the original label space expected by the scoring function.

Return type:

ndarray
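The fold-local encoding described for this wrapper can be sketched with the standard library alone. Both classes below are hypothetical stand-ins: MajorityClassifier is a toy estimator that, like XGBoost, rejects non-contiguous labels, and FoldLocalEncoder shows the re-encode-on-fit idea without inheriting from scikit-learn's base classes.

```python
from collections import Counter
from typing import Hashable, List, Sequence


class MajorityClassifier:
    """Toy estimator that insists on labels 0..n-1 and predicts the majority."""

    def fit(self, X: Sequence, y: Sequence[int]) -> "MajorityClassifier":
        if sorted(set(y)) != list(range(len(set(y)))):
            raise ValueError("labels must be contiguous integers starting at 0")
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X: Sequence) -> List[int]:
        return [self.majority_ for _ in X]


class FoldLocalEncoder:
    """Hypothetical stand-in for the wrapper: re-encode labels on every fit."""

    def __init__(self, estimator) -> None:
        self.estimator = estimator

    def fit(self, X: Sequence, y: Sequence[Hashable]) -> "FoldLocalEncoder":
        # Build the mapping from the labels present in this fold only.
        self.classes_ = sorted(set(y))
        to_code = {label: code for code, label in enumerate(self.classes_)}
        self.estimator.fit(X, [to_code[label] for label in y])
        return self

    def predict(self, X: Sequence) -> List[Hashable]:
        # Decode the estimator's integer output back to the fold's labels.
        return [self.classes_[code] for code in self.estimator.predict(X)]


# Globally the states are encoded 0, 1, 2, but this fold never saw class 1.
fold_X = [[0.1], [0.2], [0.3]]
fold_y = [0, 2, 2]
model = FoldLocalEncoder(MajorityClassifier()).fit(fold_X, fold_y)
predictions = model.predict([[0.4], [0.5]])
```

Fitting the toy estimator directly on fold_y would raise, because the labels 0 and 2 are not contiguous; the wrapper maps them to 0 and 1 for the fit and back to 0 and 2 for the predictions.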

get_params(deep=True)[source]

Expose wrapped-estimator parameters for scikit-learn compatibility.

Parameters:

deep (bool) – Whether to include nested estimator parameters.

Returns:

A parameter dictionary compatible with scikit-learn cloning.

Return type:

Dict[str, Any]

set_params(**params)[source]

Propagate parameter updates to the wrapped estimator when requested.

Parameters:

**params (Any) – Wrapper or nested estimator parameters.

Returns:

The updated wrapper instance.

Return type:

FoldLabelEncoderClassifier

set_score_request(*, sample_weight='$UNCHANGED$')

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:

sample_weight (str, True, False, or None) – Metadata routing for the sample_weight parameter in score. Defaults to sklearn.utils.metadata_routing.UNCHANGED.

Returns:

The updated object.

Return type:

FoldLabelEncoderClassifier

class iabm.models.StateClassifier(model_type='rf', params=None, translator=None)[source]

Bases: object

High-level wrapper around the estimator lifecycle used by Model_A.

The class keeps scaling, label encoding, validation, persistence, and inference in one cohesive object so command-line orchestration stays thin and future model variants can share the same interface.

Parameters:
  • model_type (str)

  • params (Optional[Dict[str, Any]])

  • translator (Optional[Callable[[str], str]])

fit(X, y)[source]

Fit the scaler and estimator and return the training accuracy.

Parameters:
  • X (DataFrame) – Training feature matrix.

  • y (Series | ndarray) – Original training labels.

Returns:

In-sample accuracy measured on the fitted training data.

Return type:

float

cross_validate(X, y, *, splits=5, shuffle=True, random_state=42)[source]

Evaluate the configured estimator with stratified cross-validation.

Parameters:
  • X (DataFrame) – Feature matrix.

  • y (Series | ndarray) – Original state labels before encoding.

  • splits (int) – Number of folds in the validation scheme.

  • shuffle (bool) – Whether to shuffle the samples before splitting them into folds.

  • random_state (int) – Seed used when shuffling the samples.

Returns:

A CrossValidationResult with per-fold scores and summary statistics.

Return type:

CrossValidationResult

predict(X)[source]

Predict original-state labels for new analog observations.

Parameters:

X (DataFrame) – Inference feature matrix.

Returns:

Predicted labels mapped back to the original state identifiers.

Return type:

ndarray

predict_proba(X)[source]

Return class probabilities aligned with the original label space.

Parameters:

X (DataFrame) – Inference feature matrix.

Returns:

A two-dimensional array whose columns follow self.label_encoder.classes_.

Return type:

ndarray

save(file_path)[source]

Persist the full inference artifact required for later reuse.

The saved payload contains every object needed to run predictions on unseen data without retraining: estimator, scaler, label encoder, and feature ordering metadata.

Parameters:

file_path (str)

Return type:

None

classmethod load(file_path, translator=None)[source]

Restore a persisted classifier artifact from disk.

Parameters:
  • file_path (str) – Serialized artifact path created with save().

  • translator (Callable[[str], str] | None) – Optional translation function for user-facing errors.

Returns:

A ready-to-use StateClassifier instance.

Return type:

StateClassifier

Command-Line Interface

Command-line entry point for training and using Model_A classifiers.

iabm.main.parse_arguments(translator)[source]

Build the CLI parser with translated help messages.

Parameters:

translator (Callable[[str], str]) – Translation function returned by setup_i18n().

Returns:

Parsed command-line arguments ready to drive the main workflow.

Return type:

Namespace
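A parser that routes its help text through a translation callable might look roughly like the sketch below. The option names --mode and --lang, their defaults, and the helper names build_parser and parse are hypothetical; the real CLI flags are not listed in this reference.

```python
import argparse
from typing import Callable, List, Optional


def build_parser(translator: Callable[[str], str]) -> argparse.ArgumentParser:
    """Create a parser whose help strings pass through the translator."""
    parser = argparse.ArgumentParser(
        description=translator("Train or apply a classifier.")
    )
    # Hypothetical options, shown only to illustrate translated help text.
    parser.add_argument(
        "--mode",
        choices=["train", "predict"],
        default="train",
        help=translator("Workflow to run."),
    )
    parser.add_argument(
        "--lang", default="en", help=translator("Interface language.")
    )
    return parser


def parse(
    translator: Callable[[str], str], argv: Optional[List[str]] = None
) -> argparse.Namespace:
    # Passing argv explicitly keeps the function easy to test.
    return build_parser(translator).parse_args(argv)


# An identity translator stands in for the callable returned by setup_i18n().
args = parse(lambda message: message, ["--mode", "predict"])
```

Taking the translator as a parameter means the parser itself never needs to know which language was selected.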

iabm.main.main()[source]

Run the end-to-end Model_A workflow for training or prediction.

The entry point keeps orchestration concerns in one place while delegating data preparation and model lifecycle logic to their respective classes.

Training mode prepares labeled features, runs cross-validation, fits the final classifier, and persists both the model artifact and fold metrics. Prediction mode loads a previously trained artifact and applies it to a new analog time window without requiring digital labels at inference time.

Return type:

None

Utilities

Internationalization helpers for the Model_A command-line interface.

iabm.utils.setup_i18n(lang='en')[source]

Return a translation function for the requested interface language.

The project stores human-maintained translations in locales/*/LC_MESSAGES as .po files. This helper reads those catalogs directly so the CLI can be translated even when .mo files have not been compiled yet.

Parameters:

lang (str) – ISO language code requested by the user.

Returns:

A gettext-style callable that translates a message identifier into the configured language. For English, the callable returns the message identifier unchanged.

Return type:

Callable[[str], str]
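Reading a .po catalog directly, without compiled .mo files, can be approximated with a few lines of standard-library code. The sketch below handles only single-line msgid and msgstr pairs and ignores escapes, plural forms, and multi-line strings; load_po, make_translator, and the sample catalog are illustrative and are not the helper's actual implementation.

```python
import re
from typing import Callable, Dict, Optional

_ENTRY = re.compile(r'^(msgid|msgstr)\s+"(.*)"\s*$')


def load_po(text: str) -> Dict[str, str]:
    """Parse single-line msgid/msgstr pairs from the contents of a .po file."""
    catalog: Dict[str, str] = {}
    current_id: Optional[str] = None
    for line in text.splitlines():
        match = _ENTRY.match(line.strip())
        if not match:
            continue  # comments, blank lines, and continuation lines are skipped
        keyword, value = match.groups()
        if keyword == "msgid":
            current_id = value
        elif current_id:  # ignore the header entry, whose msgid is empty
            if value:
                catalog[current_id] = value
            current_id = None
    return catalog


def make_translator(catalog: Dict[str, str]) -> Callable[[str], str]:
    # Unknown messages fall back to the original identifier.
    return lambda message: catalog.get(message, message)


sample_po = '''
# Spanish catalog (illustrative)
msgid ""
msgstr ""

msgid "Training finished."
msgstr "Entrenamiento finalizado."
'''
translate = make_translator(load_po(sample_po))
```

An empty catalog turns the translator into an identity function, which matches the documented behavior of falling back to the original message identifiers.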