h2o_sonar.methods.utils package

Submodules

h2o_sonar.methods.utils.fairness_utils module

h2o_sonar.methods.utils.fairness_utils.check_cm_input(get_global_cm, group_levels, level, print_frame, group_column): Check input to confusion matrix for binary DIA

h2o_sonar.methods.utils.fairness_utils.check_dia_input(actual_column, high_threshold, low_threshold, predict_column, cutoff=None): Check input of binary DIA class initialization

h2o_sonar.methods.utils.fairness_utils.check_frame(actual_column, predict_column, group_column, frame): Sanity checks for input frame to DIA

h2o_sonar.methods.utils.fairness_utils.check_frame_type(frame): Check frame type for DIA

h2o_sonar.methods.utils.fairness_utils.cm_exp_parser(expression, cm_dict, level)

Small utility function that translates abbreviated metric expressions into executable Python statements:

tp | fp       cm_dict[level][0, 0] | cm_dict[level][0, 1]
-------  ==>  -------------------------------------------
fn | tn       cm_dict[level][1, 0] | cm_dict[level][1, 1]
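
The translation described for cm_exp_parser can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name expand_cm_expression and the CELLS mapping are hypothetical, and the confusion matrix is modelled as a plain dict keyed by (row, column) tuples so that the example needs no third-party packages.

```python
import re

# Hypothetical mapping taken from the layout above: abbreviation -> (row, column).
CELLS = {"tp": (0, 0), "fp": (0, 1), "fn": (1, 0), "tn": (1, 1)}


def expand_cm_expression(expression, level):
    """Replace tp/fp/fn/tn with indexing expressions into cm_dict[level]."""

    def replace(match):
        row, col = CELLS[match.group(0)]
        return f"cm_dict[{level!r}][{row}, {col}]"

    return re.sub(r"\b(tp|fp|fn|tn)\b", replace, expression)


# Assumed stand-in for a confusion matrix: a dict keyed by (row, column).
cm_dict = {"female": {(0, 0): 40, (0, 1): 10, (1, 0): 5, (1, 1): 45}}

statement = expand_cm_expression("tp / (tp + fn)", "female")
# Evaluate the generated statement with cm_dict as the only visible name.
recall = eval(statement, {"cm_dict": cm_dict})
```

Here statement holds the expanded expression as a string and recall is the value obtained by evaluating it for the "female" level.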

h2o_sonar.methods.utils.fairness_utils.get_binary_metric_dict(): Dictionary of metrics utilized by binary DIA.

h2o_sonar.methods.utils.fairness_utils.get_group_levels(group_column, frame): Get the levels of a particular group column, e.g., {male, female}.

h2o_sonar.methods.utils.fairness_utils.get_metrics_list(problem_type): Get DIA metrics for a given problem type (regression or binomial).

h2o_sonar.methods.utils.fairness_utils.get_prroc_dt(frame, y, yhat, pos=1, neg=0, res=0.01)

Calculates precision, recall, and f1 for a datatable of y and yhat values.

Args:

frame: Datatable of actual (y) and predicted (yhat) values.
y: Name of actual value column.
yhat: Name of predicted value column.
pos: Primary target value, default 1.
neg: Secondary target value, default 0.
res: Resolution by which to loop through cutoffs, default 0.01.

Returns:

Datatable of precision, recall, and f1 values.
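
The cutoff loop can be sketched in plain Python. This is an illustration under stated assumptions, not the code of get_prroc_dt: the name pr_curve is hypothetical, plain lists stand in for the datatable, a prediction counts as positive when it is greater than or equal to the cutoff, and a metric with a zero denominator is reported as 0.0.

```python
def pr_curve(y, yhat, pos=1, res=0.01):
    """Return (cutoff, precision, recall, f1) tuples for cutoffs 0, res, ..., 1."""
    rows = []
    steps = int(round(1 / res))
    for i in range(steps + 1):
        cutoff = i * res
        pairs = list(zip(y, yhat))
        # Assumption: predictions at or above the cutoff are labelled positive.
        tp = sum(1 for a, p in pairs if p >= cutoff and a == pos)
        fp = sum(1 for a, p in pairs if p >= cutoff and a != pos)
        fn = sum(1 for a, p in pairs if p < cutoff and a == pos)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        rows.append((cutoff, precision, recall, f1))
    return rows


rows = pr_curve([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4], res=0.5)
```

With res=0.5 the loop visits the cutoffs 0.0, 0.5 and 1.0, so rows contains three tuples.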

h2o_sonar.methods.utils.fairness_utils.get_r2_rmse(frame, actual_column, predict_column): Calculate R2 and RMSE between actual and predicted columns in a Pandas frame.

h2o_sonar.methods.utils.fairness_utils.get_reg_metrics_list(): List of metrics utilized by regression DIA.

h2o_sonar.methods.utils.fairness_utils.mean_squared_error(actual, predicted): Computes the mean squared error.

h2o_sonar.methods.utils.fairness_utils.r_squared(actual, predicted): Computes R^2 (coefficient of determination) regression score function.

h2o_sonar.methods.utils.fairness_utils.root_mean_squared_error(actual, predicted): Computes the root mean squared error.
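
For reference, the three regression metrics follow their textbook definitions. The sketch below uses hypothetical helper names (mse, rmse, r2) and plain Python lists; it shows the formulas only and is not the library's code.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mse(actual, predicted))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 8.0]
scores = (mse(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```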

h2o_sonar.methods.utils.fairness_utils.smd_multinomial(frame, y, group_col, ref_level)

Parameters:

frame: datatable.Frame: Datatable that contains target, group column, and multinomial predictions (probabilities) as columns for each class outcome. For example:

target | group_col | class_1_prob | class_2_prob | … |

y: str: Column that contains the true value for the outcome of interest.
group_col: str: Column that contains certain groups of interest for DIA, e.g., {female, male, other}, {high school, college, graduate school, other}
ref_level: str: Reference group level used for disparity calculation.

Returns:

smd_frame: datatable.Frame: A frame in which the first column contains each group level and each column after contains the standardized mean difference between each class outcome and the reference level.
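
The source does not state which standard deviation smd_multinomial uses, so the following is only a sketch of the general idea. It assumes the standardized mean difference is the difference between a group's mean predicted probability and the reference level's mean, divided by the population standard deviation of that probability column over all rows. The function name smd_by_group and the list-of-dicts input are hypothetical.

```python
from statistics import mean, pstdev


def smd_by_group(rows, group_col, prob_cols, ref_level):
    """Return {level: {prob_col: SMD against ref_level}} for every group level."""
    result = {}
    for level in sorted({row[group_col] for row in rows}):
        result[level] = {}
        for col in prob_cols:
            all_values = [row[col] for row in rows]
            group = [row[col] for row in rows if row[group_col] == level]
            ref = [row[col] for row in rows if row[group_col] == ref_level]
            # Assumption: standardize by the standard deviation of the whole column.
            sd = pstdev(all_values)
            result[level][col] = (mean(group) - mean(ref)) / sd if sd else 0.0
    return result


rows = [
    {"group_col": "a", "class_1_prob": 0.2},
    {"group_col": "a", "class_1_prob": 0.4},
    {"group_col": "b", "class_1_prob": 0.6},
    {"group_col": "b", "class_1_prob": 0.8},
]
smd = smd_by_group(rows, "group_col", ["class_1_prob"], ref_level="a")
```

In this sketch the reference level always has an SMD of 0 against itself; other levels are positive when their mean predicted probability exceeds the reference mean.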

h2o_sonar.methods.utils.fairness_utils.squared_error(actual, predicted): Computes the squared error.

h2o_sonar.methods.utils.h2o_utils module

H2O-3 utilities with proper cluster lifecycle management.

This module manages H2O-3 cluster lifecycle to prevent memory leaks:

Automatic Shutdown: Shuts down old clusters when new ones are created to prevent multiple Java VMs from accumulating
Comprehensive Cleanup: Removes frames, models, and forces GC to immediately reclaim Java heap memory
Test Safety: Fixtures properly clean up after each test and shut down at session end

Memory Management Best Practices:

- Session-scoped fixtures reuse clusters when possible
- Function-scoped fixtures shut down previous clusters before creating new ones
- clean_up_h2o3() removes frames AND models, then forces GC
- kill_h2o3() shuts down clusters and resets tracking state
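
The lifecycle rules above can be illustrated without a real H2O-3 cluster. In the sketch below, FakeCluster and ClusterTracker are hypothetical stand-ins invented for illustration; they do not call H2O-3 and are not the module's implementation. They only show the pattern of shutting down the previous cluster before starting a new one and resetting the tracking state on kill.

```python
class FakeCluster:
    """Stand-in for a running H2O-3 cluster (hypothetical, illustration only)."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.frames = []
        self.models = []

    def shutdown(self):
        self.running = False


class ClusterTracker:
    """Tracks at most one cluster so that old clusters never accumulate."""

    def __init__(self):
        self.current = None

    def start(self, name):
        # Shut down any previously started cluster before creating a new one.
        self.kill()
        self.current = FakeCluster(name)
        return self.current

    def clean_up(self):
        # Drop frames AND models; a real cluster would also be asked to run GC.
        if self.current is not None:
            self.current.frames.clear()
            self.current.models.clear()

    def kill(self):
        # Shut the tracked cluster down and reset the tracking state.
        if self.current is not None:
            self.current.shutdown()
        self.current = None


tracker = ClusterTracker()
first = tracker.start("first")
second = tracker.start("second")  # "first" is shut down here
```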

h2o_sonar.methods.utils.h2o_utils.assert_is_type(var, *types, **kwargs): Safe type assert with (cythonized code) bug workaround.

h2o_sonar.methods.utils.h2o_utils.clean_up_h2o3(logger=None)

Clean up H2O-3 data to free memory.

Removes all H2O frames AND models from the H2O cluster, then forces Java garbage collection to reclaim memory immediately. This is more aggressive than relying on automatic GC, which may delay memory reclamation.

If H2O is not installed, this function is a no-op and returns immediately.

h2o_sonar.methods.utils.h2o_utils.connect_to_h2o3(): Connect to H2O-3 server.

h2o_sonar.methods.utils.h2o_utils.ensure_h2o3_running(auto_start=True, h2o3_config_overrides: dict | None = None, logger=None)

Ensure that the H2O-3 server is running, either by starting it or by connecting to it. The H2O-3 server is started even if auto_start is not enabled in the H2O Sonar configuration.

Parameters:

auto_start: bool: If True, the H2O-3 server is started if it is not running.
h2o3_config_overrides: dict: H2O-3 configuration overrides.
logger: Logger.

h2o_sonar.methods.utils.h2o_utils.h2o_find_free_port(port: int = 54321, max_attempts: int = 10)

Find free port for H2O-3 server.

Parameters:

port: int: Starting port. If 0, then any/random free port is found.
max_attempts: int: Maximum number of attempts.

Returns:

int: Free port.
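
A port search of this kind can be sketched with the standard socket module. This is an assumption about the general approach, not the code of h2o_find_free_port: the name find_free_port is hypothetical, candidates are probed on 127.0.0.1 by trying to bind them in ascending order, and a RuntimeError is raised when no candidate is free.

```python
import socket


def find_free_port(port=54321, max_attempts=10):
    """Return a bindable TCP port, starting the search at `port`."""
    if port == 0:
        # Port 0 asks the operating system for any free port.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind(("127.0.0.1", 0))
            return sock.getsockname()[1]
    # Probe port, port + 1, ... without leaving the valid port range.
    for candidate in range(port, min(port + max_attempts, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", candidate))
            except OSError:
                continue  # Port is taken, try the next one.
            return candidate
    raise RuntimeError(f"No free port found in {max_attempts} attempts from {port}")
```

A port reported as free can still be taken by another process before the caller binds it, so callers of such a helper usually need to handle a failed bind afterwards.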

h2o_sonar.methods.utils.h2o_utils.h2o_init(h2o3_config: dict | None = None)

Ensure connection to an H2O instance.

Parameters:

h2o3_config: dict: H2O configuration as a dictionary with keys defined in h2oaxi.config.H2o3Config, e.g. port or memory.

h2o_sonar.methods.utils.h2o_utils.h2o_to_dt(X, col_names=None): Convert H2OFrames to datatables.

h2o_sonar.methods.utils.h2o_utils.is_h2o3_running() → bool: Determine if H2O-3 instance is running or not.

h2o_sonar.methods.utils.h2o_utils.kill_h2o3()

Shutdown H2O-3 cluster if it was started by H2O Sonar.

If H2O is not installed, this function is a no-op and returns immediately.

h2o_sonar.methods.utils.h2o_utils.preprocess_h2o3_data(frame_for_h2o3: Frame, contains_text_transformers: bool, explainer_work_path, config: H2oSonarConfig, sanitization_utils, num_labels: int, features_metadata: dict, meta_keys, persistence: Persistence, logger, vectorizer_path: str = '', lm_path: str = '', target_col: str = '', dropped_cols: list[str] | None = None, remove_preprocessed: bool = True)

Preprocess data for H2O-3.

Parameters:

frame_for_h2o3: datatable.Frame: Frame to be preprocessed.
target_col: str: Optional target column name.
dropped_cols: list[str] | None: Optional dropped columns list.
contains_text_transformers: bool: Indicator of text transformers presence in the model.
features_metadata: dict: Model features metadata.
meta_keys: Keys to be used with features metadata dictionary.
num_labels: int: Number of target labels (regression vs. binomial vs. multinomial).
explainer_work_path: str: Explainer working directory path.
vectorizer_path: str: Optional vectorizer path.
lm_path: str: Optional linear model path.
config: Global H2O Sonar configuration with config overrides already applied - if supported by the container runtime.
sanitization_utils: Feature names sanitization utils.
remove_preprocessed: Control removal of preprocessed columns.
persistence: persistences.Persistence: Persistence store.
logger: Logger.

Returns:

datatable.Frame: Frame for H2O-3.

h2o_sonar.methods.utils.h2o_utils.start_h2o3(h2o3_config_overrides: dict | None = None, logger=None): Start H2O-3 on a local H2O instance.

h2o_sonar.methods.utils.h2o_utils.to_h2oframe(data, labels=None)

Convert data to an H2OFrame.

Parameters:

data: h2o.H2OFrame | list | list of lists | pandas.DataFrame | datatable.Frame | str: Data to be transformed to the frame.
labels: h2o.H2OFrame | list | pandas.DataFrame | None: Optional frame labels.

Returns:

Tuple[h2o.H2OFrame, bool]: Frame created from the data and indicator whether the data were transformed.

h2o_sonar.methods.utils.h2o_utils.upload_data(data)

Uploads the data located at a given path, or returns the data unchanged if it is already an h2o.H2OFrame; otherwise an exception is raised.

Parameters:

data: str, h2o_sonar.core.data.PersistedData: Path to the data or h2o.H2OFrame.

Returns:

h2o.H2OFrame: H2O-3 frame.

h2o_sonar.methods.utils.histogram module

class h2o_sonar.methods.utils.histogram.HistogramBackend(value)

Bases: Enum

MLI = 1

h2o_sonar.methods.utils.histogram.find_nan_index(lst)

h2o_sonar.methods.utils.histogram.has_nan(lst)

h2o_sonar.methods.utils.histogram.histogram_data(df, bins: list = None, grid_resolution: int = 20, is_discrete: bool = False, is_date: bool = False, discrete_threshold: float = 0.06, backend: HistogramBackend = HistogramBackend.MLI, logger=None) → tuple

Get histogram data for given feature.

Supported feature data types:

integer (continuous)
float (continuous)
string (discrete)
date/time (continuous)

This implementation provides the MLI backend:

The MLI histogram backend is used by default. It calculates histograms for int, float, string and date features using NumPy (Pandas is used only to convert dates for NumPy). The MLI histogram backend allows bins specification and provides valid x-axis labels for all feature types.

The method can decide which backend to use.

Parameters:

df: datatable.Frame: Data for which to calculate histogram represented as frame with one column (target feature).
bins: list: Optional bins / split points for which to compute histogram (unsupported by AutoReport backend).
grid_resolution: int: Optional grid resolution - the number of equal-width bins / split points in the given range.
is_discrete: bool: Optional specification to override continuous histogram default (False) of integer / float features and create discrete (categorical) histogram instead.
is_date: bool: Optional specification to force date/time histogram for a string feature.
discrete_threshold: float: Optional threshold for relative difference between min/max gap, to get histogram for numeric df as discrete (if min_gap/max_gap < threshold, plot histogram).
backend: HistogramBackend: Backend to calculate histograms, HistogramBackend.MLI is default.
logger: Logger for testability.

Returns:

list, list: x-axis (bins/split point values) and y-axis (histogram frequencies). The number of x-axis and y-axis ticks is the same for discrete (categorical) features, but differs for continuous features, where the x-axis holds bin edges: len(x) = len(frequencies) + 1.
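
The difference between the two return shapes can be shown with a small pure-Python sketch. The functions continuous_histogram and discrete_histogram are hypothetical illustrations of equal-width binning and category counting; they are not the MLI backend and ignore dates, missing values, and the discrete_threshold logic.

```python
from collections import Counter


def continuous_histogram(values, grid_resolution=20):
    """Equal-width bins: returns (bin edges, counts), len(edges) == len(counts) + 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / grid_resolution
    edges = [lo + i * width for i in range(grid_resolution + 1)]
    counts = [0] * grid_resolution
    for value in values:
        # The maximum value falls into the last bin instead of opening a new one.
        index = min(int((value - lo) / width), grid_resolution - 1) if width else 0
        counts[index] += 1
    return edges, counts


def discrete_histogram(values):
    """One tick per category: returns (labels, counts) of equal length."""
    counter = Counter(values)
    labels = sorted(counter)
    return labels, [counter[label] for label in labels]


edges, counts = continuous_histogram([0, 1, 2, 3, 4], grid_resolution=4)
labels, frequencies = discrete_histogram(["a", "b", "a"])
```

For the continuous case the x-axis carries one more entry than the y-axis because it lists bin edges; for the discrete case each label has exactly one frequency.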
