h2o_sonar.methods.utils package
Submodules
h2o_sonar.methods.utils.fairness_utils module
- h2o_sonar.methods.utils.fairness_utils.check_cm_input(get_global_cm, group_levels, level, print_frame, group_column)
Check input to confusion matrix for binary DIA
- h2o_sonar.methods.utils.fairness_utils.check_dia_input(actual_column, high_threshold, low_threshold, predict_column, cutoff=None)
Check input of binary DIA class initialization
- h2o_sonar.methods.utils.fairness_utils.check_frame(actual_column, predict_column, group_column, frame)
Sanity checks for input frame to DIA
- h2o_sonar.methods.utils.fairness_utils.check_frame_type(frame)
Check frame type for DIA
- h2o_sonar.methods.utils.fairness_utils.cm_exp_parser(expression, cm_dict, level)
Small utility function that translates abbreviated metric expressions into executable Python statements:
tp | fp          cm_dict[level][0, 0] | cm_dict[level][0, 1]
-------   ==>    -------------------------------------------
fn | tn          cm_dict[level][1, 0] | cm_dict[level][1, 1]
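The translation described above can be sketched as a token substitution. This is a minimal illustration, not the library's implementation: the function name `translate_cm_expression` and the `CELL_INDEX` table are hypothetical, and the cell positions are taken from the layout shown for `cm_exp_parser`.

```python
import re

# Assumed mapping from abbreviation to (row, column) in a 2x2 confusion
# matrix, following the tp/fp/fn/tn layout documented for cm_exp_parser.
CELL_INDEX = {"tp": (0, 0), "fp": (0, 1), "fn": (1, 0), "tn": (1, 1)}


def translate_cm_expression(expression: str, level: str) -> str:
    """Rewrite tp/fp/fn/tn tokens as cm_dict[level][row, col] lookups."""

    def _replace(match: re.Match) -> str:
        row, col = CELL_INDEX[match.group(0)]
        return f"cm_dict[{level!r}][{row}, {col}]"

    # \b keeps the pattern from matching inside longer identifiers.
    return re.sub(r"\b(tp|fp|fn|tn)\b", _replace, expression)


# Recall (true positive rate) for the hypothetical level "female".
print(translate_cm_expression("tp / (tp + fn)", "female"))
```

The resulting string can then be evaluated against any object that supports `[row, col]` indexing, such as a 2D array per group level.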
- h2o_sonar.methods.utils.fairness_utils.get_binary_metric_dict()
Dictionary of metrics utilized by binary DIA.
- h2o_sonar.methods.utils.fairness_utils.get_group_levels(group_column, frame)
Get the levels for a particular group column, e.g., {male, female}.
- h2o_sonar.methods.utils.fairness_utils.get_metrics_list(problem_type)
Get DIA metrics for a given problem type (regression or binomial).
- h2o_sonar.methods.utils.fairness_utils.get_prroc_dt(frame, y, yhat, pos=1, neg=0, res=0.01)
Calculates precision, recall, and f1 for a datatable of y and yhat values.
- Args:
frame: Datatable of actual (y) and predicted (yhat) values.
y: Name of actual value column.
yhat: Name of predicted value column.
pos: Primary target value, default 1.
neg: Secondary target value, default 0.
res: Resolution by which to loop through cutoffs, default 0.01.
- Returns:
Datatable of precision, recall, and f1 values.
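The cutoff loop that `get_prroc_dt` describes can be sketched in plain Python. This is an illustration under assumptions, not the library's code: it works on lists rather than a datatable, the name `precision_recall_f1` is hypothetical, and scores at or above the cutoff are assumed to count as positive predictions.

```python
def precision_recall_f1(y, yhat, pos=1, neg=0, res=0.01):
    """Return one (cutoff, precision, recall, f1) row per cutoff in [0, 1]."""
    rows = []
    steps = int(round(1 / res))
    for i in range(steps + 1):
        cutoff = i * res
        # Assumption: a score at or above the cutoff is a positive prediction.
        predicted = [pos if score >= cutoff else neg for score in yhat]
        tp = sum(1 for a, p in zip(y, predicted) if a == pos and p == pos)
        fp = sum(1 for a, p in zip(y, predicted) if a == neg and p == pos)
        fn = sum(1 for a, p in zip(y, predicted) if a == pos and p == neg)
        # Undefined ratios (zero denominators) are reported as 0.0 here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        rows.append((cutoff, precision, recall, f1))
    return rows


for row in precision_recall_f1([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7], res=0.5):
    print(row)
```

With the default `res=0.01` the loop produces 101 rows, one per cutoff from 0.00 to 1.00.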
- h2o_sonar.methods.utils.fairness_utils.get_r2_rmse(frame, actual_column, predict_column)
Calculate R2 and RMSE between actual and predicted columns in a Pandas frame.
- h2o_sonar.methods.utils.fairness_utils.get_reg_metrics_list()
List of metrics utilized by regression DIA.
- h2o_sonar.methods.utils.fairness_utils.mean_squared_error(actual, predicted)
Computes the mean squared error.
- h2o_sonar.methods.utils.fairness_utils.r_squared(actual, predicted)
Computes R^2 (coefficient of determination) regression score function.
- h2o_sonar.methods.utils.fairness_utils.root_mean_squared_error(actual, predicted)
Computes the root mean squared error.
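For reference, the textbook definitions behind the regression metrics listed here can be written in a few lines. The functions below are stand-ins that operate on plain lists; they reuse the documented names for readability but are not the library's implementations, whose input types and edge-case handling are not described in this reference.

```python
import math


def squared_error(actual, predicted):
    """Element-wise squared differences."""
    return [(a - p) ** 2 for a, p in zip(actual, predicted)]


def mean_squared_error(actual, predicted):
    errors = squared_error(actual, predicted)
    return sum(errors) / len(errors)


def root_mean_squared_error(actual, predicted):
    return math.sqrt(mean_squared_error(actual, predicted))


def r_squared(actual, predicted):
    """1 - SS_res / SS_tot; assumes `actual` is not constant."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum(squared_error(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [1, 2, 3, 4]
predicted = [1, 2, 3, 6]
print(mean_squared_error(actual, predicted))
print(root_mean_squared_error(actual, predicted))
print(r_squared(actual, predicted))
```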
- h2o_sonar.methods.utils.fairness_utils.smd_multinomial(frame, y, group_col, ref_level)
- Parameters:
- frame: datatable.Frame
Datatable that contains target, group column, and multinomial predictions (probabilities) as columns for each class outcome. For example:
target | group_col | class_1_prob | class_2_prob | … |
- y: str
Column that contains the true value for the outcome of interest
- group_col: str
Column that contains certain groups of interest for DIA, e.g., {female, male, other}, {high school, college, graduate school, other}
- ref_level: str
Reference group level used for disparity calculation.
- Returns:
- smd_frame: datatable.Frame
A frame in which the first column contains each group level and each column after contains the standardized mean difference between each class outcome and the reference level.
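One common way to compute a standardized mean difference per group level is sketched below. Several things here are assumptions rather than documented behavior: the helper name `smd_by_level`, the use of a list of dictionaries instead of a datatable, and in particular the denominator (the population standard deviation of the column over all rows). The reference above does not say which standard deviation the library uses, or how the target column `y` enters the calculation, so the sketch leaves `y` out.

```python
from statistics import mean, pstdev


def smd_by_level(rows, group_col, prob_cols, ref_level):
    """Return {level: {prob_col: smd}} relative to ref_level."""
    result = {}
    levels = sorted({row[group_col] for row in rows})
    for level in levels:
        result[level] = {}
        for col in prob_cols:
            level_values = [r[col] for r in rows if r[group_col] == level]
            ref_values = [r[col] for r in rows if r[group_col] == ref_level]
            # Assumed denominator: spread of the column over all rows.
            spread = pstdev([r[col] for r in rows])
            difference = mean(level_values) - mean(ref_values)
            result[level][col] = difference / spread if spread else 0.0
    return result


rows = [
    {"group": "a", "class_1_prob": 0.2},
    {"group": "a", "class_1_prob": 0.4},
    {"group": "b", "class_1_prob": 0.6},
    {"group": "b", "class_1_prob": 0.8},
]
print(smd_by_level(rows, "group", ["class_1_prob"], ref_level="a"))
```

By construction the reference level always has a difference of zero against itself.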
- h2o_sonar.methods.utils.fairness_utils.squared_error(actual, predicted)
Computes the squared error.
h2o_sonar.methods.utils.h2o_utils module
H2O-3 utilities with proper cluster lifecycle management.
This module manages H2O-3 cluster lifecycle to prevent memory leaks:
- Automatic Shutdown: Shuts down old clusters when new ones are created to prevent multiple Java VMs from accumulating
- Comprehensive Cleanup: Removes frames, models, and forces GC to immediately reclaim Java heap memory
- Test Safety: Fixtures properly clean up after each test and shut down at session end
Memory Management Best Practices:
- Session-scoped fixtures reuse clusters when possible
- Function-scoped fixtures shut down previous clusters before creating new ones
- clean_up_h2o3() removes frames AND models, then forces GC
- kill_h2o3() shuts down clusters and resets tracking state
- h2o_sonar.methods.utils.h2o_utils.assert_is_type(var, *types, **kwargs)
Safe type assert with (cythonized code) bug workaround.
- h2o_sonar.methods.utils.h2o_utils.clean_up_h2o3(logger=None)
Clean up H2O-3 data to free memory.
Removes all H2O frames AND models from H2O cluster, then forces Java garbage collection to immediately reclaim memory. This is more aggressive than relying on automatic GC which may delay memory reclamation.
If H2O is not installed, this function is a no-op and returns immediately.
- h2o_sonar.methods.utils.h2o_utils.connect_to_h2o3()
Connect to H2O-3 server.
- h2o_sonar.methods.utils.h2o_utils.ensure_h2o3_running(auto_start=True, h2o3_config_overrides: dict | None = None, logger=None)
Ensure that the H2O-3 server is running, either by starting it or by connecting to it. The H2O-3 server is started even if auto_start is not enabled in the H2O Sonar configuration.
- Parameters:
- auto_start: bool
If True, the H2O-3 server is started if it is not running.
- h2o3_config_overrides: dict
H2O-3 configuration overrides.
- logger
Logger.
- h2o_sonar.methods.utils.h2o_utils.h2o_find_free_port(port: int = 54321, max_attempts: int = 10)
Find free port for H2O-3 server.
- Parameters:
- port: int
Starting port. If 0, then any/random free port is found.
- max_attempts: int
Maximum number of attempts.
- Returns:
- int
Free port.
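The documented contract (start at `port`, try up to `max_attempts` candidates, treat 0 as "any free port") can be sketched with the standard `socket` module. This is an assumption-laden illustration: the name `find_free_port`, the choice to probe consecutive ports on 127.0.0.1, and the `RuntimeError` on failure are all hypothetical and not taken from the library.

```python
import socket


def find_free_port(port: int = 54321, max_attempts: int = 10) -> int:
    """Return a port that could be bound on localhost at the time of the check."""
    if port == 0:
        # Binding to port 0 asks the operating system for any free port.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind(("127.0.0.1", 0))
            return sock.getsockname()[1]

    # Assumption: candidates are consecutive ports starting at `port`.
    for candidate in range(port, port + max_attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", candidate))
            except OSError:
                continue  # Port is taken; try the next one.
            return candidate

    raise RuntimeError(f"no free port found in {max_attempts} attempts from {port}")


print(find_free_port(0))
```

A check like this is inherently racy: another process can take the port between the probe and the moment the server binds it, so callers should still handle a bind failure at startup.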
- h2o_sonar.methods.utils.h2o_utils.h2o_init(h2o3_config: dict | None = None)
Ensure connection to an H2O instance.
- Parameters:
- h2o3_config: dict
H2O configuration as a dictionary with keys defined in h2oaxi.config.H2o3Config, e.g. port or memory.
- h2o_sonar.methods.utils.h2o_utils.h2o_to_dt(X, col_names=None)
Convert H2OFrames to datatables.
- h2o_sonar.methods.utils.h2o_utils.is_h2o3_running() -> bool
Determine if H2O-3 instance is running or not.
- h2o_sonar.methods.utils.h2o_utils.kill_h2o3()
Shutdown H2O-3 cluster if it was started by H2O Sonar.
If H2O is not installed, this function is a no-op and returns immediately.
- h2o_sonar.methods.utils.h2o_utils.preprocess_h2o3_data(frame_for_h2o3: Frame, contains_text_transformers: bool, explainer_work_path, config: H2oSonarConfig, sanitization_utils, num_labels: int, features_metadata: dict, meta_keys, persistence: Persistence, logger, vectorizer_path: str = '', lm_path: str = '', target_col: str = '', dropped_cols: list[str] | None = None, remove_preprocessed: bool = True)
Preprocess data for H2O-3.
- Parameters:
- frame_for_h2o3: datatable.Frame
Frame to be preprocessed.
- target_col: str
Optional target column name.
- dropped_cols: list[str] | None
Optional dropped columns list.
- contains_text_transformers: bool
Indicator of text transformers presence in the model.
- features_metadata: dict
Model features metadata.
- meta_keys
Keys to be used with features metadata dictionary.
- num_labels: int
Number of target labels (regression vs. binomial vs. multinomial).
- explainer_work_path: str
Explainer working directory path.
- vectorizer_path: str
Optional vectorizer path.
- lm_path: str
Optional linear model path.
- config
Global H2O Sonar configuration with config overrides already applied - if supported by the container runtime.
- sanitization_utils
Feature names sanitization utils.
- remove_preprocessed
Control removal of preprocessed columns.
- persistence: persistences.Persistence
Persistence store.
- logger
Logger.
- Returns:
- datatable.Frame
Frame for H2O-3.
- h2o_sonar.methods.utils.h2o_utils.start_h2o3(h2o3_config_overrides: dict | None = None, logger=None)
Start H2O-3 on a local H2O instance.
- h2o_sonar.methods.utils.h2o_utils.to_h2oframe(data, labels=None)
Convert data to an H2OFrame.
- Parameters:
- data: h2o.H2OFrame | list | list of lists | pandas.DataFrame | datatable.Frame | str
Data to be transformed to the frame.
- labels: h2o.H2OFrame | list | pandas.DataFrame | None
Optional frame labels.
- Returns:
- Tuple[h2o.H2OFrame, bool]
Frame created from the data and indicator whether the data were transformed.
- h2o_sonar.methods.utils.h2o_utils.upload_data(data)
Uploads the data located at a given path, or returns the data if it is already an h2o.H2OFrame; otherwise an exception is raised.
- Parameters:
- data: str, h2o_sonar.core.data.PersistedData
Path to the data or h2o.H2OFrame.
- Returns:
- h2o.H2OFrame
H2O-3 frame.
h2o_sonar.methods.utils.histogram module
- h2o_sonar.methods.utils.histogram.find_nan_index(lst)
- h2o_sonar.methods.utils.histogram.has_nan(lst)
- h2o_sonar.methods.utils.histogram.histogram_data(df, bins: list = None, grid_resolution: int = 20, is_discrete: bool = False, is_date: bool = False, discrete_threshold: float = 0.06, backend: HistogramBackend = HistogramBackend.MLI, logger=None) -> tuple
Get histogram data for given feature.
Supported feature data types:
- integer (continuous)
- float (continuous)
- string (discrete)
- date/time (continuous)
This implementation provides MLI and backend:
MLI histogram backend is used by default. It calculates histograms for int, float, string and date features using Numpy (Pandas is used only to convert dates for Numpy). MLI histogram backend allows bins specification and provides valid x-axis labels for all feature types.
Method can decide which backend to use.
- Parameters:
- df: datatable.Frame
Data for which to calculate histogram represented as frame with one column (target feature).
- bins: list
Optional bins / split points for which to compute histogram (unsupported by AutoReport backend).
- grid_resolution: int
Optional grid resolution - the number of equal-width bins / split points in the given range.
- is_discrete: bool
Optional specification to override the continuous histogram default (False) for integer / float features and create a discrete (categorical) histogram instead.
- is_date: bool
Optional specification to force date/time histogram for a string feature.
- discrete_threshold: float
Optional threshold for relative difference between min/max gap, to get histogram for numeric df as discrete (if min_gap/max_gap < threshold, plot histogram).
- backend: HistogramBackend
Backend to calculate histograms, HistogramBackend.MLI is default.
- logger
Logger for testability.
- Returns:
- list, list:
x-axis (bins/split point values) and y-axis (histogram frequencies). The number of x-axis and y-axis ticks is the same for discrete (categorical) features, but differs for continuous features, where len(x) = len(frequencies) + 1.
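The difference in return shapes can be seen in a small sketch. The two helpers below are hypothetical and work on plain lists; they only illustrate why equal-width bins over `grid_resolution` intervals yield one more x value than frequencies, while a discrete histogram yields one label per count. They are not the MLI backend.

```python
from collections import Counter


def continuous_histogram(values, grid_resolution=20):
    """Equal-width bins: returns (edges, frequencies), len(edges) == len(frequencies) + 1."""
    low, high = min(values), max(values)
    width = (high - low) / grid_resolution
    edges = [low + i * width for i in range(grid_resolution + 1)]
    frequencies = [0] * grid_resolution
    for value in values:
        if width:
            # The maximum value is counted in the last bin, not in a bin of its own.
            index = min(int((value - low) / width), grid_resolution - 1)
        else:
            index = 0  # All values are identical.
        frequencies[index] += 1
    return edges, frequencies


def discrete_histogram(values):
    """One label per count: returns (labels, frequencies) of equal length."""
    counts = Counter(values)
    labels = sorted(counts)
    return labels, [counts[label] for label in labels]


x, freq = continuous_histogram([0, 1, 2, 3, 4], grid_resolution=4)
print(len(x), len(freq))
print(discrete_histogram(["a", "b", "a"]))
```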