h2o_sonar.methods.utils package

Submodules

h2o_sonar.methods.utils.fairness_utils module

h2o_sonar.methods.utils.fairness_utils.check_cm_input(get_global_cm, group_levels, level, print_frame, group_column)

Check input to confusion matrix for binary DIA

h2o_sonar.methods.utils.fairness_utils.check_dia_input(actual_column, high_threshold, low_threshold, predict_column, cutoff=None)

Check input of binary DIA class initialization

h2o_sonar.methods.utils.fairness_utils.check_frame(actual_column, predict_column, group_column, frame)

Sanity checks for input frame to DIA

h2o_sonar.methods.utils.fairness_utils.check_frame_type(frame)

Check frame type for DIA

h2o_sonar.methods.utils.fairness_utils.cm_exp_parser(expression, cm_dict, level)

Small utility function that translates abbreviated metric expressions into executable Python statements:

tp | fp        cm_dict[level][0, 0] | cm_dict[level][0, 1]
-------  ==>  --------------------------------------------
fn | tn        cm_dict[level][1, 0] | cm_dict[level][1, 1]
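The translation described above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name expand_cm_expression, the regular expression, and the use of a dict keyed by (row, column) tuples in place of a real confusion matrix are all assumptions made for illustration.

```python
import re

# Cell positions taken from the layout above: row 0 = (tp, fp), row 1 = (fn, tn).
_CELLS = {"tp": "[0, 0]", "fp": "[0, 1]", "fn": "[1, 0]", "tn": "[1, 1]"}


def expand_cm_expression(expression, level):
    """Hypothetical helper: replace tp/fp/fn/tn tokens with cm_dict lookups."""

    def _substitute(match):
        return f"cm_dict[{level!r}]{_CELLS[match.group(0)]}"

    # \b keeps the pattern from matching inside longer identifiers.
    return re.sub(r"\b(?:tp|fp|fn|tn)\b", _substitute, expression)


# Stand-in confusion matrix: a dict keyed by (row, col) supports cm[0, 0] indexing.
cm_dict = {"female": {(0, 0): 40, (0, 1): 5, (1, 0): 10, (1, 1): 45}}

expanded = expand_cm_expression("tp / (tp + fn)", "female")
# eval is used only to show that the expanded string is executable Python.
recall = eval(expanded, {"cm_dict": cm_dict})
```

Here expanded is an ordinary Python expression string, so it can be evaluated against any cm_dict whose per-level entries accept two-index lookups.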

h2o_sonar.methods.utils.fairness_utils.get_binary_metric_dict()

Dictionary of metrics utilized by binary DIA.

h2o_sonar.methods.utils.fairness_utils.get_group_levels(group_column, frame)

Get levels for a particular group column, e.g., {male, female}.

h2o_sonar.methods.utils.fairness_utils.get_metrics_list(problem_type)

Get DIA metrics for a given problem type (regression or binomial).

h2o_sonar.methods.utils.fairness_utils.get_prroc_dt(frame, y, yhat, pos=1, neg=0, res=0.01)

Calculates precision, recall, and f1 for a datatable of y and yhat values.

Args:

frame: Datatable of actual (y) and predicted (yhat) values.
y: Name of actual value column.
yhat: Name of predicted value column.
pos: Primary target value, default 1.
neg: Secondary target value, default 0.
res: Resolution by which to loop through cutoffs, default 0.01.

Returns:

Datatable of precision, recall, and f1 values.
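As a rough illustration of looping through cutoffs at a given resolution, the sketch below computes precision, recall, and F1 on plain Python lists instead of a datatable. The function name, the strict ">" comparison against the cutoff, and the choice to report 0.0 when a denominator is zero are assumptions; the library may handle these cases differently.

```python
def precision_recall_f1(y, yhat, pos=1, neg=0, res=0.01):
    """Hypothetical sketch: one (cutoff, precision, recall, f1) row per cutoff."""
    rows = []
    n_steps = int(round(1 / res))
    for i in range(n_steps + 1):
        cutoff = i * res
        # Assumption: a prediction strictly above the cutoff counts as positive.
        decisions = [pos if p > cutoff else neg for p in yhat]
        tp = sum(1 for a, d in zip(y, decisions) if a == pos and d == pos)
        fp = sum(1 for a, d in zip(y, decisions) if a == neg and d == pos)
        fn = sum(1 for a, d in zip(y, decisions) if a == pos and d == neg)
        # Assumption: undefined ratios are reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((cutoff, precision, recall, f1))
    return rows


rows = precision_recall_f1([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7], res=0.5)
```

With res=0.5 the loop visits the cutoffs 0.0, 0.5, and 1.0; the default res=0.01 yields 101 rows.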

h2o_sonar.methods.utils.fairness_utils.get_r2_rmse(frame, actual_column, predict_column)

Calculate R2 and RMSE between actual and predicted columns in a Pandas frame.

h2o_sonar.methods.utils.fairness_utils.get_reg_metrics_list()

List of metrics utilized by regression DIA.

h2o_sonar.methods.utils.fairness_utils.mean_squared_error(actual, predicted)

Computes the mean squared error.

h2o_sonar.methods.utils.fairness_utils.r_squared(actual, predicted)

Computes R^2 (coefficient of determination) regression score function.

h2o_sonar.methods.utils.fairness_utils.root_mean_squared_error(actual, predicted)

Computes the root mean squared error.
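For reference, the standard definitions of these regression metrics can be sketched in plain Python. The function names below are illustrative stand-ins rather than the library's own functions, and the inputs are assumed to be equal-length sequences of numbers with a non-constant actual series.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mse(actual, predicted))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # Assumption: actual is not constant, so ss_tot is non-zero.
    return 1 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
metrics = (mse(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```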

h2o_sonar.methods.utils.fairness_utils.smd_multinomial(frame, y, group_col, ref_level)
Parameters:
frame: datatable.Frame

Datatable that contains target, group column, and multinomial predictions (probabilities) as columns for each class outcome. For example:

target | group_col | class_1_prob | class_2_prob | … |

y: str

Column that contains the true value for the outcome of interest

group_col: str

Column that contains certain groups of interest for DIA, e.g., {female, male, other}, {high school, college, graduate school, other}

ref_level: str

Reference group level used for disparity calculation.

Returns:
smd_frame: datatable.Frame

A frame in which the first column contains each group level and each column after contains the standardized mean difference between each class outcome and the reference level.
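The reference does not state the exact formula, so the following is only a sketch of one common definition of standardized mean difference: the difference between a group's mean and the reference group's mean, divided by the standard deviation of the whole column. The helper name smd_by_group, the list-of-dicts input, and the choice of the population standard deviation are assumptions; the y column is not used in this simplified version.

```python
from statistics import mean, pstdev


def smd_by_group(rows, group_col, prob_cols, ref_level):
    """Hypothetical sketch: {level: {prob_col: smd versus ref_level}}."""
    levels = sorted({row[group_col] for row in rows})
    result = {level: {} for level in levels}
    for col in prob_cols:
        # Assumption: standardize by the population SD of the full column,
        # which must be non-zero.
        overall_sd = pstdev([row[col] for row in rows])
        ref_mean = mean([row[col] for row in rows if row[group_col] == ref_level])
        for level in levels:
            level_mean = mean([row[col] for row in rows if row[group_col] == level])
            result[level][col] = (level_mean - ref_mean) / overall_sd
    return result


rows = [
    {"group": "a", "class_1_prob": 0.2},
    {"group": "a", "class_1_prob": 0.4},
    {"group": "b", "class_1_prob": 0.6},
    {"group": "b", "class_1_prob": 0.8},
]
smd = smd_by_group(rows, "group", ["class_1_prob"], ref_level="a")
```

By construction the reference level has a standardized mean difference of zero against itself.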

h2o_sonar.methods.utils.fairness_utils.squared_error(actual, predicted)

Computes the squared error.

h2o_sonar.methods.utils.h2o_utils module

h2o_sonar.methods.utils.h2o_utils.assert_is_type(var, *types, **kwargs)

Safe wrapper around HMLI's type assert, with a workaround for a bug in cythonized code.

h2o_sonar.methods.utils.h2o_utils.clean_up_h2o3()
h2o_sonar.methods.utils.h2o_utils.connect_to_h2o3()

Connect to HMLI and H2O-3 server:

HMLI client

-> uses H2O-3 client

---------------> :port

HMLI server ~ cluster (Java)

-> H2O-3 server (Java)

h2o_sonar.methods.utils.h2o_utils.ensure_h2o3_running(auto_start=True, h2o3_config_overrides: Dict | None = None, logger=None)

Ensure that the H2O-3 server is running, either by starting it or by connecting to it. The H2O-3 server is started even if auto_start is not enabled in the H2O Eval Studio configuration.

Parameters:
auto_start: bool

If True, the H2O-3 server is started if it is not running.

h2o3_config_overrides: Dict

H2O-3 configuration overrides.

logger

Logger.

h2o_sonar.methods.utils.h2o_utils.h2o_find_free_port(port: int = 54321, max_attempts: int = 10)

Find free port for H2O-3 / HMLI server.

Parameters:
port: int

Starting port. If 0, then any/random free port is found.

max_attempts: int

Maximum number of attempts.

Returns:
int

Free port.
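A port search with these parameters can be sketched with the standard socket module. This is not the library's implementation: the function name find_free_port, binding to 127.0.0.1, probing consecutive ports, and raising RuntimeError when all attempts fail are assumptions.

```python
import socket


def find_free_port(port: int = 54321, max_attempts: int = 10) -> int:
    """Hypothetical sketch: return a port that could be bound at call time."""
    if port == 0:
        # Binding to port 0 asks the operating system for any free port.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind(("127.0.0.1", 0))
            return sock.getsockname()[1]
    for candidate in range(port, port + max_attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", candidate))
            except OSError:
                continue  # Port is in use or not permitted; try the next one.
            return candidate
    raise RuntimeError(f"no free port found in {max_attempts} attempts from {port}")


any_port = find_free_port(0)
```

The socket is closed before the function returns, so another process could claim the port before a server binds it; callers that need certainty should be prepared to retry.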

h2o_sonar.methods.utils.h2o_utils.h2o_init(h2o3_config: Dict | None = None)

Ensure connection to an H2O instance.

Parameters:
h2o3_config: dict

H2O configuration as dictionary with keys defined in h2oaxi.config.H2o3Config e.g. port or memory.

h2o_sonar.methods.utils.h2o_utils.h2o_to_dt(X, col_names=None)
h2o_sonar.methods.utils.h2o_utils.is_h2o3_running() -> bool
h2o_sonar.methods.utils.h2o_utils.kill_h2o3()
h2o_sonar.methods.utils.h2o_utils.preprocess_h2o3_data(frame_for_h2o3: Frame, contains_text_transformers: bool, explainer_work_path, config: H2oSonarConfig, sanitization_utils, num_labels: int, features_metadata: Dict, meta_keys, persistence: Persistence, logger, vectorizer_path: str = '', lm_path: str = '', target_col: str = '', dropped_cols: List[str] | None = None, remove_preprocessed: bool = True)

Preprocess data for H2O-3.

Parameters:
frame_for_h2o3: datatable.Frame

Frame to be preprocessed.

target_col: str

Optional target column name.

dropped_cols: Optional[List[str]]

Optional dropped columns list.

contains_text_transformers: bool

Indicator of text transformers presence in the model.

features_metadata: Dict

Model features metadata.

meta_keys

Keys to be used with features metadata dictionary.

num_labels: int

Number of target labels (regression vs. binomial vs. multinomial).

explainer_work_path: str

Explainer working directory path.

vectorizer_path: str

Optional vectorizer path.

lm_path: str

Optional linear model path.

config

Global H2O Eval Studio configuration with config overrides already applied - if supported by the container runtime.

sanitization_utils

Feature names sanitization utils.

remove_preprocessed

Control removal of preprocessed columns.

persistence: persistences.Persistence

Persistence store.

logger

Logger.

Returns:
datatable.Frame

Frame for H2O-3.

h2o_sonar.methods.utils.h2o_utils.start_h2o3(h2o3_config_overrides: Dict | None = None, logger=None)
h2o_sonar.methods.utils.h2o_utils.to_h2oframe(data, labels=None)

If data is hmli.H2OFrame then returns data unchanged. In this case labels need to be None! We do not concatenate them for you.

If data is hmli.H2OFrame compatible then create a new H2OFrame from it and return it.

Otherwise throw an exception.

Parameters:
data: Union[hmli.H2OFrame, list, list of lists, pandas.DataFrame]

Data to be transformed to the frame.

labels: Union[hmli.H2OFrame, list, pandas.DataFrame]

Optional frame labels.

Returns:
Tuple[hmli.H2OFrame, bool]

Frame created from the data and indicator whether the data were transformed.

h2o_sonar.methods.utils.h2o_utils.upload_data(data)

Uploads the data located at a given path or returns it if it’s a hmli.H2OFrame, else an exception is raised.

Parameters:
data: str, h2o_sonar.core.data.PersistedData

Path to the data or hmli.H2OFrame.

Returns:
hmli.H2OFrame

H2O-3 frame.

h2o_sonar.methods.utils.histogram module

class h2o_sonar.methods.utils.histogram.HistogramBackend(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

MLI = 1
h2o_sonar.methods.utils.histogram.find_nan_index(lst)
h2o_sonar.methods.utils.histogram.has_nan(lst)
h2o_sonar.methods.utils.histogram.histogram_data(df, bins: list = None, grid_resolution: int = 20, is_discrete: bool = False, is_date: bool = False, discrete_threshold: float = 0.06, backend: HistogramBackend = HistogramBackend.MLI, logger=None) -> Tuple

Get histogram data for given feature.

Supported feature data types:

  • integer (continuous)

  • float (continuous)

  • string (discrete)

  • date/time (continuous)

This implementation provides the MLI backend:

  • MLI histogram backend is used by default. It calculates histograms for int, float, string and date features using Numpy (Pandas is used only to convert dates for Numpy). MLI histogram backend allows bins specification and provides valid x-axis labels for all feature types.

Method can decide which backend to use.

Parameters:
df: datatable.Frame

Data for which to calculate histogram represented as frame with one column (target feature).

bins: list

Optional bins / split points for which to compute histogram (unsupported by AutoReport backend).

grid_resolution: int

Optional grid resolution - the number of equal-width bins / split points in the given range.

is_discrete: bool

Optional specification to override continuous histogram default (False) of integer / float features and create discrete (categorical) histogram instead.

is_date: bool

Optional specification to force date/time histogram for a string feature.

discrete_threshold: float

Optional threshold for relative difference between min/max gap, to get histogram for numeric df as discrete (if min_gap/max_gap < threshold, plot histogram).

backend: HistogramBackend

Backend to calculate histograms, HistogramBackend.MLI is default.

logger:

Logger for testability.

Returns:
list, list

x-axis (bins/split point values) and y-axis (histogram frequencies). The number of x-axis and y-axis ticks is the same for discrete (categorical) features, but differs for continuous features: len(x) = len(frequencies) + 1.
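The relationship between the lengths of the two returned lists can be illustrated with a small pure-Python sketch. It does not reproduce the MLI backend (which, per the description above, uses Numpy); the function names and the equal-width binning with a right-closed last bin are assumptions.

```python
from collections import Counter


def continuous_histogram(values, grid_resolution=20):
    """Hypothetical sketch: equal-width bins over [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    # Assumption: the values span a non-zero range, so width > 0.
    width = (hi - lo) / grid_resolution
    edges = [lo + i * width for i in range(grid_resolution + 1)]
    counts = [0] * grid_resolution
    for value in values:
        # The last bin is closed on the right so that max(values) is counted.
        index = min(int((value - lo) / width), grid_resolution - 1)
        counts[index] += 1
    return edges, counts


def discrete_histogram(values):
    """Hypothetical sketch: one x-axis label and one frequency per distinct value."""
    counts = Counter(values)
    labels = sorted(counts)
    return labels, [counts[label] for label in labels]


edges, frequencies = continuous_histogram([0.0, 1.0, 2.0, 3.0, 4.0], grid_resolution=4)
labels, label_counts = discrete_histogram(["a", "b", "a"])
```

In the continuous case the x-axis holds bin edges, which is why it has one more entry than the frequencies; in the discrete case each label pairs with exactly one frequency.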

Module contents