h2o_sonar.methods.utils package

Submodules

h2o_sonar.methods.utils.fairness_utils module

h2o_sonar.methods.utils.fairness_utils.check_cm_input(get_global_cm, group_levels, level, print_frame, group_column): Check input to confusion matrix for binary DIA

h2o_sonar.methods.utils.fairness_utils.check_dia_input(actual_column, high_threshold, low_threshold, predict_column, cutoff=None): Check input of binary DIA class initialization

h2o_sonar.methods.utils.fairness_utils.check_frame(actual_column, predict_column, group_column, frame): Sanity checks for input frame to DIA

h2o_sonar.methods.utils.fairness_utils.check_frame_type(frame): Check frame type for DIA

h2o_sonar.methods.utils.fairness_utils.cm_exp_parser(expression, cm_dict, level)

Small utility function that translates abbreviated metric expressions into executable Python statements:

    tp | fp          cm_dict[level][0, 0] | cm_dict[level][0, 1]
    -------   ==>    -------------------------------------------
    fn | tn          cm_dict[level][1, 0] | cm_dict[level][1, 1]
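
The token mapping above can be illustrated with a minimal sketch. It is not the library's implementation: the helper name translate_cm_expression, its two-argument signature, and the tuple-keyed dictionary standing in for a 2x2 matrix are all assumptions made so that the example runs without third-party packages.

```python
import re

# Cell positions follow the layout above: tp=[0, 0], fp=[0, 1], fn=[1, 0], tn=[1, 1].
CELL_INDEX = {"tp": "[0, 0]", "fp": "[0, 1]", "fn": "[1, 0]", "tn": "[1, 1]"}


def translate_cm_expression(expression, level):
    """Hypothetical helper: rewrite tp/fp/fn/tn tokens as cm_dict lookups."""
    def _swap(match):
        return f"cm_dict[{level!r}]{CELL_INDEX[match.group(0)]}"

    return re.sub(r"\b(tp|fp|fn|tn)\b", _swap, expression)


# A tuple-keyed dict stands in for a 2x2 confusion matrix (assumption for this sketch).
cm_dict = {"female": {(0, 0): 40, (0, 1): 10, (1, 0): 5, (1, 1): 45}}

statement = translate_cm_expression("tp / (tp + fn)", "female")
# The translated statement is plain Python, so it can be evaluated against cm_dict.
true_positive_rate = eval(statement, {"cm_dict": cm_dict})
```

Because the translation is purely textual, any metric that can be written in terms of tp, fp, fn, and tn can be expressed the same way.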

h2o_sonar.methods.utils.fairness_utils.get_binary_metric_dict(): Dictionary of metrics utilized by binary DIA.

h2o_sonar.methods.utils.fairness_utils.get_group_levels(group_column, frame): Get the levels of a particular group column, e.g., {male, female}.

h2o_sonar.methods.utils.fairness_utils.get_metrics_list(problem_type): Get DIA metrics for a given problem type (regression or binomial).

h2o_sonar.methods.utils.fairness_utils.get_prroc_dt(frame, y, yhat, pos=1, neg=0, res=0.01)

Calculates precision, recall, and f1 for a datatable of y and yhat values.

Args:

frame: Datatable of actual (y) and predicted (yhat) values.
y: Name of actual value column.
yhat: Name of predicted value column.
pos: Primary target value, default 1.
neg: Secondary target value, default 0.
res: Resolution by which to loop through cutoffs, default 0.01.

Returns:

Datatable of precision, recall, and f1 values.
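
The cutoff loop described above can be sketched in plain Python. This is a sketch under assumptions, not the library code: it works on lists rather than a datatable, the function name precision_recall_f1 is hypothetical, and it assumes that a score strictly above the cutoff is treated as the positive class.

```python
def precision_recall_f1(y, yhat, pos=1, neg=0, res=0.01):
    """Return (cutoff, precision, recall, f1) tuples for cutoffs 0, res, 2*res, ..., 1."""
    rows = []
    steps = int(round(1 / res))
    for i in range(steps + 1):
        cutoff = i * res
        # Assumption: scores strictly above the cutoff are classified as positive.
        decisions = [pos if score > cutoff else neg for score in yhat]
        tp = sum(a == pos and d == pos for a, d in zip(y, decisions))
        fp = sum(a == neg and d == pos for a, d in zip(y, decisions))
        fn = sum(a == pos and d == neg for a, d in zip(y, decisions))
        # Undefined ratios (empty denominators) are reported as 0.0 in this sketch.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        rows.append((cutoff, precision, recall, f1))
    return rows


actual = [1, 0, 1, 1, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2]
table = precision_recall_f1(actual, scores, res=0.25)
```

A coarser res gives fewer rows; the default of 0.01 yields 101 cutoffs between 0 and 1 inclusive.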

h2o_sonar.methods.utils.fairness_utils.get_r2_rmse(frame, actual_column, predict_column): Calculate R2 and RMSE between actual and predicted columns in a Pandas frame.

h2o_sonar.methods.utils.fairness_utils.get_reg_metrics_list(): List of metrics utilized by regression DIA.

h2o_sonar.methods.utils.fairness_utils.mean_squared_error(actual, predicted): Computes the mean squared error.

h2o_sonar.methods.utils.fairness_utils.r_squared(actual, predicted): Computes R^2 (coefficient of determination) regression score function.

h2o_sonar.methods.utils.fairness_utils.root_mean_squared_error(actual, predicted): Computes the root mean squared error.
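
For reference, the textbook definitions behind these regression metrics can be written as a short sketch. The function names mse, rmse, and r2 are local stand-ins chosen for this example, and the list-based inputs are an assumption; the library functions may accept other types and handle edge cases differently.

```python
import math


def squared_errors(actual, predicted):
    """Element-wise squared differences between actual and predicted values."""
    return [(a - p) ** 2 for a, p in zip(actual, predicted)]


def mse(actual, predicted):
    """Mean squared error: the average of the squared errors."""
    errors = squared_errors(actual, predicted)
    return sum(errors) / len(errors)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum(squared_errors(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
metrics = {
    "mse": mse(actual, predicted),
    "rmse": rmse(actual, predicted),
    "r2": r2(actual, predicted),
}
```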

h2o_sonar.methods.utils.fairness_utils.smd_multinomial(frame, y, group_col, ref_level)

Parameters:

frame: datatable.Frame: Datatable that contains target, group column, and multinomial predictions (probabilities) as columns for each class outcome. For example:

target | group_col | class_1_prob | class_2_prob | … |
y: str: Column that contains the true value for the outcome of interest
group_col: str: Column that contains certain groups of interest for DIA, e.g., {female, male, other}, {high school, college, graduate school, other}
ref_level: str: Reference group level used for disparity calculation.

Returns:

smd_frame: datatable.Frame: A frame in which the first column contains each group level and each column after contains the standardized mean difference between each class outcome and the reference level.
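
A standardized mean difference for a single class-probability column can be sketched as follows. The exact formula used by smd_multinomial is not stated here, so this example assumes one common definition: the difference between a group's mean and the reference group's mean, divided by the population standard deviation of the whole column. The function name and the list-of-dicts input are hypothetical.

```python
from statistics import mean, pstdev


def standardized_mean_difference(rows, group_col, prob_col, ref_level):
    """Return {group level: SMD against ref_level} for one probability column."""
    # Assumption: the spread is the population standard deviation over all rows.
    spread = pstdev(row[prob_col] for row in rows)
    ref_mean = mean(row[prob_col] for row in rows if row[group_col] == ref_level)
    levels = sorted({row[group_col] for row in rows})
    return {
        level: (mean(r[prob_col] for r in rows if r[group_col] == level) - ref_mean) / spread
        for level in levels
    }


rows = [
    {"gender": "male", "class_1_prob": 0.2},
    {"gender": "male", "class_1_prob": 0.4},
    {"gender": "female", "class_1_prob": 0.6},
    {"gender": "female", "class_1_prob": 0.8},
]
smd = standardized_mean_difference(rows, "gender", "class_1_prob", ref_level="male")
```

The reference level always has a difference of zero with itself; repeating the calculation for each class-probability column gives one output column per class, matching the shape of the returned frame described above.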

h2o_sonar.methods.utils.fairness_utils.squared_error(actual, predicted): Computes the squared error.

h2o_sonar.methods.utils.h2o_utils module

h2o_sonar.methods.utils.h2o_utils.assert_is_type(var, *types, **kwargs): Safe version of HMLI’s type assert that includes a workaround for a bug in cythonized code.

h2o_sonar.methods.utils.h2o_utils.clean_up_h2o3()

h2o_sonar.methods.utils.h2o_utils.connect_to_h2o3()

Connect to HMLI and H2O-3 server:

    HMLI client:                   -> uses H2O-3 client
        ---------------> :port
    HMLI server ~ cluster (Java):  -> H2O-3 server (Java)

h2o_sonar.methods.utils.h2o_utils.ensure_h2o3_running(auto_start=True, h2o3_config_overrides: Dict | None = None, logger=None)

Ensure that the H2O-3 server is running, either by starting it or by connecting to it. The H2O-3 server is started even if auto_start is not enabled in the H2O Eval Studio configuration.

Parameters:

auto_start: bool: If True, the H2O-3 server is started if it is not running.
h2o3_config_overrides: Dict: H2O-3 configuration overrides.
logger: Logger.

h2o_sonar.methods.utils.h2o_utils.h2o_find_free_port(port: int = 54321, max_attempts: int = 10)

Find free port for H2O-3 / HMLI server.

Parameters:

port: int: Starting port. If 0, then any/random free port is found.
max_attempts: int: Maximum number of attempts.

Returns:

int: Free port.
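
The port search can be illustrated with the standard socket module. This is a sketch, not the library's code: the name find_free_port, the choice to probe consecutive ports on 127.0.0.1, and the RuntimeError raised when no port is found are all assumptions.

```python
import socket


def find_free_port(port=54321, max_attempts=10):
    """Return a port that could be bound on localhost at the time of the check."""
    if port == 0:
        # Binding to port 0 lets the operating system pick any free port.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.bind(("127.0.0.1", 0))
            return probe.getsockname()[1]
    # Assumption: candidates are the starting port and the ports right after it.
    for candidate in range(port, port + max_attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind(("127.0.0.1", candidate))
            except OSError:
                continue  # Port is taken or not permitted; try the next one.
            return candidate
    raise RuntimeError(f"no free port found in {max_attempts} attempts from {port}")


any_port = find_free_port(port=0)
```

A port reported as free can still be claimed by another process before the server binds to it, so callers of such a helper typically need to tolerate a failed start and retry.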

h2o_sonar.methods.utils.h2o_utils.h2o_init(h2o3_config: Dict | None = None)

Ensure connection to an H2O instance.

Parameters:

h2o3_config: dict: H2O configuration as a dictionary with keys defined in h2oaxi.config.H2o3Config, e.g. port or memory.

h2o_sonar.methods.utils.h2o_utils.h2o_to_dt(X, col_names=None)

h2o_sonar.methods.utils.h2o_utils.is_h2o3_running() → bool

h2o_sonar.methods.utils.h2o_utils.kill_h2o3()

h2o_sonar.methods.utils.h2o_utils.preprocess_h2o3_data(frame_for_h2o3: Frame, contains_text_transformers: bool, explainer_work_path, config: H2oSonarConfig, sanitization_utils, num_labels: int, features_metadata: Dict, meta_keys, persistence: Persistence, logger, vectorizer_path: str = '', lm_path: str = '', target_col: str = '', dropped_cols: List[str] | None = None, remove_preprocessed: bool = True)

Preprocess data for H2O-3.

Parameters:

frame_for_h2o3: datatable.Frame: Frame to be preprocessed.
target_col: str: Optional target column name.
dropped_cols: Optional[List[str]]: Optional dropped columns list.
contains_text_transformers: bool: Indicator of text transformers presence in the model.
features_metadata: Dict: Model features metadata.
meta_keys: Keys to be used with features metadata dictionary.
num_labels: int: Number of target labels (regression vs. binomial vs. multinomial).
explainer_work_path: str: Explainer working directory path.
vectorizer_path: str: Optional vectorizer path.
lm_path: str: Optional linear model path.
config: Global H2O Eval Studio configuration with config overrides already applied - if supported by the container runtime.
sanitization_utils: Feature names sanitization utils.
remove_preprocessed: Control removal of preprocessed columns.
persistence: persistences.Persistence: Persistence store.
logger: Logger.

Returns:

datatable.Frame: Frame for H2O-3.

h2o_sonar.methods.utils.h2o_utils.start_h2o3(h2o3_config_overrides: Dict | None = None, logger=None)

h2o_sonar.methods.utils.h2o_utils.to_h2oframe(data, labels=None)

If data is an hmli.H2OFrame, then data is returned unchanged. In this case labels need to be None! We do not concatenate them for you.

If data is hmli.H2OFrame compatible, then a new H2OFrame is created from it and returned.

Otherwise an exception is thrown.

Parameters:

data: Union[hmli.H2OFrame, list, list of lists, pandas.DataFrame]: Data to be transformed to the frame.
labels: Union[hmli.H2OFrame, list, pandas.DataFrame]: Optional frame labels.

Returns:

Tuple[hmli.H2OFrame, bool]: Frame created from the data and indicator whether the data were transformed.
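
The three cases and the (frame, transformed) return contract can be sketched with a stand-in class, so the example runs without an H2O-3 server. StubFrame and to_stub_frame are hypothetical names, only lists are treated as compatible input, and raising ValueError when labels are passed alongside an existing frame is an assumption about how the "labels need to be None" rule is enforced.

```python
class StubFrame:
    """Hypothetical stand-in for an H2O frame; it only wraps a list of rows."""

    def __init__(self, rows):
        self.rows = rows


def to_stub_frame(data, labels=None):
    """Return (frame, transformed) following the three cases described above."""
    if isinstance(data, StubFrame):
        # Already a frame: hand it back untouched and report no transformation.
        if labels is not None:
            raise ValueError("labels must be None when data is already a frame")
        return data, False
    if isinstance(data, list):
        # Compatible input: build a new frame and report the transformation.
        return StubFrame(data), True
    raise TypeError(f"cannot build a frame from {type(data).__name__}")


existing = StubFrame([[1, 2], [3, 4]])
same, transformed_existing = to_stub_frame(existing)
created, transformed_new = to_stub_frame([[5, 6]])
```

The boolean lets a caller know whether it owns a newly created frame, for example to decide whether to clean it up afterwards.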

h2o_sonar.methods.utils.h2o_utils.upload_data(data)

Uploads the data located at a given path or returns it if it’s a hmli.H2OFrame, else an exception is raised.

Parameters:

data: str, h2o_sonar.core.data.PersistedData: Path to the data or hmli.H2OFrame.

Returns:

hmli.H2OFrame: H2O-3 frame.

h2o_sonar.methods.utils.histogram module

class h2o_sonar.methods.utils.histogram.HistogramBackend(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

MLI = 1

h2o_sonar.methods.utils.histogram.find_nan_index(lst)

h2o_sonar.methods.utils.histogram.has_nan(lst)

h2o_sonar.methods.utils.histogram.histogram_data(df, bins: list = None, grid_resolution: int = 20, is_discrete: bool = False, is_date: bool = False, discrete_threshold: float = 0.06, backend: HistogramBackend = HistogramBackend.MLI, logger=None) → Tuple

Get histogram data for given feature.

Supported feature data types:

integer (continuous)
float (continuous)
string (discrete)
date/time (continuous)

This implementation provides the MLI backend:

MLI histogram backend is used by default. It calculates histograms for int, float, string and date features using Numpy (Pandas is used only to convert dates for Numpy). MLI histogram backend allows bins specification and provides valid x-axis labels for all feature types.

Method can decide which backend to use.

Parameters:

df: datatable.Frame: Data for which to calculate histogram represented as frame with one column (target feature).
bins: list: Optional bins / split points for which to compute histogram (unsupported by AutoReport backend).
grid_resolution: int: Optional grid resolution - the number of equal-width bins / split points in the given range.
is_discrete: bool: Optional specification to override continuous histogram default (False) of integer / float features and create discrete (categorical) histogram instead.
is_date: bool: Optional specification to force date/time histogram for a string feature.
discrete_threshold: float: Optional threshold for relative difference between min/max gap, to get histogram for numeric df as discrete (if min_gap/max_gap < threshold, plot histogram).
backend: HistogramBackend: Backend to calculate histograms, HistogramBackend.MLI is default.
logger: Logger for testability.

Returns:

list, list: x-axis (bins/split point values) and y-axis (histogram frequencies). The number of x-axis and y-axis ticks is the same for discrete (categorical) features, but differs for continuous features, where len(x) = len(frequencies) + 1.
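
The difference in output shape between continuous and discrete histograms can be seen in a small pure-Python sketch. It is not the MLI backend, which the text above says is built on Numpy; the function names and the equal-width binning with the maximum value placed in the last bin are assumptions made for illustration.

```python
from collections import Counter


def continuous_histogram(values, grid_resolution=20):
    """Equal-width bins: returns grid_resolution + 1 edges and grid_resolution counts."""
    low, high = min(values), max(values)
    if low == high:
        # Degenerate range: a single bin holding every value.
        return [low, high], [len(values)]
    width = (high - low) / grid_resolution
    edges = [low + i * width for i in range(grid_resolution + 1)]
    counts = [0] * grid_resolution
    for value in values:
        # The maximum value falls into the last bin rather than past the end.
        index = min(int((value - low) / width), grid_resolution - 1)
        counts[index] += 1
    return edges, counts


def discrete_histogram(values):
    """One tick per distinct value: labels and counts have the same length."""
    tally = Counter(values)
    labels = sorted(tally)
    return labels, [tally[label] for label in labels]


x_cont, y_cont = continuous_histogram([0.0, 1.0, 2.0, 3.0, 4.0], grid_resolution=4)
x_disc, y_disc = discrete_histogram(["a", "b", "a"])
```

For the continuous case the x values are bin edges, which is why there is one more of them than there are frequencies; for the discrete case each x value is itself a category.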
