h2o_sonar.methods.utils package
Submodules
h2o_sonar.methods.utils.fairness_utils module
- h2o_sonar.methods.utils.fairness_utils.check_cm_input(get_global_cm, group_levels, level, print_frame, group_column)
Check input to confusion matrix for binary DIA
- h2o_sonar.methods.utils.fairness_utils.check_dia_input(actual_column, high_threshold, low_threshold, predict_column, cutoff=None)
Check input of binary DIA class initialization
- h2o_sonar.methods.utils.fairness_utils.check_frame(actual_column, predict_column, group_column, frame)
Sanity checks for input frame to DIA
- h2o_sonar.methods.utils.fairness_utils.check_frame_type(frame)
Check frame type for DIA
- h2o_sonar.methods.utils.fairness_utils.cm_exp_parser(expression, cm_dict, level)
Small utility function that translates abbreviated metric expressions into executable Python statements:
tp | fp        cm_dict[level][0, 0] | cm_dict[level][0, 1]
-------  ==>   -------------------------------------------
fn | tn        cm_dict[level][1, 0] | cm_dict[level][1, 1]
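The translation described for cm_exp_parser can be illustrated with a minimal sketch. This is not the library's implementation: the helper name `translate_cm_expression` and the regex-based approach are assumptions made for illustration, and only the cell layout (tp, fp, fn, tn) is taken from the mapping in the docstring.

```python
import re

# Cell positions follow the documented mapping:
# tp=[0, 0], fp=[0, 1], fn=[1, 0], tn=[1, 1].
_CELLS = {"tp": "[0, 0]", "fp": "[0, 1]", "fn": "[1, 0]", "tn": "[1, 1]"}


def translate_cm_expression(expression: str, level: str) -> str:
    """Hypothetical helper: expand tp/fp/fn/tn into cm_dict lookups."""

    def expand(match: re.Match) -> str:
        return f"cm_dict[{level!r}]{_CELLS[match.group(0)]}"

    # \b keeps the replacement from touching longer identifiers.
    return re.sub(r"\b(tp|fp|fn|tn)\b", expand, expression)


# Recall (true positive rate) for the level "female".
print(translate_cm_expression("tp / (tp + fn)", "female"))
```

The resulting string could then be evaluated against a dictionary of per-level confusion matrices; how the library itself executes the statement is not documented here.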
- h2o_sonar.methods.utils.fairness_utils.get_binary_metric_dict()
Dictionary of metrics utilized by binary DIA.
- h2o_sonar.methods.utils.fairness_utils.get_group_levels(group_column, frame)
Get the levels of a particular group column, e.g., {male, female}.
- h2o_sonar.methods.utils.fairness_utils.get_metrics_list(problem_type)
Get DIA metrics for a given problem type (regression or binomial).
- h2o_sonar.methods.utils.fairness_utils.get_prroc_dt(frame, y, yhat, pos=1, neg=0, res=0.01)
Calculates precision, recall, and f1 for a datatable of y and yhat values.
- Args:
- frame: Datatable of actual (y) and predicted (yhat) values.
- y: Name of actual value column.
- yhat: Name of predicted value column.
- pos: Primary target value, default 1.
- neg: Secondary target value, default 0.
- res: Resolution by which to loop through cutoffs, default 0.01.
- Returns:
Datatable of precision, recall, and f1 values.
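As a rough illustration of looping through cutoffs at a given resolution, the sketch below computes precision, recall, and F1 in plain Python. It is not the get_prroc_dt implementation: the name `prroc_rows`, the `>=` comparison against the cutoff, and returning 0.0 when a metric is undefined are all assumptions, and it works on lists rather than a datatable frame.

```python
def prroc_rows(y, yhat, pos=1, res=0.01):
    """Hypothetical sketch: precision, recall and F1 at each cutoff."""
    rows = []
    steps = int(round(1 / res))
    for i in range(steps + 1):
        cutoff = i * res
        # Assumption: a row is predicted positive when its score reaches the cutoff.
        predicted_pos = [p >= cutoff for p in yhat]
        tp = sum(1 for a, pp in zip(y, predicted_pos) if pp and a == pos)
        fp = sum(1 for a, pp in zip(y, predicted_pos) if pp and a != pos)
        fn = sum(1 for a, pp in zip(y, predicted_pos) if not pp and a == pos)
        # Assumption: undefined ratios are reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        rows.append(
            {"cutoff": cutoff, "precision": precision, "recall": recall, "f1": f1}
        )
    return rows


rows = prroc_rows([1, 0, 1, 0], [0.9, 0.6, 0.4, 0.1], res=0.5)
for row in rows:
    print(row)
```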
- h2o_sonar.methods.utils.fairness_utils.get_r2_rmse(frame, actual_column, predict_column)
Calculate R2 and RMSE between actual and predicted columns in a Pandas frame.
- h2o_sonar.methods.utils.fairness_utils.get_reg_metrics_list()
List of metrics utilized by regression DIA.
- h2o_sonar.methods.utils.fairness_utils.mean_squared_error(actual, predicted)
Computes the mean squared error.
- h2o_sonar.methods.utils.fairness_utils.r_squared(actual, predicted)
Computes R^2 (coefficient of determination) regression score function.
- h2o_sonar.methods.utils.fairness_utils.root_mean_squared_error(actual, predicted)
Computes the root mean squared error.
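For reference, the regression metrics named in this module have standard textbook definitions, sketched below in plain Python. The short names `mse`, `rmse`, and `r2` are illustrative, and it is an assumption that the library functions follow exactly these formulas (in particular R^2 = 1 - SS_res / SS_tot).

```python
import math


def squared_errors(actual, predicted):
    """Element-wise squared error between two equal-length sequences."""
    return [(a - p) ** 2 for a, p in zip(actual, predicted)]


def mse(actual, predicted):
    """Mean of the squared errors."""
    errors = squared_errors(actual, predicted)
    return sum(errors) / len(errors)


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mse(actual, predicted))


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum(squared_errors(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [1, 2, 3, 4]
predicted = [1, 2, 3, 6]
print(mse(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

Note that this sketch does not guard against a constant `actual` column, for which SS_tot is zero.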
- h2o_sonar.methods.utils.fairness_utils.smd_multinomial(frame, y, group_col, ref_level)
- Parameters:
- frame: datatable.Frame
Datatable that contains target, group column, and multinomial predictions (probabilities) as columns for each class outcome. For example:
target | group_col | class_1_prob | class_2_prob | … |
- y: str
Column that contains the true value for the outcome of interest
- group_col: str
Column that contains certain groups of interest for DIA, e.g., {female, male, other}, {high school, college, graduate school, other}
- ref_level: str
Reference group level used for disparity calculation.
- Returns:
- smd_frame: datatable.Frame
A frame in which the first column contains each group level and each column after contains the standardized mean difference between each class outcome and the reference level.
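The idea of a standardized mean difference per group level and class can be sketched as follows. This is only an illustration under stated assumptions: the helper name `smd_by_group` is hypothetical, rows are plain dictionaries rather than a datatable frame, and the difference in means is scaled by the population standard deviation of the column over all rows. The library may use a different denominator (for example a pooled or reference-group standard deviation).

```python
from statistics import mean, pstdev


def smd_by_group(rows, group_col, prob_cols, ref_level):
    """Hypothetical sketch of a standardized mean difference per group and class."""
    result = {}
    levels = sorted({row[group_col] for row in rows})
    for col in prob_cols:
        ref_mean = mean([r[col] for r in rows if r[group_col] == ref_level])
        # Assumption: scale by the population standard deviation over all rows.
        scale = pstdev([r[col] for r in rows])
        for level in levels:
            level_mean = mean([r[col] for r in rows if r[group_col] == level])
            result.setdefault(level, {})[col] = (level_mean - ref_mean) / scale
    return result


rows = [
    {"group": "a", "class_1_prob": 0.2},
    {"group": "a", "class_1_prob": 0.4},
    {"group": "b", "class_1_prob": 0.6},
    {"group": "b", "class_1_prob": 0.8},
]
print(smd_by_group(rows, "group", ["class_1_prob"], ref_level="a"))
```

By construction the reference level has a difference of zero in every class column. The sketch does not handle a column with zero variance.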
- h2o_sonar.methods.utils.fairness_utils.squared_error(actual, predicted)
Computes the squared error.
h2o_sonar.methods.utils.h2o_utils module
- h2o_sonar.methods.utils.h2o_utils.assert_is_type(var, *types, **kwargs)
Safe variant of HMLI’s type assert, with a workaround for a bug in cythonized code.
- h2o_sonar.methods.utils.h2o_utils.clean_up_h2o3()
- h2o_sonar.methods.utils.h2o_utils.connect_to_h2o3()
Connect to HMLI and H2O-3 server:
- HMLI client
-> uses H2O-3 client
-----> :port
- HMLI server ~ cluster (Java)
-> H2O-3 server (Java)
- h2o_sonar.methods.utils.h2o_utils.ensure_h2o3_running(auto_start=True, h2o3_config_overrides: Dict | None = None, logger=None)
Ensure that the H2O-3 server is running, either by starting it or by connecting to it. The H2O-3 server is started even if auto_start is not enabled in the H2O Eval Studio configuration.
- Parameters:
- auto_start: bool
If True, the H2O-3 server is started if it is not running.
- h2o3_config_overrides: Dict
H2O-3 configuration overrides.
- logger
Logger.
- h2o_sonar.methods.utils.h2o_utils.h2o_find_free_port(port: int = 54321, max_attempts: int = 10)
Find free port for H2O-3 / HMLI server.
- Parameters:
- port: int
Starting port. If 0, then any/random free port is found.
- max_attempts: int
Maximum number of attempts.
- Returns:
- int
Free port.
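One common way to find a free port is to try binding a socket, as in the minimal sketch below. It is not the h2o_find_free_port implementation: the name `find_free_port`, the choice to probe consecutive ports starting at `port`, the use of 127.0.0.1, and raising RuntimeError on failure are all assumptions.

```python
import socket


def find_free_port(port: int = 54321, max_attempts: int = 10) -> int:
    """Hypothetical sketch: probe ports until one can be bound."""
    if port == 0:
        # Port 0 asks the operating system for any free port.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind(("127.0.0.1", 0))
            return sock.getsockname()[1]
    # Assumption: candidates are the consecutive ports after the starting port.
    for candidate in range(port, port + max_attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", candidate))
            except OSError:
                continue  # Port is in use; try the next one.
            return candidate
    raise RuntimeError(
        f"no free port found in {max_attempts} attempts starting at {port}"
    )


print(find_free_port(0))
```

A port reported as free can still be taken by another process before the server binds it, so callers typically retry on startup failure.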
- h2o_sonar.methods.utils.h2o_utils.h2o_init(h2o3_config: Dict | None = None)
Ensure connection to an H2O instance.
- Parameters:
- h2o3_config: dict
H2O configuration as a dictionary with keys defined in h2oaxi.config.H2o3Config, e.g. port or memory.
- h2o_sonar.methods.utils.h2o_utils.h2o_to_dt(X, col_names=None)
- h2o_sonar.methods.utils.h2o_utils.is_h2o3_running() -> bool
- h2o_sonar.methods.utils.h2o_utils.kill_h2o3()
- h2o_sonar.methods.utils.h2o_utils.preprocess_h2o3_data(frame_for_h2o3: Frame, contains_text_transformers: bool, explainer_work_path, config: H2oSonarConfig, sanitization_utils, num_labels: int, features_metadata: Dict, meta_keys, persistence: Persistence, logger, vectorizer_path: str = '', lm_path: str = '', target_col: str = '', dropped_cols: List[str] | None = None, remove_preprocessed: bool = True)
Preprocess data for H2O-3.
- Parameters:
- frame_for_h2o3: datatable.Frame
Frame to be preprocessed.
- target_col: str
Optional target column name.
- dropped_cols: Optional[List[str]]
Optional dropped columns list.
- contains_text_transformers: bool
Indicator of text transformers presence in the model.
- features_metadata: Dict
Model features metadata.
- meta_keys
Keys to be used with features metadata dictionary.
- num_labels: int
Number of target labels (regression vs. binomial vs. multinomial).
- explainer_work_path: str
Explainer working directory path.
- vectorizer_path: str
Optional vectorizer path.
- lm_path: str
Optional linear model path.
- config
Global H2O Eval Studio configuration with config overrides already applied - if supported by the container runtime.
- sanitization_utils
Feature names sanitization utils.
- remove_preprocessed
Control removal of preprocessed columns.
- persistence: persistences.Persistence
Persistence store.
- logger
Logger.
- Returns:
- datatable.Frame
Frame for H2O-3.
- h2o_sonar.methods.utils.h2o_utils.start_h2o3(h2o3_config_overrides: Dict | None = None, logger=None)
- h2o_sonar.methods.utils.h2o_utils.to_h2oframe(data, labels=None)
If data is an hmli.H2OFrame, it is returned unchanged. In this case labels must be None; they are not concatenated for you.
If data is hmli.H2OFrame compatible, a new H2OFrame is created from it and returned.
Otherwise an exception is thrown.
- Parameters:
- data: Union[hmli.H2OFrame, list, list of lists, pandas.DataFrame]
Data to be transformed to the frame.
- labels: Union[hmli.H2OFrame, list, pandas.DataFrame]
Optional frame labels.
- Returns:
- Tuple[hmli.H2OFrame, bool]
Frame created from the data and indicator whether the data were transformed.
- h2o_sonar.methods.utils.h2o_utils.upload_data(data)
Uploads the data located at the given path, or returns the data unchanged if it is already an hmli.H2OFrame; otherwise an exception is raised.
- Parameters:
- data: str, h2o_sonar.core.data.PersistedData
Path to the data or hmli.H2OFrame.
- Returns:
- hmli.H2OFrame
H2O-3 frame.
h2o_sonar.methods.utils.histogram module
- class h2o_sonar.methods.utils.histogram.HistogramBackend(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
- MLI = 1
- h2o_sonar.methods.utils.histogram.find_nan_index(lst)
- h2o_sonar.methods.utils.histogram.has_nan(lst)
- h2o_sonar.methods.utils.histogram.histogram_data(df, bins: list = None, grid_resolution: int = 20, is_discrete: bool = False, is_date: bool = False, discrete_threshold: float = 0.06, backend: HistogramBackend = HistogramBackend.MLI, logger=None) -> Tuple
Get histogram data for given feature.
Supported feature data types:
- integer (continuous)
- float (continuous)
- string (discrete)
- date/time (continuous)
This implementation provides the MLI histogram backend:
The MLI histogram backend is used by default. It calculates histograms for int, float, string and date features using NumPy (Pandas is used only to convert dates for NumPy). The MLI histogram backend allows bins specification and provides valid x-axis labels for all feature types.
The method can decide which backend to use.
- Parameters:
- df: datatable.Frame
Data for which to calculate histogram represented as frame with one column (target feature).
- bins: list
Optional bins / split points for which to compute histogram (unsupported by AutoReport backend).
- grid_resolution: int
Optional grid resolution - the number of equal-width bins / split points in the given range.
- is_discrete: bool
Optional specification to override the continuous histogram default (False) of integer / float features and create a discrete (categorical) histogram instead.
- is_date: bool
Optional specification to force date/time histogram for a string feature.
- discrete_threshold: float
Optional threshold for the relative difference between the min and max gap, used to get the histogram for a numeric df as discrete (if min_gap/max_gap < threshold, plot histogram).
- backend: HistogramBackend
Backend to calculate histograms; HistogramBackend.MLI is the default.
- logger
Logger for testability.
- Returns:
- list, list
x-axis (bins / split point values) and y-axis (histogram frequencies). The number of x-axis and y-axis ticks is the same for discrete (categorical) features, but differs for continuous features: len(x) = len(frequencies) + 1.
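The relationship between x-axis and y-axis lengths can be seen in a small plain-Python sketch. It does not reproduce histogram_data: the helper names `equal_width_histogram` and `discrete_histogram` are hypothetical, and it is an assumption that grid_resolution counts bins (so there is one more edge than bins) and that the maximum value is placed in the last bin.

```python
from collections import Counter


def equal_width_histogram(values, grid_resolution=20):
    """Hypothetical sketch: equal-width bins over [min, max] for numeric values."""
    low, high = min(values), max(values)
    width = (high - low) / grid_resolution
    # grid_resolution bins need grid_resolution + 1 edges.
    edges = [low + i * width for i in range(grid_resolution + 1)]
    frequencies = [0] * grid_resolution
    for value in values:
        # The maximum value falls into the last bin rather than a new one.
        index = min(int((value - low) / width), grid_resolution - 1) if width else 0
        frequencies[index] += 1
    return edges, frequencies


def discrete_histogram(values):
    """Hypothetical sketch: one x-axis label per distinct value."""
    counts = Counter(values)
    labels = sorted(counts)
    return labels, [counts[label] for label in labels]


x, freq = equal_width_histogram([0, 1, 2, 3, 4], grid_resolution=4)
print(len(x), len(freq))  # continuous: one more edge than frequencies

labels, counts = discrete_histogram(["a", "b", "a"])
print(len(labels), len(counts))  # discrete: same length
```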