h2o_sonar.methods.core package

Submodules

h2o_sonar.methods.core.method module

class h2o_sonar.methods.core.method.FeatureTypes

Bases: object

DEFAULT_DATE_FEATURE_FORMAT = '%Y%m%d'
KEY_CATEGORICAL_FEATURES = 'categorical'
KEY_CATNUM_FEATURES = 'catnum'
KEY_DATE_FEATURES = 'date'
KEY_DATE_FEATURES_FORMAT = 'date-format'
KEY_DATE_TIME_FEATURES = 'datetime'
KEY_ID_FEATURES = 'id'
KEY_IMAGE_FEATURES = 'image'
KEY_NUMERIC_FEATURES = 'numeric'
KEY_QUANTILE_BINS = 'quantile-bin'
KEY_TEXT_FEATURES = 'text'
KEY_TIME_FEATURES = 'time'
class h2o_sonar.methods.core.method.FeaturesMetadata(features_meta: Dict | None = None)

Bases: FeatureTypes

Utility class for building a dictionary of features metadata, for instance as used or determined by a machine learning model. Every feature used by the model is marked with its type (numeric, categorical, or both) and characteristic (date, time, datetime, text, image, ID).

add(feature_type: str, feature_name: str)
property categorical_features: List

Categorical features - can overlap with numeric features.

property categorical_numeric_features: List
static create_blank_dict()
property date_features: List
property date_time_features: List
empty() → bool

Return True if no feature metadata are set.

property format_date_features: List

Format for date features - index of the format corresponds to the index of date feature.

get(feature_name: str, default_value)
property id_features: List

ID features.

property image_features: List

Image features - column contains images and is used by the model.

property numeric_features: List

Numeric features - can overlap with categorical features.

property qtile_binning_features: Dict

Quantile binning specification for given features: the key is the feature and the value is the quantile binning specification (the number of quantile bins to create, e.g. 4 for quartiles).

set(features_meta: Dict)
property text_features: List

Text features - dataset column is used as text feature by the model.

property time_features: List
to_dict()
to_json(indent=None)
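
The properties above suggest a metadata dictionary keyed by the FeatureTypes constants. The following is a minimal sketch of such a dictionary built with plain Python, without importing h2o_sonar. The feature names are hypothetical, and the exact layout accepted by FeaturesMetadata is an assumption inferred from the property descriptions, not confirmed by this reference.

```python
import json

# Key constants copied from the FeatureTypes listing above.
KEY_CATEGORICAL_FEATURES = "categorical"
KEY_NUMERIC_FEATURES = "numeric"
KEY_DATE_FEATURES = "date"
KEY_DATE_FEATURES_FORMAT = "date-format"
KEY_QUANTILE_BINS = "quantile-bin"

# Hypothetical feature names; the dictionary layout is an assumption.
features_meta = {
    # "age" appears in both lists: numeric and categorical features can overlap.
    KEY_NUMERIC_FEATURES: ["age", "income"],
    KEY_CATEGORICAL_FEATURES: ["age", "region"],
    # The format at index i describes the date feature at index i.
    KEY_DATE_FEATURES: ["signup_date"],
    KEY_DATE_FEATURES_FORMAT: ["%Y%m%d"],
    # Feature name -> number of quantile bins (4 means quartiles).
    KEY_QUANTILE_BINS: {"income": 4},
}

# Serialize the dictionary to JSON text.
features_meta_json = json.dumps(features_meta, indent=2)
print(features_meta_json)
```
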
class h2o_sonar.methods.core.method.Method(method_name, method_type, interpretable_model=None)

Bases: ABC, FeatureTypes

Abstract class for all MLI objects exposing interpretation mechanisms.

DEFAULT_GRID_RESOLUTION = 10
KEY_CAT_WITH_NUM_BIN = 'categorical_with_numeric_bin'
LABEL_PREFIX_CLASS = 'p_'
LABEL_REGRESSION = 'p_0'
MISSING_VALUES = ['', '?', 'None', 'nan', 'NA', 'N/A', 'unknown', 'inf', '-inf', '1.7976931348623157e+308', '-1.7976931348623157e+308']
static create_date_aware_bins(features: list, frame, features_meta: dict = None, grid_resolution: int = 10, out_of_range_resolution: int = 0, date_format: str | List[str] = '%Y%m%d')

Create date aware bins (for basic formats) with given grid resolution.

Parameters:
features: list[int or str]

A list of features for which date aware bins should be created.

frame: datatable.Frame or pandas.core.frame.DataFrame

Original data for which partial dependence should be computed.

grid_resolution: int

The number of equally spaced points used to create bins if the number of unique values is large.

features_meta: dict

Optional features metadata indicating whether a given feature is a date (use the date key with a list of feature names).

out_of_range_resolution: int

Number of out of range bins to create below / above the binning interval.

date_format: str or [str]

Pandas (Python string format based) date format used to decode features. An optional list allows specifying a per-feature date format. See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

Returns:
bins, oor_bins: tuple(list[list[object]], list[list[object]])

Data values for each target feature for which partial dependence is to be computed: a vector for a single target feature, otherwise a matrix.
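
To illustrate the idea of equally spaced, date-aware bins, here is a minimal standard-library sketch. It is not the h2o_sonar implementation: the helper name equally_spaced_date_bins is hypothetical, it handles a single feature, and it ignores out-of-range bins. It assumes that when the number of unique dates does not exceed grid_resolution, the unique dates themselves serve as bins.

```python
from datetime import datetime


def equally_spaced_date_bins(values, date_format="%Y%m%d", grid_resolution=10):
    """Hypothetical helper: bins for one date feature, as formatted strings."""
    if grid_resolution < 2:
        raise ValueError("grid_resolution must be at least 2")
    # Decode the raw values with the strptime-style format and de-duplicate.
    dates = sorted({datetime.strptime(str(v), date_format) for v in values})
    # Few unique values: use them directly as bins (an assumption).
    if len(dates) <= grid_resolution:
        return [d.strftime(date_format) for d in dates]
    # Many unique values: take equally spaced points between min and max.
    start, end = dates[0], dates[-1]
    step = (end - start) / (grid_resolution - 1)
    return [(start + i * step).strftime(date_format) for i in range(grid_resolution)]


# Eleven consecutive days in January 2024, reduced to three bins.
january = [f"202401{day:02d}" for day in range(1, 12)]
print(equally_spaced_date_bins(january, grid_resolution=3))
```
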

property diagnostics

Method diagnostics data.

abstract explain(model, **kwargs)
property interpretable_model

Interpretable model.

static is_missing_value(value)

Determine whether input represents a missing value.

Parameters:
value:

Input value.

Returns:
bool:

True in case of missing value, False otherwise.
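
One plausible reading of this check, given the MISSING_VALUES list above, is a membership test on the string form of the value. The sketch below is an assumption about the behavior, not the library's code; the function name looks_missing is hypothetical.

```python
# Copied from the MISSING_VALUES attribute listed above.
MISSING_VALUES = [
    "", "?", "None", "nan", "NA", "N/A", "unknown", "inf", "-inf",
    "1.7976931348623157e+308", "-1.7976931348623157e+308",
]


def looks_missing(value) -> bool:
    """Hypothetical stand-in: compare the string form against MISSING_VALUES."""
    return str(value) in MISSING_VALUES


print(looks_missing("N/A"), looks_missing(float("nan")), looks_missing(42))
```

Under this reading, None, float("nan"), and infinite floats are treated as missing because their string forms appear in the list.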

property method_name

Method name.

property method_type

Method type, e.g. ‘loco’ or ‘ice’.

h2o_sonar.methods.core.stats module

class h2o_sonar.methods.core.stats.KolmogorovSmirnovResult(statistic, p_value, same_distribution, p_value_method)

Bases: tuple

p_value

Alias for field number 1

p_value_method

Alias for field number 3

same_distribution

Alias for field number 2

statistic

Alias for field number 0
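
Because the class is a tuple with named field aliases, it behaves like a named tuple. The sketch below mirrors the field order with collections.namedtuple rather than importing h2o_sonar. The name KSResultSketch and all field values, including the "exact" method label, are made up for illustration.

```python
from collections import namedtuple

# Stand-in with the same field order as KolmogorovSmirnovResult.
KSResultSketch = namedtuple(
    "KSResultSketch",
    ["statistic", "p_value", "same_distribution", "p_value_method"],
)

# Made-up values, for illustration only.
result = KSResultSketch(0.12, 0.34, True, "exact")

# Fields can be read by name, by index, or by unpacking.
print(result.p_value, result[1])
statistic, p_value, same_distribution, p_value_method = result
```
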

h2o_sonar.methods.core.stats.jensen_shannon_divergence(sample_u: List, sample_v: List) → float

Calculate the Jensen-Shannon divergence (not distance) between two distributions.

Parameters:
sample_u: List

First probability distribution.

sample_v: List

Second probability distribution.

Returns:
float

Jensen-Shannon divergence between the two distributions.
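
For reference, the Jensen-Shannon divergence of distributions P and Q is the average of the Kullback-Leibler divergences of P and Q from their mixture M = (P + Q) / 2; the Jensen-Shannon distance is its square root. The following is a minimal pure-Python sketch of that definition, not the h2o_sonar implementation. It assumes both inputs are already normalized probability vectors of equal length and uses the natural logarithm; the library may normalize inputs or use a different base.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two probability vectors."""
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


divergence = js_divergence([0.5, 0.5], [0.9, 0.1])
distance = math.sqrt(divergence)  # the distance, not the divergence
print(divergence, distance)
```

With the natural logarithm, the divergence is 0 for identical distributions and reaches its maximum, ln 2, for distributions with disjoint support.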

Module contents