h2o_sonar.methods.core package
Submodules
h2o_sonar.methods.core.method module
- class h2o_sonar.methods.core.method.FeatureTypes
Bases:
object
- DEFAULT_DATE_FEATURE_FORMAT = '%Y%m%d'
- KEY_CATEGORICAL_FEATURES = 'categorical'
- KEY_CATNUM_FEATURES = 'catnum'
- KEY_DATE_FEATURES = 'date'
- KEY_DATE_FEATURES_FORMAT = 'date-format'
- KEY_DATE_TIME_FEATURES = 'datetime'
- KEY_ID_FEATURES = 'id'
- KEY_IMAGE_FEATURES = 'image'
- KEY_NUMERIC_FEATURES = 'numeric'
- KEY_QUANTILE_BINS = 'quantile-bin'
- KEY_TEXT_FEATURES = 'text'
- KEY_TIME_FEATURES = 'time'
- class h2o_sonar.methods.core.method.FeaturesMetadata(features_meta: Dict | None = None)
Bases:
FeatureTypes
Utility class to build a dictionary with features metadata, for instance as used/determined by a machine learning model. Every feature used by the model is marked with its type (numeric, categorical or both) and characteristic (date, time, datetime, text, image, ID).
- add(feature_type: str, feature_name: str)
- property categorical_features: List
Categorical features - can overlap with numeric features.
- property categorical_numeric_features: List
- static create_blank_dict()
- property date_features: List
- property date_time_features: List
- empty() → bool
Return True if no feature metadata are set.
- property format_date_features: List
Format for date features - index of the format corresponds to the index of date feature.
- get(feature_name: str, default_value)
- property id_features: List
ID features.
- property image_features: List
Image features - column contains images and is used by the model.
- property numeric_features: List
Numeric features - can overlap with categorical features.
- property qtile_binning_features: Dict
Quantile binning specification for given features - the key is the feature, the value is the quantile binning specification (the number of quantile bins to create, e.g. 4 for quartiles).
- set(features_meta: Dict)
- property text_features: List
Text features - dataset column is used as text feature by the model.
- property time_features: List
- to_dict()
- to_json(indent=None)
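The properties above suggest a dictionary layout keyed by the FeatureTypes constants. The following is a minimal sketch of such a dictionary, not the library's own code: the key strings are copied from the constants listed above, while the feature names, the exact layout, and the helper date_format_for are assumptions made for illustration.

```python
import json

# Key strings copied from the FeatureTypes constants documented above.
KEY_CATEGORICAL_FEATURES = "categorical"
KEY_NUMERIC_FEATURES = "numeric"
KEY_DATE_FEATURES = "date"
KEY_DATE_FEATURES_FORMAT = "date-format"
KEY_QUANTILE_BINS = "quantile-bin"

# Hypothetical feature names; the dictionary layout is an assumption.
features_meta = {
    KEY_CATEGORICAL_FEATURES: ["education", "zip_code"],
    KEY_NUMERIC_FEATURES: ["age", "zip_code"],  # may overlap with categorical
    KEY_DATE_FEATURES: ["signup_date"],
    KEY_DATE_FEATURES_FORMAT: ["%Y%m%d"],  # same index as the date feature
    KEY_QUANTILE_BINS: {"age": 4},  # 4 quantile bins, i.e. quartiles
}


def date_format_for(meta, feature_name):
    """Hypothetical helper: find the format stored at the date feature's index."""
    idx = meta[KEY_DATE_FEATURES].index(feature_name)
    return meta[KEY_DATE_FEATURES_FORMAT][idx]


print(json.dumps(features_meta, indent=2))
print(date_format_for(features_meta, "signup_date"))
```

A dictionary of this shape is presumably what the features_meta constructor argument and set() expect, but that should be confirmed against the library before relying on it.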
- class h2o_sonar.methods.core.method.Method(method_name, method_type, interpretable_model=None)
Bases:
ABC, FeatureTypes
Abstract class for all MLI objects exposing interpretation mechanisms.
- DEFAULT_GRID_RESOLUTION = 10
- KEY_CAT_WITH_NUM_BIN = 'categorical_with_numeric_bin'
- LABEL_PREFIX_CLASS = 'p_'
- LABEL_REGRESSION = 'p_0'
- MISSING_VALUES = ['', '?', 'None', 'nan', 'NA', 'N/A', 'unknown', 'inf', '-inf', '1.7976931348623157e+308', '-1.7976931348623157e+308']
- static create_date_aware_bins(features: list, frame, features_meta: dict = None, grid_resolution: int = 10, out_of_range_resolution: int = 0, date_format: str | List[str] = '%Y%m%d')
Create date aware bins (for basic formats) with given grid resolution.
- Parameters:
- features: list[int or str]
A list of features for which date aware bins should be created.
- frame: datatable.Frame or pandas.core.frame.DataFrame
Original data for which should be partial dependence computed.
- grid_resolution: int
The number of equally spaced points used to create bins if the number of unique values is large.
- features_meta: dict
Optional features metadata that indicate whether a given feature is a date (use the date key with a list of feature names).
- out_of_range_resolution: int
Number of out of range bins to create below / above the binning interval.
- date_format: str or [str]
Pandas (Python string format based) date format to be used to decode features. An optional list allows specifying a per-feature date format. https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
- Returns:
- bins, oor_bins: tuple(list[list[object]], list[list[object]])
Data values for each target feature for which we want to compute partial dependence: a vector for a single target feature, otherwise a matrix.
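To make the idea of date-aware binning concrete, here is a minimal sketch under stated assumptions. It is not the implementation of create_date_aware_bins: it works on a plain list of column values instead of a datatable or pandas frame, handles a single feature, ignores out-of-range bins, and the name date_bins_sketch is hypothetical.

```python
from datetime import datetime


def date_bins_sketch(values, grid_resolution=10, date_format="%Y%m%d"):
    """Return at most grid_resolution date-aware bin values for one feature."""
    if grid_resolution < 2:
        raise ValueError("grid_resolution must be at least 2")
    # Decode raw values (e.g. 20240131 or "20240131") into datetime objects.
    dates = sorted({datetime.strptime(str(v), date_format) for v in values})
    if len(dates) <= grid_resolution:
        # Few unique values: use them directly as bins.
        points = dates
    else:
        # Many unique values: equally spaced points between min and max.
        step = (dates[-1] - dates[0]) / (grid_resolution - 1)
        points = [dates[0] + i * step for i in range(grid_resolution)]
    # Encode the bins back into the original date format.
    return [p.strftime(date_format) for p in points]


values = [20240101, "20240111", "20240121", "20240131"]
print(date_bins_sketch(values, grid_resolution=3))
```

Spacing the points on the decoded dates rather than on the raw integers avoids bins such as 20240150 that are not valid calendar dates.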
- property diagnostics
Method diagnostics data.
- abstract explain(model, **kwargs)
- property interpretable_model
Interpretable model.
- static is_missing_value(value)
Determine whether input represents a missing value.
- Parameters:
- value:
Input value.
- Returns:
- bool:
True in case of missing value, False otherwise.
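The source does not show how is_missing_value is implemented. One plausible reading, given the MISSING_VALUES constant above, is a membership test on the value's string form. The sketch below illustrates that reading only; the function name is_missing_value_sketch and the string-conversion rule are assumptions.

```python
# Copied from the Method.MISSING_VALUES constant documented above.
MISSING_VALUES = [
    "", "?", "None", "nan", "NA", "N/A", "unknown", "inf", "-inf",
    "1.7976931348623157e+308", "-1.7976931348623157e+308",
]


def is_missing_value_sketch(value):
    """Assumed rule: a value is missing if its string form is a known marker."""
    # str(None) == "None", str(float("nan")) == "nan", str(float("inf")) == "inf",
    # so None, NaN, and infinities are covered by the same membership test.
    return str(value) in MISSING_VALUES


for candidate in ["N/A", None, float("nan"), 0, "42"]:
    print(repr(candidate), is_missing_value_sketch(candidate))
```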
- property method_name
Method name.
- property method_type
Method type e.g. ‘loco’ or ‘ice’.
h2o_sonar.methods.core.stats module
- class h2o_sonar.methods.core.stats.KolmogorovSmirnovResult(statistic, p_value, same_distribution, p_value_method)
Bases:
tuple
- p_value
Alias for field number 1
- p_value_method
Alias for field number 3
- same_distribution
Alias for field number 2
- statistic
Alias for field number 0
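Because the class is based on tuple with numbered field aliases, it behaves like a named tuple. The sketch below rebuilds an equivalent structure with collections.namedtuple to show how the fields can be read by name, by index, or by unpacking. The name KolmogorovSmirnovResultSketch and all field values (including "exact") are made up for illustration.

```python
from collections import namedtuple

# Field order follows the documented aliases: 0, 1, 2, 3.
KolmogorovSmirnovResultSketch = namedtuple(
    "KolmogorovSmirnovResultSketch",
    ["statistic", "p_value", "same_distribution", "p_value_method"],
)

# Made-up values; "exact" is a hypothetical p-value method name.
result = KolmogorovSmirnovResultSketch(0.12, 0.34, True, "exact")

# Access by name or by position gives the same field.
print(result.p_value, result[1])

# Tuple unpacking works in field order.
statistic, p_value, same_distribution, p_value_method = result
print(statistic, same_distribution, p_value_method)
```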
- h2o_sonar.methods.core.stats.jensen_shannon_divergence(sample_u: List, sample_v: List) → float
Calculate the Jensen-Shannon divergence (not distance) between two distributions.
- Parameters:
- sample_u: List
First probability distribution.
- sample_v: List
Second probability distribution.
- Returns:
- float
Jensen-Shannon divergence between the two distributions.
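For reference, the textbook Jensen-Shannon divergence can be sketched in a few lines. This is not the library's implementation: the name js_divergence_sketch, the normalization of inputs to sum to 1, and the use of the natural logarithm are assumptions. The Jensen-Shannon distance is the square root of this divergence.

```python
import math


def js_divergence_sketch(sample_u, sample_v):
    """Textbook Jensen-Shannon divergence (natural log) of two distributions."""
    if len(sample_u) != len(sample_v):
        raise ValueError("distributions must have the same length")
    # Assumption: normalize so that each input sums to 1.
    total_u, total_v = sum(sample_u), sum(sample_v)
    p = [x / total_u for x in sample_u]
    q = [x / total_v for x in sample_v]
    # Mixture distribution M = (P + Q) / 2.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute 0.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


print(js_divergence_sketch([0.5, 0.5], [0.5, 0.5]))
print(js_divergence_sketch([1.0, 0.0], [0.0, 1.0]))
```

With the natural logarithm the divergence is bounded by ln 2, which is reached for distributions with disjoint support; with a base-2 logarithm the bound would be 1.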