h2o_sonar.utils package
Submodules
h2o_sonar.utils.binning module
- h2o_sonar.utils.binning.build_qtile_bins(bins: list, X: DataFrame, feature: str, quantile: int)
Build quantile bins and append them to the input bins list.
- Parameters:
- bins: list
List of bins.
- X: pandas.DataFrame
Input frame to PD/ICE.
- feature: str
Feature to create quantile bins for.
- quantile: int
The decile to compute.
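A minimal sketch of the binning idea, assuming that the quantile bins are the cut points that split a feature's values into quantile equally populated groups and that they are appended to the caller's list in place. The helper name build_quantile_bins_sketch is hypothetical, and a plain list of floats stands in for the pandas.DataFrame column.

```python
import statistics


def build_quantile_bins_sketch(bins: list, values: list, quantile: int) -> list:
    """Hypothetical helper: append the quantile cut points of values to bins."""
    # statistics.quantiles() returns quantile - 1 cut points; the "inclusive"
    # method treats the data as the whole population (min/max bound the range).
    cut_points = statistics.quantiles(values, n=quantile, method="inclusive")
    bins.extend(cut_points)  # mutate the caller's list, then return it
    return bins


bins: list = []
build_quantile_bins_sketch(bins, [float(v) for v in range(1, 11)], 4)
print(bins)  # [3.25, 5.5, 7.75]
```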
- h2o_sonar.utils.binning.qbin_column(frame: Frame, column: str, logger)
Quantile-bin a column in a frame and replace the column values in that frame with the quantile group range of each row.
- Parameters:
- frame: datatable.Frame
Frame containing the data. One of the column names must correspond to the column parameter.
- column: str
Name of the column to be checked.
- logger: Logger
Logger.
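The substitution of values by quantile group ranges can be sketched in plain Python as follows. This is an illustration only, assuming four bins, "inclusive" cut points and a "low-high" label format; the real function operates on a datatable.Frame column in place, and qbin_values_sketch is a hypothetical name.

```python
import bisect
import statistics


def qbin_values_sketch(values: list, n_bins: int = 4) -> list:
    """Hypothetical helper: replace each value with the label of its quantile range."""
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    edges = [min(values), *cuts, max(values)]
    labels = []
    for value in values:
        # bisect_right over the inner cut points yields a bin index 0..n_bins-1;
        # a value equal to a cut point falls into the upper bin.
        i = bisect.bisect_right(cuts, value)
        labels.append(f"{edges[i]}-{edges[i + 1]}")
    return labels


print(qbin_values_sketch([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```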
- h2o_sonar.utils.binning.quantile_bin(frame: Frame = None, qbin_cols: list[str] | None = None, qbin_count: int = 0, varimp_list: list[str] | None = None, logger=None)
Quantile binning.
- Parameters:
- frame: dt.Frame
Input frame for quantile binning.
- qbin_cols: list
Column(s) to use for quantile binning.
- qbin_count: int
Number of top numeric variables to use from the model’s variable importance list.
- varimp_list: list
Variable importance list from the model.
- logger: Logger
Logger.
- Returns:
- tuple[list, pandas.DataFrame]
List of the columns that were binned and a DataFrame with the quantile-binned columns.
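The description above does not spell out how qbin_cols, qbin_count and varimp_list are combined. The sketch below assumes that explicitly requested columns come first, followed by the qbin_count most important numeric features from varimp_list (ordered from most to least important), skipping duplicates. The function name and the numeric_cols argument are hypothetical.

```python
from __future__ import annotations


def select_qbin_columns_sketch(
    qbin_cols: list[str] | None,
    qbin_count: int,
    varimp_list: list[str] | None,
    numeric_cols: set[str],
) -> list[str]:
    """Hypothetical column selection for quantile binning."""
    selected = list(qbin_cols or [])
    # Only numeric features can be quantile binned; keep the top qbin_count of them.
    top_numeric = [c for c in (varimp_list or []) if c in numeric_cols][:qbin_count]
    for column in top_numeric:
        if column not in selected:
            selected.append(column)
    return selected


print(select_qbin_columns_sketch(
    ["age"], 2, ["income", "city", "age", "score"], {"age", "income", "score"}
))  # ['age', 'income']
```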
h2o_sonar.utils.caching module
The caching module provides functionality to download and cache the models used for evaluation upfront, to avoid downloading them at runtime.
- h2o_sonar.utils.caching.cache_all_minilm_l6_v2(logger: SonarLogger)
Cache the all-MiniLM-L6-v2 model.
- h2o_sonar.utils.caching.cache_all_models(logger: SonarLogger)
Cache all the models used in the Sonar package.
- h2o_sonar.utils.caching.cache_baai_bge_small_en(logger: SonarLogger)
Cache the BAAI BGE small en model.
- h2o_sonar.utils.caching.cache_baai_bge_small_env15(logger: SonarLogger)
Cache the BAAI BGE small en v1.5 model.
- h2o_sonar.utils.caching.cache_bert_base_uncased(logger: SonarLogger)
Cache the BERT base uncased model.
- h2o_sonar.utils.caching.cache_bge_m3(logger: SonarLogger)
Cache the BGE M3 model.
- h2o_sonar.utils.caching.cache_detoxify_models(logger: SonarLogger)
Download and cache the Detoxify models.
- h2o_sonar.utils.caching.cache_eval_studio_models(logger: SonarLogger)
Download the Eval Studio models from S3.
- h2o_sonar.utils.caching.cache_gptscore_evaluator_model(logger: SonarLogger)
Cache the default model for the GPTScore evaluator.
- h2o_sonar.utils.caching.cache_hkunlp_instructor(logger: SonarLogger)
Cache the hkunlp Instructor model.
- h2o_sonar.utils.caching.cache_lmppl_perplexity_evaluator_model(logger: SonarLogger)
Cache the default model for the perplexity evaluator.
- h2o_sonar.utils.caching.cache_nltk(logger: SonarLogger)
Cache the NLTK models:
  - Punkt: used in BLEU and perturbations
  - averaged_perceptron_tagger: used in perturbations
  - wordnet: used in perturbations
- h2o_sonar.utils.caching.cache_nltk_averaged_perceptron_tagger(logger: SonarLogger | None = None)
- h2o_sonar.utils.caching.cache_nltk_punkt(logger: SonarLogger | None = None)
- h2o_sonar.utils.caching.cache_nltk_wordnet(logger: SonarLogger | None = None)
- h2o_sonar.utils.caching.cache_summac_vitc(logger: SonarLogger)
Cache the SummaC (ViTC) model used for summarization.
- h2o_sonar.utils.caching.cache_tiktoken_blobs(logger: SonarLogger)
Cache the TikToken blobs.
- h2o_sonar.utils.caching.cache_vectara_hallucination_model(logger: SonarLogger)
Cache the Vectara hallucination evaluation model.
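A sketch of how an "cache everything upfront" entry point such as cache_all_models might orchestrate the per-model functions listed above. This is an assumption, not the library's code: the standard logging.Logger stands in for SonarLogger, the two cacher functions are hypothetical stand-ins, and continuing past a failed download is an assumed design choice.

```python
from __future__ import annotations

import logging
from typing import Callable


def cache_all_sketch(
    cachers: list[Callable[[logging.Logger], None]], logger: logging.Logger
) -> list[str]:
    """Hypothetical orchestrator: run every cacher, return names of the failed ones."""
    failed: list[str] = []
    for cacher in cachers:
        try:
            cacher(logger)  # e.g. download model weights into the local cache
        except Exception as exc:  # keep going so one failure does not block the rest
            logger.warning("Caching with %s failed: %s", cacher.__name__, exc)
            failed.append(cacher.__name__)
    return failed


def cache_ok(logger: logging.Logger) -> None:  # hypothetical stand-in
    logger.info("cached")


def cache_offline(logger: logging.Logger) -> None:  # hypothetical stand-in
    raise ConnectionError("no network")


failed = cache_all_sketch([cache_ok, cache_offline], logging.getLogger("cache-demo"))
print(failed)  # ['cache_offline']
```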
h2o_sonar.utils.crypto module
h2o_sonar.utils.io module
- h2o_sonar.utils.io.from_list_explainers_args_json(args_str: str) dict
Deserialize interpret.py::list_explainers() method arguments from a JSON string to a dictionary, which can be used as Python method kwargs.
- h2o_sonar.utils.io.from_run_interpretation_args_json(args_str: str) dict
Deserialize interpret.py::run_interpretation() method arguments from a JSON string to a dictionary, which can be used as Python method kwargs.
- h2o_sonar.utils.io.load_list_explainers_args_json(file_path) dict
Load list_explainers() keyword arguments from a file.
- h2o_sonar.utils.io.load_run_interpretation_args_json(file_path) dict
Load run_interpretation() keyword arguments from a file.
- h2o_sonar.utils.io.to_list_explainers_args_json(experiment_types: list[str] | None = None, explanation_scopes: list[str] | None = None, model_meta: ExplainableModelMeta | None = None, keywords: list[str] | None = None, explainer_filter: list[FilterEntry] | None = None, extra_params: dict | None = None) str
Serialize interpret.py::list_explainers() method arguments as JSON.
- h2o_sonar.utils.io.to_run_interpretation_args_json(dataset: str = '', model: str = '', target_col: str = '', explainers: list[str | ExplainerToRun] | None = None, explainer_keywords: list[str] | None = None, validset: str = '', testset: str = '', use_raw_features: bool = True, used_features: list | None = None, weight_col: str = '', prediction_col: str = '', drop_cols: list | None = None, sample_num_rows: int | None = None, log_level: int = 30, results_location: str = None, persistence_type: PersistenceType = PersistenceType.file_system, run_asynchronously: bool = False, run_explainers_in_parallel: bool = False, extra_params: dict | None = None) str
Serialize interpret.py::run_interpretation() job arguments as JSON.
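The to_*/from_* pairs above amount to a JSON round trip of keyword arguments. The sketch below shows that pattern with the standard json module; all names are hypothetical, PersistenceTypeSketch only mimics an enum argument such as persistence_type, and how the real functions encode objects like ExplainableModelMeta is not documented here.

```python
import json
from enum import Enum


class PersistenceTypeSketch(Enum):  # hypothetical stand-in for PersistenceType
    file_system = "file_system"


def to_args_json_sketch(**kwargs) -> str:
    # Enums are not JSON serializable, so store their plain value instead.
    plain = {k: (v.value if isinstance(v, Enum) else v) for k, v in kwargs.items()}
    return json.dumps(plain, sort_keys=True)


def from_args_json_sketch(args_str: str) -> dict:
    return json.loads(args_str)


def run_interpretation_stub(dataset="", model="", target_col="", **extra):  # hypothetical
    return dataset, target_col, extra


args_json = to_args_json_sketch(
    dataset="train.csv",
    target_col="label",
    persistence_type=PersistenceTypeSketch.file_system,
)
kwargs = from_args_json_sketch(args_json)
# The dictionary can be splatted straight into a method call as kwargs.
print(run_interpretation_stub(**kwargs))
```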
h2o_sonar.utils.normalization module
- h2o_sonar.utils.normalization.normalize_importance(frame: Frame) Frame
Normalize local feature importance values to global as percentage.
- Parameters:
- frame: datatable.Frame
Frame with local feature importance values.
- Returns:
- datatable.Frame
Normalized frame with global feature importance values.
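One common way to turn local feature importance into global percentages is to average the absolute local values per feature and scale the results so they sum to 100. The sketch below assumes exactly that; it uses a dict of lists instead of a datatable.Frame, and the name normalize_importance_sketch is hypothetical.

```python
from __future__ import annotations


def normalize_importance_sketch(local: dict[str, list[float]]) -> dict[str, float]:
    """Hypothetical: mean |local importance| per feature, expressed as a percentage."""
    global_importance = {
        feature: sum(abs(v) for v in values) / len(values)
        for feature, values in local.items()
    }
    total = sum(global_importance.values())
    if total == 0:  # avoid division by zero when every contribution is 0
        return {feature: 0.0 for feature in global_importance}
    return {feature: 100.0 * value / total for feature, value in global_importance.items()}


print(normalize_importance_sketch({"age": [1.0, -3.0], "income": [1.0, 1.0]}))
```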
h2o_sonar.utils.preprocessing module
- class h2o_sonar.utils.preprocessing.MultiColumnLabelEncoderAbc
Bases: object
Abstract base class for multi-column label encoders.
- fit(dframe)
Fit label encoder to Pandas columns.
- fit_transform(dframe)
Fit label encoder and return encoded labels.
- inverse_transform(dframe)
Transform labels back to original encoding.
- transform(dframe)
Transform labels to normalized encoding.
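A concrete class with the same four methods could look like the sketch below. It is not the library's implementation: a dict of lists stands in for a pandas DataFrame, the class name is hypothetical, and codes are assigned to sorted classes, as scikit-learn's LabelEncoder does.

```python
from __future__ import annotations


class SimpleMultiColumnLabelEncoder:
    """Hypothetical label encoder for selected columns of a dict-of-lists frame."""

    def __init__(self, columns: list[str] | None = None) -> None:
        self.columns = columns  # None means: encode every column
        self.mappings: dict[str, dict] = {}

    def fit(self, frame: dict[str, list]) -> SimpleMultiColumnLabelEncoder:
        columns = self.columns if self.columns is not None else list(frame)
        for column in columns:
            # sorted() gives deterministic codes (values must be mutually comparable)
            classes = sorted(set(frame[column]))
            self.mappings[column] = {label: code for code, label in enumerate(classes)}
        return self

    def transform(self, frame: dict[str, list]) -> dict[str, list]:
        encoded = dict(frame)  # shallow copy; columns that are not encoded are shared
        for column, mapping in self.mappings.items():
            encoded[column] = [mapping[value] for value in frame[column]]
        return encoded

    def fit_transform(self, frame: dict[str, list]) -> dict[str, list]:
        return self.fit(frame).transform(frame)

    def inverse_transform(self, frame: dict[str, list]) -> dict[str, list]:
        decoded = dict(frame)
        for column, mapping in self.mappings.items():
            reverse = {code: label for label, code in mapping.items()}
            decoded[column] = [reverse[code] for code in frame[column]]
        return decoded


frame = {"color": ["red", "blue", "red"], "size": [1, 2, 3]}
encoder = SimpleMultiColumnLabelEncoder(columns=["color"])
encoded = encoder.fit_transform(frame)
print(encoded)  # {'color': [1, 0, 1], 'size': [1, 2, 3]}
```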
- h2o_sonar.utils.preprocessing.categorical_encoder(X: DataFrame) tuple[DataFrame, MultiColumnLabelEncoderAbc, list]
- h2o_sonar.utils.preprocessing.get_multi_column_label_encoder(columns=None) MultiColumnLabelEncoderAbc
h2o_sonar.utils.problem_detection module
- h2o_sonar.utils.problem_detection.get_feature_importance_problems(shap_means_dict: dict[str, Frame], threshold: float, explainer_id: str, explainer_display_name: str) list[ProblemAndAction]
Get feature importance problems and suggested actions based on SHAP values above a specified threshold.
- Parameters:
- shap_means_dict: dict[str, datatable.Frame]
A dictionary of datatable Frames containing Shapley values.
- threshold: float
Threshold for flagging potential data leakage in the most important feature.
- explainer_id: str
Explainer ID.
- explainer_display_name: str
Explainer display name.
- Returns:
- list[problems.ProblemAndAction]
A list of problems and actions.
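The leakage check can be sketched as follows, assuming that threshold is compared against the most important feature's share of the total mean absolute SHAP value. A plain dict of floats replaces the dict of datatable Frames, strings replace ProblemAndAction objects, and the function name is hypothetical.

```python
from __future__ import annotations


def feature_importance_problems_sketch(
    mean_abs_shap: dict[str, float], threshold: float
) -> list[str]:
    """Hypothetical: flag the top feature if it dominates the total importance."""
    total = sum(mean_abs_shap.values())
    if total == 0:
        return []
    top_feature, top_value = max(mean_abs_shap.items(), key=lambda kv: kv[1])
    share = top_value / total
    if share > threshold:
        return [
            f"Feature '{top_feature}' accounts for {share:.0%} of total importance; "
            "check it for potential data leakage."
        ]
    return []


print(feature_importance_problems_sketch({"id": 8.0, "age": 1.0, "income": 1.0}, threshold=0.5))
```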
h2o_sonar.utils.perturbations module
- class h2o_sonar.utils.perturbations.AbcSynAntPerturbator
Bases: ABC
- PUNCTUATION = ('.', ',', '?', '!', ':', ';', "'", '"', '(', '[', '{')
- TAGS = ('CD', 'JJ', 'JJR', 'JJS', 'NN', 'NNS', 'RB', 'RBR', 'RBS')
- class h2o_sonar.utils.perturbations.AntonymPerturbator
Bases: Perturbator, AbcSynAntPerturbator
Perturbator that replaces words with their antonyms.
- class h2o_sonar.utils.perturbations.CommaPerturbator
Bases: Perturbator
Perturbator that adds a comma after some words. It mimics a common mistake in English writing and/or typos.
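A minimal sketch of a comma-insertion perturbation, assuming that "some words" means each word independently gets a comma with a given probability. The probability and seed parameters are hypothetical (the real perturbator is driven by PerturbationIntensity), and text.split() collapses repeated whitespace.

```python
from __future__ import annotations

import random


def add_commas_sketch(text: str, probability: float = 0.3, seed: int | None = None) -> str:
    """Hypothetical: append a comma to randomly chosen words."""
    rng = random.Random(seed)
    words = text.split()
    out = []
    for i, word in enumerate(words):
        # Never after the last word, and never double an existing comma.
        if i < len(words) - 1 and not word.endswith(",") and rng.random() < probability:
            word += ","
        out.append(word)
    return " ".join(out)


print(add_commas_sketch("the quick brown fox", probability=1.0))  # the, quick, brown, fox
```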
- class h2o_sonar.utils.perturbations.ContextualMisinformationPerturbator
Bases: Perturbator, AbcAgenticPerturbator
Contextual misinformation perturbator is an agent-based perturbator that introduces factually incorrect information within a seemingly plausible context, aiming to mislead the model into accepting false statements (an adversarial attack).
- class h2o_sonar.utils.perturbations.CopyPerturbator
Bases: Perturbator
Perturbator that performs no perturbation: it returns the input text unchanged.
This perturbator is useful as a baseline or control in perturbation experiments, allowing comparison between perturbed and non-perturbed inputs while maintaining consistent processing pipelines.
- class h2o_sonar.utils.perturbations.EncodingPerturbator
Bases: Perturbator
Perturbator that encodes the prompt in a specified encoding to steer the model to answer in that encoding. This perturbation can be used to bypass the model’s safety filters (guardrails) and generate unsafe content.
See: https://substack.com/home/post/p-156004330
- TYPE_ANSWER_DECODED = 'answer_decoded'
- TYPE_ANSWER_ENCODED = 'answer_encoded'
- TYPE_PROMPT_DECODED = 'prompt_decoded'
- TYPE_PROMPT_ENCODED = 'prompt_encoded'
- class h2o_sonar.utils.perturbations.EncodingPerturbatorBase16
Bases: EncodingPerturbator
Perturbator that encodes the prompt using base16 encoding to steer the model to answer in that encoding. This perturbation can be used to bypass the model’s safety filters (guardrails) and generate unsafe content.
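As the class name indicates Base16 (hexadecimal) encoding, the sketch below shows only that encoding step, using b16encode/b16decode from Python's standard base64 module. How the real perturbator phrases the request to the model, and how the TYPE_* constants are used, is not documented here; the function names are hypothetical.

```python
import base64


def encode_prompt_base16_sketch(prompt: str) -> str:
    """Hypothetical: hex-encode a prompt (uppercase, as b16encode produces)."""
    return base64.b16encode(prompt.encode("utf-8")).decode("ascii")


def decode_base16_sketch(encoded: str) -> str:
    """Hypothetical: decode a Base16 answer back to text."""
    return base64.b16decode(encoded).decode("utf-8")


encoded = encode_prompt_base16_sketch("hi")
print(encoded)  # 6869
```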
- class h2o_sonar.utils.perturbations.KeywordTyposCharacterPerturbator
Bases:
Perturbator
- class h2o_sonar.utils.perturbations.Perturbator
Bases: ABC
Base class for perturbators.
- as_descriptor() PerturbatorDescriptor
- perturb(text: str | list[str], intensity: PerturbationIntensity = PerturbationIntensity.MEDIUM, retries: int = 15, raised_errors: list | None = None, **perturbation_params) str | list[str] | None
Perturb the input text with the given intensity.
- Parameters:
- text: str | list[str]
Text to perturb.
- intensity: PerturbationIntensity | str
Perturbation intensity.
- retries: int, optional
Number of retries if the perturbation does not yield a new text.
- raised_errors: list | None
If None, raise error(s) if the perturbator(s) fail; otherwise, do not raise exceptions and store them in the (empty) list provided by the caller.
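The retries and raised_errors semantics described above can be sketched as a wrapper around a single perturbation attempt. Returning None when every retry leaves the text unchanged is an assumption (the signature only says the result may be None), and perturb_with_retries_sketch and perturb_once are hypothetical names.

```python
from __future__ import annotations

from typing import Callable


def perturb_with_retries_sketch(
    text: str,
    perturb_once: Callable[[str], str],
    retries: int = 15,
    raised_errors: list | None = None,
) -> str | None:
    """Hypothetical: retry until the text changes, optionally collecting errors."""
    try:
        for _ in range(retries):
            candidate = perturb_once(text)
            if candidate != text:  # success: the perturbation produced a new text
                return candidate
        return None  # assumption: give up after `retries` unchanged attempts
    except Exception as exc:
        if raised_errors is None:
            raise  # default behaviour: propagate the error to the caller
        raised_errors.append(exc)  # the caller opted in to collecting errors
        return None


print(perturb_with_retries_sketch("abc", str.upper))  # ABC
```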
- class h2o_sonar.utils.perturbations.PerturbatorDescriptor(perturbator_id: str, display_name: str = '', description: str = '', keywords: list[str] | None = None)
Bases: object
- clone() PerturbatorDescriptor
- static load(d: dict) PerturbatorDescriptor
- class h2o_sonar.utils.perturbations.PerturbatorRegistry(singleton_create_key)
Bases: object
Registry of perturbators.
- are_compatible(perturbators: list[PerturbatorToRun], items: int = 0) list[PerturbatorToRun]
- describe_perturbator(perturbator_id: str) PerturbatorDescriptor | None
- get_perturbator(perturbator_id: str) Perturbator | None
- is_compatible(perturbator_id: str, items: int = 0) bool
Return whether the perturbator is available and compatible, given the metadata declarations.
- list_perturbators(keywords: list[str] | None = None) list[Perturbator]
List perturbators, optionally filtered by keywords. If multiple keywords are provided, a perturbator must have all of them to be included in the result.
- register(perturbator: Perturbator)
- classmethod registry()
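The constructor argument singleton_create_key and the registry() class method suggest a guarded singleton. The sketch below assumes that pattern and reduces each perturbator to an ID with a set of keywords, showing the "must have all keywords" filter of list_perturbators(). Everything in it is hypothetical, not the library's code.

```python
from __future__ import annotations


class PerturbatorRegistrySketch:
    """Hypothetical singleton registry; perturbators reduced to id -> keywords."""

    _instance: PerturbatorRegistrySketch | None = None
    _create_key = object()  # private token that guards direct construction

    def __init__(self, singleton_create_key) -> None:
        if singleton_create_key is not PerturbatorRegistrySketch._create_key:
            raise ValueError("use PerturbatorRegistrySketch.registry() instead")
        self._keywords: dict[str, set[str]] = {}

    @classmethod
    def registry(cls) -> PerturbatorRegistrySketch:
        if cls._instance is None:
            cls._instance = cls(cls._create_key)
        return cls._instance

    def register(self, perturbator_id: str, keywords: list[str]) -> None:
        self._keywords[perturbator_id] = set(keywords)

    def list_perturbators(self, keywords: list[str] | None = None) -> list[str]:
        wanted = set(keywords or [])
        # A perturbator matches only if it has ALL of the requested keywords.
        return [pid for pid, kws in self._keywords.items() if wanted <= kws]


registry = PerturbatorRegistrySketch.registry()
registry.register("comma", ["typo", "character"])
registry.register("antonym", ["semantic"])
print(registry.list_perturbators(["typo"]))  # ['comma']
```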
- class h2o_sonar.utils.perturbations.QwertyPerturbator
Bases: Perturbator
Perturbator that replaces ‘y’ with ‘z’ and vice versa.
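The y/z swap, presumably mimicking a QWERTY/QWERTZ keyboard layout mix-up, maps onto str.translate. This sketch swaps every occurrence in both cases; the real perturbator may swap only some of them depending on intensity, and qwerty_swap_sketch is a hypothetical name.

```python
def qwerty_swap_sketch(text: str) -> str:
    """Hypothetical: swap y<->z and Y<->Z everywhere in the text."""
    return text.translate(str.maketrans("yzYZ", "zyZY"))


print(qwerty_swap_sketch("lazy Yoyo"))  # layz Zozo
```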
- class h2o_sonar.utils.perturbations.RandomCharacterDeletePerturbator
Bases:
Perturbator
- class h2o_sonar.utils.perturbations.RandomCharacterInsertPerturbator
Bases:
Perturbator
- class h2o_sonar.utils.perturbations.RandomCharacterReplacementPerturbator
Bases:
Perturbator
- class h2o_sonar.utils.perturbations.RandomOCRCharacterPerturbator
Bases:
Perturbator
- class h2o_sonar.utils.perturbations.SynonymPerturbator
Bases: Perturbator, AbcSynAntPerturbator
Perturbator that replaces words with their synonyms.
- class h2o_sonar.utils.perturbations.WordSwapPerturbator
Bases: Perturbator
Perturbator that swaps two words in a sentence.
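A minimal word-swap sketch, assuming the two words are picked at random from any positions (the real perturbator might prefer adjacent words). The seed parameter and the function name are hypothetical.

```python
from __future__ import annotations

import random


def swap_two_words_sketch(sentence: str, seed: int | None = None) -> str:
    """Hypothetical: swap two randomly chosen words of a sentence."""
    words = sentence.split()
    if len(words) < 2:
        return sentence  # nothing to swap
    rng = random.Random(seed)
    i, j = rng.sample(range(len(words)), 2)  # two distinct positions
    words[i], words[j] = words[j], words[i]
    return " ".join(words)


print(swap_two_words_sketch("hello world", seed=1))  # world hello
```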
- h2o_sonar.utils.perturbations.register_ootb_perturbators()
Register out-of-the-box perturbators.
h2o_sonar.utils.sampling module
This module provides the following dataset sampling techniques:
StratifiedDatasetSampling (default): dataset sampler which implements both stratified and random sampling. The sampler automatically decides which sampling technique to use.
- CONS:
stratified sampling can sample datasets up to 50% of the free RAM (the sklearn sampler is the bottleneck)
- PROS:
supports stratified (classification models) and random sampling (regression)
decides the sampling method automatically (can be overridden with a parameter)
random sampling can sample datasets bigger than the free RAM
NoDatasetSampling: sampler which is used when the user requests NO sampling. To avoid an OOM H2O Sonar crash, it checks whether the dataset fits in RAM and, if it does not, raises an exception asking the user to sample the dataset or use a different one.
RandomPandasDatasetSampling: dataset sampler which implements random sampling using pandas.
- CONS:
dataset must fit in free RAM (2x)
sampler does not support stratification
- PROS:
enables the use of the pandas sampler seamlessly in the H2O Sonar runtime
HeadOfDatasetSampling: sampler which does not sample, but returns sampling_limit rows from the head of the dataset.
- CONS:
sampled dataset will very likely be biased (should not be used in production)
- PROS:
fast
handles datasets of any size
can be used for splitting and non-functional testing
- class h2o_sonar.utils.sampling.DatasetSampler(system_limit: int = 1000000000)
Bases: ABC
Base class for the dataset samplers; its subclasses implement various dataset sampling techniques.
The H2O Sonar container samples the dataset upfront (based on the interpretation parameters) to protect the process/runtime from crashes, the system from OOM and excessive resource use, and the explainers from failures.
- DEFAULT_CAT_NUM_THRESHOLD = 50
- H2O_SONAR_LIMIT = 25000
- SYSTEM_LIMIT = 1000000000
- static is_dataset_fit_in_memory(dataset_path: str | Path)
Check whether the dataset file would fit into free RAM and return the sizes.
- Parameters:
- dataset_path: str
Dataset path.
- Returns:
- Tuple[bool, int, int]
Return whether the dataset will fit, dataset size in bytes and RAM size in bytes.
- sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = 0, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: list | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) tuple[bool, Frame, str]
Sample dataset.
- Parameters:
- dataset: datatable.Frame | str | pathlib.Path
Dataset to be sampled as reference to the frame or a path to the file.
- sampling_limit: int | None = 0
If None, automatically sample based on the dataset and RAM size. If > 0, sample the dataset to sampling_limit rows. If == 0, do NOT sample.
- target_col: str = ""
Optional target column, which is required by certain sampling techniques (such as stratified sampling).
- is_classification: bool
If None, automatically choose stratified or random sampling. If True, force stratified sampling. If False, force random sampling.
- drop_nan_rows: bool
True to drop rows with a "not a number" value in the target_col column when using classification-friendly techniques.
- drop_1_classes: bool
True to drop rows representing classes with cardinality equal to 1 (categories represented by exactly one row in the dataset) in the target_col column when using classification-friendly techniques.
- classes: list | None = None
Optional specification of classes to be used for sampling (all valid classes are used by default). The classes values are expected to be a subset of the target column classes.
- sampled_dataset_path: datatable.Frame | str | pathlib.Path
Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to datatable frame).
- seedint
Optional random seed for reproducible sampling.
- logger
Optional logger.
- Returns:
- datatable.Frame | str
Path to the sampled dataset if sampled_dataset_path has been specified, otherwise a reference to the datatable Frame.
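A minimal sketch of the sampling_limit semantics described above (0 means no sampling, > 0 means sample to that many rows, None means automatic), using seeded random sampling over an in-memory list of rows. The automatic limit of 25000 rows is an assumption borrowed from the H2O_SONAR_LIMIT constant, whereas the real sampler derives it from the dataset and RAM size; the (bool, rows) return value only loosely mirrors the documented tuple.

```python
from __future__ import annotations

import random


def random_sample_sketch(
    rows: list, sampling_limit: int | None = 0, seed: int = 42, auto_limit: int = 25_000
) -> tuple[bool, list]:
    """Hypothetical: return (was_sampled, rows)."""
    if sampling_limit == 0:
        return False, list(rows)  # 0 means: do NOT sample
    limit = auto_limit if sampling_limit is None else sampling_limit
    if len(rows) <= limit:
        return False, list(rows)  # already small enough
    rng = random.Random(seed)  # seeded, so the same seed gives the same sample
    return True, rng.sample(rows, limit)


sampled, subset = random_sample_sketch(list(range(1000)), sampling_limit=10)
print(sampled, len(subset))  # True 10
```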
- class h2o_sonar.utils.sampling.HeadOfDatasetSampling(chunk_size: int = 1000000)
Bases: DatasetSampler
Sampler which does not sample, but returns sampling_limit rows from the head of the dataset.
PRESUMPTIONS:
the sampled dataset fits into free RAM
CONS:
it is NOT correct from the data science perspective and should NOT be used, as it guarantees nothing: the sampled dataset will very likely be biased, i.e. it may have completely different characteristics and statistics than the original dataset
PROS:
it can sample a dataset of any size and therefore enables H2O Sonar to run on datasets of any size; if the data science aspect is not a concern, this sampler might be a good choice
it is relatively fast in comparison to other samplers
it is ideal for non-functional testing
- sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: list | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) tuple[bool, Frame, str]
Sample dataset.
- Parameters:
- dataset: datatable.Frame | str | pathlib.Path
Dataset to be sampled as reference to the frame or a path to the file.
- sampling_limit: int | None = None
If None, automatically sample based on the dataset and RAM size. If > 0, sample the dataset to sampling_limit rows. If == 0, do NOT sample.
- target_col: str = ""
Optional target column, which is required by certain sampling techniques (such as stratified sampling).
- is_classification: bool
If None, automatically choose stratified or random sampling. If True, force stratified sampling. If False, force random sampling.
- drop_nan_rows: bool
True to drop rows with a "not a number" value in the target_col column when using classification-friendly techniques.
- drop_1_classes: bool
True to drop rows representing classes with cardinality equal to 1 (categories represented by exactly one row in the dataset) in the target_col column when using classification-friendly techniques.
- classes: list | None = None
Optional specification of classes to be used for sampling (all valid classes are used by default). The classes values are expected to be a subset of the target column classes.
- sampled_dataset_path: datatable.Frame | str | pathlib.Path
Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to datatable frame).
- seedint
Optional random seed for reproducible sampling.
- logger
Optional logger.
- Returns:
- datatable.Frame | str
Path to the sampled dataset if sampled_dataset_path has been specified, otherwise a reference to the datatable Frame.
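Taking the head of a dataset without loading all of it can be sketched with the standard csv module and itertools.islice, which stops reading once sampling_limit rows have been consumed. This is a substitute technique for illustration: the real class reads datatable frames in chunks (see chunk_size), and head_of_csv_sketch is a hypothetical name.

```python
from __future__ import annotations

import csv
import itertools


def head_of_csv_sketch(path: str, sampling_limit: int) -> tuple[list[str], list[list[str]]]:
    """Hypothetical: return the header and the first sampling_limit data rows."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        # islice reads lazily, so the rest of the file is never loaded into memory.
        rows = list(itertools.islice(reader, sampling_limit))
    return header, rows
```

Because the file is streamed, memory use depends on sampling_limit rather than on the dataset size, which is why this approach handles datasets of any size at the cost of a biased sample.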
- class h2o_sonar.utils.sampling.NoDatasetSampling(check_ram: bool = True)
Bases: DatasetSampler
Sampler which does NO sampling; it can check whether the dataset would fit in RAM and thus avoid an H2O Sonar OOM crash. Used as the default sampling method.
- sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: list | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) tuple[bool, Frame, str]
Sample dataset.
- Parameters:
- dataset: datatable.Frame | str | pathlib.Path
Dataset to be sampled as reference to the frame or a path to the file.
- sampling_limit: int | None = None
If None, automatically sample based on the dataset and RAM size. If > 0, sample the dataset to sampling_limit rows. If == 0, do NOT sample.
- target_col: str = ""
Optional target column, which is required by certain sampling techniques (such as stratified sampling).
- is_classification: bool
If None, automatically choose stratified or random sampling. If True, force stratified sampling. If False, force random sampling.
- drop_nan_rows: bool
True to drop rows with a "not a number" value in the target_col column when using classification-friendly techniques.
- drop_1_classes: bool
True to drop rows representing classes with cardinality equal to 1 (categories represented by exactly one row in the dataset) in the target_col column when using classification-friendly techniques.
- classes: list | None = None
Optional specification of classes to be used for sampling (all valid classes are used by default). The classes values are expected to be a subset of the target column classes.
- sampled_dataset_path: datatable.Frame | str | pathlib.Path
Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to datatable frame).
- seedint
Optional random seed for reproducible sampling.
- logger
Optional logger.
- Returns:
- datatable.Frame | str
Path to the sampled dataset if sampled_dataset_path has been specified, otherwise a reference to the datatable Frame.
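The RAM check behind NoDatasetSampling (and DatasetSampler.is_dataset_fit_in_memory) can be sketched by comparing the file size with the free memory. In this sketch free_ram_bytes is passed in explicitly so that it stays testable; in practice it could come from psutil.virtual_memory().available. The factor parameter, the MemoryError, and both function names are assumptions.

```python
from __future__ import annotations

import os
from pathlib import Path


def fits_in_memory_sketch(
    dataset_path: str | Path, free_ram_bytes: int, factor: float = 1.0
) -> tuple[bool, int, int]:
    """Hypothetical: (fits, dataset size in bytes, free RAM in bytes)."""
    dataset_size = os.path.getsize(dataset_path)
    # factor > 1 reserves headroom, e.g. 2.0 for samplers that need the data twice.
    return dataset_size * factor <= free_ram_bytes, dataset_size, free_ram_bytes


def no_sampling_sketch(dataset_path: str | Path, free_ram_bytes: int) -> str | Path:
    fits, size, ram = fits_in_memory_sketch(dataset_path, free_ram_bytes)
    if not fits:
        raise MemoryError(
            f"Dataset ({size} B) does not fit into free RAM ({ram} B); "
            "sample it or use a different dataset."
        )
    return dataset_path  # no sampling: hand the dataset over unchanged
```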
- class h2o_sonar.utils.sampling.RandomPandasDatasetSampling(logger=None)
Bases: DatasetSampler
Dataset sampler which implements random sampling using pandas.DataFrame.sample().
CONS:
dataset must fit in free RAM (2x)
sampler does not support stratification
PROS:
enables the use of Pandas sampler seamlessly in the H2O Sonar runtime
- sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: list | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) tuple[bool, Frame, str]
Sample dataset.
- Parameters:
- dataset: datatable.Frame | str | pathlib.Path
Dataset to be sampled as reference to the frame or a path to the file.
- sampling_limit: int | None = None
If None, automatically sample based on the dataset and RAM size. If > 0, sample the dataset to sampling_limit rows. If == 0, do NOT sample.
- target_col: str = ""
Optional target column, which is required by certain sampling techniques (such as stratified sampling).
- is_classification: bool
If None, automatically choose stratified or random sampling. If True, force stratified sampling. If False, force random sampling.
- drop_nan_rows: bool
True to drop rows with a "not a number" value in the target_col column when using classification-friendly techniques.
- drop_1_classes: bool
True to drop rows representing classes with cardinality equal to 1 (categories represented by exactly one row in the dataset) in the target_col column when using classification-friendly techniques.
- classes: list | None = None
Optional specification of classes to be used for sampling (all valid classes are used by default). The classes values are expected to be a subset of the target column classes.
- sampled_dataset_path: datatable.Frame | str | pathlib.Path
Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to datatable frame).
- seedint
Optional random seed for reproducible sampling.
- logger
Optional logger.
- Returns:
- datatable.Frame | str
Path to the sampled dataset if sampled_dataset_path has been specified, otherwise a reference to the datatable Frame.
- class h2o_sonar.utils.sampling.StratifiedDatasetSampling
Bases: DatasetSampler
Dataset sampler which implements both stratified and random sampling.
CONS:
stratified sampling can sample datasets up to 50% of the free RAM (the sklearn sampler is the bottleneck)
PROS:
supports stratified (classification models) and random sampling (regression)
decides the sampling method automatically (can be overridden with a parameter)
random sampling can sample datasets bigger than the free RAM
- sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: list | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) tuple[bool, Frame, str]
Sample dataset.
- Parameters:
- dataset: datatable.Frame | str | pathlib.Path
Dataset to be sampled as reference to the frame or a path to the file.
- sampling_limit: int | None = None
If None, automatically sample based on the dataset and RAM size. If > 0, sample the dataset to sampling_limit rows. If == 0, do NOT sample.
- target_col: str = ""
Optional target column, which is required by certain sampling techniques (such as stratified sampling).
- is_classification: bool
If None, automatically choose stratified or random sampling. If True, force stratified sampling. If False, force random sampling.
- drop_nan_rows: bool
True to drop rows with a "not a number" value in the target_col column when using classification-friendly techniques.
- drop_1_classes: bool
True to drop rows representing classes with cardinality equal to 1 (categories represented by exactly one row in the dataset) in the target_col column when using classification-friendly techniques.
- classes: list | None = None
Optional specification of classes to be used for sampling (all valid classes are used by default). The classes values are expected to be a subset of the target column classes.
- sampled_dataset_path: datatable.Frame | str | pathlib.Path
Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to datatable frame).
- seedint
Optional random seed for reproducible sampling.
- logger
Optional logger.
- Returns:
- datatable.Frame | str
Path to the sampled dataset if sampled_dataset_path has been specified, otherwise a reference to the datatable Frame.
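Stratified sampling keeps each class's share of the sample close to its share of the dataset. The sketch below illustrates the idea with a pure-Python proportional allocation, including the drop_1_classes behaviour. It is a different technique from the sklearn-based sampler the class reportedly uses: it works on a list of labels, returns row indices, and its per-class rounding may shift the total by a few rows. The function name is hypothetical.

```python
from __future__ import annotations

import random
from collections import defaultdict


def stratified_sample_sketch(
    labels: list, sampling_limit: int, seed: int = 42, drop_1_classes: bool = True
) -> list[int]:
    """Hypothetical: sorted row indices with approximately preserved class proportions."""
    by_class: dict = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    if drop_1_classes:
        # Classes represented by exactly one row cannot be stratified.
        by_class = {c: rows for c, rows in by_class.items() if len(rows) > 1}
    total = sum(len(rows) for rows in by_class.values())
    if total == 0:
        return []
    rng = random.Random(seed)
    picked: list[int] = []
    for rows in by_class.values():
        # Per-class quota proportional to the class frequency.
        quota = min(len(rows), round(sampling_limit * len(rows) / total))
        picked.extend(rng.sample(rows, quota))
    return sorted(picked)


labels = ["a"] * 80 + ["b"] * 20 + ["c"]
print(len(stratified_sample_sketch(labels, 10)))  # 10
```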
- h2o_sonar.utils.sampling.downsample_dataset(dataset, sample_size: int | None = None, runtime_sample_size: int | None = None, target_col: str = '', is_classification: bool = False, classes: list | None = None, seed: int = 42, logger=None)
Dataset sampling method used by the explainers in Driverless AI (and potentially other container runtimes) to sample the input dataset according to their needs.
This method is not used by the local explainer, which samples the input dataset upfront to protect all the explainers. This is why this method serves as an identity: it ensures that H2O Sonar’s sampling does not impact Driverless AI and other host runtimes.
- Parameters:
- dataset
Dataset to be sampled.
- sample_size: int | None
Sampling limit to use.
- runtime_sample_size: int | None
Runtime protection: sample the dataset to this size even if sample_size is bigger, to protect the runtime and avoid space (memory) and time overload.
- target_col: str
Target column to be used for the sampling.
- is_classification: bool
Sample for regression (False) or classification (True).
- classes: list | None
List of classes when sampling a classification model dataset.
- seed: int
Sampling seed.
- logger
Logger.
- Returns:
- Any
Sampled dataset.
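Reading runtime_sample_size as a hard cap, a host runtime would sample to the smaller of the two limits. The sketch below only computes that effective limit under this assumption; within H2O Sonar itself the method is an identity, and effective_sample_size_sketch is a hypothetical name.

```python
from __future__ import annotations


def effective_sample_size_sketch(
    sample_size: int | None, runtime_sample_size: int | None
) -> int | None:
    """Hypothetical: the smaller of the two limits; None if neither is set."""
    limits = [s for s in (sample_size, runtime_sample_size) if s]  # ignore None / 0
    return min(limits) if limits else None


print(effective_sample_size_sketch(1000, 500))  # 500
```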
h2o_sonar.utils.sanitization module
- class h2o_sonar.utils.sanitization.DriverlessAiSanitizationMap(raw_names: list[str], sanitized_names: list[str])
Bases: SanitizationMap
Driverless AI model sanitization map.
A Driverless AI (AutoML) model provides its own sanitization map. The purpose of this class is to make the Driverless AI sanitization available via the standard SanitizationMap interface.
- class h2o_sonar.utils.sanitization.SanitizationMap(raw_names: list[str], sanitized_names: list[str])
Bases: object
Map of original (raw) dataset column names/features to sanitized names and vice versa.
- static sanitize_value(values: str | list[str], special_chars: str = '|,=[]<\t\r\n:.~') str | list[str]
Sanitize feature values (labels, classes). Note that column/feature name sanitization (handled by the map) typically has different requirements than value sanitization. Also note that value sanitization is one-way (original to sanitized only) and may produce collisions across multiple calls to this method (collisions within a single call are resolved).
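A two-way map between raw and sanitized names can be sketched with two dictionaries, built by a helper that resolves collisions within a single call by appending a numeric suffix. The suffix scheme, the "_" replacement character and the method names sanitized()/raw() are assumptions; only the special characters come from the signature above.

```python
from __future__ import annotations

SPECIAL_CHARS = "|,=[]<\t\r\n:.~"  # default special_chars from the signature above


class SanitizationMapSketch:
    """Hypothetical two-way map between raw and sanitized column names."""

    def __init__(self, raw_names: list[str], sanitized_names: list[str]) -> None:
        if len(raw_names) != len(sanitized_names):
            raise ValueError("raw_names and sanitized_names must have the same length")
        self._to_sanitized = dict(zip(raw_names, sanitized_names))
        self._to_raw = dict(zip(sanitized_names, raw_names))

    def sanitized(self, raw_name: str) -> str:
        return self._to_sanitized[raw_name]

    def raw(self, sanitized_name: str) -> str:
        return self._to_raw[sanitized_name]


def build_map_sketch(raw_names: list[str]) -> SanitizationMapSketch:
    table = {ord(ch): "_" for ch in SPECIAL_CHARS}
    seen: set[str] = set()
    sanitized: list[str] = []
    for name in raw_names:
        base = name.translate(table)
        candidate, n = base, 1
        while candidate in seen:  # resolve collisions within this single call
            candidate = f"{base}_{n}"
            n += 1
        seen.add(candidate)
        sanitized.append(candidate)
    return SanitizationMapSketch(raw_names, sanitized)


m = build_map_sketch(["a.b", "a:b", "c"])
print(m.sanitized("a:b"), m.raw("a_b"))  # a_b_1 a.b
```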
- h2o_sonar.utils.sanitization.sanitize_frame(frame, sanitization_map: SanitizationMap | None = None) SanitizationMap | None
- h2o_sonar.utils.sanitization.sanitize_markdown(md_fragment: str) str
Sanitize a Markdown fragment string. It is NOT meant to sanitize whole Markdown documents, only fragments in which a string (to be stored in Markdown) could interact with other Markdown elements.
- Parameters:
- md_fragment: str
A Markdown fragment string.
- Returns:
- str
Sanitized Markdown string fragment without dangerous characters and links.
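One plausible approach, sketched with Python's re module: keep only the text of inline links and backslash-escape characters with special meaning in Markdown. The escaped character set is an assumption, nested brackets inside link text are not handled, and sanitize_markdown_sketch is a hypothetical name; the real function may strip a different set of dangerous constructs.

```python
import re

# Inline Markdown link: [text](target) -> keep only the text.
LINK_PATTERN = re.compile(r"\[([^\]]*)\]\([^)]*\)")
# Characters with special meaning in Markdown (assumed list).
SPECIAL_PATTERN = re.compile(r"([\\`*_{}\[\]()#+\-!|<>])")


def sanitize_markdown_sketch(md_fragment: str) -> str:
    without_links = LINK_PATTERN.sub(r"\1", md_fragment)
    return SPECIAL_PATTERN.sub(r"\\\1", without_links)  # backslash-escape specials


print(sanitize_markdown_sketch("see [docs](https://example.com) *now*"))  # see docs \*now\*
```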
- h2o_sonar.utils.sanitization.sanitize_names(names: str | list[str], sanitization_map: SanitizationMap | None = None) SanitizationMap | None
Sanitize column/feature name(s) either using (model’s) sanitization map (if available) or using universal sanitization method.
- Parameters:
- names: str | list[str]
Name(s) to be sanitized.
- sanitization_map: SanitizationMap | None
Optional sanitization map.
- h2o_sonar.utils.sanitization.sanitize_strings(strings: str | list[str], replace_with: str = '_', special_chars: str = '|,=[]<\t\r\n:.~')
Sanitize a string or a list of strings.
- Parameters:
- stringsstr | list[str]
Strings to be sanitized.
- replace_withstr
Character to be used for replacement for characters to be forbidden.
- special_charsstr
Optional special characters to be sanitized.
- Returns:
- str | list[str]
Sanitized strings.
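Character-for-character replacement like this maps naturally onto str.translate with a table built from special_chars. The sketch reuses the documented defaults ("_" and '|,=[]<\t\r\n:.~') but is an illustration under that assumption, not the library's code; any extra behaviour of the real function is not covered.

```python
from __future__ import annotations


def sanitize_strings_sketch(
    strings: str | list[str],
    replace_with: str = "_",
    special_chars: str = "|,=[]<\t\r\n:.~",
) -> str | list[str]:
    """Hypothetical: replace every special character with replace_with."""
    table = {ord(ch): replace_with for ch in special_chars}  # code point -> replacement
    if isinstance(strings, str):
        return strings.translate(table)
    return [s.translate(table) for s in strings]


print(sanitize_strings_sketch("price [USD]: 1.5"))  # price _USD__ 1_5
```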
h2o_sonar.utils.testing module
H2O Sonar LLM / RAG testing utilities:
Raw test data: a dataset which was used to create the test configuration(s).
Test suite: a collection of tests (see below).
Test: a collection of documents (corpus) along with the test cases (see below) to be run in the context of the corpus.
Test case: a prompt, expected output (ground truth), categories, output condition, output constraints, … and other parameters to be used for a RAG / LLM model evaluation.
Test lab: a set of resolved tests enriched with answers (actual answer), retrieval context, response duration and other data obtained from the conversation with a RAG / LLM.
A resolved test lab is exported to an LLM dataset, which is then used as input to an evaluation; the evaluation runs a set of evaluators to rank RAG / LLM models.
- class h2o_sonar.utils.testing.InMemoryLlmHostPromptCache
Bases: LlmHostPromptCache
In-memory LLM host client cache:
- initialization:
a (pre-built) cache can be loaded from a JSON file
- hints:
the cache can be saved to and loaded from a JSON file
a pre-built cache can be created from a test lab (not implemented by this class)
when used in a testing environment, cloud deployment, …, a pre-built cache can be synchronized/downloaded from S3, a filesystem, …
- KEY_DATA = 'cache_data'
- add_test_lab(test_lab: RagTestLab)
Add the test lab to the cache.
- clear()
Clear the cache.
- get(key: str) dict | None
Get the cached value for the given key.
The returned dictionary can be passed to a typed result class.
- get_llm_model_names(explainable_model_type: ExplainableModelType) list[str]
List all the LLM model names known to the cache.
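A dict-backed cache with JSON persistence under the documented KEY_DATA key can be sketched as follows. The put() method and the from_json()/to_json() helpers are hypothetical (the documented class fills the cache via add_test_lab()), and the cached value simply reuses key names from the LlmHostPromptCache constants.

```python
from __future__ import annotations

import json


class InMemoryPromptCacheSketch:
    """Hypothetical dict-backed prompt cache with JSON persistence."""

    KEY_DATA = "cache_data"  # mirrors the KEY_DATA constant documented above

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:  # hypothetical, not the documented API
        self._data[key] = value

    def get(self, key: str) -> dict | None:
        return self._data.get(key)

    def clear(self) -> None:
        self._data.clear()

    def to_json(self) -> str:
        return json.dumps({self.KEY_DATA: self._data})

    @classmethod
    def from_json(cls, payload: str) -> InMemoryPromptCacheSketch:
        cache = cls()
        cache._data = dict(json.loads(payload)[cls.KEY_DATA])
        return cache


cache = InMemoryPromptCacheSketch()
cache.put("CACHE-KEY::abc", {"actual_output": "42", "actual_duration": 1.5})
restored = InMemoryPromptCacheSketch.from_json(cache.to_json())
print(restored.get("CACHE-KEY::abc"))
```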
- class h2o_sonar.utils.testing.LlmHostPromptCache
Bases: ABC
Prompt cache for the LLM host clients:
- caches:
answer(s) (actual answer, duration, cost, chunks, …) for given prompt(s)
- does NOT cache:
corpus document synchronization
RAG host server collection creation
LLM model listing
- cache key:
does NOT consider the particular host (like h2ogpte.h2o.ai), but rather the LLM host type ~ connection type (like H2O_GPT_E or OPENAI_RAG)
does NOT consider the particular chunk retrieval method
DOES consider corpus documents (empty for non-RAG), prompt, LLM model name, required context chunks (via the chunk retrieval method: none or a method), …
DOES consider context (empty for RAG)
- implementations (options):
in-memory cache (testing)
filesystem cache (pre-built JSON files)
Redis cache (shared by EvalStudio workers)
memcached cache (shared by EvalStudio workers)
…
- utilities:
cache key generation
cache key hashing
static cache builder from serialized test labs (JSON)
- purpose:
non-production use: for testing / demos / conference hands-on sessions only
significantly speed up test lab completion
avoid test lab build failures due to an unstable/slow/fragile system under test (like the h2oGPTe server)
save costs (e.g. OpenAI server costs)
- KEY_ACTUAL_OUTPUT = 'actual_output'
- KEY_CONTEXT = 'context'
- KEY_CORPUS = 'corpus'
- KEY_COST = 'cost'
- KEY_DURATION = 'actual_duration'
- KEY_EXTRAS = 'extras'
- KEY_INPUT = 'input'
- KEY_LLM_MODEL_NAME = 'llm_model_name'
- KEY_MODEL_TYPE = 'model_type'
- PREFIX_KEY = 'CACHE-KEY::'
- abstractmethod clear()
Clear the cache.
- abstractmethod get(key: str) → dict | None
Get the cached value for the given key.
The returned dictionary can be passed to a typed result class.
- static get_key(explainable_model_type: ExplainableModelType, prompt: str, llm_model_name: str, corpus: list[str] | None = None, extras: str = '') → str
Generate a cache key for the LLM host client cache:
does NOT consider a particular host (like h2ogpte.h2o.ai)
does NOT consider the RAG collection
does NOT consider the chunk retrieval method
suitable for both RAG hosts (empty corpus, no context) and LLM hosts
- Parameters:
- explainable_model_type: models.ExplainableModelType
Explainable model type.
- prompt: str
Prompt for which the answer is to be cached.
- llm_model_name: str
LLM model name whose answer is to be cached.
- corpus: list[str] | None
Corpus documents. The corpus is used instead of the collection, whose ID and name may differ.
- extras: str
Extra information: any other parameters that should make the cache key unique.
- Returns:
- str
Cache key.
- abstractmethod get_llm_model_names(explainable_model_type: ExplainableModelType) → list[str]
List all the LLM model names known to the cache.
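The cache key rules above can be sketched as follows, assuming the key is a hash of the documented inputs. make_cache_key is a hypothetical stand-in for LlmHostPromptCache.get_key(). It reuses the documented PREFIX_KEY and KEY_* names. However, the SHA-256 hashing, the JSON serialization and the order-insensitive corpus are assumptions, not confirmed library behavior.

```python
import hashlib
import json
from typing import Optional

PREFIX_KEY = "CACHE-KEY::"  # mirrors LlmHostPromptCache.PREFIX_KEY


def make_cache_key(
    model_type: str,
    prompt: str,
    llm_model_name: str,
    corpus: Optional[list[str]] = None,
    extras: str = "",
) -> str:
    """Hypothetical cache key: host URL, collection and retrieval method are left out."""
    payload = {
        "model_type": model_type,  # host *type* (e.g. "H2O_GPT_E"), never the host URL
        "input": prompt,
        "llm_model_name": llm_model_name,
        "corpus": sorted(corpus or []),  # assumption: document order does not matter
        "extras": extras,
    }
    # sort_keys=True makes the serialization (and thus the hash) deterministic.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return PREFIX_KEY + hashlib.sha256(blob).hexdigest()


key_a = make_cache_key("H2O_GPT_E", "What is RAG?", "llama-3", ["b.pdf", "a.pdf"])
key_b = make_cache_key("H2O_GPT_E", "What is RAG?", "llama-3", ["a.pdf", "b.pdf"])
```

The host URL is never part of the payload, so the same prompt answered by two h2oGPTe deployments would share one cache entry. This is the behavior the "does NOT consider a particular host" rule describes.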
- class h2o_sonar.utils.testing.RagTestCaseConfig(prompt: str, categories: str | list[str] = '', relationships=None, constraints=None, condition='', expected_output: str = '', config: RagTestConfig | None = None, key: str = '')
Bases: object
RAG / LLM test case configuration:
prompt
expected output
categories
condition (string expression)
constraints (any JSON serializable object)
…
- KEY_CATEGORIES = 'categories'
- KEY_CONDITION = 'condition'
- KEY_CONSTRAINTS = 'constraints'
- KEY_EXPECTED_OUTPUT = 'expected_output'
- KEY_KEY = 'key'
- KEY_PROMPT = 'prompt'
- KEY_RELS = 'relationships'
- perturb(perturbators: list[PerturbatorToRun], in_place: bool = True, raised_errors: list | None = None)
Perturb the prompt.
- Parameters:
- perturbators: list[commons.PerturbatorToRun]
Perturbators to run; each includes the perturbator ID, intensity, and parameters.
- in_place: bool
If True, perturb the prompt in place; otherwise, create a new perturbed test case.
- raised_errors: list | None
If None, raise error(s) if the perturbator(s) fail; otherwise, do not raise exceptions and store them in the (empty) list provided by the caller.
- to_dict()
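The raised_errors contract of perturb() is a common pattern: errors are raised by default, or collected into a caller-supplied list otherwise. Below is a minimal sketch, assuming perturbators are plain prompt-to-prompt callables. apply_perturbators is hypothetical. The reference above does not say whether the real implementation continues with the remaining perturbators after a failure; this sketch does continue.

```python
from typing import Callable, Optional

# Hypothetical stand-in for commons.PerturbatorToRun: a prompt -> prompt function.
Perturbator = Callable[[str], str]


def apply_perturbators(
    prompt: str,
    perturbators: list[Perturbator],
    raised_errors: Optional[list] = None,
) -> str:
    """Apply perturbators in order; raise on failure or collect errors into raised_errors."""
    for perturbator in perturbators:
        try:
            prompt = perturbator(prompt)
        except Exception as ex:
            if raised_errors is None:
                raise  # default: propagate the failure to the caller
            raised_errors.append(ex)  # caller opted in: record the error and go on
    return prompt


def broken_perturbator(prompt: str) -> str:
    raise ValueError("perturbator failed")


errors: list = []
result = apply_perturbators("hello", [str.upper, broken_perturbator], raised_errors=errors)
```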
- class h2o_sonar.utils.testing.RagTestConfig(documents: list[str | Path], categories: list[str] | None = None, key: str = '')
Bases: object
RAG / LLM test configuration:
corpus … a set of documents (empty for LLM evaluation)
test cases … a set of prompts, expected outputs, categories, conditions, …
- KEY_CATS = 'categories'
- KEY_DOCUMENTS = 'documents'
- KEY_KEY = 'key'
- static from_dict(key: str, as_dict: dict) → RagTestConfig
- class h2o_sonar.utils.testing.RagTestLab(llm_host_connection: ConnectionConfig, raw_dataset: LlmDataset, evaluated_models: list[ExplainableRagModel | ExplainableLlmModel] | None = None, llm_model_names: list[str] | None = None, docs_cache_dir: str | Path = '', results_location: str | Path = '', name: str = 'TestLab', description: str = 'Test lab for RAG / LLM evaluation.', llm_host_prompt_cache: LlmHostPromptCache | None = None, use_evaluated_model_collection_id: bool = False, user_name: str = 'h2o-sonar', logger=None)
Bases: TestLab, TestLabPersistence
RAG test lab:
A TestLab is expected to test multiple LLMs hosted either by one service (like OpenAI), by a RAG (retrieval-augmented generation) product (like h2oGPTe), or by an LLM host product (like h2oGPT).
A TestLab gets the connection configuration to the host system.
A TestLab can compare / benchmark multiple LLM models from the same host system.
Resolved test labs can be merged to get an aggregated lab, i.e. an LLM dataset with multiple LLM hosts for the side-by-side evaluation by the evaluate module.
- bind(collection_id: str, collection_name: str, corpus: list | None = None)
Bind ALL the test lab RAG models to the given collection(s) instead of building them by creating collections and uploading documents.
- build(doc_sync_meta: dict[str, Any] = None, progress_callback: AbstractProgressCallbackContext | None = None, sync_documents: bool = True, fail_on_missing_corpus: bool = False)
Build the test lab so that it can be used for the evaluation:
synchronize the document cache
create the RAG’s document collections
upload documents (corpora) to the collection(s)
- Parameters:
- doc_sync_meta: dict[str, Any]
Document synchronization metadata: the key is the document locator (URL), the value is a dictionary with metadata like headers. Example: {"http://example.com/doc1.txt": {"headers": {"foo-header": "FOO-VALUE"}}}
- progress_callback: progress.AbstractProgressCallbackContext | None
Optional progress callback context.
- sync_documents: bool
Sync documents from the network/filesystem to the lab’s document cache.
- fail_on_missing_corpus: bool
Fail if the test does not specify any corpus.
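Below, the doc_sync_meta example is written as a Python literal, together with one way the per-document headers could be attached to a download request. build_download_request is a hypothetical helper built on the standard-library urllib.request.Request. The reference does not say which HTTP client h2o_sonar actually uses. No network call is made.

```python
import urllib.request
from typing import Any

# doc_sync_meta: document locator (URL) -> metadata, e.g. HTTP headers.
doc_sync_meta: dict[str, Any] = {
    "http://example.com/doc1.txt": {
        "headers": {"foo-header": "FOO-VALUE"},
    },
}


def build_download_request(url: str, meta: dict[str, Any]) -> urllib.request.Request:
    """Hypothetical helper: attach the per-document headers (if any) to a GET request."""
    headers = meta.get(url, {}).get("headers", {})
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_download_request("http://example.com/doc1.txt", doc_sync_meta)
# urllib normalizes header names with str.capitalize(): "foo-header" -> "Foo-header".
```

Documents without an entry in doc_sync_meta simply get a request with no extra headers.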
- complete_dataset(complete_context: int = 10, progress_callback: AbstractProgressCallbackContext | None = None, save_as_you_go: Path | str | None = None, parallelize: int = 0, multi_turn: bool = False, retry_on_error: int = 2, timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None = None, include_llm_meta: bool = True, raise_on_all_tcs_fail: bool = True, artifacts_base_dir: Path | str | None = None, purge_workdir: bool = True)
Complete the dataset with the actual values from an LLM host.
- Parameters:
- complete_context: int
How many context text chunks to include in the resolved dataset.
- progress_callback: progress.AbstractProgressCallbackContext | None
Optional progress callback context.
- save_as_you_go: pathlib.Path | str | None
Save the dataset as JSON after each input is resolved.
- parallelize: int
Complete the dataset in parallel using multiple processes. Use -1 for auto-choice of the number of workers, 0 to disable parallelization (the lab is then completed using sequential requests), and 1+ (a positive integer) to specify the number of workers.
- multi_turn: bool
Whether to use multi-turn chat with the LLM host. If enabled, all test cases within a test are handled within a single chat session, i.e. they share the same context.
- retry_on_error: int
How many times to retry failed LLM host requests.
- timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None
Optionally override the timeout, which can be specified in the model host configuration ExplainableRagModel::model_cfg and is model host type specific, and use an exponential backoff strategy for timeout handling: the timeout is increased on each retry by the backoff factor.
- include_llm_meta: bool
Whether to include LLM metadata like performance statistics.
- raise_on_all_tcs_fail: bool
Raise an exception if all test cases fail.
- artifacts_base_dir: pathlib.Path | str | None
Base directory for storing completion artifacts. If not specified, a new artifacts directory is created under the h2o-sonar/ base directory, if available. The directory is created on the first save of an artifact.
- purge_workdir: bool
Purge the working directory with lab shards after the completion.
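Below is a sketch of timeout handling with exponential backoff. It assumes a base timeout that is multiplied by a backoff factor on each retry, and retry_on_error retries after the first attempt. base_timeout and backoff_factor are hypothetical names, since the actual fields of TimeoutRetryExpBackoffCtx are not documented here. flaky_request merely simulates a slow host.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_timeouts(base_timeout: float, backoff_factor: float, retry_on_error: int) -> list[float]:
    """Timeout for the first attempt followed by one timeout per retry."""
    if base_timeout <= 0 or backoff_factor < 1 or retry_on_error < 0:
        raise ValueError("expected base_timeout > 0, backoff_factor >= 1, retry_on_error >= 0")
    return [base_timeout * backoff_factor**attempt for attempt in range(retry_on_error + 1)]


def call_with_retries(
    request: Callable[[float], T], base_timeout: float, backoff_factor: float, retry_on_error: int
) -> T:
    """Call request(timeout), retrying on TimeoutError with a growing timeout."""
    last_error: Exception = TimeoutError("no attempt made")
    for timeout in backoff_timeouts(base_timeout, backoff_factor, retry_on_error):
        try:
            return request(timeout)
        except TimeoutError as ex:
            last_error = ex  # remember the failure and retry with a longer timeout
    raise last_error


def flaky_request(timeout: float) -> str:
    # Simulated LLM host: it only answers when given at least 60 seconds.
    if timeout < 60:
        raise TimeoutError(f"timed out after {timeout} s")
    return f"answered within {timeout} s"
```

With a 30 s base timeout and a factor of 2, the attempts use 30, 60 and 120 s. The simulated host therefore succeeds on the first retry.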
- complete_from_shards(execution_dir_path: str | Path)
Complete the test lab from the shards stored on the filesystem. This method is used to load previously completed test lab shards and merge them into a single resolved dataset.
- static from_eval_results(eval_results_path: str | Path, interpretation_json_path: str | Path, raw_dataset_empty: bool = True)
Create a test lab from the evaluation results archive.
- Parameters:
- eval_results_path: str | pathlib.Path
Path to the evaluation results JSON file.
- interpretation_json_path: str | pathlib.Path
Path to the interpretation JSON file.
- raw_dataset_empty: bool
Whether to create an empty raw dataset or to copy the resolved dataset.
- static from_llm_test_suite(llm_host_connection: ConnectionConfig, llm_test_suite: RagTestSuiteConfig, llm_model_type: ExplainableModelType, llm_model_names: list[str], results_location: str | Path = '', work_dir: str | Path = '', llm_models_cfgs: dict[str, list[dict]] = None, llm_host_prompt_cache: LlmHostPromptCache | None = None, user_name: str = 'h2o-sonar') → RagTestLab
Create a new (unresolved) test lab from the LLM test suite configuration.
- static from_rag_test_suite(rag_connection: ConnectionConfig, rag_test_suite: RagTestSuiteConfig, rag_model_type: ExplainableModelType, llm_model_names: list[str], docs_cache_dir: str | Path = '', results_location: str | Path = '', rag_models_cfgs: dict[str, list[dict]] = None, predefined_collection_id: str | dict | None = None, predefined_collection_name: str = '', llm_host_prompt_cache: LlmHostPromptCache | None = None, user_name: str = 'h2o-sonar') → RagTestLab
Create a new (unresolved) test lab from the RAG test suite configuration.
The test lab is built as follows:
all LLM model names are hosted by the SAME system described by the RAG connection and accessed by a client
each LLM model name may have an associated list of custom client configurations
the RAG test suite is used to build the test lab:
the RAG test suite has test cases that are grouped into tests
test cases within a test have the SAME corpus; different tests may have different corpora
an explainable RAG model is…
created for EACH: LLM model name + config + corpus (not test)
Cartesian product of: LLM model names x client configurations x corpora = explainable RAG models
Summary of the explainable RAG models creation:
for each LLM model name
for each client configuration of that LLM model name
for each test
create explainable RAG model
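The model creation plan above amounts to a Cartesian product. The sketch below makes two assumptions. First, tests that share the same documents share one corpus (hence "corpus (not test)"). Second, a model name without custom client configurations gets a single default configuration. The inputs and the helper names are hypothetical.

```python
from itertools import product

# Hypothetical inputs: model names, per-model client configs and per-test corpora.
llm_model_names = ["model-a", "model-b"]
rag_models_cfgs = {"model-a": [{"temperature": 0.0}, {"temperature": 0.7}]}
test_corpora = {
    "test-1": ["doc1.pdf"],
    "test-2": ["doc2.pdf", "doc3.pdf"],
    "test-3": ["doc1.pdf"],  # same documents as test-1
}


def unique_corpora(corpora_by_test: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Tests with identical documents collapse into a single corpus."""
    return sorted({tuple(sorted(docs)) for docs in corpora_by_test.values()})


def plan_rag_models(names, cfgs, corpora):
    """One explainable RAG model per (LLM model name, client config, corpus)."""
    return [
        {"llm_model_name": name, "config": cfg, "corpus": corpus}
        for name in names
        # Assumption: a model without custom configs uses one default (empty) config.
        for cfg, corpus in product(cfgs.get(name) or [{}], corpora)
    ]


plan = plan_rag_models(llm_model_names, rag_models_cfgs, unique_corpora(test_corpora))
```

Here model-a has 2 configurations and model-b has 1, and there are 2 distinct corpora. This yields 2 × 2 + 1 × 2 = 6 explainable RAG models.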
- Parameters:
- rag_connection: h2o_sonar_config.ConnectionConfig
Connection to the RAG system.
- rag_test_suite: RagTestSuiteConfig
RAG test suite configuration.
- rag_model_type: ExplainableModelType
Type of the explainable model hosted by the RAG system.
- llm_model_names: list[str]
List of LLM model names to be used to build the test lab and to be subsequently evaluated and compared. The following special names can be used with the h2oGPTe model host:
auto: use the best available model chosen by h2oGPTe
"" (empty string): inherit the configuration from the h2oGPTe collection
None: inherit the configuration from the h2oGPTe collection
- rag_models_cfgs: dict[str, list[dict]]
Dictionary with the LLM model name as key and a list of client configurations as values. Each client configuration is a dictionary with the client configuration parameters, which can be created by the client factory using client.config_factory().
- docs_cache_dir: str | pathlib.Path
Directory to store the documents cache.
- results_location: str | pathlib.Path
Base user directory to store the completion artifacts (like files or exported metadata) that should subsequently be used for the evaluation, for example the metadata of an agentic run and the PDF and Python files created by the agent. If not specified, the test lab does not store any evaluation artifacts on completion.
- predefined_collection_id: str | dict | None
Predefined collection ID for the RAG model. If provided as a string, it is used as the collection ID for all the test cases. If provided as a dictionary, it is used as a mapping of test case keys to collection IDs.
- predefined_collection_name: str
Predefined collection name to be used when creating the RAG model. If a collection ID is specified, then the name is NOT used to look up the existing collection on lab build/completion.
- llm_host_prompt_cache: LlmHostPromptCache | None
Cache for the LLM host client.
- user_name: str
Username to be used to build the test lab.
- Returns:
- RagTestLab
New RAG test lab.
- insight_internal_llm_errors(report_dir: str | Path = '', src: str = 'stats') → tuple[dict, str]
Create a Markdown report of the internal LLM errors.
- Parameters:
- report_dir: str | pathlib.Path
Directory to save the reports (as JSON and Markdown) to.
- src: str
Source of the errors: stats (default) or dataset (analysis of the answer texts).
- integrity_check()
- static load_from_json(llm_host_connection: ConnectionConfig, file_path: str | Path, docs_cache_dir: str | Path = '', datatable_format: bool = False) → RagTestLab
- merge(other_test_lab: RagTestLab, other_llm_prefix: str = 'Other')
Merge another test lab into this one.
- purge()
Purge the test lab by deleting all the created collections/assistants and uploaded documents.
- split_to_shards(base_dir: Path, max_total_workers: int = 20) → dict
Split the test lab into shards by RAG model (which is identified by the corpus and base LLM model name). If there is only one RAG model (or just a few), then even the inputs of a particular model are split into shards. A shard contains prompts which are subsequently evaluated in the context of the corpus by the given base LLM model.
Sharding strategy:
- 1 RAG model:
split the inputs of the model among at most 20 workers (i.e. into at most 20 shards)
the minimum number of inputs per worker is 2 (to account for process overhead)
- >1 RAG model:
if the number of models is GREATER than 10, split the inputs by RAG model, i.e. the number of needed workers is equal to the number of RAG models
if the number of models is SMALLER than or equal to 10, then use up to 20 workers to split the inputs
- Parameters:
- base_dir: pathlib.Path
Base directory where to store the shards (JSON representations of test labs).
- max_total_workers: int
The number of workers which is used to split the inputs of the SINGLE model (or lab with just a few models).
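The sharding strategy above is sketched below under stated assumptions. With 10 or fewer RAG models, the worker budget is divided evenly across the models; the text does not say how it is divided. Each model's inputs are then split into contiguous shards of at least 2 inputs. plan_shards and split_evenly are hypothetical. The real split_to_shards() writes JSON test labs to base_dir and returns a dict whose layout is not documented here.

```python
MIN_INPUTS_PER_SHARD = 2  # "consider process overhead"
MANY_MODELS_THRESHOLD = 10


def split_evenly(items: list, n_shards: int) -> list[list]:
    """Split items into n_shards contiguous chunks whose sizes differ by at most one."""
    n_shards = max(1, min(n_shards, len(items)))
    size, rest = divmod(len(items), n_shards)
    chunks, start = [], 0
    for i in range(n_shards):
        end = start + size + (1 if i < rest else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def plan_shards(inputs_by_model: dict[str, list], max_total_workers: int = 20) -> dict[str, list[list]]:
    """Map each RAG model to the list of input shards it is split into."""
    n_models = len(inputs_by_model)
    if n_models > MANY_MODELS_THRESHOLD:
        # Many models: one shard (and one worker) per RAG model.
        return {model: [inputs] for model, inputs in inputs_by_model.items()}
    # One or a few models: share the worker budget and split each model's inputs.
    workers_per_model = max(1, max_total_workers // max(1, n_models))
    plan = {}
    for model, inputs in inputs_by_model.items():
        # Cap the shard count so shards hold at least MIN_INPUTS_PER_SHARD inputs
        # (whenever the model has that many inputs at all).
        max_shards = max(1, len(inputs) // MIN_INPUTS_PER_SHARD)
        plan[model] = split_evenly(inputs, min(workers_per_model, max_shards))
    return plan
```

A single model with 100 inputs gets 20 shards of 5 inputs. A single model with only 7 inputs gets 3 shards (3, 2 and 2 inputs), because of the 2-input minimum.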
- split_to_shards_by_model(base_dir: Path) → dict
Split the test lab into shards by RAG model, which is identified by the corpus and base LLM model name. A shard contains prompts which are subsequently evaluated in the context of the corpus by the given base LLM model.
- sync_documents(doc_sync_meta: dict[str, Any] = None, progress_callback: AbstractProgressCallbackContext | None = None, fail_on_missing_corpus: bool = False) → Path
Cache the test suite documents from the network to the local filesystem so that they can be used for RAG evaluation later.
- Parameters:
- doc_sync_meta: dict[str, Any]
Document synchronization metadata: the key is the document locator (URL), the value is a dictionary with metadata like headers. Example: {"http://example.com/doc1.txt": {"headers": {"foo-header": "FOO-VALUE"}}}
- progress_callback: progress.AbstractProgressCallbackContext | None
Optional progress callback context.
- fail_on_missing_corpus: bool
If True, fail when a test has an empty corpus; if False, create a dummy document to enable empty RAG corpora.
- trim(max_llm_models_count=None)
Trim the test lab by keeping only the specified number of LLM models and removing all the orphans.
- class h2o_sonar.utils.testing.RagTestLabPromptCache(singleton_create_key)
Bases: object
RAG test lab prompt cache (singleton) to be used across H2O Sonar.
- MAX_ITEMS = 5000
- classmethod cache()
- reinitialize(enable_cache: bool | None = None, src_path: Path | str | None = None, src_host_connection: ConnectionConfig | None = None, max_items: int | None = None)
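Below is a minimal sketch of a process-wide singleton with a bounded item count. It assumes singleton_create_key is a private token that blocks direct construction, so callers must use cache(). It also assumes that eviction beyond MAX_ITEMS is least-recently-used. Both the eviction policy and ToyPromptCacheSingleton are hypothetical. The src_path and src_host_connection arguments of reinitialize() are omitted.

```python
from collections import OrderedDict
from threading import Lock
from typing import Optional


class ToyPromptCacheSingleton:
    MAX_ITEMS = 5000
    _CREATE_KEY = object()  # private token: only cache() can construct an instance
    _instance: Optional["ToyPromptCacheSingleton"] = None
    _lock = Lock()

    def __init__(self, singleton_create_key: object) -> None:
        if singleton_create_key is not ToyPromptCacheSingleton._CREATE_KEY:
            raise RuntimeError("use ToyPromptCacheSingleton.cache() instead")
        self.enabled = True
        self.max_items = self.MAX_ITEMS
        self._items: "OrderedDict[str, dict]" = OrderedDict()

    @classmethod
    def cache(cls) -> "ToyPromptCacheSingleton":
        # Thread-safe lazy creation of the single shared instance.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls(cls._CREATE_KEY)
            return cls._instance

    def reinitialize(self, enable_cache: Optional[bool] = None, max_items: Optional[int] = None) -> None:
        # Drop cached items; optionally toggle the cache and change its capacity.
        self._items.clear()
        if enable_cache is not None:
            self.enabled = enable_cache
        if max_items is not None:
            self.max_items = max_items

    def put(self, key: str, value: dict) -> None:
        if not self.enabled:
            return
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.max_items:
            self._items.popitem(last=False)  # evict the least recently used item

    def get(self, key: str) -> Optional[dict]:
        if not self.enabled or key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as recently used
        return self._items[key]
```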
- class h2o_sonar.utils.testing.RagTestSuiteConfig(test_cases: list[RagTestCaseConfig] | None = None, name: str = 'TestSuite', description: str = 'Test suite for RAG / LLM evaluation.', categories: list[str] | None = None)
Bases: object
RAG / LLM test suite configuration:
test suite (RagTestSuiteConfig) … a set of tests
tests (RagTestConfig) … corpus with a set of test cases
test cases (RagTestCaseConfig) … prompt, expected output, categories, conditions, …
- KEY_CATS = 'categories'
- KEY_DESCRIPTION = 'description'
- KEY_NAME = 'name'
- KEY_TESTS = 'tests'
- KEY_TEST_CASES = 'test_cases'
- add_test_case(test_case: RagTestCaseConfig)
- copy() → RagTestSuiteConfig
- static from_llm_dataset(llm_dataset: LlmDataset) → RagTestSuiteConfig
Create a RAG test suite configuration from the LLM dataset.
- perturb(perturbators: list[PerturbatorToRun], in_place: bool = True, raised_errors: list | None = None)
Perturb the test suite prompts.
- Parameters:
- perturbators: list[commons.PerturbatorToRun]
Perturbators to run; each includes the perturbator ID, intensity, and parameters.
- in_place: bool
If True, perturb the test cases in place: the test suite keeps the same number of tests and test cases. Otherwise, keep the original test cases and create new perturbed test cases: the test suite has 2x as many test cases after the perturbation (all intermediary perturbations in case of multiple perturbator IDs are discarded).
- raised_errors: list | None
If None, raise error(s) if the perturbator(s) fail; otherwise, do not raise exceptions and store them in the (empty) list provided by the caller.
- split(max_tests: int) → list[RagTestSuiteConfig]
Split the test suite into multiple test suites so that each test suite has at most the given number of tests.
- Parameters:
- max_tests: int
Maximum number of tests in a test suite.
- Returns:
- list[RagTestSuiteConfig]
List of new test suites.
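Splitting by tests rather than by test cases keeps each corpus together with its test cases. Below is a minimal chunking sketch of split(max_tests), assuming tests are distributed in their original order. split_tests is hypothetical. The real method returns new RagTestSuiteConfig objects rather than plain lists.

```python
from typing import TypeVar

T = TypeVar("T")


def split_tests(tests: list[T], max_tests: int) -> list[list[T]]:
    """Chunk tests into consecutive groups of at most max_tests (one group per new suite)."""
    if max_tests < 1:
        raise ValueError("max_tests must be a positive integer")
    return [tests[i : i + max_tests] for i in range(0, len(tests), max_tests)]


chunks = split_tests(["test-1", "test-2", "test-3", "test-4", "test-5"], max_tests=2)
```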
- property tests: list[RagTestConfig]
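The in_place semantics of RagTestSuiteConfig.perturb() can be illustrated with a toy model of tests and test cases. The sketch assumes perturbators are plain prompt-to-prompt callables chained in order. ToyTest, ToyTestCase and perturb_suite are hypothetical. Only the final result of a chain is kept, mirroring the note that intermediary perturbations are discarded.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyTestCase:
    prompt: str
    expected_output: str = ""


@dataclass
class ToyTest:
    corpus: list[str]
    test_cases: list[ToyTestCase] = field(default_factory=list)


def perturb_suite(tests: list[ToyTest], perturbators: list[Callable[[str], str]], in_place: bool = True) -> None:
    """Perturb every prompt in place, or append perturbed copies next to the originals."""
    for test in tests:
        perturbed: list[ToyTestCase] = []
        for case in test.test_cases:
            prompt = case.prompt
            for perturbator in perturbators:
                prompt = perturbator(prompt)  # intermediary results are not kept
            if in_place:
                case.prompt = prompt
            else:
                new_case = copy.copy(case)  # keep expected output etc., swap the prompt
                new_case.prompt = prompt
                perturbed.append(new_case)
        test.test_cases.extend(perturbed)  # empty (no-op) when in_place is True


def make_suite() -> list[ToyTest]:
    return [ToyTest(["a.pdf"], [ToyTestCase("What is RAG?"), ToyTestCase("Define LLM.")])]


copied = make_suite()
perturb_suite(copied, [str.upper, lambda p: p + "!!"], in_place=False)
edited = make_suite()
perturb_suite(edited, [str.upper, lambda p: p + "!!"], in_place=True)
```

With in_place=False, the two-case test grows to four cases (2x) and the original prompts stay unchanged. With in_place=True, the count stays at two.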
- class h2o_sonar.utils.testing.TestLab
Bases: ABC
A test target / product test lab.
- KEY_BASE_MODEL_NAMES = 'llm_model_names'
- KEY_DATASET = 'dataset'
- KEY_DESCRIPTION = 'description'
- KEY_DOCS_CACHE = 'docs_cache'
- KEY_MODELS = 'models'
- KEY_NAME = 'name'
- KEY_RAW_DATASET = 'raw_dataset'
- PARALLEL_RUN = -1
- SEQUENTIAL_RUN = 0
- build()
Build / deploy / materialize the test lab on the host system, e.g. by creating the RAG’s document collections, uploading documents to the collections, …
- complete_dataset(complete_context: int = 10, progress_callback: AbstractProgressCallbackContext | None = None, save_as_you_go: Path | str | None = None, parallelize: int = 0, retry_on_error: int = 2, purge_workdir: bool = True)
Complete the LLM dataset with the actual values from the host system.
- Parameters:
- complete_context: int
How many context text chunks to include in the resolved dataset.
- progress_callback: progress.AbstractProgressCallbackContext | None
Optional progress callback context.
- save_as_you_go: pathlib.Path | str | None
Save the dataset as JSON after each input is resolved.
- parallelize: int
Complete the dataset in parallel using multiple processes. Use -1 for auto-choice of the number of workers, 0 to disable parallelization (the lab is then completed using sequential requests), and 1+ (a positive integer) to specify the number of workers.
- retry_on_error: int
How many times to retry failed LLM host requests.
- purge_workdir: bool
Purge the working directory with lab shards after the completion.
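The sketch below shows how the parallelize argument could map to a worker count, using the PARALLEL_RUN and SEQUENTIAL_RUN constants above. resolve_workers is hypothetical. Treating "auto" as one worker per CPU, capped by the number of inputs, is an assumption; the actual auto-choice heuristic is not documented here.

```python
import os

PARALLEL_RUN = -1  # mirrors TestLab.PARALLEL_RUN: auto-choose the number of workers
SEQUENTIAL_RUN = 0  # mirrors TestLab.SEQUENTIAL_RUN: sequential requests, no workers


def resolve_workers(parallelize: int, n_inputs: int) -> int:
    """Return the number of worker processes; 0 means complete sequentially."""
    if parallelize == SEQUENTIAL_RUN:
        return 0
    if parallelize == PARALLEL_RUN:
        # Assumption: one worker per CPU, but never more workers than inputs.
        return max(1, min(os.cpu_count() or 1, n_inputs))
    if parallelize > 0:
        return parallelize
    raise ValueError(f"parallelize must be -1, 0 or a positive integer, got {parallelize}")
```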
- class h2o_sonar.utils.testing.TestLabPersistence
Bases: object
- DIR_CHAT_MSG = 'chat_message_'
- DIR_CHAT_SESSION = 'chat_session_'
- DIR_COMPLETION_OF = 'completion_of_'
- DIR_TEST_LAB = 'test_lab_'