h2o_sonar.utils package

Submodules

h2o_sonar.utils.binning module

h2o_sonar.utils.binning.build_qtile_bins(bins: List, X: DataFrame, feature: str, quantile: int)

Build quantile bins and append back to input bins list.

Parameters:
bins: List

List of bins.

X: pandas.DataFrame

Input frame to PD/ICE.

feature: str

Feature to create quantile bins for.

quantile: int

The decile to compute.

h2o_sonar.utils.binning.qbin_column(frame: Frame, column: str, logger)

Quantile bin a column in a frame and substitute it in that frame with quantile group ranges for each row.

Parameters:
frame: datatable.Frame

Frame containing the data. One of the column names must correspond to the column parameter.

column: str

Name of the column to be checked.

logger: Logger

Logger.

h2o_sonar.utils.binning.quantile_bin(frame: Frame = None, qbin_cols: List[str] | None = None, qbin_count: int = 0, varimp_list: List[str] | None = None, logger=None)

Quantile binning.

Parameters:
frame: dt.Frame

Input frame for quantile binning.

qbin_cols: List

Column(s) to use for quantile binning.

qbin_count: int

Number of top numeric variables to use from the model's variable importance list.

varimp_list: List

Variable importance list from the model.

logger: Logger

Logger.

Returns:
Tuple[List, pandas.DataFrame]

List of columns that were binned and a DataFrame with the quantile-binned columns.
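A minimal usage sketch (the column name and data below are illustrative, and passing logger=None is assumed to be acceptable):

    import datatable as dt

    from h2o_sonar.utils import binning

    # Illustrative frame with one numeric column to bin.
    frame = dt.Frame(age=[23, 31, 35, 41, 47, 52, 58, 64, 70, 77])

    # Quantile-bin the "age" column; returns the binned column names and a
    # pandas DataFrame with the quantile-binned columns.
    binned_cols, binned_df = binning.quantile_bin(
        frame=frame,
        qbin_cols=["age"],
        logger=None,
    )
    print(binned_cols)
    print(binned_df)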

h2o_sonar.utils.caching module

The caching module provides functionality to download and cache the models used for evaluation upfront, to avoid downloading them at runtime.

h2o_sonar.utils.caching.cache_all_models(logger: SonarLogger)

Cache all the models used in the Sonar package.

h2o_sonar.utils.caching.cache_baai_bge_small_en(logger: SonarLogger)

Cache the BAAI BGE small English model.

h2o_sonar.utils.caching.cache_baai_bge_small_env15(logger: SonarLogger)

Cache the BAAI BGE small English v1.5 model.

h2o_sonar.utils.caching.cache_bert_base_uncased(logger: SonarLogger)

Cache the BERT base uncased model.

h2o_sonar.utils.caching.cache_bge_m3(logger: SonarLogger)

Cache the BGE M3 model.

h2o_sonar.utils.caching.cache_detoxify_models(logger: SonarLogger)

Download and cache the Detoxify models.

h2o_sonar.utils.caching.cache_eval_studio_models(logger: SonarLogger)

Download the Eval Studio models from S3.

h2o_sonar.utils.caching.cache_gptscore_evaluator_model(logger: SonarLogger)

Cache the default model for the GPTScore evaluator.

h2o_sonar.utils.caching.cache_hkunlp_instructor(logger: SonarLogger)

Cache the hkunlp Instructor model.

h2o_sonar.utils.caching.cache_lmppl_perplexity_evaluator_model(logger: SonarLogger)

Cache the default model for the perplexity evaluator.

h2o_sonar.utils.caching.cache_nltk(logger: SonarLogger)

Cache the NLTK models.

  • Punkt - used in BLEU and perturbations

  • averaged_perceptron_tagger - used in perturbations

  • wordnet - used in perturbations

h2o_sonar.utils.caching.cache_nltk_averaged_perceptron_tagger(logger: SonarLogger | None = None)
h2o_sonar.utils.caching.cache_nltk_punkt(logger: SonarLogger | None = None)
h2o_sonar.utils.caching.cache_nltk_wordnet(logger: SonarLogger | None = None)
h2o_sonar.utils.caching.cache_summac_vitc(logger: SonarLogger)

Cache the SummaC model used for summarization.

h2o_sonar.utils.caching.cache_tiktoken_blobs(logger: SonarLogger)

Cache the TikToken blobs.

h2o_sonar.utils.caching.cache_vectara_hallucination_model(logger: SonarLogger)

Cache the Vectara hallucination evaluation model.
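A minimal pre-caching sketch (the NLTK helpers accept an optional logger and can run without one; constructing a SonarLogger for the remaining functions is not shown here):

    from h2o_sonar.utils import caching

    # Pre-download the NLTK data used by BLEU and the perturbations.
    caching.cache_nltk_punkt()
    caching.cache_nltk_averaged_perceptron_tagger()
    caching.cache_nltk_wordnet()

    # cache_all_models() expects a SonarLogger instance - how to construct one
    # depends on h2o_sonar's logging utilities and is not shown in this sketch.
    # caching.cache_all_models(logger=my_sonar_logger)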

h2o_sonar.utils.crypto module

h2o_sonar.utils.crypto.decrypt(encryption_key: str, data: str) str
h2o_sonar.utils.crypto.encrypt(encryption_key: str, data: str) str
h2o_sonar.utils.crypto.resolve_encryption_key(encryption_key: str = '') str
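A round-trip sketch (how resolve_encryption_key() resolves the key, e.g. from the environment or configuration, is an assumption of this example):

    from h2o_sonar.utils import crypto

    # Resolve/normalize the encryption key (resolution rules are assumed).
    key = crypto.resolve_encryption_key("my-secret-passphrase")

    # Encrypt a secret and decrypt it back to the original value.
    token = crypto.encrypt(encryption_key=key, data="my-api-key")
    assert crypto.decrypt(encryption_key=key, data=token) == "my-api-key"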

h2o_sonar.utils.io module

h2o_sonar.utils.io.from_list_explainers_args_json(args_str: str) Dict

Deserialize interpret.py::list_explainers() method arguments from a JSON string to a dictionary which can be used as Python method kwargs.

h2o_sonar.utils.io.from_run_interpretation_args_json(args_str: str) Dict

Deserialize interpret.py::run_interpretation() method arguments from a JSON string to a dictionary which can be used as Python method kwargs.

h2o_sonar.utils.io.load_list_explainers_args_json(file_path) Dict

Load list_explainers() keyword arguments from file.

h2o_sonar.utils.io.load_run_interpretation_args_json(file_path) Dict

Load run_interpretation() keyword arguments from file.

h2o_sonar.utils.io.to_list_explainers_args_json(experiment_types: List[str] | None = None, explanation_scopes: List[str] | None = None, model_meta: ExplainableModelMeta | None = None, keywords: List[str] | None = None, explainer_filter: List[FilterEntry] | None = None, extra_params: Dict | None = None) str

Serialize interpret.py::list_explainers() method arguments as JSON.

h2o_sonar.utils.io.to_run_interpretation_args_json(dataset: str = '', model: str = '', target_col: str = '', explainers: List[str | ExplainerToRun] | None = None, explainer_keywords: List[str] | None = None, validset: str = '', testset: str = '', use_raw_features: bool = True, used_features: List | None = None, weight_col: str = '', prediction_col: str = '', drop_cols: List | None = None, sample_num_rows: int | None = None, log_level: int = 30, results_location: str = None, persistence_type: PersistenceType = PersistenceType.file_system, run_asynchronously: bool = False, run_explainers_in_parallel: bool = False, extra_params: Dict | None = None) str

Serialize interpret.py::run_interpretation() job arguments as JSON.
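A serialization round-trip sketch (the dataset, model and column names are illustrative; the keys of the returned dictionary are assumed to mirror the argument names):

    from h2o_sonar.utils import io

    # Serialize run_interpretation() arguments as a JSON string.
    args_json = io.to_run_interpretation_args_json(
        dataset="credit_train.csv",
        model="model.pkl",
        target_col="default",
    )

    # Deserialize the JSON back into a kwargs dictionary, e.g. for
    # interpret.run_interpretation(**kwargs).
    kwargs = io.from_run_interpretation_args_json(args_json)
    print(kwargs)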

h2o_sonar.utils.normalization module

h2o_sonar.utils.normalization.normalize_importance(frame: Frame) Frame

Normalize local feature importance values to global feature importance expressed as percentages.

Parameters:
frame: datatable.Frame

Frame with local feature importance values.

Returns:
datatable.Frame

Normalized frame with global feature importance values.
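A minimal sketch (the frame layout below, one column per feature with per-row local importance values, is an assumption):

    import datatable as dt

    from h2o_sonar.utils import normalization

    # Illustrative local (per-row) feature importance values.
    local_importance = dt.Frame(feature_a=[0.2, 0.4, 0.1], feature_b=[0.1, 0.3, 0.2])

    # Normalize to global feature importance expressed as percentages.
    global_importance = normalization.normalize_importance(local_importance)
    print(global_importance)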

h2o_sonar.utils.preprocessing module

class h2o_sonar.utils.preprocessing.MultiColumnLabelEncoder(columns=None)

Bases: LabelEncoder

Wraps sklearn LabelEncoder functionality for use on multiple columns of a Pandas DataFrame.

fit(dframe)

Fit label encoder to Pandas columns. Access individual column classes via indexing self.all_classes_. Access individual column encoders via indexing self.all_encoders_.

fit_transform(dframe)

Fit label encoder and return encoded labels. Access individual column classes via indexing self.all_classes_. Access individual column encoders via indexing self.all_encoders_. Access individual column encoded labels via indexing self.all_labels_.

inverse_transform(dframe)

Transform labels back to original encoding.

set_fit_request(*, dframe: bool | None | str = '$UNCHANGED$') MultiColumnLabelEncoder

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
dframe: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for dframe parameter in fit.

Returns:
self: object

The updated object.

set_inverse_transform_request(*, dframe: bool | None | str = '$UNCHANGED$') MultiColumnLabelEncoder

Request metadata passed to the inverse_transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
dframe: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for dframe parameter in inverse_transform.

Returns:
self: object

The updated object.

set_transform_request(*, dframe: bool | None | str = '$UNCHANGED$') MultiColumnLabelEncoder

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
dframe: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for dframe parameter in transform.

Returns:
self: object

The updated object.

transform(dframe)

Transform labels to normalized encoding.

h2o_sonar.utils.preprocessing.categorical_encoder(X: DataFrame) Tuple[DataFrame, MultiColumnLabelEncoder, List]
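A minimal sketch of encoding the categorical columns of a pandas DataFrame (the data is illustrative; the interpretation of the returned list as the encoded column names is an assumption):

    import pandas as pd

    from h2o_sonar.utils import preprocessing

    df = pd.DataFrame(
        {
            "color": ["red", "green", "red"],
            "size": ["S", "M", "L"],
            "price": [1.0, 2.0, 3.0],
        }
    )

    # Label-encode the categorical columns; the returned encoder can be reused
    # later, e.g. to transform new data or to invert the encoding.
    encoded_df, encoder, encoded_cols = preprocessing.categorical_encoder(df)
    print(encoded_cols)
    print(encoded_df.head())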

h2o_sonar.utils.problem_detection module

h2o_sonar.utils.problem_detection.get_feature_importance_problems(shap_means_dict: Dict[str, Frame], threshold: float, explainer_id: str, explainer_display_name: str) List[ProblemAndAction]

Get feature importance problems and suggested actions based on SHAP values above a specified threshold.

Parameters:
shap_means_dictDict[str, datatable.Frame]

A datatable Frame containing Shapley values.

threshold: float

Threshold for showing potential data leakage in the most important feature.

explainer_id: str

Explainer id.

explainer_display_name: str

Explainer display name.

Returns:
List[problems.ProblemAndAction]

A list of problems and actions.
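A call-shape sketch only; the layout of the Shapley-value frames and the threshold semantics below are assumptions, not the documented contract:

    import datatable as dt

    from h2o_sonar.utils import problem_detection

    # Illustrative mean |SHAP| values (keys, column names and layout are made up).
    shap_means = {"class_0": dt.Frame(feature=["income", "age"], shap=[0.92, 0.08])}

    problems = problem_detection.get_feature_importance_problems(
        shap_means_dict=shap_means,
        threshold=0.8,
        explainer_id="shapley_values_explainer",      # illustrative id
        explainer_display_name="Shapley values",      # illustrative display name
    )
    for problem_and_action in problems:
        print(problem_and_action)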

h2o_sonar.utils.perturbations module

class h2o_sonar.utils.perturbations.AbcSynAntPerturbator

Bases: ABC

PUNCTUATION = ('.', ',', '?', '!', ':', ';', "'", '"', '(', '[', '{')
TAGS = ('CD', 'JJ', 'JJR', 'JJS', 'NN', 'NNS', 'RB', 'RBR', 'RBS')
class h2o_sonar.utils.perturbations.AntonymPerturbator

Bases: Perturbator, AbcSynAntPerturbator

Perturbator that replaces words with their antonyms.

class h2o_sonar.utils.perturbations.CommaPerturbator

Bases: Perturbator

Perturbator that adds a comma after some words. It mimics a common mistake and/or typo in English writing.

class h2o_sonar.utils.perturbations.ContextualMisinformationPerturbator

Bases: Perturbator, AbcAgenticPerturbator

The contextual misinformation perturbator is an agent-based perturbator that introduces factually incorrect information within a seemingly plausible context, aiming to mislead the model into accepting false statements (an adversarial attack).

is_compatible() bool
class h2o_sonar.utils.perturbations.EncodingPerturbator

Bases: Perturbator

Perturbator that encodes the prompt in a specified encoding to steer the model to answer in that encoding. This perturbation can be used to bypass the model's safety filters (guardrails) and generate unsafe content.

See: https://substack.com/home/post/p-156004330

TYPE_ANSWER_DECODED = 'answer_decoded'
TYPE_ANSWER_ENCODED = 'answer_encoded'
TYPE_PROMPT_DECODED = 'prompt_decoded'
TYPE_PROMPT_ENCODED = 'prompt_encoded'
class h2o_sonar.utils.perturbations.EncodingPerturbatorBase16

Bases: EncodingPerturbator

Perturbator that encodes the prompt using base16 encoding to steer the model to answer in that encoding. This perturbation can be used to bypass the model's safety filters (guardrails) and generate unsafe content.

See: https://substack.com/home/post/p-156004330

class h2o_sonar.utils.perturbations.KeywordTyposCharacterPerturbator

Bases: Perturbator

class h2o_sonar.utils.perturbations.Perturbator

Bases: ABC

Base class for perturbators.

as_descriptor() PerturbatorDescriptor
classmethod config_max_items() int
property description: str
property display_name: str
classmethod is_compatible() bool
property keywords: List[str]
perturb(text: str | List[str], intensity: PerturbationIntensity = PerturbationIntensity.MEDIUM, retries: int = 15, raised_errors: List | None = None, **perturbation_params) str | List[str] | None

Perturb the input text with the given intensity.

Parameters:
text: Union[str, List[str]]

Text to perturb.

intensity: Union[PerturbationIntensity, str]

Perturbation intensity.

retries: int, optional

Number of retries if the perturbation does not yield a new text.

raised_errors: Optional[List]

If None, then raise error(s) if the perturbator(s) fail; otherwise do not raise exceptions and store them in the (empty) list provided by the caller.

classmethod perturbator_id() str
class h2o_sonar.utils.perturbations.PerturbatorDescriptor(perturbator_id: str, display_name: str = '', description: str = '', keywords: List[str] | None = None)

Bases: object

clone() PerturbatorDescriptor
dump() dict
static load(d: Dict) PerturbatorDescriptor
class h2o_sonar.utils.perturbations.PerturbatorRegistry(singleton_create_key)

Bases: object

Registry of perturbators.

are_compatible(perturbators: List[PerturbatorToRun], items: int = 0) List[PerturbatorToRun]
describe_perturbator(perturbator_id: str) PerturbatorDescriptor | None
get_perturbator(perturbator_id: str) Perturbator | None
is_compatible(perturbator_id: str, items: int = 0) bool

Is the perturbator available and compatible given metadata declarations?

list_perturbators(keywords: List[str] | None = None) List[Perturbator]

List and optionally filter perturbators by keywords - if multiple keywords are provided, the perturbator must have all of them to be included in the result.

register(perturbator: Perturbator)
classmethod registry()
class h2o_sonar.utils.perturbations.QwertyPerturbator

Bases: Perturbator

Perturbator that replaces ‘y’ with ‘z’ and vice versa.

class h2o_sonar.utils.perturbations.RandomCharacterDeletePerturbator

Bases: Perturbator

class h2o_sonar.utils.perturbations.RandomCharacterInsertPerturbator

Bases: Perturbator

class h2o_sonar.utils.perturbations.RandomCharacterReplacementPerturbator

Bases: Perturbator

class h2o_sonar.utils.perturbations.RandomOCRCharacterPerturbator

Bases: Perturbator

class h2o_sonar.utils.perturbations.SynonymPerturbator

Bases: Perturbator, AbcSynAntPerturbator

Perturbator that replaces words with their synonyms.

class h2o_sonar.utils.perturbations.WordSwapPerturbator

Bases: Perturbator

Perturbator that swaps two words in a sentence.

h2o_sonar.utils.perturbations.register_ootb_perturbators()

Register out-of-the-box perturbators.
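A minimal registry usage sketch (it is assumed the out-of-the-box perturbators are not registered yet in a fresh process):

    from h2o_sonar.utils import perturbations

    # Register the out-of-the-box perturbators and get the singleton registry.
    perturbations.register_ootb_perturbators()
    registry = perturbations.PerturbatorRegistry.registry()

    # List the available perturbators.
    for perturbator in registry.list_perturbators():
        print(perturbator.perturbator_id(), "-", perturbator.display_name)

    # Run a single perturbator on a prompt (default intensity is MEDIUM).
    qwerty = registry.get_perturbator(perturbations.QwertyPerturbator.perturbator_id())
    if qwerty is not None:
        print(qwerty.perturb("Is the sky blue?"))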

h2o_sonar.utils.sampling module

This module provides the following dataset sampling techniques:

  • StratifiedDatasetSampling: (default) Dataset sampler which implements both stratified and random sampling. The sampler automatically decides which sampling technique to use.

    • CONS:
      • stratified sampling can sample datasets up to 50% of the free RAM (sklearn sampler is the bottleneck)

    • PROS:
      • supports stratified (classification models) and random sampling (regression)

      • automatically decides the sampling method (can be overridden with a parameter)

      • random sampling is able to sample datasets bigger than the free RAM size

  • NoDatasetSampling: Sampler which is used when the user requests NO sampling. To avoid an OOM / H2O Eval Studio crash, it checks whether the dataset fits in RAM and, if it does not, raises an exception asking the user to sample or use a different dataset.

  • RandomPandasDatasetSampling: Dataset sampler which implements random sampling using Pandas.

    • CONS:
      • dataset must fit in free RAM (2x)

      • the sampler does not support stratification

    • PROS:
      • enables the use of Pandas sampler seamlessly in the H2O Eval Studio runtime

  • HeadOfDatasetSampling: Sampler which does not sample, but returns sampling_limit number of rows from the head of the dataset.

    • CONS:
      • the sampled dataset will very likely be biased (should not be used in production)

    • PROS:
      • fast

      • handles dataset of any size

      • can be used for splitting and non-functional testing

class h2o_sonar.utils.sampling.DatasetSampler(system_limit: int = 1000000000)

Bases: ABC

The sampler's child implementations provide various dataset sampling techniques.

The H2O Eval Studio container samples the dataset upfront (based on the interpretation parameters) in order to protect the process/runtime (from a crash), the system (from OOM and excessive use of resources) and the explainers from failures.

DEFAULT_CAT_NUM_THRESHOLD = 50
H2O_SONAR_LIMIT = 25000
SYSTEM_LIMIT = 1000000000
static is_dataset_fit_in_memory(dataset_path: str | Path)

Check whether the dataset file would fit to free RAM and return sizes.

Parameters:
dataset_path: str

Dataset path.

Returns:
Tuple[bool, int, int]

Return whether the dataset will fit, dataset size in bytes and RAM size in bytes.

sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = 0, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]

Sample dataset.

Parameters:
dataset: Union[datatable.Frame, str, pathlib.Path]

Dataset to be sampled as reference to the frame or a path to the file.

sampling_limit: Optional[int] = None

If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.

target_col: str = ""

Optional target column which is required for certain sampling techniques (like stratified sampling).

is_classification: bool

If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.

drop_nan_rows: bool

True to drop rows with a “not a number” value in the target_col column in case of classification-friendly techniques.

drop_1_classes: bool

True to drop rows which represent classes with cardinality equal to 1 (categories which are represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.

classes: Optional[List] = None

Optional specification of classes to be used for sampling (all valid classes will be used by default). classes values are expected to be a subset of the target column classes.

sampled_dataset_path: Union[datatable.Frame, str, pathlib.Path]

Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).

seed: int

Optional random seed for reproducible sampling.

logger

Optional logger.

Returns:
Union[datatable.Frame, str]

Path to the sampled dataset (if the path to sampled_dataset_path has been specified), datatable Frame reference otherwise.

class h2o_sonar.utils.sampling.HeadOfDatasetSampling(chunk_size: int = 1000000)

Bases: DatasetSampler

Sampler which does not sample, but returns the sampling_limit number of rows from the head of the dataset.

PRESUMPTIONS:

  • sampled dataset will fit into free RAM

CONS:

  • it is NOT correct from the data science perspective and should NOT be used in production as it does not guarantee anything - the sampled dataset will very likely be biased, i.e. it may have completely different characteristics and statistics than the original dataset

PROS:

  • it can sample a dataset of any size and therefore enables H2O Eval Studio to run on datasets of any size - when the data science aspect is not a concern, this sampler might be a good choice

  • it is relatively fast in comparison to other samplers

  • it is ideal for non-functional testing

sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]

Sample dataset.

Parameters:
dataset: Union[datatable.Frame, str, pathlib.Path]

Dataset to be sampled as reference to the frame or a path to the file.

sampling_limit: Optional[int] = None

If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.

target_col: str = ""

Optional target column which is required for certain sampling techniques (like stratified sampling).

is_classification: bool

If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.

drop_nan_rows: bool

True to drop rows with a “not a number” value in the target_col column in case of classification-friendly techniques.

drop_1_classes: bool

True to drop rows which represent classes with cardinality equal to 1 (categories which are represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.

classes: Optional[List] = None

Optional specification of classes to be used for sampling (all valid classes will be used by default). classes values are expected to be a subset of the target column classes.

sampled_dataset_path: Union[datatable.Frame, str, pathlib.Path]

Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).

seed: int

Optional random seed for reproducible sampling.

logger

Optional logger.

Returns:
Union[datatable.Frame, str]

Path to the sampled dataset (if the path to sampled_dataset_path has been specified), datatable Frame reference otherwise.

class h2o_sonar.utils.sampling.NoDatasetSampling(check_ram: bool = True)

Bases: DatasetSampler

Sampler which does NO sampling; it can check whether the dataset would fit in RAM and thus avoid an H2O Eval Studio OOM crash. Used as the default sampling method.

sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]

Sample dataset.

Parameters:
dataset: Union[datatable.Frame, str, pathlib.Path]

Dataset to be sampled as reference to the frame or a path to the file.

sampling_limit: Optional[int] = None

If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.

target_col: str = ""

Optional target column which is required for certain sampling techniques (like stratified sampling).

is_classification: bool

If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.

drop_nan_rows: bool

True to drop rows with a “not a number” value in the target_col column in case of classification-friendly techniques.

drop_1_classes: bool

True to drop rows which represent classes with cardinality equal to 1 (categories which are represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.

classes: Optional[List] = None

Optional specification of classes to be used for sampling (all valid classes will be used by default). classes values are expected to be a subset of the target column classes.

sampled_dataset_path: Union[datatable.Frame, str, pathlib.Path]

Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).

seed: int

Optional random seed for reproducible sampling.

logger

Optional logger.

Returns:
Union[datatable.Frame, str]

Path to the sampled dataset (if the path to sampled_dataset_path has been specified), datatable Frame reference otherwise.

class h2o_sonar.utils.sampling.RandomPandasDatasetSampling(logger=None)

Bases: DatasetSampler

Dataset sampler which implements random sampling using Pandas (pandas.DataFrame.sample()).

  • CONS:

    • dataset must fit in free RAM (2x)

    • the sampler does not support stratification

  • PROS:

    • enables the use of Pandas sampler seamlessly in the H2O Eval Studio runtime

sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]

Sample dataset.

Parameters:
dataset: Union[datatable.Frame, str, pathlib.Path]

Dataset to be sampled as reference to the frame or a path to the file.

sampling_limit: Optional[int] = None

If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.

target_col: str = ""

Optional target column which is required for certain sampling techniques (like stratified sampling).

is_classification: bool

If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.

drop_nan_rows: bool

True to drop rows with a “not a number” value in the target_col column in case of classification-friendly techniques.

drop_1_classes: bool

True to drop rows which represent classes with cardinality equal to 1 (categories which are represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.

classes: Optional[List] = None

Optional specification of classes to be used for sampling (all valid classes will be used by default). classes values are expected to be a subset of the target column classes.

sampled_dataset_path: Union[datatable.Frame, str, pathlib.Path]

Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).

seed: int

Optional random seed for reproducible sampling.

logger

Optional logger.

Returns:
Union[datatable.Frame, str]

Path to the sampled dataset (if the path to sampled_dataset_path has been specified), datatable Frame reference otherwise.

class h2o_sonar.utils.sampling.StratifiedDatasetSampling

Bases: DatasetSampler

Dataset sampler which implements both stratified and random sampling.

  • CONS:

    • stratified sampling can sample datasets up to 50% of the free RAM (sklearn sampler is the bottleneck)

  • PROS:

    • supports stratified (classification models) and random sampling (regression)

    • automatically decides the sampling method (can be overridden with a parameter)

    • random sampling is able to sample datasets bigger than the free RAM size

sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]

Sample dataset.

Parameters:
dataset: Union[datatable.Frame, str, pathlib.Path]

Dataset to be sampled as reference to the frame or a path to the file.

sampling_limit: Optional[int] = None

If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.

target_col: str = ""

Optional target column which is required for certain sampling techniques (like stratified sampling).

is_classification: bool

If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.

drop_nan_rows: bool

True to drop rows with a “not a number” value in the target_col column in case of classification-friendly techniques.

drop_1_classes: bool

True to drop rows which represent classes with cardinality equal to 1 (categories which are represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.

classes: Optional[List] = None

Optional specification of classes to be used for sampling (all valid classes will be used by default). classes values are expected to be a subset of the target column classes.

sampled_dataset_path: Union[datatable.Frame, str, pathlib.Path]

Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).

seed: int

Optional random seed for reproducible sampling.

logger

Optional logger.

Returns:
Union[datatable.Frame, str]

Path to the sampled dataset (if the path to sampled_dataset_path has been specified), datatable Frame reference otherwise.
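A minimal sketch of stratified sampling of a CSV dataset (the path and target column are illustrative; the meaning of the returned tuple elements follows the signature and is otherwise assumed):

    from h2o_sonar.utils import sampling

    sampler = sampling.StratifiedDatasetSampling()

    # Sample a classification dataset to at most 25,000 rows.
    was_sampled, sampled_frame, sampled_path = sampler.sample_dataset(
        dataset="credit_train.csv",   # illustrative path
        sampling_limit=25_000,
        target_col="default",         # illustrative target column
        is_classification=True,
    )
    print(was_sampled)
    print(sampled_frame)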

h2o_sonar.utils.sampling.downsample_dataset(dataset, sample_size: int | None = None, runtime_sample_size: int | None = None, target_col: str = '', is_classification: bool = False, classes: List | None = None, seed: int = 42, logger=None)

Dataset sampling method used by the explainers in Driverless AI (and potentially other container runtimes) to sample the input dataset according to their needs.

This method is not used by the local explainer as it samples the input dataset upfront to protect all the explainers. This is why this method serves as an identity function - it ensures that H2O Eval Studio's sampling will not impact Driverless AI and other host runtimes.

Parameters:
dataset

Dataset to be sampled.

sample_size: Optional[int]

Sampling limit to use.

runtime_sample_size: Optional[int]

Runtime protection - sample dataset to this size even if sample_size is bigger to protect the runtime and avoid space (memory) / time overloading.

target_col: str

Target column to be used for the sampling.

is_classification: bool

Sample for regression (False) or classification (True).

classes: Optional[List]

List of classes in case of sampling of classification model dataset.

seed: int

Sampling seed.

logger

Logger.

Returns:
Any

Sampled dataset.

h2o_sonar.utils.sanitization module

class h2o_sonar.utils.sanitization.DriverlessAiSanitizationMap(raw_names: List[str], sanitized_names: List[str])

Bases: SanitizationMap

Driverless AI model sanitization map.

A Driverless AI (AutoML) model provides its own sanitization map. The purpose of this class is to make the Driverless AI sanitization available via the standard SanitizationMap interface.

class h2o_sonar.utils.sanitization.SanitizationMap(raw_names: List[str], sanitized_names: List[str])

Bases: object

Map of original (raw) dataset column names/features to sanitized names and vice versa.

static ensure(cols, col) List[str]
static sanitize_value(values: str | List[str], special_chars: str = '|,=[]<\t\r\n:.~') str | List[str]

Method for feature value (labels, classes) sanitization. Note that column/feature name sanitization (handled by the map) typically has different requirements than value sanitization. Also note that value sanitization is one-way (original to sanitized only) and may potentially have collisions if values are sanitized in multiple calls to this method (collisions within one call of this function are resolved).

to_raw(names: str | List[str])

Sanitized name(s) to original (raw) name(s).

to_sanitized(names: str | List[str])

Original (raw) name(s) to sanitized name(s).

h2o_sonar.utils.sanitization.sanitize_frame(frame, sanitization_map: SanitizationMap | None = None)
h2o_sonar.utils.sanitization.sanitize_markdown(md_fragment: str) str

The purpose of this function is to sanitize a Markdown fragment string. It is NOT meant to sanitize whole Markdown documents, but fragments of them, where a string (to be stored in Markdown) would interact with other Markdown elements.

Parameters:
md_fragment: str

A Markdown fragment string.

Returns:
str

Sanitized Markdown string fragment without dangerous characters and links.

h2o_sonar.utils.sanitization.sanitize_names(names: str | List[str], sanitization_map: SanitizationMap | None = None)

Sanitize column/feature name(s) either using (model’s) sanitization map (if available) or using universal sanitization method.

Parameters:
names: Union[str, List[str]]

Name(s) to be sanitized.

sanitization_map: Optional[SanitizationMap]

Optional sanitization map.

h2o_sonar.utils.sanitization.sanitize_strings(strings: str | List[str], replace_with: str = '_', special_chars: str = '|,=[]<\t\r\n:.~')

Sanitize a string or a list of strings.

Parameters:
strings: Union[str, List[str]]

Strings to be sanitized.

replace_with: str

Character used to replace forbidden characters.

special_chars: str

Optional special characters to be sanitized.

Returns:
Union[str, List[str]]

Sanitized strings.
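A minimal sketch of the universal (map-less) sanitization helpers:

    from h2o_sonar.utils import sanitization

    # Sanitize column/feature names without a model-provided sanitization map.
    print(sanitization.sanitize_names(["PAY_0.amount", "credit:limit"]))

    # Sanitize arbitrary strings, replacing forbidden characters with "_".
    print(sanitization.sanitize_strings("credit [USD], 2023"))

    # Sanitize feature values (labels, classes) - one-way sanitization.
    print(sanitization.SanitizationMap.sanitize_value("yes, definitely"))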

h2o_sonar.utils.testing module

H2O Eval Studio LLM / RAG testing utilities:

  • Raw test data: a dataset which was used to create the test configuration(s).

  • Test suite: a collection of tests (see below).

  • Test: a collection of documents (corpus) along with the test cases (see below) to be run in the context of the corpus.

  • Test case: a prompt, expected output (ground truth), categories, output condition, output constraints, … and other parameters to be used for a RAG / LLM model evaluation.

  • Test lab: a set of resolved tests enriched with answers (actual answer), retrieval context, response duration and other data obtained from the conversation with a RAG / LLM.

A resolved test lab is exported to an LLM dataset which is then used as input to an evaluation - the evaluation runs a set of evaluators to rank RAG / LLM models.

class h2o_sonar.utils.testing.InMemoryLlmHostPromptCache

Bases: LlmHostPromptCache

In-memory LLM host client cache:

  • initialization:
    • (pre-built) cache can be loaded from a JSON file

  • hints:
    • cache can be saved and loaded from a JSON file

    • pre-built cache can be created from a test lab (not implemented by this class)

    • when used in the testing environment, cloud deployment, … pre-built cache can be synchronized/downloaded from S3, filesystem, …

KEY_DATA = 'cache_data'
add_test_lab(test_lab: RagTestLab)

Add the test lab to the cache.

clear()

Clear the cache.

evict(key: str)

Evict the value from the cache for the given key.

get(key: str) Dict | None

Get the cached value for the given key.

The returned dictionary can be passed to a typed result class.

get_llm_model_names(explainable_model_type: ExplainableModelType) List[str]

List all the LLM model names known to the cache.

static load_from_json(file_path: str | Path)
static load_from_url(url: str, work_dir: str | Path = '')
put(key: str, value: Dict)

Put the value to the cache for the given key.

Parameters:
key: str

Cache key.

value: Dict

Cache value - the dictionary is expected to be a JSON-serialized LlmDataset.LlmDatasetRow.

save_to_json(file_path: str | Path)
to_dict() Dict
class h2o_sonar.utils.testing.LlmHostPromptCache

Bases: ABC

Prompt cache for the LLM host clients:

  • caches: answer(s) (actual answer, duration, cost, chunks, …) for given prompt(s)

  • does NOT cache: corpus documents synchronization, RAG host server collection creation, LLM models listing

  • cache key:
    • does NOT consider a particular host (like playground.h2ogpte.h2o.ai), but rather the LLM host type ~ connection type (like H2O_GPT_E or OPENAI_RAG)

    • does NOT consider a particular chunk retrieval method

    • DOES consider corpus documents (empty for non-RAG), prompt, LLM model name, required context chunks (via chunk retrieval method - none or a method), …

    • DOES consider context (empty for RAG)

  • implementations (options): in-memory cache (testing), filesystem cache (pre-built JSON files), Redis cache (shared by EvalStudio workers), memcached cache (shared by EvalStudio workers), …

  • utilities: cache key generation, cache key hashing, static cache builder from serialized test labs (JSON)

  • purpose: NOT for production use - for testing / demos / conference hands-on sessions only; it significantly speeds up the test lab completion, avoids test lab build failures due to an unstable/slow/fragile system under test (like an h2oGPTe server), and saves costs (e.g. OpenAI server costs)

KEY_ACTUAL_OUTPUT = 'actual_output'
KEY_CONTEXT = 'context'
KEY_CORPUS = 'corpus'
KEY_COST = 'cost'
KEY_DURATION = 'actual_duration'
KEY_EXTRAS = 'extras'
KEY_INPUT = 'input'
KEY_LLM_MODEL_NAME = 'llm_model_name'
KEY_MODEL_TYPE = 'model_type'
PREFIX_KEY = 'CACHE-KEY::'
abstract clear()

Clear the cache.

abstract evict(key: str)

Evict the value from the cache for the given key.

abstract get(key: str) Dict | None

Get the cached value for the given key.

The returned dictionary can be passed to a typed result class.

static get_key(explainable_model_type: ExplainableModelType, prompt: str, llm_model_name: str, corpus: List[str] | None = None, extras: str = '') str

Generate cache key for the LLM host client cache:

  • does NOT consider particular host (like playground.h2ogpte.h2o.ai)

  • does NOT consider RAG collection

  • does NOT consider chunk retrieval method

  • suitable for both RAG hosts (empty corpus, no context) and LLM hosts

Parameters:
explainable_model_type: models.ExplainableModelType

Explainable model type.

prompt: str

Prompt for which the answer is to be cached.

llm_model_name: str

LLM model name whose answer is to be cached.

corpus: Optional[List[str]]

Corpus documents - instead of relying on the collection (whose ID and name may differ), the corpus information is used.

extras: str

Extra information - any other parameters which may make the cache key unique.

Returns:
str

Cache key.

abstract get_llm_model_names(explainable_model_type: ExplainableModelType) List[str]

List all the LLM model names known to the cache.

abstract put(key: str, value: Dict)

Put the value to the cache for the given key.

Parameters:
key: str

Cache key.

value: Dict

Cache value - the dictionary is expected to be a JSON-serialized LlmDataset.LlmDatasetRow.

static str_key_to_dict(key_dict: str) Dict
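A minimal in-memory cache sketch - a literal key and a plain dictionary are used only to keep the example self-contained; in practice the key comes from LlmHostPromptCache.get_key(...) and the value is a JSON-serialized LlmDataset.LlmDatasetRow:

    from h2o_sonar.utils import testing

    cache = testing.InMemoryLlmHostPromptCache()

    # Illustrative key; real keys are produced by LlmHostPromptCache.get_key().
    key = testing.LlmHostPromptCache.PREFIX_KEY + "example"

    cache.put(
        key,
        {
            testing.LlmHostPromptCache.KEY_INPUT: "What was the 2023 revenue?",
            testing.LlmHostPromptCache.KEY_ACTUAL_OUTPUT: "The 2023 revenue was ...",
        },
    )
    print(cache.get(key))

    # The cache can be persisted and re-loaded as JSON.
    cache.save_to_json("prompt_cache.json")
    reloaded = testing.InMemoryLlmHostPromptCache.load_from_json("prompt_cache.json")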
class h2o_sonar.utils.testing.RagTestCaseConfig(prompt: str, categories: str | List[str] = '', relationships=None, constraints=None, condition='', expected_output: str = '', config: RagTestConfig | None = None, key: str = '')

Bases: object

RAG / LLM test case configuration:

  • prompt

  • expected output

  • categories

  • condition (string expression)

  • constraints (any JSON serializable object)

KEY_CATEGORIES = 'categories'
KEY_CONDITION = 'condition'
KEY_CONSTRAINTS = 'constraints'
KEY_EXPECTED_OUTPUT = 'expected_output'
KEY_KEY = 'key'
KEY_PROMPT = 'prompt'
KEY_RELS = 'relationships'
add_relationship(relationship_type: str, target: str, target_type: str)
copy(update_key: bool = True)
perturb(perturbators: List[PerturbatorToRun], in_place: bool = True, raised_errors: List | None = None)

Perturb the prompt.

Parameters:
perturbators: List[commons.PerturbatorToRun]

Perturbators to run - includes the perturbator ID, intensity, and parameters.

in_place: bool

If True, perturb the prompt in place, otherwise create a new perturbed test case.

raised_errors: Optional[List]

If None, then raise error(s) if the perturbator(s) fail; otherwise do not raise exceptions and store them in the (empty) list provided by the caller.

to_dict()
class h2o_sonar.utils.testing.RagTestConfig(documents: List[Path | str], key: str = '')

Bases: object

RAG / LLM test configuration:

  • corpus … a set of documents (empty for LLM evaluation)

  • test cases … a set of prompts, expected outputs, categories, conditions, …

KEY_DOCUMENTS = 'documents'
KEY_KEY = 'key'
static from_dict(key: str, as_dict: Dict) RagTestConfig
to_dict()
class h2o_sonar.utils.testing.RagTestLab(llm_host_connection: ConnectionConfig, raw_dataset: LlmDataset, evaluated_models: List[ExplainableRagModel | ExplainableLlmModel] | None = None, llm_model_names: List[str] | None = None, docs_cache_dir: str | Path = '', name: str = 'TestLab', description: str = 'Test lab for RAG / LLM evaluation.', llm_host_prompt_cache: LlmHostPromptCache | None = None, use_evaluated_model_collection_id: bool = False, logger=None)

Bases: TestLab

LLMs (Large Language Model) test lab:

  • TestLab is expected to test multiple LLMs either hosted by one service (like OpenAI) or by RAG (Retrieval-augmented generation) product (like h2oGPTe) or by LLM host product (like h2oGPT).

  • TestLab gets connection configuration to the host system.

  • TestLab can compare / benchmark multiple LLM models from the same host system.

  • Resolved test labs can be merged to get an aggregated lab -> LLM dataset with multiple LLM hosts for the side-by-side evaluation by the evaluate module.

bind(collection_id: str, collection_name: str, corpus: List | None = None)

Bind ALL the test lab RAG models to collection(s) instead of building them by creating collections and uploading documents.

build(doc_sync_meta: Dict[str, Any] = None, progress_callback: AbstractProgressCallbackContext | None = None, sync_documents: bool = True, fail_on_missing_corpus: bool = False)

Build the test lab so that it can be used for the evaluation:

  • synchronize the document cache

  • create RAG’s document collections

  • upload documents (corpora) to collection(s).

Parameters:
doc_sync_meta: Dict[str, Any]

Document synchronization metadata - the key is the document locator (URL), the value is a dictionary with metadata like headers. Example:

{
    "http://example.com/doc1.txt": {
        "headers": {
            "foo-header": "FOO-VALUE"
        }
    }
}

progress_callback: Optional[progress.AbstractProgressCallbackContext]

Optional progress callback context.

sync_documents: bool

Sync documents from the network/filesystem to the lab's document cache.

fail_on_missing_corpus: bool

Fail if the test does not specify any corpus.

complete_dataset(complete_context: int = 10, progress_callback: AbstractProgressCallbackContext | None = None, save_as_you_go: str | Path | None = None, parallelize: int = 0, multi_turn: bool = False, retry_on_error: int = 2, timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None = None, include_llm_meta: bool = True, raise_on_all_tcs_fail: bool = True, purge_workdir: bool = True)

Complete the dataset with the actual values from an LLM host.

Parameters:
complete_context: int

How many context text chunks to include in the resolved dataset.

progress_callback: Optional[progress.AbstractProgressCallbackContext]

Optional progress callback context.

save_as_you_go: Optional[Union[str, pathlib.Path]]

Save the dataset as JSON after each input is resolved.

parallelize: int

Complete the dataset in parallel using multiple processes. Use -1 for auto-choice of the number of workers, 0 to disable parallelization (the lab is created using sequential requests), and a positive integer to specify the number of workers.

multi_turn: bool

Whether to use multi-turn chat with the LLM host - if enabled, then all test cases within the test are handled within a single chat session, i.e. the same context.

retry_on_error: int

How many times to retry the failed LLM host requests.

timeout_exp_backoff: Optional[TimeoutRetryExpBackoffCtx]

Optionally override the timeout which can be specified in the model host configuration ExplainableRagModel::model_cfg (and which is model host type specific) and use an exponential backoff strategy for the timeout handling. The timeout is increased on each retry by the backoff factor.

include_llm_meta: bool

Whether to include the LLM metadata, like performance statistics.

raise_on_all_tcs_fail: bool

Raise an exception if all test cases fail.

purge_workdir: bool

Purge the working directory with lab shards after the completion.

complete_from_shards(execution_dir_path: str | Path)

Complete the test lab from the shards stored on the filesystem. This method is used to load previously completed test lab shards and merge them into a single resolved dataset.

static from_eval_results(eval_results_path: str | Path, interpretation_json_path: str | Path, raw_dataset_empty: bool = True)

Create a test lab from the evaluation results archive.

Parameters:
eval_results_path: Union[str, pathlib.Path]

Path to the evaluation results JSON file.

interpretation_json_path: Union[str, pathlib.Path]

Path to the interpretation JSON file.

raw_dataset_empty: bool

Whether to create an empty raw dataset or copy the resolved dataset.

static from_llm_test_suite(llm_host_connection: ConnectionConfig, llm_test_suite: RagTestSuiteConfig, llm_model_type: ExplainableModelType, llm_model_names: List[str], work_dir: str | Path = '', llm_models_cfgs: Dict[str, List[Dict]] = None, llm_host_prompt_cache: LlmHostPromptCache | None = None) RagTestLab

Create new (unresolved) test lab from the LLM test suite configuration.

static from_rag_test_suite(rag_connection: ConnectionConfig, rag_test_suite: RagTestSuiteConfig, rag_model_type: ExplainableModelType, llm_model_names: List[str], docs_cache_dir: str | Path, rag_models_cfgs: Dict[str, List[Dict]] = None, predefined_collection_id: str | Dict | None = None, llm_host_prompt_cache: LlmHostPromptCache | None = None) RagTestLab

Create new (unresolved) test lab from the RAG test suite configuration.

The test lab is built as follows:

  • all LLM model names are hosted by the SAME system described by the RAG connection, accessed by a client

  • LLM model name may have associated a list of custom client configurations

  • RAG test suite is used to build the test lab: the RAG test suite has test cases that are grouped into tests; test cases within a test share the SAME corpus, while different tests may have different corpora

  • an explainable RAG model is created for EACH combination of LLM model name + client configuration + corpus (not test), i.e. as the Cartesian product: LLM model names x client configurations x corpora = explainable RAG models

Summary of the explainable RAG models creation:

  • for each LLM model name
    • for each client configuration of that LLM model name
      • for each test
        • create explainable RAG model

Parameters:
rag_connection: h2o_sonar_config.ConnectionConfig

Connection to the RAG system.

rag_test_suite: RagTestSuiteConfig

RAG test suite configuration.

rag_model_type: ExplainableModelType

Type of the explainable model hosted by the RAG system.

llm_model_names: List[str]

List of LLM model names to be used to build the test lab and to be subsequently evaluated and compared. The following special names can be used with the h2oGPTe model host: "auto" to use the best available model chosen by h2oGPTe, "" (empty string) to inherit the configuration from the h2oGPTe collection, and None to inherit the configuration from the h2oGPTe collection.

rag_models_cfgs: Dict[str, List[Dict]]

Dictionary with LLM model name as key and list of client configurations as values. Each client configuration is a dictionary with the client configuration parameters which can be created by the client factory using client.config_factory().

docs_cache_dir: Union[str, pathlib.Path]

Directory to store the documents cache.

predefined_collection_id: Optional[Union[str, Dict]]

Predefined collection ID for the RAG model. If provided as a string, it is used as the collection ID for all the test cases. If provided as a dictionary, it is used as a mapping of the test case keys to the collection IDs.

llm_host_prompt_cache: Optional[LlmHostPromptCache]

Cache for the LLM host client.

Returns:
RagTestLab

New RAG test lab.

get_evaluated_model_for_key(model_key: str)

Get LLM model name for the evaluated model.

insight_internal_llm_errors(report_dir: str | Path = '', src: str = 'stats') Tuple[Dict, str]

Create Markdown report with the internal LLM errors.

Parameters:
report_dir: Union[str, pathlib.Path]

Directory to save the reports to as JSON and Markdown.

src: str

Source of the errors: stats (default) or dataset (analysis of the answer texts).

integrity_check()
static load_from_json(llm_host_connection: ConnectionConfig, file_path: str | Path, docs_cache_dir: str | Path = '', datatable_format: bool = False) RagTestLab
merge(other_test_lab: RagTestLab, other_llm_prefix: str = 'Other')

Merge another test lab into this one.

purge()

Purge the test lab by deleting all the created collections/assistants and uploaded documents.

split_to_shards(base_dir: Path, max_total_workers: int = 20) Dict

Split the test lab into shards by RAG model (which is identified by corpus and base LLM model name). If there is one RAG model (or just a few), then even the inputs of a particular model are split into shards. A shard contains prompts which will be subsequently evaluated in the context of the corpus by the given base LLM model.

Sharding strategy:

  • 1 RAG model:
    • split the inputs of the model for max 20 workers (split to 20 shards)

    • the minimum number of inputs per worker is 2 (consider process overhead)

  • >1 RAG model:
    • if the number of models is GREATER than 10, split the inputs by the RAG model i.e. the number of needed workers is equal to the number of RAG models

    • if the number of models is SMALLER or equal to 10, then use up to 20 workers to split the inputs

Parameters:
base_dir: pathlib.Path

Base directory where to store the shards - JSON representations of the test labs.

max_total_workers: int

The maximum number of workers used to split the inputs of a SINGLE model (or a lab with just a few models).

split_to_shards_by_model(base_dir: Path) Dict

Split the test lab into shards by RAG model - which is identified by corpus and base LLM model name. Shard contains prompts which will be subsequently evaluated in the context of the corpus by given base LLM model.

stats() Dict

Get the test lab statistics and cross-check.

sync_documents(doc_sync_meta: Dict[str, Any] = None, progress_callback: AbstractProgressCallbackContext | None = None, fail_on_missing_corpus: bool = False) Path

Cache test suite documents from the network to the local filesystem so that they can be used for RAG evaluation later.

Parameters:
doc_sync_meta: Dict[str, Any]

Document synchronization metadata - the key is the document locator (URL), the value is a dictionary with metadata like headers. Example:

{
    "http://example.com/doc1.txt": {
        "headers": {
            "foo-header": "FOO-VALUE"
        }
    }
}

progress_callback: Optional[progress.AbstractProgressCallbackContext]

Optional progress callback context.

fail_on_missing_corpus: bool

Fail if a test has an empty corpus; otherwise create a dummy document to enable empty RAG corpora.

to_dict() Dict
trim(max_llm_models_count=None)

Trim the test lab by keeping only the specified number of LLM models and removing all the orphans.

class h2o_sonar.utils.testing.RagTestLabPromptCache(singleton_create_key)

Bases: object

RAG test lab prompt cache (singleton) to be used across H2O Eval Studio.

ENV_VAR_H2O_SONAR_PROMPT_CACHE: str = 'H2O_SONAR_PROMPT_CACHE_ENABLED'
ENV_VAR_H2O_SONAR_PROMPT_CACHE_SIZE: str = 'H2O_SONAR_PROMPT_CACHE_SIZE'
ENV_VAR_H2O_SONAR_PROMPT_CACHE_SRC: str = 'H2O_SONAR_PROMPT_CACHE_SRC'
ENV_VAR_H2O_SONAR_PROMPT_CACHE_STATIC: str = 'H2O_SONAR_PROMPT_CACHE_STATIC'
MAX_ITEMS = 5000
classmethod cache()
reinitialize(enable_cache: bool | None = None, src_path: str | Path | None = None, src_host_connection: ConnectionConfig | None = None, max_items: int | None = None)
class h2o_sonar.utils.testing.RagTestSuiteConfig(test_cases: List[RagTestCaseConfig] | None = None, name: str = 'TestSuite', description: str = 'Test suite for RAG / LLM evaluation.')

Bases: object

RAG / LLM test suite configuration:

  • test suite (RagTestSuiteConfig) … a set of tests

  • test (RagTestConfig) … a corpus with a set of test cases

  • test case (RagTestCaseConfig) … prompt, expected output, categories, conditions, …

KEY_DESCRIPTION = 'description'
KEY_NAME = 'name'
KEY_TESTS = 'tests'
KEY_TEST_CASES = 'test_cases'
add_test_case(test_case: RagTestCaseConfig)
copy() RagTestSuiteConfig
static from_llm_dataset(llm_dataset: LlmDataset) RagTestSuiteConfig

Create a RAG test suite configuration from the LLM dataset.

static load_from_json(file_path: str | Path)
perturb(perturbators: List[PerturbatorToRun], in_place: bool = True, raised_errors: List | None = None)

Perturb the test suite prompts.

Parameters:
perturbators: List[commons.PerturbatorToRun]

Perturbators to run - includes the perturbator ID, intensity, and parameters.

in_place: bool

If True, perturb the test cases in place - there will be the same number of tests and test cases within the test suite. Otherwise, keep the original test cases and create new perturbed test cases - there will be 2x more test cases in the test suite after the perturbation (all intermediary perturbations in the case of multiple perturbator IDs are discarded).

raised_errors: Optional[List]

If None, then raise error(s) if the perturbator(s) fail; otherwise do not raise exceptions and store them in the (empty) list provided by the caller.

save_as_json(file_path: str | Path)
split(max_tests: int) List[RagTestSuiteConfig]

Split the test suite to multiple test suites so that each test suite has at most the given number of tests.

Parameters:
max_tests: int

Maximum number of tests in a test suite.

Returns:
List[RagTestSuiteConfig]

List of new test suites.

property tests: List[RagTestConfig]
to_dict() Dict
trim_tests(max_tests: int)

Trim the test suite to the given number of tests.
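A minimal test suite configuration sketch (the prompt, expected output and document URL are illustrative; associating a test case with its test/corpus via the config argument is an assumption):

    from h2o_sonar.utils import testing

    # A test (corpus) and one test case bound to it.
    test = testing.RagTestConfig(documents=["https://example.com/report-2023.pdf"])
    test_case = testing.RagTestCaseConfig(
        prompt="What was the 2023 revenue?",
        expected_output="The 2023 revenue was 10M USD.",
        categories=["finance"],
        config=test,   # association of the test case with its test is assumed
    )

    # Group the test case(s) into a test suite and persist it as JSON.
    suite = testing.RagTestSuiteConfig(
        test_cases=[test_case],
        name="RevenueSuite",
        description="Smoke-test suite for RAG / LLM evaluation.",
    )
    suite.save_as_json("revenue_suite.json")
    print(len(suite.tests), "test(s) in the suite")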

class h2o_sonar.utils.testing.TestLab

Bases: ABC

A test target / product test lab.

KEY_BASE_MODEL_NAMES = 'llm_model_names'
KEY_DATASET = 'dataset'
KEY_DESCRIPTION = 'description'
KEY_DOCS_CACHE = 'docs_cache'
KEY_MODELS = 'models'
KEY_NAME = 'name'
KEY_RAW_DATASET = 'raw_dataset'
PARALLEL_RUN = -1
SEQUENTIAL_RUN = 0
build()

Build / deploy / materialize the test lab on the host system e.g. by creating RAG’s document collections, uploading documents to the collection, …

complete_dataset(complete_context: int = 10, progress_callback: AbstractProgressCallbackContext | None = None, save_as_you_go: str | Path | None = None, parallelize: int = 0, retry_on_error: int = 2, purge_workdir: bool = True)

Complete the LLM dataset with the actual values from the host system.

Parameters:
complete_context: int

How many context text chunks to include in the resolved dataset.

progress_callback: Optional[progress.AbstractProgressCallbackContext]

Optional progress callback context.

save_as_you_go: Optional[Union[str, pathlib.Path]]

Save the dataset as JSON after each input is resolved.

parallelize: int

Complete the dataset in parallel using multiple processes. Use -1 for auto-choice of the number of workers, 0 to disable parallelization (the lab is created using sequential requests), and a positive integer to specify the number of workers.

retry_on_error: int

How many times to retry the failed LLM host requests.

purge_workdir: bool

Purge the working directory with lab shards after the completion.

save_as_json(file_path: str | Path)
to_dict() Dict

Module contents