h2o_sonar.utils package

Submodules

h2o_sonar.utils.binning module

h2o_sonar.utils.binning.build_qtile_bins(bins: List, X: DataFrame, feature: str, quantile: int)

Build quantile bins and append back to input bins list.

Parameters:
bins: List

List of bins.

X: pandas.DataFrame

Input frame to PD/ICE.

feature: str

Feature to create quantile bins for.

quantile: int

The decile to compute.

h2o_sonar.utils.binning.qbin_column(frame: Frame, column: str, logger)

Quantile bin a column in a frame and substitute it in that frame with quantile group ranges for each row.

Parameters:
frame: datatable.Frame

Frame containing the data. One of the column names must correspond to the column parameter.

column: str

Name of the column to be checked.

logger: Logger

Logger.

h2o_sonar.utils.binning.quantile_bin(frame: Frame = None, qbin_cols: List[str] | None = None, qbin_count: int = 0, varimp_list: List[str] | None = None, logger=None)

Quantile binning.

Parameters:
frame: dt.Frame

Input frame for quantile binning.

qbin_cols: List

Column(s) to use for quantile binning.

qbin_count: int

Number of top numeric variables to use from the model's variable importance list.

varimp_list: List

Variable importance list from the model.

logger: Logger

Logger.

Returns:
Tuple[List, pandas.DataFrame]

List of columns that were binned and a DataFrame with the quantile-binned columns.
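A minimal usage sketch (the column name and data below are illustrative, and passing logger=None is assumed to be acceptable):

    import datatable as dt

    from h2o_sonar.utils import binning

    # Illustrative frame with one numeric column to bin.
    frame = dt.Frame(age=[23, 31, 35, 41, 47, 52, 58, 64, 70, 77])

    # Quantile-bin the "age" column; returns the binned column names and a
    # pandas DataFrame with the quantile-binned columns.
    binned_cols, binned_df = binning.quantile_bin(
        frame=frame,
        qbin_cols=["age"],
        logger=None,
    )
    print(binned_cols)
    print(binned_df)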

h2o_sonar.utils.caching module

The caching module provides functionality to download and cache the models used for evaluation upfront, to avoid downloading them at runtime.

h2o_sonar.utils.caching.cache_all_models(logger: SonarLogger)

Cache all the models used in the Sonar package.

h2o_sonar.utils.caching.cache_baai_bge_small_en(logger: SonarLogger)

Cache the BAAI BGE small English model.

h2o_sonar.utils.caching.cache_baai_bge_small_env15(logger: SonarLogger)

Cache the BAAI BGE small English v1.5 model.

h2o_sonar.utils.caching.cache_bert_base_uncased(logger: SonarLogger)

Cache the BERT base uncased model.

h2o_sonar.utils.caching.cache_bge_m3(logger: SonarLogger)

Cache the BGE M3 model.

h2o_sonar.utils.caching.cache_detoxify_models(logger: SonarLogger)

Download and cache the Detoxify models.

h2o_sonar.utils.caching.cache_eval_studio_models(logger: SonarLogger)

Download the Eval Studio models from S3.

h2o_sonar.utils.caching.cache_gptscore_evaluator_model(logger: SonarLogger)

Cache the default model for the GPTScore evaluator.

h2o_sonar.utils.caching.cache_hkunlp_instructor(logger: SonarLogger)

Cache the hkunlp Instructor model.

h2o_sonar.utils.caching.cache_lmppl_perplexity_evaluator_model(logger: SonarLogger)

Cache the default model for the perplexity evaluator.

h2o_sonar.utils.caching.cache_nltk(logger: SonarLogger)

Cache the NLTK models.

  • Punkt - used in BLEU and perturbations

  • averaged_perceptron_tagger - used in perturbations

  • wordnet - used in perturbations

h2o_sonar.utils.caching.cache_nltk_averaged_perceptron_tagger(logger: SonarLogger | None = None)
h2o_sonar.utils.caching.cache_nltk_punkt(logger: SonarLogger | None = None)
h2o_sonar.utils.caching.cache_nltk_wordnet(logger: SonarLogger | None = None)
h2o_sonar.utils.caching.cache_summac_vitc(logger: SonarLogger)

Cache the SummaC model used for summarization.

h2o_sonar.utils.caching.cache_tiktoken_blobs(logger: SonarLogger)

Cache the TikToken blobs.

h2o_sonar.utils.caching.cache_vectara_hallucination_model(logger: SonarLogger)

Cache the Vectara hallucination evaluation model.
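A minimal pre-caching sketch (the NLTK helpers accept an optional logger and can run without one; constructing a SonarLogger for the remaining functions is not shown here):

    from h2o_sonar.utils import caching

    # Pre-download the NLTK data used by BLEU and the perturbations.
    caching.cache_nltk_punkt()
    caching.cache_nltk_averaged_perceptron_tagger()
    caching.cache_nltk_wordnet()

    # cache_all_models() expects a SonarLogger instance - how to construct one
    # depends on h2o_sonar's logging utilities and is not shown in this sketch.
    # caching.cache_all_models(logger=my_sonar_logger)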

h2o_sonar.utils.crypto module

h2o_sonar.utils.crypto.decrypt(encryption_key: str, data: str) str
h2o_sonar.utils.crypto.encrypt(encryption_key: str, data: str) str
h2o_sonar.utils.crypto.resolve_encryption_key(encryption_key: str = '') str
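A round-trip sketch (how resolve_encryption_key() resolves the key, e.g. from the environment or configuration, is an assumption of this example):

    from h2o_sonar.utils import crypto

    # Resolve/normalize the encryption key (resolution rules are assumed).
    key = crypto.resolve_encryption_key("my-secret-passphrase")

    # Encrypt a secret and decrypt it back to the original value.
    token = crypto.encrypt(encryption_key=key, data="my-api-key")
    assert crypto.decrypt(encryption_key=key, data=token) == "my-api-key"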

h2o_sonar.utils.io module

h2o_sonar.utils.io.from_list_explainers_args_json(args_str: str) Dict

Deserialize interpret.py::list_explainers() method arguments from a JSON string to a dictionary which can be used as Python method kwargs.

h2o_sonar.utils.io.from_run_interpretation_args_json(args_str: str) Dict

Deserialize interpret.py::run_interpretation() method arguments from a JSON string to a dictionary which can be used as Python method kwargs.

h2o_sonar.utils.io.load_list_explainers_args_json(file_path) Dict

Load list_explainers() keyword arguments from file.

h2o_sonar.utils.io.load_run_interpretation_args_json(file_path) Dict

Load run_interpretation() keyword arguments from file.

h2o_sonar.utils.io.to_list_explainers_args_json(experiment_types: List[str] | None = None, explanation_scopes: List[str] | None = None, model_meta: ExplainableModelMeta | None = None, keywords: List[str] | None = None, explainer_filter: List[FilterEntry] | None = None, extra_params: Dict | None = None) str

Serialize interpret.py::list_explainers() method arguments as JSON.

h2o_sonar.utils.io.to_run_interpretation_args_json(dataset: str = '', model: str = '', target_col: str = '', explainers: List[str | ExplainerToRun] | None = None, explainer_keywords: List[str] | None = None, validset: str = '', testset: str = '', use_raw_features: bool = True, used_features: List | None = None, weight_col: str = '', prediction_col: str = '', drop_cols: List | None = None, sample_num_rows: int | None = None, log_level: int = 30, results_location: str = None, persistence_type: PersistenceType = PersistenceType.file_system, run_asynchronously: bool = False, run_explainers_in_parallel: bool = False, extra_params: Dict | None = None) str

Serialize interpret.py::run_interpretation() job arguments as JSON.
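A serialization round-trip sketch (the dataset, model and column names are illustrative; the keys of the returned dictionary are assumed to mirror the argument names):

    from h2o_sonar.utils import io

    # Serialize run_interpretation() arguments as a JSON string.
    args_json = io.to_run_interpretation_args_json(
        dataset="credit_train.csv",
        model="model.pkl",
        target_col="default",
    )

    # Deserialize the JSON back into a kwargs dictionary, e.g. for
    # interpret.run_interpretation(**kwargs).
    kwargs = io.from_run_interpretation_args_json(args_json)
    print(kwargs)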

h2o_sonar.utils.normalization module

h2o_sonar.utils.normalization.normalize_importance(frame: Frame) Frame

Normalize local feature importance values to global feature importance expressed as percentages.

Parameters:
frame: datatable.Frame

Frame with local feature importance values.

Returns:
datatable.Frame

Normalized frame with global feature importance values.
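A minimal sketch (the frame layout below, one column per feature with per-row local importance values, is an assumption):

    import datatable as dt

    from h2o_sonar.utils import normalization

    # Illustrative local (per-row) feature importance values.
    local_importance = dt.Frame(feature_a=[0.2, 0.4, 0.1], feature_b=[0.1, 0.3, 0.2])

    # Normalize to global feature importance expressed as percentages.
    global_importance = normalization.normalize_importance(local_importance)
    print(global_importance)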

h2o_sonar.utils.preprocessing module

class h2o_sonar.utils.preprocessing.MultiColumnLabelEncoder(columns=None)

Bases: LabelEncoder

Wraps sklearn LabelEncoder functionality for use on multiple columns of a Pandas DataFrame.

fit(dframe)

Fit label encoder to Pandas columns. Access individual column classes via indexing self.all_classes_. Access individual column encoders via indexing self.all_encoders_.

fit_transform(dframe)

Fit label encoder and return encoded labels. Access individual column classes via indexing self.all_classes_. Access individual column encoders via indexing self.all_encoders_. Access individual column encoded labels via indexing self.all_labels_.

inverse_transform(dframe)

Transform labels back to original encoding.

set_fit_request(*, dframe: bool | None | str = '$UNCHANGED$') MultiColumnLabelEncoder

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
dframe: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for dframe parameter in fit.

Returns:
self: object

The updated object.

set_inverse_transform_request(*, dframe: bool | None | str = '$UNCHANGED$') MultiColumnLabelEncoder

Request metadata passed to the inverse_transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
dframe: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for dframe parameter in inverse_transform.

Returns:
self: object

The updated object.

set_transform_request(*, dframe: bool | None | str = '$UNCHANGED$') MultiColumnLabelEncoder

Request metadata passed to the transform method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to transform.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
dframe: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for dframe parameter in transform.

Returns:
self: object

The updated object.

transform(dframe)

Transform labels to normalized encoding.

h2o_sonar.utils.preprocessing.categorical_encoder(X: DataFrame) Tuple[DataFrame, MultiColumnLabelEncoder, List]
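A minimal sketch of encoding the categorical columns of a pandas DataFrame (the data is illustrative; the interpretation of the returned list as the encoded column names is an assumption):

    import pandas as pd

    from h2o_sonar.utils import preprocessing

    df = pd.DataFrame(
        {
            "color": ["red", "green", "red"],
            "size": ["S", "M", "L"],
            "price": [1.0, 2.0, 3.0],
        }
    )

    # Label-encode the categorical columns; the returned encoder can be reused
    # later, e.g. to transform new data or to invert the encoding.
    encoded_df, encoder, encoded_cols = preprocessing.categorical_encoder(df)
    print(encoded_cols)
    print(encoded_df.head())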

h2o_sonar.utils.problem_detection module

h2o_sonar.utils.problem_detection.get_feature_importance_problems(shap_means_dict: Dict[str, Frame], threshold: float, explainer_id: str, explainer_display_name: str) List[ProblemAndAction]

Get feature importance problems and suggested actions based on SHAP values above a specified threshold.

Parameters:
shap_means_dictDict[str, datatable.Frame]

A datatable Frame containing Shapley values.

threshold: float

Threshold for showing potential data leakage in the most important feature.

explainer_id: str

Explainer id.

explainer_display_name: str

Explainer display name.

Returns:
List[problems.ProblemAndAction]

A list of problems and actions.
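A call-shape sketch only; the layout of the Shapley-value frames and the threshold semantics below are assumptions, not the documented contract:

    import datatable as dt

    from h2o_sonar.utils import problem_detection

    # Illustrative mean |SHAP| values (keys, column names and layout are made up).
    shap_means = {"class_0": dt.Frame(feature=["income", "age"], shap=[0.92, 0.08])}

    problems = problem_detection.get_feature_importance_problems(
        shap_means_dict=shap_means,
        threshold=0.8,
        explainer_id="shapley_values_explainer",      # illustrative id
        explainer_display_name="Shapley values",      # illustrative display name
    )
    for problem_and_action in problems:
        print(problem_and_action)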

h2o_sonar.utils.perturbations module

class h2o_sonar.utils.perturbations.AbcSynAntPerturbator

Bases: ABC

PUNCTUATION = ('.', ',', '?', '!', ':', ';', "'", '"', '(', '[', '{')
TAGS = ('CD', 'JJ', 'JJR', 'JJS', 'NN', 'NNS', 'RB', 'RBR', 'RBS')
class h2o_sonar.utils.perturbations.AntonymPerturbator

Bases: Perturbator, AbcSynAntPerturbator

Perturbator that replaces words with their antonyms.

class h2o_sonar.utils.perturbations.CommaPerturbator

Bases: Perturbator

Perturbator that adds a comma after some words. It mimics a common mistake and/or typo in English writing.

class h2o_sonar.utils.perturbations.ContextualMisinformationPerturbator

Bases: Perturbator, AbcAgenticPerturbator

The contextual misinformation perturbator is an agent-based perturbator that introduces factually incorrect information within a seemingly plausible context, aiming to mislead the model into accepting false statements (an adversarial attack).

is_compatible() bool
class h2o_sonar.utils.perturbations.EncodingPerturbator

Bases: Perturbator

Perturbator that encodes the prompt in a specified encoding to steer the model to answer in that encoding. This perturbation can be used to bypass the model's safety filters (guardrails) and generate unsafe content.

See: https://substack.com/home/post/p-156004330

TYPE_ANSWER_DECODED = 'answer_decoded'
TYPE_ANSWER_ENCODED = 'answer_encoded'
TYPE_PROMPT_DECODED = 'prompt_decoded'
TYPE_PROMPT_ENCODED = 'prompt_encoded'
class h2o_sonar.utils.perturbations.EncodingPerturbatorBase16

Bases: EncodingPerturbator

Perturbator that encodes the prompt using base16 encoding to steer the model to answer in that encoding. This perturbation can be used to bypass the model's safety filters (guardrails) and generate unsafe content.

See: https://substack.com/home/post/p-156004330

class h2o_sonar.utils.perturbations.KeywordTyposCharacterPerturbator

Bases: Perturbator

class h2o_sonar.utils.perturbations.Perturbator

Bases: ABC

Base class for perturbators.

as_descriptor() PerturbatorDescriptor
classmethod config_max_items() int
property description: str
property display_name: str
classmethod is_compatible() bool
property keywords: List[str]
perturb(text: str | List[str], intensity: PerturbationIntensity = PerturbationIntensity.MEDIUM, retries: int = 15, raised_errors: List | None = None, **perturbation_params) str | List[str] | None

Perturb the input text with the given intensity.

Parameters:
text: Union[str, List[str]]

Text to perturb.

intensity: Union[PerturbationIntensity, str]

Perturbation intensity.

retries: int, optional

Number of retries if the perturbation does not yield a new text.

raised_errors: Optional[List]

If None, then raise error(s) if the perturbator(s) fail; otherwise do not raise exceptions and store them in the (empty) list provided by the caller.

classmethod perturbator_id() str
class h2o_sonar.utils.perturbations.PerturbatorDescriptor(perturbator_id: str, display_name: str = '', description: str = '', keywords: List[str] | None = None)

Bases: object

clone() PerturbatorDescriptor
dump() dict
static load(d: Dict) PerturbatorDescriptor
class h2o_sonar.utils.perturbations.PerturbatorRegistry(singleton_create_key)

Bases: object

Registry of perturbators.

are_compatible(perturbators: List[PerturbatorToRun], items: int = 0) List[PerturbatorToRun]
describe_perturbator(perturbator_id: str) PerturbatorDescriptor | None
get_perturbator(perturbator_id: str) Perturbator | None
is_compatible(perturbator_id: str, items: int = 0) bool

Is the perturbator available and compatible given metadata declarations?

list_perturbators(keywords: List[str] | None = None) List[Perturbator]

List and optionally filter perturbators by keywords - if multiple keywords are provided, the perturbator must have all of them to be included in the result.

register(perturbator: Perturbator)
classmethod registry()
class h2o_sonar.utils.perturbations.QwertyPerturbator

Bases: Perturbator

Perturbator that replaces ‘y’ with ‘z’ and vice versa.

class h2o_sonar.utils.perturbations.RandomCharacterDeletePerturbator

Bases: Perturbator

class h2o_sonar.utils.perturbations.RandomCharacterInsertPerturbator

Bases: Perturbator

class h2o_sonar.utils.perturbations.RandomCharacterReplacementPerturbator

Bases: Perturbator

class h2o_sonar.utils.perturbations.RandomOCRCharacterPerturbator

Bases: Perturbator

class h2o_sonar.utils.perturbations.SynonymPerturbator

Bases: Perturbator, AbcSynAntPerturbator

Perturbator that replaces words with their synonyms.

class h2o_sonar.utils.perturbations.WordSwapPerturbator

Bases: Perturbator

Perturbator that swaps two words in a sentence.

h2o_sonar.utils.perturbations.register_ootb_perturbators()

Register out-of-the-box perturbators.
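A minimal registry usage sketch (it is assumed the out-of-the-box perturbators are not registered yet in a fresh process):

    from h2o_sonar.utils import perturbations

    # Register the out-of-the-box perturbators and get the singleton registry.
    perturbations.register_ootb_perturbators()
    registry = perturbations.PerturbatorRegistry.registry()

    # List the available perturbators.
    for perturbator in registry.list_perturbators():
        print(perturbator.perturbator_id(), "-", perturbator.display_name)

    # Run a single perturbator on a prompt (default intensity is MEDIUM).
    qwerty = registry.get_perturbator(perturbations.QwertyPerturbator.perturbator_id())
    if qwerty is not None:
        print(qwerty.perturb("Is the sky blue?"))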

h2o_sonar.utils.sampling module

This module provides the following dataset sampling techniques:

  • StratifiedDatasetSampling: (default) Dataset sampler which implements both stratified and random sampling. The sampler automatically decides which sampling technique to use.

    • CONS:
      • stratified sampling can sample datasets up to 50% of the free RAM (sklearn sampler is the bottleneck)

    • PROS:
      • supports stratified (classification models) and random sampling (regression)

      • automatically decides the sampling method (can be overridden with a parameter)

      • random sampling is able to sample datasets bigger than the free RAM size

  • NoDatasetSampling: Sampler which is used when the user requests NO sampling. To avoid an OOM / H2O Eval Studio crash, it checks whether the dataset fits in RAM and, if it does not, raises an exception asking the user to sample or use a different dataset.

  • RandomPandasDatasetSampling: Dataset sampler which implements random sampling using Pandas.

    • CONS:
      • dataset must fit in free RAM (2x)

      • the sampler does not support stratification

    • PROS:
      • enables the use of Pandas sampler seamlessly in the H2O Eval Studio runtime

  • HeadOfDatasetSampling: Sampler which does not sample, but returns sampling_limit number of rows from the head of the dataset.

    • CONS:
      • the sampled dataset will very likely be biased (should not be used in production)

    • PROS:
      • fast

      • handles dataset of any size

      • can be used for splitting and non-functional testing

class h2o_sonar.utils.sampling.DatasetSampler(system_limit: int = 1000000000)

Bases: ABC

The sampler's child implementations provide various dataset sampling techniques.

The H2O Eval Studio container samples the dataset upfront (based on the interpretation parameters) in order to protect the process/runtime (from a crash), the system (from OOM and excessive use of resources) and the explainers from failures.

DEFAULT_CAT_NUM_THRESHOLD = 50
H2O_SONAR_LIMIT = 25000
SYSTEM_LIMIT = 1000000000
static is_dataset_fit_in_memory(dataset_path: str | Path)

Check whether the dataset file would fit to free RAM and return sizes.

Parameters:
dataset_path: str

Dataset path.

Returns:
Tuple[bool, int, int]

Return whether the dataset will fit, dataset size in bytes and RAM size in bytes.

sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = 0, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]

Sample dataset.

Parameters:
dataset: Union[datatable.Frame, str, pathlib.Path]

Dataset to be sampled as reference to the frame or a path to the file.

sampling_limit: Optional[int] = None

If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.

target_col: str = ""

Optional target column which is required for certain sampling techniques (like stratified sampling).

is_classification: bool

If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.

drop_nan_rows: bool

True to drop rows with a “not a number” value in the target_col column in case of classification-friendly techniques.

drop_1_classes: bool

True to drop rows which represent classes with cardinality equal to 1 (categories which are represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.

classes: Optional[List] = None

Optional specification of classes to be used for sampling (all valid classes will be used by default). classes values are expected to be a subset of the target column classes.

sampled_dataset_path: Union[datatable.Frame, str, pathlib.Path]

Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).

seed: int

Optional random seed for reproducible sampling.

logger

Optional logger.

Returns:
Union[datatable.Frame, str]

Path to the sampled dataset (if the path to sampled_dataset_path has been specified), datatable Frame reference otherwise.

class h2o_sonar.utils.sampling.HeadOfDatasetSampling(chunk_size: int = 1000000)

Bases: DatasetSampler

Sampler which does not sample, but returns the sampling_limit number of rows from the head of the dataset.

PRESUMPTIONS:

  • sampled dataset will fit into free RAM

CONS:

  • it is NOT correct from the data science perspective and should NOT be used in production as it does not guarantee anything - the sampled dataset will very likely be biased, i.e. it may have completely different characteristics and statistics than the original dataset

PROS:

  • it can sample a dataset of any size and therefore enables H2O Eval Studio to run on datasets of any size - when the data science aspect is not a concern, this sampler might be a good choice

  • it is relatively fast in comparison to other samplers

  • it is ideal for non-functional testing

sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]

Sample dataset.

Parameters:
dataset: Union[datatable.Frame, str, pathlib.Path]

Dataset to be sampled as reference to the frame or a path to the file.

sampling_limit: Optional[int] = None

If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.

target_col: str = ""

Optional target column which is required for certain sampling techniques (like stratified sampling).

is_classification: bool

If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.

drop_nan_rows: bool

True to drop rows with a “not a number” value in the target_col column in case of classification-friendly techniques.

drop_1_classes: bool

True to drop rows which represent classes with cardinality equal to 1 (categories which are represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.

classes: Optional[List] = None

Optional specification of classes to be used for sampling (all valid classes will be used by default). classes values are expected to be a subset of the target column classes.

sampled_dataset_path: Union[datatable.Frame, str, pathlib.Path]

Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).

seed: int

Optional random seed for reproducible sampling.

logger

Optional logger.

Returns:
Union[datatable.Frame, str]

Path to the sampled dataset (if the path to sampled_dataset_path has been specified), datatable Frame reference otherwise.

class h2o_sonar.utils.sampling.NoDatasetSampling(check_ram: bool = True)

Bases: DatasetSampler

Sampler which does NO sampling; it can check whether the dataset would fit in RAM and thus avoid an H2O Eval Studio OOM crash. Used as the default sampling method.

sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]

Sample dataset.

Parameters:
dataset: Union[datatable.Frame, str, pathlib.Path]

Dataset to be sampled as reference to the frame or a path to the file.

sampling_limit: Optional[int] = None

If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.

target_col: str = ""

Optional target column which is required for certain sampling techniques (like stratified sampling).

is_classification: bool

If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.

drop_nan_rows: bool

True to drop rows with a “not a number” value in the target_col column in case of classification-friendly techniques.

drop_1_classes: bool

True to drop rows which represent classes with cardinality equal to 1 (categories which are represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.

classes: Optional[List] = None

Optional specification of classes to be used for sampling (all valid classes will be used by default). classes values are expected to be a subset of the target column classes.

sampled_dataset_path: Union[datatable.Frame, str, pathlib.Path]

Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).

seed: int

Optional random seed for reproducible sampling.

logger

Optional logger.

Returns:
Union[datatable.Frame, str]

Path to the sampled dataset (if the path to sampled_dataset_path has been specified), datatable Frame reference otherwise.

class h2o_sonar.utils.sampling.RandomPandasDatasetSampling(logger=None)

Bases: DatasetSampler

Dataset sampler which implements random sampling using Pandas (pandas.DataFrame.sample()).

  • CONS:

    • dataset must fit in free RAM (2x)

    • the sampler does not support stratification

  • PROS:

    • enables the use of Pandas sampler seamlessly in the H2O Eval Studio runtime

sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]

Sample dataset.

Parameters:
dataset: Union[datatable.Frame, str, pathlib.Path]

Dataset to be sampled as reference to the frame or a path to the file.

sampling_limit: Optional[int] = None

If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.

target_col: str = ""

Optional target column which is required for certain sampling techniques (like stratified sampling).

is_classification: bool

If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.

drop_nan_rows: bool

True to drop rows with a “not a number” value in the target_col column in case of classification-friendly techniques.

drop_1_classes: bool

True to drop rows which represent classes with cardinality equal to 1 (categories which are represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.

classes: Optional[List] = None

Optional specification of classes to be used for sampling (all valid classes will be used by default). classes values are expected to be a subset of the target column classes.

sampled_dataset_path: Union[datatable.Frame, str, pathlib.Path]

Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).

seed: int

Optional random seed for reproducible sampling.

logger

Optional logger.

Returns:
Union[datatable.Frame, str]

Path to the sampled dataset (if the path to sampled_dataset_path has been specified), datatable Frame reference otherwise.

class h2o_sonar.utils.sampling.StratifiedDatasetSampling

Bases: DatasetSampler

Dataset sampler which implements both stratified and random sampling.

  • CONS:

    • stratified sampling can sample datasets up to 50% of the free RAM (sklearn sampler is the bottleneck)

  • PROS:

    • supports stratified (classification models) and random sampling (regression)

    • automatically decides the sampling method (can be overridden with a parameter)

    • random sampling is able to sample datasets bigger than the free RAM size

sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]

Sample dataset.

Parameters:
dataset: Union[datatable.Frame, str, pathlib.Path]

Dataset to be sampled as reference to the frame or a path to the file.

sampling_limit: Optional[int] = None

If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.

target_col: str = ""

Optional target column which is required for certain sampling techniques (like stratified sampling).

is_classification: bool

If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.

drop_nan_rows: bool

True to drop rows with a “not a number” value in the target_col column in case of classification-friendly techniques.

drop_1_classes: bool

True to drop rows which represent classes with cardinality equal to 1 (categories which are represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.

classes: Optional[List] = None

Optional specification of classes to be used for sampling (all valid classes will be used by default). classes values are expected to be a subset of the target column classes.

sampled_dataset_path: Union[datatable.Frame, str, pathlib.Path]

Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).

seed: int

Optional random seed for reproducible sampling.

logger

Optional logger.

Returns:
Union[datatable.Frame, str]

Path to the sampled dataset (if the path to sampled_dataset_path has been specified), datatable Frame reference otherwise.
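A minimal sketch of stratified sampling of a CSV dataset (the path and target column are illustrative; the meaning of the returned tuple elements follows the signature and is otherwise assumed):

    from h2o_sonar.utils import sampling

    sampler = sampling.StratifiedDatasetSampling()

    # Sample a classification dataset to at most 25,000 rows.
    was_sampled, sampled_frame, sampled_path = sampler.sample_dataset(
        dataset="credit_train.csv",   # illustrative path
        sampling_limit=25_000,
        target_col="default",         # illustrative target column
        is_classification=True,
    )
    print(was_sampled)
    print(sampled_frame)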

h2o_sonar.utils.sampling.downsample_dataset(dataset, sample_size: int | None = None, runtime_sample_size: int | None = None, target_col: str = '', is_classification: bool = False, classes: List | None = None, seed: int = 42, logger=None)

Dataset sampling method used by the explainers in Driverless AI (and potentially other container runtimes) to sample the input dataset according to their needs.

This method is not used by the local explainer as it samples the input dataset upfront to protect all the explainers. This is why this method serves as an identity function - it ensures that H2O Eval Studio's sampling will not impact Driverless AI and other host runtimes.

Parameters:
dataset

Dataset to be sampled.

sample_size: Optional[int]

Sampling limit to use.

runtime_sample_size: Optional[int]

Runtime protection - sample dataset to this size even if sample_size is bigger to protect the runtime and avoid space (memory) / time overloading.

target_col: str

Target column to be used for the sampling.

is_classification: bool

Sample for regression (False) or classification (True).

classes: Optional[List]

List of classes in case of sampling of classification model dataset.

seed: int

Sampling seed.

logger

Logger.

Returns:
Any

Sampled dataset.

h2o_sonar.utils.sanitization module

class h2o_sonar.utils.sanitization.DriverlessAiSanitizationMap(raw_names: List[str], sanitized_names: List[str])

Bases: SanitizationMap

Driverless AI model sanitization map.

A Driverless AI (AutoML) model provides its own sanitization map. The purpose of this class is to make the Driverless AI sanitization available via the standard SanitizationMap interface.

class h2o_sonar.utils.sanitization.SanitizationMap(raw_names: List[str], sanitized_names: List[str])

Bases: object

Map of original (raw) dataset column names/features to sanitized names and vice versa.

static ensure(cols, col) List[str]
static sanitize_value(values: str | List[str], special_chars: str = '|,=[]<\t\r\n:.~') str | List[str]

Method for feature value (labels, classes) sanitization. Note that column/feature name sanitization (handled by the map) typically has different requirements than value sanitization. Also note that value sanitization is one-way (original to sanitized only) and may potentially have collisions if values are sanitized in multiple calls to this method (collisions within one call of this function are resolved).

to_raw(names: str | List[str])

Sanitized name(s) to original (raw) name(s).

to_sanitized(names: str | List[str])

Original (raw) name(s) to sanitized name(s).

h2o_sonar.utils.sanitization.sanitize_frame(frame, sanitization_map: SanitizationMap | None = None)
h2o_sonar.utils.sanitization.sanitize_markdown(md_fragment: str) str

The purpose of this function is to sanitize a Markdown fragment string. It is NOT meant to sanitize whole Markdown documents, but fragments of them, where a string (to be stored in Markdown) would interact with other Markdown elements.

Parameters:
md_fragment: str

A Markdown fragment string.

Returns:
str

Sanitized Markdown string fragment without dangerous characters and links.

h2o_sonar.utils.sanitization.sanitize_names(names: str | List[str], sanitization_map: SanitizationMap | None = None)

Sanitize column/feature name(s) either using (model’s) sanitization map (if available) or using universal sanitization method.

Parameters:
names: Union[str, List[str]]

Name(s) to be sanitized.

sanitization_map: Optional[SanitizationMap]

Optional sanitization map.

h2o_sonar.utils.sanitization.sanitize_strings(strings: str | List[str], replace_with: str = '_', special_chars: str = '|,=[]<\t\r\n:.~')

Sanitize a string or a list of strings.

Parameters:
strings: Union[str, List[str]]

Strings to be sanitized.

replace_with: str

Character used to replace forbidden characters.

special_chars: str

Optional special characters to be sanitized.

Returns:
Union[str, List[str]]

Sanitized strings.
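A minimal sketch of the universal (map-less) sanitization helpers:

    from h2o_sonar.utils import sanitization

    # Sanitize column/feature names without a model-provided sanitization map.
    print(sanitization.sanitize_names(["PAY_0.amount", "credit:limit"]))

    # Sanitize arbitrary strings, replacing forbidden characters with "_".
    print(sanitization.sanitize_strings("credit [USD], 2023"))

    # Sanitize feature values (labels, classes) - one-way sanitization.
    print(sanitization.SanitizationMap.sanitize_value("yes, definitely"))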

h2o_sonar.utils.testing module

H2O Eval Studio LLM / RAG testing utilities:

  • Raw test data: a dataset which was used to create the test configuration(s).

  • Test suite: a collection of tests (see below).

  • Test: a collection of documents (corpus) along with the test cases (see below) to be run in the context of the corpus.

  • Test case: a prompt, expected output (ground truth), categories, output condition, output constraints, … and other parameters to be used for a RAG / LLM model evaluation.

  • Test lab: a set of resolved tests enriched with answers (actual answer), retrieval context, response duration and other data obtained from the conversation with a RAG / LLM.

A resolved test lab is exported to an LLM dataset which is then used as input to an evaluation - the evaluation runs a set of evaluators to rank RAG / LLM models.

class h2o_sonar.utils.testing.InMemoryLlmHostPromptCache

Bases: LlmHostPromptCache

In-memory LLM host client cache:

  • initialization:
    • (pre-built) cache can be loaded from a JSON file

  • hints:
    • cache can be saved and loaded from a JSON file

    • pre-built cache can be created from a test lab (not implemented by this class)

    • when used in the testing environment, cloud deployment, … pre-built cache can be synchronized/downloaded from S3, filesystem, …

KEY_DATA = 'cache_data'
add_test_lab(test_lab: RagTestLab)

Add the test lab to the cache.

clear()

Clear the cache.

evict(key: str)

Evict the value from the cache for the given key.

get(key: str) Dict | None

Get the cached value for the given key.

The returned dictionary can be passed to a typed result class.

get_llm_model_names(explainable_model_type: ExplainableModelType) List[str]

List all the LLM model names known to the cache.

static load_from_json(file_path: str | Path)
static load_from_url(url: str, work_dir: str | Path = '')
put(key: str, value: Dict)

Put the value to the cache for the given key.

Parameters:
key: str

Cache key.

value: Dict

Cache value - the dictionary is expected to be a JSON-serialized LlmDataset.LlmDatasetRow.

save_to_json(file_path: str | Path)
to_dict() Dict
class h2o_sonar.utils.testing.LlmHostPromptCache

Bases: ABC

Prompt cache for the LLM host clients:

  • caches: answer(s) (actual answer, duration, cost, chunks, …) for given prompt(s)

  • does NOT cache: corpus documents synchronization, RAG host server collection creation, LLM models listing

  • cache key:
    • does NOT consider a particular host (like playground.h2ogpte.h2o.ai), but rather the LLM host type ~ connection type (like H2O_GPT_E or OPENAI_RAG)

    • does NOT consider a particular chunk retrieval method

    • DOES consider corpus documents (empty for non-RAG), prompt, LLM model name, required context chunks (via chunk retrieval method - none or a method), …

    • DOES consider context (empty for RAG)

  • implementations (options): in-memory cache (testing), filesystem cache (pre-built JSON files), Redis cache (shared by EvalStudio workers), memcached cache (shared by EvalStudio workers), …

  • utilities: cache key generation, cache key hashing, static cache builder from serialized test labs (JSON)

  • purpose: NOT for production use - for testing / demos / conference hands-on sessions only; it significantly speeds up the test lab completion, avoids test lab build failures due to an unstable/slow/fragile system under test (like an h2oGPTe server), and saves costs (e.g. OpenAI server costs)

KEY_ACTUAL_OUTPUT = 'actual_output'
KEY_CONTEXT = 'context'
KEY_CORPUS = 'corpus'
KEY_COST = 'cost'
KEY_DURATION = 'actual_duration'
KEY_EXTRAS = 'extras'
KEY_INPUT = 'input'
KEY_LLM_MODEL_NAME = 'llm_model_name'
KEY_MODEL_TYPE = 'model_type'
PREFIX_KEY = 'CACHE-KEY::'
abstract clear()

Clear the cache.

abstract evict(key: str)

Evict the value from the cache for the given key.

abstract get(key: str) Dict | None

Get the cached value for the given key.

The returned dictionary can be passed to a typed result class.

static get_key(explainable_model_type: ExplainableModelType, prompt: str, llm_model_name: str, corpus: List[str] | None = None, extras: str = '') str

Generate cache key for the LLM host client cache:

  • does NOT consider particular host (like playground.h2ogpte.h2o.ai)

  • does NOT consider RAG collection

  • does NOT consider chunk retrieval method

  • suitable for both RAG hosts (empty corpus, no context) and LLM hosts

Parameters:
explainable_model_type: models.ExplainableModelType

Explainable model type.

prompt: str

Prompt for which the answer is to be cached.

llm_model_name: str

LLM model name whose answer is to be cached.

corpus: Optional[List[str]]

Corpus documents - instead of relying on the collection (whose ID and name may differ), the corpus information is used.

extras: str

Extra information - any other parameters which may make the cache key unique.

Returns:
str

Cache key.

abstract get_llm_model_names(explainable_model_type: ExplainableModelType) List[str]

List all the LLM model names known to the cache.

abstract put(key: str, value: Dict)

Put the value to the cache for the given key.

Parameters:
key: str

Cache key.

value: Dict

Cache value - the dictionary is expected to be a JSON-serialized LlmDataset.LlmDatasetRow.

static str_key_to_dict(key_dict: str) Dict
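A minimal in-memory cache sketch - a literal key and a plain dictionary are used only to keep the example self-contained; in practice the key comes from LlmHostPromptCache.get_key(...) and the value is a JSON-serialized LlmDataset.LlmDatasetRow:

    from h2o_sonar.utils import testing

    cache = testing.InMemoryLlmHostPromptCache()

    # Illustrative key; real keys are produced by LlmHostPromptCache.get_key().
    key = testing.LlmHostPromptCache.PREFIX_KEY + "example"

    cache.put(
        key,
        {
            testing.LlmHostPromptCache.KEY_INPUT: "What was the 2023 revenue?",
            testing.LlmHostPromptCache.KEY_ACTUAL_OUTPUT: "The 2023 revenue was ...",
        },
    )
    print(cache.get(key))

    # The cache can be persisted and re-loaded as JSON.
    cache.save_to_json("prompt_cache.json")
    reloaded = testing.InMemoryLlmHostPromptCache.load_from_json("prompt_cache.json")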
class h2o_sonar.utils.testing.RagTestCaseConfig(prompt: str, categories: str | List[str] = '', relationships=None, constraints=None, condition='', expected_output: str = '', config: RagTestConfig | None = None, key: str = '')

Bases: object

RAG / LLM test case configuration:

  • prompt

  • expected output

  • categories

  • condition (string expression)

  • constraints (any JSON serializable object)

KEY_CATEGORIES = 'categories'
KEY_CONDITION = 'condition'
KEY_CONSTRAINTS = 'constraints'
KEY_EXPECTED_OUTPUT = 'expected_output'
KEY_KEY = 'key'
KEY_PROMPT = 'prompt'
KEY_RELS = 'relationships'
add_relationship(relationship_type: str, target: str, target_type: str)
copy(update_key: bool = True)
perturb(perturbators: List[PerturbatorToRun], in_place: bool = True, raised_errors: List | None = None)

Perturb the prompt.

Parameters:
perturbators: List[commons.PerturbatorToRun]

Perturbators to run - includes the perturbator ID, intensity, and parameters.

in_place: bool

If True, perturb the prompt in place, otherwise create a new perturbed test case.

raised_errors: Optional[List]

If None, then raise error(s) if the perturbator(s) fail; otherwise do not raise exceptions and store them in the (empty) list provided by the caller.

to_dict()
class h2o_sonar.utils.testing.RagTestConfig(documents: List[Path | str], key: str = '')

Bases: object

RAG / LLM test configuration:

  • corpus … a set of documents (empty for LLM evaluation)

  • test cases … a set of prompts, expected outputs, categories, conditions, …

KEY_DOCUMENTS = 'documents'
KEY_KEY = 'key'
static from_dict(key: str, as_dict: Dict) RagTestConfig
to_dict()
class h2o_sonar.utils.testing.RagTestLab(llm_host_connection: ConnectionConfig, raw_dataset: LlmDataset, evaluated_models: List[ExplainableRagModel | ExplainableLlmModel] | None = None, llm_model_names: List[str] | None = None, docs_cache_dir: str | Path = '', name: str = 'TestLab', description: str = 'Test lab for RAG / LLM evaluation.', llm_host_prompt_cache: LlmHostPromptCache | None = None, use_evaluated_model_collection_id: bool = False, logger=None)

Bases: TestLab

LLMs (Large Language Model) test lab:

  • TestLab is expected to test multiple LLMs either hosted by one service (like OpenAI) or by RAG (Retrieval-augmented generation) product (like h2oGPTe) or by LLM host product (like h2oGPT).

  • TestLab gets connection configuration to the host system.

  • TestLab can compare / benchmark multiple LLM models from the same host system.

  • Resolved test labs can be merged to get an aggregated lab -> LLM dataset with multiple LLM hosts for the side-by-side evaluation by the evaluate module.

bind(collection_id: str, collection_name: str, corpus: List | None = None)

Bind ALL the test lab RAG models to collection(s) instead of building them by creating collections and uploading documents.

build(doc_sync_meta: Dict[str, Any] = None, progress_callback: AbstractProgressCallbackContext | None = None, sync_documents: bool = True, fail_on_missing_corpus: bool = False)

Build the test lab so that it can be used for the evaluation:

  • synchronize the document cache

  • create RAG’s document collections

  • upload documents (corpora) to collection(s).

Parameters:
doc_sync_meta: Dict[str, Any]

Document synchronization metadata - the key is the document locator (URL), the value is a dictionary with metadata like headers. Example:

{
    "http://example.com/doc1.txt": {
        "headers": {
            "foo-header": "FOO-VALUE"
        }
    }
}

progress_callback: Optional[progress.AbstractProgressCallbackContext]

Optional progress callback context.

sync_documents: bool

Sync documents from the network/filesystem to the lab's document cache.

fail_on_missing_corpus: bool

Fail if the test does not specify any corpus.

complete_dataset(complete_context: int = 10, progress_callback: AbstractProgressCallbackContext | None = None, save_as_you_go: str | Path | None = None, parallelize: int = 0, multi_turn: bool = False, retry_on_error: int = 2, timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None = None, include_llm_meta: bool = True, raise_on_all_tcs_fail: bool = True, purge_workdir: bool = True)

Complete the dataset with the actual values from an LLM host.

Parameters:
complete_context: int

How many context text chunks to include in the resolved dataset.

progress_callback: Optional[progress.AbstractProgressCallbackContext]

Optional progress callback context.

save_as_you_go: Optional[Union[str, pathlib.Path]]

Save the dataset as JSON after each input is resolved.

parallelize: int

Complete the dataset in parallel using multiple processes. Use -1 for auto-choice of the number of workers, 0 to disable parallelization (the lab is created using sequential requests), and a positive integer to specify the number of workers.

multi_turn: bool

Whether to use multi-turn chat with the LLM host - if enabled, then all test cases within the test are handled within a single chat session, i.e. the same context.

retry_on_error: int

How many times to retry the failed LLM host requests.

timeout_exp_backoff: Optional[TimeoutRetryExpBackoffCtx]

Optionally override the timeout which can be specified in the model host configuration ExplainableRagModel::model_cfg (and which is model host type specific) and use an exponential backoff strategy for the timeout handling. The timeout is increased on each retry by the backoff factor.

include_llm_meta: bool

Whether to include the LLM metadata, like performance statistics.

raise_on_all_tcs_fail: bool

Raise an exception if all test cases fail.

purge_workdir: bool

Purge the working directory with lab shards after the completion.

complete_from_shards(execution_dir_path: str | Path)

Complete the test lab from the shards stored on the filesystem. This method is used to load previously completed test lab shards and merge them into a single resolved dataset.

static from_eval_results(eval_results_path: str | Path, interpretation_json_path: str | Path, raw_dataset_empty: bool = True)

Create a test lab from the evaluation results archive.

Parameters:
eval_results_path: Union[str, pathlib.Path]

Path to the evaluation results JSON file.

interpretation_json_path: Union[str, pathlib.Path]

Path to the interpretation JSON file.

raw_dataset_empty: bool

Whether to create an empty raw dataset or copy the resolved dataset.

static from_llm_test_suite(llm_host_connection: ConnectionConfig, llm_test_suite: RagTestSuiteConfig, llm_model_type: ExplainableModelType, llm_model_names: List[str], work_dir: str | Path = '', llm_models_cfgs: Dict[str, List[Dict]] = None, llm_host_prompt_cache: LlmHostPromptCache | None = None) RagTestLab

Create new (unresolved) test lab from the LLM test suite configuration.

static from_rag_test_suite(rag_connection: ConnectionConfig, rag_test_suite: RagTestSuiteConfig, rag_model_type: ExplainableModelType, llm_model_names: List[str], docs_cache_dir: str | Path, rag_models_cfgs: Dict[str, List[Dict]] = None, predefined_collection_id: str | Dict | None = None, llm_host_prompt_cache: LlmHostPromptCache | None = None) RagTestLab

Create new (unresolved) test lab from the RAG test suite configuration.

The test lab is built as follows:

  • all LLM model names are hosted by the SAME system described by the RAG connection, accessed by a client

  • LLM model name may have associated a list of custom client configurations

  • RAG test suite is used to build the test lab: the RAG test suite has test cases that are grouped into tests; test cases within a test share the SAME corpus, while different tests may have different corpora

  • an explainable RAG model is created for EACH combination of LLM model name + client configuration + corpus (not test), i.e. as the Cartesian product: LLM model names x client configurations x corpora = explainable RAG models

Summary of the explainable RAG models creation:

  • for each LLM model name
    • for each client configuration of that LLM model name
      • for each test
        • create explainable RAG model

Parameters:
rag_connection: h2o_sonar_config.ConnectionConfig

Connection to the RAG system.

rag_test_suite: RagTestSuiteConfig

RAG test suite configuration.

rag_model_type: ExplainableModelType

Type of the explainable model hosted by the RAG system.

llm_model_names: List[str]

List of LLM model names to be used to build the test lab and to be subsequently evaluated and compared. The following special names can be used with the h2oGPTe model host: "auto" to use the best available model chosen by h2oGPTe, "" (empty string) to inherit the configuration from the h2oGPTe collection, and None to inherit the configuration from the h2oGPTe collection.

rag_models_cfgs: Dict[str, List[Dict]]

Dictionary with LLM model name as key and list of client configurations as values. Each client configuration is a dictionary with the client configuration parameters which can be created by the client factory using client.config_factory().

docs_cache_dir: Union[str, pathlib.Path]

Directory to store the documents cache.

predefined_collection_id: Optional[Union[str, Dict]]

Predefined collection ID for the RAG model. If provided as a string, it is used as the collection ID for all the test cases. If provided as a dictionary, it is used as a mapping of the test case keys to the collection IDs.

llm_host_prompt_cache: Optional[LlmHostPromptCache]

Cache for the LLM host client.

Returns:
RagTestLab

New RAG test lab.

get_evaluated_model_for_key(model_key: str)

Get LLM model name for the evaluated model.

insight_internal_llm_errors(report_dir: str | Path = '', src: str = 'stats') Tuple[Dict, str]

Create Markdown report with the internal LLM errors.

Parameters:
report_dir: Union[str, pathlib.Path]

Directory to save the reports to as JSON and Markdown.

src: str

Source of the errors: stats (default) or dataset (analysis of the answer texts).

integrity_check()
static load_from_json(llm_host_connection: ConnectionConfig, file_path: str | Path, docs_cache_dir: str | Path = '', datatable_format: bool = False) RagTestLab
merge(other_test_lab: RagTestLab, other_llm_prefix: str = 'Other')

Merge another test lab into this one.

purge()

Purge the test lab by deleting all the created collections/assistants and uploaded documents.

split_to_shards(base_dir: Path, max_total_workers: int = 20) Dict

Split the test lab into shards by RAG model (which is identified by corpus and base LLM model name). If there is one RAG model (or just a few), then even the inputs of a particular model are split into shards. A shard contains prompts which will be subsequently evaluated in the context of the corpus by the given base LLM model.

Sharding strategy:

  • 1 RAG model:
    • split the inputs of the model for max 20 workers (split to 20 shards)

    • the minimum number of inputs per worker is 2 (consider process overhead)

  • >1 RAG model:
    • if the number of models is GREATER than 10, split the inputs by the RAG model i.e. the number of needed workers is equal to the number of RAG models

    • if the number of models is SMALLER or equal to 10, then use up to 20 workers to split the inputs

Parameters:
base_dir: pathlib.Path

Base directory where to store the shards - JSON representations of the test labs.

max_total_workers: int

The maximum number of workers used to split the inputs of a SINGLE model (or a lab with just a few models).

split_to_shards_by_model(base_dir: Path) Dict

Split the test lab into shards by RAG model - which is identified by corpus and base LLM model name. Shard contains prompts which will be subsequently evaluated in the context of the corpus by given base LLM model.

stats() Dict

Get the test lab statistics and cross-check.

sync_documents(doc_sync_meta: Dict[str, Any] = None, progress_callback: AbstractProgressCallbackContext | None = None, fail_on_missing_corpus: bool = False) Path

Cache test suite documents from the network to the local filesystem so that they can be used for RAG evaluation later.

Parameters:
doc_sync_meta: Dict[str, Any]

Document synchronization metadata - the key is the document locator (URL), the value is a dictionary with metadata like headers. Example:

{
    "http://example.com/doc1.txt": {
        "headers": {
            "foo-header": "FOO-VALUE"
        }
    }
}

progress_callback: Optional[progress.AbstractProgressCallbackContext]

Optional progress callback context.

fail_on_missing_corpus: bool

Fail if a test has an empty corpus; otherwise create a dummy document to enable empty RAG corpora.

to_dict() Dict
trim(max_llm_models_count=None)

Trim the test lab by keeping only the specified number of LLM models and removing all the orphans.

class h2o_sonar.utils.testing.RagTestLabPromptCache(singleton_create_key)

Bases: object

RAG test lab prompt cache (singleton) to be used across H2O Eval Studio.

ENV_VAR_H2O_SONAR_PROMPT_CACHE: str = 'H2O_SONAR_PROMPT_CACHE_ENABLED'
ENV_VAR_H2O_SONAR_PROMPT_CACHE_SIZE: str = 'H2O_SONAR_PROMPT_CACHE_SIZE'
ENV_VAR_H2O_SONAR_PROMPT_CACHE_SRC: str = 'H2O_SONAR_PROMPT_CACHE_SRC'
ENV_VAR_H2O_SONAR_PROMPT_CACHE_STATIC: str = 'H2O_SONAR_PROMPT_CACHE_STATIC'
MAX_ITEMS = 5000
classmethod cache()
reinitialize(enable_cache: bool | None = None, src_path: str | Path | None = None, src_host_connection: ConnectionConfig | None = None, max_items: int | None = None)
class h2o_sonar.utils.testing.RagTestSuiteConfig(test_cases: List[RagTestCaseConfig] | None = None, name: str = 'TestSuite', description: str = 'Test suite for RAG / LLM evaluation.')

Bases: object

RAG / LLM test suite configuration:

  • test suite (RagTestSuiteConfig) … a set of tests

  • test (RagTestConfig) … a corpus with a set of test cases

  • test case (RagTestCaseConfig) … prompt, expected output, categories, conditions, …

KEY_DESCRIPTION = 'description'
KEY_NAME = 'name'
KEY_TESTS = 'tests'
KEY_TEST_CASES = 'test_cases'
add_test_case(test_case: RagTestCaseConfig)
copy() RagTestSuiteConfig
static from_llm_dataset(llm_dataset: LlmDataset) RagTestSuiteConfig

Create a RAG test suite configuration from the LLM dataset.

static load_from_json(file_path: str | Path)
perturb(perturbators: List[PerturbatorToRun], in_place: bool = True, raised_errors: List | None = None)

Perturb the test suite prompts.

Parameters:
perturbators: List[commons.PerturbatorToRun]

Perturbators to run - includes the perturbator ID, intensity, and parameters.

in_place: bool

If True, perturb the test cases in place - there will be the same number of tests and test cases within the test suite. Otherwise, keep the original test cases and create new perturbed test cases - there will be 2x more test cases in the test suite after the perturbation (all intermediary perturbations in the case of multiple perturbator IDs are discarded).

raised_errors: Optional[List]

If None, then raise error(s) if the perturbator(s) fail; otherwise do not raise exceptions and store them in the (empty) list provided by the caller.

save_as_json(file_path: str | Path)
split(max_tests: int) List[RagTestSuiteConfig]

Split the test suite to multiple test suites so that each test suite has at most the given number of tests.

Parameters:
max_tests: int

Maximum number of tests in a test suite.

Returns:
List[RagTestSuiteConfig]

List of new test suites.

property tests: List[RagTestConfig]
to_dict() Dict
trim_tests(max_tests: int)

Trim the test suite to the given number of tests.
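A minimal test suite configuration sketch (the prompt, expected output and document URL are illustrative; associating a test case with its test/corpus via the config argument is an assumption):

    from h2o_sonar.utils import testing

    # A test (corpus) and one test case bound to it.
    test = testing.RagTestConfig(documents=["https://example.com/report-2023.pdf"])
    test_case = testing.RagTestCaseConfig(
        prompt="What was the 2023 revenue?",
        expected_output="The 2023 revenue was 10M USD.",
        categories=["finance"],
        config=test,   # association of the test case with its test is assumed
    )

    # Group the test case(s) into a test suite and persist it as JSON.
    suite = testing.RagTestSuiteConfig(
        test_cases=[test_case],
        name="RevenueSuite",
        description="Smoke-test suite for RAG / LLM evaluation.",
    )
    suite.save_as_json("revenue_suite.json")
    print(len(suite.tests), "test(s) in the suite")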

class h2o_sonar.utils.testing.TestLab

Bases: ABC

A test target / product test lab.

KEY_BASE_MODEL_NAMES = 'llm_model_names'
KEY_DATASET = 'dataset'
KEY_DESCRIPTION = 'description'
KEY_DOCS_CACHE = 'docs_cache'
KEY_MODELS = 'models'
KEY_NAME = 'name'
KEY_RAW_DATASET = 'raw_dataset'
PARALLEL_RUN = -1
SEQUENTIAL_RUN = 0
build()

Build / deploy / materialize the test lab on the host system e.g. by creating RAG’s document collections, uploading documents to the collection, …

complete_dataset(complete_context: int = 10, progress_callback: AbstractProgressCallbackContext | None = None, save_as_you_go: str | Path | None = None, parallelize: int = 0, retry_on_error: int = 2, purge_workdir: bool = True)

Complete the LLM dataset with the actual values from the host system.

Parameters:
complete_context: int

How many context text chunks to include in the resolved dataset.

progress_callback: Optional[progress.AbstractProgressCallbackContext]

Optional progress callback context.

save_as_you_go: Optional[Union[str, pathlib.Path]]

Save the dataset as JSON after each input is resolved.

parallelize: int

Complete the dataset in parallel using multiple processes. Use -1 for auto-choice of the number of workers, 0 to disable parallelization (the lab is created using sequential requests), and a positive integer to specify the number of workers.

retry_on_error: int

How many times to retry the failed LLM host requests.

purge_workdir: bool

Purge the working directory with lab shards after the completion.

save_as_json(file_path: str | Path)
to_dict() Dict

Module contents