h2o_sonar.utils package
Submodules
h2o_sonar.utils.binning module
- h2o_sonar.utils.binning.build_qtile_bins(bins: List, X: DataFrame, feature: str, quantile: int)
Build quantile bins and append them back to the input bins list.
- Parameters:
- bins : List
List of bins.
- X : pandas.DataFrame
Input frame for PD/ICE.
- feature : str
Feature to create quantile bins for.
- quantile : int
The quantile to compute.
- h2o_sonar.utils.binning.qbin_column(frame: Frame, column: str, logger)
Quantile bin a column in a frame and substitute it in that frame with quantile group ranges for each row.
- Parameters:
- frame : datatable.Frame
Frame containing the data. One of the column names must correspond to the column parameter.
- column : str
Name of the column to be binned.
- logger : Logger
Logger.
- h2o_sonar.utils.binning.quantile_bin(frame: Frame = None, qbin_cols: List[str] | None = None, qbin_count: int = 0, varimp_list: List[str] | None = None, logger=None)
Quantile binning.
- Parameters:
- frame : dt.Frame
Input frame for quantile binning.
- qbin_cols : List
Column(s) to use for quantile binning.
- qbin_count : int
Number of top numeric variables to use from the model's variable importance list.
- varimp_list : List
Variable importance list from the model.
- logger : Logger
Logger.
- Returns:
- Tuple[list, pandas.DataFrame]
List of columns that were binned and a DataFrame with the quantile-binned columns.
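Example - a minimal sketch based only on the signature and parameter descriptions above; the frame contents and column names are illustrative:

    import datatable as dt
    from h2o_sonar.utils import binning

    frame = dt.Frame({
        "age": [23, 45, 31, 52, 37, 29, 41, 58],
        "income": [40, 85, 52, 98, 61, 47, 73, 88],
    })

    # Quantile-bin the explicitly listed columns; qbin_count=0 means no
    # extra columns are taken from the model's variable importance list.
    binned_cols, binned_df = binning.quantile_bin(
        frame=frame,
        qbin_cols=["age", "income"],
        qbin_count=0,
        varimp_list=None,
    )
    print(binned_cols)  # names of the columns that were binned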
h2o_sonar.utils.caching module
The caching module provides functionality to download and cache the models used for evaluation upfront, to avoid downloading the models at runtime.
- h2o_sonar.utils.caching.cache_all_models(logger: SonarLogger)
Cache all the models used in the Sonar package.
- h2o_sonar.utils.caching.cache_baai_bge_small_en(logger: SonarLogger)
Cache the BAAI BGE small EN model.
- h2o_sonar.utils.caching.cache_baai_bge_small_env15(logger: SonarLogger)
Cache the BAAI BGE small EN v1.5 model.
- h2o_sonar.utils.caching.cache_bert_base_uncased(logger: SonarLogger)
Cache the BERT base uncased model.
- h2o_sonar.utils.caching.cache_bge_m3(logger: SonarLogger)
Cache the BGE M3 model.
- h2o_sonar.utils.caching.cache_detoxify_models(logger: SonarLogger)
Download and cache the Detoxify models.
- h2o_sonar.utils.caching.cache_eval_studio_models(logger: SonarLogger)
Download the Eval Studio models from S3.
- h2o_sonar.utils.caching.cache_gptscore_evaluator_model(logger: SonarLogger)
Cache the default model for the GPTScore evaluator.
- h2o_sonar.utils.caching.cache_hkunlp_instructor(logger: SonarLogger)
Cache the hkunlp Instructor model.
- h2o_sonar.utils.caching.cache_lmppl_perplexity_evaluator_model(logger: SonarLogger)
Cache the default model for the perplexity evaluator.
- h2o_sonar.utils.caching.cache_nltk(logger: SonarLogger)
Cache the NLTK models.
Punkt - used in BLEU and perturbations
averaged_perceptron_tagger - used in perturbations
wordnet - used in perturbations
- h2o_sonar.utils.caching.cache_nltk_averaged_perceptron_tagger(logger: SonarLogger | None = None)
- h2o_sonar.utils.caching.cache_nltk_punkt(logger: SonarLogger | None = None)
- h2o_sonar.utils.caching.cache_nltk_wordnet(logger: SonarLogger | None = None)
- h2o_sonar.utils.caching.cache_summac_vitc(logger: SonarLogger)
Cache the SummaC (VitC) model used for summarization evaluation.
- h2o_sonar.utils.caching.cache_tiktoken_blobs(logger: SonarLogger)
Cache the TikToken blobs.
- h2o_sonar.utils.caching.cache_vectara_hallucination_model(logger: SonarLogger)
Cache the Vectara hallucination evaluation model.
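A warm-up sketch for the caching helpers. The NLTK helpers accept logger=None and can be called standalone; the remaining helpers require a SonarLogger, whose construction is runtime specific and is therefore left as an assumption here:

    from h2o_sonar.utils import caching

    # Standalone warm-up of the NLTK resources (logger is optional here).
    caching.cache_nltk_punkt()
    caching.cache_nltk_averaged_perceptron_tagger()
    caching.cache_nltk_wordnet()

    def warm_up_all(logger):
        # logger: a SonarLogger provided by the hosting runtime (assumption).
        caching.cache_all_models(logger)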
h2o_sonar.utils.crypto module
- h2o_sonar.utils.crypto.decrypt(encryption_key: str, data: str) str
- h2o_sonar.utils.crypto.encrypt(encryption_key: str, data: str) str
- h2o_sonar.utils.crypto.resolve_encryption_key(encryption_key: str = '') str
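A round-trip sketch based on the three signatures above; the key material and plaintext are illustrative, and the fallback behavior of resolve_encryption_key() for an empty argument is an assumption:

    from h2o_sonar.utils import crypto

    # Resolve the key (assumed to fall back to a configured key when '' is passed).
    key = crypto.resolve_encryption_key("my-secret-key")

    token = crypto.encrypt(key, "sensitive configuration value")
    assert crypto.decrypt(key, token) == "sensitive configuration value"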
h2o_sonar.utils.io module
- h2o_sonar.utils.io.from_list_explainers_args_json(args_str: str) Dict
Deserialize interpret.py::list_explainers() method arguments from a JSON string to a dictionary which can be used as Python method kwargs.
- h2o_sonar.utils.io.from_run_interpretation_args_json(args_str: str) Dict
Deserialize interpret.py::run_interpretation() method arguments from a JSON string to a dictionary which can be used as Python method kwargs.
- h2o_sonar.utils.io.load_list_explainers_args_json(file_path) Dict
Load list_explainers() keyword arguments from a file.
- h2o_sonar.utils.io.load_run_interpretation_args_json(file_path) Dict
Load run_interpretation() keyword arguments from a file.
- h2o_sonar.utils.io.to_list_explainers_args_json(experiment_types: List[str] | None = None, explanation_scopes: List[str] | None = None, model_meta: ExplainableModelMeta | None = None, keywords: List[str] | None = None, explainer_filter: List[FilterEntry] | None = None, extra_params: Dict | None = None) str
Serialize interpret.py::list_explainers() method arguments as JSON.
- h2o_sonar.utils.io.to_run_interpretation_args_json(dataset: str = '', model: str = '', target_col: str = '', explainers: List[str | ExplainerToRun] | None = None, explainer_keywords: List[str] | None = None, validset: str = '', testset: str = '', use_raw_features: bool = True, used_features: List | None = None, weight_col: str = '', prediction_col: str = '', drop_cols: List | None = None, sample_num_rows: int | None = None, log_level: int = 30, results_location: str = None, persistence_type: PersistenceType = PersistenceType.file_system, run_asynchronously: bool = False, run_explainers_in_parallel: bool = False, extra_params: Dict | None = None) str
Serialize interpret.py::run_interpretation() job arguments as JSON.
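A sketch of the serialize/deserialize round trip for run_interpretation() arguments; the dataset, model, and target column values are illustrative:

    from h2o_sonar.utils import io as sonar_io

    args_json = sonar_io.to_run_interpretation_args_json(
        dataset="data/train.csv",
        model="models/model.pickle",
        target_col="label",
        explainers=None,
    )
    kwargs = sonar_io.from_run_interpretation_args_json(args_json)
    # kwargs is a dictionary usable as: interpret.run_interpretation(**kwargs)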
h2o_sonar.utils.normalization module
- h2o_sonar.utils.normalization.normalize_importance(frame: Frame) Frame
Normalize local feature importance values to global as percentage.
- Parameters:
- frame : datatable.Frame
Frame with local feature importance values.
- Returns:
- datatable.Frame
Normalized frame with global feature importance values.
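A sketch of normalize_importance(); the exact layout expected of the local-importance frame is an assumption (one column per feature, one row per explained instance):

    import datatable as dt
    from h2o_sonar.utils import normalization

    local_importance = dt.Frame({
        "age": [0.10, 0.40, 0.20],
        "income": [0.30, 0.10, 0.50],
    })

    # Aggregate local values into global, percentage-scaled importances.
    global_importance = normalization.normalize_importance(local_importance)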
h2o_sonar.utils.preprocessing module
- class h2o_sonar.utils.preprocessing.MultiColumnLabelEncoder(columns=None)
Bases:
LabelEncoder
Wraps sklearn LabelEncoder functionality for use on multiple columns of a pandas DataFrame.
- fit(dframe)
Fit the label encoder to pandas columns. Access individual column classes via indexing self.all_classes_. Access individual column encoders via indexing self.all_encoders_.
- fit_transform(dframe)
Fit the label encoder and return encoded labels. Access individual column classes via indexing self.all_classes_, individual column encoders via self.all_encoders_, and individual column encoded labels via self.all_labels_.
- inverse_transform(dframe)
Transform labels back to original encoding.
- set_fit_request(*, dframe: bool | None | str = '$UNCHANGED$') MultiColumnLabelEncoder
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- dframe : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the dframe parameter in fit.
- Returns:
- self : object
The updated object.
- set_inverse_transform_request(*, dframe: bool | None | str = '$UNCHANGED$') MultiColumnLabelEncoder
Request metadata passed to the inverse_transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to inverse_transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to inverse_transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- dframe : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the dframe parameter in inverse_transform.
- Returns:
- self : object
The updated object.
- set_transform_request(*, dframe: bool | None | str = '$UNCHANGED$') MultiColumnLabelEncoder
Request metadata passed to the transform method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to transform if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to transform.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
Note: This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters:
- dframe : str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for the dframe parameter in transform.
- Returns:
- self : object
The updated object.
- transform(dframe)
Transform labels to normalized encoding.
- h2o_sonar.utils.preprocessing.categorical_encoder(X: DataFrame) Tuple[DataFrame, MultiColumnLabelEncoder, List]
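A sketch of MultiColumnLabelEncoder on a small pandas DataFrame, using only the methods documented above; column names and values are illustrative:

    import pandas as pd
    from h2o_sonar.utils.preprocessing import MultiColumnLabelEncoder

    df = pd.DataFrame({
        "color": ["red", "blue", "red", "green"],
        "size": ["S", "M", "S", "L"],
    })

    encoder = MultiColumnLabelEncoder(columns=["color", "size"])
    encoded = encoder.fit_transform(df)            # integer-encoded columns
    restored = encoder.inverse_transform(encoded)  # back to the original labels

    print(encoder.all_classes_)  # per-column class arrays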
h2o_sonar.utils.problem_detection module
- h2o_sonar.utils.problem_detection.get_feature_importance_problems(shap_means_dict: Dict[str, Frame], threshold: float, explainer_id: str, explainer_display_name: str) List[ProblemAndAction]
Get feature importance problems and suggested actions based on SHAP values above a specified threshold.
- Parameters:
- shap_means_dict : Dict[str, datatable.Frame]
Dictionary of datatable Frames containing mean Shapley values.
- threshold : float
Threshold for flagging potential data leakage in the most important feature.
- explainer_id : str
Explainer ID.
- explainer_display_name : str
Explainer display name.
- Returns:
- List[problems.ProblemAndAction]
A list of problems and suggested actions.
h2o_sonar.utils.perturbations module
- class h2o_sonar.utils.perturbations.AbcSynAntPerturbator
Bases:
ABC
- PUNCTUATION = ('.', ',', '?', '!', ':', ';', "'", '"', '(', '[', '{')
- TAGS = ('CD', 'JJ', 'JJR', 'JJS', 'NN', 'NNS', 'RB', 'RBR', 'RBS')
- class h2o_sonar.utils.perturbations.AntonymPerturbator
Bases:
Perturbator
,AbcSynAntPerturbator
Perturbator that replaces words with their antonyms.
- class h2o_sonar.utils.perturbations.CommaPerturbator
Bases:
Perturbator
Perturbator that adds a comma after some words. It mimics a common punctuation mistake or typo in English writing.
- class h2o_sonar.utils.perturbations.ContextualMisinformationPerturbator
Bases:
Perturbator
,AbcAgenticPerturbator
The contextual misinformation perturbator is an agent-based perturbator that introduces factually incorrect information within a seemingly plausible context, aiming to mislead the model into accepting false statements (an adversarial attack).
- is_compatible() bool
- class h2o_sonar.utils.perturbations.EncodingPerturbator
Bases:
Perturbator
Perturbator that encodes the prompt using a specified encoding to steer the model to answer in that encoding. This perturbation can be used to bypass the model's safety filters (guardrails) and generate unsafe content.
See: https://substack.com/home/post/p-156004330
- TYPE_ANSWER_DECODED = 'answer_decoded'
- TYPE_ANSWER_ENCODED = 'answer_encoded'
- TYPE_PROMPT_DECODED = 'prompt_decoded'
- TYPE_PROMPT_ENCODED = 'prompt_encoded'
- class h2o_sonar.utils.perturbations.EncodingPerturbatorBase16
Bases:
EncodingPerturbator
Perturbator that encodes the prompt using Base16 encoding to steer the model to answer in a specified encoding. This perturbation can be used to bypass the model's safety filters (guardrails) and generate unsafe content.
- class h2o_sonar.utils.perturbations.KeywordTyposCharacterPerturbator
Bases:
Perturbator
- class h2o_sonar.utils.perturbations.Perturbator
Bases:
ABC
Base class for perturbators.
- as_descriptor() PerturbatorDescriptor
- classmethod config_max_items() int
- property description: str
- property display_name: str
- classmethod is_compatible() bool
- property keywords: List[str]
- perturb(text: str | List[str], intensity: PerturbationIntensity = PerturbationIntensity.MEDIUM, retries: int = 15, raised_errors: List | None = None, **perturbation_params) str | List[str] | None
Perturb the input text with the given intensity.
- Parameters:
- text : Union[str, List[str]]
Text to perturb.
- intensity : Union[PerturbationIntensity, str]
Perturbation intensity.
- retries : int, optional
Number of retries if the perturbation does not yield a new text.
- raised_errors : Optional[List]
If None, then raise error(s) if the perturbator(s) fail; otherwise do not raise exceptions and store them in the (empty) list provided by the caller.
- classmethod perturbator_id() str
- class h2o_sonar.utils.perturbations.PerturbatorDescriptor(perturbator_id: str, display_name: str = '', description: str = '', keywords: List[str] | None = None)
Bases:
object
- clone() PerturbatorDescriptor
- dump() dict
- static load(d: Dict) PerturbatorDescriptor
- class h2o_sonar.utils.perturbations.PerturbatorRegistry(singleton_create_key)
Bases:
object
Registry of perturbators.
- are_compatible(perturbators: List[PerturbatorToRun], items: int = 0) List[PerturbatorToRun]
- describe_perturbator(perturbator_id: str) PerturbatorDescriptor | None
- get_perturbator(perturbator_id: str) Perturbator | None
- is_compatible(perturbator_id: str, items: int = 0) bool
Is the perturbator available and compatible given metadata declarations?
- list_perturbators(keywords: List[str] | None = None) List[Perturbator]
List and optionally filter perturbators by keywords - if multiple keywords are provided, the perturbator must have all of them to be included in the result.
- register(perturbator: Perturbator)
- classmethod registry()
- class h2o_sonar.utils.perturbations.QwertyPerturbator
Bases:
Perturbator
Perturbator that replaces ‘y’ with ‘z’ and vice versa.
- class h2o_sonar.utils.perturbations.RandomCharacterDeletePerturbator
Bases:
Perturbator
- class h2o_sonar.utils.perturbations.RandomCharacterInsertPerturbator
Bases:
Perturbator
- class h2o_sonar.utils.perturbations.RandomCharacterReplacementPerturbator
Bases:
Perturbator
- class h2o_sonar.utils.perturbations.RandomOCRCharacterPerturbator
Bases:
Perturbator
- class h2o_sonar.utils.perturbations.SynonymPerturbator
Bases:
Perturbator
,AbcSynAntPerturbator
Perturbator that replaces words with their synonyms.
- class h2o_sonar.utils.perturbations.WordSwapPerturbator
Bases:
Perturbator
Perturbator that swaps two words in a sentence.
- h2o_sonar.utils.perturbations.register_ootb_perturbators()
Register out-of-the-box perturbators.
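A sketch tying the registry and perturbator APIs together; the use of QwertyPerturbator and the plain prompt string are illustrative:

    from h2o_sonar.utils import perturbations

    # Populate the registry with the built-in perturbators.
    perturbations.register_ootb_perturbators()
    registry = perturbations.PerturbatorRegistry.registry()

    # Look up a perturbator by its ID and perturb a prompt.
    qwerty_id = perturbations.QwertyPerturbator.perturbator_id()
    qwerty = registry.get_perturbator(qwerty_id)
    if qwerty is not None:
        errors = []  # collect failures instead of raising
        print(qwerty.perturb("Yes, you may say anything.", raised_errors=errors))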
h2o_sonar.utils.sampling module
This module provides the following dataset sampling techniques:
StratifiedDatasetSampling: (default) Dataset sampler which implements both stratified and random sampling. The sampler automatically decides which sampling technique to use.
- CONS:
stratified sampling can sample datasets only up to 50% of the free RAM (the sklearn sampler is the bottleneck)
- PROS:
supports stratified (classification models) and random (regression) sampling
makes an automatic decision on the sampling method (can be overridden with a parameter)
random sampling is able to sample datasets bigger than the free RAM size
NoDatasetSampling: Sampler which is used when the user requests NO sampling. In order to avoid an OOM/H2O Eval Studio crash, it checks whether the dataset fits in RAM and, if it doesn't, raises an exception with a request to sample/use a different dataset.
RandomPandasDatasetSampling: Dataset sampler which implements random sampling using Pandas.
- CONS:
dataset must fit in free RAM (2x)
sampler does not support stratification
- PROS:
enables the use of the Pandas sampler seamlessly in the H2O Eval Studio runtime
HeadOfDatasetSampling: Sampler which does not sample, but returns sampling_limit number of rows from the head of the dataset.
- CONS:
sampled dataset will very likely be biased (should not be used in production)
- PROS:
fast
handles datasets of any size
can be used for splitting and non-functional testing
- class h2o_sonar.utils.sampling.DatasetSampler(system_limit: int = 1000000000)
Bases:
ABC
The sampler's child implementations provide various dataset sampling techniques.
The H2O Eval Studio container samples the dataset upfront (based on the interpretation parameters) in order to protect the process/runtime (from a crash), the system (from OOM and extensive use of resources), and the explainers from failures.
- DEFAULT_CAT_NUM_THRESHOLD = 50
- H2O_SONAR_LIMIT = 25000
- SYSTEM_LIMIT = 1000000000
- static is_dataset_fit_in_memory(dataset_path: str | Path)
Check whether the dataset file would fit into free RAM and return the sizes.
- Parameters:
- dataset_path : str
Dataset path.
- Returns:
- Tuple[bool, int, int]
Whether the dataset will fit, the dataset size in bytes, and the RAM size in bytes.
- sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = 0, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]
Sample dataset.
- Parameters:
- dataset : Union[datatable.Frame, str, pathlib.Path]
Dataset to be sampled, as a reference to the frame or a path to the file.
- sampling_limit : Optional[int] = None
If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.
- target_col : str = ""
Optional target column which is required for certain sampling techniques (like stratified sampling).
- is_classification : bool
If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.
- drop_nan_rows : bool
True to drop rows with a "not a number" value in the target_col column in case of classification-friendly techniques.
- drop_1_classes : bool
True to drop rows which represent classes with cardinality equal to 1 (categories represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.
- classes : Optional[List] = None
Optional specification of classes to be used for sampling (all valid classes are used by default). The classes values are expected to be a subset of the target column classes.
- sampled_dataset_path : Union[datatable.Frame, str, pathlib.Path]
Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).
- seed : int
Optional random seed for reproducible sampling.
- logger
Optional logger.
- Returns:
- Union[datatable.Frame, str]
Path to the sampled dataset (if sampled_dataset_path has been specified), datatable Frame reference otherwise.
- class h2o_sonar.utils.sampling.HeadOfDatasetSampling(chunk_size: int = 1000000)
Bases:
DatasetSampler
Sampler which does not sample, but returns sampling_limit number of rows from the head of the dataset.
PRESUMPTIONS:
sampled dataset will fit into free RAM
CONS:
it is NOT correct from the data science perspective and should NOT be used in production as it does not guarantee anything - the sampled dataset will very likely be biased, i.e. it may have completely different characteristics and statistics than the original dataset
PROS:
it can sample a dataset of any size, therefore it enables H2O Eval Studio to run on datasets of any size - in case the data science aspect is not a problem, this sampler might be a good choice
it is relatively fast in comparison to other samplers
it is ideal for non-functional testing
- sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]
Sample dataset.
- Parameters:
- dataset : Union[datatable.Frame, str, pathlib.Path]
Dataset to be sampled, as a reference to the frame or a path to the file.
- sampling_limit : Optional[int] = None
If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.
- target_col : str = ""
Optional target column which is required for certain sampling techniques (like stratified sampling).
- is_classification : bool
If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.
- drop_nan_rows : bool
True to drop rows with a "not a number" value in the target_col column in case of classification-friendly techniques.
- drop_1_classes : bool
True to drop rows which represent classes with cardinality equal to 1 (categories represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.
- classes : Optional[List] = None
Optional specification of classes to be used for sampling (all valid classes are used by default). The classes values are expected to be a subset of the target column classes.
- sampled_dataset_path : Union[datatable.Frame, str, pathlib.Path]
Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).
- seed : int
Optional random seed for reproducible sampling.
- logger
Optional logger.
- Returns:
- Union[datatable.Frame, str]
Path to the sampled dataset (if sampled_dataset_path has been specified), datatable Frame reference otherwise.
- class h2o_sonar.utils.sampling.NoDatasetSampling(check_ram: bool = True)
Bases:
DatasetSampler
Sampler which does NO sampling; it can check whether the dataset would fit in RAM and thus avoid an H2O Eval Studio OOM crash. Used as the default sampling method.
- sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]
Sample dataset.
- Parameters:
- dataset : Union[datatable.Frame, str, pathlib.Path]
Dataset to be sampled, as a reference to the frame or a path to the file.
- sampling_limit : Optional[int] = None
If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.
- target_col : str = ""
Optional target column which is required for certain sampling techniques (like stratified sampling).
- is_classification : bool
If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.
- drop_nan_rows : bool
True to drop rows with a "not a number" value in the target_col column in case of classification-friendly techniques.
- drop_1_classes : bool
True to drop rows which represent classes with cardinality equal to 1 (categories represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.
- classes : Optional[List] = None
Optional specification of classes to be used for sampling (all valid classes are used by default). The classes values are expected to be a subset of the target column classes.
- sampled_dataset_path : Union[datatable.Frame, str, pathlib.Path]
Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).
- seed : int
Optional random seed for reproducible sampling.
- logger
Optional logger.
- Returns:
- Union[datatable.Frame, str]
Path to the sampled dataset (if sampled_dataset_path has been specified), datatable Frame reference otherwise.
- class h2o_sonar.utils.sampling.RandomPandasDatasetSampling(logger=None)
Bases:
DatasetSampler
Dataset sampler which implements random sampling using Pandas pandas.DataFrame.sample().
CONS:
dataset must fit in free RAM (2x)
sampler does not support stratification
PROS:
enables the use of the Pandas sampler seamlessly in the H2O Eval Studio runtime
- sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]
Sample dataset.
- Parameters:
- dataset : Union[datatable.Frame, str, pathlib.Path]
Dataset to be sampled, as a reference to the frame or a path to the file.
- sampling_limit : Optional[int] = None
If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.
- target_col : str = ""
Optional target column which is required for certain sampling techniques (like stratified sampling).
- is_classification : bool
If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.
- drop_nan_rows : bool
True to drop rows with a "not a number" value in the target_col column in case of classification-friendly techniques.
- drop_1_classes : bool
True to drop rows which represent classes with cardinality equal to 1 (categories represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.
- classes : Optional[List] = None
Optional specification of classes to be used for sampling (all valid classes are used by default). The classes values are expected to be a subset of the target column classes.
- sampled_dataset_path : Union[datatable.Frame, str, pathlib.Path]
Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).
- seed : int
Optional random seed for reproducible sampling.
- logger
Optional logger.
- Returns:
- Union[datatable.Frame, str]
Path to the sampled dataset (if sampled_dataset_path has been specified), datatable Frame reference otherwise.
- class h2o_sonar.utils.sampling.StratifiedDatasetSampling
Bases:
DatasetSampler
Dataset sampler which implements both stratified and random sampling.
CONS:
stratified sampling can sample datasets only up to 50% of the free RAM (the sklearn sampler is the bottleneck)
PROS:
supports stratified (classification models) and random (regression) sampling
makes an automatic decision on the sampling method (can be overridden with a parameter)
random sampling is able to sample datasets bigger than the free RAM size
- sample_dataset(dataset: Frame | str | Path, sampling_limit: int | None = None, target_col: str = '', is_classification: bool = False, drop_nan_rows: bool = True, drop_1_classes: bool = True, classes: List | None = None, sampled_dataset_path: Frame | str | Path = '', seed: int = 42, logger=None) Tuple[bool, Frame, str]
Sample dataset.
- Parameters:
- dataset : Union[datatable.Frame, str, pathlib.Path]
Dataset to be sampled, as a reference to the frame or a path to the file.
- sampling_limit : Optional[int] = None
If None, then automatically sample based on the dataset and RAM size. If > 0, then sample the dataset to sampling_limit number of rows. If == 0, then do NOT sample.
- target_col : str = ""
Optional target column which is required for certain sampling techniques (like stratified sampling).
- is_classification : bool
If None, then automatically choose stratified or random sampling. If True, then force stratified sampling. If False, then force random sampling.
- drop_nan_rows : bool
True to drop rows with a "not a number" value in the target_col column in case of classification-friendly techniques.
- drop_1_classes : bool
True to drop rows which represent classes with cardinality equal to 1 (categories represented by exactly one row in the dataset) in the target_col column in case of classification-friendly techniques.
- classes : Optional[List] = None
Optional specification of classes to be used for sampling (all valid classes are used by default). The classes values are expected to be a subset of the target column classes.
- sampled_dataset_path : Union[datatable.Frame, str, pathlib.Path]
Optional path to the sampled dataset file to be created (if no path is specified, then the method returns the reference to the datatable frame).
- seed : int
Optional random seed for reproducible sampling.
- logger
Optional logger.
- Returns:
- Union[datatable.Frame, str]
Path to the sampled dataset (if sampled_dataset_path has been specified), datatable Frame reference otherwise.
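A sketch of stratified sampling driven by the signature above; the CSV path, target column, and limit are illustrative, and the three-element unpacking follows the Tuple[bool, Frame, str] annotation:

    from h2o_sonar.utils.sampling import StratifiedDatasetSampling

    sampler = StratifiedDatasetSampling()
    was_sampled, sampled_frame, sampled_path = sampler.sample_dataset(
        dataset="data/train.csv",
        sampling_limit=25_000,   # the documented H2O_SONAR_LIMIT
        target_col="label",
        is_classification=True,  # force stratified sampling
        sampled_dataset_path="data/train.sampled.csv",
    )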
- h2o_sonar.utils.sampling.downsample_dataset(dataset, sample_size: int | None = None, runtime_sample_size: int | None = None, target_col: str = '', is_classification: bool = False, classes: List | None = None, seed: int = 42, logger=None)
Dataset sampling method used by the explainers in Driverless AI (and potentially other container runtimes) to sample the input dataset according to their needs.
This method is not used by the local explainer runtime, which samples the input dataset upfront to protect all the explainers. This is why this method serves as the identity here - it ensures that H2O Eval Studio's sampling will not impact Driverless AI and other host runtimes.
- Parameters:
- dataset
Dataset to be sampled.
- sample_size : Optional[int]
Sampling limit to use.
- runtime_sample_size : Optional[int]
Runtime protection - sample the dataset to this size even if sample_size is bigger, to protect the runtime and avoid space (memory) / time overloading.
- target_col : str
Target column to be used for the sampling.
- is_classification : bool
Sample for regression (False) or classification (True).
- classes : Optional[List]
List of classes in case of sampling a classification model dataset.
- seed : int
Sampling seed.
- logger
Logger.
- Returns:
- Any
Sampled dataset.
h2o_sonar.utils.sanitization module
- class h2o_sonar.utils.sanitization.DriverlessAiSanitizationMap(raw_names: List[str], sanitized_names: List[str])
Bases:
SanitizationMap
Driverless AI model sanitization map.
Driverless AI (AutoML) model provides its own sanitization map. The purpose of this class is to make the Driverless AI sanitization available via the standard SanitizationMap interface.
- class h2o_sonar.utils.sanitization.SanitizationMap(raw_names: List[str], sanitized_names: List[str])
Bases:
object
Map of original (raw) dataset column names/features to sanitized names and vice versa.
- static ensure(cols, col) List[str]
- static sanitize_value(values: str | List[str], special_chars: str = '|,=[]<\t\r\n:.~') str | List[str]
Method for sanitizing feature values (labels, classes). Note that column/feature name sanitization (handled by the map) typically has different requirements than value sanitization. Also note that value sanitization is one-way (original to sanitized only) and may produce collisions if values are sanitized in multiple calls to this method (collisions within one call of this function are resolved).
- to_raw(names: str | List[str])
Sanitized name(s) to original (raw) name(s).
- to_sanitized(names: str | List[str])
Original (raw) name(s) to sanitized name(s).
- h2o_sonar.utils.sanitization.sanitize_frame(frame, sanitization_map: SanitizationMap | None = None)
- h2o_sonar.utils.sanitization.sanitize_markdown(md_fragment: str) str
The purpose of this function is to sanitize a Markdown fragment string. It is NOT meant to sanitize whole Markdown documents, but fragments where a string (to be stored in Markdown) would interact with other Markdown elements.
- Parameters:
- md_fragment : str
A Markdown fragment string.
- Returns:
- str
Sanitized Markdown fragment without dangerous characters and links.
- h2o_sonar.utils.sanitization.sanitize_names(names: str | List[str], sanitization_map: SanitizationMap | None = None)
Sanitize column/feature name(s) either using (model’s) sanitization map (if available) or using universal sanitization method.
- Parameters:
- names : Union[str, List[str]]
Name(s) to be sanitized.
- sanitization_map : Optional[SanitizationMap]
Optional sanitization map.
- h2o_sonar.utils.sanitization.sanitize_strings(strings: str | List[str], replace_with: str = '_', special_chars: str = '|,=[]<\t\r\n:.~')
Sanitize a string or a list of strings.
- Parameters:
- strings : Union[str, List[str]]
Strings to be sanitized.
- replace_with : str
Character used to replace forbidden characters.
- special_chars : str
Optional special characters to be sanitized.
- Returns:
- Union[str, List[str]]
Sanitized strings.
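A sketch of name sanitization and the bidirectional map; the raw names are illustrative and the exact sanitized output of sanitize_strings() is an assumption:

    from h2o_sonar.utils import sanitization

    raw_names = ["income [USD]", "age.years"]
    clean_names = sanitization.sanitize_strings(raw_names)

    # Build a map so that translation works in both directions.
    smap = sanitization.SanitizationMap(
        raw_names=raw_names, sanitized_names=clean_names,
    )
    sanitized = smap.to_sanitized("income [USD]")
    assert smap.to_raw(sanitized) == "income [USD]"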
h2o_sonar.utils.testing module
H2O Eval Studio LLM / RAG testing utilities:
Raw test data: a dataset which was used to create the test configuration(s).
Test suite: a collection of tests (see below).
Test: a collection of documents (corpus) along with the test cases (see below) to be run in the context of the corpus.
Test case: a prompt, expected output (ground truth), categories, output condition, output constraints, … and other parameters to be used for a RAG / LLM model evaluation.
Test lab: a set of resolved tests enriched with answers (actual answer), retrieval context, response duration and other data obtained from the conversation with a RAG / LLM.
A resolved test lab is exported to an LLM dataset, which is then used as input to an evaluation - the evaluation runs a set of evaluators to rank the RAG / LLM models.
- class h2o_sonar.utils.testing.InMemoryLlmHostPromptCache
Bases:
LlmHostPromptCache
In-memory LLM host client cache:
- initialization:
a (pre-built) cache can be loaded from a JSON file
- hints:
the cache can be saved to and loaded from a JSON file
a pre-built cache can be created from a test lab (not implemented by this class)
when used in a testing environment, cloud deployment, etc., a pre-built cache can be synchronized/downloaded from S3, the filesystem, etc.
- KEY_DATA = 'cache_data'
- add_test_lab(test_lab: RagTestLab)
Add the test lab to the cache.
- clear()
Clear the cache.
- evict(key: str)
Evict the value from the cache for the given key.
- get(key: str) Dict | None
Get the cached value for the given key.
The returned dictionary can be passed to a typed result class.
- get_llm_model_names(explainable_model_type: ExplainableModelType) List[str]
List all the LLM model names known to the cache.
- static load_from_json(file_path: str | Path)
- static load_from_url(url: str, work_dir: str | Path = '')
- put(key: str, value: Dict)
Put the value to the cache for the given key.
- Parameters:
- key : str
Cache key.
- value : Dict
Cache value - the dictionary is expected to be a JSON-serialized LlmDataset.LlmDatasetRow.
- save_to_json(file_path: str | Path)
- to_dict() Dict
- class h2o_sonar.utils.testing.LlmHostPromptCache
Bases:
ABC
Prompt cache for the LLM host clients:
caches:
- answer(s) (actual answer, duration, cost, chunks, ...) for given prompt(s)
does NOT cache:
- corpus documents synchronization
- RAG host server collection creation
- LLM models listing
cache key:
- does NOT consider the particular host (like playground.h2ogpte.h2o.ai), but rather the LLM host type ~ connection type (like H2O_GPT_E or OPENAI_RAG)
- does NOT consider the particular chunk retrieval method
- DOES consider corpus documents (empty for non-RAG), prompt, LLM model name, required context chunks (via chunk retrieval method - none or a method), ...
- DOES consider context (empty for RAG)
implementations (options):
- in-memory cache (testing)
- filesystem cache (pre-built JSON files)
- Redis cache (shared by Eval Studio workers)
- memcached cache (shared by Eval Studio workers)
- ...
utilities:
- cache key generation
- cache key hashing
- static cache builder from serialized test labs (JSON)
purpose:
- NON-production use - for testing / demos / conference hands-on sessions only
- significantly speed up the test lab completion
- avoid test lab build failures due to an unstable/slow/fragile system under test (like an h2oGPTe server)
- save costs (e.g. OpenAI server costs)
- KEY_ACTUAL_OUTPUT = 'actual_output'
- KEY_CONTEXT = 'context'
- KEY_CORPUS = 'corpus'
- KEY_COST = 'cost'
- KEY_DURATION = 'actual_duration'
- KEY_EXTRAS = 'extras'
- KEY_INPUT = 'input'
- KEY_LLM_MODEL_NAME = 'llm_model_name'
- KEY_MODEL_TYPE = 'model_type'
- PREFIX_KEY = 'CACHE-KEY::'
- abstract clear()
Clear the cache.
- abstract evict(key: str)
Evict the value from the cache for the given key.
- abstract get(key: str) Dict | None
Get the cached value for the given key.
The returned dictionary can be passed to a typed result class.
- static get_key(explainable_model_type: ExplainableModelType, prompt: str, llm_model_name: str, corpus: List[str] | None = None, extras: str = '') str
Generate a cache key for the LLM host client cache:
does NOT consider the particular host (like playground.h2ogpte.h2o.ai)
does NOT consider the RAG collection
does NOT consider the chunk retrieval method
suitable for both RAG hosts (empty corpus, no context) and LLM hosts
- Parameters:
- explainable_model_type : models.ExplainableModelType
Explainable model type.
- prompt : str
Prompt for which the answer is to be cached.
- llm_model_name : str
LLM model name whose answer is to be cached.
- corpus : Optional[List[str]]
Corpus documents - instead of relying on the collection (whose ID and name may differ), corpus information is used.
- extras : str
Extra information - any other parameters which may make the cache key unique.
- Returns:
- str
Cache key.
- abstract get_llm_model_names(explainable_model_type: ExplainableModelType) List[str]
List all the LLM model names known to the cache.
- abstract put(key: str, value: Dict)
Put the value to the cache for the given key.
- Parameters:
- key : str
Cache key.
- value : Dict
Cache value - the dictionary is expected to be a JSON-serialized LlmDataset.LlmDatasetRow.
- static str_key_to_dict(key_dict: str) Dict
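A sketch of the prompt cache round trip. The ExplainableModelType value is runtime specific and left as a placeholder, and the single-key value dictionary is a simplification of a serialized LlmDataset row:

    from h2o_sonar.utils.testing import (
        InMemoryLlmHostPromptCache,
        LlmHostPromptCache,
    )

    cache = InMemoryLlmHostPromptCache()

    key = LlmHostPromptCache.get_key(
        explainable_model_type=model_type,  # models.ExplainableModelType (assumed)
        prompt="What is H2O Eval Studio?",
        llm_model_name="gpt-4o",
        corpus=["docs/intro.md"],
    )
    cache.put(key, {LlmHostPromptCache.KEY_ACTUAL_OUTPUT: "An evaluation tool."})
    print(cache.get(key))  # cached answer dictionary, or None after evict()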
- class h2o_sonar.utils.testing.RagTestCaseConfig(prompt: str, categories: str | List[str] = '', relationships=None, constraints=None, condition='', expected_output: str = '', config: RagTestConfig | None = None, key: str = '')
Bases:
object
RAG / LLM test case configuration:
prompt
expected output
categories
condition (string expression)
constraints (any JSON serializable object)
…
- KEY_CATEGORIES = 'categories'
- KEY_CONDITION = 'condition'
- KEY_CONSTRAINTS = 'constraints'
- KEY_EXPECTED_OUTPUT = 'expected_output'
- KEY_KEY = 'key'
- KEY_PROMPT = 'prompt'
- KEY_RELS = 'relationships'
- add_relationship(relationship_type: str, target: str, target_type: str)
- copy(update_key: bool = True)
- perturb(perturbators: List[PerturbatorToRun], in_place: bool = True, raised_errors: List | None = None)
Perturb the prompt.
- Parameters:
- perturbators : List[commons.PerturbatorToRun]
Perturbators to run - includes the perturbator ID, intensity, and parameters.
- in_place : bool
If True, perturb the prompt in place, otherwise create a new perturbed test case.
- raised_errors : Optional[List]
If None, then raise error(s) if the perturbator(s) fail; otherwise do not raise exceptions and store them in the (empty) list provided by the caller.
- to_dict()
- class h2o_sonar.utils.testing.RagTestConfig(documents: List[Path | str], key: str = '')
Bases:
object
RAG / LLM test configuration:
- corpus ... a set of documents (empty for LLM evaluation)
- test cases ... a set of prompts, expected outputs, categories, conditions, ...
- KEY_DOCUMENTS = 'documents'
- KEY_KEY = 'key'
- static from_dict(key: str, as_dict: Dict) RagTestConfig
- to_dict()
- class h2o_sonar.utils.testing.RagTestLab(llm_host_connection: ConnectionConfig, raw_dataset: LlmDataset, evaluated_models: List[ExplainableRagModel | ExplainableLlmModel] | None = None, llm_model_names: List[str] | None = None, docs_cache_dir: str | Path = '', name: str = 'TestLab', description: str = 'Test lab for RAG / LLM evaluation.', llm_host_prompt_cache: LlmHostPromptCache | None = None, use_evaluated_model_collection_id: bool = False, logger=None)
Bases:
TestLab
LLM (Large Language Model) test lab:
A TestLab is expected to test multiple LLMs hosted either by one service (like OpenAI), by a RAG (Retrieval-Augmented Generation) product (like h2oGPTe), or by an LLM host product (like h2oGPT).
The TestLab gets the connection configuration to the host system.
The TestLab can compare / benchmark multiple LLM models from the same host system.
Resolved test labs can be merged to get an aggregated lab -> an LLM dataset with multiple LLM hosts for side-by-side evaluation by the evaluate module.
- bind(collection_id: str, collection_name: str, corpus: List | None = None)
Bind ALL the test lab RAG models to collection(s) instead of building the lab by creating collections and uploading documents.
- build(doc_sync_meta: Dict[str, Any] = None, progress_callback: AbstractProgressCallbackContext | None = None, sync_documents: bool = True, fail_on_missing_corpus: bool = False)
Build the test lab so that it can be used for the evaluation:
synchronize the document cache
create RAG’s document collections
upload documents (corpora) to collection(s).
- Parameters:
- doc_sync_meta : Dict[str, Any]
Document synchronization metadata - the key is the document locator (URL), the value is a dictionary with metadata like headers. Example:
{"http://example.com/doc1.txt": {"headers": {"foo-header": "FOO-VALUE"}}}
- progress_callback : Optional[progress.AbstractProgressCallbackContext]
Optional progress callback context.
- sync_documents : bool
Sync documents from the network/filesystem to the lab's document cache.
- fail_on_missing_corpus : bool
Fail if the test does not specify any corpus.
- complete_dataset(complete_context: int = 10, progress_callback: AbstractProgressCallbackContext | None = None, save_as_you_go: str | Path | None = None, parallelize: int = 0, multi_turn: bool = False, retry_on_error: int = 2, timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None = None, include_llm_meta: bool = True, raise_on_all_tcs_fail: bool = True, purge_workdir: bool = True)
Complete the dataset with the actual values from an LLM host.
- Parameters:
- complete_context : int
How many context text chunks to include in the resolved dataset.
- progress_callback : Optional[progress.AbstractProgressCallbackContext]
Optional progress callback context.
- save_as_you_go : Optional[Union[str, pathlib.Path]]
Save the dataset as JSON after each input is resolved.
- parallelize : int
Complete the dataset in parallel using multiple processes. Use -1 for auto-choice of the number of workers, 0 to disable parallelization (the lab is created using sequential requests), and a positive integer to specify the number of workers.
- multi_turn : bool
Whether to use multi-turn chat with the LLM host - if enabled, then all test cases within the test will be handled within a single session, i.e. the same chat session and the same context.
- retry_on_error : int
How many times to retry failed LLM host requests.
- timeout_exp_backoff : Optional[TimeoutRetryExpBackoffCtx]
Optionally override the timeout which can be specified in the model host configuration ExplainableRagModel::model_cfg (and which is model host type specific) and use an exponential backoff strategy for timeout handling. The timeout is increased on each retry by the backoff factor.
- include_llm_meta : bool
Whether to include the LLM metadata like performance statistics.
- raise_on_all_tcs_fail : bool
Raise an exception if all test cases fail.
- purge_workdir : bool
Purge the working directory with lab shards after the completion.
- complete_from_shards(execution_dir_path: str | Path)
Complete the test lab from the shards stored on the filesystem. This method is used to load previously completed test lab shards and merge them into a single resolved dataset.
- static from_eval_results(eval_results_path: str | Path, interpretation_json_path: str | Path, raw_dataset_empty: bool = True)
Create a test lab from the evaluation results archive.
- Parameters:
- eval_results_path : Union[str, pathlib.Path]
Path to the evaluation results JSON file.
- interpretation_json_path : Union[str, pathlib.Path]
Path to the interpretation JSON file.
- raw_dataset_empty : bool
Whether to create an empty raw dataset or copy the resolved dataset.
- static from_llm_test_suite(llm_host_connection: ConnectionConfig, llm_test_suite: RagTestSuiteConfig, llm_model_type: ExplainableModelType, llm_model_names: List[str], work_dir: str | Path = '', llm_models_cfgs: Dict[str, List[Dict]] = None, llm_host_prompt_cache: LlmHostPromptCache | None = None) RagTestLab
Create a new (unresolved) test lab from the LLM test suite configuration.
- static from_rag_test_suite(rag_connection: ConnectionConfig, rag_test_suite: RagTestSuiteConfig, rag_model_type: ExplainableModelType, llm_model_names: List[str], docs_cache_dir: str | Path, rag_models_cfgs: Dict[str, List[Dict]] = None, predefined_collection_id: str | Dict | None = None, llm_host_prompt_cache: LlmHostPromptCache | None = None) RagTestLab
Create a new (unresolved) test lab from the RAG test suite configuration.
The test lab is built as follows:
all LLM model names are hosted by the SAME system described by the RAG connection and accessed by a client
an LLM model name may have an associated list of custom client configurations
the RAG test suite is used to build the test lab:
- the RAG test suite has test cases that are grouped into tests
- test cases within a test have the SAME corpus, different tests may have different corpora
an explainable RAG model is created for EACH combination of LLM model name + config + corpus (not test), i.e. the Cartesian product:
LLM model names x client configurations x corpora = explainable RAG models
Summary of the explainable RAG models creation:
- for each LLM model name
- for each client configuration of that LLM model name
- for each test
create an explainable RAG model
- Parameters:
- rag_connection : h2o_sonar_config.ConnectionConfig
Connection to the RAG system.
- rag_test_suite : RagTestSuiteConfig
RAG test suite configuration.
- rag_model_type : ExplainableModelType
Type of the explainable model hosted by the RAG system.
- llm_model_names : List[str]
List of LLM model names to be used to build the test lab and to be subsequently evaluated and compared. The following special names can be used with the h2oGPTe model host: auto (use the best available model chosen by h2oGPTe), an empty string (inherit the configuration from the h2oGPTe collection), and None (inherit the configuration from the h2oGPTe collection).
- rag_models_cfgs : Dict[str, List[Dict]]
Dictionary with the LLM model name as key and a list of client configurations as values. Each client configuration is a dictionary with the client configuration parameters which can be created by the client factory using client.config_factory().
- docs_cache_dir : Union[str, pathlib.Path]
Directory to store the documents cache.
- predefined_collection_id : Optional[Union[str, Dict]]
Predefined collection ID for the RAG model. If provided as a string, it is used as the collection ID for all the test cases. If provided as a dictionary, it is used as a mapping of test case keys to collection IDs.
- llm_host_prompt_cache : Optional[LlmHostPromptCache]
Cache for the LLM host client.
- Returns:
- RagTestLab
New RAG test lab.
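An end-to-end sketch of the documented flow (test suite -> test lab -> build -> complete -> save). The connection object, model type, model names, and paths are assumptions for illustration:

    from h2o_sonar.utils.testing import RagTestLab, RagTestSuiteConfig

    suite = RagTestSuiteConfig.load_from_json("tests/rag_suite.json")

    lab = RagTestLab.from_rag_test_suite(
        rag_connection=rag_connection,  # ConnectionConfig (assumed)
        rag_test_suite=suite,
        rag_model_type=rag_model_type,  # ExplainableModelType (assumed)
        llm_model_names=["gpt-4o", "auto"],
        docs_cache_dir="cache/docs",
    )
    lab.build()                   # sync documents, create collections, upload corpora
    lab.complete_dataset()        # query the LLM host and fill in actual answers
    lab.save_as_json("lab.json")  # resolved lab, ready for evaluation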
- get_evaluated_model_for_key(model_key: str)
Get LLM model name for the evaluated model.
- insight_internal_llm_errors(report_dir: str | Path = '', src: str = 'stats') Tuple[Dict, str]
Create a Markdown report with the internal LLM errors.
- Parameters:
- report_dir : Union[str, pathlib.Path]
Directory to save the reports as JSON and Markdown to.
- src : str
Source of the errors: stats (default) or dataset (analysis of the answer texts).
- integrity_check()
- static load_from_json(llm_host_connection: ConnectionConfig, file_path: str | Path, docs_cache_dir: str | Path = '', datatable_format: bool = False) RagTestLab
- merge(other_test_lab: RagTestLab, other_llm_prefix: str = 'Other')
Merge another test lab into this one.
- purge()
Purge the test lab by deleting all the created collections/assistants and uploaded documents.
- split_to_shards(base_dir: Path, max_total_workers: int = 20) Dict
Split the test lab into shards by RAG model (which is identified by corpus and base LLM model name). If there is one RAG model (or just a few), then even the inputs of a particular model are split into shards. A shard contains prompts which will subsequently be evaluated in the context of the corpus by the given base LLM model.
Sharding strategy:
- 1 RAG model:
split the inputs of the model across at most 20 workers (split into 20 shards)
the minimum number of inputs per worker is 2 (considering the process overhead)
- >1 RAG model:
if the number of models is GREATER than 10, split the inputs by RAG model, i.e. the number of needed workers equals the number of RAG models
if the number of models is SMALLER than or equal to 10, then use up to 20 workers to split the inputs
- Parameters:
- base_dir : pathlib.Path
Base directory where to store the shards - JSON representations of the test labs.
- max_total_workers : int
The number of workers used to split the inputs of a SINGLE model (or a lab with just a few models).
- split_to_shards_by_model(base_dir: Path) Dict
Split the test lab into shards by RAG model, which is identified by corpus and base LLM model name. A shard contains prompts which will subsequently be evaluated in the context of the corpus by the given base LLM model.
- stats() Dict
Get the test lab statistics and cross-check.
- sync_documents(doc_sync_meta: Dict[str, Any] = None, progress_callback: AbstractProgressCallbackContext | None = None, fail_on_missing_corpus: bool = False) Path
Cache test suite documents from the network to the local filesystem so that they can be used for RAG evaluation later.
- Parameters:
- doc_sync_meta : Dict[str, Any]
Document synchronization metadata - the key is the document locator (URL), the value is a dictionary with metadata like headers. Example:
{"http://example.com/doc1.txt": {"headers": {"foo-header": "FOO-VALUE"}}}
- progress_callback : Optional[progress.AbstractProgressCallbackContext]
Optional progress callback context.
- fail_on_missing_corpus : bool
Fail if a test has an empty corpus; otherwise create a dummy document to enable empty RAG corpora.
- to_dict() Dict
- trim(max_llm_models_count=None)
Trim the test lab by keeping only the specified number of LLM models and removing all the orphans.
- class h2o_sonar.utils.testing.RagTestLabPromptCache(singleton_create_key)
Bases:
object
RAG test lab prompt cache (singleton) to be used across H2O Eval Studio.
- ENV_VAR_H2O_SONAR_PROMPT_CACHE: str = 'H2O_SONAR_PROMPT_CACHE_ENABLED'
- ENV_VAR_H2O_SONAR_PROMPT_CACHE_SIZE: str = 'H2O_SONAR_PROMPT_CACHE_SIZE'
- ENV_VAR_H2O_SONAR_PROMPT_CACHE_SRC: str = 'H2O_SONAR_PROMPT_CACHE_SRC'
- ENV_VAR_H2O_SONAR_PROMPT_CACHE_STATIC: str = 'H2O_SONAR_PROMPT_CACHE_STATIC'
- MAX_ITEMS = 5000
- classmethod cache()
- reinitialize(enable_cache: bool | None = None, src_path: str | Path | None = None, src_host_connection: ConnectionConfig | None = None, max_items: int | None = None)
- class h2o_sonar.utils.testing.RagTestSuiteConfig(test_cases: List[RagTestCaseConfig] | None = None, name: str = 'TestSuite', description: str = 'Test suite for RAG / LLM evaluation.')
Bases:
object
RAG / LLM test suite configuration:
- test suite (RagTestSuiteConfig) ... a set of tests
- tests (RagTestConfig) ... a corpus with a set of test cases
- test cases (RagTestCaseConfig) ... prompt, expected output, categories, conditions, ...
- KEY_DESCRIPTION = 'description'
- KEY_NAME = 'name'
- KEY_TESTS = 'tests'
- KEY_TEST_CASES = 'test_cases'
- add_test_case(test_case: RagTestCaseConfig)
- copy() RagTestSuiteConfig
- static from_llm_dataset(llm_dataset: LlmDataset) RagTestSuiteConfig
Create a RAG test suite configuration from the LLM dataset.
- static load_from_json(file_path: str | Path)
- perturb(perturbators: List[PerturbatorToRun], in_place: bool = True, raised_errors: List | None = None)
Perturb the test suite prompts.
- Parameters:
- perturbators : List[commons.PerturbatorToRun]
Perturbators to run - includes the perturbator ID, intensity, and parameters.
- in_place : bool
If True, perturb the test cases in place - there will be the same number of tests and test cases within the test suite. Otherwise keep the original test cases and create new perturbed test cases - there will be 2x more test cases in the test suite after the perturbation (all intermediary perturbations in case of multiple perturbator IDs are discarded).
- raised_errors : Optional[List]
If None, then raise error(s) if the perturbator(s) fail; otherwise do not raise exceptions and store them in the (empty) list provided by the caller.
- save_as_json(file_path: str | Path)
- split(max_tests: int) List[RagTestSuiteConfig]
Split the test suite into multiple test suites so that each test suite has at most the given number of tests.
- Parameters:
- max_tests : int
Maximum number of tests per test suite.
- Returns:
- List[RagTestSuiteConfig]
List of new test suites.
- property tests: List[RagTestConfig]
- to_dict() Dict
- trim_tests(max_tests: int)
Trim the test suite to the given number of tests.
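A sketch assembling a small test suite programmatically and persisting it; the prompts and file names are illustrative:

    from h2o_sonar.utils.testing import RagTestCaseConfig, RagTestSuiteConfig

    suite = RagTestSuiteConfig(name="SmokeSuite", description="Smoke-test prompts.")
    suite.add_test_case(
        RagTestCaseConfig(
            prompt="What does the quick-start guide recommend doing first?",
            expected_output="Installing the package.",
            categories=["smoke"],
        )
    )
    suite.save_as_json("smoke_suite.json")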
- class h2o_sonar.utils.testing.TestLab
Bases:
ABC
A test target / product test lab.
- KEY_BASE_MODEL_NAMES = 'llm_model_names'
- KEY_DATASET = 'dataset'
- KEY_DESCRIPTION = 'description'
- KEY_DOCS_CACHE = 'docs_cache'
- KEY_MODELS = 'models'
- KEY_NAME = 'name'
- KEY_RAW_DATASET = 'raw_dataset'
- PARALLEL_RUN = -1
- SEQUENTIAL_RUN = 0
- build()
Build / deploy / materialize the test lab on the host system, e.g. by creating RAG's document collections, uploading documents to the collections, etc.
- complete_dataset(complete_context: int = 10, progress_callback: AbstractProgressCallbackContext | None = None, save_as_you_go: str | Path | None = None, parallelize: int = 0, retry_on_error: int = 2, purge_workdir: bool = True)
Complete the LLM dataset with the actual values from the host system.
- Parameters:
- complete_context : int
How many context text chunks to include in the resolved dataset.
- progress_callback : Optional[progress.AbstractProgressCallbackContext]
Optional progress callback context.
- save_as_you_go : Optional[Union[str, pathlib.Path]]
Save the dataset as JSON after each input is resolved.
- parallelize : int
Complete the dataset in parallel using multiple processes. Use -1 for auto-choice of the number of workers, 0 to disable parallelization (the lab is created using sequential requests), and a positive integer to specify the number of workers.
- retry_on_error : int
How many times to retry failed LLM host requests.
- purge_workdir : bool
Purge the working directory with lab shards after the completion.
- save_as_json(file_path: str | Path)
- to_dict() Dict