h2o_sonar.evaluators package

Submodules

h2o_sonar.evaluators.abc_byop_evaluator module

class h2o_sonar.evaluators.abc_byop_evaluator.AbcByopEvaluator

Bases: ABC, Evaluator

Abstract base class for Bring Your Own Prompt (BYOP) evaluators.

class Classes(failure, success)

Bases: tuple

failure

Alias for field number 0

success

Alias for field number 1

IDENTIFIER_ACTUAL_OUTPUT = '{ACTUAL_OUTPUT}'
IDENTIFIER_CONTEXT = '{CONTEXT}'
IDENTIFIER_EXPECTED_OUTPUT = '{EXPECTED_OUTPUT}'
IDENTIFIER_INPUT = '{INPUT}'
KEY_ANSWER: str = 'answer'
KEY_ERROR: str = 'error'
KEY_PARSED_ANSWER: str = 'parsed_answer'
KEY_PROMPT: str = 'prompt'
PARAM_JUDGE_HOST: str = 'judge_host'
PARAM_JUDGE_MODEL: str = 'judge_model'
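
The IDENTIFIER_* constants above are the literal placeholders a BYOP prompt template is expected to contain. Below is a minimal sketch of filling such a template from a test-set row, assuming simple string substitution; the template wording and the row values are illustrative, not the package's actual API:

# Sketch: fill a BYOP judge prompt by replacing the literal placeholders.
# The template text and row values below are hypothetical.
PROMPT_TEMPLATE = (
    "Given the question: {INPUT}\n"
    "and the retrieved context: {CONTEXT}\n"
    "does the answer: {ACTUAL_OUTPUT}\n"
    "match the reference answer: {EXPECTED_OUTPUT}? Reply PASS or FAIL."
)

row = {
    "{INPUT}": "What is the capital of France?",
    "{CONTEXT}": "Paris is the capital and largest city of France.",
    "{ACTUAL_OUTPUT}": "The capital of France is Paris.",
    "{EXPECTED_OUTPUT}": "Paris",
}

prompt = PROMPT_TEMPLATE
for placeholder, value in row.items():
    prompt = prompt.replace(placeholder, value)
print(prompt)  # the filled prompt that would be sent to the judge model
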
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, **kwargs) List
get_result() LeaderboardResult
property judge
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.
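
Every evaluator in this package follows the same documented lifecycle: check_compatibility() gates the run, setup() wires in the model and persistence, evaluate() consumes an LLM test set, and get_result() returns the leaderboard. A hedged sketch of that flow using a concrete subclass; the model, persistence, and test-set objects are stand-ins for values obtained from the surrounding h2o_sonar APIs:

# Hedged lifecycle sketch; applies to any evaluator in this package.
from h2o_sonar.evaluators.bleu_evaluator import BleuEvaluator

model = ...        # models.ExplainableModel (or None for 3rd-party hosts)
persistence = ...  # ExplainerPersistence instance
llm_testset = ...  # LLM test set to be evaluated

evaluator = BleuEvaluator()
if evaluator.check_compatibility():
    evaluator.setup(model=model, persistence=persistence)
    evaluator.evaluate(llm_testset)
    leaderboard = evaluator.get_result()  # LeaderboardResult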

h2o_sonar.evaluators.bleu_evaluator module

class h2o_sonar.evaluators.bleu_evaluator.BleuEvaluator

Bases: Evaluator

METRIC_BLEU_1 = 'bleu_1'
METRIC_BLEU_2 = 'bleu_2'
METRIC_BLEU_3 = 'bleu_3'
METRIC_BLEU_4 = 'bleu_4'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.
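
The four metric constants correspond to BLEU with n-gram orders 1 through 4. A self-contained sketch of computing such scores with NLTK; the library choice and smoothing are assumptions for illustration, not necessarily this evaluator's internals:

# Illustrative BLEU-1..BLEU-4 computation with NLTK.
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = ["the cat sat on the mat".split()]
candidate = "the cat is on the mat".split()
smooth = SmoothingFunction().method1

for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n))  # uniform n-gram weights
    score = sentence_bleu(reference, candidate, weights=weights,
                          smoothing_function=smooth)
    print(f"bleu_{n} = {score:.3f}")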

h2o_sonar.evaluators.classification_evaluator module

class h2o_sonar.evaluators.classification_evaluator.ClassificationEvaluator

Bases: Evaluator

METRIC_FN = 'fn'
METRIC_FP = 'fp'
METRIC_TN = 'tn'
METRIC_TP = 'tp'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.
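
METRIC_TP, METRIC_FP, METRIC_TN, and METRIC_FN are the standard binary confusion-matrix counts. A minimal, purely illustrative sketch of deriving them from expected and actual labels:

# Confusion-matrix counts for binary labels (illustrative only).
expected = [1, 0, 1, 1, 0, 0]  # ground-truth labels
actual = [1, 0, 0, 1, 1, 0]    # model predictions

tp = sum(1 for e, a in zip(expected, actual) if e == 1 and a == 1)
fp = sum(1 for e, a in zip(expected, actual) if e == 0 and a == 1)
tn = sum(1 for e, a in zip(expected, actual) if e == 0 and a == 0)
fn = sum(1 for e, a in zip(expected, actual) if e == 1 and a == 0)
print({"tp": tp, "fp": fp, "tn": tn, "fn": fn})  # {'tp': 2, 'fp': 1, 'tn': 2, 'fn': 1}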

h2o_sonar.evaluators.contact_information_byop_evaluator module

class h2o_sonar.evaluators.contact_information_byop_evaluator.ContactInformationByopEvaluator

Bases: AbcByopEvaluator

h2o_sonar.evaluators.fairness_bias_evaluator module

class h2o_sonar.evaluators.fairness_bias_evaluator.FairnessBiasEvaluator

Bases: Evaluator

METRIC_FAIRNESS_BIAS = 'fairness_bias'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.gptscore_evaluator module

class h2o_sonar.evaluators.gptscore_evaluator.GptScoreEvaluator

Bases: ABC, Evaluator

DEFAULT_METRIC_THRESHOLD = inf
PARAM_EVAL_GPT_SCORE_MODEL = 'gpt_score_model'
add_problem_for_row(severity: ProblemSeverity, message: str, row: LlmDatasetRow)
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.gptscore_machine_translation_evaluator module

class h2o_sonar.evaluators.gptscore_machine_translation_evaluator.GptScoreMachineTranslationEvaluator

Bases: GptScoreEvaluator

METRIC_ACCURACY = 'accuracy'
METRIC_FLUENCY = 'fluency'
METRIC_MULTI_QUAL_METRICS = 'multidimensional quality metrics'

h2o_sonar.evaluators.gptscore_question_answering_evaluator module

class h2o_sonar.evaluators.gptscore_question_answering_evaluator.GptScoreQuestionAnsweringEvaluator

Bases: GptScoreEvaluator

METRIC_CORRECTNESS = 'correctness'
METRIC_ENGAGEMENT = 'engagement'
METRIC_FLUENCY = 'fluency'
METRIC_INTEREST = 'interest'
METRIC_RELEVANCE = 'relevance'
METRIC_SEMANTICALLY_APPROPRIATE = 'semantically appropriate'
METRIC_SPECIFIC = 'specific'
METRIC_UNDERSTANDABILITY = 'understandability'

h2o_sonar.evaluators.gptscore_summary_without_reference_evaluator module

class h2o_sonar.evaluators.gptscore_summary_without_reference_evaluator.GptScoreSummaryWithoutReferenceEvaluator

Bases: GptScoreEvaluator

METRIC_COHERENCE = 'coherence'
METRIC_CONSISTENCY = 'consistency'
METRIC_FACTUALITY = 'factuality'
METRIC_FLUENCY = 'fluency'
METRIC_INFORMATIVENESS = 'informativeness'
METRIC_RELEVANCE = 'relevance'
METRIC_SEMANTIC_COVERAGE = 'semantic coverage'

h2o_sonar.evaluators.gptscore_summary_with_reference_evaluator module

class h2o_sonar.evaluators.gptscore_summary_with_reference_evaluator.GptScoreSummaryWithReferenceEvaluator

Bases: GptScoreEvaluator

METRIC_COHERENCE = 'coherence'
METRIC_FACTUALITY = 'factuality'
METRIC_FLUENCY = 'fluency'
METRIC_INFORMATIVENESS = 'informativeness'
METRIC_RELEVANCE = 'relevance'
METRIC_SEMANTIC_COVERAGE = 'semantic coverage'

h2o_sonar.evaluators.language_mismatch_byop_evaluator module

class h2o_sonar.evaluators.language_mismatch_byop_evaluator.LanguageMismatchByopEvaluator

Bases: AbcByopEvaluator

h2o_sonar.evaluators.parameterizable_byop_evaluator module

class h2o_sonar.evaluators.parameterizable_byop_evaluator.ParameterizableByopEvaluator

Bases: AbcByopEvaluator

PROMPT_TEMPLATE_PARAM: str = 'prompt_template'
check_compatibility(params: CommonInterpretationParams | None = None, model: ExplainableModel | None = None, **explainer_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

h2o_sonar.evaluators.perplexity_evaluator module

class h2o_sonar.evaluators.perplexity_evaluator.PerplexityEvaluator

Bases: Evaluator

METRIC_PERPLEXITY = 'perplexity'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.
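
Perplexity is the exponential of the average negative log-likelihood a language model assigns to a text. A self-contained sketch using a Hugging Face causal LM; the model choice is an assumption for illustration, not this evaluator's verified internals:

# Illustrative perplexity: exp of the mean token-level cross-entropy
# under a causal language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss.
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"perplexity = {torch.exp(loss).item():.2f}")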

h2o_sonar.evaluators.pii_leakage_evaluator module

class h2o_sonar.evaluators.pii_leakage_evaluator.PiiLeakageEvaluator

Bases: Evaluator

DEFAULT_EVAL_RC = True
PARAM_EVAL_RC = 'evaluate_retrieved_context'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

static check_creditcard_leakage(checked_txt: str, failed_constraints: List, fragments: List) Tuple[List, List]
static check_email_leakage(checked_txt: str, failed_constraints: List, fragments: List) Tuple[List, List]

Check for email leakage.

Returns:
Tuple[List, List]

The updated failed-constraints and fragments lists, including any leaked email addresses found; both lists are returned unchanged if none are found.

static check_ssn_leakage(checked_txt: str, failed_constraints: List, fragments: List) Tuple[List, List]
evaluate(llm_testset, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.
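
The static check_* methods scan a text for specific PII patterns and return the updated failed-constraints and fragments lists. A hedged illustration of the kind of regex-based email check involved; the pattern and the exact list bookkeeping are assumptions, not the evaluator's actual implementation:

# Illustrative email-leakage scan (pattern and bookkeeping are assumptions).
import re
from typing import List, Tuple

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def check_email_leakage_sketch(
    checked_txt: str, failed_constraints: List, fragments: List
) -> Tuple[List, List]:
    leaked = EMAIL_RE.findall(checked_txt)
    if leaked:
        failed_constraints.append("email leakage")
        fragments.extend(leaked)
    return failed_constraints, fragments

print(check_email_leakage_sketch(
    "Contact jane.doe@example.com for details.", [], []
))  # (['email leakage'], ['jane.doe@example.com'])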

h2o_sonar.evaluators.rag_answer_correctness_evaluator module

class h2o_sonar.evaluators.rag_answer_correctness_evaluator.AnswerCorrectnessEvaluator

Bases: Evaluator

check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.rag_answer_relevancy_evaluator module

class h2o_sonar.evaluators.rag_answer_relevancy_evaluator.AnswerRelevancyEvaluator

Bases: Evaluator

check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.rag_answer_relevancy_no_judge_evaluator module

class h2o_sonar.evaluators.rag_answer_relevancy_no_judge_evaluator.RagAnswerRelevancyNoJudgeEvaluator

Bases: Evaluator

COL_ACTUAL_OUTPUT = 'actual_output'
COL_CONTEXT = 'context'
COL_EXPECTED_OUTPUT = 'expected_output'
COL_INPUT = 'input'
COL_MODEL = 'model'
COL_SCORE = 'score'
METRIC_ANSWER_RELEVANCY = 'answer_relevancy'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

static split_sentences(text: str) List[str]

Split the text into sentences.
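
A judge-free relevancy score can be approximated by comparing embeddings of the question with embeddings of the answer's sentences. A sketch under that assumption using sentence-transformers; the model choice and aggregation are illustrative, not this evaluator's verified scoring formula:

# Hedged sketch: cosine similarity between question and answer-sentence
# embeddings as a judge-free relevancy proxy.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")
question = "What is the capital of France?"
answer_sentences = ["Paris is the capital of France.", "It lies on the Seine."]

q_emb = encoder.encode(question, convert_to_tensor=True)
s_emb = encoder.encode(answer_sentences, convert_to_tensor=True)
scores = util.cos_sim(q_emb, s_emb)[0]  # one similarity per sentence
print(f"answer_relevancy ~ {scores.mean().item():.3f}")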

h2o_sonar.evaluators.rag_answer_similarity_evaluator module

class h2o_sonar.evaluators.rag_answer_similarity_evaluator.AnswerSemanticSimilarityEvaluator

Bases: Evaluator

check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.rag_chunk_relevancy_evaluator module

class h2o_sonar.evaluators.rag_chunk_relevancy_evaluator.ContextChunkRelevancyEvaluator

Bases: Evaluator

COL_ACTUAL_OUTPUT = 'actual_output'
COL_CONTEXT = 'context'
COL_EXPECTED_OUTPUT = 'expected_output'
COL_INPUT = 'input'
COL_MODEL = 'model'
COL_SCORE = 'score'
METRIC_PRECISION_RELEVANCY = 'precision_relevancy'
METRIC_RECALL_RELEVANCY = 'recall_relevancy'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

static split_sentences(text: str) List[str]

Split the text into sentences.

h2o_sonar.evaluators.rag_context_precision_evaluator module

class h2o_sonar.evaluators.rag_context_precision_evaluator.ContextPrecisionEvaluator

Bases: Evaluator

check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.rag_context_recall_evaluator module

class h2o_sonar.evaluators.rag_context_recall_evaluator.ContextRecallEvaluator

Bases: Evaluator

check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.rag_context_relevancy_evaluator module

class h2o_sonar.evaluators.rag_context_relevancy_evaluator.ContextRelevancyEvaluator

Bases: Evaluator

check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.rag_faithfulness_evaluator module

class h2o_sonar.evaluators.rag_faithfulness_evaluator.FaithfulnessEvaluator

Bases: Evaluator

check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.rag_groundedness_evaluator module

class h2o_sonar.evaluators.rag_groundedness_evaluator.RagGroundednessEvaluator

Bases: Evaluator

COL_ACTUAL_OUTPUT = 'actual_output'
COL_CONTEXT = 'context'
COL_EXPECTED_OUTPUT = 'expected_output'
COL_INPUT = 'input'
COL_MODEL = 'model'
COL_SCORE = 'score'
METRIC_GROUNDEDNESS = 'groundedness'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

static split_sentences(text: str) List[str]

Split the text into sentences.

h2o_sonar.evaluators.rag_hallucination_evaluator module

class h2o_sonar.evaluators.rag_hallucination_evaluator.RagHallucinationEvaluator

Bases: Evaluator

COL_ACTUAL_OUTPUT = 'actual_output'
COL_CONTEXT = 'context'
COL_EXPECTED_OUTPUT = 'expected_output'
COL_INPUT = 'input'
COL_MODEL = 'model'
COL_SCORE = 'score'
DEFAULT_METRIC_THRESHOLD = 0.5
METRIC_HALLUCINATION = 'hallucination'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.rag_ragas_evaluator module

class h2o_sonar.evaluators.rag_ragas_evaluator.RagasEvaluator

Bases: Evaluator

KEY_ANSWER = 'answer'
KEY_CONTEXTS = 'contexts'
KEY_GROUND_TRUTHS = 'ground_truths'
KEY_QUESTION = 'question'
METRIC_ANSWER_CORRECTNESS = 'answer_correctness'
METRIC_ANSWER_RELEVANCY = 'answer_relevancy'
METRIC_ANSWER_SIMILARITY = 'answer_similarity'
METRIC_CONTEXT_PRECISION = 'context_precision'
METRIC_CONTEXT_RECALL = 'context_recall'
METRIC_CONTEXT_RELEVANCY = 'context_relevancy'
METRIC_FAITHFULNESS = 'faithfulness'
METRIC_META_ANSWER_CORRECTNESS = <h2o_sonar.lib.api.commons.MetricMeta object>
METRIC_META_ANSWER_RELEVANCY = <h2o_sonar.lib.api.commons.MetricMeta object>
METRIC_META_ANSWER_SIMILARITY = <h2o_sonar.lib.api.commons.MetricMeta object>
METRIC_META_CONTEXT_PRECISION = <h2o_sonar.lib.api.commons.MetricMeta object>
METRIC_META_CONTEXT_RECALL = <h2o_sonar.lib.api.commons.MetricMeta object>
METRIC_META_CONTEXT_RELEVANCY = <h2o_sonar.lib.api.commons.MetricMeta object>
METRIC_META_FAITHFULNESS = <h2o_sonar.lib.api.commons.MetricMeta object>
METRIC_META_RAGAS = <h2o_sonar.lib.api.commons.MetricMeta object>
METRIC_RAGAS = 'ragas'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

eval_custom_metrics(llm_testset, metrics_threshold: float, save_llm_result: bool, custom_eval_judge_cfg_key: str, metrics_to_run: MetricsMeta, evaluator: Evaluator | None = None, nan_tolerance: float = 0.2) List
evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.
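
The KEY_* constants match the column names used by the Ragas library, which suggests this evaluator delegates the metric computation to it. A hedged sketch of standalone Ragas usage over those columns; the imports and evaluate() call follow older Ragas releases and may drift across versions:

# Hedged Ragas sketch; requires a configured judge LLM and may need
# adjusting for the installed Ragas version.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

data = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the capital and largest city of France."]],
    "ground_truths": [["Paris"]],
})
result = evaluate(data, metrics=[faithfulness, answer_relevancy])
print(result)  # per-metric scores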

h2o_sonar.evaluators.rag_tokens_presence_evaluator module

class h2o_sonar.evaluators.rag_tokens_presence_evaluator.ConditionEvaluator(c: str, logger)

Bases: object

Condition evaluator for the AIP-160 syntax subset.

evaluate(s: str, c_ast: List | None = None, failed_sub_conditions_as_str: bool = False) Tuple[bool, List]

Evaluate the condition.

Parameters:
s: str

The string to be evaluated.

c_ast: Optional[List]

Optional custom condition AST.

failed_sub_conditions_as_str: bool

If True, return the failed sub-conditions as strings, otherwise as ASTs.

Returns:
Tuple[bool, List]

The evaluation result and the list of failed sub-conditions.
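
A hedged usage sketch of ConditionEvaluator: construct it from an AIP-160-style condition string, then evaluate texts against it. The condition grammar shown (quoted terms, AND/NOT) is an illustrative guess at the supported subset:

# Hedged sketch; the exact AIP-160 subset accepted here is an assumption.
import logging
from h2o_sonar.evaluators.rag_tokens_presence_evaluator import ConditionEvaluator

logger = logging.getLogger(__name__)
condition = ConditionEvaluator('"refund" AND NOT "error"', logger)

ok, failed = condition.evaluate("Your refund was processed successfully.")
print(ok, failed)  # expected: True and an empty list of failed sub-conditions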

class h2o_sonar.evaluators.rag_tokens_presence_evaluator.RagStrStrEvaluator

Bases: Evaluator

DEFAULT_EVAL_RC = False
PARAM_EVAL_RC = 'evaluate_retrieved_context'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

static eval_tc_conditions(row: LlmDatasetRow, evaluator: Evaluator, evaluator_id: str, evaluator_display_name: str, eval_results: LlmEvalResults, key_2_evaluated_model: Dict, llm_host: LlmModelHostType, do_eval_rc: bool, logger)
evaluate(llm_testset, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.rag_tokens_presence_evaluator.constraints_to_condition(constraint: List | None) str

Convert constraints to a more powerful expression based on the AIP-160 syntax. The main motivation is KISS: use a single evaluator for all types of constraints.

Parameters:
constraint: Optional[List]

Constraints structure to be converted to the condition.

Returns:
str

The condition string.

h2o_sonar.evaluators.rouge_evaluator module

class h2o_sonar.evaluators.rouge_evaluator.RougeEvaluator

Bases: Evaluator

METRIC_ROUGE_1 = 'rouge_1'
METRIC_ROUGE_2 = 'rouge_2'
METRIC_ROUGE_L = 'rouge_l'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.
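
The metric constants map to ROUGE-1, ROUGE-2, and ROUGE-L. A self-contained sketch using the rouge-score package; the tooling choice is an assumption, not necessarily this evaluator's implementation:

# Illustrative ROUGE-1/2/L computation with the rouge-score package.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
scores = scorer.score(
    target="the cat sat on the mat",     # reference / expected output
    prediction="the cat is on the mat",  # model / actual output
)
for name, s in scores.items():
    print(f"{name}: f1={s.fmeasure:.3f}")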

h2o_sonar.evaluators.sensitive_data_leakage_evaluator module

class h2o_sonar.evaluators.sensitive_data_leakage_evaluator.SensitiveDataLeakageEvaluator

Bases: Evaluator

DEFAULT_EVAL_RC = True
PARAM_EVAL_RC = 'evaluate_retrieved_context'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

h2o_sonar.evaluators.sexism_byop_evaluator module

class h2o_sonar.evaluators.sexism_byop_evaluator.SexismByopEvaluator

Bases: AbcByopEvaluator

h2o_sonar.evaluators.stereotype_byop_evaluator module

class h2o_sonar.evaluators.stereotype_byop_evaluator.StereotypeByopEvaluator

Bases: AbcByopEvaluator

h2o_sonar.evaluators.summarization_byop_evaluator module

class h2o_sonar.evaluators.summarization_byop_evaluator.SummarizationByopEvaluator

Bases: AbcByopEvaluator

h2o_sonar.evaluators.summarization_evaluator module

class h2o_sonar.evaluators.summarization_evaluator.SummarizationEvaluator

Bases: Evaluator

KEY_COMPLETENESS = 'completeness'
KEY_FAITHFULNESS_CONV = 'faithfulness_conv'
KEY_FAITHFULNESS_ZS = 'faithfulness_zs'
calculate_scores(inputs: list[str], actual_outputs: list[str]) Tuple[Dict[str, float], Dict]
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.

static split_sentences(text: str) List[str]
summac_faith_score1(summary: str, refs: str) float

Calculate the SummaC convolution (SummaCConv) faithfulness score.

summac_faith_score2(summary: str, refs: str) float

Maximum SummaC/NLI score over individual sentences.

summary_completeness_batch(summaries: list[str], docs: list[str], nearest_neighbors: int = 10, umap_dimension: int = 5) Tuple[List | None, Dict]
h2o_sonar.evaluators.summarization_evaluator.load_summac()
h2o_sonar.evaluators.summarization_evaluator.pairwise_distances_wrapper(points)
h2o_sonar.evaluators.summarization_evaluator.segment_calc(distances: Any, n: int) float
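
KEY_FAITHFULNESS_ZS and KEY_FAITHFULNESS_CONV mirror the two SummaC variants (zero-shot and convolution) that summac_faith_score2() and summac_faith_score1() compute. A hedged sketch of standalone SummaC scoring; the constructor arguments follow the summac package's published examples and may drift across versions:

# Hedged SummaC sketch (zero-shot and convolution faithfulness variants).
from summac.model_summac import SummaCConv, SummaCZS

doc = "Paris is the capital and largest city of France."
summary = "Paris is France's capital."

model_zs = SummaCZS(granularity="sentence", model_name="vitc", device="cpu")
model_conv = SummaCConv(models=["vitc"], granularity="sentence", device="cpu",
                        start_file="default", agg="mean")

print("faithfulness_zs   =", model_zs.score([doc], [summary])["scores"][0])
print("faithfulness_conv =", model_conv.score([doc], [summary])["scores"][0])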

h2o_sonar.evaluators.toxicity_evaluator module

class h2o_sonar.evaluators.toxicity_evaluator.ToxicityEvaluator

Bases: Evaluator

DEFAULT_TOXICITY_METRIC_THRESHOLD = 0.25
METRIC_IDENTITY_ATTACK = 'identity_attack'
METRIC_INSULT = 'insult'
METRIC_OBSCENE = 'obscene'
METRIC_SEVERE_TOXICITY = 'severe_toxicity'
METRIC_THREAT = 'threat'
METRIC_TOXICITY = 'toxicity'
check_compatibility(params: CommonInterpretationParams | None = None, **evaluator_params) bool

The explainer's parameter-based check verifying that the explainer will be able to explain the given model. If this compatibility check returns False or raises an error, the explainer will not be run by the engine. The execution engine may, but does not have to, perform this check.

evaluate(llm_testset, explanations_types=None, **kwargs) List
get_result() LeaderboardResult
setup(model, persistence, **kwargs)

Set all the parameters needed to execute fit() and explain().

Parameters:
model: Optional[Union[models.ExplainableModel, models.ExplainableModelHandle]]

Explainable model with (fit and) score methods (or None if 3rd party).

models

(Explainable) models.

persistence: ExplainerPersistence

Persistence API allowing (controlled) saving and loading of explanations.

key: str

Optional (given) explainer run key (generated otherwise).

params: CommonInterpretationParams

Common explainer parameters specified on the explainer run.

explainer_params_as_str: Optional[str]

Explainer-specific parameters in string representation.

dataset_api: Optional[datasets.DatasetApi]

Dataset API to create custom explainable datasets needed by this explainer.

model_api: Optional[models.ModelApi]

Model API to create custom explainable models needed by this explainer.

logger: Optional[loggers.SonarLogger]

Logger.

explainer_params:

Other explainer runtime parameters, options, and configuration.
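
The six metric names match the outputs of the Detoxify library, which suggests the scores come from a Detoxify-style toxicity classifier. A hedged standalone sketch; the model variant is an assumption:

# Hedged sketch: Detoxify's 'original' model returns scores keyed by the
# metric names above (toxicity, severe_toxicity, obscene, threat, insult,
# identity_attack).
from detoxify import Detoxify

scores = Detoxify("original").predict("You are a wonderful person.")
threshold = 0.25  # cf. DEFAULT_TOXICITY_METRIC_THRESHOLD
flagged = {name: s for name, s in scores.items() if s > threshold}
print(scores)
print("flagged:", flagged or "none")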

Module contents