h2o_sonar.lib.integrations package
Submodules
h2o_sonar.lib.integrations.genai module
- class h2o_sonar.lib.integrations.genai.AmazonBedrockFoundationModel(model_arn: str, model_id: str, model_name: str, customizations_supported: List[Literal['FINE_TUNING', 'CONTINUED_PRE_TRAINING']], inference_types_supported: List[Literal['ON_DEMAND', 'PROVISIONED']], input_modalities: List[str], model_lifecycle_status: str, output_modalities: List[Literal['TEXT', 'IMAGE', 'EMBEDDING']], provider_name: str, response_streaming_supported: bool)
Bases:
object
- customizations_supported: List[Literal['FINE_TUNING', 'CONTINUED_PRE_TRAINING']]
- inference_types_supported: List[Literal['ON_DEMAND', 'PROVISIONED']]
- input_modalities: List[str]
- model_arn: str
- model_id: str
- model_lifecycle_status: str
- model_name: str
- output_modalities: List[Literal['TEXT', 'IMAGE', 'EMBEDDING']]
- provider_name: str
- response_streaming_supported: bool
- class h2o_sonar.lib.integrations.genai.AmazonBedrockKnowledgeBase(id: str, name: str, description: str, status: str, updated_at: datetime.datetime)
Bases:
object
- description: str
- id: str
- name: str
- status: str
- updated_at: datetime
- class h2o_sonar.lib.integrations.genai.AmazonBedrockRagClient(connection: ConnectionConfig, logger: SonarLogger | None = None)
Bases:
RagClient
- ES_TEMP_PREFIX = 'es-temp'
- ask_collection(collection_id: str, prompts: List[str], llm_model_name: str = '', include_chunks: int = 0, chunk_retrieval_method: str = 'ANSWER_REFS', **kwargs)
- ask_model(prompts: List[str], llm_model_name: str = '', **extra_params) List
- bedrock
- bedrock_agent
- bedrock_agent_runtime
- bedrock_runtime
- static config_factory() Dict
Get the prototype of the configuration for the client. It can be used as a reflection of the parameter names that may be passed as the client's custom configuration. The prototype dictionary is type safe and JSON de/serializable. Users are expected to set only the keys they need and skip the rest.
- Returns:
- Dict
Prototype of the configuration for the client.
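A minimal usage sketch of the prototype above: obtain the JSON-serializable dictionary, inspect the supported parameter names, and keep only the keys that need to be overridden (no specific key names are assumed here).

    from h2o_sonar.lib.integrations import genai

    # Reflection of the supported parameter names.
    prototype = genai.AmazonBedrockRagClient.config_factory()
    print(sorted(prototype.keys()))

    # Keep an illustrative subset of the prototype; all other keys may be omitted.
    custom_config = {key: prototype[key] for key in list(prototype)[:2]}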
- property connection: ConnectionConfig
- create_collection(doc_paths: List[Path | str], collection_name: str = '', **kwargs) Tuple[str, str]
- get_rag_conf(collection_id, model_arn)
- iam
- classmethod is_model_enabled(connection, model_id)
- list_collections(offset: int = 0, limit: int = 1000) List[AmazonBedrockKnowledgeBase]
- list_llm_model_names()
- list_llm_models()
- opensearchserverless
- purge_collections(collection_ids: List[str] | None = None)
- purge_uploaded_docs(document_ids: List[str] | None = None)
- s3
- s3_resource
- sts
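A hedged end-to-end sketch of the client above. The ConnectionConfig construction, document path, and collection name are placeholders; the calls follow the signatures listed above.

    from h2o_sonar.lib.integrations.genai import AmazonBedrockRagClient

    # ConnectionConfig with the AWS/Bedrock credentials (see h2o_sonar.config);
    # its construction is environment specific and therefore omitted here.
    connection = ...

    client = AmazonBedrockRagClient(connection=connection)

    # Create a knowledge base (collection) from local documents and query it.
    collection_id, collection_url = client.create_collection(
        doc_paths=["./docs/report.pdf"],        # hypothetical local document
        collection_name="demo-knowledge-base",  # hypothetical collection name
    )
    answers = client.ask_collection(
        collection_id=collection_id,
        prompts=["What is the executive summary?"],
        include_chunks=3,  # also retrieve up to 3 relevant text chunks
    )

    # Remove the collections created by this client instance.
    client.purge_collections()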
- class h2o_sonar.lib.integrations.genai.AwsClient
Bases:
object
- class h2o_sonar.lib.integrations.genai.AwsResource
Bases:
object
- class h2o_sonar.lib.integrations.genai.H2oGptLlmClient(connection: ConnectionConfig, logger: SonarLogger | None = None)
Bases:
OpenAiLlmClient
h2oGPT client - connects to the h2oGPT server.
The OpenAI client is used to connect to h2oGPT; h2ogpt_client is DEPRECATED.
- Standalone h2oGPT server connection config:
- server URL examples:
- API key: required, cannot be generated, must be provided by the h2oGPT server admin
- Hugging Face Space hosted h2oGPT connection config:
- server URL examples: h2oai/h2ogpt-chatbot2
- API key: required, cannot be generated, must be provided by the h2oGPT server admin
See: https://github.com/h2oai/h2ogpt/blob/main/docs/README_CLIENT.md
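A minimal construction sketch, assuming a ConnectionConfig (see h2o_sonar.config) that holds the h2oGPT server URL and the API key provided by the server admin; the connection value is a placeholder.

    from h2o_sonar.lib.integrations.genai import H2oGptLlmClient

    # Placeholder: ConnectionConfig with the h2oGPT server URL and API key.
    connection = ...

    client = H2oGptLlmClient(connection=connection)
    print(client.list_llm_model_names())
    answers = client.ask_model(prompts=["Summarize what RAG is in one sentence."])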
- property client
- class h2o_sonar.lib.integrations.genai.H2oGpteRagClient(connection: ConnectionConfig, logger: SonarLogger | None = None)
Bases:
RagClient
h2oGPTe RAG client.
- CFG_EMBEDDING_MODEL = 'embedding_model'
- CFG_PROMPT_TEMPLATE_ID = 'prompt_template_id'
- DEFAULT_TIMEOUT = 420
- MODEL_SPEC_AUTO = 'auto'
- MODEL_SPEC_COL = 'llm-inherited-from-collection'
- MODEL_SPEC_COL_OPT_E = ''
- MODEL_SPEC_COL_OPT_N = None
- class TypedLlmConfigDict
Bases:
dict
- chat_conversation: List[Tuple[str, str]] | None
- llm: str | int | None
- llm_args: TypedH2ogptLlmConfigDict | None
- pre_prompt_query: str | None
- prompt_query: str | None
- system_prompt: str | None
- text_context_list: List[str] | None
- timeout: float | None
- class TypedRagConfigDict
Bases:
dict
- embedding_model: str | None
- llm: str | int | None
- llm_args: TypedH2ogptLlmConfigDict | None
- pre_prompt_query: str | None
- pre_prompt_summary: str | None
- prompt_query: str | None
- prompt_summary: str | None
- prompt_template_id: str | None
- rag_config: Dict[str, str | int] | None
- self_reflection_config: Dict[str, str | int] | None
- system_prompt: str | None
- timeout: float | None
- ask_collection(collection_id: str, prompts: List[str], llm_model_name: str = '', include_chunks: int = 0, include_system_prompt: bool = False, chunk_retrieval_method: str = 'ANSWER_REFS', chat_session_id: str | None = None, retry_attempt: int = 0, retry_attempts: int = 0, timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None = None, **extra_params) List[LlmRagAnswer]
Ask h2oGPTe collection.
- Parameters:
- collection_id : str
h2oGPTe collection ID.
- prompts : List[str]
Prompts to ask.
- llm_model_name : str
Optional base LLM model name.
- include_chunks : int
Optional parameter to also retrieve the relevant (text) chunks - a lexical search using the given query is made.
- include_system_prompt : bool
Optional parameter to determine whether the system prompt should be included.
- chunk_retrieval_method : str
Optional parameter to determine how to retrieve chunks. Check H2oGpteChunkRetrievalMethod for possible values.
- chat_session_id : Optional[str]
Optional parameter to specify the chat session ID, allowing the same session to be reused - the chat history is used as context, i.e. a stateful chat session / multi-turn conversation.
- retry_attempt : int
Optional parameter to determine the retry attempt (debugging).
- retry_attempts : int
Optional parameter to determine the number of possible retry attempts (debugging).
- timeout_exp_backoff : Optional[TimeoutRetryExpBackoffCtx]
Optional exponential backoff context for the timeout handling.
- extra_params
Optional parameters to be passed to the h2oGPTe client's session.query(). These parameters override the default values set in the connection and configuration.
- Returns:
- List[LlmHostClient.LlmRagAnswer]
List of tuples with prompt, answer, duration and chunks.
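A hedged sketch of the call above; rag_client is assumed to be an H2oGpteRagClient instance, and the collection ID and prompts are placeholders.

    # Placeholder: an H2oGpteRagClient created with a valid ConnectionConfig.
    rag_client = ...
    collection_id = "my-collection-id"  # hypothetical, returned by create_collection()

    answers = rag_client.ask_collection(
        collection_id=collection_id,
        prompts=["What does chapter 2 conclude?", "List the key risks."],
        llm_model_name="",         # empty - keep the collection/server default LLM
        include_chunks=5,          # also return up to 5 relevant text chunks
        include_system_prompt=True,
        chat_session_id=None,      # pass an existing session ID to reuse chat history
    )
    for answer in answers:
        print(answer.prompt, "->", answer.answer, answer.duration)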
- ask_model(prompts: List[str], llm_model_name: str = '', timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None = None, **extra_params) List[LlmRagAnswer]
Ask an h2oGPTe LLM (base) model.
- property client
- static config_factory(model_type: str = 'rag') Dict
Get the prototype of the configuration for the client. It can be used as a reflection of the parameter names that may be passed as the client's custom configuration.
See: https://docs.h2o.ai/enterprise-h2ogpte/v1.4.13/guide/prompts
- Parameters:
- model_type : commons.ModelTypeExplanation
Model type explanation - “rag” or “llm”.
- create_collection(doc_paths: List[str | Path], collection_name: str = '', upload_if_collection_exists: bool = True, model_cfg: Dict | None = None) Tuple[str, str]
Create h2oGPTe collection and upload documents (corpus) to that collection.
- Parameters:
- doc_paths : List[Union[pathlib.Path, str]]
Paths (local filesystem) to the documents to be uploaded.
- collection_name : str
Optional parameter with the name of the document collection to use (if it exists) or create (if the given name does not exist).
- upload_if_collection_exists : bool
Optional parameter to upload the documents even if the collection exists.
- model_cfg : Optional[Dict]
Optional model configuration with the following parameters:
- embedding_model : str
- prompt_template_id : str
- Returns:
- Tuple[str, str]
h2oGPT Enterprise collection ID and URL.
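A hedged sketch of creating a collection with an explicit model configuration. The document path, collection name, embedding model, and prompt template ID values are placeholders; the keys use the CFG_* constants listed above.

    from h2o_sonar.lib.integrations.genai import H2oGpteRagClient

    # Placeholder: an H2oGpteRagClient created with a valid ConnectionConfig.
    rag_client = ...

    collection_id, collection_url = rag_client.create_collection(
        doc_paths=["./corpus/handbook.pdf"],     # hypothetical document path
        collection_name="employee-handbook",     # reused if it already exists
        upload_if_collection_exists=True,
        model_cfg={
            H2oGpteRagClient.CFG_EMBEDDING_MODEL: "bge-large-en-v1.5",  # placeholder
            H2oGpteRagClient.CFG_PROMPT_TEMPLATE_ID: "default",         # placeholder
        },
    )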
- static humanize_err_msg(ex: Exception, timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None = None) str
Make the error messages from the h2oGPTe (client) human friendly.
- list_collections(offset: int = 0, limit: int = 1000)
- list_llm_model_names(retries: int = 3) List[str]
List h2oGPTe LLM models.
- Parameters:
- retries : int
Number of retries in case of h2oGPTe failure.
- Returns:
- List[str]
List of h2oGPTe LLM model names.
- purge_collections(collection_ids: List[str] | None = None) List
Purge h2oGPTe collections.
- Parameters:
- collection_ids : Optional[List[str]]
List of collection IDs to be purged. If the list is empty, all collections created by this instance are purged.
- purge_uploaded_docs(document_ids: List[str] | None = None) List
Purge h2oGPTe uploaded documents.
- Parameters:
- document_ids : Optional[List[str]]
List of document IDs to be purged. If the list is empty, all documents uploaded by this instance are purged.
- class h2o_sonar.lib.integrations.genai.H2oLlmOpsClient(connection: ConnectionConfig, logger: SonarLogger | None = None)
Bases:
OpenAiLlmClient
H2O LLMOps client.
LLMs hosted by H2O LLMOps can be accessed either using the OpenAI API or the h2oGPT client. This client is based on ``OpenAiLlmClient``.
See: https://internal-genai.dedicated.h2o.ai/v1/latestapp/ai.h2o.llmops
- class h2o_sonar.lib.integrations.genai.LlmHostClient
Bases:
ABC
An LLM host product client.
- class LlmRagAnswer(prompt, answer, duration, context, cost, chat_session_id)
Bases:
tuple
- answer
Alias for field number 1
- chat_session_id
Alias for field number 5
- context
Alias for field number 3
- cost
Alias for field number 4
- duration
Alias for field number 2
- prompt
Alias for field number 0
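The answers returned by the clients are LlmRagAnswer named tuples; a minimal sketch of consuming them, accessing fields by name or by the positions listed above (client is a placeholder for any concrete client instance).

    # Placeholder: any concrete LlmHostClient, e.g. an OpenAiLlmClient instance.
    client = ...

    for answer in client.ask_model(prompts=["Hello"]):
        # Access by field name ...
        print(answer.prompt, answer.answer, answer.duration, answer.cost)
        # ... or by position (prompt=0, answer=1, duration=2, context=3, cost=4, chat_session_id=5).
        prompt, text = answer[0], answer[1]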
- abstract ask_model(prompts: List[str], llm_model_name: str = '', **extra_params) List
- property client
- static config_factory() Dict
Get the prototype of the configuration for the client - it can be used as reflection of the parameters names which might be passed as custom configuration of the client. The prototype dictionary is type safe for de/serializable to JSon. It is expected that users will use just the keys which they need to set and will skip the rest.
- Returns:
- Dict
Prototype of the configuration for the client.
- health_check(llm_model_name: str) bool
Check if the judge is healthy and available.
- abstract list_llm_model_names()
- class h2o_sonar.lib.integrations.genai.MsAzureOpenAiLlmClient(connection: ConnectionConfig, base_url: str = '', deployment_name: str = '', api_version='2024-02-15-preview', logger: SonarLogger | None = None)
Bases:
LlmHostClient
Microsoft Azure hosted OpenAI LLM client.
- DEFAULT_API_VERSION = '2024-02-15-preview'
- ask_model(prompts: List[str], llm_model_name: str = '', **extra_params) List[LlmRagAnswer]
Ask a Microsoft Azure OpenAI hosted LLM model.
- property client
- static config_factory() Dict
Get the prototype of the configuration for the client - it can be used as reflection of the parameters names which might be passed as custom configuration of the client. The prototype dictionary is type safe for de/serializable to JSon. It is expected that users will use just the keys which they need to set and will skip the rest.
- Returns:
- Dict
Prototype of the configuration for the client.
- static config_normalize(config: Dict) Dict
- list_llm_model_names() List[str]
List ALL Microsoft Azure hosted OpenAI LLM models.
IMPORTANT: these are NOT the models provided by the particular deployment for which the client was created, but all models available in the Azure OpenAI API.
- class h2o_sonar.lib.integrations.genai.OllamaClient(connection: ConnectionConfig, logger: SonarLogger | None = None)
Bases:
LlmHostClient
Ollama client.
- class TypedOllamaConfigDict
Bases:
dict
- context: str | None
- format: str
- images: List[str] | None
- options: TypedOllamaModelFileDict | None
- raw: bool
- system: str | None
- ask_model(prompts: List[str], llm_model_name: str = '', **extra_params) List
- property client
- static config_factory() Dict
Get the prototype of the configuration for the client - it can be used as reflection of the parameters names which might be passed as custom configuration of the client. The prototype dictionary is type safe for de/serializable to JSon. It is expected that users will use just the keys which they need to set and will skip the rest.
- Returns:
- Dict
Prototype of the configuration for the client.
- health_check(llm_model_name: str) bool
Check if the judge is healthy and available.
- list_llm_model_names() List[str]
- h2o_sonar.lib.integrations.genai.OpenAiAssistantsRagClient
alias of
OpenAiAssistantsRagClientVersion2
- class h2o_sonar.lib.integrations.genai.OpenAiAssistantsRagClientVersion1(connection: ConnectionConfig, default_llm_model_name: str = 'gpt-4o', logger=None)
Bases:
RagClient
OpenAI RAG client - Assistants AI with enabled File Search/Retrieval tool.
This client leaks zero-size vector stores; using the old API there is no way to remove them. Since the size is zero they should not cost anything, but it is not nice to leave a mess behind.
@see https://github.com/openai/openai-python/blob/v1.20.0/api.md
- BASE_LLM_MODELS = ['gpt-4o', 'gpt-3.5-turbo-1106']
- DEFAULT_LLM_MODEL = 'gpt-4o'
- HEADERS_VERSION_1 = {'OpenAI-Beta': 'assistants=v1'}
- HEADERS_VERSION_2 = {'OpenAI-Beta': 'assistants=v2'}
- KWARGS_ASSISTANT = 'assistant_kwargs'
- KWARGS_RUN = 'run_kwargs'
- KWARGS_THREAD = 'thread_kwargs'
- ask_collection(assistant_id: str, prompts: List[str], include_chunks: int = 0, **kwargs) List[LlmRagAnswer]
Ask OpenAI Assistant with retrieval tool enabled and corpus uploaded. This method creates a new thread for each prompt and retrieves the answer as well as relevant chunks (if requested).
- Parameters:
- assistant_id : str
OpenAI Assistant ID.
- prompts : List[str]
Prompts to ask.
- include_chunks : int
Optional parameter to also retrieve the relevant (text) chunks.
- ask_model(prompts: List[str], llm_model_name: str = '', is_one_prompt: bool = False, **kwargs) List[LlmRagAnswer]
Ask an OpenAI LLM (base) model (minimalistic version without messages and parameterization of system prompts, assisting content, parameters, …).
- Parameters:
- prompts : List[str]
Prompts to ask.
- llm_model_name : str
Optional LLM model name to use for the answer.
- is_one_prompt : bool
Optional parameter to decide whether to ask all prompts in one request (all prompts will be used as the context for the last prompt) or in separate requests.
- Returns:
- LlmHostClient.LlmRagAnswer
Named tuple with prompt, answer and duration.
- property client
- static config_factory() Dict
Get the prototype of the configuration for the client. It can be used as a reflection of the parameter names that may be passed as the client's custom configuration. The prototype dictionary is type safe and JSON de/serializable. Users are expected to set only the keys they need and skip the rest.
- Returns:
- Dict
Prototype of the configuration for the client.
- static config_normalize(config: Dict) Dict
Normalize default values of the client configuration from the serializable dictionary representation to the OpenAI client configuration.
- Parameters:
- config : Dict
Configuration for the client in the serializable format.
- Returns:
- Dict
Normalized configuration for the client in the OpenAI format.
- static config_resolve(in_kwargs: Dict, config_to_resolve: Dict, config_group_key: str, required_keys: Dict[str, Any]) Dict
Resolve the configuration for the client by ensuring that the required keys are set in the given configuration group.
Model parameters’ priority:
- HIGH: kwargs, e.g. “instructions”
- MEDIUM: kwargs[“assistant_kwargs”], e.g. “instructions”
- LOW: required defaults
Model parameters resolution method (illustrated in the sketch below):
- start with EMPTY/SNAPSHOT parameters
- apply (non-assistant) kwargs
- apply assistant kwargs - if NOT already set
- ensure defaults for REQUIRED parameters
- normalize to OpenAI defaults
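An illustrative sketch (not the actual implementation) of the priority rules above, using a hypothetical helper: direct kwargs win over the kwargs grouped under the configuration group key, which in turn win over the required defaults.

    from typing import Any, Dict, Optional

    def resolve_illustration(kwargs: Dict[str, Any],
                             group_key: str = "assistant_kwargs",
                             required_defaults: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        resolved: Dict[str, Any] = {}                                         # 1. start empty
        resolved.update({k: v for k, v in kwargs.items() if k != group_key})  # 2. HIGH: kwargs
        for key, value in (kwargs.get(group_key) or {}).items():              # 3. MEDIUM: grouped kwargs
            resolved.setdefault(key, value)
        for key, default in (required_defaults or {}).items():                # 4. LOW: required defaults
            resolved.setdefault(key, default)
        return resolved

    resolve_illustration(
        {"instructions": "Be terse.", "assistant_kwargs": {"instructions": "Be verbose."}},
        required_defaults={"model": "gpt-4o"},
    )
    # -> {'instructions': 'Be terse.', 'model': 'gpt-4o'}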
- create_collection(doc_paths: List[str | Path], collection_name: str = '', llm_model_name: str = 'gpt-4o', **kwargs) str
Create OpenAI Assistant with enabled retrieval tool and upload documents (corpus) to that assistant.
- Parameters:
- doc_paths : List[Union[pathlib.Path, str]]
Paths (local filesystem) to the documents to be uploaded.
- llm_model_name : str
Base LLM model name to be used by RAG in the generation phase.
- Returns:
- str
OpenAI Assistant ID.
- list_collections(offset: int = 0, limit: int = 10) List
List OpenAI Assistants with retrieval tool enabled.
- Parameters:
- offset : int
Offset of the returned assistants - always 0 in the case of the OpenAI implementation.
- limit : int
Maximum number of returned assistants.
- Returns:
- List[Assistant]
List of assistant instances.
- list_llm_model_names(rag: bool = True) List[str]
List OpenAI LLM models.
- Parameters:
- rag : bool
Optional parameter to list only the models supported by the OpenAI Assistants API (True, default) or all OpenAI LLM models.
- Returns:
- List[str]
List of LLM model names.
- purge_collections(assistants_ids: List[str] | None = None) List
Purge OpenAI Assistants (collections).
- Parameters:
- assistants_ids : Optional[List[str]]
List of OpenAI Assistant IDs to be purged. If the list is empty, all Assistants created by this instance are purged.
- purge_uploaded_docs(document_ids: List[str] | None = None) List
Purge uploaded OpenAI documents.
- Parameters:
- document_ids : Optional[List[str]]
List of document IDs to be purged. If the list is empty, all documents uploaded by this instance are purged.
- class h2o_sonar.lib.integrations.genai.OpenAiAssistantsRagClientVersion2(connection: ConnectionConfig, default_llm_model_name: str = 'gpt-4o', logger=None)
Bases:
RagClient
OpenAI RAG client - Assistants AI with enabled file search tool.
The file search tool is the successor to the retrieval tool from the OpenAI API v1. The file search tool uses a Vector Store, which is a vector database capable of both keyword and semantic search. Each vector store can hold up to 10,000 files. Vector stores can be attached to both Assistants and Threads, but in this implementation vector stores are attached to Assistants.
@see https://platform.openai.com/docs/assistants/tools/file-search
- DEFAULT_LLM_MODEL = 'gpt-4o'
- HEADERS_VERSION_2 = {'OpenAI-Beta': 'assistants=v2'}
- ask_collection(assistant_id: str, prompts: List[str], include_chunks: int = 0, **kwargs) List[LlmRagAnswer]
Ask OpenAI Assistant with file search tool enabled and corpus uploaded. This method creates a new thread for each prompt and retrieves the answer as well as relevant chunks (if requested).
- Parameters:
- assistant_id : str
OpenAI Assistant ID.
- prompts : List[str]
Prompts to ask.
- include_chunks : int
Optional parameter to also retrieve the relevant (text) chunks.
- ask_model(prompts: List[str], llm_model_name: str = '', is_one_prompt: bool = False, **kwargs) List[LlmRagAnswer]
Ask an OpenAI LLM (base) model (minimalistic version without messages and parameterization of system prompts, assisting content, parameters, …).
- Parameters:
- prompts : List[str]
Prompts to ask.
- llm_model_name : str
Optional LLM model name to use for the answer.
- is_one_prompt : bool
Optional parameter to decide whether to ask all prompts in one request (all prompts will be used as the context for the last prompt) or in separate requests.
- Returns:
- LlmHostClient.LlmRagAnswer
Named tuple with prompt, answer and duration.
- property client
- create_collection(doc_paths: List[str | Path], collection_name: str = '', llm_model_name: str = 'gpt-4o', assistant_name: str = '') str
Create OpenAI Assistant with enabled file search tool and upload documents (corpus) to that assistant.
- Parameters:
- doc_paths : List[Union[pathlib.Path, str]]
Paths (local filesystem) to the documents to be uploaded.
- collection_name : str
Optional parameter with the name of the document collection to use (if it exists) or create (if the given name does not exist).
- llm_model_name : str
Base LLM model name to be used by RAG in the generation phase.
- assistant_name : str
Optional parameter with the name of the new Assistant.
- Returns:
- str
OpenAI Assistant ID.
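A hedged sketch of the Assistants v2 flow above: create an Assistant-backed collection from local documents, ask it, and purge it. The connection, paths, and names are placeholders.

    from h2o_sonar.lib.integrations.genai import OpenAiAssistantsRagClientVersion2

    connection = ...  # placeholder: ConnectionConfig with the OpenAI API key (see h2o_sonar.config)
    client = OpenAiAssistantsRagClientVersion2(connection=connection)

    assistant_id = client.create_collection(
        doc_paths=["./docs/faq.md"],       # hypothetical local document
        llm_model_name="gpt-4o",
        assistant_name="faq-assistant",    # optional name of the new Assistant
    )
    answers = client.ask_collection(
        assistant_id=assistant_id,
        prompts=["What is the refund policy?"],
        include_chunks=2,
    )
    client.purge_collections([assistant_id])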
- list_collections(offset: int = 0, limit: int = 10) List
List OpenAI Assistants with file search tool enabled.
- Parameters:
- offset : int
Offset of the returned assistants - always 0 in the case of the OpenAI implementation.
- limit : int
Maximum number of returned assistants.
- Returns:
- List[Assistant]
List of assistant instances.
- list_llm_model_names(rag: bool = True) List[str]
List OpenAI LLM models.
- Parameters:
- rag : bool
Optional parameter to list only the models supported by the OpenAI Assistants API (True, default) or all OpenAI LLM models.
- Returns:
- List[str]
List of LLM model names.
- purge_collections(assistants_ids: List[str] | None = None) List
Purge OpenAI Assistants (collections).
- Parameters:
- assistants_ids : Optional[List[str]]
List of OpenAI Assistant IDs to be purged. If the list is empty, all Assistants created by this instance are purged.
- purge_uploaded_docs(document_ids: List[str] | None = None) List
Purge uploaded OpenAI documents.
- Parameters:
- document_ids : Optional[List[str]]
List of document IDs to be purged. If the list is empty, all documents uploaded by this instance are purged.
- class h2o_sonar.lib.integrations.genai.OpenAiLlmClient(connection: ConnectionConfig, default_llm_model_name: str = 'gpt-4', logger: SonarLogger | None = None)
Bases:
LlmHostClient
OpenAI LLM client.
- DEFAULT_LLM_MODEL = 'gpt-4'
- ask_model(prompts: List[str], llm_model_name: str = '', **extra_params) List[LlmRagAnswer]
Ask an OpenAI hosted LLM model.
- property client
- static config_factory() Dict
Get the prototype of the configuration for the client. It can be used as a reflection of the parameter names that may be passed as the client's custom configuration. The prototype dictionary is type safe and JSON de/serializable. Users are expected to set only the keys they need and skip the rest.
- Returns:
- Dict
Prototype of the configuration for the client.
- static config_normalize(config: Dict) Dict
Normalize default values of the client configuration from the serializable dictionary representation to the OpenAI client configuration.
- Parameters:
- config : Dict
Configuration for the client in the serializable format.
- Returns:
- Dict
Normalized configuration for the client in the OpenAI format.
- list_llm_model_names() List[str]
- class h2o_sonar.lib.integrations.genai.RagChunkRetrievalMethod(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
- ANSWER_REFS = 2
- LEXICAL = 1
- class h2o_sonar.lib.integrations.genai.RagClient
Bases:
LlmHostClient, ABC
A RAG product client.
- CHUNKS_LIMIT = 42
- abstract ask_collection(collection_id: str, prompts: List[str], llm_model_name: str = '', include_chunks: int = 0, chunk_retrieval_method: str = 'ANSWER_REFS', **kwargs)
- abstract create_collection(doc_paths: List[str | Path], collection_name: str = '', **kwargs) Tuple[str, str]
- static get_collection_name(doc_paths: List[str | Path]) str
- abstract list_collections(offset: int = 0, limit: int = 10)
- abstract purge_collections(collection_ids: List[str] | None = None)
- abstract purge_uploaded_docs(document_ids: List[str] | None = None)
- class h2o_sonar.lib.integrations.genai.TimeoutRetryExpBackoffCtx(backoff_factor: float = 4.0, min_backoff_secs: float = 5.0, max_backoff_secs: float = 420.0)
Bases:
object
Exponential backoff context for the timeout handling. This context is meant to be used in RAG/LLM clients on retries - timeout is increased on each retry by the backoff factor.
- BACKOFF_FACTOR: float = 4.0
- MAX_BACKOFF_SECS: float = 420.0
- MIN_BACKOFF_SECS: float = 5.0
- static copy(ctx: TimeoutRetryExpBackoffCtx) TimeoutRetryExpBackoffCtx
- reset() None
Reset the timeout to the initial value.
- retry() float
Call this method on retry to recalculate the timeout.
- property timeout: float
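An illustrative retry loop using the backoff context above; some_rag_call is a hypothetical stand-in for a RAG/LLM client call that honors a timeout and may raise TimeoutError.

    from h2o_sonar.lib.integrations.genai import TimeoutRetryExpBackoffCtx

    def some_rag_call(timeout: float) -> str:
        """Hypothetical stand-in for a RAG/LLM client call that may time out."""
        return f"answered within {timeout} seconds"

    backoff = TimeoutRetryExpBackoffCtx(
        backoff_factor=4.0, min_backoff_secs=5.0, max_backoff_secs=420.0
    )

    for attempt in range(3):
        try:
            result = some_rag_call(timeout=backoff.timeout)
            break
        except TimeoutError:
            backoff.retry()   # recalculate (increase) the timeout before the next attempt

    backoff.reset()  # restore the initial timeout for subsequent use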
- class h2o_sonar.lib.integrations.genai.TypedH2ogptLlmConfigDict
Bases:
dict
- max_new_tokens: int
- min_max_new_tokens: int
- repetition_penalty: float
- seed: int
- temperature: float
- top_k: int
- top_p: float
- class h2o_sonar.lib.integrations.genai.TypedOllamaModelFileDict
Bases:
dict
- mirorstat: int
- mirorstat_eta: float
- mirorstat_tau: float
- num_ctx: int
- num_predict: int
- repeat_last_n: int
- repeat_penalty: float
- seed: int
- stop: str
- temperature: float
- tfs_z: float
- top_k: int
- top_p: float
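A hedged sketch of building Ollama model options with keys from the typed dictionary above (the values are illustrative), and, under the assumption that the extra parameters of ask_model are forwarded to the Ollama request, passing them to the client.

    from h2o_sonar.lib.integrations import genai

    # Model-file options; the keys follow TypedOllamaModelFileDict above,
    # the values are illustrative placeholders.
    options = {
        "temperature": 0.2,
        "top_k": 40,
        "top_p": 0.9,
        "num_ctx": 4096,
        "seed": 42,
    }

    connection = ...  # placeholder: ConnectionConfig pointing to the Ollama server
    ollama = genai.OllamaClient(connection=connection)

    # Assumption: extra parameters are forwarded to the Ollama request.
    answers = ollama.ask_model(
        prompts=["Explain retrieval-augmented generation briefly."],
        llm_model_name="llama3",   # placeholder model name
        options=options,
    )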
- h2o_sonar.lib.integrations.genai.get_client_for_connection(connection: ConnectionConfig, logger: SonarLogger | None = None) LlmHostClient | RagClient
Get a client for the given connection.
- Parameters:
- connection : h2o_sonar.config.ConnectionConfig
Connection configuration.
- logger : Optional[loggers.SonarLogger]
Optional logger.
- Returns:
- LlmHostClient | RagClient
An LLM host or RAG client, depending on the connection.
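A minimal usage sketch of the factory function above; the connection construction is a placeholder.

    from h2o_sonar.lib.integrations.genai import get_client_for_connection

    connection = ...  # placeholder: h2o_sonar.config.ConnectionConfig for a supported LLM/RAG host
    client = get_client_for_connection(connection)
    print(type(client).__name__)           # e.g. H2oGpteRagClient or OpenAiLlmClient
    print(client.list_llm_model_names())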
- h2o_sonar.lib.integrations.genai.log_action(logger: SonarLogger, description: str, indent: int = 0)
h2o_sonar.lib.integrations.mv_adapter module
H2O Model Validation adapter.
- class h2o_sonar.lib.integrations.mv_adapter.ExplainerToMvTestAdapter
Bases:
object
H2O Eval Studio Explainer to H2O Model Validation MVTest adapter.
- MV_PYTHON_MODULE_NAME = 'h2o-mv'
- static assert_mv_test_status(explainer, mv_test)
- check_mv_compatibility(explainer) None
- setup(h2o_sonar_config, persistence: ExplainerPersistence, logger: SonarLogger)
- class h2o_sonar.lib.integrations.mv_adapter.MvResultJSonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases:
JSONEncoder
- default(obj)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:

    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return super().default(o)
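A minimal usage sketch: pass the encoder to json.dumps to serialize MV test result objects that the default encoder cannot handle (mv_test_results is a placeholder).

    import json

    from h2o_sonar.lib.integrations.mv_adapter import MvResultJSonEncoder

    mv_test_results = ...  # placeholder: h2o_mv.core.mv_test.MVTestResult instance
    payload = json.dumps(mv_test_results, cls=MvResultJSonEncoder, indent=2)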
- class h2o_sonar.lib.integrations.mv_adapter.MvResultPersistence(target_dir_path: str | Path, mv_client=None, logger=None)
Bases:
object
- This class provides portable filesystem export and import of MV test results.
The design enables easy export of the results as a ZIP archive and import from the ZIP (or filesystem structure) to the runtime (MV) data class instances.
Result data are stored:
- in the filesystem
- either as a directory structure or a ZIP archive (zipped directory structure)
- with a per-test directory JSON index file representing either a data class instance or a dictionary
- the JSON index has a key per test result field, with a value either holding the data or pointing to another JSON index file in the ZIP archive/directory structure.
Filesystem structure:
    MVTest/
        MVTestResults/
            report/
                AGE/
                    binned_cat_view/
                        index.json
                    ...
                    numerical_view/
                        index.json
                PAY_ATM6/
                    index.json
                index.json
                psi_scores.csv
        MVTestArtifacts/
            ...
            index.json
        MVTestLog/
            log.json
        MVTestSettings/
            ...
            index.json
        index.json
- DIR_MVARTIFACTS = 'MVTestArtifacts'
- DIR_MVLOG = 'MVTestLog'
- DIR_MVRESULTS = 'MVTestResults'
- DIR_MVSETTINGS = 'MVTestSettings'
- DIR_MVTEST = 'MVTest'
- FILE_INDEX = 'index.json'
- FORMAT_DATETIME = '%Y/%d/%m %H:%M:%S.%f'
- KEY_COLUMNS = '_columns'
- KEY_COLUMN_SUMMARIES = '_column_summaries'
- KEY_COL_HISTOGRAM_STATS = 'column_histogram_stats'
- KEY_DATA = 'data'
- KEY_DIR = 'dir'
- KEY_ERROR = 'error'
- KEY_FILENAME = 'filename'
- KEY_HASH = 'hash'
- KEY_LEVEL = 'level'
- KEY_MSGS = 'messages'
- KEY_MV_FPATH = 'mv_fpath'
- KEY_MV_ID = 'mvid'
- KEY_MV_NAME = 'mv_name'
- KEY_MV_TYPE = 'mv_type'
- KEY_NAME = 'name'
- KEY_N_COLS = 'n_cols'
- KEY_N_ROWS = 'n_rows'
- KEY_ORIGIN_OBJ_KEY = 'origin_obj_key'
- KEY_PATH = 'path'
- KEY_PLATFORM_MVID = 'platform_mvid'
- KEY_PLATFORM_OBJ_KEY = 'platform_obj_key'
- KEY_SAMPLE_TABLE = 'sample_table'
- KEY_SHAPE = 'shape'
- KEY_SIZE = 'size'
- KEY_SIZE_STR = 'size_str'
- KEY_SUMMARY = '_summary'
- KEY_TEXT = 'text'
- KEY_TS = 'timestamp'
- KEY_TYPE = 'type'
- TYPE_MV_ARTIFACTS = "<class 'h2o_mv.core.mv_test.MVTestArtifacts'>"
- TYPE_MV_ARTIFACT_INFO = "<class 'h2o_mv.core.mv_test.ArtifactInfo'>"
- TYPE_NONE_STR = "<class 'NoneType'>"
- TYPE_SHAPE_STR = "<custom-type 'shape'>"
- export_mv_test(mv_test_type: str, mv_test_name: str, mv_test_id: str, mv_test_results=None, mv_test_settings=None, mv_test_artifacts: Dict | None = None, mv_test_log=None, export_dir_name: str = 'MVTest', fail_fast: bool = False)
Export instances created by the MVTest.
- Parameters:
- mv_test_type : str
Python type of the MV test.
- mv_test_name : str
Name of the MV test.
- mv_test_id : str
ID of the MV test.
- mv_test_results
MV test results - h2o_mv.core.mv_test.MVTestResult.
- mv_test_settings
MV test settings - h2o_mv.core.mv_test.MVTestSettings.
- mv_test_artifacts : Optional[Dict]
Artifacts created by the MV test - dictionary of artifact name to h2o_mv.core.mv_test.ArtifactInfo.
- mv_test_log
MV test log - h2o_mv.core.mv_test.MVTestLog.
- export_dir_name : str
A name of the directory where the result will be exported. The directory will be created in the export_dir_path.
- fail_fast : bool
Don't be robust, but throw an exception on the first error.
- Returns:
- Tuple[Union[str, pathlib.Path], Dict]
Directory with MV test outputs saved on the filesystem and a dictionary with the index file content.
- import_mv_test(zip_path: str | Path = '', fail_fast: bool = False)
Import MVTest related objects - results, settings, artifacts and log.
- Parameters:
- zip_path : Union[str, pathlib.Path]
Path to the ZIP archive where the MV export is stored.
- fail_fast : bool
Don't be robust, but throw an exception on the first error.
- Returns:
- Dict
A dictionary with imported instances.
- set_target_dir(target_dir_path: Path)
- set_test_dir_name(custom_test_dir: str = 'MVTest')
- static split_full_type(type_str: str) Tuple[str, str]
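A hedged export/import sketch of the class above; the mv_test_* objects are assumed to come from a finished h2o-mv test run, and the type string, names, and paths are placeholders.

    from pathlib import Path

    from h2o_sonar.lib.integrations.mv_adapter import MvResultPersistence

    # Placeholders: objects produced by a finished h2o-mv test run.
    mv_test_results = ...   # h2o_mv.core.mv_test.MVTestResult
    mv_test_settings = ...  # h2o_mv.core.mv_test.MVTestSettings

    persistence = MvResultPersistence(target_dir_path=Path("./mv-export"))

    # Export the MV test outputs into a portable filesystem structure.
    persistence.export_mv_test(
        mv_test_type="<class 'h2o_mv.core.mv_test.MVTest'>",  # illustrative type string
        mv_test_name="drift-test",                            # placeholder name
        mv_test_id="test-001",                                # placeholder ID
        mv_test_results=mv_test_results,
        mv_test_settings=mv_test_settings,
    )

    # Later (possibly on another machine), import the objects back from the ZIP archive.
    imported = persistence.import_mv_test(zip_path="path/to/exported.zip")  # placeholder path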
h2o_sonar.lib.integrations.ragas_adapter module
- h2o_sonar.lib.integrations.ragas_adapter.get_ragas_privacy_safe_embeddings(embeddings_provider: str = 'huggingface')
ragas uses OpenAI embeddings by default. This function makes it possible to use custom embeddings with ragas so that the validation data are not sent to a 3rd party - privacy first. ragas supports Hugging Face embeddings - the embeddings model (BAAI/bge-small-en-v1.5) is downloaded from the Hugging Face model hub and used to calculate the embeddings locally, so the data are not sent to a 3rd party.
- Parameters:
- embeddings_provider : str
Name of the embeddings provider. Currently only “huggingface” is supported. It uses the BAAI/bge-small-en-v1.5 model.
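A minimal usage sketch of the function above; the returned embeddings object is then passed to the ragas evaluation instead of the default OpenAI embeddings.

    from h2o_sonar.lib.integrations.ragas_adapter import get_ragas_privacy_safe_embeddings

    # Download BAAI/bge-small-en-v1.5 from the Hugging Face hub and compute the
    # embeddings locally, so the validation data never leave the machine.
    embeddings = get_ragas_privacy_safe_embeddings(embeddings_provider="huggingface")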
- h2o_sonar.lib.integrations.ragas_adapter.get_ragas_to_sonar_llm_adapter(custom_judge: EvaluationJudge, logger=None)