h2o_sonar.lib.integrations package

Submodules

h2o_sonar.lib.integrations.genai module

class h2o_sonar.lib.integrations.genai.AmazonBedrockFoundationModel(model_arn: str, model_id: str, model_name: str, customizations_supported: List[Literal['FINE_TUNING', 'CONTINUED_PRE_TRAINING']], inference_types_supported: List[Literal['ON_DEMAND', 'PROVISIONED']], input_modalities: List[str], model_lifecycle_status: str, output_modalities: List[Literal['TEXT', 'IMAGE', 'EMBEDDING']], provider_name: str, response_streaming_supported: bool)

Bases: object

customizations_supported: List[Literal['FINE_TUNING', 'CONTINUED_PRE_TRAINING']]
inference_types_supported: List[Literal['ON_DEMAND', 'PROVISIONED']]
input_modalities: List[str]
model_arn: str
model_id: str
model_lifecycle_status: str
model_name: str
output_modalities: List[Literal['TEXT', 'IMAGE', 'EMBEDDING']]
provider_name: str
response_streaming_supported: bool
class h2o_sonar.lib.integrations.genai.AmazonBedrockKnowledgeBase(id: str, name: str, description: str, status: str, updated_at: datetime.datetime)

Bases: object

description: str
id: str
name: str
status: str
updated_at: datetime
class h2o_sonar.lib.integrations.genai.AmazonBedrockRagClient(connection: ConnectionConfig, logger: SonarLogger | None = None)

Bases: RagClient

ES_TEMP_PREFIX = 'es-temp'
ask_collection(collection_id: str, prompts: List[str], llm_model_name: str = '', include_chunks: int = 0, chunk_retrieval_method: str = 'ANSWER_REFS', **kwargs)
ask_model(prompts: List[str], llm_model_name: str = '', **extra_params) List
bedrock
bedrock_agent
bedrock_agent_runtime
bedrock_runtime
static config_factory() Dict

Get the prototype of the configuration for the client. It can be used to discover the parameter names that may be passed as a custom configuration of the client. The prototype dictionary is type safe and JSON de/serializable. Users are expected to set only the keys they need and skip the rest.

Returns:
Dict

Prototype of the configuration for the client.
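
A minimal usage sketch of this prototype-based pattern (the same config_factory() convention is shared by the other clients in this module; the filtered key below is a placeholder, not a documented key):

from h2o_sonar.lib.integrations import genai

# Get the JSON-serializable prototype listing all supported configuration keys.
proto = genai.AmazonBedrockRagClient.config_factory()
print(sorted(proto.keys()))  # discover the supported parameter names

# Keep only the keys you actually need to override; the client falls back to
# its defaults for everything you skip ("<key-you-need>" is a placeholder).
custom_cfg = {k: v for k, v in proto.items() if k in {"<key-you-need>"}}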

property connection: ConnectionConfig
create_collection(doc_paths: List[Path | str], collection_name: str = '', **kwargs) Tuple[str, str]
get_rag_conf(collection_id, model_arn)
iam
classmethod is_model_enabled(connection, model_id)
list_collections(offset: int = 0, limit: int = 1000) List[AmazonBedrockKnowledgeBase]
list_llm_model_names()
list_llm_models()
opensearchserverless
purge_collections(collection_ids: List[str] | None = None)
purge_uploaded_docs(document_ids: List[str] | None = None)
s3
s3_resource
sts
class h2o_sonar.lib.integrations.genai.AwsClient

Bases: object

class h2o_sonar.lib.integrations.genai.AwsResource

Bases: object

class h2o_sonar.lib.integrations.genai.H2oGptLlmClient(connection: ConnectionConfig, logger: SonarLogger | None = None)

Bases: OpenAiLlmClient

h2oGPT client - connects to the h2oGPT server:

  • OpenAI client is used to connect to h2oGPT; h2ogpt_client is DEPRECATED.

  • standalone h2oGPT server connection config:

    • server URL examples:

    • API key: required, cannot be generated, must be provided by the h2oGPT server admin

  • Hugging Face Space hosted h2oGPT connection config:

    • server URL examples:

      • h2oai/h2ogpt-chatbot2

    • API key: required, cannot be generated, must be provided by the h2oGPT server admin

See: https://github.com/h2oai/h2ogpt/blob/main/docs/README_CLIENT.md
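
A minimal construction sketch, assuming a ConnectionConfig that carries the h2oGPT server URL and API key (the field names used below are assumptions, not the documented constructor):

from h2o_sonar.config import ConnectionConfig
from h2o_sonar.lib.integrations import genai

# Illustrative only - the ConnectionConfig field names are assumptions.
connection = ConnectionConfig(
    server_url="https://<h2ogpt-server>",
    token="<api-key-issued-by-the-h2ogpt-server-admin>",
)

client = genai.H2oGptLlmClient(connection=connection)
print(client.list_llm_model_names())
answers = client.ask_model(prompts=["What is retrieval-augmented generation?"])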

property client
class h2o_sonar.lib.integrations.genai.H2oGpteRagClient(connection: ConnectionConfig, logger: SonarLogger | None = None)

Bases: RagClient

h2oGPTe RAG client.

CFG_EMBEDDING_MODEL = 'embedding_model'
CFG_PROMPT_TEMPLATE_ID = 'prompt_template_id'
DEFAULT_TIMEOUT = 420
MODEL_SPEC_AUTO = 'auto'
MODEL_SPEC_COL = 'llm-inherited-from-collection'
MODEL_SPEC_COL_OPT_E = ''
MODEL_SPEC_COL_OPT_N = None
class TypedLlmConfigDict

Bases: dict

chat_conversation: List[Tuple[str, str]] | None
llm: str | int | None
llm_args: TypedH2ogptLlmConfigDict | None
pre_prompt_query: str | None
prompt_query: str | None
system_prompt: str | None
text_context_list: List[str] | None
timeout: float | None
class TypedRagConfigDict

Bases: dict

embedding_model: str | None
llm: str | int | None
llm_args: TypedH2ogptLlmConfigDict | None
pre_prompt_query: str | None
pre_prompt_summary: str | None
prompt_query: str | None
prompt_summary: str | None
prompt_template_id: str | None
rag_config: Dict[str, str | int] | None
self_reflection_config: Dict[str, str | int] | None
system_prompt: str | None
timeout: float | None
ask_collection(collection_id: str, prompts: List[str], llm_model_name: str = '', include_chunks: int = 0, include_system_prompt: bool = False, chunk_retrieval_method: str = 'ANSWER_REFS', chat_session_id: str | None = None, retry_attempt: int = 0, retry_attempts: int = 0, timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None = None, **extra_params) List[LlmRagAnswer]

Ask h2oGPTe collection.

Parameters:
collection_id : str

h2oGPTe collection ID.

prompts : List[str]

Prompts to ask.

llm_model_name : str

Optional base LLM model name.

include_chunks : int

Optional parameter to also retrieve the relevant (text) chunks; a lexical search using the given query is performed.

include_system_prompt : bool

Optional parameter to determine whether the system prompt should be included.

chunk_retrieval_method : str

Optional parameter to determine how to retrieve chunks. Check H2oGpteChunkRetrievalMethod for possible values.

chat_session_id : Optional[str]

Optional chat session ID allowing reuse of the same session, which uses the chat history as context, i.e. a stateful chat session / multi-turn conversation.

retry_attempt : int

Optional parameter to determine the retry attempt (debugging).

retry_attempts : int

Optional parameter to determine the number of possible retry attempts (debugging).

timeout_exp_backoff : Optional[TimeoutRetryExpBackoffCtx]

Optional exponential backoff context for the timeout handling.

extra_params

Optional parameters to be passed to the h2oGPTe client session.query(). These parameters override the default values set in the connection and configuration.

Returns:
List[LlmHostClient.LlmRagAnswer]

List of tuples with prompt, answer, duration and chunks.
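
A hedged usage sketch (the ConnectionConfig field names and the collection ID are placeholders/assumptions):

from h2o_sonar.config import ConnectionConfig
from h2o_sonar.lib.integrations import genai

# Illustrative only - the ConnectionConfig field names are assumptions.
connection = ConnectionConfig(
    server_url="https://<h2ogpte-server>", token="<h2ogpte-api-key>"
)
rag_client = genai.H2oGpteRagClient(connection=connection)

# Ask an existing collection; each prompt yields one LlmRagAnswer named tuple
# (prompt, answer, duration, context, cost, chat_session_id).
answers = rag_client.ask_collection(
    collection_id="<collection-id>",
    prompts=["What does the uploaded corpus say about data privacy?"],
    include_chunks=3,                      # also retrieve 3 relevant chunks
    chunk_retrieval_method="ANSWER_REFS",  # see H2oGpteChunkRetrievalMethod
)
for a in answers:
    print(a.prompt, "->", a.answer)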

ask_model(prompts: List[str], llm_model_name: str = '', timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None = None, **extra_params) List[LlmRagAnswer]

Ask an h2oGPTe LLM (base) model.

property client
static config_factory(model_type: str = 'rag') Dict

Get the prototype of the configuration for the client. It can be used to discover the parameter names that may be passed as a custom configuration of the client.

See: https://docs.h2o.ai/enterprise-h2ogpte/v1.4.13/guide/prompts

Parameters:
model_type : commons.ModelTypeExplanation

Model type explanation - “rag” or “llm”.

create_collection(doc_paths: List[str | Path], collection_name: str = '', upload_if_collection_exists: bool = True, model_cfg: Dict | None = None) Tuple[str, str]

Create h2oGPTe collection and upload documents (corpus) to that collection.

Parameters:
doc_paths : List[Union[pathlib.Path, str]]

Paths (local filesystem) to the documents to be uploaded.

collection_name : str

Optional document collection name to use (if it already exists) or to create (if the given name does not exist).

upload_if_collection_exists : bool

Optional parameter to upload the documents even if the collection exists.

model_cfg : Optional[Dict]

Optional model configuration with the following parameters:

  • embedding_model : str

  • prompt_template_id : str

Returns:
Tuple[str, str]

h2oGPT Enterprise collection ID and URL.
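
Continuing the sketch above with the same rag_client, a collection can be created from local documents and purged afterwards (the paths and model configuration values are placeholders):

from pathlib import Path

collection_id, collection_url = rag_client.create_collection(
    doc_paths=[Path("docs/policy.pdf"), "docs/handbook.txt"],  # placeholder paths
    collection_name="compliance-corpus",
    model_cfg={
        # both keys are optional; the values are placeholders
        "embedding_model": "<embedding-model-name>",
        "prompt_template_id": "<prompt-template-id>",
    },
)
print(collection_id, collection_url)

# Clean up collections and documents created by this client instance.
rag_client.purge_collections()
rag_client.purge_uploaded_docs()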

static humanize_err_msg(ex: Exception, timeout_exp_backoff: TimeoutRetryExpBackoffCtx | None = None) str

Make the error messages from the h2oGPTe (client) human friendly.

list_collections(offset: int = 0, limit: int = 1000)
list_llm_model_names(retries: int = 3) List[str]

List h2oGPTe LLM models.

Parameters:
retries : int

Number of retries in case of h2oGPTe failure.

Returns:
List[str]

List of h2oGPTe LLM model names.

purge_collections(collection_ids: List[str] | None = None) List

Purge h2oGPTe collections.

Parameters:
collection_ids : Optional[List[str]]

List of collection IDs to be purged. If the list is empty, all collections created by this instance are purged.

purge_uploaded_docs(document_ids: List[str] | None = None) List

Purge h2oGPTe uploaded documents.

Parameters:
document_ids : Optional[List[str]]

List of document IDs to be purged. If the list is empty, all documents uploaded by this instance are purged.

class h2o_sonar.lib.integrations.genai.H2oLlmOpsClient(connection: ConnectionConfig, logger: SonarLogger | None = None)

Bases: OpenAiLlmClient

H2O LLMOps client.

LLMs hosted by H2O LLMOps can be accessed either using the OpenAI API or the H2O GPT client. This client is based on ``OpenAiLlmClient``.

See: https://internal-genai.dedicated.h2o.ai/v1/latestapp/ai.h2o.llmops

class h2o_sonar.lib.integrations.genai.LlmHostClient

Bases: ABC

A LLM host product client.

class LlmRagAnswer(prompt, answer, duration, context, cost, chat_session_id)

Bases: tuple

answer

Alias for field number 1

chat_session_id

Alias for field number 5

context

Alias for field number 3

cost

Alias for field number 4

duration

Alias for field number 2

prompt

Alias for field number 0

abstract ask_model(prompts: List[str], llm_model_name: str = '', **extra_params) List
property client
static config_factory() Dict

Get the prototype of the configuration for the client. It can be used to discover the parameter names that may be passed as a custom configuration of the client. The prototype dictionary is type safe and JSON de/serializable. Users are expected to set only the keys they need and skip the rest.

Returns:
Dict

Prototype of the configuration for the client.

health_check(llm_model_name: str) bool

Check if the judge is healthy and available.

abstract list_llm_model_names()
class h2o_sonar.lib.integrations.genai.MsAzureOpenAiLlmClient(connection: ConnectionConfig, base_url: str = '', deployment_name: str = '', api_version='2024-02-15-preview', logger: SonarLogger | None = None)

Bases: LlmHostClient

Microsoft Azure hosted OpenAI LLM client.

DEFAULT_API_VERSION = '2024-02-15-preview'
ask_model(prompts: List[str], llm_model_name: str = '', **extra_params) List[LlmRagAnswer]

Ask a Microsoft Azure-hosted OpenAI LLM model.

property client
static config_factory() Dict

Get the prototype of the configuration for the client. It can be used to discover the parameter names that may be passed as a custom configuration of the client. The prototype dictionary is type safe and JSON de/serializable. Users are expected to set only the keys they need and skip the rest.

Returns:
Dict

Prototype of the configuration for the client.

static config_normalize(config: Dict) Dict
list_llm_model_names() List[str]

List ALL Microsoft Azure hosted OpenAI LLM models.

IMPORTANT: these are NOT the models provided by the particular deployment for which the client was created, but all models available in the Azure OpenAI API.
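
A construction sketch; base_url, deployment_name and api_version follow the constructor above, while the ConnectionConfig field names are assumptions:

from h2o_sonar.config import ConnectionConfig
from h2o_sonar.lib.integrations import genai

# Illustrative only - the ConnectionConfig field names are assumptions.
connection = ConnectionConfig(
    server_url="https://<resource>.openai.azure.com",
    token="<azure-openai-api-key>",
)

client = genai.MsAzureOpenAiLlmClient(
    connection=connection,
    deployment_name="<your-deployment>",  # the Azure deployment to query
    api_version=genai.MsAzureOpenAiLlmClient.DEFAULT_API_VERSION,
)
answers = client.ask_model(prompts=["Summarize RAG in one sentence."])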

class h2o_sonar.lib.integrations.genai.OllamaClient(connection: ConnectionConfig, logger: SonarLogger | None = None)

Bases: LlmHostClient

Ollama client.

See https://ollama.com/
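
A minimal usage sketch (the ConnectionConfig field names and the model name are assumptions/placeholders):

from h2o_sonar.config import ConnectionConfig
from h2o_sonar.lib.integrations import genai

# Illustrative only - the ConnectionConfig field names are assumptions.
connection = ConnectionConfig(server_url="http://localhost:11434", token="")
client = genai.OllamaClient(connection=connection)

print(client.list_llm_model_names())  # models pulled into the local Ollama host
answers = client.ask_model(
    prompts=["Explain hallucination in one sentence."],
    llm_model_name="<model-pulled-in-ollama>",  # placeholder model name
)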

class TypedOllamaConfigDict

Bases: dict

context: str | None
format: str
images: List[str] | None
options: TypedOllamaModelFileDict | None
raw: bool
system: str | None
ask_model(prompts: List[str], llm_model_name: str = '', **extra_params) List
property client
static config_factory() Dict

Get the prototype of the configuration for the client. It can be used to discover the parameter names that may be passed as a custom configuration of the client. The prototype dictionary is type safe and JSON de/serializable. Users are expected to set only the keys they need and skip the rest.

Returns:
Dict

Prototype of the configuration for the client.

health_check(llm_model_name: str) bool

Check if the judge is healthy and available.

list_llm_model_names() List[str]
h2o_sonar.lib.integrations.genai.OpenAiAssistantsRagClient

alias of OpenAiAssistantsRagClientVersion2

class h2o_sonar.lib.integrations.genai.OpenAiAssistantsRagClientVersion1(connection: ConnectionConfig, default_llm_model_name: str = 'gpt-4o', logger=None)

Bases: RagClient

OpenAI RAG client - Assistants AI with enabled File Search/Retrieval tool.

This client leaks zero-size vector stores: using the old API, there is no way to remove them. Since their size is zero they should not incur any cost, but they do leave clutter behind.

@see https://github.com/openai/openai-python/blob/v1.20.0/api.md

BASE_LLM_MODELS = ['gpt-4o', 'gpt-3.5-turbo-1106']
DEFAULT_LLM_MODEL = 'gpt-4o'
HEADERS_VERSION_1 = {'OpenAI-Beta': 'assistants=v1'}
HEADERS_VERSION_2 = {'OpenAI-Beta': 'assistants=v2'}
KWARGS_ASSISTANT = 'assistant_kwargs'
KWARGS_RUN = 'run_kwargs'
KWARGS_THREAD = 'thread_kwargs'
ask_collection(assistant_id: str, prompts: List[str], include_chunks: int = 0, **kwargs) List[LlmRagAnswer]

Ask OpenAI Assistant with retrieval tool enabled and corpus uploaded. This method creates a new thread for each prompt and retrieves the answer as well as relevant chunks (if requested).

Parameters:
assistant_id : str

OpenAI Assistant ID.

prompts : List[str]

Prompts to ask.

include_chunks : int

Optional parameter to also retrieve the relevant (text) chunks.

ask_model(prompts: List[str], llm_model_name: str = '', is_one_prompt: bool = False, **kwargs) List[LlmRagAnswer]

Ask an OpenAI LLM (base) model (a minimalistic version without message history or parameterization of system prompts, assisting content, other parameters, …).

Parameters:
prompts : List[str]

Prompts to ask.

llm_model_name : str

Optional LLM model name to use for the answer.

is_one_prompt : bool

Optional parameter to decide whether to ask all prompts in one request (all prompts will be used as the context for the last prompt) or in separate requests.

Returns:
List[LlmHostClient.LlmRagAnswer]

List of named tuples with prompt, answer and duration.

property client
static config_factory() Dict

Get the prototype of the configuration for the client. It can be used to discover the parameter names that may be passed as a custom configuration of the client. The prototype dictionary is type safe and JSON de/serializable. Users are expected to set only the keys they need and skip the rest.

Returns:
Dict

Prototype of the configuration for the client.

static config_normalize(config: Dict) Dict

Normalize default values of the client configuration from the serializable dictionary representation to the OpenAI client configuration.

Parameters:
config : Dict

Configuration for the client in the serializable format.

Returns:
Dict

Normalized configuration for the client in the OpenAI format.

static config_resolve(in_kwargs: Dict, config_to_resolve: Dict, config_group_key: str, required_keys: Dict[str, Any]) Dict

Resolve the configuration for the client by ensuring that the required keys are set in the given configuration group:

Model parameters' priority:

  • HIGH: kwargs, e.g. "instructions"

  • MEDIUM: kwargs["assistant_kwargs"], e.g. "instructions"

  • LOW: required defaults

Model parameters resolution method (see the sketch after this list):

  1. start with EMPTY/SNAPSHOT parameters

  2. apply (non-assistant) kwargs

  3. apply assistant kwargs - if NOT already set

  4. ensure defaults for REQUIRED parameters

  5. normalize to OpenAI defaults
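
A standalone sketch of this precedence, not the actual implementation (the parameter names follow the signature above; the example values are illustrative):

from typing import Any, Dict

def resolve_config_sketch(
    in_kwargs: Dict[str, Any],
    config_group_key: str,
    required_keys: Dict[str, Any],
) -> Dict[str, Any]:
    """Illustrative only: top-level kwargs win over grouped kwargs,
    and grouped kwargs win over required defaults."""
    resolved: Dict[str, Any] = {}
    group = in_kwargs.get(config_group_key) or {}

    # apply (non-group) kwargs first - highest priority
    for key, value in in_kwargs.items():
        if key != config_group_key:
            resolved[key] = value

    # apply grouped kwargs only if not already set
    for key, value in group.items():
        resolved.setdefault(key, value)

    # ensure defaults for required parameters
    for key, default in required_keys.items():
        resolved.setdefault(key, default)
    return resolved

# The top-level "instructions" wins over the grouped value:
print(resolve_config_sketch(
    {"instructions": "Be terse.", "assistant_kwargs": {"instructions": "Be verbose."}},
    config_group_key="assistant_kwargs",
    required_keys={"model": "gpt-4o"},
))
# -> {'instructions': 'Be terse.', 'model': 'gpt-4o'}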

create_collection(doc_paths: List[str | Path], collection_name: str = '', llm_model_name: str = 'gpt-4o', **kwargs) str

Create OpenAI Assistant with enabled retrieval tool and upload documents (corpus) to that assistant.

Parameters:
doc_paths : List[Union[pathlib.Path, str]]

Paths (local filesystem) to the documents to be uploaded.

llm_model_name : str

Base LLM model name to be used by RAG in the generation phase.

Returns:
str

OpenAI Assistant ID.

list_collections(offset: int = 0, limit: int = 10) List

List OpenAI Assistants with retrieval tool enabled.

Parameters:
offset : int

Offset of the returned assistants; always 0 in the OpenAI implementation.

limit : int

Limit on the number of returned assistants.

Returns:
List[Assistant]

List of assistant instances.

list_llm_model_names(rag: bool = True) List[str]

List OpenAI LLM models.

Parameters:
rag : bool

Optional parameter to list only models supported by the OpenAI Assistants API (True, default) or all OpenAI LLM models.

Returns:
List[str]

List of LLM model names.

purge_collections(assistants_ids: List[str] | None = None) List

Purge OpenAI Assistants (collections).

Parameters:
assistants_ids : Optional[List[str]]

List of OpenAI Assistant IDs to be purged. If the list is empty, all Assistants created by this instance are purged.

purge_uploaded_docs(document_ids: List[str] | None = None) List

Purge documents uploaded to OpenAI.

Parameters:
document_ids : Optional[List[str]]

List of document IDs to be purged. If the list is empty, all documents uploaded by this instance are purged.

class h2o_sonar.lib.integrations.genai.OpenAiAssistantsRagClientVersion2(connection: ConnectionConfig, default_llm_model_name: str = 'gpt-4o', logger=None)

Bases: RagClient

OpenAI RAG client - Assistants AI with enabled file search tool.

The file search tool is the successor of the retrieval tool from OpenAI API v1. The file search tool uses a Vector Store, a vector database capable of both keyword and semantic search. Each vector store can hold up to 10,000 files. Vector stores can be attached to both Assistants and Threads, but in our implementation vector stores are attached to Assistants.

@see https://platform.openai.com/docs/assistants/tools/file-search

DEFAULT_LLM_MODEL = 'gpt-4o'
HEADERS_VERSION_2 = {'OpenAI-Beta': 'assistants=v2'}
ask_collection(assistant_id: str, prompts: List[str], include_chunks: int = 0, **kwargs) List[LlmRagAnswer]

Ask OpenAI Assistant with file search tool enabled and corpus uploaded. This method creates a new thread for each prompt and retrieves the answer as well as relevant chunks (if requested).

Parameters:
assistant_id : str

OpenAI Assistant ID.

prompts : List[str]

Prompts to ask.

include_chunks : int

Optional parameter to also retrieve the relevant (text) chunks.

ask_model(prompts: List[str], llm_model_name: str = '', is_one_prompt: bool = False, **kwargs) List[LlmRagAnswer]

Ask an OpenAI LLM (base) model (a minimalistic version without message history or parameterization of system prompts, assisting content, other parameters, …).

Parameters:
prompts : List[str]

Prompts to ask.

llm_model_name : str

Optional LLM model name to use for the answer.

is_one_prompt : bool

Optional parameter to decide whether to ask all prompts in one request (all prompts will be used as the context for the last prompt) or in separate requests.

Returns:
List[LlmHostClient.LlmRagAnswer]

List of named tuples with prompt, answer and duration.

property client
create_collection(doc_paths: List[str | Path], collection_name: str = '', llm_model_name: str = 'gpt-4o', assistant_name: str = '') str

Create OpenAI Assistant with enabled file search tool and upload documents (corpus) to that assistant.

Parameters:
doc_paths : List[Union[pathlib.Path, str]]

Paths (local filesystem) to the documents to be uploaded.

collection_name : str

Optional document collection name to use (if it already exists) or to create (if the given name does not exist).

llm_model_name : str

Base LLM model name to be used by RAG in the generation phase.

assistant_name : str

Optional name for the new Assistant.

Returns:
str

OpenAI Assistant ID.
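
A usage sketch combining create_collection and ask_collection (the ConnectionConfig field names and the document path are assumptions/placeholders):

from h2o_sonar.config import ConnectionConfig
from h2o_sonar.lib.integrations import genai

# Illustrative only - the ConnectionConfig field names are assumptions.
connection = ConnectionConfig(
    server_url="https://api.openai.com", token="<openai-api-key>"
)
assistants_client = genai.OpenAiAssistantsRagClientVersion2(connection=connection)

assistant_id = assistants_client.create_collection(
    doc_paths=["docs/report.pdf"],     # placeholder path
    llm_model_name="gpt-4o",
    assistant_name="eval-studio-demo",
)
answers = assistants_client.ask_collection(
    assistant_id=assistant_id,
    prompts=["What is the main conclusion of the report?"],
    include_chunks=2,                  # also retrieve 2 relevant chunks
)

# Remove the Assistants and uploaded files created by this client instance.
assistants_client.purge_collections()
assistants_client.purge_uploaded_docs()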

list_collections(offset: int = 0, limit: int = 10) List

List OpenAI Assistants with file search tool enabled.

Parameters:
offset : int

Offset of the returned assistants; always 0 in the OpenAI implementation.

limit : int

Limit on the number of returned assistants.

Returns:
List[Assistant]

List of assistant instances.

list_llm_model_names(rag: bool = True) List[str]

List OpenAI LLM models.

Parameters:
rag : bool

Optional parameter to list only models supported by the OpenAI Assistants API (True, default) or all OpenAI LLM models.

Returns:
List[str]

List of LLM model names.

purge_collections(assistants_ids: List[str] | None = None) List

Purge OpenAI Assistants (collections).

Parameters:
assistants_ids : Optional[List[str]]

List of OpenAI Assistant IDs to be purged. If the list is empty, all Assistants created by this instance are purged.

purge_uploaded_docs(document_ids: List[str] | None = None) List

Purge documents uploaded to OpenAI.

Parameters:
document_ids : Optional[List[str]]

List of document IDs to be purged. If the list is empty, all documents uploaded by this instance are purged.

class h2o_sonar.lib.integrations.genai.OpenAiLlmClient(connection: ConnectionConfig, default_llm_model_name: str = 'gpt-4', logger: SonarLogger | None = None)

Bases: LlmHostClient

OpenAI LLM client.

DEFAULT_LLM_MODEL = 'gpt-4'
ask_model(prompts: List[str], llm_model_name: str = '', **extra_params) List[LlmRagAnswer]

Ask an OpenAI-hosted LLM model.

property client
static config_factory() Dict

Get the prototype of the configuration for the client. It can be used to discover the parameter names that may be passed as a custom configuration of the client. The prototype dictionary is type safe and JSON de/serializable. Users are expected to set only the keys they need and skip the rest.

Returns:
Dict

Prototype of the configuration for the client.

static config_normalize(config: Dict) Dict

Normalize default values of the client configuration from the serializable dictionary representation to the OpenAI client configuration.

Parameters:
config : Dict

Configuration for the client in the serializable format.

Returns:
Dict

Normalized configuration for the client in the OpenAI format.

list_llm_model_names() List[str]
class h2o_sonar.lib.integrations.genai.RagChunkRetrievalMethod(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

ANSWER_REFS = 2
LEXICAL = 1
class h2o_sonar.lib.integrations.genai.RagClient

Bases: LlmHostClient, ABC

A RAG product client.

CHUNKS_LIMIT = 42
abstract ask_collection(collection_id: str, prompts: List[str], llm_model_name: str = '', include_chunks: int = 0, chunk_retrieval_method: str = 'ANSWER_REFS', **kwargs)
abstract create_collection(doc_paths: List[str | Path], collection_name: str = '', **kwargs) Tuple[str, str]
static get_collection_name(doc_paths: List[str | Path]) str
abstract list_collections(offset: int = 0, limit: int = 10)
abstract purge_collections(collection_ids: List[str] | None = None)
abstract purge_uploaded_docs(document_ids: List[str] | None = None)
class h2o_sonar.lib.integrations.genai.TimeoutRetryExpBackoffCtx(backoff_factor: float = 4.0, min_backoff_secs: float = 5.0, max_backoff_secs: float = 420.0)

Bases: object

Exponential backoff context for the timeout handling. This context is meant to be used in RAG/LLM clients on retries - timeout is increased on each retry by the backoff factor.
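
A hedged sketch of how a caller might drive this context during retries (the internal timeout growth formula is not documented here; only the methods listed below are used):

from h2o_sonar.lib.integrations import genai

backoff = genai.TimeoutRetryExpBackoffCtx(
    backoff_factor=4.0, min_backoff_secs=5.0, max_backoff_secs=420.0
)

for attempt in range(3):
    try:
        current_timeout = backoff.timeout  # timeout to use for this attempt
        # ... call the RAG/LLM client here with `current_timeout` ...
        break
    except TimeoutError:
        backoff.retry()  # recalculate (increase) the timeout before retrying

backoff.reset()  # back to the initial timeout for the next batch of prompts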

BACKOFF_FACTOR: float = 4.0
MAX_BACKOFF_SECS: float = 420.0
MIN_BACKOFF_SECS: float = 5.0
static copy(ctx: TimeoutRetryExpBackoffCtx) TimeoutRetryExpBackoffCtx
reset() None

Reset the timeout to the initial value.

retry() float

Call this method on retry to recalculate the timeout.

property timeout: float
class h2o_sonar.lib.integrations.genai.TypedH2ogptLlmConfigDict

Bases: dict

max_new_tokens: int
min_max_new_tokens: int
repetition_penalty: float
seed: int
temperature: float
top_k: int
top_p: float
class h2o_sonar.lib.integrations.genai.TypedOllamaModelFileDict

Bases: dict

mirorstat: int
mirorstat_eta: float
mirorstat_tau: float
num_ctx: int
num_predict: int
repeat_last_n: int
repeat_penalty: float
seed: int
stop: str
temperature: float
tfs_z: float
top_k: int
top_p: float
h2o_sonar.lib.integrations.genai.get_client_for_connection(connection: ConnectionConfig, logger: SonarLogger | None = None) LlmHostClient | RagClient

Get a client for the given connection.

Parameters:
connection : h2o_sonar.config.ConnectionConfig

Connection configuration.

logger : Optional[loggers.SonarLogger]

Optional logger.

Returns:
LlmHostClient

An LLM host client.
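
A minimal sketch; the factory derives the concrete client type from the connection itself (the ConnectionConfig field names below are assumptions):

from h2o_sonar.config import ConnectionConfig
from h2o_sonar.lib.integrations import genai

# Illustrative only - the ConnectionConfig field names are assumptions.
connection = ConnectionConfig(
    server_url="https://<llm-or-rag-host>", token="<api-key>"
)

client = genai.get_client_for_connection(connection)
print(type(client).__name__)
print(client.list_llm_model_names())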

h2o_sonar.lib.integrations.genai.log_action(logger: SonarLogger, description: str, indent: int = 0)

h2o_sonar.lib.integrations.mv_adapter module

H2O Model Validation adapter.

class h2o_sonar.lib.integrations.mv_adapter.ExplainerToMvTestAdapter

Bases: object

H2O Eval Studio Explainer to H2O Model Validation MVTest adapter.

MV_PYTHON_MODULE_NAME = 'h2o-mv'
static assert_mv_test_status(explainer, mv_test)
check_mv_compatibility(explainer) None
setup(h2o_sonar_config, persistence: ExplainerPersistence, logger: SonarLogger)
class h2o_sonar.lib.integrations.mv_adapter.MvResultJSonEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
class h2o_sonar.lib.integrations.mv_adapter.MvResultPersistence(target_dir_path: str | Path, mv_client=None, logger=None)

Bases: object

This class provides portable filesystem export and import of MV test results.

The design enables easy export of the results as a ZIP archive and import from the ZIP (or filesystem structure) to the runtime (MV) data class instances.

Result data are stored:

  • in the filesystem

  • either as a directory structure or ZIP archive (zipped directory structure)

  • with a per-test-directory JSON index file representing either a data class instance or a dictionary

  • the JSON index has one key per test result field, with the value either holding the data or pointing to another JSON index file in the ZIP archive/directory structure.

Filesystem structure:

MVTest/
    MVTestResults/
        report/
            AGE/
                binned_cat_view/
                    index.json
                ...
                numerical_view/
                index.json
            PAY_ATM6/
            index.json
        index.json
        psi_scores.csv
    MVTestArtifacts/
        ...
        index.json
    MVTestLog/
        log.json
    MVTestSettings/
        ...
        index.json
    index.json
DIR_MVARTIFACTS = 'MVTestArtifacts'
DIR_MVLOG = 'MVTestLog'
DIR_MVRESULTS = 'MVTestResults'
DIR_MVSETTINGS = 'MVTestSettings'
DIR_MVTEST = 'MVTest'
FILE_INDEX = 'index.json'
FORMAT_DATETIME = '%Y/%d/%m %H:%M:%S.%f'
KEY_COLUMNS = '_columns'
KEY_COLUMN_SUMMARIES = '_column_summaries'
KEY_COL_HISTOGRAM_STATS = 'column_histogram_stats'
KEY_DATA = 'data'
KEY_DIR = 'dir'
KEY_ERROR = 'error'
KEY_FILENAME = 'filename'
KEY_HASH = 'hash'
KEY_LEVEL = 'level'
KEY_MSGS = 'messages'
KEY_MV_FPATH = 'mv_fpath'
KEY_MV_ID = 'mvid'
KEY_MV_NAME = 'mv_name'
KEY_MV_TYPE = 'mv_type'
KEY_NAME = 'name'
KEY_N_COLS = 'n_cols'
KEY_N_ROWS = 'n_rows'
KEY_ORIGIN_OBJ_KEY = 'origin_obj_key'
KEY_PATH = 'path'
KEY_PLATFORM_MVID = 'platform_mvid'
KEY_PLATFORM_OBJ_KEY = 'platform_obj_key'
KEY_SAMPLE_TABLE = 'sample_table'
KEY_SHAPE = 'shape'
KEY_SIZE = 'size'
KEY_SIZE_STR = 'size_str'
KEY_SUMMARY = '_summary'
KEY_TEXT = 'text'
KEY_TS = 'timestamp'
KEY_TYPE = 'type'
TYPE_MV_ARTIFACTS = "<class 'h2o_mv.core.mv_test.MVTestArtifacts'>"
TYPE_MV_ARTIFACT_INFO = "<class 'h2o_mv.core.mv_test.ArtifactInfo'>"
TYPE_NONE_STR = "<class 'NoneType'>"
TYPE_SHAPE_STR = "<custom-type 'shape'>"
export_mv_test(mv_test_type: str, mv_test_name: str, mv_test_id: str, mv_test_results=None, mv_test_settings=None, mv_test_artifacts: Dict | None = None, mv_test_log=None, export_dir_name: str = 'MVTest', fail_fast: bool = False)

Export instances created by the MVTest.

Parameters:
mv_test_type : str

Python type of the MV test.

mv_test_name : str

Name of the MV test.

mv_test_id : str

ID of the MV test.

mv_test_results

MV test results - h2o_mv.core.mv_test.MVTestResult.

mv_test_settings

MV test settings - h2o_mv.core.mv_test.MVTestSettings.

mv_test_artifacts : Optional[Dict]

Artifacts created by the MV test - dictionary of artifact name to h2o_mv.core.mv_test.ArtifactInfo.

mv_test_log

MV test log - h2o_mv.core.mv_test.MVTestLog.

export_dir_name : str

Name of the directory where the result will be exported. The directory will be created in the export_dir_path.

fail_fast : bool

Do not attempt to be robust; throw an exception on the first error.

Returns:
Tuple[Union[str, pathlib.Path], Dict]

Directory with MV test outputs saved on the filesystem and a dictionary with the index file content.

import_mv_test(zip_path: str | Path = '', fail_fast: bool = False)

Import MVTest related objects - results, settings, artifacts and log.

Parameters:
zip_path : Union[str, pathlib.Path]

Path to the ZIP archive where the MV export is stored.

fail_fast : bool

Do not attempt to be robust; throw an exception on the first error.

Returns:
Dict

A dictionary with imported instances.
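
A sketch of the export/import round trip. In a real run the results, settings, artifacts and log are h2o_mv.core.mv_test.* instances produced by an MVTest; here None placeholders and an assumed archive path are used only to show the call shape:

from pathlib import Path
from h2o_sonar.lib.integrations import mv_adapter

persistence = mv_adapter.MvResultPersistence(target_dir_path=Path("./mv-export"))

# Export the objects produced by an MVTest run (placeholders shown here).
export_dir, index = persistence.export_mv_test(
    mv_test_type="<python-type-of-the-mv-test>",
    mv_test_name="backtesting",
    mv_test_id="<mv-test-id>",
    mv_test_results=None,   # h2o_mv.core.mv_test.MVTestResult in practice
    mv_test_settings=None,  # h2o_mv.core.mv_test.MVTestSettings in practice
)

# Later: import the MVTest objects back (the ZIP path is an assumption about
# where the exported archive ends up).
imported = persistence.import_mv_test(zip_path="./mv-export/MVTest.zip")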

set_target_dir(target_dir_path: Path)
set_test_dir_name(custom_test_dir: str = 'MVTest')
static split_full_type(type_str: str) Tuple[str, str]

h2o_sonar.lib.integrations.ragas_adapter module

h2o_sonar.lib.integrations.ragas_adapter.get_ragas_privacy_safe_embeddings(embeddings_provider: str = 'huggingface')

ragas uses OpenAI embeddings by default. This function provides custom embeddings for ragas so that the validation data are not sent to a 3rd party - privacy first.

ragas supports Hugging Face embeddings - the embeddings model (BAAI/bge-small-en-v1.5) is downloaded from the Hugging Face model hub and used to calculate embeddings locally - data are not sent to a 3rd party.

Parameters:
embeddings_provider : str

Name of the embeddings provider. Currently only "huggingface" is supported; it uses the BAAI/bge-small-en-v1.5 model.
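
A minimal sketch; the call below only obtains the local embeddings object - passing it to a ragas evaluation run is noted as an assumption about the ragas API:

from h2o_sonar.lib.integrations import ragas_adapter

# Local Hugging Face embeddings (BAAI/bge-small-en-v1.5), so the evaluation
# data never leave the machine for a third-party embeddings API.
embeddings = ragas_adapter.get_ragas_privacy_safe_embeddings(
    embeddings_provider="huggingface"
)

# Assumption: recent ragas versions accept an `embeddings` argument, e.g.
#   from ragas import evaluate
#   evaluate(dataset, metrics=metrics, embeddings=embeddings)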

h2o_sonar.lib.integrations.ragas_adapter.get_ragas_to_sonar_llm_adapter(custom_judge: EvaluationJudge, logger=None)

Module contents