RAG and LLM Hosts
H2O Eval Studio can evaluate standalone LLMs (Large Language Models) as well as LLMs used by RAG (Retrieval-Augmented Generation) systems hosted by the following products and services:
Enterprise h2oGPTe LLM Host
Enterprise h2oGPTe is a RAG product that uses LLMs to generate responses. H2O Eval Studio can be used to evaluate the performance of LLMs hosted by Enterprise h2oGPTe.
Enterprise h2oGPTe LLM host connection configuration example:
h2o_gpte_connection = h2o_sonar_config.ConnectionConfig(
connection_type=h2o_sonar_config.ConnectionConfigType.H2O_GPT_E.name,
name="H2O GPT Enterprise",
description="H2O GPT Enterprise LLM host example.",
server_url="https://h2ogpte.genai-training.h2o.ai/",
token="sk-IZQ9ioZBdRFMv6o31MAmkHzk5AHf8Bjs9q08lRbRLalNYHcT",
token_use_type=h2o_sonar_config.TokenUseType.API_KEY.name,
)
Remarks:
connection_type - H2O_GPT_E.
server_url - URL of the Enterprise h2oGPTe LLM host.
token - API key to access the Enterprise h2oGPTe LLM host - it can be generated in the Enterprise h2oGPTe UI (settings).
token_use_type - API key type - API_KEY.
The following model parameters can be configured when building h2oGPTe Test Lab:
{
"embedding_model": null,
"prompt_template_id": null,
"system_prompt": null,
"pre_prompt_query": null,
"prompt_query": null,
"pre_prompt_summary": null,
"prompt_summary": null,
"llm": null,
"llm_args": {
"temperature": 0.0,
"seed": 0,
"top_k": 1,
"top_p": 1.0,
"repetition_penalty": 1.07,
"max_new_tokens": 1024,
"min_max_new_tokens": 512
},
"self_reflection_config": null,
"rag_config": null,
"timeout": null
}
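As a sketch, the llm_args block above can be assembled programmatically before building the Test Lab. The helper and the model name below are illustrative (not part of the H2O Eval Studio API); the default values mirror the parameter block above, where temperature 0.0, seed 0, and top_k 1 make generation deterministic and reproducible:

```python
# Sketch: assemble h2oGPTe model parameters for a reproducible Test Lab run.
# The keys mirror the default parameter block above; the helper name and
# the model name are hypothetical.

def build_h2ogpte_params(llm=None, **llm_arg_overrides):
    """Return a model-parameter dict; keyword arguments override llm_args."""
    llm_args = {
        "temperature": 0.0,        # greedy decoding
        "seed": 0,                 # fixed seed for reproducibility
        "top_k": 1,                # consider only the most likely token
        "top_p": 1.0,
        "repetition_penalty": 1.07,
        "max_new_tokens": 1024,
        "min_max_new_tokens": 512,
    }
    llm_args.update(llm_arg_overrides)
    return {"llm": llm, "llm_args": llm_args}

params = build_h2ogpte_params(llm="example-llm", temperature=0.3)
```

Overrides apply only to the keys given, so the remaining defaults stay deterministic.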
H2O GPT LLM Host
H2O GPT is a product that hosts LLMs. H2O Eval Studio can be used to evaluate the performance of LLMs hosted by H2O GPT.
H2O GPT LLM host connection configuration example:
h2o_gpt_connection = h2o_sonar_config.ConnectionConfig(
connection_type=h2o_sonar_config.ConnectionConfigType.H2O_GPT.name,
name="H2O GPT",
description="H2O GPT LLM host example.",
server_url="https://gpt.h2o.ai:5000/v1",
token=os.getenv(KEY_H2OGPT_API_KEY),
token_use_type=h2o_sonar_config.TokenUseType.API_KEY.name,
)
Remarks:
connection_type - H2O_GPT.
server_url - URL of the H2O GPT LLM host.
token - API key to access the H2O GPT LLM host.
token_use_type - API key type - API_KEY.
The following model parameters can be configured when building h2oGPT Test Lab:
{
"messages": null,
"frequency_penalty": null,
"function_call": null,
"functions": null,
"logit_bias": null,
"logprobs": null,
"max_tokens": null,
"n": null,
"presence_penalty": null,
"response_format": null,
"seed": null,
"stop": null,
"stream": null,
"temperature": null,
"tool_choice": null,
"tools": null,
"top_logprobs": null,
"top_p": null,
"user": null,
"extra_headers": null,
"extra_query": null,
"extra_body": null,
"timeout": null
}
H2O LLMOps LLM Host
H2O LLMOps is a product that hosts LLMs. H2O Eval Studio can be used to evaluate the performance of LLMs hosted by H2O LLMOps. LLMs can be deployed using the H2O LLMOps deployer.
H2O LLMOps LLM host connection configuration example:
h2o_llmops_connection = h2o_sonar_config.ConnectionConfig(
connection_type=h2o_sonar_config.ConnectionConfigType.H2O_LLM_OPS.name,
name="H2O LLMOps: h2o-danube-1.8b-chat",
description="H2O LLMOps as host of h2o-danube-1.8b-chat LLM model.",
server_url="https://model.h2o.ai/0ac9b9c8-91f3-485c-bc1c-17163c5d75b5/v1",
token="pgpas8qt0rdcffa3odg2",
token_use_type=h2o_sonar_config.TokenUseType.API_KEY.name,
)
Remarks:
connection_type - H2O_LLM_OPS.
server_url - URL of the OpenAI chat API endpoint created by the H2O LLMOps deployer.
token - API key to access the LLM model.
token_use_type - API key type - API_KEY.
The following model parameters can be configured when building Test Lab:
{
"messages": null,
"frequency_penalty": null,
"function_call": null,
"functions": null,
"logit_bias": null,
"logprobs": null,
"max_tokens": null,
"n": null,
"presence_penalty": null,
"response_format": null,
"seed": null,
"stop": null,
"stream": null,
"temperature": null,
"tool_choice": null,
"tools": null,
"top_logprobs": null,
"top_p": null,
"user": null,
"extra_headers": null,
"extra_query": null,
"extra_body": null,
"timeout": null
}
Open AI Assistants with File Search (formerly Retrieval) Tool LLM Host
Open AI Assistants with the File Search tool (or the deprecated Retrieval tool) is a RAG system from OpenAI that hosts LLMs to generate answers. H2O Eval Studio can be used to evaluate the performance of LLMs hosted by Open AI Assistants with these tools.
Open AI Assistants with the Retrieval tool is available when openai client library version 1.20 and below is installed. Open AI Assistants with the File Search tool is available when a newer openai client library is installed. H2O Eval Studio automatically detects the tool availability and uses the appropriate LLM/RAG client and tool when connecting to the OpenAI Assistants.
However, there are important limitations when using the OpenAI Assistants:
The OpenAI Assistants version 2 with the File Search tool does not provide retrieved contexts when H2O Eval Studio builds the test lab. Therefore, the retrieved_contexts field in the test lab will be empty, and evaluators that require retrieved contexts should not be used, as they will not work as expected - their results will be based on the generated responses only and might be incorrect. H2O Eval Studio will report problems for a test lab with empty retrieved contexts.
The OpenAI Assistants version 1 with the Retrieval tool is deprecated and will be removed in the future. OpenAI's endpoint provided the retrieved contexts in the past; however, as part of the deprecation process, the retrieved contexts are no longer provided by the endpoint either, which leads to the evaluator accuracy issues described above.
Open AI Assistants connection configuration example:
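The tool-availability rule above (Retrieval for openai client 1.20 and below, File Search for newer clients) can be sketched as a simple version check. The version threshold is taken from the text; the tool-type strings "retrieval" and "file_search" are the Assistants API tool identifiers:

```python
# Sketch: pick the Assistants tool based on the installed openai client
# version, per the rule above (Retrieval for <= 1.20, File Search for newer).

def assistants_tool_for(version: str) -> str:
    """Map an openai client version string to the available tool type."""
    major, minor = (int(p) for p in version.split(".")[:2])
    # openai <= 1.20 only offers the deprecated Retrieval tool (v1 API);
    # newer clients offer the File Search tool (v2 API).
    if (major, minor) <= (1, 20):
        return "retrieval"
    return "file_search"
```

For example, assistants_tool_for("1.20.0") yields "retrieval" while assistants_tool_for("1.21.3") yields "file_search".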
openai_rag_connection = h2o_sonar_config.ConnectionConfig(
connection_type=h2o_sonar_config.ConnectionConfigType.OPENAI_RAG.name,
name="OpenAI RAG",
description="OpenAI Assistant with the File Search tool enabled.",
token=os.getenv(KEY_OPENAI_API_KEY),
token_use_type=h2o_sonar_config.TokenUseType.API_KEY.name,
)
Remarks:
connection_type - OPENAI_RAG.
server_url is resolved internally by the client.
token - API key to access the LLM model.
token_use_type - API key type - API_KEY.
The following model parameters can be configured when building OpenAI Test Lab:
{
"assistant_kwargs": {
"name": null,
"description": null,
"instructions": null,
"tools": null,
"metadata": null,
"extra_headers": null,
"extra_query": null,
"extra_body": null,
"timeout": null
},
"thread_kwargs": {
"messages": null,
"metadata": null,
"extra_headers": null,
"extra_query": null,
"extra_body": null,
"timeout": null
},
"run_kwargs": {
"additional_instructions": null,
"additional_messages": null,
"instructions": null,
"max_completion_tokens": null,
"max_prompt_tokens": null,
"metadata": null,
"response_format": null,
"stream": null,
"temperature": null,
"tool_choice": null,
"tools": null,
"truncation_strategy": null,
"extra_headers": null,
"extra_query": null,
"extra_body": null,
"timeout": null
}
}
Open AI Chat LLM Host
Open AI Chat is a product that hosts LLMs. H2O Eval Studio can be used to evaluate the performance of LLMs hosted by Open AI Chat.
Open AI Chat LLM host connection configuration example:
openai_connection = h2o_sonar_config.ConnectionConfig(
connection_type=h2o_sonar_config.ConnectionConfigType.OPENAI_CHAT.name,
name="OpenAI Chat",
description="OpenAI chat API.",
token=os.getenv(KEY_OPENAI_API_KEY),
token_use_type=h2o_sonar_config.TokenUseType.API_KEY.name,
)
Remarks:
connection_type - OPENAI_CHAT.
server_url is resolved internally by the client.
token - API key to access the LLM model.
token_use_type - API key type - API_KEY.
The following model parameters can be configured when building Test Lab:
{
"messages": null,
"frequency_penalty": null,
"function_call": null,
"functions": null,
"logit_bias": null,
"logprobs": null,
"max_tokens": null,
"n": null,
"presence_penalty": null,
"response_format": null,
"seed": null,
"stop": null,
"stream": null,
"temperature": null,
"tool_choice": null,
"tools": null,
"top_logprobs": null,
"top_p": null,
"user": null,
"extra_headers": null,
"extra_query": null,
"extra_body": null,
"timeout": null
}
Microsoft Azure Open AI Chat LLM Host
Microsoft Azure-hosted Open AI Chat is a service that hosts LLMs. H2O Eval Studio can be used to evaluate the performance of LLMs hosted by Open AI Chat on Microsoft Azure.
Microsoft Azure Open AI Chat LLM host connection configuration example:
openai_connection = h2o_sonar_config.ConnectionConfig(
connection_type=h2o_sonar_config.ConnectionConfigType.AZURE_OPENAI_CHAT.name,
name="OpenAI Chat at MS Azure",
description="OpenAI chat API hosted by Microsoft Azure.",
server_url="https://my-llm-environment.openai.azure.com/",
server_id="my-llm-testing",
token=os.getenv(KEY_AZURE_OPENAI_API_KEY),
token_use_type=h2o_sonar_config.TokenUseType.API_KEY.name,
)
Remarks:
connection_type - AZURE_OPENAI_CHAT.
server_url - URL of the Open AI Chat environment hosted by Microsoft Azure.
server_id - ID (deployment name) of the Open AI Chat environment hosted by Microsoft Azure; it is used as the LLM model name.
token - API key to access the environment.
token_use_type - API key type - API_KEY.
The following model parameters can be configured when building Microsoft Azure Test Lab:
{
"messages": null,
"frequency_penalty": null,
"function_call": null,
"functions": null,
"logit_bias": null,
"logprobs": null,
"max_tokens": null,
"n": null,
"presence_penalty": null,
"response_format": null,
"seed": null,
"stop": null,
"stream": null,
"temperature": null,
"tool_choice": null,
"tools": null,
"top_logprobs": null,
"top_p": null,
"user": null,
"extra_headers": null,
"extra_query": null,
"extra_body": null,
"timeout": null
}
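To make the remarks above concrete, the Azure OpenAI chat completions endpoint can be derived from server_url (the resource URL) and server_id (the deployment name). The URL layout follows the Azure OpenAI REST API; the api-version value below is an assumption and should match your Azure deployment:

```python
# Sketch: compose the Azure OpenAI chat completions URL from the connection
# fields above. The api-version default is a placeholder assumption.

def azure_chat_url(server_url: str, server_id: str,
                   api_version: str = "2024-02-01") -> str:
    """Build {resource}/openai/deployments/{deployment}/chat/completions."""
    base = server_url.rstrip("/")
    return (f"{base}/openai/deployments/{server_id}"
            f"/chat/completions?api-version={api_version}")

url = azure_chat_url("https://my-llm-environment.openai.azure.com/",
                     "my-llm-testing")
```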
Open AI Chat API Compatible LLM Host
H2O Eval Studio can be used to evaluate the performance of LLMs hosted by any OpenAI API compatible LLM host.
Open AI Chat API compatible LLM host connection configuration example:
openai_connection = h2o_sonar_config.ConnectionConfig(
connection_type=h2o_sonar_config.ConnectionConfigType.OPENAI_CHAT.name,
name="OpenAI Chat",
description="OpenAI chat API.",
server_url="https://model.h2o.ai/0ac9b9c8-91f3-485c-bc1c-17163c5d75b5/v1",
token=os.getenv(KEY_OPENAI_API_KEY),
token_use_type=h2o_sonar_config.TokenUseType.API_KEY.name,
)
Remarks:
connection_type - OPENAI_CHAT.
server_url - URL of the Open AI Chat compatible endpoint.
token - API key to access the LLM model.
token_use_type - API key type - API_KEY.
The following model parameters can be configured when building Test Lab:
{
"messages": null,
"frequency_penalty": null,
"function_call": null,
"functions": null,
"logit_bias": null,
"logprobs": null,
"max_tokens": null,
"n": null,
"presence_penalty": null,
"response_format": null,
"seed": null,
"stop": null,
"stream": null,
"temperature": null,
"tool_choice": null,
"tools": null,
"top_logprobs": null,
"top_p": null,
"user": null,
"extra_headers": null,
"extra_query": null,
"extra_body": null,
"timeout": null
}
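A minimal sketch of querying any OpenAI Chat API compatible endpoint using only the Python standard library; the server URL, model name, and environment variable name are placeholders, and the request is only sent when a real key is configured:

```python
# Sketch: build a chat completion request for an OpenAI-compatible endpoint.
# The endpoint URL, model name, and env var are illustrative placeholders.
import json
import os
import urllib.request

def chat_request(server_url, api_key, model, prompt, **params):
    """Build the HTTP request for POST {server_url}/chat/completions."""
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}],
               **params}  # e.g. temperature, max_tokens, seed
    return urllib.request.Request(
        server_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST")

req = chat_request("https://example-llm-host/v1", "sk-placeholder",
                   "h2o-danube-1.8b-chat", "Hello!", temperature=0.0)
if os.getenv("LLM_HOST_API_KEY"):  # only send when a real key is configured
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request shape works against h2oGPT, H2O LLMOps, and any other host exposing the OpenAI chat completions API.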
Amazon Bedrock
The current implementation of the Amazon Bedrock client supports RAG either with a predefined collection ID that corresponds to knowledgeBaseId, or it can create a knowledge base using a fixed configuration that is not configurable as of now.
Only Anthropic Claude models are supported for use in the RAG.
Amazon Bedrock connection configuration example:
bedrock_connection = h2o_sonar_config.ConnectionConfig(
connection_type=h2o_sonar_config.ConnectionConfigType.AMAZON_BEDROCK.name,
name="Amazon Bedrock",
description="Amazon Bedrock RAG host connection.",
username=os.getenv("AWS_ACCESS_KEY"),
password=os.getenv("AWS_SECRET_ACCESS_KEY"),
token=os.getenv("AWS_SESSION_TOKEN"),
)
Remarks:
connection_type - AMAZON_BEDROCK.
username - AWS access key.
password - AWS secret access key.
token - AWS session token.
The following model parameters can be configured when building Test Lab:
{
"guardrailConfiguration": {
"guardrailId": "string",
"guardrailVersion": "string"
},
"inferenceConfig": {
"textInferenceConfig": {
"maxTokens": number,
"stopSequences": [ "string" ],
"temperature": number,
"topP": number
}
},
"promptTemplate": {
"textPromptTemplate": "string"
},
"orchestrationConfiguration": {
"queryTransformationConfiguration": {
"type": "QUERY_DECOMPOSITION"
}
},
"retrievalConfiguration": {
"vectorSearchConfiguration": {
"filter": { ... },
"numberOfResults": number,
"overrideSearchType": "string"
}
}
}
These parameters are described in the boto3 documentation. guardrailConfiguration, inferenceConfig, and promptTemplate are passed to the knowledgeBaseConfiguration.
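To make the nesting concrete, the sketch below assembles a subset of the parameter block above into the knowledgeBaseConfiguration shape used by the boto3 retrieve_and_generate call. The knowledge-base ID and model ARN are placeholders, the exact nesting (generation parameters under generationConfiguration) should be checked against the boto3 documentation, and no AWS call is made:

```python
# Sketch: assemble a Bedrock RAG knowledgeBaseConfiguration dict. The ID and
# ARN are placeholders; verify the nesting against the boto3 documentation.

def bedrock_kb_config(knowledge_base_id, model_arn,
                      max_tokens=512, temperature=0.0, top_p=1.0,
                      num_results=5):
    """Build the knowledgeBaseConfiguration for retrieve_and_generate."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "modelArn": model_arn,  # only Anthropic Claude models are supported
        "generationConfiguration": {
            "inferenceConfig": {
                "textInferenceConfig": {
                    "maxTokens": max_tokens,
                    "temperature": temperature,
                    "topP": top_p,
                }
            }
        },
        "retrievalConfiguration": {
            "vectorSearchConfiguration": {"numberOfResults": num_results}
        },
    }

cfg = bedrock_kb_config("kb-example", "arn:aws:bedrock:::example-claude")
```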
ollama LLM Host
H2O Eval Studio can be used to evaluate the performance of LLMs hosted by an ollama LLM host.
ollama LLM host connection configuration example:
ollama_connection = h2o_sonar_config.ConnectionConfig(
connection_type=h2o_sonar_config.ConnectionConfigType.OLLAMA.name,
name="ollama",
description="ollama host LLM models.",
server_url="http://localhost:11434",
)
Remarks:
connection_type - OLLAMA.
server_url - URL of the ollama endpoint.
The following model parameters can be configured when building ollama Test Lab:
{
"images": null,
"options": {
"num_ctx": 4096,
"repeat_last_n": 64,
"repeat_penalty": 1.1,
"temperature": 0.7,
"seed": 42,
"stop": null,
"tfs_z": 1.0,
"num_predict": 128,
"top_k": 40,
"top_p": 0.9
},
"system": null,
"context": null,
"raw": false
}
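A minimal sketch of calling the ollama /api/generate endpoint with the options above, using only the Python standard library. The model name is a placeholder, and the request is only sent when an ollama server URL is configured via an (assumed) environment variable:

```python
# Sketch: build a request body for the ollama /api/generate endpoint with
# the sampling options shown above. The model name is a placeholder.
import json
import os
import urllib.request

def ollama_generate_payload(model, prompt, **options):
    """Build the /api/generate request body with sampling options."""
    return {"model": model, "prompt": prompt, "stream": False,
            "options": {"temperature": 0.7, "seed": 42, "top_k": 40,
                        "top_p": 0.9, "num_predict": 128, **options}}

payload = ollama_generate_payload("example-model", "Why is the sky blue?",
                                  temperature=0.0)
if os.getenv("OLLAMA_HOST"):  # only send when a server is configured
    req = urllib.request.Request(
        os.environ["OLLAMA_HOST"].rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["response"])
```

With stream set to false, the generated text is returned in the response field of a single JSON object.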