View performance metrics for configured and unconfigured LLMs
Overview
Users can view the performance metrics of large language models (LLMs) that are currently configured in the environment, as well as those that are no longer configured.
Example
from h2ogpte import H2OGPTE
client = H2OGPTE(
    address="https://h2ogpte.genai.h2o.ai",
    api_key='sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
)
# The `get_llm_performance_by_llm` method returns a list of performance metrics for each LLM that is configured, or was previously configured, in the environment.
# The available units for time intervals are:
# - minute / minutes (for example, 5 minutes)
# - hour / hours (for example, 2 hours)
# - day / days (for example, 3 days)
# - week / weeks (for example, 1 week)
# - month / months (for example, 3 months)
# - year / years (for example, 1 year)
list_of_performance_metrics = client.get_llm_performance_by_llm(interval="3 months")
for performance in list_of_performance_metrics[:1]:
    print(
        f"LLM name: {performance.llm_name}\n"
        f"Input tokens: {performance.input_tokens}\n"
        f"Model computed fields: {performance.model_computed_fields}\n"
        f"Model config: {performance.model_config}\n"
        f"Model fields: {performance.model_fields}\n"
        f"Output tokens: {performance.output_tokens}\n"
        f"Tokens per second: {performance.tokens_per_second}"
    )
LLM name: claude-3-5-sonnet-20240620
Input tokens: 347003436
Model computed fields: {}
Model config: {}
Model fields: {'llm_name': FieldInfo(annotation=str, required=True), 'call_count': FieldInfo(annotation=int, required=True), 'input_tokens': FieldInfo(annotation=int, required=True), 'output_tokens': FieldInfo(annotation=int, required=True), 'tokens_per_second': FieldInfo(annotation=float, required=True), 'time_to_first_token': FieldInfo(annotation=float, required=True)}
Output tokens: 34691224
Time to first token: 1.7457449999999999
Tokens per second: 45.505
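Each entry in the returned list exposes the fields shown above (llm_name, call_count, input_tokens, output_tokens, tokens_per_second, and time_to_first_token), so you can aggregate or rank the metrics yourself. The following is a minimal sketch that sorts LLMs by generation throughput; it assumes the same client object created earlier, and the top_n value is only illustrative.
# Minimal sketch: rank LLMs by generation throughput (tokens per second).
# Assumes `client` was created as shown above; `top_n` is illustrative.
top_n = 5
metrics = client.get_llm_performance_by_llm(interval="3 months")

# Sort the entries by throughput, highest first.
ranked = sorted(metrics, key=lambda m: m.tokens_per_second, reverse=True)

for m in ranked[:top_n]:
    total_tokens = m.input_tokens + m.output_tokens
    print(
        f"{m.llm_name}: {m.tokens_per_second:.2f} tokens/s, "
        f"{m.time_to_first_token:.2f} s to first token, "
        f"{m.call_count} calls, {total_tokens} total tokens"
    )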