# h2oGPTe Python Client Example

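This notebook walks through the h2oGPTe Python client: ingesting a PDF into a collection, chatting with an LLM, asking questions about the collection, and summarizing, tagging, and translating its content.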

In [1]:
import os

## Jupyter Notebook installation
- Install Python 3.8+
- `python -m pip install jupyter`
- `jupyter notebook`
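If you want the notebook to fail early on an unsupported interpreter, a check like the one below can go in the first cell. It is a plain-Python sketch of the 3.8+ requirement listed above and is not part of the h2ogpte package; `python_is_supported` and `MIN_PYTHON` are illustrative names.

```python
import sys

# Minimum Python version listed in the installation steps
MIN_PYTHON = (3, 8)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if not python_is_supported():
    raise RuntimeError(
        f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
        f"found {sys.version_info.major}.{sys.version_info.minor}"
    )
print("Python version OK:", sys.version.split()[0])
```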

In [2]:
# Install the h2ogpte client from PyPI into the notebook's kernel
%pip install h2ogpte



## h2oGPTe configuration

To get the h2oGPTe **API key** needed to access the server:

* Log in to https://h2ogpte.genai.h2o.ai
* Navigate to `Settings` > `API Keys`, then create a new API key or copy an existing one

In [3]:
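# Paste your API key here, or leave it empty to read it from the
# H2O_GPT_E_API_KEY environment variable
API_KEY = ""

API_KEY = API_KEY or os.getenv("H2O_GPT_E_API_KEY")

if not API_KEY:
    raise ValueError("Please configure the h2oGPTe API key")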
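The same lookup (explicit key first, then the `H2O_GPT_E_API_KEY` environment variable) can be wrapped in a small function, which makes the precedence easy to test without touching the real environment. This is a sketch; `resolve_api_key` is a hypothetical helper, not part of h2ogpte.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "H2O_GPT_E_API_KEY"  # environment variable read in the cell above


def resolve_api_key(explicit: str = "", env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer an explicitly pasted key, then fall back to the environment."""
    env = os.environ if env is None else env
    key = explicit or env.get(ENV_VAR, "")
    if not key:
        raise ValueError(f"Please configure the h2oGPTe API key (set {ENV_VAR})")
    return key
```

In the notebook this would be used as `API_KEY = resolve_api_key(API_KEY)`. Passing `env` explicitly is only there so the function can be checked with a plain dictionary.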

In [4]:
REMOTE_ADDRESS = "https://h2ogpte.genai.h2o.ai"

In [5]:
from h2ogpte import H2OGPTE

client = H2OGPTE(address=REMOTE_ADDRESS, api_key=API_KEY)

In [6]:
# List the client's public methods
[x for x in dir(client) if not x.startswith("_")]

['answer_question',
 'cancel_job',
 'connect',
 'count_assets',
 'count_chat_sessions',
 'count_chat_sessions_for_collection',
 'count_collections',
 'count_documents',
 'count_documents_in_collection',
 'create_chat_session',
 'create_collection',
 'delete_chat_sessions',
 'delete_collections',
 'delete_documents',
 'delete_documents_from_collection',
 'encode_for_retrieval',
 'extract_data',
 'get_chunks',
 'get_collection',
 'get_collection_for_chat_session',
 'get_document',
 'get_job',
 'get_llms',
 'get_meta',
 'get_scheduler_stats',
 'ingest_from_file_system',
 'ingest_uploads',
 'ingest_website',
 'list_chat_message_references',
 'list_chat_messages',
 'list_chat_sessions_for_collection',
 'list_collections_for_document',
 'list_documents_in_collection',
 'list_jobs',
 'list_recent_chat_sessions',
 'list_recent_collections',
 'list_recent_documents',
 'match_chunks',
 'search_chunks',
 'set_chat_message_votes',
 'summarize_content',
 'update_collection',
 'upload']
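The method names above follow a verb-first pattern (`count_`, `create_`, `delete_`, `get_`, `list_`, ...). A quick way to see the API surface by operation is to group the names by their leading verb. This is a plain-Python sketch; `group_by_verb` is an illustrative helper, and in the notebook you would pass it `dir(client)`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_verb(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group public method names by their leading verb (text before the first '_')."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        if name.startswith("_"):
            continue  # skip private and dunder attributes
        verb = name.split("_", 1)[0]
        groups[verb].append(name)
    return dict(groups)


# A few names taken from the listing above
sample = ["count_collections", "create_collection", "get_llms", "list_jobs", "upload", "__init__"]
print(group_by_verb(sample))
```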

In [7]:
# Use your own document, or download the demo datasheet below

!wget https://h2o.ai/content/dam/h2o/en/marketing/documents/2017/09/Driverless-AI_datasheet.pdf

--2023-11-02 10:57:19--  https://h2o.ai/content/dam/h2o/en/marketing/documents/2017/09/Driverless-AI_datasheet.pdf
Resolving h2o.ai (h2o.ai)... 151.101.3.10, 151.101.195.10, 151.101.131.10, ...
Connecting to h2o.ai (h2o.ai)|151.101.3.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 560380 (547K) [application/pdf]
Saving to: ‘Driverless-AI_datasheet.pdf.1’


2023-11-02 10:57:21 (419 KB/s) - ‘Driverless-AI_datasheet.pdf.1’ saved [560380/560380]
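Note that `wget` saved the file as `Driverless-AI_datasheet.pdf.1` because a copy already existed (`wget -nc` would skip the download instead). If you prefer to stay in Python, a download that reuses an existing file could look like the sketch below; `download_if_missing` is a hypothetical helper built on the standard library only.

```python
import os
import urllib.request
from urllib.parse import urlparse


def download_if_missing(url: str, dest_dir: str = ".") -> str:
    """Download `url` into `dest_dir` unless a file with the same name already exists."""
    filename = os.path.basename(urlparse(url).path)
    path = os.path.join(dest_dir, filename)
    if os.path.exists(path):
        return path  # reuse the existing copy instead of saving a ".1" duplicate
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
    return path


# Usage:
# download_if_missing("https://h2o.ai/content/dam/h2o/en/marketing/documents/2017/09/Driverless-AI_datasheet.pdf")
```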



In [8]:
def ingest_documents(client: H2OGPTE) -> str:
    """Return the id of the demo collection, creating and populating it if needed."""
    import os
    import pathlib

    url = "https://h2o.ai/content/dam/h2o/en/marketing/documents/2017/09/Driverless-AI_datasheet.pdf"

    collection_id = None
    name = "h2ogpte Python client demo"

    # Reuse an existing, non-empty collection with the same name
    print("Recent collections:")
    recent_collections = client.list_recent_collections(0, 1000)
    for c in recent_collections:
        if c.name == name and c.document_count:
            collection_id = c.id
            break

    # Otherwise create a new collection
    if collection_id is None:
        print(f"Creating collection: {name} ...")
        collection_id = client.create_collection(
            name=name,
            description="PDF -> text -> summary",
        )
        print(f"New collection: {collection_id} ...")

        # Upload the downloaded file to the server...
        file_path = pathlib.Path(os.path.basename(url))
        with open(file_path.resolve(), "rb") as f:
            print(f"Uploading {file_path} to collection {name} ({collection_id})")
            upload_id = client.upload(file_path.name, f)

        # ...then ingest the upload into the collection
        print("Converting the input into chunked text and embeddings...")
        client.ingest_uploads(collection_id, [upload_id])
        print(f"DONE: {collection_id}")
    return collection_id
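The "reuse a non-empty collection with the same name" check inside `ingest_documents` can be factored out so it can be tried without a server. The sketch below only assumes what the function already reads from each collection object (`name`, `id`, `document_count`); `find_collection_id` is a hypothetical helper, and `SimpleNamespace` objects stand in for real collections.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_collection_id(collections: Iterable, name: str) -> Optional[str]:
    """Return the id of the first collection called `name` that already has documents."""
    for c in collections:
        if c.name == name and c.document_count:
            return c.id
    return None


# Stand-ins for client.list_recent_collections(0, 1000)
recent = [
    SimpleNamespace(name="h2ogpte Python client demo", id="c1", document_count=0),
    SimpleNamespace(name="h2ogpte Python client demo", id="c2", document_count=1),
]
print(find_collection_id(recent, "h2ogpte Python client demo"))
```

Skipping empty collections means a collection whose ingestion failed earlier is not silently reused.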

In [9]:
collection_id = ingest_documents(client)

Recent collections:


## Talk to LLM

In [10]:
# List the LLMs available on the server
print([x["base_model"] for x in client.get_llms()])

['h2oai/h2ogpt-4096-llama2-70b-chat', 'h2oai/h2ogpt-4096-llama2-13b-chat', 'HuggingFaceH4/zephyr-7b-beta', 'lmsys/vicuna-13b-v1.5-16k', 'h2oai/h2ogpt-32k-codellama-34b-instruct', 'Yukang/LongAlpaca-70B', 'gpt-3.5-turbo', 'gpt-3.5-turbo-16k', 'gpt-4', 'gpt-4-32k']
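Because the set of models depends on the server, hard-coding one name can break. A sketch of picking the first available model from a preference list, assuming (as the cell above does) that each entry returned by `client.get_llms()` is a dict with a `"base_model"` key; `pick_llm` is a hypothetical helper.

```python
from typing import Iterable, Mapping, Sequence


def pick_llm(llms: Iterable[Mapping], preferred: Sequence[str]) -> str:
    """Return the first preferred base_model that is available, else the first one listed."""
    available = [m["base_model"] for m in llms]
    if not available:
        raise ValueError("No LLMs available on this server")
    for name in preferred:
        if name in available:
            return name
    return available[0]


# Usage in the notebook:
# llm = pick_llm(client.get_llms(), ["h2oai/h2ogpt-4096-llama2-70b-chat", "gpt-4"])
```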


In [11]:
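llm = "h2oai/h2ogpt-4096-llama2-70b-chat"
q = "Who are you?"  # example prompt (assumed); any question works here

# Chat with the LLM directly, without a collection
chat_session_id = client.create_chat_session()
with client.connect(chat_session_id) as session:
    answer = session.query(q, llm=llm).content
    print(f"{llm}: {answer}", flush=True)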

h2oai/h2ogpt-4096-llama2-70b-chat: I am LLaMA, an AI assistant developed by Meta AI that can understand and respond to human input in a conversational manner. I am trained on a massive dataset of text from the internet and can generate human-like responses to a wide range of topics and questions. I can be used to create chatbots, virtual assistants, and other applications that require natural language understanding and generation capabilities.
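The pattern used here and in the Q&A cell below (create a chat session, optionally bound to a collection, connect, query) can be wrapped in one function. This sketch uses only the calls that appear in this notebook; `ask` is a hypothetical helper name.

```python
from typing import Optional


def ask(client, question: str, llm: str, collection_id: Optional[str] = None) -> str:
    """Open a chat session (optionally bound to a collection) and return one answer."""
    if collection_id is None:
        chat_session_id = client.create_chat_session()  # plain LLM chat
    else:
        chat_session_id = client.create_chat_session(collection_id)  # chat with a collection
    with client.connect(chat_session_id) as session:
        return session.query(question, llm=llm).content


# Usage:
# print(ask(client, "What is Driverless AI?", llm, collection_id))
```

Opening a new session per question keeps the calls independent; for follow-up questions that should share context, keep one session open as the Q&A cell does.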


In [12]:
# Inspect and collect the collection's text chunks (ids 1-99)
chunks = []
for chunk_id in range(1, 100):
    try:
        chunk = client.get_chunks(collection_id, [chunk_id])
    except Exception:
        break  # no chunk with this id: assume all chunks have been read
    print(chunk, flush=True)
    chunks.append(chunk[0].text)

print(f"Number of chunks: {len(chunks)}", flush=True)

[Chunk(text='It is designed to take a raw dataset and automatically visualize\nthe most interesting patterns for data exploration. It then\napplies automatic feature engineering to increase accuracy\nby using Kaggle Grandmaster recipes for solving a wide\nvariety of use-cases. Next, it auto-tunes model parameters\nand provides the user with the model that yields the best\nresults. Lastly, it gives plain English explanations of model\nresults. Driverless AI enables users of all backgrounds to draw\nthe most value from their data.\nAutomatic Visualization\n• AutoViz allows users to visualize large datasets in the form\nof various graphs and charts without having to write code\n• Takes huge datasets and displays outliers and trends in an\ninterpretable way\n• Uses statistics to automatically decide which visualizations\nto present to the user\n• Exploratory tool that presents an overview of the\ndistribution of data\nAutomatic Feature Engineering\n• AutoDL employs a library of algorithms 
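The loop above is a general "fetch by increasing id until the first failure" pattern. A reusable sketch, with a hypothetical `collect_until_error` helper and an in-memory list standing in for `client.get_chunks`:

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def collect_until_error(fetch: Callable[[int], T], start: int = 1, stop: int = 100) -> List[T]:
    """Call fetch(i) for i in [start, stop) and stop at the first exception."""
    results: List[T] = []
    for i in range(start, stop):
        try:
            results.append(fetch(i))
        except Exception:
            break  # assume the first failure means there are no more items
    return results


# Usage in the notebook:
# chunks = collect_until_error(lambda i: client.get_chunks(collection_id, [i])[0].text)
```

Catching `Exception` also hides unrelated failures (for example a network or authentication error would look like "no more chunks"); if you know which exception `get_chunks` raises for a missing id, catching only that one is safer.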

## Talk to Collection

In [13]:
# Start with a Q&A session
print("\n==== Q&A ====")
chat_session_id = client.create_chat_session(collection_id)
with client.connect(chat_session_id) as session:
    for i, q in enumerate(
        [
            "What is Driverless AI?",
            "What are the features?",
            "What are the HW requirements?",
        ]
    ):
        a = session.query(q, llm=llm).content
        print(f"Q{i+1}: {q}\nA{i+1}: {a}\n\n", flush=True)


==== Q&A ====
Q1: What is Driverless AI?
A1: According to the information provided in the context, Driverless AI is an expert system designed to mimic Kaggle Grandmasters. It is a commercially licensed product that enables users of all backgrounds to draw the most value from their data. It is an automated machine learning platform that can take a raw dataset and automatically visualize the most interesting patterns for data exploration. It then applies automatic feature engineering to increase accuracy by using Kaggle Grandmaster recipes for solving a wide variety of use-cases. Next, it auto-tunes model parameters and provides the user with the model that yields the best results. Finally, it gives plain English explanations of model results.


Q2: What are the features?
A2: Based on the information provided in the context, the features of Driverless AI include:

1. AutoViz: An automatic visualization tool that allows users to visualize large datasets in various graphs and charts witho

In [14]:
# Summarize the collected chunks: first into bullets, then into a couple of paragraphs
summary = client.summarize_content(
    pre_prompt_summary="Summarize the content below into a list of bullets.\n",
    text_context_list=chunks,
    prompt_summary="Now summarize the above into a couple of paragraphs.",
    llm=llm,
)

print("\n==== SUMMARY ====")
for s in summary.content.split("\n"):
    print(s, flush=True)


==== SUMMARY ====
Sure! Here's a list of bullets summarizing the content:

* Driverless AI is an expert system for automating machine learning model building and interpretation
* Intended to be user-friendly, allowing users with various backgrounds to extract insights from data
* Includes features such as AutoViz, AutoDL, and Machine Learning Interpretability
* AutoViz automatically generates visualizations of large datasets to identify trends and outliers
* AutoDL applies feature engineering to create new features for a given dataset
* Machine Learning Interpretability provides clear explanations of model results
* Python pipeline allows users to export models and use them on new data in production
* Benefits various personas, including business users and data analysts
* Aims to make machine learning more accessible and interpretable for users of all backgrounds

And here's a summary in two paragraphs:

Driverless AI is an expert system designed to automate the process of building an
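Since the summary mixes a bulleted list with prose, you may want just the bullet items for further processing. A plain-Python sketch; `extract_bullets` is a hypothetical helper, and in the notebook you would call it on `summary.content`.

```python
from typing import List


def extract_bullets(text: str) -> List[str]:
    """Return the text of lines that start with '* ' or '- ', in order."""
    items = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("* ", "- ")):
            items.append(stripped[2:].strip())
    return items


sample_summary = (
    "Sure! Here's a list of bullets summarizing the content:\n"
    "\n"
    "* Driverless AI is an expert system for automating machine learning model building and interpretation\n"
    "* Includes features such as AutoViz, AutoDL, and Machine Learning Interpretability\n"
    "\n"
    "And here's a summary in two paragraphs:\n"
)
print(extract_bullets(sample_summary))
```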

In [15]:
# Create Hashtags
hashtags = client.summarize_content(
    pre_prompt_summary="Look for hashtags in the text below and pick at most 5 that are the most relevant.\n",
    text_context_list=chunks,
    prompt_summary="Collect no more than 5 hashtags from the text above, and list them.",
    llm=llm,
)
print("\n==== HASHTAGS ====")
for s in hashtags.content.split("\n"):
    print(s, flush=True)


==== HASHTAGS ====
Sure! Here are the 5 most relevant hashtags that can be extracted from the text:

1. #MachineLearning
2. #DataScience
3. #AI
4. #DataVisualization
5. #FeatureEngineering
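The model is asked for at most 5 hashtags, but nothing enforces that limit. A sketch that pulls the tags out of the response and caps them, using a simple `#\w+` pattern (an assumption: it treats a hashtag as `#` followed by letters, digits, or underscores); `extract_hashtags` is a hypothetical helper to call on `hashtags.content`.

```python
import re
from typing import List

HASHTAG = re.compile(r"#\w+")


def extract_hashtags(text: str, limit: int = 5) -> List[str]:
    """Return up to `limit` distinct hashtags in order of first appearance."""
    seen: List[str] = []
    for tag in HASHTAG.findall(text):
        if tag not in seen:
            seen.append(tag)
        if len(seen) == limit:
            break
    return seen


sample_response = "1. #MachineLearning\n2. #DataScience\n3. #AI\n4. #DataVisualization\n5. #FeatureEngineering"
print(extract_hashtags(sample_response))
```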


In [16]:
# Now translate the summary into German, using GPT-4
llm = "gpt-4"
translation = client.answer_question(
    # "You are a German professor of the English language and make no mistakes."
    system_prompt="Du bist ein deutscher Professor der englischen Sprache und machst keine Fehler.",
    text_context_list=[summary.content],
    # "Translate the text above into German."
    question="Übersetze den obigen Text auf Deutsch.",
    llm=llm,
)
print("\n==== TRANSLATION ====")
for s in translation.content.split("\n"):
    print(s, flush=True)


==== TRANSLATION ====
Sicher! Hier ist eine Liste von Stichpunkten, die den Inhalt zusammenfassen:

* Driverless AI ist ein Expertensystem zur Automatisierung des Aufbaus und der Interpretation von maschinellen Lernmodellen.
* Es soll benutzerfreundlich sein und es Benutzern mit unterschiedlichem Hintergrund ermöglichen, Erkenntnisse aus Daten zu gewinnen.
* Enthält Funktionen wie AutoViz, AutoDL und Machine Learning Interpretability.
* AutoViz generiert automatisch Visualisierungen großer Datensätze, um Trends und Ausreißer zu identifizieren.
* AutoDL wendet Feature Engineering an, um neue Merkmale für einen gegebenen Datensatz zu erstellen.
* Machine Learning Interpretability liefert klare Erklärungen der Modellergebnisse.
* Eine Python-Pipeline ermöglicht es Benutzern, Modelle zu exportieren und sie auf neuen Daten in der Produktion zu verwenden.
* Nutzt verschiedenen Personengruppen, einschließlich Geschäftsanwendern und Datenanalysten.
* Ziel ist es, maschinelles Lernen für Benut

Note: For more information on h2oGPT, see the official [h2oGPT GitHub repository](https://github.com/h2oai/h2ogpt).
