Tutorial 3: Agents
==================

Overview
--------

This tutorial covers agents in **h2oGPTe** and the tools they use to handle tasks (prompts). **h2oGPTe agents** let you run code, generate plots, search the web, generate images, handle documents, develop models, and much more.

For instance, **h2oGPTe agents** can:

- Analyze datasets and extract meaningful insights
- Generate visualizations to illustrate data trends
- Build and train machine learning models
- Search the web and aggregate information
- Generate and execute Python code and Bash scripts
- Handle document processing tasks
- Generate images
- Automate workflows in machine learning, scripting, and data analysis

Objectives
----------

- Learn about h2oGPTe agents and the tools they can access to complete diverse and complex tasks.

Prerequisites
-------------

- **A global API key**

  .. note::

     To learn how to create a global API key, see `Create an API key. `_

- **h2oGPTe v1.6.18**

Tools
-----

Agents extend h2oGPTe so that it can execute a broader range of tasks autonomously. In other words, agents allow the large language model (LLM) to perform actions such as running code, generating plots, searching the web, conducting research, developing and preparing models, and more.

h2oGPTe agents come with a suite of tools and features designed to optimize workflows, enhance productivity, and simplify complex tasks. To list the tools an agent can access to complete a prompt, run the following:

.. code-block:: python

   from h2ogpte import H2OGPTE

   client = H2OGPTE(
       address="https://h2ogpte.genai.h2o.ai",
       api_key='sk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
   )

   for value in client.get_agent_tools_dict().values():
       print(value)

.. code-block:: text

   Advanced Reasoning
   Aider Code Generation
   Ask Question About Documents
   Ask Question About Image
   Audio-Video Transcription
   Bing Search
   Convert Document to Text
   Download Web Video
   Evaluate Answer
   Google Search
   H2O Driverless AI Data Science
   Image Generation
   Internet Access
   Intranet Access
   Mermaid Chart-Diagram Renderer
   Python Coding
   RAG Text
   RAG Vision
   Scholar Papers Search
   Screenshot Webpage
   Shell Scripting
   Wayback Machine Search
   Web Image Search
   Wikipedia Articles Search
   Wolfram Alpha Math Science Search

.. note::

   To learn about each available tool, see `Tools. `_

Use case examples
-----------------

The following two use cases show how to enable agents and how their tools work. Both use the Boston Housing Dataset, a well-known dataset used primarily for practicing regression techniques in machine learning. It contains information about various features of Boston's housing, which can be used to predict housing prices.

Use case 1: Dataset analysis with h2oGPTe agents
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For the following prompt, when agents are disabled, h2oGPTe responds with step-by-step instructions for achieving the desired result rather than performing the task itself:

**Prompt**: *Utilize the Boston Housing Dataset to calculate the correlation between the RM column (representing the average number of rooms per dwelling) and the MEDV column (indicating the median value of owner-occupied homes in thousands of dollars). After determining the correlation, create a scatter plot with RM on the x-axis and MEDV on the y-axis to visually represent their relationship. Enhance the visualization by adding a trend line highlighting the positive correlation between the number of rooms and home values.
You can access the dataset here: https://www.kaggle.com/code/prasadperera/the-boston-housing-dataset.*

With agents enabled, h2oGPTe first obtains the dataset from the web, then calculates the correlation in a Python environment and generates a plot (image) of the relationship between the **RM** and **MEDV** columns, with a trend line illustrating the positive correlation.

.. code-block:: python

   chat_session_id = client.create_chat_session()
   with client.connect(chat_session_id) as session:
       reply = session.query(
           message="Utilize the Boston Housing Dataset to calculate the correlation between the RM column (representing the average number of rooms per dwelling) and the MEDV column (indicating the median value of owner-occupied homes in thousands of dollars). After determining the correlation, create a scatter plot with RM on the x-axis and MEDV on the y-axis to visually represent their relationship. Enhance the visualization by adding a trend line highlighting the positive correlation between the number of rooms and home values. You can access the dataset here: https://www.kaggle.com/code/prasadperera/the-boston-housing-dataset",
           llm="auto",
           llm_args=dict(use_agent=True)
       )
       print(reply.content)

.. code-block:: text
   :emphasize-lines: 15

   The analysis of the Boston Housing Dataset reveals a strong positive correlation between the average number of rooms per dwelling (RM) and median home values (MEDV). Here are the key findings:

   1. Correlation Analysis:
   - The correlation coefficient between RM and MEDV is 0.695
   - This indicates a strong positive relationship: as the number of rooms increases, home values tend to increase
   - The relationship is statistically significant with a p-value of 2.487e-74

   2. Statistical Significance:
   - R-squared value: 0.484
   - This means approximately 48.4% of the variance in median home values can be explained by the average number of rooms
   - The extremely low p-value confirms the relationship is not due to random chance

   Here's the scatter plot showing the relationship between average number of rooms and median home values:

   ![image](/file?id=568946db-b398-4603-bdfc-94c387a248bd&name=boston_housing_correlation.png)

   The visualization clearly shows:
   - A positive trend (red line) indicating higher home values with more rooms
   - Individual data points showing some variance around the trend line
   - The correlation coefficient (0.695) displayed in the top-left corner

   Insights:
   1. The relationship between rooms and home value is substantial but not perfect, suggesting other factors also influence home prices
   2. The spread of points around the trend line indicates varying price levels for homes with the same number of rooms
   3. The relationship appears to be roughly linear across the range of room counts

   Recommendations:
   1. For real estate valuation, consider the number of rooms as a significant but not sole predictor of home value
   2. When developing pricing models, include RM as an important feature but combine it with other relevant variables
   3. Consider investigating other factors that might explain the remaining 51.6% of variance in home values

   Boston Housing Room Count and Home Value Correlation Analysis

The response includes the file name and ID of one of the files the agent generated while completing the prompt. To download the image (the plot), use the following method:

.. code-block:: python
   :emphasize-lines: 1

   client.download_document(destination_directory="./", destination_file_name="boston_housing_correlation.png", document_id="568946db-b398-4603-bdfc-94c387a248bd")

.. figure:: boston_housing_correlation.png
   :alt: boston_housing_correlation.png
   :align: center

   boston_housing_correlation.png

.. note::

   In particular, h2oGPTe agents use the following tool to analyze the Boston Housing Dataset: **Python Coding**.

Use case 2: Model development and preparation with h2oGPTe agents
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For the following prompt, when agents are disabled, h2oGPTe responds with step-by-step instructions for achieving the desired result rather than performing the task itself:

**Prompt**: *Develop a regression model using Scikit-Learn to predict housing prices based on the Boston Housing Dataset. Ensure that the model is saved as a .pkl file. Specify the exact name of this file. Additionally, provide a Python code example demonstrating how to load and use the saved model to make predictions with example data. The target variable for the regression should be the MEDV (Median Value) column. You can access the dataset at the following link: https://www.kaggle.com/code/prasadperera/the-boston-housing-dataset.*

With agents enabled, h2oGPTe can create the regression model.

.. code-block:: python

   chat_session_id = client.create_chat_session()
   with client.connect(chat_session_id) as session:
       reply = session.query(
           message="Develop a regression model using Scikit-Learn to predict housing prices based on the Boston Housing Dataset. Ensure that the model is saved as a .pkl file. Specify the exact name of this file. Additionally, provide a Python code example demonstrating how to load and use the saved model to make predictions with example data. The target variable for the regression should be the MEDV (Median Value) column. You can access the dataset at the following link: https://www.kaggle.com/code/prasadperera/the-boston-housing-dataset.",
           llm="auto",
           llm_args=dict(use_agent=True)
       )
       print(reply.content)

.. code-block:: text
   :emphasize-lines: 21,22,32,34

   I have developed a comprehensive regression model for predicting housing prices using the Boston Housing Dataset. Here's the complete solution:

   1. Model Development and Performance:
   - Tested multiple algorithms (Linear, Ridge, Lasso, Random Forest)
   - Final model: Random Forest Regressor
   - Performance metrics:
     * R-squared Score: 0.892
     * Root Mean Squared Error: 2.813
     * Cross-validation R² (mean): 0.826 (±0.095)

   2. Feature Importance Analysis:
   ```
   Most influential features:
   1. Average rooms (rm): 50.4%
   2. Lower status population (lstat): 31.0%
   3. Distance to employment (dis): 6.1%
   4. Crime rate (crim): 3.8%
   ```

   3. Files Created:
   - boston_housing_rf_model.pkl (main model file)
   - boston_housing_scaler.pkl (feature scaler)

   4. Implementation Details:
   ```python
   # Load and use the model
   import pickle
   import pandas as pd
   from sklearn.preprocessing import StandardScaler

   # Load model and scaler
   with open('boston_housing_rf_model.pkl', 'rb') as file:
       model = pickle.load(file)
   with open('boston_housing_scaler.pkl', 'rb') as file:
       scaler = pickle.load(file)

   # Example prediction
   example_data = pd.DataFrame({
       'crim': [0.02729],
       'zn': [0.0],
       'indus': [7.07],
       'chas': [0],
       'nox': [0.469],
       'rm': [6.421],
       'age': [78.9],
       'dis': [4.9671],
       'rad': [2],
       'tax': [242],
       'ptratio': [17.8],
       'b': [396.90],
       'lstat': [9.14]
   })

   # Scale features and predict
   scaled_data = scaler.transform(example_data)
   prediction = model.predict(scaled_data)
   ```

   Key Insights:
   1. Housing price is primarily driven by:
      - Number of rooms (50.4% importance)
      - Socioeconomic status (31.0% importance)
   2. Location factors (distance to employment) have moderate impact (6.1%)
   3. Crime rate has relatively low impact (3.8%)
   4. Physical characteristics outweigh demographic factors

   Recommendations:
   1. For Model Users:
      - Focus on accurate room count measurements
      - Ensure socioeconomic data is current
      - Validate input ranges using provided validation function
   2. For Real Estate Applications:
      - Prioritize property size and room configuration
      - Consider proximity to employment centers
      - Monitor neighborhood socioeconomic changes
   3. For Model Improvement:
      - Collect additional features about interior quality
      - Update regularly with recent market data
      - Consider adding interaction terms between rooms and location

   Boston Housing Price Prediction Model with Random Forest Regression

You can run and test the model using the Python script example provided in the prompt response, which uses the **boston_housing_rf_model.pkl** and **boston_housing_scaler.pkl** files generated by the agent. To download both files, look up their document IDs in the agent files attached to the reply:

.. code-block:: python
   :emphasize-lines: 18,19

   import json

   # The "agent_files" meta part lists the files the agent generated for this reply.
   agent_files = client.list_chat_message_meta_part(reply.id, "agent_files").content
   agent_files = json.loads(agent_files)

   def get_key_by_value(file_list, target_value):
       # Each entry maps a document ID (key) to a file name (value).
       for file_dict in file_list:
           for key, value in file_dict.items():
               if value == target_value:
                   return key
       return None

   rf_model_id = get_key_by_value(agent_files, "boston_housing_rf_model.pkl")
   scaler_id = get_key_by_value(agent_files, "boston_housing_scaler.pkl")

   client.download_document(destination_directory="./", destination_file_name="boston_housing_rf_model.pkl", document_id=rf_model_id)
   client.download_document(destination_directory="./", destination_file_name="boston_housing_scaler.pkl", document_id=scaler_id)

.. note::

   In particular, h2oGPTe agents use the following agent tools to build the regression model for the Boston Housing Dataset: **Python Coding** and **Shell Scripting**.
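
In use case 1, the document ID passed to ``download_document`` was copied by hand from the image link in the reply. The sketch below shows one way to read the file name and ID from the reply text instead. It assumes that image links always follow the ``![image](/file?id=...&name=...)`` pattern seen in the sample response above, which is an observation from this tutorial and not a documented contract. The ``extract_file_refs`` helper is hypothetical and is not part of the ``h2ogpte`` client.

.. code-block:: python

   import re
   from urllib.parse import parse_qs, urlparse

   # Matches Markdown image links such as ![image](/file?id=<id>&name=<file name>).
   # Assumption: agent replies embed generated images in this form.
   IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


   def extract_file_refs(reply_text):
       """Return {file name: document ID} for each image link that carries both."""
       refs = {}
       for target in IMAGE_LINK.findall(reply_text):
           query = parse_qs(urlparse(target).query)
           if "id" in query and "name" in query:
               refs[query["name"][0]] = query["id"][0]
       return refs


   # Sample text taken from the use case 1 response shown earlier.
   sample = (
       "Here's the scatter plot: "
       "![image](/file?id=568946db-b398-4603-bdfc-94c387a248bd"
       "&name=boston_housing_correlation.png)"
   )
   refs = extract_file_refs(sample)
   print(refs)

With a live session, ``extract_file_refs(reply.content)`` would yield the values to pass as ``destination_file_name`` and ``document_id`` to ``client.download_document``. For files that are not embedded as images, such as the ``.pkl`` files in use case 2, the ``agent_files`` lookup shown above remains the approach to use.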