{ "cells": [ { "cell_type": "markdown", "id": "01a9296c-7130-43ab-9355-b632ba48eb3e", "metadata": {}, "source": [ "# Residual Decision Tree Surrogate Explainer Demo\n", "\n", "This example demonstrates how to interpret a **Driverless AI MOJO** model using the H2O Sonar library and plot the **residual surrogate decision tree**." ] }, { "cell_type": "code", "execution_count": 1, "id": "f6fd7532-0023-42f7-95fa-cefe588237b1", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import logging\n", "\n", "import daimojo\n", "import webbrowser\n", "\n", "from h2o_sonar import interpret\n", "from h2o_sonar.lib.api import commons, explainers\n", "from h2o_sonar.explainers.residual_dt_surrogate_explainer import ResidualDecisionTreeSurrogateExplainer\n", "from h2o_sonar.lib.api.models import ModelApi\n", "\n", "from sklearn.ensemble import GradientBoostingClassifier" ] }, { "cell_type": "code", "execution_count": 2, "id": "d3bbfb25-f7e1-47e1-a2d9-1ceeedfa7d73", "metadata": {}, "outputs": [], "source": [ "results_location = \"../../results\"\n", "\n", "# dataset\n", "dataset_path = \"../../data/creditcard.csv\"\n", "target_col = \"default payment next month\"" ] }, { "cell_type": "code", "execution_count": 3, "id": "82366fdc", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': 'h2o_sonar.explainers.residual_dt_surrogate_explainer.ResidualDecisionTreeSurrogateExplainer',\n", " 'name': 'ResidualDecisionTreeSurrogateExplainer',\n", " 'display_name': 'Residual Surrogate Decision Tree',\n", " 'description': 'The residual surrogate decision tree predicts which paths in the tree (paths explain approximate model behavior) lead to highest or lowest error. The residual surrogate decision tree is created by training a simple decision tree on the residuals of the predictions of the model. 
Residuals are differences between observed and predicted values which can be used as targets in surrogate models for the purpose of model debugging. The method used to calculate residuals varies depending on the type of problem. For classification problems, logloss residuals are calculated for a specified class (only one residual surrogate decision is created by the explainer and it is built for this class). For regression problems, residuals are determined by calculating the square of the difference between targeted and predicted values.',\n", " 'model_types': ['iid', 'time_series'],\n", " 'can_explain': ['regression', 'binomial', 'multinomial'],\n", " 'explanation_scopes': ['global_scope', 'local_scope'],\n", " 'explanations': [{'explanation_type': 'global-decision-tree',\n", " 'name': 'GlobalDtExplanation',\n", " 'category': None,\n", " 'scope': 'global',\n", " 'has_local': None,\n", " 'formats': []},\n", " {'explanation_type': 'local-decision-tree',\n", " 'name': 'LocalDtExplanation',\n", " 'category': None,\n", " 'scope': 'local',\n", " 'has_local': None,\n", " 'formats': []}],\n", " 'parameters': [{'name': 'debug_residuals_class',\n", " 'description': 'Class for debugging classification model logloss residuals, empty string for debugging regression model residuals.',\n", " 'comment': '',\n", " 'type': 'str',\n", " 'val': '',\n", " 'predefined': [],\n", " 'tags': [],\n", " 'min_': 0.0,\n", " 'max_': 0.0,\n", " 'category': ''},\n", " {'name': 'dt_tree_depth',\n", " 'description': 'Decision tree depth.',\n", " 'comment': '',\n", " 'type': 'int',\n", " 'val': 3,\n", " 'predefined': [],\n", " 'tags': [],\n", " 'min_': 0.0,\n", " 'max_': 0.0,\n", " 'category': ''},\n", " {'name': 'nfolds',\n", " 'description': 'Number of CV folds.',\n", " 'comment': '',\n", " 'type': 'int',\n", " 'val': 3,\n", " 'predefined': [],\n", " 'tags': [],\n", " 'min_': 0.0,\n", " 'max_': 0.0,\n", " 'category': ''},\n", " {'name': 'qbin_cols',\n", " 'description': 'Quantile binning 
columns.',\n", " 'comment': '',\n", " 'type': 'list',\n", " 'val': None,\n", " 'predefined': [],\n", " 'tags': ['SOURCE_DATASET_COLUMN_NAMES'],\n", " 'min_': 0.0,\n", " 'max_': 0.0,\n", " 'category': ''},\n", " {'name': 'qbin_count',\n", " 'description': 'Quantile bins count.',\n", " 'comment': '',\n", " 'type': 'int',\n", " 'val': 0,\n", " 'predefined': [],\n", " 'tags': [],\n", " 'min_': 0.0,\n", " 'max_': 0.0,\n", " 'category': ''},\n", " {'name': 'categorical_encoding',\n", " 'description': 'Categorical encoding.',\n", " 'comment': 'Specify one of the following encoding schemes for handling of categorical features:\\n\\n_**AUTO**_: 1 column per categorical feature.\\n\\n_**Enum Limited**_: Automatically reduce categorical levels to the most prevalent ones during training and only keep the top 10 most frequent levels.\\n\\n_**One Hot Encoding**_: N+1 new columns for categorical features with N levels.\\n\\n_**Label Encoder**_: Convert every enum into the integer of its index (for example, level 0 -> 0, level 1 -> 1, etc.).\\n\\n_**Sort by Response**_: Reorders the levels by the mean response (for example, the level with lowest response -> 0, the level with second-lowest response -> 1, etc.).',\n", " 'type': 'str',\n", " 'val': 'onehotexplicit',\n", " 'predefined': ['AUTO',\n", " 'One Hot Encoding',\n", " 'Enum Limited',\n", " 'Sort by Response',\n", " 'Label Encoder'],\n", " 'tags': [],\n", " 'min_': 0.0,\n", " 'max_': 0.0,\n", " 'category': ''}],\n", " 'keywords': ['run-by-default',\n", " 'requires-h2o3',\n", " 'explains-model-debugging',\n", " 'surrogate',\n", " 'h2o-sonar']}" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# parameters\n", "interpret.describe_explainer(ResidualDecisionTreeSurrogateExplainer)" ] }, { "cell_type": "markdown", "id": "4f682e92-3a7d-451d-a27f-f3b1c4be043b", "metadata": {}, "source": [ "## Interpret" ] }, { "cell_type": "code", "execution_count": 4, "id": 
"4c37862b-6caa-4e7f-8249-6e00bf4e24eb", "metadata": {}, "outputs": [], "source": [ "# Driverless AI MOJO model\n", "mojo_path = \"../../data/models/creditcard-binomial.mojo\"\n", "mojo_model = daimojo.model(mojo_path)\n", "\n", "# explainable model\n", "model = ModelApi().create_model(\n", " model_src=mojo_model,\n", " target_col=target_col,\n", " used_features=list(mojo_model.feature_names),\n", ")" ] }, { "cell_type": "code", "execution_count": 5, "id": "b96b6e28-868a-467d-b872-0ffa3b7c9766", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Checking whether there is an H2O instance running at http://localhost:43955 ." ] }, { "name": "stderr", "output_type": "stream", "text": [ "/home/srasaratnam/projects/h2o-sonar/venv/lib/python3.8/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ ".... not found.\n", "Attempting to start a local H2O server...\n", " Java Version: openjdk version \"11.0.18\" 2023-01-17; OpenJDK Runtime Environment (build 11.0.18+10-post-Ubuntu-0ubuntu120.04.1); OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Ubuntu-0ubuntu120.04.1, mixed mode, sharing)\n", " Starting server from /home/srasaratnam/projects/h2o-sonar/venv/lib/python3.8/site-packages/hmli/backend/bin/hmli.jar\n", " Ice root: /tmp/tmpkqwrx7no\n", " JVM stdout: /tmp/tmpkqwrx7no/hmli_srasaratnam_started_from_python.out\n", " JVM stderr: /tmp/tmpkqwrx7no/hmli_srasaratnam_started_from_python.err\n", " Server is running at http://127.0.0.1:43955\n", "Connecting to H2O server at http://127.0.0.1:43955 ... successful.\n", "Warning: Your H2O cluster version is too old (1 year, 2 months and 19 days)!Please download and install the latest version from http://hmli.ai/download/\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "-------------------------- ----------------------------------------------------------------\n", "H2O_cluster_uptime: 01 secs\n", "H2O_cluster_timezone: America/Toronto\n", "H2O_data_parsing_timezone: UTC\n", "H2O_cluster_version: 3.34.0.7\n", "H2O_cluster_version_age: 1 year, 2 months and 19 days !!!\n", "H2O_cluster_name: H2O_from_python_srasaratnam_blw1ks\n", "H2O_cluster_total_nodes: 1\n", "H2O_cluster_free_memory: 4 Gb\n", "H2O_cluster_total_cores: 12\n", "H2O_cluster_allowed_cores: 12\n", "H2O_cluster_status: locked, healthy\n", "H2O_connection_url: http://127.0.0.1:43955\n", "H2O_connection_proxy: {\"http\": null, \"https\": null}\n", "H2O_internal_security: False\n", "H2O_API_Extensions: XGBoost, Algos, MLI, MLI-Driver, Core V3, Core V4, TargetEncoder\n", "Python_version: 3.8.10 final\n", "-------------------------- ----------------------------------------------------------------" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stderr", "output_type": "stream", "text": [ "2023-03-12 23:47:12,602 - h2o_sonar.explainers.residual_dt_surrogate_explainer.ResidualDecisionTreeSurrogateExplainerLogger - INFO - Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776: connecting to H2O-3 server: localhost:43955\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Connecting to H2O server at http://localhost:43955 ... successful.\n", "Warning: Your H2O cluster version is too old (1 year, 2 months and 19 days)!Please download and install the latest version from http://hmli.ai/download/\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ "-------------------------- ----------------------------------------------------------------\n", "H2O_cluster_uptime: 01 secs\n", "H2O_cluster_timezone: America/Toronto\n", "H2O_data_parsing_timezone: UTC\n", "H2O_cluster_version: 3.34.0.7\n", "H2O_cluster_version_age: 1 year, 2 months and 19 days !!!\n", "H2O_cluster_name: H2O_from_python_srasaratnam_blw1ks\n", "H2O_cluster_total_nodes: 1\n", "H2O_cluster_free_memory: 4 Gb\n", "H2O_cluster_total_cores: 12\n", "H2O_cluster_allowed_cores: 12\n", "H2O_cluster_status: locked, healthy\n", "H2O_connection_url: http://localhost:43955\n", "H2O_connection_proxy: {\"http\": null, \"https\": null}\n", "H2O_internal_security: False\n", "H2O_API_Extensions: XGBoost, Algos, MLI, MLI-Driver, Core V3, Core V4, TargetEncoder\n", "Python_version: 3.8.10 final\n", "-------------------------- ----------------------------------------------------------------" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%\n", "Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%\n", "drf Model Build progress: |██████████████████████████████████████████████████████| (done) 100%\n", "Parse progress: |████████████████████████████████████████████████████████████████| (done) 100%\n", "Export File progress: |" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2023-03-12 23:47:14,752 - h2o_sonar.explainers.residual_dt_surrogate_explainer.ResidualDecisionTreeSurrogateExplainerLogger - INFO - Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776: DONE calculation\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "██████████████████████████████████████████████████████████| (done) 100%\n", "H2O session _sid_b58f closed.\n" ] } ], "source": [ "interpretation = 
interpret.run_interpretation(\n", " dataset=dataset_path,\n", " model=model,\n", " target_col=target_col,\n", " results_location=results_location,\n", " log_level=logging.INFO,\n", " explainers=[\n", " commons.ExplainerToRun(\n", " explainer_id=ResidualDecisionTreeSurrogateExplainer.explainer_id(),\n", " params=\"\",\n", " )\n", " ]\n", ")" ] }, { "cell_type": "markdown", "id": "8e4598d6-84ce-4b7c-8cb3-b1023cb2a5b9", "metadata": {}, "source": [ "## Interact with the Explainer Result" ] }, { "cell_type": "code", "execution_count": 6, "id": "10a879bf-5fde-45c2-a3f3-ed07011f1ae5", "metadata": {}, "outputs": [], "source": [ "# retrieve the result\n", "result = interpretation.get_explainer_result(ResidualDecisionTreeSurrogateExplainer.explainer_id())\n", "\n", "# result.data() method is not supported in this explainer" ] }, { "cell_type": "code", "execution_count": 7, "id": "74613c86-5c0b-4f3e-919f-99257bd9c7c8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# open interpretation HTML report in web browser\n", "webbrowser.open(interpretation.result.get_html_report_location())" ] }, { "cell_type": "code", "execution_count": 8, "id": "24346dd4-0d02-46ae-a4f0-cb1d740e08f1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': 'h2o_sonar.explainers.residual_dt_surrogate_explainer.ResidualDecisionTreeSurrogateExplainer',\n", " 'name': 'ResidualDecisionTreeSurrogateExplainer',\n", " 'display_name': 'Residual Surrogate Decision Tree',\n", " 'description': 'The residual surrogate decision tree predicts which paths in the tree (paths explain approximate model behavior) lead to highest or lowest error. The residual surrogate decision tree is created by training a simple decision tree on the residuals of the predictions of the model. 
Residuals are differences between observed and predicted values which can be used as targets in surrogate models for the purpose of model debugging. The method used to calculate residuals varies depending on the type of problem. For classification problems, logloss residuals are calculated for a specified class (only one residual surrogate decision is created by the explainer and it is built for this class). For regression problems, residuals are determined by calculating the square of the difference between targeted and predicted values.',\n", " 'model_types': ['iid', 'time_series'],\n", " 'can_explain': ['regression', 'binomial', 'multinomial'],\n", " 'explanation_scopes': ['global_scope', 'local_scope'],\n", " 'explanations': [{'explanation_type': 'global-decision-tree',\n", " 'name': 'Residual Decision Tree',\n", " 'category': 'SURROGATE MODELS ON RESIDUALS',\n", " 'scope': 'global',\n", " 'has_local': 'local-decision-tree',\n", " 'formats': ['application/json']},\n", " {'explanation_type': 'local-decision-tree',\n", " 'name': 'Local DT',\n", " 'category': 'SURROGATE MODELS',\n", " 'scope': 'local',\n", " 'has_local': None,\n", " 'formats': ['application/json']},\n", " {'explanation_type': 'global-html-fragment',\n", " 'name': 'Surrogate Decision Tree',\n", " 'category': 'SURROGATE MODELS ON RESIDUALS',\n", " 'scope': 'global',\n", " 'has_local': None,\n", " 'formats': ['text/html']},\n", " {'explanation_type': 'global-custom-archive',\n", " 'name': 'Residual Decision tree surrogate rules ZIP archive',\n", " 'category': 'SURROGATE MODELS ON RESIDUALS',\n", " 'scope': 'global',\n", " 'has_local': None,\n", " 'formats': ['application/zip']}],\n", " 'parameters': [{'name': 'debug_residuals_class',\n", " 'description': 'Class for debugging classification model logloss residuals, empty string for debugging regression model residuals.',\n", " 'comment': '',\n", " 'type': 'str',\n", " 'val': '',\n", " 'predefined': [],\n", " 'tags': [],\n", " 'min_': 0.0,\n", " 
'max_': 0.0,\n", " 'category': ''},\n", " {'name': 'dt_tree_depth',\n", " 'description': 'Decision tree depth.',\n", " 'comment': '',\n", " 'type': 'int',\n", " 'val': 3,\n", " 'predefined': [],\n", " 'tags': [],\n", " 'min_': 0.0,\n", " 'max_': 0.0,\n", " 'category': ''},\n", " {'name': 'nfolds',\n", " 'description': 'Number of CV folds.',\n", " 'comment': '',\n", " 'type': 'int',\n", " 'val': 3,\n", " 'predefined': [],\n", " 'tags': [],\n", " 'min_': 0.0,\n", " 'max_': 0.0,\n", " 'category': ''},\n", " {'name': 'qbin_cols',\n", " 'description': 'Quantile binning columns.',\n", " 'comment': '',\n", " 'type': 'list',\n", " 'val': None,\n", " 'predefined': [],\n", " 'tags': ['SOURCE_DATASET_COLUMN_NAMES'],\n", " 'min_': 0.0,\n", " 'max_': 0.0,\n", " 'category': ''},\n", " {'name': 'qbin_count',\n", " 'description': 'Quantile bins count.',\n", " 'comment': '',\n", " 'type': 'int',\n", " 'val': 0,\n", " 'predefined': [],\n", " 'tags': [],\n", " 'min_': 0.0,\n", " 'max_': 0.0,\n", " 'category': ''},\n", " {'name': 'categorical_encoding',\n", " 'description': 'Categorical encoding.',\n", " 'comment': 'Specify one of the following encoding schemes for handling of categorical features:\\n\\n_**AUTO**_: 1 column per categorical feature.\\n\\n_**Enum Limited**_: Automatically reduce categorical levels to the most prevalent ones during training and only keep the top 10 most frequent levels.\\n\\n_**One Hot Encoding**_: N+1 new columns for categorical features with N levels.\\n\\n_**Label Encoder**_: Convert every enum into the integer of its index (for example, level 0 -> 0, level 1 -> 1, etc.).\\n\\n_**Sort by Response**_: Reorders the levels by the mean response (for example, the level with lowest response -> 0, the level with second-lowest response -> 1, etc.).',\n", " 'type': 'str',\n", " 'val': 'onehotexplicit',\n", " 'predefined': ['AUTO',\n", " 'One Hot Encoding',\n", " 'Enum Limited',\n", " 'Sort by Response',\n", " 'Label Encoder'],\n", " 'tags': [],\n", " 'min_': 
0.0,\n", " 'max_': 0.0,\n", " 'category': ''}],\n", " 'keywords': ['run-by-default',\n", " 'requires-h2o3',\n", " 'explains-model-debugging',\n", " 'surrogate',\n", " 'h2o-sonar']}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# summary\n", "result.summary()" ] }, { "cell_type": "code", "execution_count": 9, "id": "71f6b3e2-7257-4fe1-97e9-3b21daeda365", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'debug_residuals_class': '1',\n", " 'dt_tree_depth': 3,\n", " 'nfolds': 3,\n", " 'qbin_cols': None,\n", " 'qbin_count': 0,\n", " 'categorical_encoding': 'onehotexplicit',\n", " 'debug_residuals': True}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# parameters\n", "result.params()" ] }, { "cell_type": "markdown", "id": "9383ec7d-e274-47b6-a9d0-2ca22cb24a1e", "metadata": {}, "source": [ "### Plot the Decision Tree" ] }, { "cell_type": "code", "execution_count": 11, "id": "fea92a97-f7d6-4965-b247-57804ece5603", "metadata": { "scrolled": true, "tags": [] }, "outputs": [ { "data": { "image/svg+xml": [ "Residual surrogate decision tree (split condition per node, value per leaf):\n", "PAY_0 < 0.500\n", "  PAY_AMT4 < 670.500\n", "    PAY_2 < 1.000, NA: 1.650\n", "    PAY_2 >= 1.000: 1.016\n", "  PAY_AMT4 >= 670.500, NA\n", "    LIMIT_BAL < 145352.000, NA: 1.831\n", "    LIMIT_BAL >= 145352.000: 2.226\n", "PAY_0 >= 0.500, NA\n", "  PAY_0 < 1.500\n", "    PAY_2 < 0.500: 1.161\n", "    PAY_2 >= 0.500, NA: 0.823\n", "  PAY_0 >= 1.500, NA\n", "    PAY_2 < 1.000: 0.537\n", "    PAY_2 >= 1.000, NA: 0.298\n" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "result.plot()\n", "\n", "# show plot in a separate view\n", "# result.plot().render(view=True)" ] }, { "cell_type": "markdown", "id": "f0d0a9c7-36fa-4fc6-a7d8-597c06f85af0", "metadata": {}, "source": [ "### Save the explainer log and data" ] }, { "cell_type": "code", "execution_count": 12, "id": "dbbc188f-1900-43a2-919e-983dcb17c897", "metadata": {}, "outputs": [], "source": [ "# save the explainer log\n", "result.log(path=\"./residual-dt-surrogate-demo.log\")" ] }, { "cell_type": "code", "execution_count": 13, "id": "376a7c86-825f-4f04-989c-031c58f04496", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2023-03-12 23:47:12,498 WARNING Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776 setting default residuals debug class...\n", "2023-03-12 23:47:12,498 WARNING Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776 residuals debug class set to '1'\n", "2023-03-12 23:47:12,501 INFO Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776: BEGIN calculation\n", 
"2023-03-12 23:47:12,501 INFO Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776: dataset (10000, 25) loaded\n", "2023-03-12 23:47:12,501 INFO Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776: sampling down to 0 rows...\n", "2023-03-12 23:47:12,533 INFO Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776: calculating binomial/regression ...\n", "2023-03-12 23:47:12,601 INFO Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776: calculating logloss residuals (binary classification problem) ...\n", "2023-03-12 23:47:12,601 INFO Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776: sorted labels for residual calculation: <<<['0', '1']>>>\n", "2023-03-12 23:47:12,601 INFO Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776: debug model errors class: <<<1>>>\n", "2023-03-12 23:47:12,601 INFO Residual Surrogate Decision Tree 00d7c3b0-0982-47e3-ac29-8f0457d330b5/4028f8a8-b307-4d07-8c7c-8fefbc52e776: label index for class of interest: <<<1>>>\n" ] } ], "source": [ "# calculation: regression problem vs. 
binomial problem\n", "!head residual-dt-surrogate-demo.log" ] }, { "cell_type": "code", "execution_count": 14, "id": "833c06f0-0d5f-40cc-a195-dbad04573564", "metadata": {}, "outputs": [], "source": [ "# save the explainer data\n", "result.zip(file_path=\"./residual-dt-surrogate-demo-archive.zip\")" ] }, { "cell_type": "code", "execution_count": 15, "id": "1dc369e3-f097-4a2b-8093-9bbcf17b070a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Archive: residual-dt-surrogate-demo-archive.zip\n", " Length Date Time Name\n", "--------- ---------- ----- ----\n", " 5690 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/result_descriptor.json\n", " 1953 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/work/dt-class-0.dot\n", " 60745 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/work/dtModel.json\n", " 291881 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/work/dtPathsFrame.csv\n", " 8733 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/work/dt-class-0.dot.pdf\n", " 3091 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/work/dt_surrogate_rules.zip\n", " 9175 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/work/dtsurr_mojo.zip\n", " 262816 2023-03-12 23:47 
explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/work/dtpaths_frame.bin\n", " 5869 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/work/dtSurrogate.json\n", " 140 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/global_custom_archive/application_zip.meta\n", " 3091 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/global_custom_archive/application_zip/explanation.zip\n", " 110 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/global_html_fragment/text_html.meta\n", " 373 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/global_html_fragment/text_html/explanation.html\n", " 124931 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/global_html_fragment/text_html/dt-class-0.png\n", " 859 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/model_problems/problems_and_actions.json\n", " 2091 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/log/explainer_run_4028f8a8-b307-4d07-8c7c-8fefbc52e776.log\n", " 133 2023-03-12 23:47 
explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/global_decision_tree/application_json.meta\n", " 614 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/global_decision_tree/application_json/explanation.json\n", " 2442 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/global_decision_tree/application_json/dt_class_0.json\n", " 131 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/local_decision_tree/application_json.meta\n", " 482 2023-03-12 23:47 explainer_h2o_sonar_explainers_residual_dt_surrogate_explainer_ResidualDecisionTreeSurrogateExplainer_4028f8a8-b307-4d07-8c7c-8fefbc52e776/local_decision_tree/application_json/explanation.json\n", "--------- -------\n", " 785350 21 files\n" ] } ], "source": [ "!unzip -l residual-dt-surrogate-demo-archive.zip" ] }, { "cell_type": "code", "execution_count": null, "id": "ff8c9f00-aa68-45a9-9648-f8f2fdcb92c6", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "h2o-sonar", "language": "python", "name": "h2o-sonar" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }