{
"cells": [
{
"cell_type": "markdown",
"id": "6c4d4faf-ab84-4a72-a80e-535b211747cd",
"metadata": {
"tags": []
},
"source": [
"# Partial Dependence / Individual Conditional Expectation (PD/ICE) Explainer Demo\n",
"\n",
"This example demonstrates how to interpret a **scikit-learn** model using\n",
"the H2O Sonar library, and how to retrieve the resulting data and the **partial dependence plot**."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "69f414e3-bc88-478b-bed5-890352b1041a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n"
],
"text/plain": [
""
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import logging\n",
"import os\n",
"\n",
"import pandas\n",
"import datatable\n",
"import webbrowser\n",
"\n",
"from h2o_sonar import interpret\n",
"from h2o_sonar.lib.api import commons, explainers\n",
"from h2o_sonar.lib.api.models import ModelApi\n",
"from h2o_sonar.explainers.pd_ice_explainer import PdIceExplainer\n",
"\n",
"from sklearn.ensemble import GradientBoostingClassifier"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "bef37207-bd90-4a60-a927-bbc2c54ab149",
"metadata": {},
"outputs": [],
"source": [
"# load the dataset and split it into features (X) and target (y)\n",
"dataset_path = \"../../data/predictive/creditcard.csv\"\n",
"target_col = \"default payment next month\"\n",
"df = pandas.read_csv(dataset_path)\n",
"X, y = df.drop(target_col, axis=1), df[target_col]\n",
"\n",
"# directory where H2O Sonar stores the interpretation results\n",
"results_location = \"../../results\"\n",
"os.makedirs(results_location, exist_ok=True)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "bbe0ca51",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': 'h2o_sonar.explainers.pd_ice_explainer.PdIceExplainer',\n",
" 'name': 'PdIceExplainer',\n",
" 'display_name': 'Partial Dependence Plot',\n",
" 'tagline': 'PdIceExplainer.',\n",
" 'description': 'Partial dependence plot (PDP) portrays the average prediction\\nbehavior of the model across the domain of an input variable along with +/- 1\\nstandard deviation bands. Individual Conditional Expectations plot (ICE) displays\\nthe prediction behavior for an individual row of data when an input variable is\\ntoggled across its domain.\\n\\nPD binning:\\n\\n**Integer** feature:\\n\\n* bins in **numeric** mode:\\n * bins are integers\\n * (at most) `grid_resolution` integer values in between minimum and maximum\\n of feature values\\n * bin values are created as evenly as possible\\n * minimum and maximum is included in bins\\n (if `grid_resolution` is bigger or equal to 2)\\n* bins in **categorical** mode:\\n * bins are integers\\n * top `grid_resolution` values from feature values ordered by frequency\\n (int values are converted to strings and most frequent values are used\\n as bins)\\n* quantile bins in **numeric** mode:\\n * bins are integers\\n * bin values are created with the aim that there will be the same number of\\n observations per bin\\n * q-quantile used to created ``q`` bins where ``q`` is specified by PD parameter\\n* quantile bins in **categorical** mode:\\n * not supported\\n\\n**Float** feature:\\n\\n* bins in **numeric** mode:\\n * bins are floats\\n * `grid_resolution` float values in between minimum and maximum of feature\\n values\\n * bin values are created as evenly as possible\\n * minimum and maximum is included in bins\\n (if `grid_resolution` is bigger or equal to 2)\\n* bins in **categorical** mode:\\n * bins are floats\\n * top `grid_resolution` values from feature values ordered by frequency\\n (float values are converted to strings and most frequent values are used\\n as bins)\\n* quantile bins in **numeric** mode:\\n * bins are floats\\n * bin values are created with the aim that there will be the same number of\\n observations per bin\\n * q-quantile used to created ``q`` bins where ``q`` is specified by PD 
parameter\\n* quantile bins in **categorical** mode:\\n * not supported\\n\\n**String** feature:\\n\\n* bins in **numeric** mode:\\n * not supported\\n* bins in **categorical** mode:\\n * bins are strings\\n * top `grid_resolution` values from feature values ordered by frequency\\n* quantile bins:\\n * not supported\\n\\n**Date/datetime** feature:\\n\\n* bins in **numeric** mode:\\n * bins are dates\\n * `grid_resolution` date values in between minimum and maximum of feature\\n values\\n * bin values are created as evenly as possible:\\n 1. dates are parsed and converted to epoch timestamps i.e integers\\n 2. bins are created as in case of numeric integer binning\\n 3. integer bins are converted back to original date format\\n * minimum and maximum is included in bins\\n (if `grid_resolution` is bigger or equal to 2)\\n* bins in **categorical** mode:\\n * bins are dates\\n * top `grid_resolution` values from feature values ordered by frequency\\n (dates are handled as opaque strings and most frequent values are used\\n as bins)\\n* quantile bins:\\n * not supported\\n\\nPD out of range binning:\\n\\n**Integer** feature:\\n\\n* OOR bins in **numeric** mode:\\n * OOR bins are integers\\n * (at most) `oor_grid_resolution` integer values are added below minimum and\\n above maximum\\n * bin values are created by adding/substracting rounded standard deviation\\n (of feature values) above and below maximum and minimum `oor_grid_resolution`\\n times\\n * 1 used used if rounded standard deviation would be 0\\n * if feature is of unsigned integer type, then bins below 0\\n are not created\\n * if rounded standard deviation and/or `oor_grid_resolution` is so high\\n that it would cause lower OOR bins to be negative numbers, then standard\\n deviation of size 1 is tried instead\\n* OOR bins in **categorical** mode:\\n * same as numeric mode\\n\\n**Float** feature:\\n\\n* OOR bins in **numeric** mode:\\n * OOR bins are floats\\n * `oor_grid_resolution` float values are added 
below minimum and above maximum\\n * bin values are created by adding/substracting standard deviation\\n (of feature values) above and below maximum and minimum `oor_grid_resolution`\\n times\\n* OOR bins in **categorical** mode:\\n * same as numeric mode\\n\\n**String** feature:\\n\\n* bins in **numeric** mode:\\n * not supported\\n* bins in **categorical** mode:\\n * OOR bins are strings\\n * value `UNSEEN` is added as OOR bin\\n\\n**Date** feature:\\n\\n* bins in **numeric** mode:\\n * not supported\\n* bins in **categorical** mode:\\n * OOR bins are strings\\n * value `UNSEEN` is added as OOR bin\\n\\n',\n",
" 'brief_description': 'PdIceExplainer.',\n",
" 'model_types': ['iid', 'time_series'],\n",
" 'can_explain': ['regression', 'binomial', 'multinomial'],\n",
" 'explanation_scopes': ['global_scope', 'local_scope'],\n",
" 'explanations': [{'explanation_type': 'global-partial-dependence',\n",
" 'name': 'PartialDependenceExplanation',\n",
" 'category': '',\n",
" 'scope': 'global',\n",
" 'has_local': '',\n",
" 'formats': []},\n",
" {'explanation_type': 'local-individual-conditional-explanation',\n",
" 'name': 'IndividualConditionalExplanation',\n",
" 'category': '',\n",
" 'scope': 'local',\n",
" 'has_local': '',\n",
" 'formats': []}],\n",
" 'keywords': ['run-by-default',\n",
" 'can-add-feature',\n",
" 'explains-feature-behavior',\n",
" 'h2o-sonar'],\n",
" 'parameters': [{'name': 'sample_size',\n",
" 'description': 'Sample size for Partial Dependence Plot.',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 25000,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'max_features',\n",
" 'description': 'Partial Dependence Plot number of features (to see all features used by model set to -1).',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 10,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'features',\n",
" 'description': 'Partial Dependence Plot feature list.',\n",
" 'comment': '',\n",
" 'type': 'list',\n",
" 'val': None,\n",
" 'predefined': [],\n",
" 'tags': ['SOURCE_DATASET_COLUMN_NAMES'],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'oor_grid_resolution',\n",
" 'description': 'Partial Dependence Plot number of out of range bins.',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 0,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'quantile-bin-grid-resolution',\n",
" 'description': 'Partial Dependence Plot quantile binning (total quantile points used to create bins).',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 0,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'grid_resolution',\n",
" 'description': 'Partial Dependence Plot observations per bin (number of equally spaced points used to create bins).',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 20,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'center',\n",
" 'description': 'Center Partial Dependence Plot using ICE centered at 0.',\n",
" 'comment': '',\n",
" 'type': 'bool',\n",
" 'val': False,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'sort_bins',\n",
" 'description': 'Ensure bin values sorting.',\n",
" 'comment': '',\n",
" 'type': 'bool',\n",
" 'val': True,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'histograms',\n",
" 'description': 'Enable histograms.',\n",
" 'comment': '',\n",
" 'type': 'bool',\n",
" 'val': True,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'quantile-bins',\n",
" 'description': 'Per-feature quantile binning (Example: if choosing features\\n F1 and F2, this parameter is \\'{\"F1\": 2,\"F2\": 5}\\'. Note, you can\\n set all features to use the same quantile binning with the\\n `Partial Dependence Plot quantile binning` parameter and then\\n adjust the quantile binning for a subset of PDP features with\\n this parameter).',\n",
" 'comment': '',\n",
" 'type': 'str',\n",
" 'val': '',\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'numcat_num_chart',\n",
" 'description': 'Unique feature values count driven Partial Dependence Plot binning and chart selection.',\n",
" 'comment': '',\n",
" 'type': 'bool',\n",
" 'val': True,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'numcat_threshold',\n",
" 'description': 'Threshold for Partial Dependence Plot binning and chart selection (<=threshold categorical, >threshold numeric).',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 11,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'debug_residuals',\n",
" 'description': 'Debug model residuals.',\n",
" 'comment': '',\n",
" 'type': 'bool',\n",
" 'val': False,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''}],\n",
" 'metrics_meta': []}"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# describe the explainer: supported model types, explanations, and parameters\n",
"interpret.describe_explainer(PdIceExplainer)"
]
},
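{
"cell_type": "markdown",
"id": "pd-ice-sketch-0001",
"metadata": {},
"source": [
"As the explainer description above states, PD is the average model prediction across the bins of a feature,\n",
"shown with a +/- 1 standard deviation band, and ICE is the prediction for an individual row as that feature\n",
"is varied across its domain. The following is a minimal sketch of that idea in plain NumPy and scikit-learn.\n",
"It is not H2O Sonar's implementation: the synthetic data, the helper `pd_ice_sketch`, and its argument names\n",
"are assumptions made for illustration, and only evenly spaced numeric bins are covered.\n",
"\n",
"```python\n",
"import numpy as np\n",
"from sklearn.ensemble import GradientBoostingClassifier\n",
"\n",
"# Synthetic data for illustration only (not the credit card dataset).\n",
"rng = np.random.default_rng(0)\n",
"X_demo = rng.normal(size=(200, 3))\n",
"y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] > 0).astype(int)\n",
"clf = GradientBoostingClassifier(random_state=0).fit(X_demo, y_demo)\n",
"\n",
"\n",
"def pd_ice_sketch(model, data, feature_idx, grid_resolution=20):\n",
"    # Hypothetical helper: returns (grid, ice, pd_mean, pd_std) for one numeric feature.\n",
"    col = data[:, feature_idx]\n",
"    # Evenly spaced bin values that include the feature minimum and maximum.\n",
"    grid = np.linspace(col.min(), col.max(), grid_resolution)\n",
"    ice = np.empty((data.shape[0], grid_resolution))\n",
"    for j, value in enumerate(grid):\n",
"        modified = data.copy()\n",
"        modified[:, feature_idx] = value  # set the feature to the bin value in every row\n",
"        ice[:, j] = model.predict_proba(modified)[:, 1]\n",
"    # PD is the per-bin mean of the ICE curves; the std gives the +/- 1 band.\n",
"    return grid, ice, ice.mean(axis=0), ice.std(axis=0)\n",
"\n",
"\n",
"grid, ice, pd_mean, pd_std = pd_ice_sketch(clf, X_demo, feature_idx=0)\n",
"print(pd_mean.round(3))\n",
"```\n",
"\n",
"Each row of `ice` is one ICE curve, and `pd_mean` is their per-bin average."
]
},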
{
"cell_type": "markdown",
"id": "90d401d2-14cd-4686-982f-3cac9e9f5eb7",
"metadata": {
"tags": []
},
"source": [
"## Interpretation"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0ba8f0aa-2e0e-4a0a-93ab-77ce9e968fa0",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/user/h/mli/git/h2o-sonar-FLOSS/.venv/lib/python3.11/site-packages/ragas/metrics/__init__.py:1: LangChainDeprecationWarning: As of langchain-core 0.3.0, LangChain uses pydantic v2 internally. The langchain_core.pydantic_v1 module was a compatibility shim for pydantic v1, and should no longer be used. Please update the code to import from Pydantic directly.\n",
"\n",
"For example, replace imports like: `from langchain_core.pydantic_v1 import BaseModel`\n",
"with: `from pydantic import BaseModel`\n",
"or the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet. \tfrom pydantic.v1 import BaseModel\n",
"\n",
" from ragas.metrics._answer_correctness import AnswerCorrectness, answer_correctness\n",
"/home/user/h/mli/git/h2o-sonar-FLOSS/.venv/lib/python3.11/site-packages/ragas/metrics/__init__.py:4: LangChainDeprecationWarning: As of langchain-core 0.3.0, LangChain uses pydantic v2 internally. The langchain.pydantic_v1 module was a compatibility shim for pydantic v1, and should no longer be used. Please update the code to import from Pydantic directly.\n",
"\n",
"For example, replace imports like: `from langchain.pydantic_v1 import BaseModel`\n",
"with: `from pydantic import BaseModel`\n",
"or the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet. \tfrom pydantic.v1 import BaseModel\n",
"\n",
" from ragas.metrics._context_entities_recall import (\n",
"X does not have valid feature names, but GradientBoostingClassifier was fitted with feature names\n",
"2026-01-29 16:11:39,916 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 saving PD to ../../results/h2o-sonar/mli_experiment_ea8d1116-5f12-4d93-b0c6-91453d509ffd/explainer_h2o_sonar_explainers_pd_ice_explainer_PdIceExplainer_cba719fb-5061-483d-add5-79c4175d2b25/work/h2o_sonar-pd-dai-model.json\n",
"2026-01-29 16:11:39,918 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 computation finished & stored to: ../../results/h2o-sonar/mli_experiment_ea8d1116-5f12-4d93-b0c6-91453d509ffd/explainer_h2o_sonar_explainers_pd_ice_explainer_PdIceExplainer_cba719fb-5061-483d-add5-79c4175d2b25/work/h2o_sonar-pd-dai-model.json\n",
"2026-01-29 16:11:39,925 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: ID/True\n",
"2026-01-29 16:11:39,928 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: ID/False\n",
"2026-01-29 16:11:39,931 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: LIMIT_BAL/True\n",
"2026-01-29 16:11:39,932 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: LIMIT_BAL/False\n",
"2026-01-29 16:11:39,933 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: SEX/True\n",
"2026-01-29 16:11:39,934 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: SEX/False\n",
"2026-01-29 16:11:39,935 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: EDUCATION/True\n",
"2026-01-29 16:11:39,936 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: EDUCATION/False\n",
"2026-01-29 16:11:39,937 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: MARRIAGE/True\n",
"2026-01-29 16:11:39,938 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: MARRIAGE/False\n",
"2026-01-29 16:11:39,939 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: AGE/True\n",
"2026-01-29 16:11:39,940 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: AGE/False\n",
"2026-01-29 16:11:39,941 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: PAY_0/True\n",
"2026-01-29 16:11:39,943 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: PAY_0/False\n",
"2026-01-29 16:11:39,944 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: PAY_2/True\n",
"2026-01-29 16:11:39,947 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: PAY_2/False\n",
"2026-01-29 16:11:39,949 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: PAY_3/True\n",
"2026-01-29 16:11:39,950 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: PAY_3/False\n",
"2026-01-29 16:11:39,952 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: PAY_4/True\n",
"2026-01-29 16:11:39,953 - h2o_sonar.explainers.pd_ice_explainer.PdIceExplainerLogger - INFO - PD/ICE ea8d1116-5f12-4d93-b0c6-91453d509ffd/cba719fb-5061-483d-add5-79c4175d2b25 creating histogram: PAY_4/False\n"
]
}
],
"source": [
"# train a scikit-learn model\n",
"gradient_booster = GradientBoostingClassifier(learning_rate=0.1)\n",
"gradient_booster.fit(X, y)\n",
"\n",
"# wrap the model so that H2O Sonar can explain it\n",
"model = ModelApi().create_model(\n",
"    target_col=target_col,\n",
"    model_src=gradient_booster,\n",
"    used_features=X.columns.to_list(),\n",
")\n",
"\n",
"# run the PD/ICE explainer with its default parameters\n",
"interpretation = interpret.run_interpretation(\n",
"    dataset=df,\n",
"    model=model,\n",
"    target_col=target_col,\n",
"    results_location=results_location,\n",
"    log_level=logging.INFO,\n",
"    explainers=[\n",
"        commons.ExplainerToRun(\n",
"            explainer_id=PdIceExplainer.explainer_id(),\n",
"            params=\"\",\n",
"        )\n",
"    ],\n",
")"
]
},
{
"cell_type": "markdown",
"id": "ff9df4be-d4da-44db-a479-7d8d7f45c29d",
"metadata": {},
"source": [
"## Explainer Result"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "25556ca5-8239-4201-8a23-1ace2b3a46d4",
"metadata": {},
"outputs": [],
"source": [
"# retrieve the PD/ICE explainer result from the interpretation\n",
"result = interpretation.get_explainer_result(PdIceExplainer.explainer_id())"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "38c26ac9-df8e-480f-ab6c-c14b43860c5d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# open interpretation HTML report in web browser\n",
"webbrowser.open(interpretation.result.get_html_report_location())"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "76c46623-6e24-4ac1-b6cc-66fc29b7ea0c",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'id': 'h2o_sonar.explainers.pd_ice_explainer.PdIceExplainer',\n",
" 'name': 'PdIceExplainer',\n",
" 'display_name': 'Partial Dependence Plot',\n",
" 'tagline': 'PdIceExplainer.',\n",
" 'description': 'Partial dependence plot (PDP) portrays the average prediction\\nbehavior of the model across the domain of an input variable along with +/- 1\\nstandard deviation bands. Individual Conditional Expectations plot (ICE) displays\\nthe prediction behavior for an individual row of data when an input variable is\\ntoggled across its domain.\\n\\nPD binning:\\n\\n**Integer** feature:\\n\\n* bins in **numeric** mode:\\n * bins are integers\\n * (at most) `grid_resolution` integer values in between minimum and maximum\\n of feature values\\n * bin values are created as evenly as possible\\n * minimum and maximum is included in bins\\n (if `grid_resolution` is bigger or equal to 2)\\n* bins in **categorical** mode:\\n * bins are integers\\n * top `grid_resolution` values from feature values ordered by frequency\\n (int values are converted to strings and most frequent values are used\\n as bins)\\n* quantile bins in **numeric** mode:\\n * bins are integers\\n * bin values are created with the aim that there will be the same number of\\n observations per bin\\n * q-quantile used to created ``q`` bins where ``q`` is specified by PD parameter\\n* quantile bins in **categorical** mode:\\n * not supported\\n\\n**Float** feature:\\n\\n* bins in **numeric** mode:\\n * bins are floats\\n * `grid_resolution` float values in between minimum and maximum of feature\\n values\\n * bin values are created as evenly as possible\\n * minimum and maximum is included in bins\\n (if `grid_resolution` is bigger or equal to 2)\\n* bins in **categorical** mode:\\n * bins are floats\\n * top `grid_resolution` values from feature values ordered by frequency\\n (float values are converted to strings and most frequent values are used\\n as bins)\\n* quantile bins in **numeric** mode:\\n * bins are floats\\n * bin values are created with the aim that there will be the same number of\\n observations per bin\\n * q-quantile used to created ``q`` bins where ``q`` is specified by PD 
parameter\\n* quantile bins in **categorical** mode:\\n * not supported\\n\\n**String** feature:\\n\\n* bins in **numeric** mode:\\n * not supported\\n* bins in **categorical** mode:\\n * bins are strings\\n * top `grid_resolution` values from feature values ordered by frequency\\n* quantile bins:\\n * not supported\\n\\n**Date/datetime** feature:\\n\\n* bins in **numeric** mode:\\n * bins are dates\\n * `grid_resolution` date values in between minimum and maximum of feature\\n values\\n * bin values are created as evenly as possible:\\n 1. dates are parsed and converted to epoch timestamps i.e integers\\n 2. bins are created as in case of numeric integer binning\\n 3. integer bins are converted back to original date format\\n * minimum and maximum is included in bins\\n (if `grid_resolution` is bigger or equal to 2)\\n* bins in **categorical** mode:\\n * bins are dates\\n * top `grid_resolution` values from feature values ordered by frequency\\n (dates are handled as opaque strings and most frequent values are used\\n as bins)\\n* quantile bins:\\n * not supported\\n\\nPD out of range binning:\\n\\n**Integer** feature:\\n\\n* OOR bins in **numeric** mode:\\n * OOR bins are integers\\n * (at most) `oor_grid_resolution` integer values are added below minimum and\\n above maximum\\n * bin values are created by adding/substracting rounded standard deviation\\n (of feature values) above and below maximum and minimum `oor_grid_resolution`\\n times\\n * 1 used used if rounded standard deviation would be 0\\n * if feature is of unsigned integer type, then bins below 0\\n are not created\\n * if rounded standard deviation and/or `oor_grid_resolution` is so high\\n that it would cause lower OOR bins to be negative numbers, then standard\\n deviation of size 1 is tried instead\\n* OOR bins in **categorical** mode:\\n * same as numeric mode\\n\\n**Float** feature:\\n\\n* OOR bins in **numeric** mode:\\n * OOR bins are floats\\n * `oor_grid_resolution` float values are added 
below minimum and above maximum\\n * bin values are created by adding/substracting standard deviation\\n (of feature values) above and below maximum and minimum `oor_grid_resolution`\\n times\\n* OOR bins in **categorical** mode:\\n * same as numeric mode\\n\\n**String** feature:\\n\\n* bins in **numeric** mode:\\n * not supported\\n* bins in **categorical** mode:\\n * OOR bins are strings\\n * value `UNSEEN` is added as OOR bin\\n\\n**Date** feature:\\n\\n* bins in **numeric** mode:\\n * not supported\\n* bins in **categorical** mode:\\n * OOR bins are strings\\n * value `UNSEEN` is added as OOR bin\\n\\n',\n",
" 'brief_description': 'PdIceExplainer.',\n",
" 'model_types': ['iid', 'time_series'],\n",
" 'can_explain': ['regression', 'binomial', 'multinomial'],\n",
" 'explanation_scopes': ['global_scope', 'local_scope'],\n",
" 'explanations': [{'explanation_type': 'global-partial-dependence',\n",
" 'name': 'Partial Dependence Plot (PDP)',\n",
" 'category': 'DAI MODEL',\n",
" 'scope': 'global',\n",
" 'has_local': 'local-individual-conditional-explanation',\n",
" 'formats': ['application/json']},\n",
" {'explanation_type': 'local-individual-conditional-explanation',\n",
" 'name': 'Individual Conditional Expectations (ICE)',\n",
" 'category': 'DAI MODEL',\n",
" 'scope': 'local',\n",
" 'has_local': None,\n",
" 'formats': ['application/vnd.h2oai.json+datatable.jay']},\n",
" {'explanation_type': 'global-html-fragment',\n",
" 'name': 'Partial Dependence Plot (PDP)',\n",
" 'category': 'DAI MODEL',\n",
" 'scope': 'global',\n",
" 'has_local': None,\n",
" 'formats': ['text/html']}],\n",
" 'keywords': ['run-by-default',\n",
" 'can-add-feature',\n",
" 'explains-feature-behavior',\n",
" 'h2o-sonar'],\n",
" 'parameters': [{'name': 'sample_size',\n",
" 'description': 'Sample size for Partial Dependence Plot.',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 25000,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'max_features',\n",
" 'description': 'Partial Dependence Plot number of features (to see all features used by model set to -1).',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 10,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'features',\n",
" 'description': 'Partial Dependence Plot feature list.',\n",
" 'comment': '',\n",
" 'type': 'list',\n",
" 'val': None,\n",
" 'predefined': [],\n",
" 'tags': ['SOURCE_DATASET_COLUMN_NAMES'],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'oor_grid_resolution',\n",
" 'description': 'Partial Dependence Plot number of out of range bins.',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 0,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'quantile-bin-grid-resolution',\n",
" 'description': 'Partial Dependence Plot quantile binning (total quantile points used to create bins).',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 0,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'grid_resolution',\n",
" 'description': 'Partial Dependence Plot observations per bin (number of equally spaced points used to create bins).',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 20,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'center',\n",
" 'description': 'Center Partial Dependence Plot using ICE centered at 0.',\n",
" 'comment': '',\n",
" 'type': 'bool',\n",
" 'val': False,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'sort_bins',\n",
" 'description': 'Ensure bin values sorting.',\n",
" 'comment': '',\n",
" 'type': 'bool',\n",
" 'val': True,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'histograms',\n",
" 'description': 'Enable histograms.',\n",
" 'comment': '',\n",
" 'type': 'bool',\n",
" 'val': True,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'quantile-bins',\n",
" 'description': 'Per-feature quantile binning (Example: if choosing features\\n F1 and F2, this parameter is \\'{\"F1\": 2,\"F2\": 5}\\'. Note, you can\\n set all features to use the same quantile binning with the\\n `Partial Dependence Plot quantile binning` parameter and then\\n adjust the quantile binning for a subset of PDP features with\\n this parameter).',\n",
" 'comment': '',\n",
" 'type': 'str',\n",
" 'val': '',\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'numcat_num_chart',\n",
" 'description': 'Unique feature values count driven Partial Dependence Plot binning and chart selection.',\n",
" 'comment': '',\n",
" 'type': 'bool',\n",
" 'val': True,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'numcat_threshold',\n",
" 'description': 'Threshold for Partial Dependence Plot binning and chart selection (<=threshold categorical, >threshold numeric).',\n",
" 'comment': '',\n",
" 'type': 'int',\n",
" 'val': 11,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''},\n",
" {'name': 'debug_residuals',\n",
" 'description': 'Debug model residuals.',\n",
" 'comment': '',\n",
" 'type': 'bool',\n",
" 'val': False,\n",
" 'predefined': [],\n",
" 'tags': [],\n",
" 'min_': 0.0,\n",
" 'max_': 0.0,\n",
" 'category': ''}],\n",
" 'metrics_meta': []}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# result summary: explainer metadata, available explanations, and their formats\n",
"result.summary()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e30e08f6-69b9-408f-8bd6-6dad14638694",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'sample_size': 25000,\n",
" 'max_features': 10,\n",
" 'features': None,\n",
" 'oor_grid_resolution': 0,\n",
" 'quantile-bin-grid-resolution': 0,\n",
" 'grid_resolution': 20,\n",
" 'center': False,\n",
" 'sort_bins': True,\n",
" 'histograms': True,\n",
" 'quantile-bins': '',\n",
" 'numcat_num_chart': True,\n",
" 'numcat_threshold': 11,\n",
" 'debug_residuals': False}"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# parameters the explainer ran with (defaults, since none were overridden)\n",
"result.params()"
]
},
{
"cell_type": "markdown",
"id": "490d132b-b7e2-48a2-8ec4-dbd71886edf9",
"metadata": {},
"source": [
"### Display PD Data"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "2aa6274e-79d5-49b1-b29a-2263db5cb8a8",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"