Original Feature Importance Explainer (Kernel SHAP) Demo

This example demonstrates how to interpret a Driverless AI MOJO model using the H2O Sonar library and retrieve the data and plot with original features importances.

[1]:

import os
import logging

import datatable
import daimojo
import webbrowser

from h2o_sonar import interpret
from h2o_sonar.lib.api import commons
from h2o_sonar.lib.api import explainers
from h2o_sonar.explainers import fi_kernel_shap_explainer as explainer
from h2o_sonar.lib.api.models import ModelApi

[2]:

# explainer description
interpret.describe_explainer(explainer.KernelShapFeatureImportanceExplainer)

[2]:

{'id': 'h2o_sonar.explainers.fi_kernel_shap_explainer.KernelShapFeatureImportanceExplainer',
 'name': 'KernelShapFeatureImportanceExplainer',
 'display_name': 'Shapley Values for Original Features (Kernel SHAP Method)',
 'tagline': 'KernelShapFeatureImportanceExplainer.',
 'description': 'Shapley explanations are a technique with credible theoretical support that presents consistent global and local variable contributions. Local numeric Shapley values are calculated by tracing single rows of data through a trained tree ensemble and aggregating the contribution of each input variable as the row of data moves through the trained ensemble. For regression tasks, Shapley values sum to the prediction of the Driverless AI model. For classification problems, Shapley values sum to the prediction of the Driverless AI model before applying the link function. Global Shapley values are the average of the absolute Shapley values over every row of a dataset. Shapley values for original features are calculated with the Kernel Explainer method, which uses a special weighted linear regression to compute the importance of each feature. More information about Kernel SHAP is available at http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.',
 'brief_description': 'KernelShapFeatureImportanceExplainer.',
 'model_types': ['iid'],
 'can_explain': ['regression', 'binomial', 'multinomial'],
 'explanation_scopes': ['global_scope', 'local_scope'],
 'explanations': [{'explanation_type': 'global-feature-importance',
   'name': 'GlobalFeatImpExplanation',
   'category': '',
   'scope': 'global',
   'has_local': '',
   'formats': []},
  {'explanation_type': 'local-feature-importance',
   'name': 'LocalFeatImpExplanation',
   'category': '',
   'scope': 'local',
   'has_local': '',
   'formats': []}],
 'keywords': ['explains-original-feature-importance', 'is_slow', 'h2o-sonar'],
 'parameters': [{'name': 'sample_size',
   'description': 'Sample size.',
   'comment': '',
   'type': 'int',
   'val': 100000,
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'sample',
   'description': 'Sample Kernel Shapley.',
   'comment': '',
   'type': 'bool',
   'val': True,
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'nsample',
   'description': "Number of times to re-evaluate the model when explaining each prediction with Kernel Explainer. Default is determined internally.'auto' or int. Number of times to re-evaluate the model when explaining each prediction. More samples lead to lower variance estimates of the SHAP values. The 'auto' setting uses nsamples = 2 * X.shape[1] + 2048. This setting is disabled by default and runtime determines the right number internally.",
   'comment': '',
   'type': 'int',
   'val': '',
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'L1',
   'description': "L1 regularization for Kernel Explainer. 'num_features(int)', 'auto' (default for now, but deprecated), 'aic', 'bic', or float. The L1 regularization to use for feature selection (the estimation procedure is based on a debiased lasso). The 'auto' option currently uses aic when less that 20% of the possible sample space is enumerated, otherwise it uses no regularization. The aic and bic options use the AIC and BIC rules for regularization. Using 'num_features(int)' selects a fix number of top features. Passing a float directly sets the alpha parameter of the sklearn.linear_model.Lasso model used for feature selection.",
   'comment': '',
   'type': 'str',
   'val': 'auto',
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'max runtime',
   'description': 'Max runtime for Kernel explainer in seconds.',
   'comment': '',
   'type': 'int',
   'val': 900,
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'fast_approx',
   'description': 'Speed up predictions with fast predictions approximation.',
   'comment': '',
   'type': 'bool',
   'val': True,
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'leakage_warning_threshold',
   'description': 'The threshold above which to report a potentially detected feature importance leak problem.',
   'comment': '',
   'type': 'float',
   'val': 0.95,
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''}],
 'metrics_meta': []}

Interpretation

[3]:

# dataset
dataset_path = "../../data/predictive/creditcard.csv"
target_col = "default payment next month"

# model
mojo_path = "../../data/predictive/models/creditcard-binomial.mojo"
mojo_model = daimojo.model(mojo_path)
model = ModelApi().create_model(
    model_src=mojo_model,
    target_col=target_col,
    used_features=list(mojo_model.feature_names),
)

# results
results_location = "./results"
os.makedirs(results_location, exist_ok=True)

[4]:

interpretation = interpret.run_interpretation(
    dataset=dataset_path,
    model=model,
    target_col=target_col,
    results_location=results_location,
    explainers=[explainer.KernelShapFeatureImportanceExplainer.explainer_id()],
    log_level=logging.INFO,
)

As of langchain-core 0.3.0, LangChain uses pydantic v2 internally. The langchain_core.pydantic_v1 module was a compatibility shim for pydantic v1, and should no longer be used. Please update the code to import from Pydantic directly.

For example, replace imports like: `from langchain_core.pydantic_v1 import BaseModel`
with: `from pydantic import BaseModel`
or the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet.         from pydantic.v1 import BaseModel

As of langchain-core 0.3.0, LangChain uses pydantic v2 internally. The langchain.pydantic_v1 module was a compatibility shim for pydantic v1, and should no longer be used. Please update the code to import from Pydantic directly.

For example, replace imports like: `from langchain.pydantic_v1 import BaseModel`
with: `from pydantic import BaseModel`
or the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet.         from pydantic.v1 import BaseModel

Explainer Result

[5]:

# retrieve the result
result = interpretation.get_explainer_result(
    explainer.KernelShapFeatureImportanceExplainer.explainer_id()
)

[6]:

# open interpretation HTML report in web browser
webbrowser.open(interpretation.result.get_html_report_location())

[6]:

True

[7]:

# summary
result.summary()

[7]:

{'id': 'h2o_sonar.explainers.fi_kernel_shap_explainer.KernelShapFeatureImportanceExplainer',
 'name': 'KernelShapFeatureImportanceExplainer',
 'display_name': 'Shapley Values for Original Features (Kernel SHAP Method)',
 'tagline': 'KernelShapFeatureImportanceExplainer.',
 'description': 'Shapley explanations are a technique with credible theoretical support that presents consistent global and local variable contributions. Local numeric Shapley values are calculated by tracing single rows of data through a trained tree ensemble and aggregating the contribution of each input variable as the row of data moves through the trained ensemble. For regression tasks, Shapley values sum to the prediction of the Driverless AI model. For classification problems, Shapley values sum to the prediction of the Driverless AI model before applying the link function. Global Shapley values are the average of the absolute Shapley values over every row of a dataset. Shapley values for original features are calculated with the Kernel Explainer method, which uses a special weighted linear regression to compute the importance of each feature. More information about Kernel SHAP is available at http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf.',
 'brief_description': 'KernelShapFeatureImportanceExplainer.',
 'model_types': ['iid'],
 'can_explain': ['regression', 'binomial', 'multinomial'],
 'explanation_scopes': ['global_scope', 'local_scope'],
 'explanations': [{'explanation_type': 'global-feature-importance',
   'name': 'Shapley on Original Features (Kernel SHAP Method)',
   'category': 'DAI MODEL',
   'scope': 'global',
   'has_local': 'local-feature-importance',
   'formats': ['application/vnd.h2oai.json+datatable.jay',
    'application/vnd.h2oai.json+csv',
    'application/json']},
  {'explanation_type': 'local-feature-importance',
   'name': 'Shapley on Original Features (Kernel SHAP Method)',
   'category': 'CUSTOM',
   'scope': 'local',
   'has_local': None,
   'formats': ['application/vnd.h2oai.json+datatable.jay']},
  {'explanation_type': 'global-html-fragment',
   'name': 'Shapley on Original Features (Kernel SHAP Method)',
   'category': 'MODEL',
   'scope': 'global',
   'has_local': None,
   'formats': ['text/html']}],
 'keywords': ['explains-original-feature-importance', 'is_slow', 'h2o-sonar'],
 'parameters': [{'name': 'sample_size',
   'description': 'Sample size.',
   'comment': '',
   'type': 'int',
   'val': 100000,
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'sample',
   'description': 'Sample Kernel Shapley.',
   'comment': '',
   'type': 'bool',
   'val': True,
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'nsample',
   'description': "Number of times to re-evaluate the model when explaining each prediction with Kernel Explainer. Default is determined internally.'auto' or int. Number of times to re-evaluate the model when explaining each prediction. More samples lead to lower variance estimates of the SHAP values. The 'auto' setting uses nsamples = 2 * X.shape[1] + 2048. This setting is disabled by default and runtime determines the right number internally.",
   'comment': '',
   'type': 'int',
   'val': '',
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'L1',
   'description': "L1 regularization for Kernel Explainer. 'num_features(int)', 'auto' (default for now, but deprecated), 'aic', 'bic', or float. The L1 regularization to use for feature selection (the estimation procedure is based on a debiased lasso). The 'auto' option currently uses aic when less that 20% of the possible sample space is enumerated, otherwise it uses no regularization. The aic and bic options use the AIC and BIC rules for regularization. Using 'num_features(int)' selects a fix number of top features. Passing a float directly sets the alpha parameter of the sklearn.linear_model.Lasso model used for feature selection.",
   'comment': '',
   'type': 'str',
   'val': 'auto',
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'max runtime',
   'description': 'Max runtime for Kernel explainer in seconds.',
   'comment': '',
   'type': 'int',
   'val': 900,
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'fast_approx',
   'description': 'Speed up predictions with fast predictions approximation.',
   'comment': '',
   'type': 'bool',
   'val': True,
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''},
  {'name': 'leakage_warning_threshold',
   'description': 'The threshold above which to report a potentially detected feature importance leak problem.',
   'comment': '',
   'type': 'float',
   'val': 0.95,
   'predefined': [],
   'tags': [],
   'min_': 0.0,
   'max_': 0.0,
   'category': ''}],
 'metrics_meta': []}

[8]:

# parameter
result.params()

[8]:

{'sample_size': 100000,
 'sample': True,
 'nsample': '',
 'L1': 'auto',
 'max runtime': 900,
 'fast_approx': True,
 'leakage_warning_threshold': 0.95}

Display Data

[9]:

result.data()

[9]:

	feature	importance
	▪▪▪▪	▪▪▪▪▪▪▪▪
0	PAY_0	0.592187
1	PAY_2	0.224423
2	LIMIT_BAL	0.159352
3	PAY_AMT4	0.14868
4	PAY_AMT2	0.125437
5	BILL_AMT1	0.101179
6	PAY_3	0.0576715
7	PAY_AMT3	0.0495318
8	PAY_6	0.0453093
9	PAY_4	0.0391064
10	BILL_AMT2	0.0371473
11	BILL_AMT6	0.0298867
12	PAY_5	0.0270603
13	PAY_AMT1	0.01873
14	EDUCATION	0.0131732
15	AGE	0.0110948
16	MARRIAGE	0.00884818
17	PAY_AMT6	0.00759522
18	PAY_AMT5	0.00719935
19	BILL_AMT5	0.00589392
20	BILL_AMT4	0.00189465
21	BILL_AMT3	0.00108067

Plot Feature Importance Data

[10]:

result.plot()

../_images/notebooks_h2o-sonar-original-feature-kernel-shap-importance-explainer_14_0.png

Save Explainer Log and Data

[11]:

# save the explainer log
log_file_path = "./feature-importance-demo.log"
result.log(path=log_file_path)

[12]:

!cat $log_file_path

[13]:

# save the explainer data
result.zip(file_path="./feature-importance-demo-archive.zip")

[14]:

!unzip -l feature-importance-demo-archive.zip

Archive:  feature-importance-demo-archive.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     6244  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/result_descriptor.json
        2  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/problems/problems_and_actions.json
      110  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_html_fragment/text_html.meta
     4555  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_html_fragment/text_html/explanation.html
    25126  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_html_fragment/text_html/fi-class-0.png
        0  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/log/explainer_run_df5636ac-4f7b-49cd-aa40-3c9de6b4f434.log
  1842208  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/work/shapley.orig.feat.bin
  2002886  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/work/shapley_formatted_orig_feat.zip
  4864158  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/work/shapley.orig.feat.csv
    40216  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/work/y_hat.bin
        2  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/insights/insights_and_actions.json
      185  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_feature_importance/application_vnd_h2oai_json_datatable_jay.meta
      143  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_feature_importance/application_json.meta
      163  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_feature_importance/application_vnd_h2oai_json_csv.meta
     1806  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_feature_importance/application_vnd_h2oai_json_datatable_jay/explanation.json
      888  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_feature_importance/application_vnd_h2oai_json_datatable_jay/feature_importance_class_0.jay
     1139  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_feature_importance/application_json/explanation.json
     1618  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_feature_importance/application_json/feature_importance_class_0.json
     1138  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_feature_importance/application_vnd_h2oai_json_csv/explanation.json
      748  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/global_feature_importance/application_vnd_h2oai_json_csv/feature_importance_class_0.csv
      201  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/local_feature_importance/application_vnd_h2oai_json_datatable_jay.meta
      847  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/local_feature_importance/application_vnd_h2oai_json_datatable_jay/explanation.json
    40216  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/local_feature_importance/application_vnd_h2oai_json_datatable_jay/y_hat.bin
  1842208  2026-01-29 16:11   explainer_h2o_sonar_explainers_fi_kernel_shap_explainer_KernelShapFeatureImportanceExplainer_df5636ac-4f7b-49cd-aa40-3c9de6b4f434/local_feature_importance/application_vnd_h2oai_json_datatable_jay/feature_importance_class_0.jay
---------                     -------
 10676807                     24 files

[ ]: