Interpreting Datasets

Datasets are either used to interpret models (for example, by the Decision Tree explainer) or interpreted themselves (for example, by the Drift Detection explainer). Datasets can be provided as:

  • path to a dataset stored as a CSV or .jay file

  • datatable.Frame instance

  • pandas.DataFrame instance

  • h2o.H2OFrame instance

  • h2o_sonar.lib.api.datasets.ExplainableDatasetHandle instance

  • h2o_sonar.lib.api.datasets.ExplainableDataset instance

See also:

Explainable Dataset

h2o_sonar.lib.api.datasets.ExplainableDataset is typically used when dataset metadata needs to be specified, or when the dataset is created with the h2o_sonar.lib.api.datasets.DatasetApi::create_dataset() method. In the latter case, metadata such as the shape, the columns, and the frequencies of unique column values is constructed automatically.

from h2o_sonar.lib.api import datasets

dataset: datasets.ExplainableDataset = (
    self.container.dataset_api.create_dataset(
        dataset_src=...,  # path to the dataset or a frame instance
    )
)
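
To make the kind of metadata mentioned above more concrete, the following sketch computes the shape, the column names, and the unique value frequencies of a small CSV using only the Python standard library. It illustrates the concept and is not how H2O Sonar builds its metadata; the helper name describe_csv, the dictionary keys, and the sample data are hypothetical.

```python
import csv
import io
from collections import Counter


def describe_csv(text: str) -> dict:
    """Compute illustrative metadata for CSV text.

    Hypothetical helper for illustration only, not part of the H2O Sonar API.
    """
    reader = csv.reader(io.StringIO(text))
    columns = next(reader)  # the first row is treated as the header
    rows = [row for row in reader if row]  # skip blank lines
    # Count how often each distinct value occurs in every column.
    frequencies = {
        name: Counter(row[i] for row in rows) for i, name in enumerate(columns)
    }
    return {
        "shape": (len(rows), len(columns)),
        "columns": columns,
        "unique_value_frequencies": frequencies,
    }


SAMPLE = "id,label\n1,yes\n2,no\n3,yes\n"  # made-up sample data
metadata = describe_csv(SAMPLE)
print(metadata["shape"])
print(metadata["columns"])
print(dict(metadata["unique_value_frequencies"]["label"]))
```

All values are kept as strings here; a real implementation would likely also infer column types.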

Explainable Dataset Handle

h2o_sonar.lib.api.datasets.ExplainableDatasetHandle represents a remote dataset hosted, for example, by a Driverless AI server. For instance, it is used by the H2O Model Validation based explainers, which use Driverless AI servers as workers to explain models. The string serialization of an explainable dataset handle (used, for instance, on the command line) has the following format:

resource:connection:<connection ID>:key:<dataset ID>

where:

  • connection ID
    • … is a unique identifier of the Driverless AI connection specified in the H2O Eval Studio configuration.

  • dataset ID
    • … is a unique identifier of the dataset hosted by the Driverless AI server (typically a UUID).

Example:

resource:connection:local-driverless-ai-server:key:7965e2ea-f898-11ed-b979-106530ed5ceb
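
As a rough illustration of the format above, the following sketch splits a serialized handle into its connection ID and dataset ID and builds the string back. It is not the H2O Sonar parser: the names DatasetHandle, parse_handle, and format_handle are hypothetical, and the sketch assumes that the dataset ID never contains the text ":key:".

```python
from typing import NamedTuple

HANDLE_PREFIX = "resource:connection:"
KEY_SEPARATOR = ":key:"


class DatasetHandle(NamedTuple):
    """Hypothetical container for the two parts of a handle string."""

    connection_id: str
    dataset_id: str


def parse_handle(serialized: str) -> DatasetHandle:
    """Split 'resource:connection:<connection ID>:key:<dataset ID>' into parts."""
    if not serialized.startswith(HANDLE_PREFIX):
        raise ValueError(f"not a dataset handle: {serialized!r}")
    body = serialized[len(HANDLE_PREFIX):]
    # Split on the last ':key:' so everything after it is the dataset ID
    # (assumption: the dataset ID itself never contains ':key:').
    connection_id, separator, dataset_id = body.rpartition(KEY_SEPARATOR)
    if not separator or not connection_id or not dataset_id:
        raise ValueError(f"malformed dataset handle: {serialized!r}")
    return DatasetHandle(connection_id, dataset_id)


def format_handle(handle: DatasetHandle) -> str:
    """Serialize a handle back to its string form."""
    return f"{HANDLE_PREFIX}{handle.connection_id}{KEY_SEPARATOR}{handle.dataset_id}"


EXAMPLE = (
    "resource:connection:local-driverless-ai-server"
    ":key:7965e2ea-f898-11ed-b979-106530ed5ceb"
)
handle = parse_handle(EXAMPLE)
print(handle.connection_id)
print(handle.dataset_id)
```

Rejecting strings with a missing prefix or an empty part early keeps a mistyped command-line argument from being passed on silently.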