Interpreting Datasets
=====================
Datasets are either used to interpret models (for example, by the Decision Tree explainer)
or are interpreted themselves (for example, by the Drift Detection explainer).
Datasets can be provided as:

* a path to the dataset stored as a CSV or ``.jay`` file
* a ``datatable.Frame`` instance
* a ``pandas.DataFrame`` instance
* an ``h2o.H2OFrame`` instance
* an ``h2o_sonar.lib.api.datasets.ExplainableDatasetHandle`` instance
* an ``h2o_sonar.lib.api.datasets.ExplainableDataset`` instance

See also:

- :ref:`h2o_sonar.lib.api.datasets module`

Explainable Dataset
--------------------
``h2o_sonar.lib.api.datasets.ExplainableDataset`` is typically used when dataset **metadata**
must be specified, or when the dataset is created with the
``h2o_sonar.lib.api.datasets.DatasetApi.create_dataset()`` method. In the latter case, the metadata
(such as the shape, the columns, and the frequencies of unique column values) is constructed automatically.

.. code-block:: python

    from h2o_sonar.lib.api import datasets

    dataset: datasets.ExplainableDataset = (
        self.container.dataset_api.create_dataset(
            # path to the dataset or a frame instance
            dataset_src=...,
        )
    )

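To give a sense of the metadata mentioned above, the following sketch derives the shape, the column
names, and the frequencies of unique column values from a small in-memory CSV using only the Python
standard library. It is an illustration under assumptions, not the H2O Sonar implementation: the
``describe_csv()`` helper, the layout of the returned dictionary, and the sample data are all hypothetical.

```python
import csv
import io
from collections import Counter


def describe_csv(csv_text: str) -> dict:
    """Hypothetical helper: collect simple metadata from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = list(reader.fieldnames or [])
    # One counter per column to track how often each value occurs.
    frequencies = {column: Counter() for column in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for column in columns:
            frequencies[column][row[column]] += 1
    return {
        "shape": (row_count, len(columns)),
        "columns": columns,
        "unique_value_frequencies": {
            column: dict(counter) for column, counter in frequencies.items()
        },
    }


# Made-up sample data, used only for this illustration.
SAMPLE = "id,color\n1,red\n2,blue\n3,red\n"
metadata = describe_csv(SAMPLE)
print(metadata["shape"])
print(metadata["columns"])
print(metadata["unique_value_frequencies"]["color"])
```

All values are kept as strings here; a real implementation would likely also infer column types.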
Explainable Dataset Handle
~~~~~~~~~~~~~~~~~~~~~~~~~~
``h2o_sonar.lib.api.datasets.ExplainableDatasetHandle`` represents a remote dataset
**hosted**, for example, by a Driverless AI server. It is used, for instance, by the
H2O Model Validation based explainers, which use Driverless AI servers as workers to explain models.
The string serialization of an explainable dataset handle (used, for instance, on the command line)
has the following format:

.. code-block:: text

    resource:connection:<connection ID>:key:<dataset ID>

where:

- ``connection ID`` is the unique identifier of the Driverless AI connection specified in the H2O Sonar configuration.
- ``dataset ID`` is the unique identifier of the dataset hosted by the Driverless AI server (typically a UUID).

Example:

.. code-block:: text

    resource:connection:local-driverless-ai-server:key:7965e2ea-f898-11ed-b979-106530ed5ceb
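
As an illustration of the format above, the following minimal sketch parses a handle string and
serializes it back using only the Python standard library. The ``DatasetHandle`` class and the
``parse_handle()`` function are hypothetical helpers written for this example and are not part of
the H2O Sonar API; the sketch also assumes that the connection ID does not contain the ``:key:`` sequence.

```python
from dataclasses import dataclass

PREFIX = "resource:connection:"
KEY_SEPARATOR = ":key:"


@dataclass(frozen=True)
class DatasetHandle:
    """Hypothetical container for the two parts of a handle string."""

    connection_id: str
    dataset_id: str

    def __str__(self) -> str:
        # Serialize back to the documented string format.
        return f"{PREFIX}{self.connection_id}{KEY_SEPARATOR}{self.dataset_id}"


def parse_handle(text: str) -> DatasetHandle:
    """Split a handle string into its connection ID and dataset ID."""
    if not text.startswith(PREFIX):
        raise ValueError(f"handle must start with {PREFIX!r}: {text!r}")
    remainder = text[len(PREFIX):]
    # Assumption: the first ':key:' separates the connection ID from the dataset ID.
    connection_id, separator, dataset_id = remainder.partition(KEY_SEPARATOR)
    if not separator or not connection_id or not dataset_id:
        raise ValueError(f"handle needs non-empty IDs around {KEY_SEPARATOR!r}: {text!r}")
    return DatasetHandle(connection_id, dataset_id)


handle = parse_handle(
    "resource:connection:local-driverless-ai-server"
    ":key:7965e2ea-f898-11ed-b979-106530ed5ceb"
)
print(handle.connection_id)
print(handle.dataset_id)
```

Validating the string before passing it on the command line can surface a missing connection ID or dataset ID early, before any Driverless AI server is contacted.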