View a dataset summary

A dataset summary gives you a quick overview of key statistics (e.g., count, mean, standard deviation, min, max, and missing values) for a particular dataset in your established Driverless AI (DAI) connection.

Instructions

To view a dataset summary, follow these steps:

  1. In the H2O Model Validation navigation menu, click Datasets.

    Note

    • On the Datasets card, you can use the datasets table to search for your dataset summaries.

    • In the datasets table on the Datasets card, you can view a dataset summary for any of your datasets. To learn more, see Datasets table columns.

  2. In the datasets table, select the dataset whose summary you want to view.

  3. Click View.

    Note

    A dataset summary table will appear, highlighting several summary metrics about the dataset (e.g., frequency). To learn more, see Dataset summary table.

Datasets table columns

Column name Description
Name Dataset name.
Data Summary State of the dataset summary (see Dataset summary states).
Rows Number of rows in the dataset.
Columns Number of columns in the dataset.
File Size Size of the dataset file.
Adversarial Similarity The number of Adversarial Similarity tests that are complete or scheduled to run.
Drift Detection The number of Drift Detection tests that are complete or scheduled to run.

Dataset summary states

A dataset summary can be in one of the following states:

  • NotCreated

    H2O Model Validation has not created the dataset summary.

  • Created

    H2O Model Validation created the dataset summary.

  • Running

    H2O Model Validation is currently running the dataset summary.

  • Done

    H2O Model Validation has completed the dataset summary.

  • Deleted

    H2O Model Validation deleted the dataset summary.

  • Error

    An error occurred while H2O Model Validation was running the dataset summary.

  • Timeout

    The dataset summary did not complete within the allotted time.
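
If you track these states in your own scripts or notes, the list above can be sketched as a small Python enum. This is only an illustration: the class name `DatasetSummaryState` and the helpers `parse_state`, `is_viewable`, and `needs_attention` are hypothetical and are not part of any H2O Model Validation API. The grouping of states into "viewable" and "needs attention" is an assumption based on the descriptions above.

```python
from enum import Enum


class DatasetSummaryState(Enum):
    """Hypothetical enum; the values mirror the state names listed above."""

    NOT_CREATED = "NotCreated"
    CREATED = "Created"
    RUNNING = "Running"
    DONE = "Done"
    DELETED = "Deleted"
    ERROR = "Error"
    TIMEOUT = "Timeout"


def parse_state(label: str) -> DatasetSummaryState:
    """Convert a state label as displayed (e.g., "Done") into the enum."""
    return DatasetSummaryState(label)


def is_viewable(state: DatasetSummaryState) -> bool:
    # Assumption: only a completed (Done) summary has results to view.
    return state is DatasetSummaryState.DONE


def needs_attention(state: DatasetSummaryState) -> bool:
    # Assumption: Error and Timeout mean the summary did not finish.
    return state in {DatasetSummaryState.ERROR, DatasetSummaryState.TIMEOUT}
```

For example, `is_viewable(parse_state("Done"))` evaluates to `True`, while `needs_attention(parse_state("Timeout"))` flags a summary that did not finish.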

Dataset summary table

Column name Description
Feature Feature name (one of the column names in the dataset).
Count Number of feature values present in the feature column.
Mean The mean (average) feature value.
Std Standard deviation of the feature values (a measure of how widely the values are spread).
Min The minimum feature value.
Max The maximum feature value.
Missing Number of missing feature values.
Unique Number of unique feature values.
Freq The feature frequency value.

Note

H2O Model Validation marks a feature column with N/A (not applicable) when its feature values are non-numeric.
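
To make the summary metrics concrete, the following is a minimal Python sketch of how such a row could be computed for a single feature column. It is not H2O Model Validation's implementation. The function name `summarize_feature` is hypothetical, and the sketch makes several assumptions that the text above does not confirm: missing values are `None`, NaN, or empty strings; Std is the sample standard deviation; Freq is the number of occurrences of the most common value; and Mean, Std, Min, and Max are the metrics marked N/A for non-numeric features.

```python
import math
import statistics
from collections import Counter

NA = "N/A"  # marker used for metrics that do not apply


def _is_missing(value) -> bool:
    # Assumption: None, NaN, and empty strings count as missing.
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return value == ""


def _is_number(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def summarize_feature(name, values):
    """Return one summary row (a dict) for a single feature column."""
    present = [v for v in values if not _is_missing(v)]
    row = {
        "Feature": name,
        "Count": len(present),
        "Missing": len(values) - len(present),
        "Unique": len(set(present)),
        # Assumption: Freq is the count of the most common value.
        "Freq": Counter(present).most_common(1)[0][1] if present else NA,
    }
    if present and all(_is_number(v) for v in present):
        row["Mean"] = statistics.fmean(present)
        # Assumption: sample standard deviation (needs at least 2 values).
        row["Std"] = statistics.stdev(present) if len(present) > 1 else NA
        row["Min"] = min(present)
        row["Max"] = max(present)
    else:
        # Assumption: these four metrics are N/A for non-numeric features.
        row.update({"Mean": NA, "Std": NA, "Min": NA, "Max": NA})
    return row
```

Calling `summarize_feature("age", [10, 20, 30, None])` yields a row with a Count of 3 and a Missing value of 1, while a column of strings such as `["a", "b", "a"]` yields N/A for Mean, Std, Min, and Max.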