Settings: Adversarial Similarity

H2O Model Validation offers an array of settings for an Adversarial Similarity test. Below, each setting is described in turn.

Training dataset

Defines the training dataset, one of the two datasets H2O Model Validation uses during the validation test to identify similar and dissimilar rows between the training dataset and the reference (test) dataset. The training dataset dictates the required structure of the reference dataset (that is, matching columns). H2O Model Validation requires you to define this setting before it can initiate an Adversarial Similarity validation test.

Reference dataset

Defines the reference (test) dataset, one of the two datasets H2O Model Validation uses during the validation test to identify similar and dissimilar rows between the training dataset and the reference dataset. The training dataset dictates the required structure of the reference dataset (that is, matching columns). H2O Model Validation requires you to define this setting before it can initiate an Adversarial Similarity validation test.

Note

H2O Model Validation drops a column from the reference dataset if that column is not present in the defined training dataset.
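
At a high level, an adversarial similarity test trains a binary classifier to tell training rows apart from reference rows: the better that classifier performs, the more the two datasets differ. The sketch below illustrates this general idea with scikit-learn; it is not H2O Model Validation's internal implementation, and it assumes numeric features with no missing values.

```python
# Minimal sketch of the adversarial-similarity idea (illustration only,
# not H2O Model Validation's internal implementation).
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def adversarial_similarity_auc(train_df: pd.DataFrame, reference_df: pd.DataFrame) -> float:
    # Keep only columns present in both datasets (see the note above:
    # reference columns absent from the training data are dropped).
    shared = [c for c in train_df.columns if c in reference_df.columns]
    X = pd.concat([train_df[shared], reference_df[shared]], ignore_index=True)

    # Label rows by origin: 0 = training dataset, 1 = reference dataset.
    y = np.r_[np.zeros(len(train_df)), np.ones(len(reference_df))]

    # Cross-validated probability that a row comes from the reference dataset.
    model = GradientBoostingClassifier(random_state=0)
    proba = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]

    # AUC near 0.5 means the datasets look similar; AUC near 1.0 means the
    # rows are easy to tell apart, i.e., a high degree of dissimilarity.
    return roc_auc_score(y, proba)
```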

ID column

Defines the ID column of the training and reference datasets, which H2O Model Validation excludes during training.

Note

An identity (ID) column is a column in a dataset that uniquely identifies each row.

Columns to drop

Defines the columns H2O Model Validation drops during model training.

Info

This setting is useful when you want to drop columns that cause high dissimilarity (for example, a time column).
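
As a concrete illustration, the snippet below shows how the ID column and any user-specified columns might be excluded before the adversarial model is trained. The function and column names are hypothetical placeholders, not part of H2O Model Validation's API.

```python
# Hedged example: exclude the ID column and user-specified columns before
# training the adversarial model. Names below are placeholders.
import pandas as pd

def prepare_features(df: pd.DataFrame, id_column: str, columns_to_drop: list[str]) -> pd.DataFrame:
    # The ID column only identifies rows, and columns such as a timestamp can
    # dominate the dissimilarity signal, so both are removed before training.
    drop = [id_column, *columns_to_drop]
    return df.drop(columns=[c for c in drop if c in df.columns])

# Example: drop a hypothetical "row_id" identifier and a "timestamp" column
# that would otherwise make the two datasets trivially distinguishable.
# features = prepare_features(train_df, id_column="row_id", columns_to_drop=["timestamp"])
```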

Compute Shapley values

Determines whether H2O Model Validation computes Shapley values for the model used to analyze the similarity between the training and reference datasets. H2O Model Validation uses the generated Shapley values to create an array of visual metrics that provide insight into how individual features contribute to the overall model performance.

Note

  • Generating Shapley values for the model can significantly increase the runtime of the validation test.

  • The generated visual metrics can help you understand what might cause a higher degree of dissimilarity between the training and reference datasets. To learn more about the generated visual metrics, see Metrics.
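
The sketch below shows one way per-feature Shapley attributions for the adversarial model could be computed with the open-source shap package; H2O Model Validation's own computation may differ. Features with large mean absolute values are the ones driving the separation between the training and reference rows.

```python
# Illustrative only: ranks features by mean absolute SHAP value for a model
# trained to separate training rows (0) from reference rows (1). Uses the
# open-source `shap` package; not H2O Model Validation's internal code.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

def shapley_dissimilarity_ranking(X: pd.DataFrame, y: np.ndarray) -> pd.Series:
    model = GradientBoostingClassifier(random_state=0).fit(X, y)

    # For a binary gradient-boosted model, TreeExplainer returns one
    # attribution per row and feature (log-odds contribution).
    shap_values = shap.TreeExplainer(model).shap_values(X)

    # Average magnitude per feature: larger values point to features that
    # contribute most to telling the two datasets apart.
    importance = np.abs(shap_values).mean(axis=0)
    return pd.Series(importance, index=X.columns).sort_values(ascending=False)
```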

Remove validation experiments from DAI after finish

Determines whether H2O Model Validation deletes the Driverless AI (DAI) experiments generated during the Adversarial Similarity test. This setting is enabled by default; accordingly, H2O Model Validation deletes all DAI experiments after the validation test is complete because they are no longer needed.