Version: 1.2.0

Recommendation API

The Recommendation API suggests recommendations based on the data stored in feature sets. Given two different feature sets, it finds similarities between their features and recommends features that are similar in nature or data type.

A classifier can be thought of as a form of pattern recognition. Classifiers are used to recommend features by matching patterns across feature sets. For example, suppose you specify a pattern for one feature set. If the same pattern appears in another feature set, the feature store automatically recognizes the pattern there and recommends that feature set to the user.

Feature store supports three types of classifiers:

Empty classifier - this classifier can only be assigned to a feature manually.
Regex classifier - this classifier is assigned to a feature after ingestion if the feature values match the configured regex. Regex classifiers are typically used for numerical features.
Sample classifier - this classifier is assigned to a feature after ingestion if the feature values match the configured sample data. Sample classifiers are used for text-based features.

note

Classifiers can be defined only by admins, and they apply to the entire feature store.

When multiple feature sets contain the same classifier, the Recommendation API generates a list of these feature sets. This list can then be used for joining feature sets that have common classifiers.
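The grouping described above can be illustrated with a small, self-contained Python sketch. It does not call the Feature Store client; the `recommend_joins` function and the feature set contents are hypothetical and only show the idea of collecting feature sets that share a classifier.

```python
from collections import defaultdict


def recommend_joins(feature_sets):
    """Map each classifier to the (feature set, feature) pairs that carry it,
    keeping only classifiers that appear in at least two feature sets."""
    by_classifier = defaultdict(list)
    for fs_name, features in feature_sets.items():
        for feature_name, classifiers in features.items():
            for classifier in classifiers:
                by_classifier[classifier].append((fs_name, feature_name))
    return {
        classifier: sorted(pairs)
        for classifier, pairs in by_classifier.items()
        if len({fs_name for fs_name, _ in pairs}) >= 2
    }


# Hypothetical feature sets: feature name -> classifiers assigned to it.
feature_sets = {
    "customers": {"zip": {"zipcode"}, "ssn": {"ssn"}},
    "stores": {"postal_code": {"zipcode"}, "county": {"countyname_classifier"}},
}

print(recommend_joins(feature_sets))
# {'zipcode': [('customers', 'zip'), ('stores', 'postal_code')]}
```

In this sketch only `zipcode` is reported, because it is the only classifier present in both feature sets, which makes `zip` and `postal_code` candidate join columns.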

Creating a regex classifier

Regex classifiers are used to check if the value of the feature matches the regular expression provided by the classifier specification.

Python

from featurestore import RegexClassifier

# Create a regex classifier named "zipcode" that is assigned when at least 90% of incoming values match a pattern of exactly 5 digits.
# The pattern is a raw string (r"...") so that the backslash in \d reaches the regex engine unchanged.
client.classifiers.create(RegexClassifier("zipcode", r"^\d{5}$", percentage_match=90))

zipcode is the name of the classifier.
^\d{5}$ is the classifier pattern. It matches values that consist of exactly 5 digits.
percentage_match=90 indicates that at least 90% of the values must be 5 digits. percentage_match defines the minimum percentage of data that should match the pattern.

Scala

import ai.h2o.featurestore.core.collections.RegexClassifier

// Create a regex classifier named "zipcode" that is assigned when at least 90% of incoming values match a pattern of exactly 5 digits.
// The pattern is a triple-quoted string so that \d is not treated as a string escape.
client.classifiers.create(RegexClassifier("zipcode", """^\d{5}$""", percentageMatch=90))

zipcode is the name of the classifier.
^\d{5}$ is the classifier pattern. It matches values that consist of exactly 5 digits.
percentageMatch=90 indicates that at least 90% of the values must be 5 digits. percentageMatch defines the minimum percentage of data that should match the pattern.

To check the output, run the following code, which will list all the classifiers you have created.

client.classifiers.list()

Output:

[
    RegexClassifier(name=zipcode, regex=^\d{5}$, percentage_match=90)
]
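To make the percentage threshold concrete, here is a minimal pure-Python sketch of how such a check could be computed. It is an illustration only, not the Feature Store's implementation; the helper names `match_percentage` and `regex_classifier_applies` and the sample values are assumptions introduced for this example.

```python
import re


def match_percentage(values, pattern):
    """Return the percentage (0-100) of values that match the regex."""
    if not values:
        return 0.0
    regex = re.compile(pattern)
    matched = sum(1 for value in values if regex.search(str(value)))
    return 100.0 * matched / len(values)


def regex_classifier_applies(values, pattern, percentage_match):
    """True when at least percentage_match percent of the values match."""
    return match_percentage(values, pattern) >= percentage_match


# Hypothetical column: nine 5-digit values and one value that does not match.
zipcodes = ["94105", "10001", "60614", "73301", "30301",
            "98101", "02139", "33101", "85001", "N/A"]

print(match_percentage(zipcodes, r"^\d{5}$"))              # 90.0
print(regex_classifier_applies(zipcodes, r"^\d{5}$", 90))  # True
print(regex_classifier_applies(zipcodes, r"^\d{5}$", 95))  # False
```

Trying a pattern against a sample of your own data in this way can help you pick a threshold before an admin registers the classifier.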

Creating a sample classifier

Sample classifiers take a sample of an existing dataset and look for the closest pattern match to that sample in the new dataset.

Python

from featurestore import SampleClassifier

# fs is an existing feature set, for example one returned by project.feature_sets.get(...).
# Parameters included: sampling fraction, fuzzy distance, and the minimum percentage of data that must match the sample.
client.classifiers.create(
    SampleClassifier.from_feature_set(
        feature_set=fs,
        name="countyname_classifier",
        column_name="CountyName",
        sample_fraction=0.50,
        fuzzy_distance=1,
        percentage_match=85,
    )
)

feature_set is the feature set from which the sample is taken.
name is the name of the classifier.
column_name is the name of the column on which you create the classifier. You have to specify which text column you want to match.
sample_fraction specifies the fraction of the data in that column to use as the sample, as opposed to taking the whole column. For example, the value specified above (0.50) means that only 50% of the data in the column is used.
fuzzy_distance specifies how many characters may differ for a value to still count as a match. For example, with fuzzy_distance=1, if the sample contains AZ for Arizona and TZ appears somewhere in the data, TZ is treated as AZ because only one character is changed.
percentage_match=85 indicates that at least 85% of the data should match the sample.

Scala

import ai.h2o.featurestore.core.collections.SampleClassifier

// Parameters included: sampling fraction, fuzzy distance, and the minimum percentage of data that must match the sample.
client.classifiers.create(SampleClassifier(name = "countyname_classifier", featureSet = fs, columnName = "CountyName", sampleFraction = 0.50, fuzzyDistance = 1, percentageMatch = 85))

featureSet is the feature set from which the sample is taken.
name is the name of the classifier.
columnName is the name of the column on which you create the classifier. You have to specify which text column you want to match.
sampleFraction specifies the fraction of the data in that column to use as the sample, as opposed to taking the whole column. For example, the value specified above (0.50) means that only 50% of the data in the column is used.
fuzzyDistance specifies how many characters may differ for a value to still count as a match. For example, with fuzzyDistance=1, if the sample contains AZ for Arizona and TZ appears somewhere in the data, TZ is treated as AZ because only one character is changed.
percentageMatch=85 indicates that at least 85% of the data should match the sample.
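The interplay of the sampling fraction, fuzzy distance, and match percentage can be sketched in plain Python. This is an illustration under stated assumptions, not the Feature Store's algorithm: it assumes the fuzzy distance behaves like a Levenshtein edit distance and that the sample is a simple random draw, and the helper names (`edit_distance`, `draw_sample`, `sample_match_percentage`) are hypothetical.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def draw_sample(values, sample_fraction, seed=0):
    """Draw a reproducible random sample holding sample_fraction of the values."""
    size = max(1, round(len(values) * sample_fraction))
    return random.Random(seed).sample(list(values), size)


def sample_match_percentage(values, sample, fuzzy_distance):
    """Percentage of values within fuzzy_distance edits of any sample value."""
    if not values:
        return 0.0
    matched = sum(
        1 for value in values
        if any(edit_distance(value, s) <= fuzzy_distance for s in sample)
    )
    return 100.0 * matched / len(values)


print(edit_distance("TZ", "AZ"))  # 1

# Hypothetical new data checked against a sample that contains only "AZ".
states = ["AZ", "TZ", "AZ", "NY"]
print(sample_match_percentage(states, ["AZ"], fuzzy_distance=1))  # 75.0
```

Here TZ counts as a match because it is one edit away from AZ, while NY is two edits away and does not. With a threshold of 85, the 75.0 result would not be enough to assign the classifier in this sketch.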

Creating an empty classifier

An empty classifier is used to manually apply a pattern to a feature.

Python

# Create an empty classifier
client.classifiers.create("classifierName")

Scala

// Create an empty classifier
client.classifiers.create("classifierName")

To check the output, run the following code, which will list all the classifiers you have created.

client.classifiers.list()

Output:

[
    EmptyClassifier(name=classifierName)
]

Changing a classifier manually

This method lets you annotate a feature with a specific classifier directly. Classifiers are mainly useful because they are assigned automatically, but you can also assign them manually.

Python

fs = project.feature_sets.get("name")
feature = fs.features["feature"]
client.classifiers.list()  # lists all classifiers
feature.classifiers = {"ssn"}  # assign the "ssn" classifier to the feature

Scala

val fs = project.featureSets.get("name")
val feature = fs.features("feature")
client.classifiers.list()  // lists all classifiers
feature.classifiers = Set("ssn")  // assign the "ssn" classifier to the feature

Updating an existing classifier

An administrator of the Feature Store can update the classifiers:

Python

from featurestore import RegexClassifier

# Create an empty classifier
client.classifiers.create("classifierName")

# Update the empty classifier to a regex classifier that is applied if at least 10% of the data matches the regex test\d+
client.classifiers.update(RegexClassifier("classifierName", r"test\d+", 10))

Scala

import ai.h2o.featurestore.core.collections.RegexClassifier

// Create an empty classifier
client.classifiers.create("classifierName")

// Update the empty classifier to a regex classifier that is applied if at least 10% of the data matches the regex test\d+
client.classifiers.update(RegexClassifier("classifierName", """test\d+""", 10))

note

Updating a classifier does not update the features. Automatically applied classifiers remain unchanged until a new ingestion.
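The timing described in this note can be pictured with a toy in-memory model. Everything in it (the `ToyClassifierStore` class, its methods, and the sample values) is a hypothetical illustration rather than Feature Store code; it only assumes that automatic classifiers are evaluated at ingestion time, as the note states.

```python
import re


class ToyClassifierStore:
    """Toy model: classifier definitions and per-feature assignments."""

    def __init__(self):
        self.classifiers = {}  # classifier name -> (regex pattern, percentage_match)
        self.assigned = {}     # feature name -> set of classifier names

    def create_or_update(self, name, pattern, percentage_match):
        # Changes only the definition; existing assignments are left alone.
        self.classifiers[name] = (pattern, percentage_match)

    def ingest(self, feature, values):
        # Classifiers are (re)evaluated only here, at ingestion time.
        matched = set()
        for name, (pattern, percentage_match) in self.classifiers.items():
            hits = sum(1 for value in values if re.search(pattern, value))
            if values and 100.0 * hits / len(values) >= percentage_match:
                matched.add(name)
        self.assigned[feature] = matched


store = ToyClassifierStore()
store.create_or_update("classifierName", r"^\d{5}$", 90)
store.ingest("zip", ["94105", "10001"])
print(store.assigned["zip"])  # {'classifierName'}

# Updating the definition does not touch the existing assignment...
store.create_or_update("classifierName", r"test\d+", 10)
print(store.assigned["zip"])  # {'classifierName'}

# ...until the feature is ingested again.
store.ingest("zip", ["94105", "10001"])
print(store.assigned["zip"])  # set()
```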

Deleting an existing classifier

An administrator of the Feature Store can delete the classifiers:

Python

# Create an empty classifier
client.classifiers.create("classifierName")

# Delete the classifier
client.classifiers.delete("classifierName")

Scala

// Create an empty classifier
client.classifiers.create("classifierName")

// Delete the classifier
client.classifiers.delete("classifierName")

note

Deleting a classifier does not remove it from the features it is assigned to. To remove a classifier from a feature, you need to do so manually.
