Recommendation API
The Recommendation API suggests personalized recommendations based on the data stored in feature sets. If you have two different feature sets, you can use the Recommendation API to find similarities between their features and recommend features that are similar in nature or data type.
A classifier can be thought of as a pattern recognizer. Classifiers are used to recommend features based on pattern matching across different feature sets. For example, assume you specify a pattern for one feature set. If the same pattern appears in another feature set, Feature Store automatically recognizes the pattern in the second feature set and recommends it to the user.
Feature Store supports three types of classifiers:
- Empty classifier - this classifier can only be assigned to a feature manually.
- Regex classifier - this classifier is assigned to a feature after ingestion if the feature values match the configured regex. Regex classifiers are typically used for numerical features.
- Sample classifier - this classifier is assigned to a feature after ingestion if the feature values match the configured sample data. Sample classifiers are used for text-based features.
Only admins can define classifiers, and classifiers apply to the entire Feature Store.
When multiple feature sets contain the same classifier, the Recommendation API generates a list of these feature sets. This list can then be used for joining feature sets that have common classifiers.
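The idea of listing feature sets that share a classifier can be illustrated in plain Python. This is a minimal sketch, not the Feature Store implementation: the feature set names, the feature names, and the `feature_sets_by_classifier` helper are all hypothetical, and classifier assignments are modeled as an ordinary dictionary rather than fetched through the client.

```python
from collections import defaultdict

# Hypothetical data: feature set name -> {feature name -> set of classifier names}.
assignments = {
    "customers": {"zip": {"zipcode"}, "ssn_number": {"ssn"}},
    "stores": {"postal_code": {"zipcode"}},
    "employees": {"social": {"ssn"}, "home_zip": {"zipcode"}},
    "products": {"sku": set()},
}


def feature_sets_by_classifier(assignments):
    """Map each classifier to the feature sets that contain it (hypothetical helper)."""
    grouped = defaultdict(set)
    for feature_set, features in assignments.items():
        for classifiers in features.values():
            for classifier in classifiers:
                grouped[classifier].add(feature_set)
    # Keep only classifiers shared by at least two feature sets: these are join candidates.
    return {name: sorted(sets) for name, sets in grouped.items() if len(sets) > 1}


shared = feature_sets_by_classifier(assignments)
print(shared["zipcode"])  # ['customers', 'employees', 'stores']
print(shared["ssn"])      # ['customers', 'employees']
```

In this sketch, `products` does not appear in any list because none of its features carries a classifier.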
Creating a regex classifier
Regex classifiers are used to check if the value of the feature matches the regular expression provided by the classifier specification.
Python:
from featurestore import RegexClassifier
# Create a regex classifier named "zipcode" that is assigned if at least 90% of incoming data matches a pattern of 5 digits.
# The pattern is a raw string so that \d is passed to the regex engine unchanged.
client.classifiers.create(RegexClassifier("zipcode", r"^\d{5}$", percentage_match=90))
- zipcode is the name of the classifier.
- ^\d{5}$ is the classifier pattern. It matches values that consist of exactly 5 digits.
- percentage_match=90 indicates that at least 90% of the values should be 5 digits. percentage_match defines the minimum percentage of data that should match the pattern.
Scala:
import ai.h2o.featurestore.core.collections.RegexClassifier
// Create a regex classifier named "zipcode" that is assigned if at least 90% of incoming data matches a pattern of 5 digits.
// The pattern is a triple-quoted string so that \d is passed to the regex engine unchanged.
client.classifiers.create(RegexClassifier("zipcode", """^\d{5}$""", percentageMatch=90))
- zipcode is the name of the classifier.
- ^\d{5}$ is the classifier pattern. It matches values that consist of exactly 5 digits.
- percentageMatch=90 indicates that at least 90% of the values should be 5 digits. percentageMatch defines the minimum percentage of data that should match the pattern.
To check the output, run the following code, which will list all the classifiers you have created.
client.classifiers.list()
Output:
[
RegexClassifier(name=zipcode, regex=^\d{5}$, percentage_match=90)
]
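To make the role of the percentage threshold concrete, the following is a minimal sketch of the check a regex classifier describes, written with the standard `re` module only. It is an illustration under assumptions, not the Feature Store's internal logic: the `regex_classifier_applies` helper and the sample values are hypothetical, and the sketch assumes that each value is tested individually against the pattern.

```python
import re


def regex_classifier_applies(values, pattern, percentage_match):
    """Return True if at least `percentage_match` percent of values match `pattern`."""
    if not values:
        return False
    compiled = re.compile(pattern)
    matched = sum(1 for value in values if compiled.search(str(value)))
    return 100.0 * matched / len(values) >= percentage_match


# Hypothetical ingested column: nine valid zip codes and one placeholder value.
zipcodes = ["94105", "10001", "60614", "73301", "30301",
            "98101", "02139", "85001", "33101", "N/A"]

print(regex_classifier_applies(zipcodes, r"^\d{5}$", 90))  # True: 9 of 10 values match
print(regex_classifier_applies(zipcodes, r"^\d{5}$", 95))  # False: 90% is below 95%
```

A threshold below 100 lets the classifier tolerate a small share of missing or malformed values in otherwise consistent data.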
Creating a sample classifier
A sample classifier takes a sample from an existing dataset and looks for the closest pattern match in the new dataset.
Python:
from featurestore import SampleClassifier
# Parameters included: sampling fraction, fuzzy distance, and the minimum percentage of data that must match the pattern.
client.classifiers.create(SampleClassifier.from_feature_set(feature_set=fs, name="countyname_classifier", column_name="CountyName", sample_fraction=0.50, fuzzy_distance=1, percentage_match=85))
- feature_set is the feature set that the classifier is created from.
- name is the name of the classifier.
- column_name is the name of the column on which you create the classifier. You have to specify which text column you want to match.
- sample_fraction specifies the fraction of data to take from the above column as the sample, as opposed to taking the whole set of data. For example, the value specified above (0.50) indicates that only 50% of the data in the column is used.
- fuzzy_distance=1 means that a value still matches the pattern if one character is changed. For example, if you have AZ for Arizona and TZ appears somewhere, TZ is treated as AZ because only one character is changed.
- percentage_match=85 indicates that you want to match about 85% of the sample fraction.
Scala:
import ai.h2o.featurestore.core.collections.SampleClassifier
// Parameters included: sampling fraction, fuzzy distance, and the minimum percentage of data that must match the pattern.
client.classifiers.create(SampleClassifier(name = "countyname_classifier", featureSet = fs, columnName = "CountyName", sampleFraction = 0.50, fuzzyDistance = 1, percentageMatch = 85))
- featureSet is the feature set that the classifier is created from.
- name is the name of the classifier.
- columnName is the name of the column on which you create the classifier. You have to specify which text column you want to match.
- sampleFraction specifies the fraction of data to take from the above column as the sample, as opposed to taking the whole set of data. For example, the value specified above (0.50) indicates that only 50% of the data in the column is used.
- fuzzyDistance=1 means that a value still matches the pattern if one character is changed. For example, if you have AZ for Arizona and TZ appears somewhere, TZ is treated as AZ because only one character is changed.
- percentageMatch=85 indicates that you want to match about 85% of the sample fraction.
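The way the sampling fraction, fuzzy distance, and percentage threshold could interact is sketched below in plain Python. This is an illustration only, under stated assumptions: it interprets the fuzzy distance as a Levenshtein (edit) distance, which this page does not confirm, and the `edit_distance` and `sample_classifier_applies` helpers and the example values are hypothetical.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def sample_classifier_applies(reference, incoming, sample_fraction,
                              fuzzy_distance, percentage_match, seed=0):
    """Hypothetical check: do enough incoming values resemble a sample of `reference`?"""
    if not reference or not incoming:
        return False
    # Take a random sample of the reference column instead of using all of it.
    sample_size = max(1, round(len(reference) * sample_fraction))
    sample = set(random.Random(seed).sample(reference, sample_size))
    # An incoming value matches if it is within `fuzzy_distance` edits of a sampled value.
    matched = sum(
        1 for value in incoming
        if any(edit_distance(value, s) <= fuzzy_distance for s in sample)
    )
    return 100.0 * matched / len(incoming) >= percentage_match


reference = ["AZ", "CA", "NY", "TX"]
incoming = ["AZ", "TZ", "CA", "NY", "Florida"]

# sample_fraction=1.0 keeps the demo deterministic: every reference value is sampled.
print(sample_classifier_applies(reference, incoming, 1.0, 1, 80))  # True: 4 of 5 match
print(sample_classifier_applies(reference, incoming, 1.0, 0, 80))  # False: TZ no longer matches
```

With a smaller sampling fraction, the result depends on which reference values end up in the sample, so a lower fraction trades accuracy for less data to compare against.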
Creating an empty classifier
An empty classifier is used to manually apply a pattern to a feature.
Python:
# Create an empty classifier
client.classifiers.create("classifierName")
Scala:
// Create an empty classifier
client.classifiers.create("classifierName")
To check the output, run the following code, which will list all the classifiers you have created.
client.classifiers.list()
Output:
[
EmptyClassifier(name=classifierName)
]
Changing a classifier manually
With this method, you can annotate a feature with a specific classifier directly. The main advantage of classifiers is that they are assigned automatically, but you can also assign them manually.
Python:
fs = project.feature_sets.get("name")
feature = fs.features["feature"]
client.classifiers.list()  # lists all classifiers
feature.classifiers = {"ssn"}
Scala:
val fs = project.featureSets.get("name")
val feature = fs.features("feature")
client.classifiers.list() // lists all classifiers
feature.classifiers = Set("ssn")
Updating an existing classifier
An administrator of the Feature Store can update the classifiers:
Python:
from featurestore import RegexClassifier, SampleClassifier
# Create an empty classifier
client.classifiers.create("classifierName")
# Update the empty classifier to a regex classifier, which is applied if 10% of the data matches the "test\d+" regex
client.classifiers.update(RegexClassifier("classifierName", r"test\d+", 10))
Scala:
import ai.h2o.featurestore.core.collections.{RegexClassifier, SampleClassifier}
// Create an empty classifier
client.classifiers.create("classifierName")
// Update the empty classifier to a regex classifier, which is applied if 10% of the data matches the "test\d+" regex
client.classifiers.update(RegexClassifier("classifierName", """test\d+""", 10))
Updating a classifier does not update the features. Automatically applied classifiers are not changed until a new ingestion.
Deleting an existing classifier
An administrator of the Feature Store can delete the classifiers:
Python:
from featurestore import RegexClassifier, SampleClassifier
# Create an empty classifier
client.classifiers.create("classifierName")
# Delete the classifier
client.classifiers.delete("classifierName")
Scala:
import ai.h2o.featurestore.core.collections.{RegexClassifier, SampleClassifier}
// Create an empty classifier
client.classifiers.create("classifierName")
// Delete the classifier
client.classifiers.delete("classifierName")
Deleting a classifier does not remove it from the features. To remove a classifier from a feature, you need to do so manually.