Version: 1.2.0

Recommendation API

The Recommendation API suggests features based on the data stored in feature sets. If you have two different feature sets, you can use the Recommendation API to find similarities between their features and to get recommendations for features that are similar in nature or data type.

A classifier can be thought of as a pattern-recognition rule. Classifiers are used to recommend features by matching patterns across different feature sets. For example, assume you specify a pattern for a feature in one feature set. If the same pattern appears in another feature set, the feature store automatically recognizes it there and recommends the matching feature to the user.

Feature Store supports three types of classifiers:

  • Empty classifier - this classifier can only be assigned to a feature manually.
  • Regex classifier - this classifier is assigned to a feature after ingestion if the feature values match the configured regex. Regex classifiers are typically used for numerical features.
  • Sample classifier - this classifier is assigned to a feature after ingestion if the feature values match the configured sample data. Sample classifiers are used for text-based features.

Note: Classifiers can be defined only by admins, and they apply to the entire feature store.

When multiple feature sets contain the same classifier, the Recommendation API generates a list of these feature sets. This list can then be used to join feature sets that share classifiers.
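The idea of listing feature sets that share a classifier can be sketched in plain Python. This is only an illustration and does not call the Feature Store client: the `feature_sets` dictionary, its contents, and the `join_candidates` helper are hypothetical, and the real Recommendation API may structure its results differently.

```python
from collections import defaultdict

# Hypothetical data: feature set name -> {feature name -> classifiers on that feature}.
feature_sets = {
    "customers": {"zip": {"zipcode"}, "county": {"countyname_classifier"}},
    "stores": {"postal_code": {"zipcode"}},
    "weather": {"region": {"countyname_classifier"}, "temp": set()},
}


def join_candidates(feature_sets):
    """Map each classifier to the (feature set, feature) pairs that carry it."""
    by_classifier = defaultdict(list)
    for fs_name, features in feature_sets.items():
        for feature_name, classifiers in features.items():
            for classifier in classifiers:
                by_classifier[classifier].append((fs_name, feature_name))
    # Keep only classifiers that appear in two or more feature sets,
    # because only those suggest a possible join.
    return {
        name: pairs
        for name, pairs in by_classifier.items()
        if len({fs_name for fs_name, _ in pairs}) > 1
    }


candidates = join_candidates(feature_sets)
for classifier, pairs in candidates.items():
    print(classifier, pairs)
```

In this sketch, the features `zip` and `postal_code` both carry the `zipcode` classifier, so the `customers` and `stores` feature sets are reported together as a possible join.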

Creating a regex classifier

Regex classifiers are used to check if the value of the feature matches the regular expression provided by the classifier specification.

from featurestore import RegexClassifier

# Create a regex classifier named "zipcode". It is assigned when at least 90% of the incoming values consist of exactly 5 digits.
client.classifiers.create(RegexClassifier("zipcode", r"^\d{5}$", percentage_match=90))
  • zipcode is the name of the classifier
  • ^\d{5}$ is the classifier pattern, which matches values that consist of exactly 5 digits. It is written as a raw string (r"...") so that Python passes the backslash to the regex engine unchanged.
  • percentage_match defines the minimum percentage of data that must match the pattern. Here, percentage_match=90 means that at least 90% of the values must be 5 digits.
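To make the role of the percentage threshold concrete, here is a minimal sketch in plain Python. It does not use the Feature Store client. The helpers `regex_match_percentage` and `would_assign` and the sample values are hypothetical, and the way the Feature Store evaluates values internally may differ.

```python
import re


def regex_match_percentage(values, pattern):
    """Return the percentage of values (converted to strings) that match the pattern."""
    if not values:
        return 0.0
    compiled = re.compile(pattern)
    matched = sum(1 for value in values if compiled.search(str(value)))
    return 100.0 * matched / len(values)


def would_assign(values, pattern, percentage_match):
    """Check whether enough values match for the classifier to be assigned."""
    return regex_match_percentage(values, pattern) >= percentage_match


# Nine valid 5-digit values and one value that does not match.
zipcodes = ["94105", "10001", "60614", "73301", "30301",
            "98101", "02139", "33101", "80202", "ABC"]

percentage = regex_match_percentage(zipcodes, r"^\d{5}$")
print(percentage)                                  # 90.0
print(would_assign(zipcodes, r"^\d{5}$", 90))      # True
```

With one more non-matching value in the list, the percentage would drop below 90 and, under the assumptions of this sketch, the classifier would not be assigned.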

To check the output, run the following code, which will list all the classifiers you have created.

client.classifiers.list() 

Output:

[
RegexClassifier(name=zipcode, regex=^\d{5}$, percentage_match=90)
]

Creating a sample classifier

A sample classifier takes a sample of values from an existing dataset and is assigned to a feature when the values in a new dataset closely match that sample.

from featurestore import SampleClassifier

# fs is an existing feature set, for example one returned by project.feature_sets.get(...)
# Parameters: sampling fraction, fuzzy distance, and the minimum percentage of data that must match the sample.
client.classifiers.create(SampleClassifier.from_feature_set(feature_set=fs, name="countyname_classifier", column_name="CountyName", sample_fraction=0.50, fuzzy_distance=1, percentage_match=85))
  • feature_set is the feature set that the sample is taken from
  • name is the name of the classifier
  • column_name is the name of the text column on which you create the classifier
  • sample_fraction specifies the fraction of the column's values to use as the sample, rather than the whole column. For example, the value above (0.50) means that 50% of the values in the column are sampled.
  • fuzzy_distance is the number of characters that may differ while a value still counts as a match. For example, with fuzzy_distance=1, if the sample contains AZ for Arizona, a value of TZ is treated as AZ because only one character differs.
  • percentage_match is the minimum percentage of data that must match the sample. In the example above, at least 85% must match.
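The interplay of the three parameters can be sketched in plain Python. This sketch does not use the Feature Store client, and several details are assumptions: the distance is computed as Levenshtein edit distance (the Feature Store may use a different measure), the sample is drawn from distinct values with a fixed seed, and the helper names `edit_distance`, `take_sample`, and `fuzzy_match_percentage` are hypothetical.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance: minimum number of insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def take_sample(values, sample_fraction, seed=0):
    """Draw a reproducible sample of the distinct values (assumed sampling strategy)."""
    distinct = sorted(set(values))
    if not distinct:
        return []
    size = max(1, round(len(distinct) * sample_fraction))
    return random.Random(seed).sample(distinct, size)


def fuzzy_match_percentage(new_values, sample, fuzzy_distance):
    """Percentage of new values within fuzzy_distance of at least one sampled value."""
    if not new_values:
        return 0.0
    matched = sum(
        1 for value in new_values
        if any(edit_distance(value, ref) <= fuzzy_distance for ref in sample)
    )
    return 100.0 * matched / len(new_values)


reference = ["AZ", "NY"]                 # stands in for a sample taken earlier
new_values = ["AZ", "TZ", "NY", "XX"]
percentage = fuzzy_match_percentage(new_values, reference, fuzzy_distance=1)
print(percentage)        # 75.0: AZ, TZ and NY match, XX does not
print(percentage >= 85)  # False, so the 85% threshold is not reached
```

Here `TZ` counts as a match for `AZ` because only one character differs, while `XX` is two changes away from every sampled value.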

Creating an empty classifier

An empty classifier has no matching rule of its own. It acts as a label that you assign to a feature manually.

# create an empty classifier
client.classifiers.create("classifierName")

To check the output, run the following code, which will list all the classifiers you have created.

client.classifiers.list() 

Output:

[
EmptyClassifier(name=classifierName)
]

Changing a classifier manually

You can also annotate a feature with a specific classifier directly. Classifiers are normally assigned automatically during ingestion, but users can assign them manually as well.

fs = project.feature_sets.get("name")
feature = fs.features["feature"]
client.classifiers.list()  # lists all available classifiers
# Assign the classifier by name (this example assumes a classifier named "ssn" exists)
feature.classifiers = {"ssn"}

Updating an existing classifier

An administrator of the Feature Store can update the classifiers:

from featurestore import RegexClassifier

# Create an empty classifier
client.classifiers.create("classifierName")

# Update the empty classifier to a regex classifier, which is applied if at least 10% of the data matches the "test\d+" regex
client.classifiers.update(RegexClassifier("classifierName", r"test\d+", 10))

Note: Updating a classifier does not modify existing features. Automatically applied classifiers remain unchanged until the next ingestion.

Deleting an existing classifier

An administrator of the Feature Store can delete the classifiers:

# Create an empty classifier
client.classifiers.create("classifierName")

# Delete the classifier
client.classifiers.delete("classifierName")

Note: Deleting a classifier does not remove it from features that already carry it. To remove a classifier from a feature, you need to do so manually.

