Key terms
This page defines the key terms used throughout this documentation.
Classifier
Classifiers recommend features by matching patterns across different feature sets. For example, if you tell Feature Store, on feature set A, that a column containing 5-digit values is a zip code, then Feature Store can identify a single column of 5-digit values in feature set B as a zip code (provided that feature set B does not contain multiple columns of 5-digit values).
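The zip code example can be sketched in plain Python. This is only an illustration of the matching rule described above, not the Feature Store implementation; ZIP_PATTERN and recommend_zip_column are hypothetical names.

```python
import re

# Hypothetical pattern learned from feature set A: five digits means "zip code".
ZIP_PATTERN = re.compile(r"\d{5}")

def recommend_zip_column(columns):
    """Return the single column whose values all match the pattern, else None."""
    matches = [
        name
        for name, values in columns.items()
        if values and all(ZIP_PATTERN.fullmatch(str(v)) for v in values)
    ]
    # Mirror the proviso in the definition: an ambiguous match gives no recommendation.
    return matches[0] if len(matches) == 1 else None

feature_set_b = {
    "customer": ["Ann", "Bob"],
    "postal": ["94105", "10001"],
}
ambiguous = {"postal": ["94105"], "badge": ["12345"]}
```

With these inputs, feature_set_b yields a recommendation for its one 5-digit column, while ambiguous yields none because two columns fit the pattern.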
Consumer
This is a user with view-only rights.
Core
The Feature Store Core is an application within Feature Store that has several duties. It creates the features for the database, triggers the start of data manipulation tasks on the Spark cluster, performs authentication, and queries for authorization permissions.
Data source
A data source is the file you ingest into Feature Store.
Derived feature set
Applying transformations to a feature set creates a derived (new) feature set.
Editor
This is a user whom the owner has given permission to view and update a project and its contents.
Extraction
Extraction is the act of retrieving the schema from a data source.
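As a rough illustration of what extracting a schema involves, the sketch below infers column names and types from a small CSV data source. The type names and the functions infer_type and extract_schema are assumptions made for this example; Feature Store's own extraction logic and type system may differ.

```python
import csv
import io

def infer_type(values):
    """Pick the narrowest type name that fits every value in a column."""
    for type_name, cast in (("int", int), ("float", float)):
        try:
            for value in values:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "string"

def extract_schema(csv_text):
    """Map each column name in a CSV data source to an inferred type name."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {name: infer_type([row[name] for row in rows]) for name in rows[0]}

# Illustrative data source with three columns.
data_source = "id,amount,city\n1,9.5,Oslo\n2,12,Lima\n"
schema = extract_schema(data_source)
```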
Feature
Features are highly curated data. They are used to improve the performance of ML models during both training and prediction.
Feature set
A feature set is a collection of features.
Feature view
A feature view allows you to retrieve features from different feature sets within a project. You can select relevant features by joining two or more feature sets with applied filters. This creates an ML dataset (also called a training dataset).
Ingesting
Ingesting is the term used to describe the act of loading a data source into Feature Store.
Joining
This is the act of combining two different feature sets.
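A minimal sketch of the idea, assuming each feature set is a list of rows that share a key column and that the join keeps only rows present in both (an inner join). The function join_feature_sets and the sample column names are hypothetical; the join types Feature Store supports are not specified here.

```python
def join_feature_sets(left, right, key):
    """Inner-join two lists of row dicts on a shared key column."""
    # Assumes the key is unique within the right-hand feature set.
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]

customers = [{"customer_id": 1, "age": 34}, {"customer_id": 2, "age": 51}]
orders = [{"customer_id": 1, "total_spend": 120.0}]
joined = join_feature_sets(customers, orders, "customer_id")
```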
Keys
Keys are used to search for a specific item in your data. Primary keys used in Feature Store must be unique values (e.g., a social security number).
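Before choosing a column as a primary key, it can help to confirm that its values really are unique. The check below is a generic sketch and not part of Feature Store; duplicate_keys and the sample rows are hypothetical.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Return the sorted key values that appear in more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)

rows = [{"id": "a"}, {"id": "b"}, {"id": "a"}]
duplicates = duplicate_keys(rows, "id")
```

An empty result means the column is a candidate primary key for this data.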
Locked
Only projects can be locked. This means the project is visible, but only the owner and editors with permission can edit it.
Offline Feature Store
Offline Feature Store is responsible for storing features based on big data. It stores all the metadata about feature set schema, features, etc.
Online Feature Store
Online Feature Store is responsible for working with feature sets that need to be stored and retrieved very quickly.
Owner
This is the person who created the project. They can view, edit, and update a project and its contents without any extra permissions. Owners can grant permission to editors.
Permission
Permission dictates what you can interact with and to what degree you can interact with it. Permission is granted by owners to editors. It allows editors to view secret projects and feature sets and to edit locked projects.
Project
A project is used to store feature sets. It is the highest level of the organizational hierarchy. Projects are the first thing that must be created when using Feature Store because they house all of the information: the data sources, schemas, feature sets, etc.
Query
Queries are needed for creating feature views. A query can be built in several ways: selecting features from feature sets, joining feature sets together, and applying filters. The query for a feature view cannot be updated.
Registration
Registration is the act of registering feature sets into Feature Store. It is the command that creates a new feature set.
Retrieving
Retrieving is the action of re-acquiring your ingested data. You can filter data by start_date_time and end_date_time.
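The effect of the two filters can be sketched as a time-window selection. Only the names start_date_time and end_date_time come from the text above; the retrieve function, the "ts" column, and the choice of an inclusive start with an exclusive end are assumptions made for this example.

```python
from datetime import datetime

def retrieve(rows, start_date_time=None, end_date_time=None):
    """Return rows whose timestamp falls in [start_date_time, end_date_time)."""
    return [
        row for row in rows
        if (start_date_time is None or row["ts"] >= start_date_time)
        and (end_date_time is None or row["ts"] < end_date_time)
    ]

ingested = [
    {"ts": datetime(2023, 1, 1), "value": 1},
    {"ts": datetime(2023, 2, 1), "value": 2},
    {"ts": datetime(2023, 3, 1), "value": 3},
]
# Keep only rows from mid-January up to, but not including, March 1.
window = retrieve(ingested, datetime(2023, 1, 15), datetime(2023, 3, 1))
```

Omitting both arguments returns all ingested rows.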
Reverting
Reverting is the removal of ingested data. The act of reversion creates a new version of the feature set with that data removed.
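The key point is that a revert does not modify earlier versions; it adds a new one. A toy model of that behavior, assuming each row records which ingest it came from (the ingest_id field and the revert function are hypothetical):

```python
def revert(versions, ingest_id):
    """Append a new version that omits every row from the given ingest."""
    latest = versions[-1]
    versions.append([row for row in latest if row["ingest_id"] != ingest_id])
    return versions

# Version history of one feature set; each version is a list of rows.
versions = [[{"ingest_id": "a", "x": 1}, {"ingest_id": "b", "x": 2}]]
revert(versions, "b")
```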
Schema
A schema represents the features of the feature set. It is extracted from a data source.
Secret
Projects and feature sets can be made secret. This means that the project or feature set is only visible to the owner and the editors that the owner has given permission to.
Serialization and deserialization
Serialization is the process of converting data into a series of bytes that can be stored and transmitted between objects. Deserialization is the reverse process where you create objects from a sequence of bytes.
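A short round trip shows both directions. JSON over UTF-8 is used here purely as a familiar example format; it is an assumption, not a statement about which format Feature Store uses internally.

```python
import json

record = {"customer_id": 42, "zip": "94105", "scores": [0.1, 0.9]}

# Serialization: turn the in-memory object into a sequence of bytes.
payload = json.dumps(record).encode("utf-8")

# Deserialization: rebuild an equivalent object from those bytes.
restored = json.loads(payload.decode("utf-8"))
```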
Training data
Training data is also called the machine learning (ML) dataset. It is a new feature set created from a feature view, in which two or more feature sets are joined to retrieve specific features.
Transformation
A transformation is a change to the raw data that makes it usable by a model. There are different types of transformations, like changing the data format.
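As one example of changing the data format, the sketch below rewrites a date column from MM/DD/YYYY strings to ISO 8601 and returns new rows, leaving the input untouched. The function to_iso_date and the column names are hypothetical, and this is not a Feature Store transformation API.

```python
from datetime import datetime

def to_iso_date(rows, column):
    """Return new rows with `column` converted from MM/DD/YYYY to ISO 8601."""
    return [
        {**row, column: datetime.strptime(row[column], "%m/%d/%Y").date().isoformat()}
        for row in rows
    ]

raw = [{"id": 1, "signup": "03/15/2021"}]
derived = to_iso_date(raw, "signup")
```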