Version: 0.19.3

Key terms

This page defines the key terms used throughout this documentation.

Classifier

Classifiers recommend features by matching patterns across feature sets. For example, if you tell Feature Store on feature set A that a column of 5-digit values is a zip code, Feature Store can then identify a column of 5-digit values in feature set B as a zip code, provided that feature set B has only one such column.
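The zip code example can be sketched in a few lines of plain Python. This is only an illustration of the matching idea, not the Feature Store API: the names ZIP_PATTERN, columns_matching, and recommend_zip_column are hypothetical, and the rule "every value in the column fully matches the pattern" is an assumption.

```python
import re

# Hypothetical pattern: a value made of exactly five digits is treated as a zip code.
ZIP_PATTERN = re.compile(r"\d{5}")


def columns_matching(rows, pattern):
    """Return the column names whose every value fully matches the pattern."""
    if not rows:
        return []
    return [
        name
        for name in rows[0]
        if all(pattern.fullmatch(str(row[name])) for row in rows)
    ]


def recommend_zip_column(rows):
    """Recommend a column only when exactly one column matches the pattern."""
    matches = columns_matching(rows, ZIP_PATTERN)
    return matches[0] if len(matches) == 1 else None


# Only "postal" holds 5-digit values, so it is recommended.
feature_set_b = [
    {"customer": "alice", "postal": "94105", "age": "34"},
    {"customer": "bob", "postal": "10001", "age": "51"},
]

# Two columns hold 5-digit values, so no recommendation is made.
ambiguous = [
    {"postal": "94105", "order_id": "12345"},
    {"postal": "10001", "order_id": "67890"},
]

print(recommend_zip_column(feature_set_b))
print(recommend_zip_column(ambiguous))
```

The second call returns None because two columns match, which mirrors the caveat about multiple 5-digit columns.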

Consumer

This is a user with view-only rights.

Core

The Feature Store Core is an application within Feature Store that has several duties. It creates the features for the database, triggers the start of data manipulation tasks on the Spark cluster, performs authentication, and queries for authorization permissions.

Data source

A data source is the file you ingest into Feature Store.

Derived feature set

When you apply transformations to a feature set, it will create a derived (new) feature set.

Editor

This is a user whom the owner has given permission to view and update a project and its contents.

Extraction

Extraction is the act of retrieving the schema from a data source.

Feature

Features are highly curated data. They are used to improve the performance of ML models during both training and prediction.

Feature set

A feature set is a collection of features.

Feature view

A feature view allows you to retrieve features from different feature sets within a project. You can select relevant features by joining two or more feature sets with applied filters. This creates an ML dataset (also called a training dataset).
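As a rough sketch of what a feature view does conceptually (join, filter, select), the following plain Python builds a small training dataset from two in-memory feature sets. It does not use the Feature Store client; the function build_training_rows, the column names, and the inner-join behavior are all assumptions made for illustration.

```python
def build_training_rows(left, right, key, selected, keep):
    """Inner-join two feature sets on `key`, filter rows, and select features."""
    right_by_key = {row[key]: row for row in right}
    result = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # no matching key in the other feature set
        joined = {**row, **match}
        if keep(joined):
            result.append({name: joined[name] for name in selected})
    return result


# Two hypothetical feature sets that share the key "customer_id".
customers = [
    {"customer_id": 1, "age": 34},
    {"customer_id": 2, "age": 17},
    {"customer_id": 3, "age": 45},
]
orders = [
    {"customer_id": 1, "total_spend": 120.0},
    {"customer_id": 2, "total_spend": 15.5},
]

training = build_training_rows(
    customers,
    orders,
    key="customer_id",
    selected=["customer_id", "age", "total_spend"],
    keep=lambda row: row["age"] >= 18,  # the applied filter
)
print(training)
```

Customer 3 is dropped because it has no match in the second feature set, and customer 2 is dropped by the filter, leaving a single training row.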

Ingesting

Ingesting is the term used to describe the act of loading a data source into Feature Store.

Joining

This is the act of combining two different feature sets.

Keys

Keys are used to search for a specific item in your data. Primary keys used in Feature Store must be unique values (e.g., a social security number).

Locked

Only projects can be locked. This means the project is visible, but only the owner and editors with permission can edit it.

Offline Feature Store

Offline Feature Store is responsible for storing features based on big data. It stores all the metadata about feature set schema, features, etc.

Online Feature Store

Online Feature Store is responsible for working with feature sets that need to be stored and retrieved very quickly.

Owner

This is the person who created the project. They can view, edit, and update a project and its contents without any extra permissions. Owners can grant permission to editors.

Permission

Permission dictates what you can interact with and to what degree you can interact with it. Permission is granted by owners to editors. It allows editors to view secret projects and feature sets and to edit locked projects.

Project

A project is used to store feature sets. It is the highest level of the organizational hierarchy. Projects are the first thing that must be created when using Feature Store because they house all of the information: the data sources, schemas, feature sets, etc.

Query

Queries are needed for creating feature views. A query can be built in several ways: by selecting features from feature sets, joining feature sets together, and applying filters. The query for a feature view cannot be updated.

Registration

Registration is the act of registering feature sets into Feature Store. It is the command that creates a new feature set.

Retrieving

Retrieving is the action of re-acquiring your ingested data. You can filter data by start_date_time and end_date_time.
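The time-window filter can be pictured with a short plain-Python sketch. Only the parameter names start_date_time and end_date_time come from this page; the retrieve function, the ingested_at field, and the half-open interval (start included, end excluded) are assumptions, so check the client documentation for the actual behavior.

```python
from datetime import datetime


def retrieve(rows, start_date_time=None, end_date_time=None):
    """Keep rows with start_date_time <= ingested_at < end_date_time.

    A bound left as None is not applied.
    """
    selected = []
    for row in rows:
        timestamp = row["ingested_at"]
        if start_date_time is not None and timestamp < start_date_time:
            continue
        if end_date_time is not None and timestamp >= end_date_time:
            continue
        selected.append(row)
    return selected


# Hypothetical ingested rows, each tagged with the time it was ingested.
rows = [
    {"id": 1, "ingested_at": datetime(2023, 1, 1)},
    {"id": 2, "ingested_at": datetime(2023, 1, 15)},
    {"id": 3, "ingested_at": datetime(2023, 2, 1)},
]

window = retrieve(
    rows,
    start_date_time=datetime(2023, 1, 10),
    end_date_time=datetime(2023, 2, 1),
)
print([row["id"] for row in window])
```

With these bounds only the row ingested on January 15 falls inside the window.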

Reverting

Reverting is the removal of ingested data. The act of reversion creates a new version of the feature set with that data removed.
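One way to picture reverting is as an append-only version history, where removing an ingest adds a new version instead of rewriting an old one. The class below is a hypothetical in-memory model, not how Feature Store is implemented; in particular, the idea that each ingest also produces a new version is an assumption.

```python
class VersionedFeatureSet:
    """Toy model: every change appends a new version; old versions stay intact."""

    def __init__(self):
        self.versions = [[]]  # version 0 is empty

    @property
    def current(self):
        return self.versions[-1]

    def ingest(self, ingest_id, rows):
        # Tag each row with the ingest it came from so it can be reverted later.
        tagged = [(ingest_id, row) for row in rows]
        self.versions.append(self.current + tagged)

    def revert(self, ingest_id):
        # Create a new version without the rows from the given ingest.
        remaining = [entry for entry in self.current if entry[0] != ingest_id]
        self.versions.append(remaining)


feature_set = VersionedFeatureSet()
feature_set.ingest("ingest-a", [{"id": 1}, {"id": 2}])
feature_set.ingest("ingest-b", [{"id": 3}])
feature_set.revert("ingest-a")

print(len(feature_set.versions))
print(feature_set.current)
```

After the revert, the latest version holds only the row from the second ingest, while the earlier version that contained all three rows is still present in the history.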

Schema

A schema represents the features of the feature set. It is extracted from a data source.

Secret

Projects and feature sets can be made secret. This means that the project or feature set is only visible to the owner and the editors that the owner has given permission to.

Serialization and deserialization

Serialization is the process of converting data into a series of bytes that can be stored and transmitted between objects. Deserialization is the reverse process where you create objects from a sequence of bytes.
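A minimal round trip illustrates both directions. The sketch uses Python's standard json module with UTF-8 encoding purely as an example format; it is an assumption, not a statement about the format Feature Store uses internally, and the record contents are made up.

```python
import json


def serialize(record):
    """Convert a dictionary into a sequence of bytes."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize(payload):
    """Rebuild the dictionary from a sequence of bytes."""
    return json.loads(payload.decode("utf-8"))


record = {"customer_id": 1, "zip_code": "94105"}
payload = serialize(record)      # bytes that can be stored or transmitted
restored = deserialize(payload)  # an equal object rebuilt from the bytes

print(type(payload).__name__)
print(restored == record)
```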

Training data

Training data is also called the machine learning (ML) dataset. It is a new feature set created from a feature view, in which you join two or more feature sets to retrieve specific features.

Transformation

A transformation is a change to the raw data that makes it usable by a model. There are different types of transformations, like changing the data format.

