Version: 0.19.3

Concepts

This page explains the main concepts of Feature Store.

Projects

A project is a repository that contains feature sets, which in turn are made up of features. A project is the first thing you create in Feature Store. Projects can be used to separate work by department (e.g., engineering and accounting).

Projects can be made secret and can be locked.

Features

Features are columns of highly curated data. Because features are measurable properties of the data, they are used as model inputs to improve the performance of ML models. Features are listed when you call the schema, and each entry is printed in the order <column title> <data type>. For example:

category STRING, jobtitle STRING
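As a minimal sketch, a schema printout like the one above can be split into name and type pairs with plain Python. The helper name parse_schema is hypothetical and is not part of the Feature Store client. The sketch assumes that entries are separated by commas and that the last word of each entry is the type.

```python
def parse_schema(schema):
    """Split a 'name TYPE, name TYPE' string into (name, type) pairs.

    Hypothetical helper: assumes comma-separated entries whose last
    whitespace-separated word is the type.
    """
    pairs = []
    for entry in schema.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        name, data_type = entry.rsplit(None, 1)
        pairs.append((name, data_type))
    return pairs


columns = parse_schema("category STRING, jobtitle STRING")
for name, data_type in columns:
    print(f"{name}: {data_type}")
```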

Feature sets

A feature set is a collection of features. You create a new feature set by registering it from a feature set schema. The schema is extracted from a raw data source that you ingest into Feature Store.

The data sources supported for ingestion are listed on the Supported data sources page.

Feature sets can be made secret.

Derived feature sets

Feature Store can create derived feature sets. A derived feature set is created by applying transformations to a parent feature set. When data is ingested into the parent feature set, or an ingest is reverted from it, the same ingest or revert is automatically triggered for its derived feature sets.
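The propagation from a parent to its derived feature sets can be pictured with a small in-memory model. This is only a sketch under stated assumptions: the FeatureSet class, its methods, and the row-by-row transformation are hypothetical and do not mirror the Feature Store client, and "revert" is assumed here to mean undoing the most recent ingest.

```python
class FeatureSet:
    """Hypothetical in-memory stand-in for a feature set."""

    def __init__(self, name, transform=None):
        self.name = name
        self.transform = transform  # per-row function, set on derived sets only
        self.batches = []           # one list of rows per ingest
        self.children = []          # derived feature sets

    @property
    def rows(self):
        return [row for batch in self.batches for row in batch]

    def derive(self, name, transform):
        """Create a derived feature set from the data ingested so far."""
        child = FeatureSet(name, transform)
        child.batches = [[transform(r) for r in batch] for batch in self.batches]
        self.children.append(child)
        return child

    def ingest(self, batch):
        """Store a batch, then trigger the same ingest on derived sets."""
        batch = list(batch)
        self.batches.append(batch)
        for child in self.children:
            child.ingest([child.transform(r) for r in batch])

    def revert(self):
        """Undo the latest ingest here and on every derived set (assumption)."""
        self.batches.pop()
        for child in self.children:
            child.revert()


parent = FeatureSet("jobs")
derived = parent.derive(
    "jobs_upper", lambda r: {**r, "jobtitle": r["jobtitle"].upper()}
)
parent.ingest([{"category": "it", "jobtitle": "engineer"}])
print(derived.rows)   # the derived set received the transformed row
parent.revert()
print(derived.rows)   # the derived set was reverted together with the parent
```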

The supported ways of transformation are:

Feature views

A feature view allows you to retrieve features from different feature sets within a project. You select the relevant features by joining two or more feature sets and applying filters. This creates an ML dataset (also called a training dataset).

When you create the ML dataset, you materialize the feature view into your storage for a given start and end time.
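The idea of joining feature sets, filtering, and materializing a time window can be sketched with plain Python lists. Everything in this example is illustrative: the feature set contents, the build_dataset helper, and the choice of a half-open time window (start included, end excluded) are assumptions, not Feature Store behavior.

```python
from datetime import datetime

# Hypothetical feature sets; each row is a dict keyed by feature name.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "business"},
]
transactions = [
    {"customer_id": 1, "amount": 40.0, "ts": datetime(2023, 1, 5)},
    {"customer_id": 2, "amount": 900.0, "ts": datetime(2023, 1, 20)},
    {"customer_id": 1, "amount": 15.0, "ts": datetime(2023, 3, 1)},
]


def build_dataset(left, right, key, start, end, row_filter):
    """Join `right` rows to `left` rows on `key`, keeping rows whose
    timestamp falls in [start, end) and that pass `row_filter`."""
    index = {row[key]: row for row in left}
    dataset = []
    for row in right:
        if not (start <= row["ts"] < end):
            continue  # outside the requested time window
        match = index.get(row[key])
        if match is None:
            continue  # no matching key in the left feature set
        joined = {**match, **row}
        if row_filter(joined):
            dataset.append(joined)
    return dataset


training_rows = build_dataset(
    customers,
    transactions,
    key="customer_id",
    start=datetime(2023, 1, 1),
    end=datetime(2023, 2, 1),
    row_filter=lambda r: r["amount"] > 20.0,
)
for row in training_rows:
    print(row["customer_id"], row["segment"], row["amount"])
```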

Keys

A feature in a feature set can be marked as a primary key. The primary key can be used to look up a specific item in your data. Primary key values must be unique (e.g., a social security number). When you create data from multiple feature sets, these keys are used to join them.
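The uniqueness requirement and the lookup by key can be illustrated with a short sketch. The helper index_by_primary_key and the employee_id feature are hypothetical names used only for this example.

```python
def index_by_primary_key(rows, key):
    """Build a lookup table keyed by the primary key.

    Raises ValueError when two rows share the same key value, since a
    primary key must identify exactly one row.
    """
    index = {}
    for row in rows:
        value = row[key]
        if value in index:
            raise ValueError(f"duplicate primary key value: {value!r}")
        index[value] = row
    return index


employees = [
    {"employee_id": "E-001", "jobtitle": "engineer"},
    {"employee_id": "E-002", "jobtitle": "accountant"},
]
by_id = index_by_primary_key(employees, "employee_id")
print(by_id["E-002"]["jobtitle"])
```

The same index is what a join relies on: each row of a second feature set is matched to at most one row here through the shared key value.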

Secret and locked

Aspects of the Feature Store can be hidden from view or restricted to certain users:

Secret: Projects and feature sets can be made secret. A secret project can be seen only by the project owner, and a secret feature set can be seen only by the feature set owner.
Locked: Only projects can be locked. This means that only users with consumer or sensitive consumer permissions can get and list feature sets from within the project.
Permission: Owners can grant any permission. For more information on permissions, refer to the Permissions page.

Types of Feature Store users

There are four types of users for Feature Store:

Owner: This user created the project or feature set.
Editor: This user has been granted permission by the owner to view and update a project and/or a feature set.
Consumer: This user has view-only retrieval rights.
Sensitive consumer: This user can retrieve feature sets with sensitive features.

	Owner	Editor	Consumer	Sensitive consumer
`Secret=True`	Project owner can see the secret project. Feature set owner can see the secret feature set.	Cannot see secret project or secret feature set without owner permission.	Cannot see secret project or secret feature set without owner permission.	Cannot see secret project or secret feature set without owner permission.
`Locked=True`	Can get and list feature sets from a locked project.	Can get and list feature sets from a locked project.	Can get and list feature sets from a locked project.	Can get and list feature sets from a locked project.

Storage

Feature Store uploads output data to a data store. You can download the data using a pre-signed URL.
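A pre-signed URL can be fetched with any HTTP client, because the credentials are embedded in the URL itself. The following is a minimal sketch that uses only the Python standard library. The download helper is a hypothetical name, and how you obtain the URL from Feature Store is not shown here.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, destination):
    """Stream the content behind `url` into the file `destination`.

    Hypothetical helper: `url` is assumed to be a pre-signed URL that
    needs no additional authentication headers.
    """
    destination = Path(destination)
    with urllib.request.urlopen(url) as response, destination.open("wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all in memory
    return destination


# Usage, assuming `presigned_url` was returned by Feature Store:
# download(presigned_url, "output_data.bin")
```

Pre-signed URLs typically expire, so the download should start soon after the URL is issued.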

Storage backend

Multiple storage backends are supported:

Any system exposing the S3 API (AWS, Google Cloud, MinIO)
Azure Data Lake Gen 2

Storage file format

Files are written in Delta format.

Output data

Output data results from the materialization of the features. The data can then be used inside any ML platform.

Incremental ingest

Incremental ingestion is ingestion that takes place regularly over time. Instead of ingesting all the data at once, Feature Store ingests new data at intervals (e.g., every five hours or every day). This can be done through scheduled ingestion.

Feature Store maintains one entry in storage for each major version of a feature set. New data is appended to storage during each ingest. Only unique values are appended.
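The append-only-unique behavior can be sketched as follows. This is an illustration only: the append_unique helper is hypothetical, and it assumes that "unique" means the whole row has not been stored before, which may differ from how Feature Store decides uniqueness.

```python
def append_unique(stored, batch):
    """Append rows from `batch` that are not already in `stored`.

    Rows are dicts with hashable values. A row counts as a duplicate
    only if every feature value matches (assumption). Returns the
    number of rows that were appended.
    """
    seen = {tuple(sorted(row.items())) for row in stored}
    added = 0
    for row in batch:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            stored.append(row)
            added += 1
    return added


storage = []
first = append_unique(storage, [
    {"category": "it", "jobtitle": "engineer"},
    {"category": "finance", "jobtitle": "accountant"},
])
# A later scheduled ingest repeats one row and brings one new row.
second = append_unique(storage, [
    {"category": "it", "jobtitle": "engineer"},
    {"category": "it", "jobtitle": "analyst"},
])
print(first, second, len(storage))
```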

