Concepts
This page explains the main concepts of Feature Store.
Projects
A project is a repository that contains feature sets, which in turn consist of features. A project is the first thing you create in Feature Store. Projects can be used to separate work by department (e.g., engineering and accounting).
Projects can be made secret and can be locked.
Features
Features are columns of highly curated data. Because features are measurable data, they are used to improve the performance of ML models. You can see the features when you call the schema; each entry is printed in the order <column title> <feature>. For example:
category STRING, jobtitle STRING
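As an illustration only, a printed schema like the one above can be split into pairs with plain Python. The helper name `parse_schema` is hypothetical and is not part of the Feature Store client. The sketch assumes a flat, comma-separated schema and would not handle nested types that contain commas.

```python
def parse_schema(schema):
    """Split a printed schema such as 'category STRING, jobtitle STRING'
    into (column title, following token) pairs.

    Hypothetical helper for illustration; not a Feature Store API.
    """
    pairs = []
    for entry in schema.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # The first token is the column title; the remainder follows it.
        name, _, rest = entry.partition(" ")
        pairs.append((name, rest.strip()))
    return pairs


print(parse_schema("category STRING, jobtitle STRING"))
# [('category', 'STRING'), ('jobtitle', 'STRING')]
```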
Feature sets
A feature set is a collection of features. Feature sets are created by registration from a feature set schema; registering a feature set simply means creating a new feature set. The schema itself is extracted from a raw data source that you ingested into Feature Store.
The data sources available for ingestion are listed on the Supported data sources page.
Feature sets can be made secret.
Derived feature sets
Feature Store can create derived feature sets. A derived feature set is created by applying transformations to a parent feature set. When data is ingested into the parent feature set, or an ingestion is reverted, the same ingest or revert is automatically triggered for its derived feature sets.
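The parent-to-derived propagation can be pictured with a small in-memory model. This is a toy sketch, not the Feature Store API: the class `FeatureSet`, its methods, and the per-row transformation function are all assumptions made for illustration.

```python
class FeatureSet:
    """Toy in-memory stand-in for a feature set (not the Feature Store API)."""

    def __init__(self, name, transform=None):
        self.name = name
        self.transform = transform   # set only on derived feature sets
        self.versions = [[]]         # history of data snapshots
        self.derived = []            # derived feature sets of this parent

    @property
    def rows(self):
        return self.versions[-1]

    def derive(self, name, transform):
        """Create a derived feature set that applies `transform` to each row."""
        child = FeatureSet(name, transform)
        self.derived.append(child)
        return child

    def ingest(self, new_rows):
        new_rows = list(new_rows)
        self.versions.append(self.rows + new_rows)
        # Ingesting into the parent triggers ingestion into each derived set.
        for child in self.derived:
            child.ingest([child.transform(row) for row in new_rows])

    def revert(self):
        if len(self.versions) > 1:
            self.versions.pop()
        # Reverting the parent triggers a revert of each derived set.
        for child in self.derived:
            child.revert()


parent = FeatureSet("jobs")
upper = parent.derive(
    "jobs_upper", lambda row: {**row, "jobtitle": row["jobtitle"].upper()}
)

parent.ingest([{"category": "eng", "jobtitle": "analyst"}])
print(upper.rows)  # [{'category': 'eng', 'jobtitle': 'ANALYST'}]

parent.revert()
print(upper.rows)  # []
```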
The supported ways of transformation are:
Feature views
A feature view allows you to retrieve features from different feature sets within a project. You select the relevant features by joining two or more feature sets and applying filters. This creates an ML dataset (also called a training dataset).
By creating the ML dataset, you materialize the feature view into your storage with a given start and end time.
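Conceptually, a feature view behaves like a join across feature sets followed by a filter and a time window. The following is a minimal sketch in plain Python, not the Feature Store client: the function `build_ml_dataset`, the `timestamp` column, and the half-open interval [start, end) are assumptions chosen for illustration.

```python
from datetime import datetime


def build_ml_dataset(left, right, key, start, end, predicate=lambda row: True):
    """Inner-join two lists of rows on `key`, then keep the joined rows whose
    'timestamp' lies in [start, end) and that satisfy `predicate`."""
    right_by_key = {row[key]: row for row in right}
    dataset = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # inner join: drop rows without a partner
        joined = {**match, **row}
        if start <= joined["timestamp"] < end and predicate(joined):
            dataset.append(joined)
    return dataset


employees = [
    {"id": 1, "jobtitle": "analyst", "timestamp": datetime(2024, 1, 5)},
    {"id": 2, "jobtitle": "engineer", "timestamp": datetime(2024, 3, 1)},
]
departments = [
    {"id": 1, "category": "accounting"},
    {"id": 2, "category": "engineering"},
]

dataset = build_ml_dataset(
    employees, departments, key="id",
    start=datetime(2024, 1, 1), end=datetime(2024, 2, 1),
)
print(dataset)  # only the row with id 1 falls inside the time window
```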
Keys
A feature in a feature set can be marked as a primary key. The primary key can be used to search for a specific item in your data. Primary keys must have unique values (e.g., a social security number). When you create data from multiple feature sets, these keys are used to join them.
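Because primary key values must be unique, it can be useful to check a candidate key column before marking it as a key. The helper below is a hypothetical, standalone sketch and does not use the Feature Store client.

```python
from collections import Counter


def duplicate_keys(rows, key):
    """Return the sorted values of `key` that occur more than once in `rows`."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


rows = [
    {"id": 1, "jobtitle": "analyst"},
    {"id": 2, "jobtitle": "engineer"},
    {"id": 2, "jobtitle": "manager"},
]
print(duplicate_keys(rows, "id"))  # [2]
```

An empty result means the column is unique in the sample and could serve as a primary key.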
Tags
Tags can be attached to feature sets for filtering purposes.
Secret and locked
Aspects of the Feature Store can be hidden from view or restricted to certain users:
- Secret: Projects and feature sets can be made secret. This means that secret projects can only be seen by the project owner and secret feature sets can only be seen by the feature set owner.
- Locked: Only projects can be locked. This means that only users with consumer or sensitive consumer permissions can get and list feature sets from within the project.
- Permission: Owners can grant any permission. For more information on permissions, refer to the Permissions page.
Types of Feature Store users
There are four types of users for Feature Store:
- Owner: This user created the project or feature set.
- Editor: This user has been given permission by the owner to view and update a project and/or a feature set.
- Consumer: This user has view-only retrieval rights.
- Sensitive consumer: This user can retrieve feature sets with sensitive features.
| | Owner | Editor | Consumer | Sensitive consumer |
| --- | --- | --- | --- | --- |
| Secret=True | Project owner can see the secret project. Feature set owner can see the secret feature set. | Cannot see secret project or secret feature set without owner permission. | Cannot see secret project or secret feature set without owner permission. | Cannot see secret project or secret feature set without owner permission. |
| Locked=True | Can get and list feature sets from a locked project. | Can get and list feature sets from a locked project. | Can get and list feature sets from a locked project. | Can get and list feature sets from a locked project. |
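The visibility rules described in this section can be summarized as two small predicates. This is a sketch of the rules as stated on this page, not Feature Store code: the `Role` enum and both function names are hypothetical, and the sketch ignores any extra permissions an owner may grant.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    OWNER = "owner"
    EDITOR = "editor"
    CONSUMER = "consumer"
    SENSITIVE_CONSUMER = "sensitive consumer"


def can_see_secret(role: Optional[Role]) -> bool:
    """A secret project or feature set is visible to its owner only."""
    return role is Role.OWNER


def can_list_in_locked_project(role: Optional[Role]) -> bool:
    """Any of the four roles can get and list feature sets in a locked
    project; a user with no role on the project (None) cannot."""
    return isinstance(role, Role)


for role in Role:
    print(role.value, can_see_secret(role), can_list_in_locked_project(role))
```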
Storage
Feature Store uploads output data to a data store. You can download the data using a pre-signed URL.
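A pre-signed URL can be fetched with any HTTP client. The following sketch uses only the Python standard library. The function name `download` is hypothetical, and how you obtain the URL from Feature Store is outside the scope of this example.

```python
import shutil
import urllib.request


def download(presigned_url, destination):
    """Stream the object behind a pre-signed URL into a local file."""
    with urllib.request.urlopen(presigned_url) as response:
        with open(destination, "wb") as out:
            # Copy in chunks so large files are not held in memory.
            shutil.copyfileobj(response, out)


# Example call (placeholder URL, shown for illustration only):
# download("https://<storage-host>/<object>?<signature>", "features.parquet")
```

Pre-signed URLs typically expire, so download the data soon after the URL is issued.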
Storage backend
Multiple storage backends are supported:
- Any system exposing the S3 API (AWS, Google Cloud, MinIO)
- Azure Data Lake Gen 2
Storage file format
Files are written in Delta format.
Output data
Output data results from the materialization of the features. The data can then be used inside any ML platform.
Incremental ingest
Incremental ingestion is ingestion that takes place regularly over time. Instead of ingesting all the data at once, it ingests new data at intervals (e.g., every five hours or every day). This can be done through scheduled ingestion.
Feature Store maintains one entry in storage for each major version of a feature set. New data are appended to storage during each new data ingest. Only unique values are appended.
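The append-only-unique behavior can be illustrated with a short sketch. It assumes that "unique" means whole-row equality and that all values are hashable; how Feature Store actually determines uniqueness is not specified here. The function name `append_unique` is hypothetical.

```python
def append_unique(stored, new_rows):
    """Append to `stored` only those rows of `new_rows` not already present.

    Returns the number of rows appended. Rows are dicts with hashable values.
    """
    seen = {tuple(sorted(row.items())) for row in stored}
    appended = 0
    for row in new_rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            stored.append(row)
            appended += 1
    return appended


storage = []  # stands in for the storage entry of one major version
first = append_unique(storage, [{"id": 1}, {"id": 2}])
second = append_unique(storage, [{"id": 2}, {"id": 3}])
print(first, second, len(storage))  # 2 1 3
```

The second ingest adds only the row with id 3, because the row with id 2 is already stored.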