Version: 1.2.0

Ingest API

Feature Store ensures that the data for each feature set contains no duplicates: only rows that are not already present in the feature set cache are ingested as part of the ingest operation. Rows that would lead to duplicates are skipped.

Ingest can be run on an instance of a feature set representing any minor version. The data are always ingested on top of the latest storage stage.
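
The deduplication behavior described above can be illustrated with a small standalone sketch. This is plain Python, not the Feature Store API; the cache is modeled as a set of already-ingested rows:

```python
# Minimal sketch of deduplicating ingest: rows already present in the
# feature set cache are skipped; only new rows are ingested.
def deduplicated_ingest(cache, rows):
    """Return the rows that would actually be ingested."""
    ingested = []
    for row in rows:
        if row not in cache:
            cache.add(row)
            ingested.append(row)
    return ingested

cache = {("CZ", 2021), ("DE", 2021)}
new_rows = [("CZ", 2021), ("CZ", 2022), ("DE", 2022)]
print(deduplicated_ingest(cache, new_rows))  # only the two new rows survive
```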

Offline ingestion

To ingest data into the Feature Store, run:

Blocking approach:

fs = project.feature_sets.get("gdp")
fs.ingest(source)
# or, with explicit credentials:
fs.ingest(source, credentials=credentials)
note

This method is not allowed for derived feature sets.

Non-Blocking approach:

fs = project.feature_sets.get("gdp")
future = fs.ingest_async(source)
# or, with explicit credentials:
future = fs.ingest_async(source, credentials=credentials)
note

This method is not allowed for derived feature sets.

More information about asynchronous methods is available at Asynchronous methods.
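
This page does not specify the exact type of the future returned by ingest_async. Assuming it follows standard concurrent.futures semantics (an assumption, not confirmed here), waiting for completion might look like the following standalone sketch, with the ingest work simulated by a plain function:

```python
from concurrent.futures import ThreadPoolExecutor

def simulated_ingest(source):
    # Stand-in for the actual ingest work done by the Feature Store.
    return f"ingested {source}"

with ThreadPoolExecutor() as executor:
    future = executor.submit(simulated_ingest, "gdp.csv")  # returns immediately
    # ... do other work while the ingest runs ...
    result = future.result()  # block until the ingest finishes
print(result)  # → ingested gdp.csv
```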

Parameters explanation:

  • source is the data source from which the Feature Store ingests.
  • credentials are the credentials for the data source. If not provided, the client tries to read them from environment variables. For more information about passing credentials as a parameter or via environment variables, see Credentials configuration.

To ingest data into the Feature Store from sources that change periodically, run:

fs = project.feature_sets.get("gdp")
new_schema = client.extract_from_source(ingest_source)
if not fs.schema.is_compatible_with(new_schema, compare_data_types=True):
    patched_schema = fs.schema.patch_from(new_schema, compare_data_types=True)
    new_feature_set = fs.create_new_version(schema=patched_schema, reason="schema changed before ingest")
    new_feature_set.ingest(ingest_source)
else:
    fs.ingest(ingest_source)

This call materializes the data and stores it in the Feature Store storage.

note

Feature Store does not allow two features whose names differ only in case. During ingestion, however, feature names are matched case-insensitively. For example, when ingesting into a feature set with a single feature named city, the data are ingested correctly regardless of the case of the column name in the provided data source: columns named, for example, CITY, CiTy, or city are all matched and ingested into the city feature.
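
The case-insensitive matching described in the note can be sketched in plain Python (illustrative only; the feature and column names are hypothetical):

```python
def match_columns(feature_names, source_columns):
    """Map each source column to its feature, ignoring case."""
    by_lower = {name.lower(): name for name in feature_names}
    return {col: by_lower[col.lower()]
            for col in source_columns
            if col.lower() in by_lower}

print(match_columns(["city"], ["CITY"]))  # {'CITY': 'city'}
print(match_columns(["city"], ["CiTy"]))  # {'CiTy': 'city'}
```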

Online ingestion

To ingest data into the online Feature Store, run:

feature_set.ingest_online(rows)

The rows argument is either a single JSON row or an array of JSON strings to ingest into the online Feature Store.
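
For example, rows could be prepared as JSON strings with Python's json module (the column names here are hypothetical; note the primary-key requirement below):

```python
import json

# A single JSON row:
row = json.dumps({"id": 1, "city": "Prague"})

# Or an array of JSON strings:
rows = [json.dumps({"id": 1, "city": "Prague"}),
        json.dumps({"id": 2, "city": "Brno"})]

# feature_set.ingest_online(row)   # single row
# feature_set.ingest_online(rows)  # multiple rows
```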

note

Feature set must have a primary key defined in order to ingest and retrieve from the online Feature Store.

This method is not allowed for derived feature sets.

Lazy ingestion

Lazy ingestion lets you migrate feature sets from other systems without ingesting their data immediately. With lazy ingestion, the ingestion process starts the first time data from the feature set are retrieved.

A corresponding scheduled task is created when lazy ingest is defined. For more information, see Obtaining a lazy ingest task. Only one lazy ingest task can be defined per feature set major version.

When you ingest a feature set directly using feature_set.ingest(source), the lazy ingest task associated with the feature set is deleted from the Feature Store.

To ingest data lazily, run:

fs.ingest_lazy(source)
# or, with explicit credentials:
fs.ingest_lazy(source, credentials=credentials)
note

This method runs the ingest when the feature set is retrieved. See Feature set schedule API for information on how to delete or edit the scheduled task.
