Feature view API
Creating a feature view
To create a feature view, you first need to build a query. You build a query by selecting features from feature sets, joining feature sets together, and applying filters. You can also apply transformations to features through a feature view query. The following transformations are supported:
- min_max_scaler
- standard_scaler
- robust_scaler
- string_indexer
During joins, Feature Store performs point-in-time inner or left joins.
To create a query with an inner join, execute:
- Python
- Scala
from featurestore.core.entities.query import Query
min_max = client.transformation_functions.get("min_max_scaler")
query = Query.select([feature_set1.features["UserId"], feature_set1.features["Label"], min_max.apply(feature_set2.features["X"])]) \
.from_feature_set(feature_set1, "alias1") \
.join(feature_set2, "alias2").on(feature_set1.features["UserId"], feature_set2.features["UserId"]) \
.end()
import ai.h2o.featurestore.core.entities.Query
val minMax = client.transformationFunctions.get("min_max_scaler")
val query = Query.select(Seq(featureSet1.features("UserId"), featureSet1.features("Label"), minMax(featureSet2.features("X"))))
  .from(featureSet1, "alias1")
  .join(featureSet2, "alias2").on(featureSet1.features("UserId"), featureSet2.features("UserId"))
  .end()
To create a query with a left join, execute:
- Python
- Scala
from featurestore.core.entities.query import Query
min_max = client.transformation_functions.get("min_max_scaler")
query = Query.select([feature_set1.features["UserId"], feature_set1.features["Label"], min_max.apply(feature_set2.features["X"])]) \
.from_feature_set(feature_set1, "alias1") \
.left_join(feature_set2, "alias2").on(feature_set1.features["UserId"], feature_set2.features["UserId"]) \
.end()
import ai.h2o.featurestore.core.entities.Query
val minMax = client.transformationFunctions.get("min_max_scaler")
val query = Query.select(Seq(featureSet1.features("UserId"), featureSet1.features("Label"), minMax(featureSet2.features("X"))))
  .from(featureSet1, "alias1")
  .leftJoin(featureSet2, "alias2").on(featureSet1.features("UserId"), featureSet2.features("UserId"))
  .end()
To create a feature view, execute:
- Python
- Scala
feature_view = project.feature_views.create(name="test", description="", query=query)
val featureView = project.featureViews.create(name = "test", description = "", query = query)
Listing feature views within a project
- Python
- Scala
project.feature_views.list()
project.featureViews.list()
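For example, in Python you could iterate over the returned feature views and print their names. This is a minimal sketch that assumes each returned entry exposes a name attribute; check the client reference for your version:
# List the feature views in the project and print the name of each one (name attribute is assumed)
for fv in project.feature_views.list():
    print(fv.name)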
Obtaining a feature view
- Python
- Scala
feature_view = project.feature_views.get("feature_view_name", version=None)
val featureView = project.featureViews.get("feature_view_name")
or
val featureView = project.featureViews.get("feature_view_name", 1)
If the version is not specified, the latest version of the feature view is returned.
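For example, to explicitly request version 2 of a feature view in Python:
# Fetch a specific feature view version instead of the latest one
feature_view = project.feature_views.get("feature_view_name", version=2)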
Deleting feature views
- Python
- Scala
fv = project.feature_views.get("name")
fv.delete()
val fv = project.featureViews.get("name")
fv.delete()
Updating feature view fields
To update a field, call the setter of that field:
- Python
- Scala
fv = project.feature_views.get("name")
fv.description = "description"
val fv = project.featureViews.get("name")
fv.description = "description"
Creating a new feature view version
The query for a feature view cannot be updated directly. To change the query, you need to create a new version of the feature view with the updated query.
To create a new version of the feature view, call the create_new_version method of the feature view object and pass the updated query as a parameter. The new version uses the updated query to retrieve the data from the data source.
- Python
- Scala
fv = project.feature_views.get("name")
# Define the query to update the feature view
query = Query.select([fs_1.features["abc"], fs_1.features["xyz"]]) \
    .from_feature_set(fs_1, "alias1") \
    .join(fs_2, "alias2").on(fs_1.features["pqr"], fs_2.features["mno"]) \
    .end()
fv.create_new_version(query)
val fv = project.featureViews.get("name")
// Define the query to update the feature view
val query = Query.select(Seq(fs_1.features("abc"), fs_1.features("xyz")))
  .from(fs_1, "alias1")
  .join(fs_2, "alias2").on(fs_1.features("pqr"), fs_2.features("mno"))
  .end()
fv.createNewVersion(query)
Obtaining data as a Spark Frame
You can read the data directly as a Spark Frame:
- Python
- Scala
data_frame = my_feature_view.as_spark_frame(spark_session, start_at=None, end_at=None)
val dataFrame = myFeatureView.asSparkFrame(sparkSession, startAt=None, endAt=None)
Read more about Spark dependencies.
Parameters Explanation:
- Python
- Scala
If start_at and end_at are empty, all ingested data are fetched. Otherwise, these parameters are used to retrieve only a specific range of ingested data. For example, when the ingested data are in a time range between T1 and T2 (T1 <= T2), start_at can have any value T3 and end_at can have any value T4, where T1 <= T3 <= T4 <= T2.
If startAt and endAt are empty, all ingested data are fetched. Otherwise, these parameters are used to retrieve only a specific range of ingested data. For example, when the ingested data are in a time range between T1 and T2 (T1 <= T2), startAt can have any value T3 and endAt can have any value T4, where T1 <= T3 <= T4 <= T2.
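For example, a Python call restricted to a time range might look like the following sketch. The exact timestamp type accepted by start_at and end_at (shown here as datetime objects) is an assumption, so check the client reference for your version:
from datetime import datetime

# Fetch only the data ingested between the two timestamps (datetime inputs are assumed)
data_frame = my_feature_view.as_spark_frame(
    spark_session,
    start_at=datetime(2023, 1, 1),
    end_at=datetime(2023, 6, 30),
)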
Downloading the files from Feature Store
You can download the data to your local machine by:
- Python
- Scala
dir = my_feature_view.download(start_at=None, end_at=None)
val dir = myFeatureView.download(startAt=None, endAt=None)
Parameters Explanation:
- Python
- Scala
If start_at and end_at are empty, all ingested data are fetched. Otherwise, these parameters are used to retrieve only a specific range of ingested data. For example, when the ingested data are in a time range between T1 and T2 (T1 <= T2), start_at can have any value T3 and end_at can have any value T4, where T1 <= T3 <= T4 <= T2.
If startAt and endAt are empty, all ingested data are fetched. Otherwise, these parameters are used to retrieve only a specific range of ingested data. For example, when the ingested data are in a time range between T1 and T2 (T1 <= T2), startAt can have any value T3 and endAt can have any value T4, where T1 <= T3 <= T4 <= T2.
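As an illustration, the following Python sketch downloads only a slice of the ingested data and lists the downloaded files; the timestamp type passed to start_at and end_at is an assumption:
import os
from datetime import datetime

# Download only the data ingested within the given window (datetime inputs are assumed)
dir = my_feature_view.download(
    start_at=datetime(2023, 1, 1),
    end_at=datetime(2023, 6, 30),
)
print(os.listdir(dir))  # inspect the downloaded files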
Creating a machine learning dataset
Creating a machine learning (ML) dataset allows you to materialize a feature view into the Feature Store. To create an ML dataset, call the create method of the feature view's ml_datasets object. You need to provide a name for the ML dataset and, if required, you can also specify the time period for which you want to include data in your ML dataset.
- Python
- Scala
ml_dataset = my_feature_view.ml_datasets.create("name", start_date_time=None, end_date_time=None)
val mlDataset = myFeatureView.mlDatasets.create("name", startDateTime = None, endDateTime = None)
Parameters Explanation:
- Python
- Scala
If start_date_time and end_date_time are empty, all ingested data are fetched. Otherwise, these parameters are used to retrieve only a specific range of ingested data. For example, when the ingested data are in a time range between T1 and T2 (T1 <= T2), start_date_time can have any value T3 and end_date_time can have any value T4, where T1 <= T3 <= T4 <= T2.
If startDateTime and endDateTime are empty, all ingested data are fetched. Otherwise, these parameters are used to retrieve only a specific range of ingested data. For example, when the ingested data are in a time range between T1 and T2 (T1 <= T2), startDateTime can have any value T3 and endDateTime can have any value T4, where T1 <= T3 <= T4 <= T2.
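For instance, a Python sketch that materializes only a specific ingestion window could look like this; the dataset name and datetime values are illustrative, and the accepted timestamp type is an assumption:
from datetime import datetime

# Materialize only the data ingested in the first half of 2023 (datetime inputs are assumed)
ml_dataset = my_feature_view.ml_datasets.create(
    "training_2023_h1",
    start_date_time=datetime(2023, 1, 1),
    end_date_time=datetime(2023, 6, 30),
)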
Obtaining data as a Spark Frame from the ML dataset
- Python
- Scala
ml_dataset = my_feature_view.ml_datasets.get("name")
data_frame = ml_dataset.as_spark_frame(spark_session)
val mlDataset = myFeatureView.mlDatasets.get("name")
val dataFrame = mlDataset.asSparkFrame(sparkSession)
Downloading the files from Feature Store from the ML dataset
You can download the data to your local machine by:
- Python
- Scala
ml_dataset = my_feature_view.ml_datasets.get("name")
dir = ml_dataset.download()
val mlDataset = myFeatureView.mlDatasets.get("name")
val dir = mlDataset.download()
Retrieving data from online feature store
Once the ML dataset is created and the job has finished, you can retrieve the latest feature values from the online store. To retrieve these feature values, you have to provide the values of all primary keys of the feature sets. All transformations defined in the query are applied during this retrieval by a pipeline that was created when the ML dataset was created.
- Python
- Scala
ml_dataset = my_feature_view.ml_datasets.get("name")
ml_dataset.retrieve_online(1)  # pass the value(s) of the feature sets' primary keys
val mlDataset = myFeatureView.mlDatasets.get("name")
mlDataset.retrieveOnline(1)  // pass the value(s) of the feature sets' primary keys
Feature view and ML dataset permissions
The permission model of a project and its feature sets is inherited by the feature views and ML datasets created within them. In other words, any permissions that apply to a project and its feature sets also apply to the feature views and ML datasets created within that project and those feature sets. For more information, see Permissions.