Version: 1.2.0

Supported derived transformation

A transformation converts raw data into a form that a model can use.

Spark pipeline

Create a feature set by using a Spark pipeline. The Spark pipeline generates the data from an existing feature set that you pass in as input to the pipeline. Feature Store then uploads the Spark pipeline to the Feature Store artifacts cache and stores only the location of the pipeline in the database.

User API:

Python

Parameters:

pipeline_local_location: String or Pipeline object - the local path to the pipeline, or the pipeline object itself. Once the feature set is registered, this parameter contains the path to the uploaded Spark pipeline in the Feature Store artifacts storage.

import featurestore.core.transformations as t
spark_pipeline_transformation = t.SparkPipeline("...")

Scala

Parameters:

pipelineLocalLocation: String or Pipeline object - the local path to the pipeline, or the pipeline object itself. Once the feature set is registered, this parameter contains the path to the uploaded Spark pipeline in the Feature Store artifacts storage.

import ai.h2o.featurestore.core.transformations.SparkPipeline
val sparkPipelineTransformation = SparkPipeline("...")
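The "upload the pipeline, store only its location" behaviour described in this section can be pictured with a small conceptual model. This is a sketch under stated assumptions, not Feature Store's implementation: the `PipelineReference` class, the `register` function, the dictionary standing in for the database, and the temporary directory standing in for the artifacts storage are all hypothetical names introduced for illustration.

```python
import shutil
import tempfile
from pathlib import Path


class PipelineReference:
    """Hypothetical stand-in for a transformation that points at a pipeline."""

    def __init__(self, pipeline_local_location):
        self.pipeline_local_location = str(pipeline_local_location)


def register(reference, artifact_dir, database, feature_set_name):
    """Copy the pipeline into the artifact directory and record only its location."""
    source = Path(reference.pipeline_local_location)
    target = Path(artifact_dir) / feature_set_name / source.name
    target.parent.mkdir(parents=True, exist_ok=True)
    if source.is_dir():
        # A pipeline saved as a directory is copied recursively.
        shutil.copytree(source, target)
    else:
        shutil.copy2(source, target)
    # The "database" keeps the location, never the pipeline bytes.
    database[feature_set_name] = str(target)
    # After registration the parameter points at the uploaded copy.
    reference.pipeline_local_location = str(target)
    return reference


# Assumed sample data: a placeholder file plays the role of a saved pipeline.
workspace = Path(tempfile.mkdtemp())
local_pipeline = workspace / "local" / "pipeline.bin"
local_pipeline.parent.mkdir()
local_pipeline.write_bytes(b"serialized pipeline placeholder")

database = {}
reference = register(
    PipelineReference(local_pipeline), workspace / "artifacts", database, "derived_fs"
)
print(database["derived_fs"])
```

The point of the sketch is the last two assignments in `register`: the stored record and the parameter both end up holding the uploaded location rather than the original local path.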

Driverless AI MOJO

Create a feature set by using a Driverless AI MOJO. The MOJO pipeline generates the data from an existing feature set that you pass in as input to the pipeline. Feature Store then uploads the MOJO pipeline to the Feature Store artifacts cache and stores only the location of the pipeline in the database.

note

Only features created from Driverless AI with the make_mojo_scoring_pipeline_for_features_only setting are supported in Feature Store.

User API:

Python

Parameters:

mojo_local_location: String - the local path to the pipeline. Once the feature set is registered, this parameter contains the path to the uploaded MOJO pipeline in the Feature Store artifacts cache.

import featurestore.core.transformations as t

transformation = t.DriverlessAIMOJO(...)

Scala

Parameters:

mojoLocalLocation: String - the local path to the pipeline. Once the feature set is registered, this parameter contains the path to the uploaded MOJO pipeline in the Feature Store artifacts cache.

import ai.h2o.featurestore.core.transformations.DriverlessAIMOJO

val transformation = DriverlessAIMOJO(...)

JoinFeatureSets

Create a new feature set by joining two different feature sets.

User API:

Python

Parameters:

left_key: String - the join key, which must be present in the left feature set
right_key: String - the join key, which must be present in the right feature set
join_type: JoinFeatureSetsType - the join type (default: JoinFeatureSetsType.INNER)

import featurestore.core.transformations as t

transformation = t.JoinFeatureSets(left_key=..., right_key=..., join_type=...)

Scala

Parameters:

leftKey: String - the join key, which must be present in the left feature set
rightKey: String - the join key, which must be present in the right feature set
joinType: JoinFeatureSetsType - the join type (default: JoinFeatureSetsType.INNER)

import ai.h2o.featurestore.core.transformations.JoinFeatureSets

val transformation = JoinFeatureSets(leftKey=..., rightKey=..., joinType=...)

JoinFeatureSetsType

JoinFeatureSetsType.INNER - The inner join is the default join in Spark SQL. It selects rows that have matching values in both relations.
JoinFeatureSetsType.LEFT - A left join returns all values from the left relation and the matched values from the right relation, or appends NULL if there is no match.
JoinFeatureSetsType.RIGHT - A right join returns all values from the right relation and the matched values from the left relation, or appends NULL if there is no match.
JoinFeatureSetsType.FULL - A full join returns all values from both relations, appending NULL values on the side that does not have a match.
JoinFeatureSetsType.CROSS - A cross join returns the Cartesian product of two relations.
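To make the five join types concrete, the following pure-Python sketch applies each of them to two tiny in-memory tables. It does not call Feature Store or Spark: the `join_rows` function, the sample rows, and the lowercase string names standing in for the JoinFeatureSetsType members are assumptions made for illustration, and `None` plays the role of NULL.

```python
def join_rows(left, right, left_key, right_key, join_type="inner"):
    """Join two lists of dict rows following the usual SQL join semantics."""
    if join_type not in {"inner", "left", "right", "full", "cross"}:
        raise ValueError(f"unknown join type: {join_type}")

    if join_type == "cross":
        # Cartesian product: every left row paired with every right row.
        return [{**l, **r} for l in left for r in right]

    # NULL-filled placeholders for the side that has no match.
    null_left = {col: None for row in left for col in row}
    null_right = {col: None for row in right for col in row}

    rows = []
    matched_right = set()
    for l in left:
        # As in SQL, a NULL key never matches anything.
        hits = [
            i for i, r in enumerate(right)
            if l[left_key] is not None and r[right_key] == l[left_key]
        ]
        for i in hits:
            rows.append({**l, **right[i]})
        matched_right.update(hits)
        if not hits and join_type in ("left", "full"):
            rows.append({**l, **null_right})

    if join_type in ("right", "full"):
        # Keep right rows that no left row matched.
        for i, r in enumerate(right):
            if i not in matched_right:
                rows.append({**null_left, **r})
    return rows


# Assumed sample data with distinct column names on each side.
customers = [{"customer_id": 1, "name": "Ann"}, {"customer_id": 2, "name": "Bob"}]
orders = [{"order_customer": 2, "total": 30}, {"order_customer": 3, "total": 45}]

for join_type in ("inner", "left", "right", "full", "cross"):
    rows = join_rows(customers, orders, "customer_id", "order_customer", join_type)
    print(join_type, len(rows))
```

With this sample data the row counts are 1, 2, 2, 3 and 4 for inner, left, right, full and cross respectively, which mirrors the descriptions in the list above: only customer 2 has a matching order, so the outer joins add NULL-padded rows for the unmatched customer, the unmatched order, or both.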

note

During join transformations, Feature Store performs inner joins by default.
