Version: 1.2.0

Supported derived transformations

A transformation changes the raw data and makes it usable by a model.

Spark pipeline

You can create a feature set via a Spark pipeline. The Spark pipeline generates the data from an existing feature set that you pass in as an input to the pipeline. Feature Store then uploads the Spark pipeline to the Feature Store artifacts cache and stores only the location of the pipeline in the database.

User API:

Parameters:

  • pipeline_local_location: String or Pipeline object - the local path to the pipeline or the pipeline object itself. Once the feature set is registered, this parameter contains the path to the uploaded Spark pipeline in the Feature Store artifacts cache.

import featurestore.core.transformations as t

spark_pipeline_transformation = t.SparkPipeline("...")
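
When pipeline_local_location is given as a string, it can help to confirm that the path exists locally before building the transformation. The following is a minimal sketch using only the Python standard library. The helper name `resolve_local_artifact` is hypothetical and is not part of the Feature Store client.

```python
from pathlib import Path


def resolve_local_artifact(location):
    """Return the absolute path of a local pipeline artifact as a string.

    Hypothetical helper, not part of the Feature Store client.
    Raises FileNotFoundError if nothing exists at the given location.
    """
    path = Path(location).expanduser().resolve()
    if not path.exists():
        raise FileNotFoundError(f"No pipeline found at {path}")
    return str(path)


# Assumed usage: check the path first, then pass the result to the transformation.
# spark_pipeline_transformation = t.SparkPipeline(resolve_local_artifact("..."))
```

The check accepts both files and directories, so it does not assume a particular on-disk layout for the saved pipeline.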

Driverless AI MOJO

You can create a feature set via a Driverless AI MOJO. The MOJO pipeline generates the data from an existing feature set that you pass in as an input to the pipeline. Feature Store then uploads the MOJO pipeline to the Feature Store artifacts cache and stores only the location of the pipeline in the database.

note

Only features created from Driverless AI with the make_mojo_scoring_pipeline_for_features_only setting are supported in Feature Store.

User API:

Parameters:

  • mojo_local_location: String - the local path to the pipeline. Once the feature set is registered, this parameter contains the path to the uploaded MOJO pipeline in the Feature Store artifacts cache.

import featurestore.core.transformations as t

transformation = t.DriverlessAIMOJO(...)

JoinFeatureSets

You can create a new feature set by joining two different feature sets.

User API:

Parameters:

  • left_key: String - the join key, which must be present in the left feature set
  • right_key: String - the join key, which must be present in the right feature set
  • join_type: JoinFeatureSetsType - the join type (default: JoinFeatureSetsType.INNER)

JoinFeatureSetsType

  • JoinFeatureSetsType.INNER - The inner join is the default join in Spark SQL. It selects rows that have matching values in both relations.
  • JoinFeatureSetsType.LEFT - A left join returns all values from the left relation and the matched values from the right relation, or appends NULL if there is no match.
  • JoinFeatureSetsType.RIGHT - A right join returns all values from the right relation and the matched values from the left relation, or appends NULL if there is no match.
  • JoinFeatureSetsType.FULL - A full join returns all values from both relations, appending NULL values on the side that does not have a match.
  • JoinFeatureSetsType.CROSS - A cross join returns the Cartesian product of two relations.

import featurestore.core.transformations as t

transformation = t.JoinFeatureSets(left_key=..., right_key=..., join_type=...)

note

During join transformations, Feature Store performs inner joins by default.
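
To make the join types concrete, the sketch below reproduces their row-level semantics in plain Python on small lists of dictionaries. It is an illustration only and not how Feature Store or Spark implements joins. The function `join_rows`, the string join-type names standing in for the JoinFeatureSetsType values, and the sample column names are all assumptions made for this example.

```python
from itertools import product

JOIN_TYPES = {"inner", "left", "right", "full", "cross"}


def join_rows(left, right, left_key, right_key, join_type="inner"):
    """Join two lists of dict rows, mimicking SQL join semantics (illustrative only)."""
    if join_type not in JOIN_TYPES:
        raise ValueError(f"Unsupported join type: {join_type!r}")
    if join_type == "cross":
        # Cartesian product: every left row paired with every right row.
        return [{**l, **r} for l, r in product(left, right)]

    # Each join key must be present in its own input.
    for side, rows, key in (("left", left, left_key), ("right", right, right_key)):
        if any(key not in row for row in rows):
            raise KeyError(f"{key!r} is missing from the {side} rows")

    # All-None rows used to pad the side that has no match (the NULLs).
    left_nulls = {col: None for row in left for col in row}
    right_nulls = {col: None for row in right for col in row}

    result = []
    matched_right = set()
    for l in left:
        hits = [i for i, r in enumerate(right) if r[right_key] == l[left_key]]
        matched_right.update(hits)
        result.extend({**l, **right[i]} for i in hits)
        if not hits and join_type in ("left", "full"):
            result.append({**right_nulls, **l})
    if join_type in ("right", "full"):
        for i, r in enumerate(right):
            if i not in matched_right:
                result.append({**left_nulls, **r})
    return result


# Hypothetical sample data; only customer 2 appears on both sides.
customers = [{"customer_id": 1, "age": 34}, {"customer_id": 2, "age": 51}]
orders = [{"cust": 2, "total": 9.5}, {"cust": 3, "total": 20.0}]

for jt in ("inner", "left", "right", "full", "cross"):
    rows = join_rows(customers, orders, "customer_id", "cust", jt)
    print(jt, len(rows))
```

With these sample rows, the inner join keeps only the row for customer 2, the left and right joins each add one padded row for the unmatched side, the full join returns three rows, and the cross join returns all four pairings.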

