Version: 0.19.3

Migration guide

From 0.19.1 to 0.19.2

  • The Helm argument core.config.databaseName has been removed without replacement. The database name must now be included in the PostgreSQL JDBC connection string.

  • The Helm argument core.config.dbConnectionString has been renamed to core.database.dsn. This parameter expects a PostgreSQL JDBC connection string.

  • The Helm arguments core.database.username and core.database.password have been removed. The username and password (if applicable) must be passed in core.database.dsn. See the PostgreSQL connection string format for more details.

  • The following Helm parameters were removed:

    • core.config.spark.userNameAttribute - it is no longer possible to select which attribute is used as the label
  • The following Kubernetes labels on Spark jobs have been renamed:

    • job-id to featurestore.h2o.ai/job-id
    • job-name to featurestore.h2o.ai/job-name
    • project to cloud.h2o.ai/workspace
    • feature-set to featurestore.h2o.ai/feature-set-id and featurestore.h2o.ai/feature-set-version
    • feature-view to featurestore.h2o.ai/feature-view-id and featurestore.h2o.ai/feature-view-version
    • mldataset to featurestore.h2o.ai/mldataset-id
    • user-name to cloud.h2o.ai/creator
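
Tooling that selects Spark job pods by label needs the new keys. The renames listed above can be captured in a lookup table. The sketch below is illustrative only: LABEL_RENAMES and migrate_label_keys are hypothetical names and are not part of Feature Store.

```python
# Old label key -> new label key(s), copied from the list above.
LABEL_RENAMES = {
    "job-id": ["featurestore.h2o.ai/job-id"],
    "job-name": ["featurestore.h2o.ai/job-name"],
    "project": ["cloud.h2o.ai/workspace"],
    "feature-set": [
        "featurestore.h2o.ai/feature-set-id",
        "featurestore.h2o.ai/feature-set-version",
    ],
    "feature-view": [
        "featurestore.h2o.ai/feature-view-id",
        "featurestore.h2o.ai/feature-view-version",
    ],
    "mldataset": ["featurestore.h2o.ai/mldataset-id"],
    "user-name": ["cloud.h2o.ai/creator"],
}


def migrate_label_keys(keys):
    """Return the new label keys for a list of old keys.

    Keys that are not in the rename table are passed through unchanged.
    """
    migrated = []
    for key in keys:
        migrated.extend(LABEL_RENAMES.get(key, [key]))
    return migrated
```

Note that feature-set and feature-view each map to two new keys, so a selector that used a single old key may need two conditions after the upgrade.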

From 0.18.0 to 0.19.0

  • Starting from 0.19.0, a feature name cannot contain the backtick character (`)
  • Starting from 0.19.0, a feature used in partition_by cannot be nested or have a complex type (struct, array)
  • Starting from 0.19.0, the API method GetUserByMail has been removed
  • In Helm, extra Spark options in the property sparkoperator.config.spark.extraOptions should be passed as array elements instead of as a single value
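
Before upgrading, it can help to scan existing schemas for the first two restrictions above. The following is a minimal sketch, assuming the schema is available as a plain dictionary of feature name to type string; that representation, the type-string prefixes, and the function name preflight are assumptions made for illustration, not the client API.

```python
# Type-string prefixes treated as complex in this sketch (assumed spelling).
COMPLEX_PREFIXES = ("struct", "array")


def preflight(feature_types, partition_by):
    """Return a list of problems; an empty list means nothing was found.

    feature_types: dict of feature name -> type string (hypothetical format).
    partition_by: list of feature names used for partitioning.
    """
    problems = []
    for name in feature_types:
        if "`" in name:
            problems.append(f"feature name contains a backtick: {name!r}")
    for name in partition_by:
        type_name = feature_types.get(name, "").lower()
        if type_name.startswith(COMPLEX_PREFIXES):
            problems.append(f"partition_by feature has a complex type: {name!r}")
    return problems
```

The check for nested partition features is left out because it depends on how nesting is represented in the schema at hand.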

From 0.16.0 to 0.17.0

  • Starting from 0.17.0, the methods feature_sets.register and feature_set.flow use the enum FeatureSetFlow instead of a string
  • To enable the pg_trgm extension, which is required by the Azure platform, you can follow the steps outlined in the Azure extensions documentation

From 0.15.0 to 0.16.0

  • Starting from 0.16.0, the Azure Gen2 Dependencies jar no longer contains the transitive dependencies. Please refer to Spark dependencies to see which dependencies must be present on your local Spark cluster to support retrieval of data using Spark frames.

  • The following Helm parameters were renamed:

    • global.cache.username to global.storage.username
    • global.cache.password to global.storage.password
    • global.config.cacheBackend to global.config.storageBackend
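
If the Helm values are kept as a parsed dictionary (for example after loading a values file), the three renames above can be applied mechanically. This is a sketch under that assumption; migrate_values is a hypothetical helper, and the values in the usage example are placeholders.

```python
import copy

# Old dotted path -> new dotted path, copied from the list above.
RENAMES = {
    "global.cache.username": "global.storage.username",
    "global.cache.password": "global.storage.password",
    "global.config.cacheBackend": "global.config.storageBackend",
}


def migrate_values(values):
    """Return a copy of a nested values dict with the renamed keys moved."""
    result = copy.deepcopy(values)
    for old_path, new_path in RENAMES.items():
        *old_parents, old_key = old_path.split(".")
        *new_parents, new_key = new_path.split(".")

        # Walk down to the dict that holds the old key, if it exists.
        parent = result
        for key in old_parents:
            parent = parent.get(key) if isinstance(parent, dict) else None
            if parent is None:
                break
        if not isinstance(parent, dict) or old_key not in parent:
            continue

        value = parent.pop(old_key)

        # Create the new parent dicts as needed and store the value.
        target = result
        for key in new_parents:
            target = target.setdefault(key, {})
        target[new_key] = value
    return result


old_values = {
    "global": {
        "cache": {"username": "user", "password": "secret"},
        "config": {"cacheBackend": "my-backend"},
    }
}
new_values = migrate_values(old_values)
```

The input dictionary is left untouched, and an emptied global.cache section remains in the result as an empty dictionary, which can be deleted by hand.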

From 0.14.0 to 0.15.0

  • The Kafka-related Helm properties global.config.messaging.kafka.topicsConfig.[topic-name].retentionMs, global.config.messaging.kafka.topicsConfig.[topic-name].retentionMinutes and global.config.messaging.kafka.topicsConfig.[topic-name].retentionHours are replaced by a single property, global.config.messaging.kafka.topicsConfig.[topic-name].retentionPolicy. The policy is specified in the duration format defined by the ISO 8601-1 standard.
  • Added new fields feature_set_id and feature_set_version and marked feature_set as deprecated in the proto message IngestResponse. The feature_set field will be deleted in the next major version, 1.0.0.
  • Added new field project_id and marked the project field as deprecated in proto messages: ProjectPermissionRequest, DeleteProjectRequest, GetFeatureSetRequest. The project field will be deleted in the next major version, 1.0.0.
  • Added new field feature_set_id and marked feature_set as deprecated in proto messages: ListJobsRequest, GetRecommendationRequest, FeatureSetPermissionRequest, ListFeatureSetsVersionRequest, DeleteFeatureSetRequest. The feature_set field will be deleted in the next major version, 1.0.0.
  • Added new fields feature_set_id and feature_set_version and marked feature_set as deprecated in proto messages: GetIngestHistoryRequest, StartRevertIngestJobRequest, StartIngestJobRequest, RetrieveRequest, StartMaterializationOnlineRequest, CreateNewFeatureSetVersionRequest. The feature_set field will be deleted in release 1.0.0.
  • GRPC method ListTokens has been deprecated and replaced by ListPersonalAccessTokens, which uses pagination. The former method will be removed in release 1.0.0.
  • In the Scala and Python clients, client.auth.pats.list() now returns an iterator instead of a list.
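
For the retentionPolicy change in the first item above, an existing retention expressed in milliseconds, minutes or hours has to be rewritten as an ISO 8601 duration. The helper below is a sketch of that conversion. The function name is hypothetical, and whether the Helm chart accepts every valid ISO 8601 form (for example fractional seconds) is an assumption to verify against your deployment.

```python
from datetime import timedelta


def to_iso8601_duration(delta):
    """Format a positive timedelta as an ISO 8601 time duration such as PT168H."""
    total_ms = delta // timedelta(milliseconds=1)
    if total_ms <= 0:
        raise ValueError("retention must be a positive duration")

    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    seconds, millis = divmod(remainder, 1_000)

    parts = ["PT"]
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        parts.append(f"{minutes}M")
    if millis:
        # Keep sub-second precision, dropping trailing zeros.
        fraction = f"{millis:03d}".rstrip("0")
        parts.append(f"{seconds}.{fraction}S")
    elif seconds:
        parts.append(f"{seconds}S")
    return "".join(parts)
```

For example, a former retentionHours of 168 corresponds to to_iso8601_duration(timedelta(hours=168)), and a former retentionMinutes of 90 corresponds to to_iso8601_duration(timedelta(minutes=90)).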

From 0.13.0 to 0.14.0

  • The deprecated behaviour of starting a preview job has been removed.
  • The Bearer prefix is now required in the Authorization header.
  • All arguments deprecated in release 0.12.0 have been removed.
  • All deprecated update API methods have been removed.
  • The deprecated owner field has been removed.
  • On GRPC level, FeatureSetHeader has been replaced by featureSetId and featureSetVersion fields in the following messages: OnlineRetrieveRequest, OnlineIngestRequest and GetFeatureSetsLastMinorForCurrentMajorRequest.
  • Helm properties disable-api.deletion and disable-api.role-assigment have been removed. A new Helm property, prohibited.cli.methods, has been introduced. This property allows the admin to specify a list of methods to be disabled from the CLI, such as: ai.h2o.featurestore.api.v1.CoreService/DeleteFeatureSet, ai.h2o.featurestore.api.v1.CoreService/DeleteProject.
  • GRPC method listFeature has been renamed to listFeatures.
  • Event method GetFeatureSetLastMinor has been renamed to GetFeatureSet.
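
Callers that build the Authorization header themselves need to make sure the Bearer prefix mentioned above is present. A minimal sketch, assuming the token is available as a string; bearer_value is a hypothetical helper name.

```python
def bearer_value(token):
    """Return the Authorization header value with exactly one Bearer prefix."""
    token = token.strip()
    if not token:
        raise ValueError("token must not be empty")
    # Accept input that already carries the prefix, in any letter case.
    if token.lower().startswith("bearer "):
        token = token[len("bearer "):].strip()
    return f"Bearer {token}"
```

The returned string is the header value only; how it is attached to a request depends on the HTTP or GRPC library in use.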

From 0.12.0 to 0.12.1

In Scala CLI, the arguments tags, filterBuilder and jsonQuery in featureSets.list and argument filterBuilder in projects.listFeatureSets method are deprecated and will be removed in 0.14.0. If you need to filter the listed feature sets, please use Scala filtering capabilities on received FeatureSet iterator.

In Python CLI, the arguments tags, filters in feature_sets.list and argument filters in projects.list_feature_sets method are deprecated and will be removed in 0.14.0. If you need to filter the listed feature sets, please use Python filtering capabilities on received FeatureSet iterator (such as list comprehensions).
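
Client-side filtering with a list comprehension can look like the sketch below. FeatureSetStub and list_feature_sets are stand-ins defined here so the example runs on its own; the tags attribute on the real FeatureSet object is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSetStub:
    """Stand-in for a FeatureSet; only the attributes used below are modelled."""
    name: str
    tags: list = field(default_factory=list)


def list_feature_sets():
    """Stand-in for the iterator returned by the client's list call."""
    yield FeatureSetStub("transactions", ["finance", "daily"])
    yield FeatureSetStub("clickstream", ["web"])
    yield FeatureSetStub("balances", ["finance"])


# Keep only the feature sets that carry a given tag.
finance_sets = [fs for fs in list_feature_sets() if "finance" in fs.tags]
finance_names = [fs.name for fs in finance_sets]
```

Because the list call returns an iterator, the comprehension consumes it once; store the result if it is needed more than one time.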

On GRPC API, the argument query in ListFeatureSetsPageRequest is deprecated and will be removed in 0.14.0. If you need to filter the feature sets, please filter those on the received end of your application.

From 0.11.0 to 0.12.0

From version 0.12.0, it is recommended to add the prefix "Bearer" to the Authorization header. Handling of an Authorization header without that prefix will be removed in 0.14.0.

Java GRPC API methods were previously generated into a single class. With version 0.12.0, the API is split into multiple classes. If you are using the Java GRPC API, you will need to update the imports in your application.

From 0.10.0 to 0.11.0

Deprecated GRPC methods:

  • ListFeatureSets and ListFeatureSetsAcrossProjects have been removed. Please use ListFeatureSetsPage instead.

  • ListProjects has been removed. Please use ListProjectsPage instead.

  • UpdateFeatureSetPrimaryKey will be removed in 0.14.0 without replacement. Changing the primary key is now only possible during the creation of a new feature set version.

  • The following will be removed in 0.14.0, so please use UpdateProject instead:

    • UpdateProjectCustomData
    • UpdateProjectDescription
    • UpdateProjectSecret
    • UpdateProjectLocked
  • The following will be removed in 0.14.0, so please use UpdateFeatureSet instead:

    • UpdateFeatureSetTags
    • UpdateFeatureSetDataSourceDomains
    • UpdateFeatureSetDescription
    • UpdateFeatureSetType
    • UpdateFeatureSetApplicationName
    • UpdateFeatureSetApplicationId
    • UpdateFeatureSetDeprecated
    • UpdateFeatureSetProcessInterval
    • UpdateFeatureSetProcessIntervalUnit
    • UpdateFeatureSetFlow
    • UpdateFeatureSetState
    • UpdateFeatureSetSecret
    • UpdateFeatureSetCustomData
    • UpdateTimeToLiveOfflineInterval
    • UpdateTimeToLiveOfflineIntervalUnit
    • UpdateTimeToLiveOnlineInterval
    • UpdateTimeToLiveOnlineIntervalUnit
    • UpdateFeatureSetOnlineNamespace
    • UpdateFeatureSetOnlineTopic
    • UpdateFeatureSetOnlineConnectionType
    • UpdateFeatureSetLegalApproved
    • UpdateFeatureSetLegalApprovedNotes
  • The following will be removed in 0.14.0, so please use UpdateFeature instead:

    • UpdateFeatureStatus
    • UpdateFeatureType
    • UpdateFeatureImportance
    • UpdateFeatureDescription
    • UpdateFeatureSpecial
    • UpdateFeatureAnomalyDetection
    • UpdateFeatureCustomData
    • UpdateFeatureClassifiers
  • UpdateProjectOwner will be removed in 0.14.0. Please use AddProjectPermission or RemoveProjectPermission instead.

  • UpdateFeatureSetOwner will be removed in 0.14.0. Please use AddFeatureSetPermission or RemoveFeatureSetPermission instead.

In both the Scala and Python CLI, the setter for the primary key is deprecated and will be removed in 0.14.0. Changing the primary key is now only possible using a new argument exposed on the create new version API call.

Deprecated classifierName field has been removed from CreateRecommendationClassifierRequest GRPC API.

Deprecated preview on the retrieve holder has been removed. Please use fs.get_preview() instead.

Deprecated secondary_key field has been removed from feature set. All secondary keys are pushed into primary_key field.

Deprecated owner field will be removed from project and feature set in 0.14.0 on API and also on proto entities. Please use owners instead.

Starting with release 0.11.0, the retrieve method respects minor versions of feature sets. That means that running retrieve on version 1.3 retrieves the data up to version 1.3. This ensures proper consistency for external tools depending on a specific feature set version. The data are also immutable in case of reverts. Previously, when you reverted an ingest, the data retrieved for that feature set were different after the revert operation.

note

The consistent retrieval works as explained above for all ingestions and reverts called starting with version 0.11.0. Retrieving a feature set prior to version 0.10.0 leads to the original behaviour.

From 0.9.0 to 0.10.0

The collection of historical policies has been removed and migrated to a new permissions collection. This collection contains information about previous permission updates. If a permission has been replaced by a new higher permission, its state is PROMOTED. If the permission has been removed, its state is REVOKED.

ref.preview() has been deprecated and replaced with the new API command fs.get_preview(). This preview is computed and stored during the first ingestion. Until the upcoming Feature Store release of 0.14.0, the get_preview() method will compute the missing preview and store it on the backend. We highly recommend that you run this method on prior existing feature sets before 0.14.0 to make sure that the preview is populated.

From 0.8.0 to 0.9.0

The optional arguments start_date_time and end_date_time for the Python CLI have been removed from the ingest / ingest_async methods as they are no longer needed.

The optional arguments startDateTime and endDateTime for the Scala CLI have been removed from the ingest / ingestAsync methods as they are no longer needed.

From 0.6.0 to 0.8.0

GRPC method listProjects is now deprecated. Please switch to the listProjectsPage API which uses pagination. While we don't plan to remove the original methods to preserve backwards compatibility, we strongly suggest using the paginated variant.

GRPC methods ListFeatureSets and ListFeatureSetsAcrossProjects are now deprecated. Please switch to the ListFeatureSetsPage API which uses pagination and replaces both of the former methods. While we don't plan to remove the original methods to preserve backwards compatibility, we strongly suggest using the paginated variant.

The list methods in Python and Scala for projects and feature sets now return iterators instead of full collections starting from version 0.7.0.
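
The relationship between a paginated API and an iterator can be sketched as a generator that requests the next page only when the previous one is exhausted. This is an illustration of the pattern, not the client's implementation: fetch_page, its (items, next_token) return shape, and the empty-token convention are all assumptions.

```python
from itertools import islice


def iterate_pages(fetch_page):
    """Yield items one by one, fetching a new page only when needed.

    fetch_page(token) is assumed to return (items, next_token), where an
    empty next_token marks the last page.
    """
    token = ""
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:
            break


# Fake two-page backend used to exercise the generator.
FAKE_PAGES = {"": (["project-a", "project-b"], "page-2"), "page-2": (["project-c"], "")}


def fake_fetch_page(token):
    return FAKE_PAGES[token]


all_projects = list(iterate_pages(fake_fetch_page))
first_two = list(islice(iterate_pages(fake_fetch_page), 2))
```

Code that previously indexed into the returned collection or called len() on it needs to wrap the iterator in list() first, or use islice to take only the first few items.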

partitionPattern is now deprecated and has been removed on folder data sources.

From 0.5.0 to 0.6.0

All Proto and GRPC classes have been moved from package ai.h2o.featurestore.core to package ai.h2o.featurestore.api.v1. Please update your code using our GRPC API by updating your imports.

The GenerateToken RPC call now accepts a Proto timestamp as an expiration date instead of a string representation.
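
A Proto timestamp carries two fields, seconds and nanos, counted from the Unix epoch in UTC. The sketch below derives both from a timezone-aware datetime using only the standard library; timestamp_fields is a hypothetical helper, and how the values are placed into the GenerateToken request is not shown.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_fields(moment):
    """Return (seconds, nanos) since the Unix epoch for an aware datetime."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = moment - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    nanos = delta.microseconds * 1_000
    return seconds, nanos


# Example: an expiration date 30 days from now.
expires_at = datetime.now(timezone.utc) + timedelta(days=30)
seconds, nanos = timestamp_fields(expires_at)
```

Rejecting naive datetimes avoids silently interpreting a local time as UTC.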

From 0.4.0 to 0.5.0

The environment variables required to pass AWS credentials have been changed to a more generic name to support AWS S3 and S3 compatible sources like Minio, Google Cloud, etc.

Previously, you needed to set the following environment variables to read data from AWS:

export AWS_ACCESS_KEY="my aws key"
export AWS_SECRET_KEY="my secret"
export AWS_REGION="my region"
export AWS_ROLE_ARN="my role"

Now, to achieve the same, you set the following variables:

export S3_ACCESS_KEY="my aws key"
export S3_SECRET_KEY="my secret"
export S3_REGION="my region"
export S3_ROLE_ARN="my role"

The client-side type used to pass AWS credentials has also been renamed from AWSCredentials to S3Credentials.
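
Scripts that prepare the environment programmatically can apply the variable rename with a small mapping. The following is a sketch; migrate_env is a hypothetical helper that works on any dictionary of environment variables, such as a copy of os.environ.

```python
# Old variable name -> new variable name, copied from the listings above.
ENV_RENAMES = {
    "AWS_ACCESS_KEY": "S3_ACCESS_KEY",
    "AWS_SECRET_KEY": "S3_SECRET_KEY",
    "AWS_REGION": "S3_REGION",
    "AWS_ROLE_ARN": "S3_ROLE_ARN",
}


def migrate_env(env):
    """Return a copy of env with the AWS_* variables renamed to S3_*.

    If a new-style variable is already set, it wins and the old one is kept
    as-is so that nothing is silently overwritten.
    """
    migrated = dict(env)
    for old_name, new_name in ENV_RENAMES.items():
        if old_name in migrated and new_name not in migrated:
            migrated[new_name] = migrated.pop(old_name)
    return migrated
```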

Derived feature sets

In 0.5.0, we introduced derived feature sets. Derived data sources (e.g., SparkPipeline, DriverlessAIMOJO, JoinFeatureSets) are now deprecated and will be removed in 0.6.0. As such, if you want to ingest new data to your feature sets that are using those derived data sources, you must migrate to derived feature sets instead.

To migrate to a derived feature set, a new version needs to be created using that derived schema with the selected transformation. Once this new version is created, ingestion is automatically triggered. This action will write all data from the parent feature set(s) with the applied transformation. The following example shows how to do this using the Python client:

import featurestore.transformations as t

spark_pipeline_transformation = t.SparkPipeline("...")
spark_pipeline_schema = client.extract_derived_schema([parent_feature_set], spark_pipeline_transformation)
derived_feature_set = feature_set_to_be_derived.create_new_version(schema=spark_pipeline_schema)

note

To allow automatic ingestion on a derived feature set that uses DriverlessAIMOJO, the new parameter sparkoperator.driverlessAiLicenseKey needs to be added to the Helm values. It should contain your Driverless AI license (which is kept in Kubernetes secrets).

From 0.2.0 to 0.3.0

Prior to version 0.3.0, the partition pattern accepted the date{} syntax in folder data sources. This has been removed as it is now obsolete due to several optimizations of the internal code.

Please update all your existing partition patterns by replacing date{..} with .*.
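
When many patterns need updating, the replacement can be scripted with a regular expression. This is a sketch that assumes every date token has the form date{...} with no nested braces; migrate_partition_pattern is a hypothetical helper, and the sample pattern is made up.

```python
import re

# Matches date{...} with any content between the braces (no nested braces).
_DATE_TOKEN = re.compile(r"date\{[^}]*\}")


def migrate_partition_pattern(pattern):
    """Replace every date{...} token in a partition pattern with .*"""
    return _DATE_TOKEN.sub(".*", pattern)


migrated = migrate_partition_pattern("sales/date{..}/data")
```

Patterns without a date token are returned unchanged, so the helper is safe to run over all existing patterns.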

Feature set ingest API changes

Previously, when ingesting data from data sources that periodically change using the Python CLI, you would use the following API call:

fs.ingest(ingest_source, new_version_on_schema_change=True)

Now, to achieve the same, you use the following API:

new_schema = client.extract_from_source(ingest_source)
if not fs.schema.is_compatible_with(new_schema, compare_data_types=False):
    patched_schema = fs.schema.patch_from(new_schema, compare_data_types=False)
    new_feature_set = fs.create_new_version(schema=patched_schema, reason="schema changed before ingest")
    new_feature_set.ingest(ingest_source)
else:
    fs.ingest(ingest_source)

Feature set schema API changes

Previously, when loading a schema from a feature set using the Python CLI, you would use the following API call:

schema = feature_set.get_schema()

Now, to achieve the same, you use the following API:

schema = feature_set.schema.get()

From 0.1.3 to 0.2.0

Prior to version 0.2.0, the feature type was determined as part of the statistics computation. Now, in version 0.2.0, you can specify the feature type using the schema API. The feature type can be specified explicitly or can be left empty (i.e. the backend will automatically discover it).

We have removed the Undefined feature type because each feature now is correctly assigned its feature type after being registered or creating a new version. We have also introduced the Composite feature type; it is used for features containing nested features.

We have stopped the backend from automatically marking specific textual features with the Categorical feature type since the logic behind it was not solid. Now, if you want to mark the feature type as Categorical, please specify that during registration explicitly using the schema API.

For more information, please see the schema API documentation.

From 0.1.1 to 0.1.2

The Custom Resource Definition (CRD) has been changed.

The Python CLI method from_string has been renamed to create_from in the schema.

The Scala CLI argument maskedFeatures from the register feature set call has been removed. Please use the schema API to describe which features should be masked. For example:

schema["my_feature_name"].special_data.pci = True
project.register_feature_set(schema, "feature_set_name")

The feature set type and feature type on the GRPC API have been migrated from strings to enums. This allows for better validation.

A new parameter, jobsCredentialsKey, was added to Helm values. Please make sure to provide this. Supported sizes for this variable are 16, 24, and 32 (in bytes).

core:
...

core:
salt: Yy7c8pzqSJXw6LHbUnhQ1234
jobsCredentialsKey: Yy7c8pzqSJXw6LHbUnhQ1234
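
A key of a supported size can be generated and checked with the standard library. The sketch below assumes that the size refers to the byte length of the value as written in the Helm values (one byte per ASCII character); both function names are hypothetical.

```python
import secrets
import string

SUPPORTED_SIZES = (16, 24, 32)


def generate_jobs_credentials_key(size=32):
    """Return a random alphanumeric key of a supported size."""
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"size must be one of {SUPPORTED_SIZES}")
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(size))


def has_supported_size(key):
    """Check that the UTF-8 encoded key is 16, 24 or 32 bytes long."""
    return len(key.encode("utf-8")) in SUPPORTED_SIZES
```

The sample value shown above is 24 characters long and therefore passes has_supported_size; use a freshly generated key rather than the sample in a real deployment.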

From 0.1.0 to 0.1.1

Version change setter for feature removed

The Python CLI setter version_change and the Scala CLI setter versionChange on the feature have been removed. These setters were initially exposed by accident. It is not possible to update the version change directly. It is updated automatically on the backend.

Update metadata method removed on project and feature set

The Python CLI method update_metadata and the Scala CLI setter update_metadata have been removed from both the project and feature set. To update the metadata, simply call the setter.

Previously, this was the call to update the feature set description:

feature_set.description = "new_description"
feature_set.update_metadata()

Now, this is the call to achieve the same and to update the metadata on the client and backend:

feature_set.description = "new_description"

GRPC project API changes

We have removed the UpdateProjectMetadata call for the GRPC API and exposed specific API calls for each field which can be modified on the project.

Previously, to update the project description and locked flag using the Scala API, you would use the following code:

project.description = "new_description"
project.locked = true
val request = UpdateProjectMetadataRequest(project = Some(project))
blockingStub.updateProjectMetadata(request)

Now, to achieve the same, you use the following code for each field you need to update:

blockingStub.updateProjectDescription(ProjectStringFieldUpdateRequest(project.id,  "new_description"))
blockingStub.updateProjectLocked(ProjectBooleanFieldUpdateRequest(project.id, true))

Previously, it was possible to accidentally modify fields which were not exposed for modification because the API transferred the full project object, but that is no longer possible with the new API.

GRPC feature set API changes

We have removed the UpdateFeatureSetMetadata call for the GRPC API and exposed specific API calls for each field which can be modified on the feature set.

Previously, to update a feature set description and feature status using the Scala API, you would use the following API:

featureSet.description = "new_description"
featureSet.features.find(_.name == "feature_name").get.status = "new_status"
val request = UpdateFeatureSetMetadataRequest(featureSet = Some(featureSet))
blockingStub.updateFeatureSetMetadata(request)

Now, to achieve the same, you use the following code for each field you need to update:

val featureSetHeader = FeatureSetHeader(projectId, internalFeatureSetId, internalFeatureSetVersion)
val descriptionUpdateRequest = FeatureSetStringFieldUpdateRequest(Some(featureSetHeader), "new_feature_set_description")
blockingStub.updateFeatureSetDescription(descriptionUpdateRequest)

val statusUpdateRequest = FeatureStringFieldUpdateRequest(Some(featureSetHeader), featureName, "new_status")
blockingStub.updateFeatureStatus(statusUpdateRequest)

Previously, it was possible to accidentally modify fields which were not exposed for modification because the API transferred the full feature set object, but that is no longer possible with the new API.

