Migration guide
From 0.19.1 to 0.19.2
- Helm argument core.config.databaseName has been removed without replacement. The database must be included in the PostgreSQL JDBC connection string.
- Helm argument core.config.dbConnectionString has been renamed to core.database.dsn. This parameter expects a PostgreSQL JDBC connection string.
- Helm arguments core.database.username and core.database.password have been removed. The username and password (if applicable) must be passed in core.database.dsn. Please inspect the PostgreSQL connection string format for more details.
The following Helm parameters were removed:
- core.config.spark.userNameAttribute: it is no longer possible to select which attribute is set as the label.
The following Kubernetes labels on Spark jobs are renamed:
- job-id to featurestore.h2o.ai/job-id
- job-name to featurestore.h2o.ai/job-name
- project to cloud.h2o.ai/workspace
- feature-set to featurestore.h2o.ai/feature-set-id and featurestore.h2o.ai/feature-set-version
- feature-view to featurestore.h2o.ai/feature-view-id and featurestore.h2o.ai/feature-view-version
- mldataset to featurestore.h2o.ai/mldataset-id
- user-name to cloud.h2o.ai/creator
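As a rough illustration of the core.database.dsn change, the sketch below assembles a PostgreSQL JDBC connection string that carries the database name, username, and password in one value. The helper name build_dsn and the host, port, and credential values are hypothetical; please check the PostgreSQL JDBC documentation for the parameters your deployment actually needs.

```python
from urllib.parse import quote, urlencode


def build_dsn(host: str, port: int, database: str,
              username: str = "", password: str = "") -> str:
    """Assemble a URL of the form
    jdbc:postgresql://host:port/database?user=...&password=...
    """
    # The database name is part of the URL path, so it is percent-encoded.
    dsn = f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"
    # Credentials are optional and go into the query string.
    params = {}
    if username:
        params["user"] = username
    if password:
        params["password"] = password
    if params:
        dsn += "?" + urlencode(params)
    return dsn


# Hypothetical values, for illustration only.
print(build_dsn("db.example.com", 5432, "featurestore", "fs_user", "p@ss"))
```

Percent-encoding matters when the password contains characters such as @ or &, which would otherwise change the meaning of the URL.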
From 0.18.0 to 0.19.0
- Starting from 0.19.0, a feature name cannot contain the ` character.
- Starting from 0.19.0, a feature in partition_by cannot be nested or have a complex type (struct, array).
- Starting from 0.19.0, the API method GetUserByMail is deleted.
- In Helm, extra Spark options in the property sparkoperator.config.spark.extraOptions should be passed as array elements instead of as a single value.
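Before upgrading, it may help to check existing feature names against the new backtick restriction. The following is a minimal sketch; find_invalid_feature_names is a hypothetical helper and the names are made up. How you obtain the list of feature names depends on your client code.

```python
def find_invalid_feature_names(names):
    """Return the feature names that contain a backtick character."""
    return [name for name in names if "`" in name]


# Hypothetical feature names, for illustration only.
candidates = ["customer_id", "total`amount", "created_at"]
print(find_invalid_feature_names(candidates))
```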
From 0.16.0 to 0.17.0
- Starting from 0.17.0, the methods feature_sets.register and feature_set.flow use the enum FeatureSetFlow instead of a string.
- To enable the pg_trgm extension, which is required by the Azure platform, you can follow the steps outlined in the Azure extensions documentation.
From 0.15.0 to 0.16.0
Starting from 0.16.0 Azure Gen2 Dependencies jar doesn't contain the transitive dependencies. Please refer to Spark dependencies to see which dependencies must be present on your local Spark cluster to support retrieval of data using Spark frames.
The following Helm parameters were renamed:
- global.cache.username to global.storage.username
- global.cache.password to global.storage.password
- global.config.cacheBackend to global.config.storageBackend
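If your Helm values are available as a nested dictionary (for example, after loading a values file), the renames above can be applied mechanically. The sketch below is one possible approach, not an official migration tool; migrate_values and its helpers are hypothetical names, and the sample values are invented.

```python
import copy

# Old dotted path -> new dotted path, taken from the rename list above.
RENAMES = {
    "global.cache.username": "global.storage.username",
    "global.cache.password": "global.storage.password",
    "global.config.cacheBackend": "global.config.storageBackend",
}


def _pop_path(tree, path):
    """Remove and return the value at a dotted path, or None if absent."""
    *parents, leaf = path.split(".")
    node = tree
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return None
    return node.pop(leaf, None)


def _set_path(tree, path, value):
    """Set the value at a dotted path, creating intermediate dicts."""
    *parents, leaf = path.split(".")
    node = tree
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def migrate_values(values):
    """Return a copy of the values with old keys moved to their new paths."""
    migrated = copy.deepcopy(values)
    for old, new in RENAMES.items():
        value = _pop_path(migrated, old)
        if value is not None:
            _set_path(migrated, new, value)
    return migrated


# Invented sample values, for illustration only.
old_values = {
    "global": {
        "cache": {"username": "user", "password": "secret"},
        "config": {"cacheBackend": "example-backend"},
    }
}
print(migrate_values(old_values))
```

The input dictionary is left untouched, and an emptied parent such as global.cache is kept as an empty mapping so that you can review and delete it yourself.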
From 0.14.0 to 0.15.0
- Kafka related Helm properties global.config.messaging.kafka.topicsConfig.[topic-name].retentionMs, global.config.messaging.kafka.topicsConfig.[topic-name].retentionMinutes and global.config.messaging.kafka.topicsConfig.[topic-name].retentionHours are replaced by a single global.config.messaging.kafka.topicsConfig.[topic-name].retentionPolicy. The policy is specified in the duration format defined in the ISO 8601-1 standard.
- Added new fields feature_set_id and feature_set_version and marked feature_set as deprecated in proto message IngestResponse. The feature_set field will be deleted in the next major version 1.0.0.
- Added new field project_id and marked the project field as deprecated in proto messages: ProjectPermissionRequest, DeleteProjectRequest, GetFeatureSetRequest. The project field will be deleted in the next major version 1.0.0.
- Added new field feature_set_id and marked feature_set as deprecated in proto messages: ListJobsRequest, GetRecommendationRequest, FeatureSetPermissionRequest, ListFeatureSetsVersionRequest, DeleteFeatureSetRequest. The feature_set field will be deleted in the next major version 1.0.0.
- Added new fields feature_set_id and feature_set_version and marked feature_set as deprecated in proto messages: GetIngestHistoryRequest, StartRevertIngestJobRequest, StartIngestJobRequest, RetrieveRequest, StartMaterializationOnlineRequest, CreateNewFeatureSetVersionRequest. The feature_set field will be deleted in release 1.0.0.
- GRPC method ListTokens has been deprecated and replaced by ListPersonalAccessTokens, which uses pagination. The former method will be removed in release 1.0.0.
- In the Scala and Python clients, client.auth.pats.list() now returns an iterator instead of a list.
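To move from the old retentionMs, retentionMinutes and retentionHours properties to retentionPolicy, the previous numeric value has to be rewritten as an ISO 8601 duration. The sketch below shows one way to do that conversion. It is an assumption that the time-only form (PT...H...M...S) is accepted for every value; retention_policy is a hypothetical helper name.

```python
def retention_policy(hours: int = 0, minutes: int = 0, millis: int = 0) -> str:
    """Express a retention period as an ISO 8601 duration such as PT24H."""
    total_ms = (hours * 60 + minutes) * 60_000 + millis
    if total_ms <= 0:
        raise ValueError("retention must be positive")
    # Split the total into hours, minutes, seconds and leftover milliseconds.
    h, rest = divmod(total_ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    parts = ["PT"]
    if h:
        parts.append(f"{h}H")
    if m:
        parts.append(f"{m}M")
    if s or ms:
        # Fractional seconds are written without trailing zeros.
        seconds = f"{s}.{ms:03d}".rstrip("0") if ms else str(s)
        parts.append(f"{seconds}S")
    return "".join(parts)


print(retention_policy(hours=24))     # former retentionHours: 24
print(retention_policy(minutes=90))   # former retentionMinutes: 90
print(retention_policy(millis=1500))  # former retentionMs: 1500
```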
From 0.13.0 to 0.14.0
- Deprecated behaviour starting preview job has been removed.
- Bearer token prefix is now required in Authorization header.
- All deprecated arguments in release 0.12.0 are now removed.
- All deprecated updated API methods are removed.
- Deprecated owner field has been removed.
- On GRPC level, FeatureSetHeader has been replaced by featureSetId and featureSetVersion fields in the following messages: OnlineRetrieveRequest, OnlineIngestRequest and GetFeatureSetsLastMinorForCurrentMajorRequest.
- Helm properties disable-api.deletion and disable-api.role-assigment have been removed. New Helm property prohibited.cli.methods has been introduced. This property allows the admin to specify a list of methods to be disabled from the CLI, such as: ai.h2o.featurestore.api.v1.CoreService/DeleteFeatureSet, ai.h2o.featurestore.api.v1.CoreService/DeleteProject.
- GRPC method listFeature has been renamed to listFeatures.
- Event method GetFeatureSetLastMinor has been renamed to GetFeatureSet.
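For callers that build the Authorization header themselves, the now-mandatory Bearer prefix can be applied defensively. The following is a minimal sketch; authorization_header is a hypothetical helper and not part of the Feature Store clients, and the token value is a placeholder.

```python
def authorization_header(token: str) -> dict:
    """Build an Authorization header that carries the Bearer prefix once."""
    token = token.strip()
    # Add the prefix only when the caller has not supplied it already.
    if not token.lower().startswith("bearer "):
        token = f"Bearer {token}"
    return {"Authorization": token}


# Placeholder token, for illustration only.
print(authorization_header("my-token"))
```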
From 0.12.0 to 0.12.1
In Scala CLI, the arguments tags,
filterBuilder and jsonQuery in
featureSets.list and argument filterBuilder
in projects.listFeatureSets method are deprecated and will
be removed in 0.14.0. If you need to filter the listed feature sets,
please use Scala filtering capabilities on received
FeatureSet iterator.
In Python CLI, the arguments tags, filters
in feature_sets.list and argument filters in
projects.list_feature_sets method are deprecated and will
be removed in 0.14.0. If you need to filter the listed feature sets,
please use Python filtering capabilities on received
FeatureSet iterator (such as list comprehensions).
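The client-side filtering described above might look like the sketch below. Because a running Feature Store is needed for the real call, the example uses a stand-in class and generator; FeatureSetStub, its attributes feature_set_name and tags, and list_feature_sets are all hypothetical and only imitate an iterator of feature sets.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSetStub:
    """Hypothetical stand-in for a FeatureSet returned by the client."""
    feature_set_name: str
    tags: list = field(default_factory=list)


def list_feature_sets():
    """Stand-in for the iterator returned by the list call."""
    yield FeatureSetStub("transactions", ["finance", "daily"])
    yield FeatureSetStub("clickstream", ["web"])
    yield FeatureSetStub("invoices", ["finance"])


# Filter on the client side with a list comprehension instead of passing tags.
finance_sets = [fs for fs in list_feature_sets() if "finance" in fs.tags]
print([fs.feature_set_name for fs in finance_sets])
```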
On GRPC API, the argument query in
ListFeatureSetsPageRequest is deprecated and will be
removed in 0.14.0. If you need to filter the feature sets, please filter
those on the received end of your application.
From 0.11.0 to 0.12.0
From version 0.12.0, it is recommended to add the prefix "Bearer" to the Authorization header. Handling of an Authorization header without that prefix will be removed in 0.14.0.
Java GRPC API methods were previously generated into a single class. With version 0.12.0 the API is split into multiple classes. If you are using Java GRPC api, you will need to update the imports on your application.
From 0.10.0 to 0.11.0
Deprecated GRPC methods:
- ListFeatureSets and ListFeatureSetsAcrossProjects have been removed. Please use ListFeatureSetsPage instead.
- ListProjects has been removed. Please use ListProjectsPage instead.
- UpdateFeatureSetPrimaryKey will be removed in 0.14.0 without replacement. Changing the primary key is now only possible during the creation of a new feature set version.
- The following will be removed in 0.14.0, so please use UpdateProject instead:
  - UpdateProjectCustomData
  - UpdateProjectDescription
  - UpdateProjectSecret
  - UpdateProjectLocked
- The following will be removed in 0.14.0, so please use UpdateFeatureSet instead:
  - UpdateFeatureSetTags
  - UpdateFeatureSetDataSourceDomains
  - UpdateFeatureSetDescription
  - UpdateFeatureSetType
  - UpdateFeatureSetApplicationName
  - UpdateFeatureSetApplicationId
  - UpdateFeatureSetDeprecated
  - UpdateFeatureSetProcessInterval
  - UpdateFeatureSetProcessIntervalUnit
  - UpdateFeatureSetFlow
  - UpdateFeatureSetState
  - UpdateFeatureSetSecret
  - UpdateFeatureSetCustomData
  - UpdateTimeToLiveOfflineInterval
  - UpdateTimeToLiveOfflineIntervalUnit
  - UpdateTimeToLiveOnlineInterval
  - UpdateTimeToLiveOnlineIntervalUnit
  - UpdateFeatureSetOnlineNamespace
  - UpdateFeatureSetOnlineTopic
  - UpdateFeatureSetOnlineConnectionType
  - UpdateFeatureSetLegalApproved
  - UpdateFeatureSetLegalApprovedNotes
- The following will be removed in 0.14.0, so please use UpdateFeature instead:
  - UpdateFeatureStatus
  - UpdateFeatureType
  - UpdateFeatureImportance
  - UpdateFeatureDescription
  - UpdateFeatureSpecial
  - UpdateFeatureAnomalyDetection
  - UpdateFeatureCustomData
  - UpdateFeatureClassifiers
- UpdateProjectOwner will be removed in 0.14.0. Please use AddProjectPermission or RemoveProjectPermission instead.
- UpdateFeatureSetOwner will be removed in 0.14.0. Please use AddFeatureSetPermission or RemoveFeatureSetPermission instead.
In both the Scala and Python CLI, the setter for the primary key is deprecated and will be removed in 0.14.0. Changing the primary key is now only possible using a new argument exposed on the create new version API call.
Deprecated classifierName field has been removed from
CreateRecommendationClassifierRequest GRPC API.
Deprecated preview on the retrieve holder has been removed. Please use
fs.get_preview() instead.
Deprecated secondary_key field has been removed from feature set. All
secondary keys are pushed into primary_key field.
Deprecated owner field will be removed from project and feature set in
0.14.0 on API and also on proto entities. Please use owners instead.
Starting with release 0.11.0, the retrieve method starts respecting
minor versions of feature sets. That means that running retrieve on
version 1.3 retrieves the data up to version
1.3. This ensures proper consistency for external tools
depending on a specific feature set version. The data are also immutable
in case of reverts. Meaning that previously, when you reverted an
ingest, the data retrieved for that feature set were different after
that retrieve operation.
The consistent retrieval works as explained above for all ingestions and reverts called starting with version 0.11.0. Retrieving a feature set prior to version 0.10.0 leads to the original behaviour.
From 0.9.0 to 0.10.0
The collection of historical policies has been removed and migrated to a
new permissions collection. This collection contains information about
previous permission updates. If a permission has been replaced by a new
higher permission, its state is PROMOTED. If the permission has been
removed, its state is REVOKED.
ref.preview() has been deprecated and replaced with the new API
command fs.get_preview(). This preview is computed and stored during
the first ingestion. Until the upcoming Feature Store release of
0.14.0, the get_preview() method will compute the missing preview
and store it on the backend. We highly recommend that you run this
method on prior existing feature sets before 0.14.0 to make sure
that the preview is populated.
From 0.8.0 to 0.9.0
The optional arguments start_date_time and end_date_time for the
Python CLI have been removed from the ingest / ingest_async methods
as they are no longer needed.
The optional arguments startDateTime and endDateTime for the Scala
CLI have been removed from the ingest / ingestAsync methods as they
are no longer needed.
From 0.6.0 to 0.8.0
GRPC method listProjects is now deprecated. Please switch to the
listProjectsPage API which uses pagination. While we don't plan to
remove the original methods to preserve backwards compatibility, we
strongly suggest using the paginated variant.
GRPC methods ListFeatureSets and ListFeatureSetsAcrossProjects are
now deprecated. Please switch to the ListFeatureSetsPage API which
uses pagination and replaces both of the former methods. While we don't
plan to remove the original methods to preserve backwards compatibility,
we strongly suggest using the paginated variant.
The list methods in Python and Scala for projects and feature sets now return iterators instead of full collections starting from version 0.7.0.
partitionPattern is now deprecated and has been removed on folder data
sources.
From 0.5.0 to 0.6.0
All Proto and GRPC classes have been moved from package
ai.h2o.featurestore.core to package ai.h2o.featurestore.api.v1.
Please update your code using our GRPC API by updating your imports.
The GenerateToken RPC call now accepts a Proto timestamp as an expiration date instead of string representation.
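A Proto timestamp stores a point in time as whole seconds since the Unix epoch plus a nanosecond remainder. The sketch below converts a Python datetime into those two fields using only the standard library; to_timestamp_fields is a hypothetical helper, and how the resulting values are placed into the GenerateToken request depends on your generated client code.

```python
import calendar
from datetime import datetime, timezone


def to_timestamp_fields(expiration: datetime) -> dict:
    """Convert a timezone-aware datetime to Timestamp-style seconds and nanos."""
    if expiration.tzinfo is None:
        raise ValueError("expiration must be timezone-aware")
    # utctimetuple() normalises to UTC; timegm() gives whole epoch seconds.
    seconds = calendar.timegm(expiration.utctimetuple())
    # The sub-second part is kept separately, in nanoseconds.
    nanos = expiration.microsecond * 1000
    return {"seconds": seconds, "nanos": nanos}


# Example expiration date, for illustration only.
expiration = datetime(2024, 1, 1, 12, 0, 0, 500000, tzinfo=timezone.utc)
print(to_timestamp_fields(expiration))
```

If the protobuf Python package is installed, its Timestamp class offers FromDatetime, which performs the same conversion.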
From 0.4.0 to 0.5.0
The environment variables required to pass AWS credentials have been changed to a more generic name to support AWS S3 and S3 compatible sources like Minio, Google Cloud, etc.
Previously, you needed to set the following environment variables to read data from AWS:
export AWS_ACCESS_KEY="my aws key"
export AWS_SECRET_KEY="my secret"
export AWS_REGION="my region"
export AWS_ROLE_ARN="my role"
Now, to achieve the same, you set the following variables:
export S3_ACCESS_KEY="my aws key"
export S3_SECRET_KEY="my secret"
export S3_REGION="my region"
export S3_ROLE_ARN="my role"
We have also renamed the AWS credentials passed on the clients from
AWSCredentials to S3Credentials.
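The environment variable rename can also be applied programmatically, for example in a launcher script. This is a minimal sketch that works on a plain dictionary; migrate_env is a hypothetical helper, and the region value is only an example.

```python
# Old variable name -> new variable name, as listed above.
ENV_RENAMES = {
    "AWS_ACCESS_KEY": "S3_ACCESS_KEY",
    "AWS_SECRET_KEY": "S3_SECRET_KEY",
    "AWS_REGION": "S3_REGION",
    "AWS_ROLE_ARN": "S3_ROLE_ARN",
}


def migrate_env(env):
    """Return a copy of env with S3_* variables filled from AWS_* ones."""
    migrated = dict(env)
    for old, new in ENV_RENAMES.items():
        # Do not overwrite a new-style variable that is already set.
        if old in migrated and new not in migrated:
            migrated[new] = migrated[old]
    return migrated


# Example with a plain dict; pass dict(os.environ) to inspect a real environment.
print(migrate_env({"AWS_REGION": "us-east-1"}))
```

The old variables are left in place, so other tools that still read them keep working.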
Derived feature sets
In 0.5.0, we introduced derived feature sets. Derived data sources (e.g., SparkPipeline, DriverlessAIMOJO, JoinFeatureSets) are now deprecated and will be removed in 0.6.0. As such, if you want to ingest new data to your feature sets that are using those derived data sources, you must migrate to derived feature sets instead.
To migrate to a derived feature set, a new version needs to be created using that derived schema with the selected transformation. Once this new version is created, ingestion is automatically triggered. This action will write all data from the parent feature set(s) with the applied transformation. The following example shows how to do this using the Python client:
import featurestore.transformations as t
spark_pipeline_transformation = t.SparkPipeline("...")
spark_pipeline_schema = client.extract_derived_schema([parent_feature_set], spark_pipeline_transformation)
derived_feature_set = feature_set_to_be_derived.create_new_version(schema=spark_pipeline_schema)
To allow automatic ingestion on the derived feature set that uses
DriverlessAIMOJO, the new parameter
sparkoperator.driverlessAiLicenseKey needs to be added to the Helm
values. It should contain your license to Driverless AI (which is kept
in k8 secrets).
From 0.2.0 to 0.3.0
Prior to version 0.3.0, the partition pattern accepted date{} syntax
in the folder's data sources. This has been removed as it is now
obsolete due to several optimizations of the internal code.
Please update all your existing partition patterns by replacing
date{..} with .*.
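The replacement of date{..} with .* can be done with a regular expression. The sketch below assumes that a date token never contains a closing brace inside its braces; migrate_partition_pattern and the sample pattern are hypothetical.

```python
import re

# Matches date{...} with any content between the braces.
_DATE_TOKEN = re.compile(r"date\{[^}]*\}")


def migrate_partition_pattern(pattern: str) -> str:
    """Replace every date{..} token with the .* wildcard."""
    return _DATE_TOKEN.sub(".*", pattern)


# Hypothetical pattern, for illustration only.
print(migrate_partition_pattern("year=date{yyyy}/month=date{MM}"))
```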
Feature set ingest API changes
Previously, when ingesting data from data sources that periodically change using the Python CLI, you would use the following API call:
fs.ingest(ingest_source, new_version_on_schema_change=True)
Now, to achieve the same, you use the following API:
new_schema = client.extract_from_source(ingest_source)
if not fs.schema.is_compatible_with(new_schema, compare_data_types=False):
    patched_schema = fs.schema.patch_from(new_schema, compare_data_types=False)
    new_feature_set = fs.create_new_version(schema=patched_schema, reason="schema changed before ingest")
    new_feature_set.ingest(ingest_source)
else:
    fs.ingest(ingest_source)
Previously, when ingesting data from data sources that periodically change using the Scala CLI, you would use the following API call:
fs.ingest(ingestSource, newVersionOnSchemaChange=true)
Now, to achieve the same, you use the following API:
val newSchema = client.extractSchemaFromSource(ingestSource)
if (!fs.schema().isCompatibleWith(newSchema, compareDataTypes=false)) {
  val patchedSchema = fs.schema().patchFrom(newSchema, compareDataTypes=false)
  val newFeatureSet = fs.createNewVersion(schema=patchedSchema, reason="schema changed before ingest")
  newFeatureSet.ingest(ingestSource)
} else {
  fs.ingest(ingestSource)
}
Previously, when ingesting data from data sources that periodically change using the GRPC API, you would use the following API call:
val startIngestJobRequest = StartIngestJobRequest(featureSet = Some(featureSet), newVersionOnSchemaChange=true)
blockingStub.startIngestJob(startIngestJobRequest)
Now, to achieve the same, you use the following API:
val request = FeatureSetSchemaCompatibilityRequest(featureSet = Some(featureSet), newSchema = newSchema, compareDataTypes = false)
val response = blockingStub.isFeatureSetSchemaCompatible(request)
if (!response.isCompatible) {
  val schemaPatchRequest = FeatureSetSchemaPatchRequest(featureSet = Some(featureSet), newSchema = newSchema, compareDataTypes = false)
  val schemaPatchResponse = blockingStub.patchFeatureSetSchema(schemaPatchRequest)
  val patchedSchema = schemaPatchResponse.schema
  val createNewVersionRequest = CreateNewFeatureSetVersionRequest(featureSet = Some(featureSet), schema = patchedSchema, reason = "")
  val createNewVersionResponse = blockingStub.createNewFeatureSetVersion(createNewVersionRequest)
  val newFeatureSet = createNewVersionResponse.getFeatureSet
  val startIngestJobRequest = StartIngestJobRequest(featureSet = Some(newFeatureSet), ...)
  blockingStub.startIngestJob(startIngestJobRequest)
}
Feature set schema API changes
Previously, when loading a schema from a feature set using the Python CLI, you would use the following API call:
schema = feature_set.get_schema()
Now, to achieve the same, you use the following API:
schema = feature_set.schema.get()
Previously, when loading a schema from a feature set using the Scala CLI, you would use the following API call:
val schema = feature_set.getSchema()
Now, to achieve the same, you use the following API:
val schema = feature_set.schema().get()
From 0.1.3 to 0.2.0
Prior to version 0.2.0, the feature type was determined as part of the statistics computation. Now, in version 0.2.0, you can specify the feature type using the schema API. The feature type can be specified explicitly or can be left empty (i.e. the backend will automatically discover it).
We have removed the Undefined feature type because each feature now is
correctly assigned its feature type after being registered or creating a
new version. We have also introduced the Composite feature type; it is
used for features containing nested features.
We have stopped the backend from automatically marking specific textual
features with the Categorical feature type since the logic behind it
was not solid. Now, if you want to mark the feature type as
Categorical, please specify that during registration explicitly using
the schema API.
For more information, please see the schema API documentation.
From 0.1.1 to 0.1.2
The Custom Resource Definition (CRD) has been changed.
The Python CLI method from_string has been renamed to create_from in
the schema.
The Scala CLI argument maskedFeatures from the register feature set
call has been removed. Please use the schema API to describe which
features should be masked. For example:
- Python
schema["my_feature_name"].special_data.pci = True
project.register_feature_set(schema, "feature_set_name")
- Scala
schema("my_feature_name").specialData.pci = true
project.registerFeatureSet(schema, "feature_set_name")
The feature set type and feature type on the GRPC API has been migrated from strings to enums. This allows for better validation.
A new parameter, jobsCredentialsKey, was added to Helm values. Please
make sure to provide this. Supported sizes for this variable are 16, 24,
and 32 (in bytes)
core:
...
core:
salt: Yy7c8pzqSJXw6LHbUnhQ1234
jobsCredentialsKey: Yy7c8pzqSJXw6LHbUnhQ1234
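A value of a supported size for jobsCredentialsKey can be generated with the Python standard library. The sketch below assumes that the size is measured over the bytes of the string, as the 24-character example value suggests; the function names are hypothetical, and the alphanumeric alphabet is a choice made here, not a documented requirement.

```python
import secrets
import string

SUPPORTED_SIZES = (16, 24, 32)
_ALPHABET = string.ascii_letters + string.digits


def generate_jobs_credentials_key(size: int = 32) -> str:
    """Return a random ASCII alphanumeric key; one character is one byte."""
    if size not in SUPPORTED_SIZES:
        raise ValueError(f"size must be one of {SUPPORTED_SIZES}")
    return "".join(secrets.choice(_ALPHABET) for _ in range(size))


def is_valid_key(key: str) -> bool:
    """Check that the key's UTF-8 encoding has a supported length."""
    return len(key.encode("utf-8")) in SUPPORTED_SIZES


key = generate_jobs_credentials_key(24)
print(len(key), is_valid_key(key))
```

The secrets module is used instead of random because the key protects credentials and should come from a cryptographically strong source.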
From 0.1.0 to 0.1.1
Version change setter for feature removed
The Python CLI setter version_change and the Scala CLI setter
versionChange on the feature has been removed. This setter was
initially exposed by accident. It is not possible to update the version
change directly. It is updated automatically on the backend.
Update metadata method removed on project and feature set
The Python CLI method update_metadata and the Scala CLI method
updateMetadata have been removed from both the project and feature
set. To update the metadata, simply call the setter.
Previously, this was the call to update the feature set description:
- Python
feature_set.description = "new_description"
feature_set.update_metadata()
- Scala
featureSet.description = "new_description"
featureSet.updateMetadata()
Now, this is the call to achieve the same and to update the metadata on the client and backend:
- Python
feature_set.description = "new_description"
- Scala
featureSet.description = "new_description"
GRPC project API changes
We have removed the UpdateProjectMetadata call for the GRPC API and
exposed specific API calls for each field which can be modified on the
project.
Previously, to update the project description and locked using the Scala API:
project.description = "new_description"
project.locked = true
val request = UpdateProjectMetadataRequest(project = Some(project))
blockingStub.updateProjectMetadata(request)
Now, to achieve the same, you use the following code for each field you need to update:
blockingStub.updateProjectDescription(ProjectStringFieldUpdateRequest(project.id, "new_description"))
blockingStub.updateProjectLocked(ProjectBooleanFieldUpdateRequest(project.id, true))
Previously, it was possible to accidentally modify fields which were not exposed for modification because the API transferred the full project object, but that is no longer possible with the new API.
GRPC feature set API changes
We have removed the UpdateFeatureSetMetadata call for the GRPC API and
exposed specific API calls for each field which can be modified on the
feature set.
Previously, to update a feature set description and feature status using the Scala API, you would use the following API:
featureSet.description = "new_description"
featureSet.features.find(_.name == "feature_name").get.status = "new_status"
val request = UpdateFeatureSetMetadataRequest(featureSet = Some(featureSet))
blockingStub.updateFeatureSetMetadata(request)
Now, to achieve the same, you use the following code for each field you need to update:
val featureSetHeader = FeatureSetHeader(projectId, internalFeatureSetId, internalFeatureSetVersion)
val descriptionUpdateRequest = FeatureSetStringFieldUpdateRequest(Some(featureSetHeader), "new_feature_set_description")
blockingStub.updateFeatureSetDescription(descriptionUpdateRequest)
val statusUpdateRequest = FeatureStringFieldUpdateRequest(Some(featureSetHeader), featureName, "new_status")
blockingStub.updateFeatureStatus(statusUpdateRequest)
Previously, it was possible to accidentally modify fields which were not exposed for modification because the API transferred the full feature set object, but that is no longer possible with the new API.