Migration guide
From 0.19.1 to 0.19.2
Helm argument core.config.databaseName has been removed without replacement. The database must be included in the PostgreSQL JDBC connection string.
Helm argument core.config.dbConnectionString has been renamed to core.database.dsn. This parameter expects a PostgreSQL JDBC connection string.
Helm arguments core.database.username and core.database.password have been removed. The username and password (if applicable) must be passed as part of core.database.dsn. Please inspect the PostgreSQL JDBC connection string format for more details.
The following Helm parameters were removed:
- core.config.spark.userNameAttribute: it is no longer possible to select which attribute is used as the label value.
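Because the database name and the credentials now travel inside core.database.dsn, it can help to build that string programmatically. The helper below is a hypothetical sketch, assuming the standard PostgreSQL JDBC URL form jdbc:postgresql://host:port/database with user and password passed as URL parameters. It percent-encodes the values on the assumption that the JDBC driver decodes them; verify this against the PostgreSQL JDBC driver documentation for the version you deploy.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def postgres_jdbc_dsn(
    host: str,
    database: str,
    port: int = 5432,
    user: Optional[str] = None,
    password: Optional[str] = None,
    **extra: str,
) -> str:
    """Build a PostgreSQL JDBC URL with the database and optional credentials embedded.

    Hypothetical helper for producing a core.database.dsn value.
    """
    params = {}
    if user is not None:
        params["user"] = user
    if password is not None:
        params["password"] = password
    # Further driver options (for example sslmode="require") are appended as given.
    params.update(extra)

    dsn = f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"
    if params:
        # quote (not quote_plus) so spaces become %20 rather than "+".
        dsn += "?" + urlencode(params, quote_via=quote)
    return dsn


if __name__ == "__main__":
    print(postgres_jdbc_dsn("db.example.com", "featurestore", user="fs", password="p@ss&word"))
```

The host, database and credentials above are placeholders. The printed string, jdbc:postgresql://db.example.com:5432/featurestore?user=fs&password=p%40ss%26word, is the kind of value that goes into core.database.dsn.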
The following Kubernetes labels on Spark jobs have been renamed:
- job-id to featurestore.h2o.ai/job-id
- job-name to featurestore.h2o.ai/job-name
- project to cloud.h2o.ai/workspace
- feature-set to featurestore.h2o.ai/feature-set-id and featurestore.h2o.ai/feature-set-version
- feature-view to featurestore.h2o.ai/feature-view-id and featurestore.h2o.ai/feature-view-version
- mldataset to featurestore.h2o.ai/mldataset-id
- user-name to cloud.h2o.ai/creator
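Label selectors used in scripts, dashboards or kubectl get pods -l queries have to move to the new keys. The following is a minimal sketch with a hypothetical migrate_selector helper, assuming simple comma-separated key=value, key==value or key!=value terms. The feature-set and feature-view labels were each split into an -id and a -version label, and this guide does not say how the old values map onto them, so the sketch refuses to rewrite those terms rather than guess.

```python
import re

# Old label key -> new label key(s), as listed above.
LABEL_RENAMES = {
    "job-id": ("featurestore.h2o.ai/job-id",),
    "job-name": ("featurestore.h2o.ai/job-name",),
    "project": ("cloud.h2o.ai/workspace",),
    "feature-set": ("featurestore.h2o.ai/feature-set-id", "featurestore.h2o.ai/feature-set-version"),
    "feature-view": ("featurestore.h2o.ai/feature-view-id", "featurestore.h2o.ai/feature-view-version"),
    "mldataset": ("featurestore.h2o.ai/mldataset-id",),
    "user-name": ("cloud.h2o.ai/creator",),
}

# Equality-based selector term: key, operator (==, != or =), value.
_TERM = re.compile(r"^\s*([\w./-]+)\s*(==|!=|=)\s*(.*?)\s*$")


def migrate_selector(selector: str) -> str:
    """Rewrite equality-based selector terms from the old label keys to the new ones."""
    migrated = []
    for term in selector.split(","):
        match = _TERM.match(term)
        if match is None:
            migrated.append(term)  # set-based or existence terms are left untouched
            continue
        key, op, value = match.groups()
        new_keys = LABEL_RENAMES.get(key)
        if new_keys is None:
            migrated.append(term)  # not one of the renamed labels
        elif len(new_keys) == 1:
            migrated.append(f"{new_keys[0]}{op}{value}")
        else:
            raise ValueError(f"label {key!r} was split into {' and '.join(new_keys)}; rewrite this term manually")
    return ",".join(migrated)


print(migrate_selector("job-id=abc,project=p1"))
```

For example, job-id=abc,project=p1 becomes featurestore.h2o.ai/job-id=abc,cloud.h2o.ai/workspace=p1.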
From 0.18.0 to 0.19.0
- Starting from 0.19.0, a feature name cannot contain the backtick character (`).
- Starting from 0.19.0, a feature used in partition_by cannot be nested or have a complex type (struct, array).
- Starting from 0.19.0, the GetUserByMail API has been deleted.
- In Helm, extra Spark options in the property sparkoperator.config.spark.extraOptions should be passed as array elements instead of a single value.
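Before upgrading, you may want to check existing schemas against the two new 0.19.0 rules. The sketch below uses a hypothetical, simplified Feature class (not the client's own schema type) and assumes complex types are labelled "STRUCT" and "ARRAY". It flags feature names containing a backtick anywhere in the schema, and partition_by entries that are nested (dotted path) or have a complex type.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

COMPLEX_TYPES = {"STRUCT", "ARRAY"}  # assumed type labels


@dataclass
class Feature:
    """Hypothetical, simplified schema entry; not the Feature Store client's class."""
    name: str
    data_type: str
    nested: List[Feature] = field(default_factory=list)


def _walk(features: List[Feature], prefix: str = "") -> Iterator[Tuple[str, Feature]]:
    # Yield every feature with its dotted path, descending into nested features.
    for feature in features:
        path = prefix + feature.name
        yield path, feature
        yield from _walk(feature.nested, path + ".")


def check_0_19_rules(features: List[Feature], partition_by: List[str]) -> List[str]:
    """Return human-readable violations of the 0.19.0 naming and partition_by rules."""
    problems = [
        f"feature name {path!r} contains a backtick"
        for path, feature in _walk(features)
        if "`" in feature.name
    ]
    top_level = {feature.name: feature for feature in features}
    for name in partition_by:
        feature = top_level.get(name)
        if feature is None:
            problems.append(f"partition_by entry {name!r} is not a top-level feature (nested or missing)")
        elif feature.data_type.upper() in COMPLEX_TYPES:
            problems.append(f"partition_by entry {name!r} has complex type {feature.data_type}")
    return problems
```

An empty result means the sketch found no violations; it does not replace the backend's own validation.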
From 0.16.0 to 0.17.0
- Starting from 0.17.0, the methods feature_sets.register and feature_set.flow use the enum FeatureSetFlow instead of a string.
- To enable the pg_trgm extension, which is required on the Azure platform, follow the steps outlined in the Azure extensions documentation.
From 0.15.0 to 0.16.0
Starting from 0.16.0 Azure Gen2 Dependencies jar doesn't contain the transitive dependencies. Please refer to Spark dependencies to see which dependencies must be present on your local Spark cluster to support retrieval of data using Spark frames.
The following Helm parameters were renamed:
- global.cache.username to global.storage.username
- global.cache.password to global.storage.password
- global.config.cacheBackend to global.config.storageBackend
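Renamed Helm parameters can be moved mechanically in an existing values tree. The following is a minimal sketch with hypothetical helper names, assuming the values file has already been loaded into nested dictionaries (for example with PyYAML's yaml.safe_load). It returns a migrated copy and leaves the input untouched.

```python
import copy
from typing import Any, Dict

# Old dotted Helm path -> new dotted Helm path (0.15.0 -> 0.16.0).
RENAMES_0_16 = {
    "global.cache.username": "global.storage.username",
    "global.cache.password": "global.storage.password",
    "global.config.cacheBackend": "global.config.storageBackend",
}

_MISSING = object()


def _pop_path(tree: Dict[str, Any], dotted: str) -> Any:
    # Remove and return the value at a dotted path, or _MISSING if absent.
    *parents, leaf = dotted.split(".")
    node = tree
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return _MISSING
    return node.pop(leaf, _MISSING)


def _set_path(tree: Dict[str, Any], dotted: str, value: Any) -> None:
    # Create intermediate dictionaries as needed and set the leaf value.
    *parents, leaf = dotted.split(".")
    node = tree
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def migrate_values(values: Dict[str, Any], renames: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of a Helm values tree with each old key moved to its new path."""
    migrated = copy.deepcopy(values)
    for old, new in renames.items():
        value = _pop_path(migrated, old)
        if value is not _MISSING:
            _set_path(migrated, new, value)
    return migrated
```

The same helper can apply the 0.19.2 rename with {"core.config.dbConnectionString": "core.database.dsn"}. Emptied parent sections (such as global.cache: {}) are left in place and can be deleted by hand.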
From 0.14.0 to 0.15.0
- Kafka-related Helm properties global.config.messaging.kafka.topicsConfig.[topic-name].retentionMs, global.config.messaging.kafka.topicsConfig.[topic-name].retentionMinutes and global.config.messaging.kafka.topicsConfig.[topic-name].retentionHours are replaced by the single property global.config.messaging.kafka.topicsConfig.[topic-name].retentionPolicy. The policy is specified in the duration format defined in the ISO 8601-1 standard.
- Added new fields feature_set_id and feature_set_version and marked feature_set as deprecated in the proto message IngestResponse. The feature_set field will be deleted in the next major version, 1.0.0.
- Added a new field project_id and marked the project field as deprecated in the proto messages ProjectPermissionRequest, DeleteProjectRequest and GetFeatureSetRequest. The project field will be deleted in the next major version, 1.0.0.
- Added a new field feature_set_id and marked feature_set as deprecated in the proto messages ListJobsRequest, GetRecommendationRequest, FeatureSetPermissionRequest, ListFeatureSetsVersionRequest and DeleteFeatureSetRequest. The feature_set field will be deleted in the next major version, 1.0.0.
- Added new fields feature_set_id and feature_set_version and marked feature_set as deprecated in the proto messages GetIngestHistoryRequest, StartRevertIngestJobRequest, StartIngestJobRequest, RetrieveRequest, StartMaterializationOnlineRequest and CreateNewFeatureSetVersionRequest. The feature_set field will be deleted in release 1.0.0.
- The GRPC method ListTokens has been deprecated and replaced by ListPersonalAccessTokens, which uses pagination. The former method will be removed in release 1.0.0.
- In the Scala and Python clients, client.auth.pats.list() now returns an iterator instead of a list.
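If you previously set one of the removed retention properties, its value has to be rewritten as an ISO 8601 duration for retentionPolicy. The following is a minimal sketch with hypothetical helper names, assuming the time-only PTnHnMnS form is accepted; check the value against your deployment before rolling it out.

```python
from typing import Optional

_MS_PER_SECOND = 1_000
_MS_PER_MINUTE = 60 * _MS_PER_SECOND
_MS_PER_HOUR = 60 * _MS_PER_MINUTE


def millis_to_iso8601(millis: int) -> str:
    """Render a non-negative millisecond count as an ISO 8601 PTnHnMnS duration."""
    if millis < 0:
        raise ValueError("retention must be non-negative")
    hours, rest = divmod(millis, _MS_PER_HOUR)
    minutes, rest = divmod(rest, _MS_PER_MINUTE)
    seconds, ms = divmod(rest, _MS_PER_SECOND)
    parts = []
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        parts.append(f"{minutes}M")
    if ms:
        # Fractional seconds with trailing zeros trimmed: 1500 ms -> "1.5S".
        parts.append(f"{seconds}.{ms:03d}".rstrip("0") + "S")
    elif seconds:
        parts.append(f"{seconds}S")
    return "PT" + ("".join(parts) or "0S")


def legacy_retention_to_iso8601(
    retention_ms: Optional[int] = None,
    retention_minutes: Optional[int] = None,
    retention_hours: Optional[int] = None,
) -> str:
    """Convert exactly one of the removed retentionMs/Minutes/Hours values."""
    given = [v for v in (retention_ms, retention_minutes, retention_hours) if v is not None]
    if len(given) != 1:
        raise ValueError("set exactly one of retention_ms, retention_minutes, retention_hours")
    if retention_ms is not None:
        return millis_to_iso8601(retention_ms)
    if retention_minutes is not None:
        return millis_to_iso8601(retention_minutes * _MS_PER_MINUTE)
    return millis_to_iso8601(retention_hours * _MS_PER_HOUR)


print(legacy_retention_to_iso8601(retention_hours=72))  # PT72H
```

Requiring exactly one input avoids guessing which of several conflicting legacy settings should win.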
From 0.13.0 to 0.14.0
- The deprecated behaviour of starting a preview job has been removed.
- The Bearer token prefix is now required in the Authorization header.
- All arguments deprecated in release 0.12.0 have now been removed.
- All deprecated update API methods have been removed.
- The deprecated owner field has been removed.
- On the GRPC level, FeatureSetHeader has been replaced by the featureSetId and featureSetVersion fields in the following messages: OnlineRetrieveRequest, OnlineIngestRequest and GetFeatureSetsLastMinorForCurrentMajorRequest.
- Helm properties disable-api.deletion and disable-api.role-assigment have been removed. A new Helm property, prohibited.cli.methods, has been introduced. This property allows the admin to specify a list of methods to be disabled from the CLI, such as: ai.h2o.featurestore.api.v1.CoreService/DeleteFeatureSet,ai.h2o.featurestore.api.v1.CoreService/DeleteProject.
- The GRPC method listFeature has been renamed to listFeatures.
- The event method GetFeatureSetLastMinor has been renamed to GetFeatureSet.
From 0.12.0 to 0.12.1
In the Scala CLI, the arguments tags, filterBuilder and jsonQuery in featureSets.list and the argument filterBuilder in the projects.listFeatureSets method are deprecated and will be removed in 0.14.0. If you need to filter the listed feature sets, please use Scala filtering capabilities on the received FeatureSet iterator.
In the Python CLI, the arguments tags and filters in feature_sets.list and the argument filters in the projects.list_feature_sets method are deprecated and will be removed in 0.14.0. If you need to filter the listed feature sets, please use Python filtering capabilities on the received FeatureSet iterator (such as list comprehensions).
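Filtering now happens on the client side, over the iterator returned by the list call. The following is a minimal sketch using a hypothetical FeatureSetStub class with name and tags attributes; the real FeatureSet objects and their attribute names may differ, so check the client reference before relying on them.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class FeatureSetStub:
    """Hypothetical stand-in for a listed FeatureSet; attribute names are assumptions."""
    name: str
    tags: List[str] = field(default_factory=list)


def with_tag(feature_sets: Iterable[FeatureSetStub], tag: str) -> List[FeatureSetStub]:
    # Client-side replacement for the removed `tags` argument.
    return [fs for fs in feature_sets if tag in fs.tags]


listed = iter([
    FeatureSetStub("transactions", ["finance"]),
    FeatureSetStub("clicks", ["marketing"]),
    FeatureSetStub("refunds", ["finance", "marketing"]),
])
print([fs.name for fs in with_tag(listed, "finance")])  # ['transactions', 'refunds']
```

A list comprehension consumes the iterator in one pass, which fits the single-pass nature of the returned iterator.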
On the GRPC API, the argument query in ListFeatureSetsPageRequest is deprecated and will be removed in 0.14.0. If you need to filter the feature sets, please filter them on the receiving end of your application.
From 0.11.0 to 0.12.0
From version 0.12.0, it is recommended to add the prefix "Bearer" to the Authorization header. Handling of Authorization headers without that prefix will be removed in 0.14.0.
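When calling the GRPC API directly, the header can be normalized so it always carries the prefix. The following is a minimal sketch with a hypothetical bearer_metadata helper. It returns a gRPC metadata pair; the key is lowercase because gRPC metadata keys must be lowercase.

```python
from typing import Tuple


def bearer_metadata(token: str) -> Tuple[str, str]:
    """Return an ("authorization", "Bearer <token>") gRPC metadata pair.

    Tokens that already start with the Bearer scheme (any letter case) are not
    prefixed a second time.
    """
    token = token.strip()
    if token[:7].lower() == "bearer ":
        token = token[7:].lstrip()
    if not token:
        raise ValueError("token is empty")
    return ("authorization", f"Bearer {token}")
```

With the grpcio library, the pair can be passed through the metadata argument of a generated stub method, for example stub.SomeMethod(request, metadata=[bearer_metadata(token)]), where SomeMethod is a placeholder.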
Java GRPC API methods were previously generated into a single class. With version 0.12.0, the API is split into multiple classes. If you are using the Java GRPC API, you will need to update the imports in your application.
From 0.10.0 to 0.11.0
Deprecated GRPC methods:
- ListFeatureSets and ListFeatureSetsAcrossProjects have been removed. Please use ListFeatureSetsPage instead.
- ListProjects has been removed. Please use ListProjectsPage instead.
- UpdateFeatureSetPrimaryKey will be removed in 0.14.0 without replacement. Changing the primary key is now only possible during the creation of a new feature set version.
The following will be removed in 0.14.0, so please use UpdateProject instead:
- UpdateProjectCustomData
- UpdateProjectDescription
- UpdateProjectSecret
- UpdateProjectLocked
The following will be removed in 0.14.0, so please use UpdateFeatureSet instead:
- UpdateFeatureSetTags
- UpdateFeatureSetDataSourceDomains
- UpdateFeatureSetDescription
- UpdateFeatureSetType
- UpdateFeatureSetApplicationName
- UpdateFeatureSetApplicationId
- UpdateFeatureSetDeprecated
- UpdateFeatureSetProcessInterval
- UpdateFeatureSetProcessIntervalUnit
- UpdateFeatureSetFlow
- UpdateFeatureSetState
- UpdateFeatureSetSecret
- UpdateFeatureSetCustomData
- UpdateTimeToLiveOfflineInterval
- UpdateTimeToLiveOfflineIntervalUnit
- UpdateTimeToLiveOnlineInterval
- UpdateTimeToLiveOnlineIntervalUnit
- UpdateFeatureSetOnlineNamespace
- UpdateFeatureSetOnlineTopic
- UpdateFeatureSetOnlineConnectionType
- UpdateFeatureSetLegalApproved
- UpdateFeatureSetLegalApprovedNotes
The following will be removed in 0.14.0, so please use UpdateFeature instead:
- UpdateFeatureStatus
- UpdateFeatureType
- UpdateFeatureImportance
- UpdateFeatureDescription
- UpdateFeatureSpecial
- UpdateFeatureAnomalyDetection
- UpdateFeatureCustomData
- UpdateFeatureClassifiers
UpdateProjectOwner will be removed in 0.14.0. Please use AddProjectPermission or RemoveProjectPermission instead.
UpdateFeatureSetOwner will be removed in 0.14.0. Please use AddFeatureSetPermission or RemoveFeatureSetPermission instead.
In both the Scala and Python CLI, the setter for the primary key is deprecated and will be removed in 0.14.0. Changing the primary key is now only possible using a new argument exposed on the create new version API call.
The deprecated classifierName field has been removed from the CreateRecommendationClassifierRequest GRPC API.
The deprecated preview on the retrieve holder has been removed. Please use fs.get_preview() instead.
The deprecated secondary_key field has been removed from the feature set. All secondary keys are pushed into the primary_key field.
The deprecated owner field will be removed from projects and feature sets in 0.14.0, both on the API and on proto entities. Please use owners instead.
Starting with release 0.11.0, the retrieve method respects minor versions of feature sets. That means that running retrieve on version 1.3 retrieves the data up to version 1.3. This ensures proper consistency for external tools depending on a specific feature set version. The data are also immutable in case of reverts: previously, when you reverted an ingest, the data retrieved for that feature set were different after that revert operation.
Consistent retrieval works as explained above for all ingestions and reverts performed starting with version 0.11.0. Retrieving a feature set from prior to version 0.10.0 results in the original behaviour.
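To make "up to version 1.3" concrete, the sketch below models which ingests a retrieve on a given version would see. It is a conceptual model only, not the backend implementation: Ingest and rows_visible_at are hypothetical, and the sketch assumes all ingests belong to one major version. Versions are compared as integer tuples, so 1.10 sorts after 1.3 rather than before it.

```python
from typing import List, NamedTuple, Tuple


class Ingest(NamedTuple):
    """Hypothetical record: the feature set version an ingest ran against, and its rows."""
    version: str
    rows: List[dict]


def parse_version(version: str) -> Tuple[int, int]:
    major, minor = version.split(".")
    return int(major), int(minor)


def rows_visible_at(ingests: List[Ingest], requested: str) -> List[dict]:
    """Conceptual model: a retrieve on `requested` sees only ingests up to that version."""
    limit = parse_version(requested)
    return [
        row
        for ingest in ingests
        if parse_version(ingest.version) <= limit
        for row in ingest.rows
    ]


history = [
    Ingest("1.2", [{"id": 1}]),
    Ingest("1.3", [{"id": 2}]),
    Ingest("1.10", [{"id": 3}]),  # a later minor version: 10 > 3 numerically
]
print(rows_visible_at(history, "1.3"))  # [{'id': 1}, {'id': 2}]
```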
From 0.9.0 to 0.10.0
The collection of historical policies has been removed and migrated to a new permissions collection. This collection contains information about previous permission updates. If a permission has been replaced by a new, higher permission, its state is PROMOTED. If the permission has been removed, its state is REVOKED.
ref.preview() has been deprecated and replaced with the new API command fs.get_preview(). This preview is computed and stored during the first ingestion. Until the upcoming Feature Store release 0.14.0, the get_preview() method will compute a missing preview and store it on the backend. We highly recommend that you run this method on previously existing feature sets before 0.14.0 to make sure that the preview is populated.
From 0.8.0 to 0.9.0
The optional arguments start_date_time and end_date_time of the Python CLI ingest / ingest_async methods have been removed, as they are no longer needed.
The optional arguments startDateTime and endDateTime of the Scala CLI ingest / ingestAsync methods have been removed, as they are no longer needed.
From 0.6.0 to 0.8.0
The GRPC method listProjects is now deprecated. Please switch to the listProjectsPage API, which uses pagination. To preserve backwards compatibility, we don't plan to remove the original method, but we strongly suggest using the paginated variant.
The GRPC methods ListFeatureSets and ListFeatureSetsAcrossProjects are now deprecated. Please switch to the ListFeatureSetsPage API, which uses pagination and replaces both of the former methods. To preserve backwards compatibility, we don't plan to remove the original methods, but we strongly suggest using the paginated variant.
The list methods in Python and Scala for projects and feature sets now return iterators instead of full collections starting from version 0.7.0.
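Code that called len() on the result or indexed into it needs adjusting. If the returned object is a plain iterator, it can be consumed only once. The sketch below uses a hypothetical generator, list_projects_stub, in place of a real client call to show the difference; the same consideration applies to client.auth.pats.list() from 0.15.0 onwards.

```python
from typing import Iterator, List


def list_projects_stub() -> Iterator[str]:
    """Hypothetical stand-in for a paginated list call that yields results lazily."""
    for page in (["project-a", "project-b"], ["project-c"]):  # two fake pages
        yield from page


projects = list_projects_stub()
first_pass: List[str] = [name for name in projects]
second_pass: List[str] = [name for name in projects]  # empty: the iterator is exhausted

# Materialize once if you need len(), indexing or several passes.
all_projects = list(list_projects_stub())
print(len(all_projects), all_projects[0])  # 3 project-a
```

Materializing with list() trades the memory savings of lazy pagination for random access, so it is best reserved for results you know are small.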
partitionPattern is now deprecated and has been removed from folder data sources.
From 0.5.0 to 0.6.0
All Proto and GRPC classes have been moved from the package ai.h2o.featurestore.core to the package ai.h2o.featurestore.api.v1.
If your code uses our GRPC API, please update your imports.
The GenerateToken RPC call now accepts a Proto timestamp as an expiration date instead of string representation.
From 0.4.0 to 0.5.0
The environment variables required to pass AWS credentials have been changed to a more generic name to support AWS S3 and S3 compatible sources like Minio, Google Cloud, etc.
Previously, you needed to set the following environment variables to read data from AWS:
export AWS_ACCESS_KEY="my aws key"
export AWS_SECRET_KEY="my secret"
export AWS_REGION="my region"
export AWS_ROLE_ARN="my role"
Now, to achieve the same, you set the following variables:
export S3_ACCESS_KEY="my aws key"
export S3_SECRET_KEY="my secret"
export S3_REGION="my region"
export S3_ROLE_ARN="my role"
We have also renamed the AWS credentials type passed to the clients from AWSCredentials to S3Credentials.
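Environments that still export the old AWS_* variables can derive the new names automatically. The following is a minimal sketch with a hypothetical script and helper name. It prints export statements instead of changing the environment itself, because a Python process cannot modify its parent shell. Any S3_* value that is already set takes precedence.

```python
import os
import shlex
from typing import Dict, Mapping

# Variable read before 0.5.0 -> variable read from 0.5.0 on.
AWS_TO_S3 = {
    "AWS_ACCESS_KEY": "S3_ACCESS_KEY",
    "AWS_SECRET_KEY": "S3_SECRET_KEY",
    "AWS_REGION": "S3_REGION",
    "AWS_ROLE_ARN": "S3_ROLE_ARN",
}


def s3_env_from_aws(env: Mapping[str, str]) -> Dict[str, str]:
    """Derive S3_* variables from AWS_* ones; S3_* values that already exist win."""
    derived = {new: env[old] for old, new in AWS_TO_S3.items() if old in env}
    derived.update({new: env[new] for new in AWS_TO_S3.values() if new in env})
    return derived


if __name__ == "__main__":
    # Print shell-quoted export statements for the current shell to evaluate.
    for name, value in s3_env_from_aws(os.environ).items():
        print(f"export {name}={shlex.quote(value)}")
```

Saved as, say, aws_to_s3_env.py (a hypothetical file name), it could be applied with eval "$(python aws_to_s3_env.py)".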
Derived feature sets
In 0.5.0, we introduced derived feature sets. Derived data sources (e.g., SparkPipeline, DriverlessAIMOJO, JoinFeatureSets) are now deprecated and will be removed in 0.6.0. As such, if you want to ingest new data into feature sets that use those derived data sources, you must migrate to derived feature sets instead.
To migrate to a derived feature set, a new version needs to be created using that derived schema with the selected transformation. Once this new version is created, ingestion is automatically triggered. This action will write all data from the parent feature set(s) with the applied transformation. The following example shows how to do this using the Python client:
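import featurestore.transformations as t

# Assumes `client` is a connected Feature Store client and that
# `parent_feature_set` and `feature_set_to_be_derived` are existing feature sets.
spark_pipeline_transformation = t.SparkPipeline("...")
spark_pipeline_schema = client.extract_derived_schema([parent_feature_set], spark_pipeline_transformation)
# Creating the new version triggers ingestion from the parent feature set(s).
derived_feature_set = feature_set_to_be_derived.create_new_version(schema=spark_pipeline_schema)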
To allow automatic ingestion on a derived feature set that uses DriverlessAIMOJO, the new parameter sparkoperator.driverlessAiLicenseKey needs to be added to the Helm values. It should contain your Driverless AI license (which is kept in Kubernetes secrets).
From 0.2.0 to 0.3.0
Prior to version 0.3.0, the partition pattern accepted the date{} syntax in folder data sources. This has been removed, as it is now obsolete due to several optimizations of the internal code.
Please update all your existing partition patterns, replacing date{..} with .*.
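Existing patterns can be rewritten with a single regular expression substitution. The following is a minimal sketch with a hypothetical helper name, assuming the braces in date{...} are never nested; the example pattern is made up for illustration.

```python
import re

# A legacy placeholder such as date{yyyy-MM-dd}; assumes braces are never nested.
_LEGACY_DATE = re.compile(r"date\{[^{}]*\}")


def migrate_partition_pattern(pattern: str) -> str:
    """Replace every legacy date{...} placeholder with the regex wildcard .*"""
    return _LEGACY_DATE.sub(".*", pattern)


print(migrate_partition_pattern("year=date{yyyy}/month=date{MM}"))  # year=.*/month=.*
```

Patterns without a date{...} placeholder are returned unchanged, so the helper can safely be run over every stored pattern.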
Feature set ingest API changes
Previously, when ingesting data from data sources that periodically change using the Python CLI, you would use the following API call:
fs.ingest(ingest_source, new_version_on_schema_change=True)
Now, to achieve the same, you use the following API:
new_schema = client.extract_from_source(ingest_source)
if not fs.schema.is_compatible_with(new_schema, compare_data_types=False):
    patched_schema = fs.schema.patch_from(new_schema, compare_data_types=False)
    new_feature_set = fs.create_new_version(schema=patched_schema, reason="schema changed before ingest")
    new_feature_set.ingest(ingest_source)
else:
    fs.ingest(ingest_source)
Previously, when ingesting data from data sources that periodically change using the Scala CLI, you would use the following API call:
fs.ingest(ingestSource, newVersionOnSchemaChange=true)
Now, to achieve the same, you use the following API:
val newSchema = client.extractSchemaFromSource(ingestSource)
if (!fs.schema().isCompatibleWith(newSchema, compareDataTypes = false)) {
  val patchedSchema = fs.schema().patchFrom(newSchema, compareDataTypes = false)
  val newFeatureSet = fs.createNewVersion(schema = patchedSchema, reason = "schema changed before ingest")
  newFeatureSet.ingest(ingestSource)
} else {
  fs.ingest(ingestSource)
}
Previously, when ingesting data from data sources that periodically change using the GRPC API, you would use the following API call:
val startIngestJobRequest = StartIngestJobRequest(featureSet = Some(featureSet), newVersionOnSchemaChange=true)
blockingStub.startIngestJob(startIngestJobRequest)
Now, to achieve the same, you use the following API:
// newSchema is the schema extracted from the new data source
val request = FeatureSetSchemaCompatibilityRequest(featureSet = Some(featureSet), newSchema = newSchema, compareDataTypes = false)
val response = blockingStub.isFeatureSetSchemaCompatible(request)
if (!response.isCompatible) {
  // Schema changed: patch it, create a new version and ingest into that version
  val schemaPatchRequest = FeatureSetSchemaPatchRequest(featureSet = Some(featureSet), newSchema = newSchema, compareDataTypes = false)
  val schemaPatchResponse = blockingStub.patchFeatureSetSchema(schemaPatchRequest)
  val patchedSchema = schemaPatchResponse.schema
  val createNewVersionRequest = CreateNewFeatureSetVersionRequest(featureSet = Some(featureSet), schema = patchedSchema, reason = "")
  val createNewVersionResponse = blockingStub.createNewFeatureSetVersion(createNewVersionRequest)
  val newFeatureSet = createNewVersionResponse.getFeatureSet
  val startIngestJobRequest = StartIngestJobRequest(featureSet = Some(newFeatureSet), ...)
  blockingStub.startIngestJob(startIngestJobRequest)
} else {
  // Schema unchanged: ingest into the existing feature set version
  val startIngestJobRequest = StartIngestJobRequest(featureSet = Some(featureSet), ...)
  blockingStub.startIngestJob(startIngestJobRequest)
}
Feature set schema API changes
Previously, when loading a schema from a feature set using the Python CLI, you would use the following API call:
schema = feature_set.get_schema()
Now, to achieve the same, you use the following API:
schema = feature_set.schema.get()
Previously, when loading a schema from a feature set using the Scala CLI, you would use the following API call:
val schema = feature_set.getSchema()
Now, to achieve the same, you use the following API:
val schema = feature_set.schema().get()
From 0.1.3 to 0.2.0
Prior to version 0.2.0, the feature type was determined as part of the statistics computation. Now, in version 0.2.0, you can specify the feature type using the schema API. The feature type can be specified explicitly or left empty (i.e., the backend will discover it automatically).
We have removed the Undefined feature type because each feature is now correctly assigned its feature type after being registered or after a new version is created. We have also introduced the Composite feature type, which is used for features containing nested features.
We have stopped the backend from automatically marking specific textual features with the Categorical feature type, since the logic behind it was not solid. Now, if you want to mark a feature as Categorical, please specify that explicitly during registration using the schema API.
For more information, please see the schema API documentation.
From 0.1.1 to 0.1.2
The Custom Resource Definition (CRD) has been changed.
The Python CLI method from_string has been renamed to create_from in the schema.
The Scala CLI argument maskedFeatures has been removed from the register feature set call. Please use the schema API to describe which features should be masked. For example:
Python:
schema["my_feature_name"].special_data.pci = True
project.register_feature_set(schema, "feature_set_name")
Scala:
schema("my_feature_name").specialData.pci = true
project.registerFeatureSet(schema, "feature_set_name")
The feature set type and feature type on the GRPC API have been migrated from strings to enums. This allows for better validation.
A new parameter, jobsCredentialsKey, was added to the Helm values. Please make sure to provide it. Supported sizes for this value are 16, 24, and 32 bytes.
core:
  ...
core:
  salt: Yy7c8pzqSJXw6LHbUnhQ1234
  jobsCredentialsKey: Yy7c8pzqSJXw6LHbUnhQ1234
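Rather than reusing the example value, you can generate a fresh key of a supported size. The following is a minimal sketch with a hypothetical helper name. It draws from ASCII letters and digits using the standard secrets module, so each character encodes to exactly one byte and the character count equals the byte size.

```python
import secrets
import string

ALLOWED_KEY_SIZES = (16, 24, 32)
_ALPHABET = string.ascii_letters + string.digits


def generate_jobs_credentials_key(size: int = 32) -> str:
    """Return a random ASCII key whose encoded length is `size` bytes."""
    if size not in ALLOWED_KEY_SIZES:
        raise ValueError(f"size must be one of {ALLOWED_KEY_SIZES}")
    # ASCII letters and digits are one byte each, so len(key) == size bytes.
    return "".join(secrets.choice(_ALPHABET) for _ in range(size))


print(generate_jobs_credentials_key(24))
```

The printed value can be placed under core.jobsCredentialsKey. Keep it in a secret store, since the example in this guide is public.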
From 0.1.0 to 0.1.1
Version change setter for feature removed
The Python CLI setter version_change and the Scala CLI setter versionChange on the feature have been removed. This setter was initially exposed by accident. It is not possible to update the version change directly; it is updated automatically on the backend.
Update metadata method removed on project and feature set
The Python CLI method update_metadata and the Scala CLI method updateMetadata have been removed from both the project and the feature set. To update the metadata, simply call the setter.
Previously, this was the call to update the feature set description:
Python:
feature_set.description = "new_description"
feature_set.update_metadata()
Scala:
featureSet.description = "new_description"
featureSet.updateMetadata()
Now, this is the call to achieve the same and to update the metadata on the client and backend:
Python:
feature_set.description = "new_description"
Scala:
featureSet.description = "new_description"
GRPC project API changes
We have removed the UpdateProjectMetadata call from the GRPC API and exposed specific API calls for each field that can be modified on the project.
Previously, to update the project description and the locked flag using the Scala API:
project.description = "new_description"
project.locked = true
val request = UpdateProjectMetadataRequest(project = Some(project))
blockingStub.updateProjectMetadata(request)
Now, to achieve the same, you use the following code for each field you need to update:
blockingStub.updateProjectDescription(ProjectStringFieldUpdateRequest(project.id, "new_description"))
blockingStub.updateProjectLocked(ProjectBooleanFieldUpdateRequest(project.id, true))
Previously, it was possible to accidentally modify fields which were not exposed for modification because the API transferred the full project object, but that is no longer possible with the new API.
GRPC feature set API changes
We have removed the UpdateFeatureSetMetadata call from the GRPC API and exposed specific API calls for each field that can be modified on the feature set.
Previously, to update a feature set description and feature status using the Scala API, you would use the following API:
featureSet.description = "new_description"
featureSet.features.find(_.name == "feature_name").get.status = "new_status"
val request = UpdateFeatureSetMetadataRequest(featureSet = Some(featureSet))
blockingStub.updateFeatureSetMetadata(request)
Now, to achieve the same, you use the following code for each field you need to update:
val featureSetHeader = FeatureSetHeader(projectId, internalFeatureSetId, internalFeatureSetVersion)
val descriptionUpdateRequest = FeatureSetStringFieldUpdateRequest(Some(featureSetHeader), "new_feature_set_description")
blockingStub.updateFeatureSetDescription(descriptionUpdateRequest)
val statusUpdateRequest = FeatureStringFieldUpdateRequest(Some(featureSetHeader), featureName, "new_status")
blockingStub.updateFeatureStatus(statusUpdateRequest)
Previously, it was possible to accidentally modify fields which were not exposed for modification because the API transferred the full feature set object, but that is no longer possible with the new API.