Retrieve API
To retrieve the data, first run:
Python:
ref = fs.retrieve(start_date_time=None, end_date_time=None)

Scala:
val ref = fs.retrieve(startDateTime="", endDateTime="")
Parameter explanation:
If start_date_time and end_date_time (startDateTime and endDateTime in Scala) are empty, all ingested data are fetched. Otherwise, these parameters are used to retrieve only a specific range of the ingested data. For example, when the ingested data fall in a time range T1 <= T2, start_date_time can have any value T3 and end_date_time can have any value T4, where T1 <= T3 <= T4 <= T2.
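For illustration, a minimal Python sketch of a range retrieval. The date-time string format shown is an assumption for the example, not a documented contract:

# Fetch everything: both bounds left as None.
ref_all = fs.retrieve(start_date_time=None, end_date_time=None)

# Fetch only a sub-range [T3, T4] of the ingested interval [T1, T2].
# The date-time string format here is assumed for illustration.
ref_range = fs.retrieve(
    start_date_time="2023-01-01 00:00:00",  # T3
    end_date_time="2023-06-30 23:59:59",    # T4, with T3 <= T4
)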
This call returns immediately with a retrieve holder that offers several ways to obtain the data. Based on the input parameters, the retrieval call searches the cache and tries to find the ingested data.
When using Snowflake as the backend storage for your data, it is important to understand how nested features are stored and retrieved. This note explains that process and how it differs between Snowflake and Delta Lake storage.
Nested features are stored using the VARIANT data type. For example, in a column named Person, a nested feature might be stored as follows:
{"Age": 5, "Name": "John"}
Retrieval in Spark or Parquet File
- When a user retrieves data stored in Snowflake using Spark or as a Parquet file, the structure is retained.
- In the retrieved data, the nested feature appears as a JSON string within the designated column and row.
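As an illustration, a minimal PySpark sketch of turning such a JSON string back into typed columns. The column name Person matches the example above; the schema is an assumption for this sketch:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, IntegerType, StringType

spark = SparkSession.builder.getOrCreate()

# Schema matching the nested feature shown above; assumed for this example.
person_schema = StructType([
    StructField("Age", IntegerType()),
    StructField("Name", StringType()),
])

# Stand-in for the retrieved frame: the Person column holds the nested
# feature as a JSON string, as described above.
df = spark.createDataFrame([('{"Age": 5, "Name": "John"}',)], ["Person"])

# Parse the JSON string into a struct, then flatten it into ordinary columns.
parsed = df.withColumn("Person", F.from_json("Person", person_schema))
parsed.select("Person.Age", "Person.Name").show()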
Understanding how Snowflake and Delta Lake handle nested features is crucial for seamless data storage, retrieval, and compatibility with different data processing tools. Whether it is the JSON format in Snowflake or the hierarchical column structure in Delta Lake, this information helps you use your chosen backend storage efficiently.
Downloading the files from Feature Store
You can download the data to your local machine using either a blocking or a non-blocking approach.
Blocking approach:

Python:
dir = ref.download()

Scala:
val dir = ref.download()
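For a quick look at the result, a minimal Python sketch of inspecting the downloaded directory; having pandas with a Parquet engine such as pyarrow installed is an assumption:

import os
import pandas as pd

# dir is the directory path returned by ref.download() above.
files = [f for f in os.listdir(dir) if f.endswith(".parquet")]

# Load one file for inspection (requires pyarrow or fastparquet).
df = pd.read_parquet(os.path.join(dir, files[0]))
print(df.head())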
Non-blocking approach:

Python:
future = ref.download_async()

Scala:
val future = ref.downloadAsync()
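A short sketch of consuming the Python future; that it follows the standard concurrent.futures interface is an assumption here, so check the Asynchronous methods section for the actual contract:

future = ref.download_async()
# ... do other work while the download runs in the background ...
dir = future.result()  # assumed interface: blocks until the download finishes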
More information about asynchronous methods is available in the Asynchronous methods section.
Both approaches download all produced data files (Parquet) into a newly created directory.
Obtaining data as a Spark Frame
You can also read the data from the retrieve call directly as a Spark frame:
Python:
ref = my_feature_set.retrieve()
data_frame = ref.as_spark_frame(spark_session)

Scala:
val ref = myFeatureSet.retrieve()
val dataFrame = ref.asSparkFrame(sparkSession)
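The returned object is a regular Spark DataFrame, so the usual operations apply. A brief sketch; the column name Age is hypothetical:

# Inspect the schema and a sample of rows.
data_frame.printSchema()
data_frame.show(10)

# Filter on a column; "Age" is a hypothetical column name for illustration.
adults = data_frame.filter(data_frame["Age"] > 18)
print(adults.count())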
Read more about Spark Dependencies in the Spark dependencies section.
Retrieving from online
To retrieve data from the online Feature Store, run:
Python:
json = feature_set.retrieve_online(key)

Scala:
val json = featureSet.retrieveOnline(key)
The key represents a specific primary key value for which the entry is obtained.
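A minimal Python sketch of using the result, assuming retrieve_online returns a JSON-encoded string; the return type and the key value shown are assumptions:

import json as jsonlib  # aliased so it is not shadowed by the variable above

# "customer-42" is a hypothetical primary key value.
result = feature_set.retrieve_online("customer-42")

# Assuming a JSON-encoded string is returned, decode it into a dict.
record = jsonlib.loads(result)
print(record)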