Spark dependencies
To interact with Feature Store from a Spark session, several dependencies need to be added to the Spark classpath. The supported Spark versions are 3.5.x.
Using S3 as the Feature Store storage:
io.delta:delta-spark_2.12:3.0.0
org.apache.hadoop:hadoop-aws:${HADOOP_VERSION}
HADOOP_VERSION is the Hadoop version your Spark distribution is built for. The version of the delta-spark library needs to match your Spark version; version 3.0.0 can be used with Spark 3.5.
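For illustration, these coordinates can be supplied when building a Spark session through the spark.jars.packages option. This is a minimal PySpark sketch; the Hadoop version 3.3.4 is an assumption, replace it with the version your Spark build uses.

from pyspark.sql import SparkSession

# Hadoop 3.3.4 is an assumed version; use the one your Spark build ships with.
spark = (
    SparkSession.builder
    .appName("feature-store-s3")
    .config("spark.jars.packages",
            "io.delta:delta-spark_2.12:3.0.0,"
            "org.apache.hadoop:hadoop-aws:3.3.4")
    .getOrCreate()
)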
Using Azure Gen2 as the Feature Store storage:
io.delta:delta-spark_2.12:3.0.0
featurestore-spark-dependencies.jar
org.apache.hadoop:hadoop-azure:${HADOOP_VERSION}
HADOOP_VERSION is the Hadoop version your Spark distribution is built for. The version of the delta-spark library needs to match your Spark version; version 3.0.0 can be used with Spark 3.5.
The Spark dependencies jar can be downloaded from the Downloads page.
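As a sketch of the Azure Gen2 setup, the Maven coordinates can again be passed through spark.jars.packages, while the downloaded featurestore-spark-dependencies.jar is added via spark.jars. The Hadoop version 3.3.4 and the jar path below are assumptions; adjust both to your environment.

from pyspark.sql import SparkSession

# Hadoop 3.3.4 and the jar location are assumptions, not fixed values.
spark = (
    SparkSession.builder
    .appName("feature-store-azure")
    .config("spark.jars.packages",
            "io.delta:delta-spark_2.12:3.0.0,"
            "org.apache.hadoop:hadoop-azure:3.3.4")
    .config("spark.jars", "/path/to/featurestore-spark-dependencies.jar")
    .getOrCreate()
)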
Using Snowflake as the Feature Store storage:
net.snowflake:spark-snowflake_${SCALA_VERSION}:2.12.0-spark_3.4
SCALA_VERSION is the Scala version used by your Spark distribution. The version of the spark-snowflake library needs to match your Spark version; version 2.12.0-spark_3.4 can be used with Spark 3.4.
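A minimal sketch of the Snowflake setup, assuming a Spark 3.4 build with Scala 2.12 (the usual Scala version for Spark 3.x distributions):

from pyspark.sql import SparkSession

# Scala 2.12 is assumed; match it to the Scala version of your Spark build.
spark = (
    SparkSession.builder
    .appName("feature-store-snowflake")
    .config("spark.jars.packages",
            "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.4")
    .getOrCreate()
)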
General configuration
Spark needs to be started with the following configuration to ensure that time travel queries return correct results:
spark.sql.session.timeZone=UTC
spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
When running Databricks 11.3 or higher, the following option needs to be set as well:
databricks.loki.fileSystemCache.enabled=false
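For illustration, the general configuration can also be applied when building a session from PySpark; the Databricks option above is a cluster-level setting and is therefore not part of this sketch.

from pyspark.sql import SparkSession

# Mirrors the general configuration options listed above.
spark = (
    SparkSession.builder
    .appName("feature-store")
    .config("spark.sql.session.timeZone", "UTC")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)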
If Apache Spark is not already running, start it first.