Feature set schedule API
You can schedule an ingestion job from Feature Store by using the API available on feature sets.
Schedule a new task
To create a new scheduled task, you first need to obtain the feature set.
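For example, a minimal Python sketch of obtaining a feature set (the client, project, and feature set lookup calls below belong to other parts of the Feature Store API; their exact names here are assumptions, not part of this page):
from featurestore import Client                  # assumed client entry point
client = Client("<feature-store-url>")           # connect to Feature Store
project = client.projects.get("my_project")      # assumed project lookup
fs = project.feature_sets.get("my_feature_set")  # feature set used for scheduling below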
- Python
- Scala
fs.schedule_ingest("task_name", source, schedule = "0 2 * * *", description = "", credentials = None, allowed_failures = 2)
The schedule argument is in cron format (e.g., 0 2 * * * will execute the task every day at 2 AM). The allowed_failures argument determines how many times the task may fail before a further failure deschedules it in order to save resources. A negative number means that any number of failures is allowed. The default value is 2.
fs.scheduleIngest("task_name", source, schedule = "0 2 * * *", description = "", credentials = null, allowedFailures = 2)
The schedule argument is in cron format (e.g., 0 2 * * * will execute the task every day at 2 AM). The allowedFailures argument determines how many times the task may fail before a further failure deschedules it in order to save resources. A negative number means that any number of failures is allowed. The default value is 2.
Scheduling an ingestion task is supported for all data sources except Spark dataframes. The data source used for scheduling must be stored in a permanently accessible location, which is not the case for Spark dataframes, as they live only in the memory of a particular Spark session.
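For illustration, a hedged Python sketch of a typical scheduling call (the source variable stands for any supported data source object created with the Feature Store data-source API; the task name, description, cron string, and failure limit are example values):
# 'source' must point to a permanently accessible location (not a Spark dataframe),
# e.g. a file-based data source created with the Feature Store data-source API.
fs.schedule_ingest(
    "nightly_ingest",                 # task name
    source,
    schedule="0 2 * * *",             # cron: minute hour day-of-month month day-of-week -> every day at 2 AM
    description="Nightly ingestion of raw events",
    allowed_failures=3,               # a negative number would allow unlimited failures
)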
To list scheduled tasks
List methods do not return tasks directly. Instead, they return an iterator which obtains the tasks lazily (see the iteration sketch after the examples below).
- Python
- Scala
fs.schedule.tasks()
fs.schedule().tasks()
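Because the result is a lazy iterator, you can loop over it directly; a small Python sketch:
# Tasks are fetched lazily as the loop advances over the iterator.
for task in fs.schedule.tasks():
    print(task)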
Obtaining a task
- Python
- Scala
task = fs.schedule.get("task_id")
val task = fs.schedule().get("task_id")
Examining task executions
Basic information about the task executions can be obtained by asking for the execution history. It provides the start/end times of the scheduled task runs and the final (job) status. A special status 'Created' is reported when a scheduled task has started but not yet finished. The accompanying job id can be used to access the job that performed the execution in the past.
- Python
- Scala
for execution_record in task.execution_history():
    print(execution_record)
task.executionHistory().foreach(print _)
Obtaining a lazy ingest task
The lazy ingest task allows you to schedule the ingestion of data for a feature set at a later time, rather than ingesting the data immediately. Each major version of a feature set can contain only one lazy ingest task. To obtain it, run:
- Python
- Scala
task = fs.schedule.get_lazy_ingest_task()
val task = fs.schedule().getLazyIngestTask()
Deleting a task
- Python
- Scala
task = fs.schedule.get("task_id")
task.delete()
val task = fs.schedule().get("task_id")
task.delete()
Updating task fields
To update a field, simply call the setter of that field, for example:
- Python
- Scala
task = fs.schedule.get("task_id")
task.description = "new description"
task.schedule = "0 6 * * *"
val task = fs.schedule().get("task_id")
task.description = "new description"
task.schedule = "0 6 * * *"
Controlling task liveness
If a task was scheduled with a limit on failures and that many failures actually occurred, the task is automatically paused by Feature Store in order to save resources. To check whether the task was paused, use the following call:
- Python
- Scala
task = fs.schedule.get("task_id")
task.is_paused()
val task = fs.schedule().get("task_id")
task.isPaused()
A task can also be paused manually if a user decides so.
- Python
- Scala
task.pause()
task.pause()
A paused task can be rescheduled again by calling the resume() method. The resume method takes an optional argument that sets a new limit on allowed failures. If the value isn't provided, the existing limit stays unchanged.
- Python
- Scala
task.resume(allowed_failures = None)
task.resume(allowedFailures = None)
To check the current limit on allowed failures, use:
- Python
- Scala
task.allowed_failures
task.allowedFailures
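Putting the calls above together, a small Python sketch that resumes a paused task with a higher failure limit (the limit value 5 is only an example):
task = fs.schedule.get("task_id")
# If the task was descheduled after repeated failures, resume it with a new limit.
if task.is_paused():
    print("Current limit on allowed failures:", task.allowed_failures)
    task.resume(allowed_failures=5)   # omit the argument to keep the existing limit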
Starting the lazy ingest task
If a lazy ingest task exists on a feature set, it will be run automatically on the first retrieve. The user also has the option to run it manually:
- Python
- Scala
fs.schedule.start_lazy_ingest_task()
fs.schedule().startLazyIngestTask()
If an ingest has already been executed on the feature set version, the lazy ingest task will not run.
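As a Python sketch of the manual flow (both calls are shown earlier in this section):
# Inspect the lazy ingest task of this major version, then trigger it manually
# instead of waiting for the first retrieve.
task = fs.schedule.get_lazy_ingest_task()
print(task)
fs.schedule.start_lazy_ingest_task()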
Timezone configuration for task
By default, Feature Store clients pick the system timezone. It is possible to change the timezone, for example:
- Python
- Scala
import os, time
os.environ['TZ'] = 'UTC-05:00'
import java.util.TimeZone
TimeZone.setDefault(TimeZone.getTimeZone("UTC-05:00"))
Supported timezone formats are UTC-XX:XX, UTC+XX:XX, or timezones supported by Python and Scala.
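A Python sketch combining the timezone override with scheduling (the TZ value and the reuse of the source object from the scheduling example above are examples/assumptions; the intent is that the cron schedule is then interpreted in the configured timezone):
import os

# Override the timezone picked up by the Feature Store client before scheduling.
os.environ['TZ'] = 'UTC-05:00'   # UTC-XX:XX, UTC+XX:XX, or a timezone name supported by Python

# 'source' is the same kind of data source object as in the scheduling example above.
fs.schedule_ingest("nightly_ingest_est", source, schedule="0 2 * * *")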