Time-series forecasting is one of the most common and important tasks in business analytics. Its goal is to predict the future values of a series from historical data: models extrapolate future values from previously observed ones.

Driverless AI has its own recipes for time-series forecasting that combine advanced time-series analysis with H2O's own Kaggle Grand Masters' time-series recipes. In this tutorial we will walk through the process of creating a time series experiment and compare the results to a pre-loaded time series experiment based on the same dataset but run with higher experiment settings.

Note: We recommend that you go over the entire tutorial first to review all the concepts. That way, once you start the experiment, you will be more familiar with the content.

You can get more information about getting a Driverless AI environment or trial from the following:

If you are not familiar with Driverless AI please review and do this tutorial:

About the Dataset

This dataset contains information about a global retailer. It includes historical data for 45 of its stores located in different regions of the United States from 02-05-2010 to 11-01-2012. Each numbered store contains a number of departments. The data also covers the store-specific markdown (promotional) events held throughout the year, which typically happen before prominent holidays such as the Super Bowl, Labor Day, Thanksgiving and Christmas. Additional information includes the weekly sales, the dates of those sales, the fuel price in the region, the consumer price index and the unemployment rate. The dataset was used in a 2014 Kaggle competition with the goal of helping this retailer forecast the sales of its stores.

Our training dataset is a synthesis of the csv data sources provided for the Kaggle Store Sales Forecasting competition [1]. The three datasets were train.csv, stores.csv and features.csv. train.csv has the store number, department, date, weekly sales and whether or not that date fell in a holiday week. stores.csv has the types of stores and their size, while features.csv has additional demographic information about the specific region each store is located in.
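
As a rough illustration of how three source files like these could be combined into a single training table, the sketch below joins tiny made-up stand-ins for train.csv, stores.csv and features.csv with pandas. The column names and all values are assumptions for illustration only, and this is not necessarily how the tutorial dataset was actually prepared.

```python
import pandas as pd

# Made-up stand-ins for the three source files (column names are assumed).
train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": ["2010-02-05", "2010-02-05", "2010-02-05"],
    "Weekly_Sales": [24000.0, 50000.0, 35000.0],
    "IsHoliday": [False, False, False],
})
stores = pd.DataFrame({
    "Store": [1, 2],
    "Type": ["A", "A"],
    "Size": [151315, 202307],
})
features = pd.DataFrame({
    "Store": [1, 2],
    "Date": ["2010-02-05", "2010-02-05"],
    "Fuel_Price": [2.57, 2.57],
    "CPI": [211.1, 210.8],
    "Unemployment": [8.1, 8.3],
})

# Store-level attributes join on Store; regional features join on Store and Date.
merged = (
    train.merge(stores, on="Store", how="left")
         .merge(features, on=["Store", "Date"], how="left")
)
print(merged.head())
```

Left joins keep every row of the sales table even if a store or date were missing from the other files, which makes gaps visible as missing values instead of silently dropping rows.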

The training dataset in this tutorial contains 73,165 rows and a total of 11 features (columns) and is about 5 MB. The test dataset contains about 16,000 rows and a total of 11 features (columns) and is about 1 MB.

Datasets Overview

If you are using Aquarium as your environment, then the following labs, Test Drive and Introduction to Driverless AI, will have this tutorial's training and test subsets of the Retail Store Forecasting dataset preloaded for you. The datasets will be located on the Datasets Overview page. You will also see two extra datasets, which you can ignore for now as they are used for another tutorial.

NOTE: To learn how to add the two datasets from the Driverless AI file system see Appendix A: Add the Datasets.

1. Verify that both datasets are on the Datasets Overview page. Your screen should look similar to the page below:

retail-store-train-test-datasets

2. Click on the walmart_tts_small_train.csv file, then on Details.

retail-store-train-detail-selection

3. Let's take a quick look at the columns of the training set:

retail-store-train-detail-page

Things to Note:

4. Continue scrolling the current page to see more columns (image is not included)

5. Return to the Datasets Page

Launch Experiment

As mentioned in the objectives, this tutorial includes a pre-ran experiment that has been linked to the Projects Workspace. Projects is a feature introduced in Driverless AI 1.7.0; it is a workspace for managing datasets and experiments related to a specific business problem or use case. The Projects page allows for easy comparison of performance and results and helps identify the best solution for your problem. See Deeper Dive and Resources at the end of this task for additional information on the Projects Workspace.

2. Select Projects. An image similar to the one below will appear:

projects-page

Things to Note:

  1. Projects: Projects menu option
  2. Pre-created Project which includes:

    • Name : Project name (Time Series Tutorial)
    • Description: Optional (N/A)
    • Train Datasets: Number of train datasets (1)
    • Valid Datasets: Number of validation datasets (0)
    • Test Datasets: Number of test datasets (1)
    • Experiments: Number of experiments (1)
  3. Additional options for the created project:

    • Open
    • Rename
    • Delete
  4. +New Project: Option to create a new project

3. Open the Time Series Tutorial project. An image similar to the one below will appear:

projects-page-time-series

The project "Time Series Tutorial" has the pre-ran time series experiment linked. This includes:

    • All the datasets used in the pre-ran experiment
    • Completed Experiment

4. Select New Experiment , located on the top-right corner of the page.

projects-new-experiment

5. Select Not Now on the "First time Driverless AI, Click Yes to get a tour!" prompt. An image similar to the one below should appear; then select "Click to select or import a dataset..."

new-project-training-data

6. Select the walmart_tts_small_train.csv dataset:

new-project-select-train-dataset

Name your experiment: Time Series Forecasting

7. A similar experiment page will appear:

retail-store-predict

In Task 2, we will explore and update the Time Series Experiment Settings.

References

[1] Walmart Recruiting - Stores Sales Forecasting

Deeper Dive and Resources

H2O - Projects Workspace

In this task, we are going to update the experiment settings. Unlike the other experiments covered in this tutorial series, the experiment settings layout for time series is slightly different, and there is an additional component: time. The following experiment settings will be adjusted to run through the mechanics of running a time series experiment.

Experiment settings to be updated:

Below are high-level descriptions of the Driverless AI settings that will be updated for this time series tutorial. To learn more about each scorer, see the Deeper Dive and Resources at the end of this task.

Test Dataset
The test dataset is used for testing the modeling pipeline and creating test predictions. The test set is never used during training of the modeling pipeline. (Results are the same whether a test set is provided or not.) If a test dataset is provided, then test set predictions will be available at the end of the experiment. Adding the test dataset also gives Driverless AI a hint about the expected horizon and gap: it measures the length of the test dataset, as well as when it starts relative to the end of the training data, to decide on these values.

Weight Column
Column that indicates the observation weight (a.k.a. sample or row weight), if applicable. This column must be numeric with values >= 0. Rows with higher weights have higher importance. The weight affects model training through a weighted loss function and affects model scoring through weighted metrics. The weight column is not used when making test set predictions, but a weight column (if specified) is used when computing the test score during training.
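
To make the idea of a weighted metric concrete, here is a minimal sketch of a weighted RMSE written with NumPy. The function name and the exact formula are illustrative assumptions; this shows the general technique, not Driverless AI's internal implementation.

```python
import numpy as np

def weighted_rmse(actual, predicted, weights):
    """Weighted root mean squared error (illustrative helper, not a Driverless AI API)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be >= 0")
    # Each squared error counts in proportion to its row weight.
    return float(np.sqrt(np.sum(weights * (actual - predicted) ** 2) / np.sum(weights)))

equal = weighted_rmse([10, 20], [12, 20], [1, 1])     # both rows count the same
emphasis = weighted_rmse([10, 20], [12, 20], [3, 1])  # the badly predicted row counts more
ignored = weighted_rmse([10, 20], [12, 20], [0, 1])   # a zero weight removes that row's error
print(equal, emphasis, ignored)
```

Raising the weight of the poorly predicted row raises the score, and a weight of zero removes that row's contribution entirely.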

Time Column
Column that provides a time order (time stamps for observations), if applicable. Can improve model performance and model validation accuracy for problems where the target values are auto-correlated with respect to the ordering (per time-series group).

The values in this column must be a datetime format understood by pandas.to_datetime(), like "2017-11-29 00:30:35" or "2017/11/29", or integer values. If [AUTO] is selected, all string columns are tested for potential date/datetime content and considered as potential time columns. If a time column is found, feature engineering and model validation will respect the causality of time. If [OFF] is selected, no time order is used for modeling and data may be shuffled randomly (any potential temporal causality will be ignored).
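
The snippet below is a small sketch of how one might check, outside Driverless AI, whether a column's values can be parsed by pandas.to_datetime(). The helper `datetime_fraction` is a hypothetical name introduced here for illustration; it is not how the [AUTO] detection is actually implemented.

```python
import pandas as pd

def datetime_fraction(values):
    """Share of values that pandas.to_datetime can parse (hypothetical helper)."""
    parsed = [pd.to_datetime(v, errors="coerce") for v in values]
    return sum(not pd.isna(p) for p in parsed) / len(parsed)

dates = ["2017-11-29 00:30:35", "2017/11/29"]  # the two formats quoted above
labels = ["north", "south"]                    # ordinary strings, not dates

first = pd.to_datetime(dates[0])
second = pd.to_datetime(dates[1])
date_share = datetime_fraction(dates)
label_share = datetime_fraction(labels)
print(first, second, date_share, label_share)
```

Each value is parsed on its own so that a column mixing several date formats is still recognized; values that cannot be parsed become NaT instead of raising an error.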

Time Groups

Time Groups are categorical columns in the data that can significantly help predict the target variable in time series problems. An example of time groups would be a combination of customer and product (assuming each has its own history), where you might want to see if a customer wants to buy one of your specific products. You can look into the direct time series and view how many times a customer has bought that particular product at past time points. The two time groups (customer and product) or multiple time series can be blended together in Driverless AI.

Scorers

A scorer is a function that takes actual and predicted values for a dataset and returns a number. Looking at this single number is the most common way to estimate the generalization performance of a predictive model on unseen data by comparing the model's predictions on the dataset with its actual values. For a given scorer, Driverless AI optimizes the pipeline to end up with the best possible score for this scorer. We highly suggest experimenting with different scorers and studying their impact on the resulting models[1].
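
As a minimal sketch of what "a function that takes actual and predicted values and returns a number" looks like, the following defines two common regression scorers, RMSE and R2, using their textbook formulas with NumPy. These are illustrative definitions, not the Driverless AI source code.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error: lower is better."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def r2(actual, predicted):
    """Coefficient of determination: 1 is perfect, 0 matches predicting the mean."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)       # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation
    return float(1.0 - ss_res / ss_tot)

actual = [1.0, 2.0, 3.0, 4.0]
perfect = r2(actual, actual)
mean_only = r2(actual, [2.5, 2.5, 2.5, 2.5])
reversed_order = r2(actual, [4.0, 3.0, 2.0, 1.0])
shifted_rmse = rmse(actual, [2.0, 3.0, 4.0, 5.0])
print(perfect, mean_only, reversed_order, shifted_rmse)
```

Note that with this definition R2 can be negative when the predictions are worse than simply predicting the mean of the actual values, as in the reversed example.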

The scorers available in Driverless AI are:

Forecast Horizon

Number of time periods to predict.

It is important to note that the following settings are essential for the development of a good model. For best model results, it is recommended to use the default settings given by Driverless AI. Please keep in mind that using an environment like Test Drive will limit you to a two-hour lab session. The default settings can lead to a run time of more than two hours.

Accuracy

Accuracy in time series forecasting determines the number of time-based validation splits. It also controls whether sampling will be used, the types of main machine learning models as well as the type of features included.

Time
It controls how long (as in how many iterations) Driverless AI will spend on trying to find:

  1. The best time series features
  2. Best models
  3. Best hyper parameters for these models

Interpretability

Controls the complexity of the models and features allowed within the experiments (e.g. higher interpretability will generally block complicated features and models).

Now we will update the experiment settings for our retail sales dataset.

retail-store-experiment-settings

1. Select Test Dataset, then select walmart_tts_small_test.csv

add-test-set

2. To start the time series experiment you need to select Time Column, then select Date.

Note: Once Date is set as the time column, Time Series is enabled and the Time Series Settings appear on the top-right side of the page.

add-time-column

3. Select Weight Column, then select sample_weight

add-weight-column

4. Select Target Column, then select Weekly_Sales

add-target-column

Under Time Series Settings located on the top-right side:

5. Select Time Groups Columns, then select the columns below, followed by: Done.

add-time-group-columns

6. Select Forecast Horizon, make sure the Forecast Horizon is 26 weeks and the gap between Train/Test Period is 0 weeks.

forecast-horizon-and-gap

7. Under Experiment Settings, click on Scorer,

expert-settings-scorer

then select R2 as the scorer:

add-scorer-r2

8. Under Experiment Settings, update Accuracy, Time and Interpretability to values below, then click on Launch Experiment:

Note: These settings were selected to conform to the Aquarium/Test Drive Environment. The goal is to walk through the mechanics of setting up a time series experiment. Having an interpretability of 10 means that we want a simple model that will be easy to interpret.

experiment-settings-6-1-5

9. Now review your experiment settings page and make sure it looks similar to the image below, then select Launch Experiment.

experiment-page-launch-experiment

References

[1] H2O's Driverless AI Scorer Tips

Deeper Dive and Resources

Time Series

A time series is a collection of observations generated sequentially through time. In a time series, data is ordered with respect to time, and successive observations are expected to be dependent; ocean tides are one example[1].

Characteristics of time series data:

The plots below are examples of non-stationary time series where the time series dataset shows seasonality and trends.

time-series-seasonal-and-linear

Time Series Analysis

Time series analysis helps answer questions such as: what is the causal effect on a variable Y of a change in X over time? The goal is to understand the dataset well enough to build mathematical models that provide plausible interpretations of the problem domain. In time-series analysis you are trying to determine the components of the dataset in terms of seasonal patterns, trends and relation to external factors. Models are developed to best capture or describe an observed time series in order to understand the underlying causes[1].

Time Series Forecasting

Time-series forecasting is one of the most common and important tasks in business analytics. Its goal is to predict the future values of a series from historical data: models extrapolate future values from previously observed ones.

Here is a short list of the many real-world applications of time-series:

Time Series Forecasting in Driverless AI

Driverless AI has its own recipes for time-series forecasting that combine advanced time-series analysis with H2O's own Kaggle Grand Masters' time-series recipes.

These are the key features/recipes that make the automation possible:

Driverless AI uses GBMs, GLMs and neural networks with a focus on time-series-specific feature engineering. The feature engineering includes:

The guiding principle for properly modeling a time series forecasting problem is to use the historical data in the model training dataset such that it mimics the data/information environment at scoring time (i.e. deployed predictions). Specifically, you want to partition the training set to account for:

  1. The information available to the model when making predictions
  2. The length of predictions to make.

Given a training dataset, gap and prediction length are parameters that determine how to split the training dataset into training samples and validation samples.

Gap is the number of missing time bins between the end of the training set and the start of the test set (with regard to time). For example:

time-series-gap

Quite often, it is not possible to have the most recent data available when applying a model (or it is costly to update the data table too often); hence models need to be built accounting for a "future gap". For example, if it takes a week to update a certain data table, ideally we would like to predict "7 days ahead" with the data as it is "today"; hence a gap of 7 days would be sensible. Not specifying a gap would amount to predicting 7 days ahead with the data as it will be 7 days from now, which is unrealistic (and cannot happen, as we update the data on a weekly basis in this example).

Similarly, a gap can be used by those who want to forecast further in advance. For example, users who want to know what will happen 7 days in the future would set the gap to 7 days.

Horizon (or prediction length) is the period that the test data spans (for example, one day, one week, etc.). In other words, it is the future period that the model can make predictions for.

time-series-horizon

The periodicity of updating the data may require model predictions to account for significant time in the future. In an ideal world where data can be updated very quickly, predictions can always be made having the most recent data available. In this scenario there is no need for a model to be able to predict cases that are well into the future, but rather focus on maximizing its ability to predict short term. However this is not always the case, and a model needs to be able to make predictions that span deep into the future because it may be too costly to make predictions every single day after the data gets updated.
In addition, each future data point is not the same. For example, predicting tomorrow with today's data is easier than predicting 2 days ahead with today's data. Hence specifying the horizon can facilitate building models that optimize prediction accuracy for these future time intervals.
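
The sketch below illustrates how a gap and a horizon could carve a validation period out of a training table: the last `horizon` periods become validation data and the `gap` periods before them are left out of training. The function name, column names and data are assumptions for illustration, not Driverless AI's actual splitting code.

```python
import pandas as pd

def split_train_valid(df, time_col, horizon, gap=0):
    """Hold out the last `horizon` periods and skip `gap` periods before them."""
    if horizon < 1 or gap < 0:
        raise ValueError("horizon must be >= 1 and gap must be >= 0")
    periods = pd.Index(df[time_col].unique()).sort_values()
    if horizon + gap >= len(periods):
        raise ValueError("not enough periods for the requested gap and horizon")
    valid_periods = periods[-horizon:]
    train_periods = periods[: len(periods) - horizon - gap]
    train = df[df[time_col].isin(train_periods)]
    valid = df[df[time_col].isin(valid_periods)]
    return train, valid

# Ten made-up weekly observations.
weeks = pd.date_range("2012-01-06", periods=10, freq="7D")
sales = pd.DataFrame({"Date": weeks, "Weekly_Sales": range(10)})

train, valid = split_train_valid(sales, "Date", horizon=3, gap=1)
print(len(train), len(valid))
```

With 10 weeks, a horizon of 3 and a gap of 1, the first 6 weeks are used for training, week 7 is skipped, and the last 3 weeks are held out, mimicking a model that must predict without the most recent week of data.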

Groups

A time-series dataset often contains multiple groups, each of which is a separate time series, combined in one table. Groups are categorical columns in the data that can significantly help predict the target variable in time series problems. For example, one may need to predict sales given information about stores and products, or just stores, or just products. Being able to identify that different combinations of store and product can lead to very different sales is key for predicting the target variable, as a big store or a popular product will have higher sales than a small store or an unpopular product.

For example, if we don't know that the store column is available in the data, and we look at the distribution of sales over time (with all stores mixed together), it may look like the chart below:

time-series-sales-per-day-all-groups

Note the format Date (Time), Group (Groups) and Target (Sales) plus other independent features. This is the ideal format the data needs to be in for Driverless AI Time-Series to work.

Lag

The primary generated time series features are lag features, which are a variable's past values. At a given sample with time stamp t, features at some time difference T(lag) in the past are considered. For example, if the sales today are 300, and sales of yesterday are 250, then the lag of one day for sales is 250. Lags can be created on any feature as well as on the target.

time-series-lag

Note: The top section is the original dataset with the training data, the gap, and the period we want to predict, also known as the test period.

As previously noted, the training dataset is split such that the number of validation data samples equals that of the testing dataset samples. To determine valid lags, we must consider what happens when we evaluate our model on the testing dataset. Essentially, the minimum lag size must be greater than the gap size.

Aside from the minimum usable lag, Driverless AI attempts to discover predictive lag sizes based on auto-correlation. "Lagging" variables are important in time series because knowing what happened in different time periods in the past can greatly facilitate predictions for the future.
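
The lag idea described above can be sketched with pandas as follows. Lags are computed per group so that one store's history never leaks into another's, and a small helper keeps only lag sizes larger than the gap. The function names, column names and data are illustrative assumptions, not Driverless AI internals.

```python
import pandas as pd

def add_lags(df, group_cols, time_col, target, lags):
    """Add per-group lag columns of `target` (illustrative helper)."""
    out = df.sort_values(group_cols + [time_col]).copy()
    for lag in lags:
        # shift() within each group moves past values forward in time.
        out[f"{target}_lag{lag}"] = out.groupby(group_cols)[target].shift(lag)
    return out

def usable_lags(candidates, gap):
    """Keep only lag sizes strictly greater than the gap."""
    return [lag for lag in candidates if lag > gap]

daily = pd.DataFrame({
    "Store": [1, 1, 1, 2, 2, 2],
    "Date": pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03"] * 2),
    "Sales": [200, 250, 300, 90, 100, 110],
})

lagged = add_lags(daily, ["Store"], "Date", "Sales", lags=[1, 2])
store1 = lagged[lagged["Store"] == 1]
store2 = lagged[lagged["Store"] == 2]
allowed = usable_lags([1, 2, 3, 7], gap=2)
print(lagged)
```

For store 1, sales on the last day are 300 and the lag of one day is 250, matching the example in the text. The first row of each store has no lag value because no earlier observation exists for that group.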

Validation Schemas

Driverless AI uses the most recent training data as the validation data. Data can be validated by the following validation schemas:

Below is an example of a time series dataset; we will use it to showcase some of the validation schemas:

validation-schema-dataset

Time Split

The number of time splits is highly dependent on the value of accuracy set on the experiment page. If the accuracy is set low when setting up the experiment, then Driverless AI selects a single time split, which in turn generates only one model for validation. A single time split takes the most recent data and makes it the validation data. The validation data will be the same size as the forecast horizon, and it will include a gap if one was specified.

Single Time Split

validation-schema-time-split

When accuracy is set to higher values, the number of time splits increases, Driverless AI does a more thorough cross validation, and we start generating multiple folds with a rolling window. A rolling window means that we keep shifting the validation set into the past and again use any data before it as the training dataset; this process is repeated multiple times. For example, when Accuracy is set to 10, the number of time splits increases to 6, which means there will be more rolling windows. The number of rolling windows depends on the accuracy setting.

Multi window

validation-schema-multi-window
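
A rolling-window scheme of this kind can be sketched in a few lines of plain Python. The function below works on period positions (0 is the oldest period) and assumes each fold's validation window is shifted back by exactly one horizon; that shift size and the function name are assumptions made for illustration, and Driverless AI's actual fold layout may differ.

```python
def rolling_time_splits(n_periods, horizon, gap=0, n_splits=3):
    """Return (train_positions, valid_positions) ranges, newest fold first."""
    splits = []
    for k in range(n_splits):
        valid_end = n_periods - k * horizon   # shift validation window into the past
        valid_start = valid_end - horizon
        train_end = valid_start - gap         # leave `gap` periods unused
        if train_end <= 0:
            break                             # no training data left for older folds
        splits.append((range(0, train_end), range(valid_start, valid_end)))
    return splits

single = rolling_time_splits(n_periods=12, horizon=3, gap=1, n_splits=1)
folds = rolling_time_splits(n_periods=12, horizon=3, gap=1, n_splits=3)
for train_idx, valid_idx in folds:
    print(list(train_idx), list(valid_idx))
```

With one split this reduces to the single time split described earlier; with three splits, each older fold trains on less history, and training data always ends at least one gap before its validation window starts.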

Time Series Feature Engineering

The following are the families of time series features that Driverless AI creates:

Date Decomposition extracts:

feature-engineering-date-decomposition

Lags: To predict the target, we can use the values of yesterday (lag1), two days ago (lag2) and three days ago (lag3) as features.

feature-engineering-lags

Windows: Another family of features is windows, which are combinations of different lags. For example, we can take an average or a moving average of three lags together, such as lag1, lag2 and lag3. It is useful to distinguish a standard average from a weighted moving average, where the most recent observation has a higher weight than the others, the idea being that what happened most recently will have a bigger impact on the target than events further in the past. We can also do this with exponential smoothing, applying an exponential decay of 0.95 (hyperparameter α) so that the most recent observation gets higher importance than those further in the past.

Windows can also be used to obtain other descriptive statistics such as:

feature-engineering-windows
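
To compare the three kinds of averages mentioned above, here is a small NumPy sketch that computes a plain mean, a linearly weighted mean and an exponentially weighted mean over three lags. How exactly the 0.95 decay is applied is an assumption here (weights 1, 0.95, 0.95², starting from the most recent lag), and the function name is hypothetical.

```python
import numpy as np

def window_features(lags, decay=0.95):
    """lags[0] is the most recent value (lag1), lags[1] is lag2, and so on."""
    lags = np.asarray(lags, dtype=float)
    n = len(lags)
    linear_w = np.arange(n, 0, -1)   # n, n-1, ..., 1: most recent weighs most
    exp_w = decay ** np.arange(n)    # 1, decay, decay**2, ...
    return {
        "mean": float(lags.mean()),
        "weighted_mean": float(np.average(lags, weights=linear_w)),
        "exp_weighted_mean": float(np.average(lags, weights=exp_w)),
    }

# lag1 = 300, lag2 = 250, lag3 = 200 (made-up values)
window = window_features([300, 250, 200])
print(window)
```

Because the most recent value is the largest in this example, both weighted variants land above the plain mean; with a decay as close to 1 as 0.95, the exponentially weighted mean stays much nearer to the plain mean than the linearly weighted one.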

Interactions: Interactions are features built from combinations of lag values. They are created in order to deseasonalize the data and to focus more on the differences between the data points than on the trend. For example, calculating the difference between lag1 and lag2 (Diff1 = lag1 - lag2) or looking at how the target has been changing proportionally in the past (Div1 = lag1/lag2).

feature-engineering-interactions
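
The two interaction features named above amount to simple arithmetic on lag values, as in this minimal sketch. The function name and the handling of a zero denominator are assumptions for illustration.

```python
def interaction_features(lag1, lag2):
    """Difference and ratio of two lag values (illustrative helper)."""
    diff1 = lag1 - lag2
    # Guard against division by zero; NaN marks an undefined ratio.
    div1 = lag1 / lag2 if lag2 != 0 else float("nan")
    return {"Diff1": diff1, "Div1": div1}

interactions = interaction_features(300.0, 250.0)
print(interactions)
```

Here the target rose by 50 units between the two lags, which is a 20% increase relative to lag2.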

Trends: Trends or correlation are used as another feature, where we take the lag values, plot them against time and observe the trend created (R2 value). Linear regression can also be used, where the coefficient or slope is taken and used as a feature to capture the tendency of the time series to go up or down.

feature-engineering-trends
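
A trend feature of this kind could be computed by fitting a straight line through recent values, as sketched below with NumPy's polyfit. The function name and the choice to return both the slope and the R2 of the fit are illustrative assumptions.

```python
import numpy as np

def trend_features(values_oldest_first):
    """Slope and R2 of a straight-line fit through the values over time."""
    y = np.asarray(values_oldest_first, dtype=float)
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, y, 1)   # least-squares line
    fitted = slope * t + intercept
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    # A constant series has no variation to explain; report 0 in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return {"slope": float(slope), "r2": float(r2)}

rising = trend_features([200, 250, 300])   # lag3, lag2, lag1 in time order
flat = trend_features([5, 5, 5])
print(rising, flat)
```

A positive slope signals that the series has been going up over the window, and the R2 of the fit indicates how consistently it has done so.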

Target transformations: Driverless AI also transforms the target so that, instead of modeling on the target (label) directly, we can model on, for example, the square root of the target. For example, when using RMSLE as the scorer, Driverless AI converts the target to the log of the target.
Other transformations include:

    • Square Root
    • Log

feature-engineering-target-transformations
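
The link between RMSLE and a log-transformed target can be checked numerically: RMSLE on the original scale equals RMSE on the log scale. The sketch below uses log(1 + y) via NumPy's log1p, which is the usual convention for RMSLE; whether Driverless AI uses exactly this transform is an assumption, and the values are made up.

```python
import numpy as np

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

def rmsle(actual, predicted):
    """Root mean squared logarithmic error, using log(1 + y)."""
    return rmse(np.log1p(actual), np.log1p(predicted))

actual_sales = np.array([100.0, 1000.0, 10000.0])
predicted_sales = np.array([110.0, 900.0, 12000.0])

# Model on the log scale, then map predictions back with the inverse transform.
log_actual = np.log1p(actual_sales)
log_predicted = np.log1p(predicted_sales)
restored = np.expm1(log_actual)

print(rmsle(actual_sales, predicted_sales), rmse(log_actual, log_predicted))
```

Training on the transformed target and inverting the transform at prediction time lets a model that minimizes squared error effectively optimize the log-based metric.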

References

[1] Applied Time Series and Box-Jenkins Models by Walter Vandaele page 3-5

Deeper Dive and Resources

At the end of the experiment, a similar page will appear:

experiment-results-summary-page

Things to Note:

  1. Status: Complete
    • Deploy To Cloud
    • Interpret this Model - Launches Model Interpretation on time series data for multiple groups
    • Diagnose Model on New Dataset... - allows you to view model performance for multiple scorers based on existing model and dataset
    • Score on another Dataset - After you generate a model, you can use that model to make predictions on another dataset
    • Transform Another Dataset.. - Not available for Time Series experiments
    • Download Predictions
      • Training Predictions - In csv format, available if a validation set was NOT provided
      • Test Set Predictions - In csv format, available if a test dataset was provided
    • Download Python Scoring Pipeline - A standalone Python scoring pipeline for H2O Driverless AI
    • Build MOJO Scoring Pipeline - A standalone Model Object, Optimized scoring pipeline
    • Download Experiment Summary - An experiment summary is available for each completed experiment as zip file
    • Download Logs
    • Download Autoreport
  2. Iteration Data - Validation
    • Validation Score - 0.7642
    • Model Type: XGBoostGBM
    • Variable Importance
  3. Summary:
  4. Summary: See image below:

experiment-results-summary

experiment-results-actual-vs-predicted

experiment-results-residuals

Deeper Dive and Resources

1. Under the Status: Complete options, select Interpret this Model

interpret-this-model

2. While the model is being interpreted an image similar to the one below will appear:

mli-interpret-model

3. Once the "MLI Experiment is Finished" page comes up, select Yes, and an image similar to the one below will appear:

mli-time-series-explanations-and-debugging-1

mli-time-series-explanations-and-debugging-2

Things to Note:

  1. MLI TS HELP
    • Help Panel : This panel describes how to read and use the Time Series MLI page.
    • Hide Panel : To hide Help Panel, click on Hide Panel
    • Add Panel : add a new MLI Time Series panel. This allows you to compare different groups in the same model and also provides the flexibility to do a "side-by-side" comparison between different models.
    • MLI TS Docs : A link to the "Machine Learning Interpretability with Driverless AI" booklet.
  2. Time Series Model
    • Download Logs : Download a zip file of the logs that were generated during this interpretation
    • Show Summary : Button provides details about the experiment settings that were used
    • Download Group Metrics : retrieve the averages of each group's scorer, as well as each group's sample size.
    • Input Box : this box lists the ID of the current model. The ID value can be changed to view other models. This can be done by adding a panel and searching in the input box for the new ID.
    • Time Series Plot : If the test set includes actual results, then a time series plot will be displayed
  3. Groups Test Metrics
    • Top Group Test Metrics : Top group metrics based on the scorer that was used in the experiment
    • Bottom Group Test Metrics : Bottom group metrics based on the scorer that was used in the experiment
    • Group Search : Entry field for selecting the groups to view. A graph of Actual vs Predicted values for the group will appear. This graph can be downloaded to your local machine.

4. Read the MLI TS Help panel to get a better idea on how to run the MLI on Time Series data for multiple groups, then click on Hide Help Panel.

5. Under Time Series Model click Show Summary, an image similar to the one below will appear:

mli-time-series-show-summary

Note: This is a summary of the experiment settings. It comes in handy when you want to compare the MLI settings/results to those of another model or dataset side by side.

6. Select Hide Summary

7. Hover over the Forecast Horizon of the R2 Time Series Plot.
Note: R2, or the coefficient of determination, is mainly used to analyze how well one variable can predict another. In other words, it is a statistical measure of how well the regression line approximates the real values. It represents the strength of the relationship between two time series or variables. The value is the proportion of the variation in one variable that can be explained by the other. R2 typically ranges between 0 and 1, where 1 indicates that the actual values are entirely explained by the predictions; it can fall below 0 when a model predicts worse than simply using the mean of the actual values.

mli-time-series-r2-plot

8. Under Top Groups Test Metrics and Bottom Group Test Metrics, which Department(s) and Store(s) had the top R2 values? How about the Department(s) and Store(s) with the lowest R2?

Note: Columns is the number of unique cases in the time series, composed of department and store, that appear in the test data.

mli-time-series-group-test-metrics
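
For intuition about what a per-group metric table contains, the sketch below computes R2 separately for each (Dept, Store) group and sorts the groups from best to worst. The column names, the helper names and the data are made up for illustration; this is not the code behind the MLI page.

```python
import pandas as pd

def r2_score(actual, predicted):
    ss_res = ((actual - predicted) ** 2).sum()
    ss_tot = ((actual - actual.mean()) ** 2).sum()
    return float(1.0 - ss_res / ss_tot)

def group_r2(df, group_cols, actual_col, predicted_col):
    """One row per group with its R2 and sample count, best groups first."""
    rows = []
    for key, grp in df.groupby(group_cols):
        key = key if isinstance(key, tuple) else (key,)
        row = dict(zip(group_cols, key))
        row["r2"] = r2_score(grp[actual_col], grp[predicted_col])
        row["count"] = len(grp)
        rows.append(row)
    return pd.DataFrame(rows).sort_values("r2", ascending=False).reset_index(drop=True)

scored = pd.DataFrame({
    "Dept": [3, 3, 3, 1, 1, 1],
    "Store": [12, 12, 12, 1, 1, 1],
    "actual": [10.0, 20.0, 30.0, 10.0, 20.0, 30.0],
    "predicted": [10.0, 20.0, 30.0, 20.0, 20.0, 20.0],
})

metrics = group_r2(scored, ["Dept", "Store"], "actual", "predicted")
print(metrics)
```

The top rows of such a table correspond to the best-predicted groups and the bottom rows to the worst. A group whose actual values never change would need special handling, since R2 is undefined there.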

9. On the Group Search box:

  1. Enter the following Dept and Store numbers: 3,12
  2. (Dept, Store) options will appear below the Group Search box, select 3,12
  3. A plot similar to the one below will appear with the actual and predicted values plotted for Department 3, Store 12:

mli-group-dept-3-12-actual-vs-predicted

  1. Hover over the Forecast Horizon and note the Actual plot in yellow and the Predicted plot in white. While there, hover over the peak point of the plot, then compare the actual vs predicted values generated by the model for Department 3, Store 12.
  2. This is the option to download the plot
  3. From the Actual vs Predicted chart find the peak point and double click on it, a local Shapley value will appear right below the plot:

mli-group-dept-3-12-peak-point-shapley-value

At exactly the peak, it is clear that the lag of 52 weeks is the most important feature driving the prediction that high.

  1. While at the Actual vs Predicted chart find a point somewhere at the plateau and double click on it, a local Shapley value will appear right below the plot:

mli-group-dept-3-12-plateau-point-shapley-value

  1. Explore other Departments and Stores Actual vs Predicted charts by clearing the "3, 12" value and entering another Department and Store in the Group Search box.

10. Go to the top of the page and:

  1. Select Add Panel
  2. On the new panel, click on the Select a model interpretation, then select the Time Series Model named : Time Series Forecasting - Experiment 2: dahecaga. This will bring in the pre-ran experiment's MLI results. Click on Show Summary for both experiments to compare experiment settings:

Note the Driverless AI Experiment Runtime for both experiments. The pre-ran experiment took more than seven hours to run.

mli-new-experiment-and-preran-experiment

  1. For the pre-ran experiment, enter Department 3, Store 12 and find the peak point as well as the Shapley values associated with the peak point. Compare the values of the experiment you ran to the pre-ran experiment:

mli-new-experiment-and-preran-experiment-2

When looking at both MLI results, we can see that for the pre-ran experiment the feature with the highest Shapley value at the peak was 33 EWMA Lag (Exponentially Weighted Moving Average, which calculates the exponentially weighted moving average of a target or feature lag), compared to the lag of 52 weeks for the new experiment. The feature that we see in the pre-ran experiment is a weighted moving average of what happened in various weeks over the course of 2 years; this is a more complex feature than the 52-week lag, which is expected because the pre-ran experiment built a more complex model. Although the 52-week lag helps make the prediction for a peak value more accurate, our more complex model is trained to predict any point in time, whereas our simple model makes predictions based mainly on the 1-year lag. Note that the 52-week lag is indeed one of the important variables in the complex model, but it is not the most important one.

  1. Find the Shapley values for a point on the plateau for the pre-ran experiment and compare the values between the pre-ran experiment and the new experiment MLI results.

Deeper Dive and Resources

Now we are going to take a look at the pre-ran Time-Series experiment and compare the results of the new experiment through the Projects Workspace:

1. Click on H2O.ai located at the top-left side of the MLI page, this will take you back to the Datasets Overview page.

2. Select Projects, then click on the Time Series Tutorial Project.

3. On the experiments section of the Projects page click on the pre-ran time-series experiment with name Time Series Forecasting - Experiment 2. The following image should appear:

pre-ran-experiment-settings-10-6-6

This experiment was run in another environment with similar parameters except for the following settings:

The above settings are the recommended settings for time-series problems; notice the higher accuracy and time and the lower interpretability compared to the settings from Task 2. Time-series experiments are special cases; as a result, it is highly encouraged that they are run with the default settings given by Driverless AI.

For a time-series experiment, an Accuracy of 10 is highly encouraged because it forces many time splits (time splits are critical for stability and prevent overfitting) and allows for multiple window validation. If you must run a time-series experiment with anything lower than 10, the lowest recommended setting for accuracy is 5.

Time is more flexible and can be run with the Driverless AI default value, with 3 being the lowest recommended time value. For interpretability, use the default value; good results are obtained with interpretability values of 5 or 6, and anything less than 5 will tend to overfit.

Summary of suggested settings for Time-Series Experiments:

Setting            Recommended   Lowest recommended
Accuracy           10            5
Time               Default       3
Interpretability   Default       5

One important thing to note is why we changed the scorer from RMSE, which Driverless AI suggested initially, to R2. We updated the scorer to R2 because, for this particular dataset, it is easier to generate similar results across different experiments, since we can expect less fluctuation and more stability in the results.

4. Click on < located at the top-left side of the Experiments page, this will take you back to the Project Time Series Tutorial page.

5. On the experiments section of the Projects page:

  1. Click on the pre-ran time-series experiment with name Time Series Forecasting - Experiment 2 and the name of the time-series experiment you ran for task 2
  2. Then select Compare 2 Items

comparing-two-items

6. A page with your experiment results and the results for the pre-ran experiment will show. An image similar to the one below will appear:

comparing-two-items-2

comparing-two-items-3

Things to Note:

  1. The experiment with the lower settings had fewer features scored compared to the pre-ran experiment. Driverless AI tested 45 features, of which only 7 were found useful, whereas the pre-ran experiment tested 2700 features and found 18 useful for feature engineering. At higher settings, Driverless AI does a more thorough evaluation.
  2. The lower settings experiment had an R2 value of 0.95146 compared to 0.95852 for the pre-ran experiment.
  3. The variables under variable importance for the low settings are very simple lags compared to the pre-ran experiment that has very sophisticated variables.
  4. On the Actual vs Predicted plots, the pre-ran experiment shows the points less dispersed compared to the low settings experiment. This translates to higher accuracy on the predictions.

8. We have two models, a complex model, and a simple model. The complex model performed better than the simple model, but yielded some features that are not very easy to interpret, thus making the model less interpretable. On the other hand, we have a simple model that produced intuitive features but had a lower score than the complex model. Choosing the "best" or most accurate model depends on the specific application, and one has to decide if they want:

Or

This decision needs to be made according to each particular case.

9. You have a finished model that you are satisfied with. What is next? What if you wanted to make predictions outside of the 26-week forecast horizon?

Some of the options are:

Learn more about Driverless AI's Test Augmentation by visiting H2O's documentation site.

Deeper Dive and Resources

Add the Datasets

Import H2O's training and test subsets of the Retail Store Forecasting dataset to the Datasets Overview Page.

1. Select +Add Dataset (or Drag and Drop), then click on File System

retail-store-add-dataset

2. Type the following path into the search bar: "/data/TimeSeries/walmart/"

3. Select the following sets from the list:

    • walmart_tts_small_test.csv
    • walmart_tts_small_train.csv

retail-store-import-datasets

4. Click to Import Selection

5. Verify that both datasets were added to the Datasets Overview. Your screen should look similar to the page below:

retail-store-train-test-datasets

Check out the next Driverless AI tutorial, Natural Language Processing Tutorial - Sentiment Analysis, where you will learn: