Machine Learning Model Deployment is the process of making your model available in production environments, so it can be used to make predictions for other software systems. Before model deployment, feature engineering occurs in preparing data that will later be used to train a model. Driverless AI Automatic Machine Learning (AutoML) combines the best feature engineering and one or more machine learning models into a scoring pipeline. The scoring pipeline is used to score or predict data when given new test data. The scoring pipeline comes in two flavors. The first scoring pipeline is a Model Object, Optimized(MOJO) Scoring Pipeline, a standalone, low-latency model object designed to be easily embeddable in production environments. The second scoring pipeline is a Python Scoring Pipeline, which has a heavy footprint that is all Python and uses the latest libraries of Driverless AI to allow for executing custom scoring recipes.
For this self-paced course, we will continue making use of the prebuilt experiment: Model_deployment_HydraulicSystem. The Driverless AI experiment is a classifier model that classifies whether the cooling condition of a Hydraulic System Test Rig is 3, 20, or 100. By looking at the cooling condition, we can predict whether the Hydraulic Cooler operates close to total failure, reduced efficiency, or full efficiency.
Hydraulic Cooling Condition | Description |
3 | operates at close to total failure |
20 | operates at reduced efficiency |
100 | operates at full efficiency |
The Hydraulic System Test Rig data for this self-paced course comes from the UCI Machine Learning Repository: Condition Monitoring of Hydraulic Systems Data Set. The data set was experimentally obtained with a hydraulic test rig. This test rig consists of a primary working and a secondary cooling-filtration circuit connected via the oil tank [1]. The system cyclically repeats constant load cycles (duration 60 seconds) and measures process values such as pressures, volume flows, and temperatures. The condition of four hydraulic components (cooler, valve, pump, and accumulator) is quantitatively varied. The data set contains raw process sensor data (i.e., without feature extraction), structured as matrices (tab-delimited) with the rows representing the cycles and the columns the data points within a cycle. Hydraulic System Test Rigs are used to test Aircraft Equipment components, Automotive Applications, and more [2]. A Hydraulic Test Rig can test a range of flow rates that can achieve different pressures with the ability to heat and cool while simulating testing under different conditions [3]. Testing the pressure, the volume flow, and the temperature is possible by Hydraulic Test Rig sensors and a digital display. The display panel alerts the user when certain testing criteria are met while displaying either a green or red light [3]. Further, a filter blockage panel indicator is integrated into the panel to ensure the Hydraulic Test Rig's oil is maintained [3]. In the case of predicting cooling conditions for a Hydraulic System, when the cooling condition is low, our prediction will tell us that the cooling of the Hydraulic System is close to total failure, and we may need to look into replacing the cooling filtration solution soon.
By the end of this self-paced course, you will predict the cooling condition for a Hydraulic System Test Rig by deploying an embeddable MOJO Scoring Pipeline into Java Runtime using Java, Sparkling Water, and PySparkling.
Figure 1: Hydraulic System Cylinder Diagram
[1] Condition monitoring of hydraulic systems Data Set
[2] SAVERY - HYDRAULIC TEST RIGS AND BENCHES
[3] HYDROTECHNIK - Flow and Temperature Testing Components
# Create a directory where the mojo-pipeline folder will be stored
mkdir $HOME/dai-mojo-java/
Download MOJO Scoring Pipeline
1. If you have not downloaded the MOJO Scoring Pipeline, consider the following steps:
When finished, come back to this self-paced course.
2. Move the mojo.zip file to the dai-mojo-java/
folder and then extract it:
cd $HOME/dai-mojo-java/
# Depending on your OS, sometimes the mojo.zip is unzipped automatically and therefore, instead of mojo.zip, write mojo-pipeline for the first command. If it's mojo-pipeline no need to execute the unzip command.
mv $HOME/Downloads/mojo.zip .
unzip mojo.zip
3. Download and install Anaconda:
# Download Anaconda (Note: the command is for a Linux environment)
wget https://repo.anaconda.com/archive/Anaconda3-2020.02-Linux-x86_64.sh
# Install Anaconda (Note: the command is for a Linux environment)
bash Anaconda3-2020.02-Linux-x86_64.sh
# (Mac)) To Download and Install Anaconda follow the steps on this link: https://docs.anaconda.com/anaconda/install/mac-os/
4. Create virtual environment and install required packages:
# Install Python 3.6.10 and create virtual environment
conda create -y -n model-deployment python=3.6
# Activate the virtual environment
conda activate model-deployment
# Install Java
conda install -y -c conda-forge openjdk=8.0.192
# Install Maven
conda install -y -c conda-forge maven
# Install NumPy
pip install numpy
5. Set the Driverless AI License Key as a temporary environment variable:
Note: If you don't have a license, you can obtain one through our 21-day trial license option. Note: Aquarium will not contain a Driverless AI License Key. Through the 21-day trial license option, you will be able to obtain a temporary Driverless AI License Key necessary for this self-paced course.
# Set Driverless AI License Key
export DRIVERLESS_AI_LICENSE_KEY="{license-key}"
1.Download and install Spark if not already installed from Sparks Download page.
2. Point SPARK_HOME to the existing installation of Spark and export variable MASTER.
# (Make sure that Spark is unzipped and that the path points to spark-3.0.1-bin-hadoop2.7.)
export SPARK_HOME="/path/to/spark/installation"
# To launch a local Spark cluster.
export MASTER="local[*]"
3. Download Sparkling Water and then move Sparkling Water to the HOME folder and extract it:
cd $HOME
mv $HOME/Downloads/sparkling-water-3.30.1.2-1-3.0.zip .
unzip sparkling-water-3.30.1.2-1-3.0.zip
cd sparkling-water-3.30.1.2-1-3.0
After downloading the MOJO scoring pipeline, the mojo-pipeline folder comes with many files. The files needed to execute the MOJO scoring pipeline are as follows: pipeline.mojo, mojo2-runtime.jar, and example.csv. As well, in the mojo-pipeline folder, we can find the following file that helps run the pipeline relatively quickly: run_example.sh. Further, the mojo-pipeline folder contains the pipeline.mojo file, which is the standalone scoring pipeline in MOJO format. This pipeline file contains the packaged feature engineering pipeline and the machine learning model. Further, the folder also includes a jar name mojo2-runtime.jar, the MOJO Java API. And to test our code, the file has a CSV name example.csv containing sample test data.
If you have gone through the earlier scoring pipeline deployment self-paced courses, you have seen how we deploy the MOJO Scoring Pipeline to a server or serverless instance. Some clients interact with the server to trigger it to execute the MOJO to make predictions. An alternative way to deploy the MOJO Scoring Pipeline is to embed it directly into the Java Runtime Environment where your application is running. So if you are building a Java application using an Integrated Development Environment (IDE) or a text editor, you can import the MOJO Java API. Then use it to load the MOJO, put your test data into a MOJO frame, perform predictions on the data, and return the results.
You will execute the MOJO scoring pipeline in the Java Runtime Environment using Java, PySparkling, and Sparkling Water.
You will run the run_example.sh script that came with the mojo-pipeline folder. This script requires the mojo file, CSV file, and license file. It runs the Java ExecuteMojo example program, and the mojo makes predictions for a batch of Hydraulic cooling conditions.
Since we already have our license file path specified as an environment variable, we will pass in the path to the following three files: run_example.sh, pipeline.mojo, and example.csv. Right after, we will run them to get our predictions.
cd $HOME/dai-mojo-java/mojo-pipeline/
bash run_example.sh pipeline.mojo example.csv
This classification output is the batch scoring done for our Hydraulic System cooling condition. You should receive classification probabilities for cool_cond_y.3, cool_cond_y.20, and cool_cond_y.100. The 3 means the Hydraulic cooler is close to operating at total failure, 20 means it is operating at reduced efficiency, and 100 means operating at full efficiency.
The results will give you a probability (a decimal value) for cool_cond_y.3, cool_cond_y.20, and cool_cond_y.100. After converting each decimal value to a percentage, note that the highest percentage per row will determine the type of cool_cond_y for that row.
Similarly, we could execute run_example.sh without passing arguments to it by creating temporary environment variables for the mojo pipeline file and an example CSV file path.
export MOJO_PIPELINE_FILE="$HOME/dai-mojo-java/mojo-pipeline/pipeline.mojo"
export EXAMPLE_CSV_FILE="$HOME/dai-mojo-java/mojo-pipeline/example.csv"
Now execute the run_example.sh, and you should get similar results as above.
bash run_example.sh
Likewise, we can also execute the ExecuteMojo Java application directly as below and get similar results as above:
java -Dai.h2o.mojos.runtime.license.key=$DRIVERLESS_AI_LICENSE_KEY -cp mojo2-runtime.jar ai.h2o.mojos.ExecuteMojo $MOJO_PIPELINE_FILE $EXAMPLE_CSV_FILE
Start PySparkling to enter the PySpark interactive terminal:
cd $HOME/sparkling-water-3.30.0.6-1-3.0
# Note: You might get the following error when executing the below command:
# "colorama" package is not installed, please install it as: pip install colorama
# "requests" package is not installed, please install it as: pip install requests
# "tabulate" package is not installed, please install it as: pip install tabulate
# "future" package is not installed, please install it as: pip install future
# If you get the above error, you need to install the above packages. Right after, try the below command again.
./bin/pysparkling --jars $DRIVERLESS_AI_LICENSE_KEY
Now that we are in the PySpark interactive terminal, we will import some dependencies:
# First, specify the dependency
import os.path
from pysparkling.ml import H2OMOJOPipelineModel,H2OMOJOSettings
We configured the H2O MOJO Settings to ensure the output columns are appropriately named. Now we will load the MOJO scoring pipeline:
# The 'namedMojoOutputColumns' option ensures the output columns are named properly.
settings = H2OMOJOSettings(namedMojoOutputColumns = True)
homePath = os.path.expanduser("~")
# Load the pipeline. 'settings' is an optional argument.
mojo = H2OMOJOPipelineModel.createFromMojo(homePath + "/dai-mojo-java/mojo-pipeline/pipeline.mojo", settings)
Next, load the example CSV data as a Spark's DataFrame
# Load the data as Spark's Data Frame
dataFrame = spark.read.csv(homePath + "/dai-mojo-java/mojo-pipeline/example.csv", header=True)
Finally, we will run batch scoring on the Spark DataFrame using mojo transform. Right after, we will get the scored data for the cool hydraulic condition:
# Run the predictions. The predictions contain all the original columns plus the predictions added as new columns
predictions = mojo.transform(dataFrame)
# Get the predictions for desired cols using array with selected col names
predictions.select([mojo.selectPredictionUDF("cool_cond_y.3"), mojo.selectPredictionUDF("cool_cond_y.20"), mojo.selectPredictionUDF("cool_cond_y.100")]).collect()
# Quit PySparkling
quit()
The MOJO predicted the Hydraulic System cooling condition for each row within the batch of Hydraulic System test data we provided. You should receive classification probabilities for cool_cond_y.3, cool_cond_y.20, and cool_cond_y.100. The 3 means the Hydraulic cooler is close to operating at total failure, 20 means it is operating at reduced efficiency, and 100 means operating at full efficiency.
Accordingly, that is how you execute the MOJO scoring pipeline to do batch scoring using PySparkling.
Start Sparkling Water to enter Spark's interactive terminal:
cd $HOME/sparkling-water-3.30.0.6-1-3.0
./bin/sparkling-shell --jars $DRIVERLESS_AI_LICENSE_KEY
Now that we are in the Spark interactive terminal, we will import some dependencies:
// First, specify the dependency
import ai.h2o.sparkling.ml.models.{H2OMOJOPipelineModel,H2OMOJOSettings}
Now configure the H2O MOJO Settings to ensure the output columns are correctly named. As well, load the MOJO scoring pipeline:
// The 'namedMojoOutputColumns' option ensures the output columns are named properly.
val settings = H2OMOJOSettings(namedMojoOutputColumns = true)
val homePath = sys.env("HOME")
// Load the pipeline. 'settings' is an optional argument.
val mojo = H2OMOJOPipelineModel.createFromMojo(homePath + "/dai-mojo-java/mojo-pipeline/pipeline.mojo", settings)
Next, load the example CSV data as a Spark's DataFrame.
// Load the data as a Spark's DataFrame
val dataFrame = spark.read.option("header", "true").csv(homePath + "/dai-mojo-java/mojo-pipeline/example.csv")
Finally, we will run batch scoring on the Spark DataFrame using mojo transform; with it, we will get the scored data for cool efficiency:
// Run the predictions. The predictions contain all the original columns plus the predictions.
val predictions = mojo.transform(dataFrame)
# Get the predictions for desired cols sep by comma with selected col names
predictions.select(mojo.selectPredictionUDF("cool_cond_y.3"), mojo.selectPredictionUDF("cool_cond_y.20"), mojo.selectPredictionUDF("cool_cond_y.100")).show()
# Quit Sparkling Water
:quit
The MOJO predicted the Hydraulic System cooling condition for each row within the batch of Hydraulic System test data we provided. You should receive classification probabilities for cool_cond_y.3, cool_cond_y.20, and cool_cond_y.100. The 3 means the Hydraulic cooler is close to operating at total failure, 20 means it is operating at reduced efficiency, and 100 means operating at full efficiency.
With the above in mind, that is how you execute the MOJO scoring pipeline to do batch scoring using Sparkling Water.
The mojo can also predict a Hydraulic System cooling condition for each individual Hydraulic System row of test data. Moving forward, we will build a Java program to execute the mojo to do interactive scoring on individual Hydraulic System rows.
Create a MojoDeployment folder and go into it:
cd $HOME/dai-mojo-java
mkdir MojoDeployment
cd MojoDeployment
Make sure the java runtime file mojo2-runtime.jar and pipeline.mojo is located in this folder:
cp $HOME/dai-mojo-java/mojo-pipeline/mojo2-runtime.jar .
cp $HOME/dai-mojo-java/mojo-pipeline/pipeline.mojo .
In the H2O documentation Driverless AI MOJO Scoring Pipeline - Java Runtime, they give us a Java code example to predict a CAPSULE value from an individual row of data; we need to modify this code for our Hydraulic System data. Create a Java file called ExecuteDaiMojo.java.
Based on our Hydraulic System example.csv data, we can take the header row and a row of data to replace the data in rowBuilder in the Java code example. So, the Java code example becomes:
import java.io.IOException;
import ai.h2o.mojos.runtime.MojoPipeline;
import ai.h2o.mojos.runtime.frame.MojoFrame;
import ai.h2o.mojos.runtime.frame.MojoFrameBuilder;
import ai.h2o.mojos.runtime.frame.MojoRowBuilder;
import ai.h2o.mojos.runtime.lic.LicenseException;
import ai.h2o.mojos.runtime.utils.CsvWritingBatchHandler;
import com.opencsv.CSVWriter;
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.Writer;
public class ExecuteDaiMojo {
public static void main(String[] args) throws IOException, LicenseException {
// Load model and csv
String homePath = System.getProperty("user.home");
final MojoPipeline model = MojoPipeline.loadFrom(homePath + "/dai-mojo-java/mojo-pipeline/pipeline.mojo");
// Get and fill the input columns
final MojoFrameBuilder frameBuilder = model.getInputFrameBuilder();
final MojoRowBuilder rowBuilder = frameBuilder.getMojoRowBuilder();
rowBuilder.setValue("psa_bar", "155.6405792236328");
rowBuilder.setValue("psb_bar", "104.91106414794922");
rowBuilder.setValue("psc_bar", "0.862698495388031");
rowBuilder.setValue("psd_bar", "0.00021100000594742596");
rowBuilder.setValue("pse_bar", "8.370246887207031");
rowBuilder.setValue("psf_bar", "8.327606201171875");
rowBuilder.setValue("motor_power_watt", "2161.530029296875");
rowBuilder.setValue("fsa_vol_flow", "2.0297765731811523");
rowBuilder.setValue("fsb_vol_flow", "8.869428634643555");
rowBuilder.setValue("tsa_temp", "35.32681655883789");
rowBuilder.setValue("tsb_temp", "40.87480163574219");
rowBuilder.setValue("tsc_temp", "38.30345153808594");
rowBuilder.setValue("tsd_temp", "30.47344970703125");
rowBuilder.setValue("pump_eff", "2367.347900390625");
rowBuilder.setValue("vs_vib", "0.5243666768074036");
rowBuilder.setValue("cool_eff_pct", "27.3796");
rowBuilder.setValue("cool_pwr_kw", "1.3104666471481323");
rowBuilder.setValue("eff_fact_pct", "29.127466201782227");
frameBuilder.addRow(rowBuilder);
// Create a frame which can be transformed by MOJO pipeline
final MojoFrame iframe = frameBuilder.toMojoFrame();
// Transform input frame by MOJO pipeline
final MojoFrame oframe = model.transform(iframe);
// `MojoFrame.debug()` can be used to view the contents of a Frame
// oframe.debug();
// Output prediction as CSV
final Writer writer = new BufferedWriter(new OutputStreamWriter(System.out));
final CSVWriter csvWriter = new CSVWriter(writer, '\n', '"', '"');
CsvWritingBatchHandler.csvWriteFrame(csvWriter, oframe, true);
}
}
Paste the above Java code to your ExecuteDaiMojo.java. Move the java file to the MojoDeployment folder.
Now we have our Java code, let's compile it:
javac -cp mojo2-runtime.jar -J-Xms2g ExecuteDaiMojo.java
Now that the ExecuteDaiMojo.class has been generated, run this Java program to execute the MOJO:
java -Dai.h2o.mojos.runtime.license.file=$DRIVERLESS_AI_LICENSE_KEY -cp .:mojo2-runtime.jar ExecuteDaiMojo
Note: Windows users run
java -Dai.h2o.mojos.runtime.license.file=license.sig -cp .;mojo2-runtime.jar ExecuteDaiMojo
Note:
The MOJO predicted the cooling condition for the individual row of Hydraulic System test data we passed to it. You should receive classification probabilities for cool_cond_y.3, cool_cond_y.20, and cool_cond_y.100. The 3 means the Hydraulic cooler is close to operating at total failure, 20 means it is operating at reduced efficiency, and 100 means operating at full efficiency.
So that is how you execute the MOJO scoring pipeline to do interactive scoring using Java directly.
You could do something that helps you in your daily life or job. Maybe you could reproduce the steps we did above, but for a new experiment or dataset. In that case, you could either decide to do batch scoring, interactive scoring, or both.
Another challenge could be to use the existing MOJO scoring pipeline we executed. Instead of using the examples, we shared above, integrate the scoring pipeline into an existing Java, Python, or Scala program.