IBM API Hub
Automated AI for Decision-Making

By IBM Research Automated Decision Optimization

Automation for data-driven/knowledge-driven dynamic optimization problems including reinforcement learning (RL).


Getting access to the API

Overview

This tutorial walks you through the steps needed to use the "Automated AI For Decision-Making" service.

Step 1: Authentication

Each API request requires authentication information in the request header. The authentication credentials consist of a Client ID and Client secret. You can get these by clicking the Get trial subscription button in the top-right corner and registering for a free trial with your IBMid. Once your account is registered, navigate to APIs in your account dashboard. The dashboard can be accessed by clicking the user profile logo at the top right of the current page. Find the subscription to Automated AI for Decision-Making and click on it. Expand the dropdown to see the two credentials in the Key management section. Note that it might take up to an hour for the subscribed product to appear on the APIs page.

Step 2: Connection Check

To verify that your credentials are valid and the AutoDO service is reachable, start with the /health-check endpoint. Go to the Health Check tab in the left sidebar and click the Try this API button under the code snippet. Fill in the X-IBM-Client-Id and X-IBM-Client-Secret fields with the Client ID and Client secret you obtained in Step 1, and click Run request. If the service is connected and the credentials are correct, a response with the following message is returned:

{
  "status": "ok",
  "message": "AutoDO API server is up and running"
}

You can insert code snippets for different languages into your program by selecting your desired language in the Code snippet section.
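If you prefer to run the same check from Python, the following minimal sketch uses the requests module with the base URL and header names from the script in Step 3. It assumes the health check is served at /health-check under that base URL; substitute your own credentials.

import requests

AUTODO_API = "https://dev.api.ibm.com/autodo/test/autodo"
headers = { "X-IBM-Client-Id": "YOUR-CLIENT-ID-HERE",
            "X-IBM-Client-Secret": "YOUR-CLIENT-SECRET-HERE" }

# A 200 response whose body reports status "ok" means the credentials are valid and the service is up
resp = requests.get(f"{AUTODO_API}/health-check", headers=headers)
print(resp.status_code, resp.json())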

Step 3: Using the AutoDO API

The following Python example describes how to access each piece of the API. Further details can be found in the left-hand menu under Automated Decision Optimization -> AutoDO.

NOTE: the following script is not meant to be run as-is in its entirety. If you did so, it would create a new submission, show it in "queued" status, and then proceed to delete it. In typical usage, you would poll for completion, then get the results, then (optionally) delete the submission. You can also set this up to run in separate Jupyter notebook cells, and not proceed to the next cell until the job has completed.

# We'll use the requests module to send API requests to the server, and JSON to parse the results
import requests
import json

# We need the URL endpoint of the AUTODO service
AUTODO_API = "https://dev.api.ibm.com/autodo/test/autodo"

# Your email address is used to filter your submissions.  In the future, we plan to provide status
# updates and final results via email as well
email = "mummert@us.ibm.com"

# we need to identify ourselves here
headers = { "Content-type": "application/json",
            "X-IBM-Client-Id": "YOUR-CLIENT-ID-HERE",
            "X-IBM-Client-Secret": "YOUR-CLIENT-SECRET-HERE" }

# ** List My Submissions ** in the left-hand menu
# First, we'll check to see if we have any previous submissions.
# The first line makes the API request, and the subsequent two lines retrieve the list and print it.
req = requests.get(f"{AUTODO_API}/submissions?email={email}", headers=headers)
all_submissions = req.json()
print(json.dumps(all_submissions, indent=2))

# ** Create Basic Submission **
# Second, we make a new submission using one of the named environments.   We include our email so
# our submission can be tracked, and we specify whether this environment has discrete or continuous
# actions.   A successful response will return a submission UUID and initial **queued** job state.
params = { "email": email,
           "env": "CartPole-v0",
           "action_type": "discrete" }
req = requests.post(f"{AUTODO_API}/submissions", headers=headers, data=json.dumps(params))
submission = req.json()
submission_uuid = submission["submission_uuid"]          # we need this uuid for subsequent APIs
print(json.dumps(submission, indent=2))

# ** Get Submission Status **
# Third, as these may be long-running asynchronous jobs, we will occasionally poll for status, waiting for the job to
# go into the **Running** state and finally into the **Completed** state.
req = requests.get(f"{AUTODO_API}/submissions/{submission_uuid}", headers=headers)
info = req.json()
print(json.dumps(info, indent=2))
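
# (Illustrative polling sketch, not a separate API call.)  In practice you would repeat the status
# request above until the job reaches a terminal state.  The name of the state field and the exact
# state strings are assumptions here -- check the response printed above for the actual key and values.
import time
while info.get("state") not in ("Completed", "Failed"):
    time.sleep(60)                                   # wait a minute between polls
    req = requests.get(f"{AUTODO_API}/submissions/{submission_uuid}", headers=headers)
    info = req.json()
print(json.dumps(info, indent=2))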

# ** Get Submission Results **
# Once the job has completed successfully, you can retrieve the results, which consist of the top-k models and a Python script
# to load and evaluate the models locally.   As these models may be quite large, the results are provided as a downloadable URL.
req = requests.get(f"{AUTODO_API}/submissions/{submission_uuid}/results",headers=headers)
print(json.dumps(req.json(), indent=2))
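
# (Illustrative sketch.)  Because the results are provided as a downloadable URL, you would typically
# fetch that URL and save the archive locally.  The key name "results_url" is an assumption for
# illustration only -- inspect the printed response above for the actual field name.
results = req.json()
download_url = results.get("results_url")
if download_url:
    archive = requests.get(download_url)
    with open("autodo_results.zip", "wb") as f:      # save the model archive and evaluation script
        f.write(archive.content)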



# In addition to the Online RL submission shown above, we also support Offline/BatchRL, AutoMDP, and SafeRL environments.
# Each of these is submitted with a JSON configuration file that describes the data and each of the individual state/action/reward columns.



# ** BatchRL Submission **
# The parameters can also optionally include: 1) *agents*: a list of agents to use, 2) *top_k*: number of models to return,
#                                             3) *no_more_than*: return no more than this many models for any specific agent
params["email"] = email
# For multipart uploads, pass only the credential headers (requests sets the multipart Content-type itself)
credentials = { "X-IBM-Client-Id": headers["X-IBM-Client-Id"],
                "X-IBM-Client-Secret": headers["X-IBM-Client-Secret"] }
multipart_form_data = {
    "json_conf_file": ("json_conf_file", open("your_config_file.json", "rb")),
    "csv_data_file": ("csv_data_file", open("your_data_file.csv", "rb")),
    "params": (None, json.dumps(params)),
}
req = requests.post(
    f"{AUTODO_API}/batchrl_submissions",
    files=multipart_form_data,
    headers=credentials)



# ** AutoMDP Submission **
# The API is similar to BatchRL in that you provide a configuration file and a csv data file.
# This runs an MDP-based algorithm using CPLEX to generate models/policies.
# params can optionally include top_k
params["email"] = email
multipart_form_data = {
    "json_conf_file": ("json_conf_file", open("your_config_file.json", "rb")),
    "csv_data_file": ("csv_data_file", open("your_data_file.csv", "rb")),
    "params": (None, json.dumps(params)),
}
req = requests.post(
    f"{AUTODO_API}/automdp_submissions",
    files=multipart_form_data,
    headers=credentials)



# ** SafeRL Submission **
# Again, the API is similar to the above.
# The data must contain a state to be used as a constraint.
# Currently, a constrained_TD3_plus_BC agent is run to generate the models, but
# this may change or be supplemented in the future.
params["email"] = email
multipart_form_data = {
    "json_conf_file": ("json_conf_file", open("your_config_file.json", "rb")),
    "csv_data_file": ("csv_data_file", open("your_data_file.csv", "rb")),
    "params": (None, json.dumps(params)),
}
req = requests.post(
    f"{AUTODO_API}/saferl_submissions",
    files=multipart_form_data,
    headers=credentials)


# ** Delete Submission **
# Finally, if you want to **CANCEL** or **REMOVE** your job and any generated artifacts, you can call the DELETE API
req = requests.delete(f"{AUTODO_API}/submissions/{submission_uuid}", headers=headers)
print(json.dumps(req.json(), indent=2))

The above script shows how to set up the necessary request headers and how to use the primary API calls.

There are two additional calls available as well.

When making a submission, you can also restrict the search to a subset of the available agents by adding an agents list to the submitted params. The following code shows how to obtain the list of agents and what the params block would look like. Note that even if you use a single agent, it must be given as a list.

# ** List Available Agents**
# Return the current list of supported agents and the action-types each supports.
req = requests.get(f"{AUTODO_API}/agents", headers=headers)
print(json.dumps(req.json(), indent=2))

# To specify one or more specific agents in ** Create Basic Submission **, modify the params as follows:
params = { "email": email,
           "env": "CartPole-v0",
           "action_type": "discrete",
           "agents": ["AGENT-NAME-1", "AGENT-NAME-2"] }



Finally, if you want to evaluate your own custom environment, there is an API to do that in Create Custom Submission. However, at this time, all custom environments must be vetted by our team due to the security implications of running arbitrary user code on our backend. Please contact our team at dharmash@us.ibm.com for assistance.

For completeness, the custom submission is accessed like so:

# ** Create Custom Submission **
# In addition to the params, we also need to submit the file with our request.  To do so,
# we will use a multi-part form.   The following code sets up the form-data, and makes the request
params = { "action_type": "discrete", "email": email }
path = "local/path/to/your/file"
multipart_form_data = { "env": ("env", open(path, "rb")),
                        "params": (None, json.dumps(params)) }
req = requests.post(f"{AUTODO_API}/code_submissions", files=multipart_form_data, headers=headers)
submission = req.json()
print(json.dumps(submission, indent=2))



We are in the process of creating publicly available notebooks with code examples. See the section below on using the autodo_client Python module, which simplifies interaction with the API Hub.

Offline/Batch Reinforcement Learning

For some applications, a reliable simulator of the real environment may not exist, or it may be cost-prohibitive to build a reliable simulator. Offline RL (or batch RL) is a class of techniques that allow us to apply RL in such cases where online RL cannot be applied due to the lack of reliable simulators.

Unlike online RL, offline RL does not require a simulator but instead assumes the existence of data that records the interaction of typically unknown agents with the real environment. More specifically, the data should include information about what action was taken in what state and, as a result of taking that action, how much reward was obtained and what state the environment transitioned into. Note that the data may be collected by multiple agents or over multiple episodes (where an episode begins when an agent starts interacting with an environment and ends, for example, when it completes a task). Also, the data may contain only a part of an episode or multiple parts of multiple episodes. In other words, the data is a list of tuples of state, action, reward, next state, and whether the episode terminates at this "next state."
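
For illustration, a tiny dataset of this form could be represented in Python as a list of such tuples. The element values here are arbitrary; the actual file format and column names you submit are defined by the CSV data file and JSON configuration file described in Step 3.

# (state, action, reward, next_state, done) -- one tuple per recorded transition
transitions = [
    ((0.12, -0.30), 1, 1.0, (0.15, -0.28), False),
    ((0.15, -0.28), 0, 1.0, (0.13, -0.25), True),    # the episode terminates at this "next state"
]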

Given the data, AutoDO searches for the top-k agents along with their hyperparameters, similar to online RL. AutoDO searches over the algorithms implemented in d3rlpy (https://github.com/takuseno/d3rlpy), which are specifically designed for offline RL. Despite the differences in input and learning algorithms between offline RL and online RL, the agents trained with offline RL can be used in exactly the same manner as the agents trained by online RL.

Offline Markov Decision Process (MDP)

Classical MDP Approach

One of the approaches this asset uses to address offline problems is a "classical" MDP approach (https://books.google.co.il/books?hl=en&lr=&id=VvBjBAAAQBAJ&oi=fnd&pg=PT9&dq=puterman&ots=rslzANRZRP&sig=U-VtDte8Eaeg-hXDulwyDwiqcoI&redir_esc=y#v=onepage&q=puterman&f=false). This is a model-based approach. The MDP component receives a data set and a JSON file that describes the variables/features, such as name, type, and optionally range and binning intervals. Information about which variables represent states, actions, and the reward is provided in the same JSON file. For each state variable, it should also be specified whether the variable is controllable (actions taken can change its value) or uncontrollable (action independent). If such information is unknown or unavailable, the state variable should be marked as controllable.
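
The exact schema of this JSON file is defined by the service; the snippet below is only an illustrative sketch with hypothetical field names, showing the kind of information described above (variable names, types, ranges, binning intervals, roles, and controllability).

import json

# Hypothetical field names, for illustration only -- consult the service team for the exact schema.
config = {
    "variables": [
        {"name": "inventory_level", "type": "float", "range": [0, 100], "bins": 4,
         "role": "state", "controllable": True},
        {"name": "demand",          "type": "float", "range": [0, 50],  "bins": 3,
         "role": "state", "controllable": False},
        {"name": "order_quantity",  "type": "int",   "range": [0, 20],  "bins": 4,
         "role": "action"},
        {"name": "profit",          "type": "float", "role": "reward"},
    ]
}
with open("your_config_file.json", "w") as f:
    json.dump(config, f, indent=2)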

Receiving this input, the MDP component next estimates a transition probability matrix following the paper by Zadorojniy et al. (https://link.springer.com/article/10.1007/s10479-016-2146-z). To apply a classical MDP to large-scale problems, the asset first reduces the dimensionality of the model to a few state variables (i.e., 3-5). This is done by applying a machine learning (ML) algorithm (currently a decision tree (DT) is used). The purpose of this algorithm is to train on the available data, with state samples as inputs and actions as labels, in order to sort the list of state variables according to their importance with respect to the decisions made.

Since there are many possibilities for building an MDP with respect to the number of state variables and the binning of state and action variables, a searcher (i.e., HyperOpt) is used. When the number of possible configurations is small (e.g., fewer than 100), all of them are checked; otherwise 100 of them are sampled. The searcher is configured through the .yaml file; for example, the number of state variables is 2, 3, or 4, and the number of state and action variable bins is 2, 3, 4, or 5. For a configuration with 4 state variables, the 4 most important variables are chosen from the list built by the ML algorithm. Lastly, the output of the system, in terms of the top K best MDP solutions and/or configurations, is returned. For evaluating the quality of an MDP instance, a fitted Q-evaluation (FQE) algorithm is used. The higher the mean reward under FQE, the better the configuration, and the top K best configurations together with their solutions are returned. Since configuration generation and testing are performed automatically, we call this approach AutoMDP.

  • The system ingests input data with annotated states, actions, and rewards, plus a binning strategy for states and actions.

  • A predictive model is built to order, and later select, the most important state variables, reducing the search space in the subsequent steps.

  • A search algorithm performs hyperparameter tuning and optimization over the search space, iteratively generating and evaluating MDP instances.

  • The top k MDP solutions and/or configurations are selected as output.


Safe Reinforcement Learning

Standard RL (either online or offline) seeks to learn policies that maximize the expected cumulative reward. In inventory management, for example, one would want to maximize the expected cumulative profit, taking into account sales revenue as well as inventory and other costs. Even when there are multiple objectives, if one can define the weighted sum of those multiple objectives as a single objective (e.g., representing the monetary profit), one can learn policies with standard RL to maximize that single objective.

In some applications, one may also have constraints that cannot simply be added into the single objective. For example, there may be a service level agreement, and a cost is incurred when the service level does not meet the agreed level. Due to this nonlinearity, the cost associated with the violation of such a service level agreement cannot simply be added into the single objective and should instead be treated as a separate constraint.

Safe RL seeks to learn policies that maximize the expected cumulative reward while satisfying possibly multiple safety constraints. Each safety constraint is associated with a safety value as well as a safety threshold and requires that the expected cumulative safety value is at least (or at most) the safety threshold.

In AutoDO, only offline RL supports safe RL. For offline safe RL, the user needs to prepare the data that includes the columns corresponding to safety values, in addition to those columns that are included in the data for offline RL. That is, the data should be a list of a tuple of state, action, reward, safety values, next state, and whether an episode terminates at this “next state.” The user should also specify the safety threshold for each safety constraint.
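
For illustration, the offline safe RL data simply extends the offline RL tuples shown earlier with one safety value per constraint; again, the actual columns and their names are declared in your JSON configuration file.

# (state, action, reward, safety_values, next_state, done) -- one tuple per recorded transition
transitions = [
    ((0.12, -0.30), 1, 1.0, (0.95,), (0.15, -0.28), False),
    ((0.15, -0.28), 0, 1.0, (0.90,), (0.13, -0.25), True),
]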

Given the data for offline safe RL, AutoDO searches for the top-k agents along with their hyperparameters, similar to offline RL. Since there are safety constraints in addition to the objective, there is a choice of how to define the top-k agents. In AutoDO, the user specifies the weight of each safety constraint, and the agents are ordered based on the value of the objective function minus the weighted amount by which the safety constraints are violated.
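
As a rough illustration of this ordering (a sketch only, assuming "at least the threshold" constraints; the exact scoring used by the service may differ in how violations are measured):

# objective: expected cumulative reward of an agent
# safety_values, thresholds, weights: one entry per safety constraint
def ranking_score(objective, safety_values, thresholds, weights):
    # amount by which each "at least the threshold" constraint is violated, weighted by the user
    violation = sum(w * max(0.0, t - v)
                    for v, t, w in zip(safety_values, thresholds, weights))
    return objective - violation

# Agents are then sorted by this score, highest first, to obtain the top-k.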

Using the AutoDO client module

AutoDO Client Module for Python

The Python module autodo_client.py can be downloaded here.

AutoDO-API-client.ipynb is an example Jupyter notebook utilizing the above client.


Open API Specification

About specification
API Version

1.0.28
