Moving AI from the Data Center to Edge or Fog Computing

Extending AI possibilities beyond the data center

The need for autonomous systems, and for systems that can augment human work, is rapidly increasing across all industries, driven by exciting and powerful edge or fog computing solutions. The availability of cheaper sensors of many types (such as video, sound, pressure, light, ultrasonic, and so on) that can provide very accurate measurements at high precision and resolution, along with the network latency constraints of sending all this data to a data center, also drives this need.

Looking at our history, the transformation from Industry 2.0 (based on the division of labor and the use of electricity) to Industry 3.0 was driven by electronics and the IT automation of production. All the electronic equipment that we use today will need some form of intelligence to filter the information it gathers, provide better recommendations, and make initial decisions that augment human work. This transformation can be achieved with embedded AI in those systems (for example, medical devices, vehicles, drones, satellites, and so on).

Many questions related to this transformation arise, such as:

  • How will I deploy and manage the algorithms?
  • How will I collect relevant data from those systems?
  • How will I optimize those algorithms for edge devices?
  • How will I be able to explain algorithm recommendations and decisions for audit purposes and improvements (that is, corner cases)?
  • How will I manage the lifecycle (training in the data center and deployment on edge or fog computing)?

Intelligence deals with information, and therefore following the data path is the key to creating a sustainable AI architecture for edge or fog computing.

Figure 1. AI architecture concept from data center to edge/fog

We are interested in an architecture that can easily move trained models from the data center or cloud to edge or fog computing for inference, while at the same time collecting the metadata created during the inference process for audit purposes or for identifying corner cases. All of this needs to be done in a secure way and under a governance model that spans all four of the scenarios shown in the following figure.

Figure 2. Inference scenarios

As a proof of concept, let’s take IBM® Maximo Visual Inspection (formerly known as PowerAI Vision) and see how this architecture can be applied to detecting and counting flying drones in a specific area.

In this example, we will:

  • Import the data set.
  • Train the model.
  • Test the inference process on-premises running Maximo Visual Inspection on an IBM Power® AC922 server.
  • Deploy the model for inference outside the data center in a Jetson TX2 system.

Importing and labeling data

You can use any existing data set that is already labeled. The import process takes a few minutes to complete, depending on the size of the data set. In my case, the data set contains images with over 1,600 labeled drones.

For information about creating and labeling a data set, see Creating and working with datasets and Labeling datasets in the IBM Knowledge Center.

Figure 3. Labeled data set with drones

Training the model

For object detection, Maximo Visual Inspection currently provides Faster R-CNN, Tiny YOLO v2, and custom model options for training.

You can find instructions at Training your model in the IBM Knowledge Center.

After many iterations, we were able to obtain a good model by tuning the hyperparameters available in the Advanced options of Maximo Visual Inspection, such as the learning rate and weight decay. We achieved a good intersection over union (IoU) score and mean average precision (mAP), as you can see in the following figure provided by Maximo Visual Inspection.

Figure 4. Maximo Visual Inspection trained model statistics

Deploying and testing the model

To validate the Faster R-CNN model, we want to test it with the test images we have. Therefore, we used the Maximo Visual Inspection inference capabilities to deploy the model in the inference section, allocating the right GPU in the Power AC922 system and creating an API endpoint that applications can call (a sketch of calling such an endpoint follows Figure 5). In addition, we are able to count the number of drones that we detect in an image. The same inference process can be used for videos.

Figure 5. Maximo Visual Inspection server – model inference and APIs

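As a minimal sketch, assuming a typical PowerAI Vision-style REST endpoint, the deployed model can be called from Python as shown below. The endpoint URL and the response field names ("classified", "label", "confidence") are assumptions for illustration; copy the exact API endpoint that Maximo Visual Inspection displays for your deployed model.

    # Minimal sketch: call a deployed Maximo Visual Inspection inference endpoint
    # and count the drones it detects. The URL format and response field names
    # are assumptions -- use the endpoint shown in the Maximo Visual Inspection UI.
    import requests

    MVI_ENDPOINT = "https://mvi-server.example.com/api/dlapis/<deployed-model-id>"  # hypothetical

    def count_drones(image_path, min_confidence=0.8):
        with open(image_path, "rb") as f:
            # verify=False only if the server uses a self-signed certificate
            response = requests.post(MVI_ENDPOINT, files={"files": f}, verify=False)
        response.raise_for_status()
        detections = response.json().get("classified", [])
        drones = [d for d in detections
                  if d.get("label") == "drone" and float(d.get("confidence", 0)) >= min_confidence]
        return len(drones), drones

    if __name__ == "__main__":
        count, boxes = count_drones("TEST-IMAGES/test1.jpg")
        print("Detected %d drone(s)" % count)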

Exporting the model

After successfully testing the model with several images in the Maximo Visual Inspection inference section, you can export the model as a ZIP file from the model section by selecting the trained model and clicking the Export button. The model ZIP file, 23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2.zip, can then be transferred to the Jetson TX2 through Secure Copy Protocol (SCP).

Jetson TX2 inference

The NVIDIA Jetson TX2 (or TX2i) is an embedded system-on-module (SoM) with a dual-core NVIDIA Denver 2 plus a quad-core Arm Cortex-A57, 8 GB of 128-bit LPDDR4 memory, and an integrated 256-core Pascal GPU that provides 1 TFLOPS of FP16 compute performance in less than 8 watts of power.

The SoM is also equipped with 32 GB of eMMC storage and a 4Kp60 H.264/H.265 encoder/decoder.

Figure 6. NVIDIA Jetson TX2 and TX2i

For the proof of concept, I used the Jetson TX2 developer kit with JetPack 3.3, which includes support for TensorRT 4.0, CUDA 9.0, cuDNN v7.1.5, and the multimedia APIs. For Maximo Visual Inspection models, we need to run Caffe, TensorFlow, or YOLO v2 on the TX2, depending on how the training was done and which model (embedded or user provided) was selected.

In this case, I used Caffe on the TX2. There are many guides available with build instructions (especially on JetsonHacks); because I trained a Faster R-CNN model, I needed to build the py-faster-rcnn fork of Caffe (caffe-fast-rcnn) with its Python bindings, plus OpenCV 3.4.1, which is the layout reflected by the paths in the docker run command later in this article.

Because I want to have the ability to run different frameworks such as TensorFlow, Caffe, or YOLO without worrying about library version conflicts, I decided to create Docker containers and run the inference process inside them. In addition, using Docker containers makes updating the frameworks and libraries easier: updated images can be pushed to a private or public registry and then pulled onto the device.

Figure 7. Jetson TX2 development kit

I have created two sample containers that you can use by running one of the following commands on the TX2:

$ docker pull ticlazau/aarch64-caffe-frcnn
or
$ docker pull ticlazau/aarch64-tensorflow

To verify the Docker image, issue the docker images command and check the image name and size.

  REPOSITORY            TAG      IMAGE ID       CREATED      SIZE
  aarch64-caffe-frcnn   latest   2ad0f96817f4   8 days ago   5.79 GB

For this container to work, you need to specify the Python path (/root/project/py-faster-rcnn/caffe-fast-rcnn/python) before running it.

In addition, you need to establish a policy around persistent data for the Docker container, such as passing in a trained model, obtaining inference logs, and so on. In my case, I created a folder called project in my home folder and then created subfolders named after the various projects (trained models), for example, ~nvidia/project/DroneDetection. Each subfolder contains the extracted exported model (a small sketch after the following list shows one way to unpack and check these files):

  • .json file
  • caffemodel file
  • classname file
  • prototxt file
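
Here is a small sketch of unpacking the exported archive into such a project subfolder and checking that the files listed above are present. The folder layout follows the convention described above; the exact location of each file inside the ZIP may differ slightly.

    # Sketch: unpack an exported Maximo Visual Inspection model and check that
    # the files the inference script expects are present. Adjust paths as needed.
    import glob
    import os
    import zipfile

    def unpack_model(zip_path, project_dir):
        if not os.path.isdir(project_dir):
            os.makedirs(project_dir)
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(project_dir)
        # One file of each kind is expected; report anything that is missing.
        for pattern in ("*.json", "*.caffemodel", "*classname*", "*.prototxt"):
            matches = (glob.glob(os.path.join(project_dir, pattern)) +
                       glob.glob(os.path.join(project_dir, "*", pattern)))
            print("%-14s -> %s" % (pattern, matches if matches else "MISSING"))

    unpack_model("23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2.zip",
                 os.path.expanduser("~/project/DroneDetection"))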

To run the aarch64-caffe-frcnn container with GPU support in Jetson TX2, we need to specify the following Linux® devices to Docker:

  • nvhost-ctrl
  • nvhost-ctrl-gpu
  • nvhost-prof-gpu
  • nvmap
  • nvhost-gpu
  • nvhost-as-gpu

For example:

  $ docker run -e PYTHONPATH=:/root/project/py-faster-rcnn/caffe-fast-rcnn/python -e LD_LIBRARY_PATH=:/usr/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu/tegra:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/root/opencv-3.4.1/build/lib --net=host -v /usr/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu -v /usr/local/cuda/lib64:/usr/local/cuda/lib64 --device=/dev/nvhost-ctrl --device=/dev/nvhost-ctrl-gpu --device=/dev/nvhost-prof-gpu --device=/dev/nvmap --device=/dev/nvhost-gpu --device=/dev/nvhost-as-gpu --device=/dev/video0:/dev/video0 -v /home/nvidia/project/DroneDetection:/data -it aarch64-caffe-frcnn:latest bash

To run an inference in the container, we need a simple Python script that reads the trained model and related object names from the JSON file and starts the inference process.

Figure 8. Python script

The VisionDetection class is responsible for parsing the model information based on the user input arguments, running the inference on Caffe with or without GPU acceleration, and detecting the objects in the image (for example: python aivision_tx2_objdetect.py --network_parameters).
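
The actual script is shown in Figure 8. The following is a simplified sketch of what such a script can look like, assuming the standard py-faster-rcnn Python API (caffe.Net plus fast_rcnn.test.im_detect). The argument names mirror the command lines below, while the JSON key and thresholds are illustrative rather than the exact Maximo Visual Inspection format.

    #!/usr/bin/env python
    # Simplified sketch of a Faster R-CNN inference script for the exported model.
    # It assumes the py-faster-rcnn layout baked into the container (PYTHONPATH
    # points at caffe-fast-rcnn/python and py-faster-rcnn/lib).
    import argparse
    import json

    import caffe
    import cv2
    import numpy as np
    from fast_rcnn.config import cfg
    from fast_rcnn.test import im_detect
    from fast_rcnn.nms_wrapper import nms

    def parse_args():
        p = argparse.ArgumentParser(description="Drone detection on Jetson TX2")
        p.add_argument("--model_file", required=True, help=".caffemodel weights")
        p.add_argument("--net_file", required=True, help="deploy prototxt")
        p.add_argument("--json_file", required=True, help="model properties JSON")
        p.add_argument("--label_file", required=True, help="class name file")
        p.add_argument("image", help="image to run the inference on")
        return p.parse_args()

    def main():
        args = parse_args()

        # Class names: background plus the labels from the exported classname file.
        with open(args.label_file) as f:
            classes = ["__background__"] + [line.strip() for line in f if line.strip()]

        # Model properties exported alongside the weights; the key name is illustrative.
        with open(args.json_file) as f:
            props = json.load(f)
        conf_thresh = float(props.get("confidence_threshold", 0.8))

        cfg.TEST.HAS_RPN = True           # the exported model generates its own proposals
        caffe.set_mode_gpu()
        caffe.set_device(0)
        net = caffe.Net(args.net_file, args.model_file, caffe.TEST)

        im = cv2.imread(args.image)
        scores, boxes = im_detect(net, im)   # scores: (R, K), boxes: (R, 4*K)

        total = 0
        for cls_ind, cls_name in enumerate(classes[1:], start=1):
            cls_boxes = boxes[:, 4 * cls_ind:4 * (cls_ind + 1)]
            cls_scores = scores[:, cls_ind]
            dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])).astype(np.float32)
            keep = nms(dets, 0.3)            # suppress overlapping detections
            for x1, y1, x2, y2, score in dets[keep, :]:
                if score < conf_thresh:
                    continue
                total += 1
                print("%s %.3f at [%d, %d, %d, %d]" %
                      (cls_name, score, int(x1), int(y1), int(x2), int(y2)))
        print("Total objects detected: %d" % total)

    if __name__ == "__main__":
        main()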

Run single drone detection

 $ python tools/aivision_tx2_objdetect.py --model_file=/data/deploy/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2.caffemodel --net_file=/data/deploy/test.prototxt --json_file=/data/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2-prop.json --label_file=/data/deploy/classname.txt /data/TEST-IMAGES/test3.jpg

Figure 9. Inference results

Run multiple drone detection

  $ python tools/aivision_tx2_objdetect.py --model_file=/data/deploy/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2.caffemodel --net_file=/data/deploy/test.prototxt --json_file=/data/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2-prop.json --label_file=/data/deploy/classname.txt /data/TEST-IMAGES/test1.jpg

Figure 10. Inference results

Looking at the inference results, we can produce the coordinates in the image where the objects have been identified (rather than only an image output with bounding boxes). This opens up the potential for real-time object detection, tracking, and a 3D positioning system with a multiple-camera deployment in a specific area. In addition, these coordinates can drive the movements of a robot arm aiming a high-powered laser pointer.
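
As a hypothetical illustration of that idea, the detections reported by each camera could be reduced to image-plane center coordinates before being handed to a tracking or triangulation stage (the data layout below is an assumption, not a Maximo Visual Inspection format):

    # Hypothetical helper: turn per-camera bounding boxes [x1, y1, x2, y2, score]
    # into center coordinates that a tracking or 3D-positioning stage could consume.
    def detection_centers(camera_id, detections):
        centers = []
        for x1, y1, x2, y2, score in detections:
            centers.append({"camera": camera_id,
                            "center": ((x1 + x2) / 2.0, (y1 + y2) / 2.0),
                            "score": score})
        return centers

    # Example: two detections reported by a camera labeled "cam-0"
    print(detection_centers("cam-0", [(120, 80, 220, 160, 0.97),
                                      (400, 300, 460, 350, 0.91)]))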

This can be implemented with the Jetson TX2 onboard camera or a USB camera, with or without motion detection. In the author's opinion, for such cases it is best to implement motion detection and run the inference process only when needed, to save power on the Jetson TX2. This can be accomplished by adding a software element to the GStreamer pipeline that detects when there is movement in the frames coming from the camera.
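
That GStreamer element is not shown here; the snippet below is a simplified stand-in for the same gating idea, using OpenCV frame differencing to decide when a frame is worth passing to the detector. The run_inference function is a placeholder for the Caffe inference code above, and the thresholds are arbitrary.

    # Simplified motion gate: only hand frames to the detector when consecutive
    # frames differ enough. A production version would live in the GStreamer
    # pipeline itself.
    import cv2

    def run_inference(frame):
        # Placeholder: call the Faster R-CNN detection code from the script above.
        print("motion detected -- frame handed to the detector")

    def motion_gated_capture(device=0, pixel_delta=25, changed_pixels=500):
        cap = cv2.VideoCapture(device)        # TX2 onboard or USB camera
        ok, prev = cap.read()
        if not ok:
            return
        prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            diff = cv2.absdiff(gray, prev_gray)
            mask = cv2.threshold(diff, pixel_delta, 255, cv2.THRESH_BINARY)[1]
            if cv2.countNonZero(mask) > changed_pixels:
                run_inference(frame)          # enough pixels changed: run the detector
            prev_gray = gray
        cap.release()

    motion_gated_capture()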

You can set the TX2 to maximum performance mode before running the inference:

$ sudo nvpmodel -m 0 && sudo ~/jetson_clocks.sh

As a result, the time taken to run the inference improves, and video streaming workloads will most likely benefit as well.

real 0m7.334s vs real 0m8.383s
user 0m5.612s vs user 0m5.812s
sys  0m1.888s vs sys  0m2.432s

Figure 11. Performance change

As of today, the same process and technique can also be applied to classification with Maximo Visual Inspection.

Conclusion

In conclusion, we have a way to accelerate the adoption of deep learning for computer vision when deploying on edge or fog computing. We can help companies bring AI to robotic arms, microscopes, and other devices much faster than ever before.

Acknowledgment

Special thanks to Carl Bender and Srinivas Chitiveli.