Extending AI possibilities beyond the data center
The need for autonomous systems, and for systems that can augment human work, is rapidly increasing across industries, driven by powerful edge and fog computing solutions. This need is also driven by the availability of cheaper sensors covering a wide variety of signal types (such as video, sound, pressure, light, and ultrasonic) that can provide highly accurate, high-resolution measurements, along with the network latency constraints of sending all this data to a data center.
Looking at our history, the transformation from Industry 2.0 (based on the division of labor and the use of electricity) to Industry 3.0 was driven by electronics and the IT automation of production. The electronic equipment we use today will need some form of intelligence to filter the information it gathers, provide better recommendations, and make initial decisions that augment human work. This transformation can be achieved by embedding AI in those systems (for example, medical devices, vehicles, drones, and satellites).
Many questions related to this transformation arise, such as:
- How will I deploy and manage the algorithms?
- How will I collect relevant data from those systems?
- How will I optimize those algorithms for edge devices?
- How will I be able to explain algorithm recommendations and decisions for audit purposes and improvements (that is, corner cases)?
- How will I manage the lifecycle (training in the data center and deployment on edge or fog computing)?
Intelligence deals with information, and therefore following the data path is the key to creating a sustainable edge or fog computing AI architecture.
Figure 1. AI architecture concept from data center to edge/fog
We are interested in an architecture that can easily move trained models from the data center or cloud to edge or fog computing for inference, but that, at the same time, collects the metadata created during the inference process for audit purposes or the identification of corner cases. All of this needs to be done securely and under a governance model that spans all four of the following scenarios.
Figure 2. Inference scenarios
For a proof of concept, let’s take IBM® Maximo Visual Inspection (formerly known as PowerAI Vision) and see how this architecture can be applied to detecting and counting flying drones in a specific area.
In this example, we will:
- Import the data set.
- Train the model.
- Test the inference process on-premises running Maximo Visual Inspection on an IBM Power® AC922 server.
- Deploy the model for inference outside the data center on a Jetson TX2 system.
Importing and labeling data
You can use any existing data set that is already labeled. The import process takes a few minutes to complete, depending on the size of the data set. In my case, the data set contains over 1,600 labeled images of drones.
Figure 3. Labeled data set with drones
Training the model
Currently, Maximo Visual Inspection provides Faster R-CNN, Tiny YOLO2, and custom model options for object detection training.
You can find instructions at Training your model in IBM Knowledge Center.
After many iterations, we obtained a good model by tuning the hyperparameters available in the Advanced options of Maximo Visual Inspection, such as the learning rate and weight decay. We achieved a good intersection over union (IoU) and mean average precision (mAP), as you can see in the following figure provided by Maximo Visual Inspection.
Figure 4. Maximo Visual Inspection trained model statistics
Deploying and testing the model
To validate the Faster R-CNN model, we want to test it against the test images we have. Therefore, we used the Maximo Visual Inspection inference capabilities to deploy the model in the inference section, allocating the right GPU in the Power AC922 system and creating an API endpoint. In addition, we are able to count the number of drones detected in an image. The same inference process can be used for videos.
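Once the model is deployed behind an API endpoint, counting drones reduces to filtering the returned detections by label and confidence. The following is a minimal sketch of that counting step; the response layout (a "classified" list with label, confidence, and box fields) and the threshold value are assumptions for illustration, so verify the exact field names against your deployment's API documentation.

```python
# Sketch: count detected objects in an inference response. The "classified"
# list and its field names are assumptions about the REST response shape;
# check your Maximo Visual Inspection deployment's API docs.

def count_objects(response, label="drone", min_confidence=0.5):
    """Count detections of a given label at or above a confidence threshold."""
    return sum(
        1
        for det in response.get("classified", [])
        if det.get("label") == label and det.get("confidence", 0.0) >= min_confidence
    )

# Hypothetical response for an image with one confident and one weak detection:
sample = {
    "classified": [
        {"label": "drone", "confidence": 0.97,
         "xmin": 10, "ymin": 20, "xmax": 90, "ymax": 80},
        {"label": "drone", "confidence": 0.42,
         "xmin": 200, "ymin": 30, "xmax": 260, "ymax": 95},
    ]
}
print(count_objects(sample))  # 1 (the low-confidence detection is filtered out)
```

In practice, the response would come from posting an image to the API endpoint created in the inference section.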
Figure 5. Maximo Visual Inspection server – model inference and APIs
Exporting the model
After successfully testing the model with several images in the Maximo Visual Inspection inference section, you can export the model as a ZIP file from the model section by selecting the trained model and clicking the Export button. The model ZIP file, 23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2.zip, can then be transferred to a Jetson TX2 through Secure Copy Protocol (SCP).
Jetson TX2 inference
The NVIDIA Jetson TX2 (or TX2i) is an embedded system-on-module (SoM) with a dual-core NVIDIA Denver2 CPU plus a quad-core ARM Cortex-A57, 8 GB of 128-bit LPDDR4, and an integrated 256-core Pascal GPU providing 1 TFLOPS of FP16 compute performance in less than 8 watts of power.
This SoM is also equipped with 32 GB of eMMC storage and a 4Kp60 H.264/H.265 encoder/decoder.
Figure 6. NVIDIA Jetson TX2 and TX2i
For the proof of concept, I used the Jetson TX2 developer kit with JetPack 3.3 that includes support for TensorRT 4.0, CUDA 9.0, cuDNN v7.1.5, and multimedia APIs. For Maximo Visual Inspection models, we need to run Caffe, TensorFlow, or YOLO2 on the TX2, depending on how we do the training and based on what models (embedded or user provided) are selected.
In this case, I used Caffe on the TX2. Many guides with build instructions are available (especially on JetsonHacks), but because I trained a Faster R-CNN model, we needed to install:
- OpenCV 3.4.x
- py-faster-rcnn from https://github.com/rbgirshick/py-faster-rcnn
- Caffe from http://github.com/BVLC/caffe.git
Because I want the ability to run different frameworks such as TensorFlow, Caffe, or YOLO without worrying about library version conflicts, I decided to create Docker containers and run the inference process inside them. In addition, using Docker containers makes updating the frameworks and libraries easier, because updated images can be pushed to and pulled from a private or public registry.
Figure 7. Jetson TX2 development kit
I have created two sample container images that you can use by running the following commands on the TX2:
$ docker pull ticlazau/aarch64-caffe-frcnn
$ docker pull ticlazau/aarch64-tensorflow
To verify the Docker image, run the docker images command and check the image name and size.
REPOSITORY            TAG     IMAGE ID      CREATED     SIZE
aarch64-caffe-frcnn   latest  2ad0f96817f4  8 days ago  5.79 GB
For this container to work, you need to specify the Python path (/root/project/py-faster-rcnn/caffe-fast-rcnn/python) before running it.
In addition, you need to establish a policy around persistent data for the Docker container, such as passing in a trained model, obtaining inference logs, and so on. In my case, I created a folder called projects in my home folder and then created subfolders named after the various projects (trained models). Each subfolder contains the extracted exported model:
- .json file
- caffemodel file
- classname file
- prototxt file (that is, ~nvidia/project/DroneDetection)
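Before starting the container, it is worth checking that the extracted model folder actually contains the artifacts listed above. The following helper is a sketch of that sanity check and is not part of Maximo Visual Inspection; because exact file names vary per export, it matches by suffix.

```python
# Sketch: verify an extracted model folder contains the expected artifacts
# (.json properties, .caffemodel weights, .prototxt network, classname file).
# This helper is an illustration, not part of Maximo Visual Inspection.
from pathlib import Path

REQUIRED_SUFFIXES = [".json", ".caffemodel", ".prototxt"]

def check_model_dir(path):
    """Return the list of required artifact suffixes missing from the folder."""
    files = [f for f in Path(path).iterdir() if f.is_file()]
    missing = [s for s in REQUIRED_SUFFIXES
               if not any(f.suffix == s for f in files)]
    # The classname file may have no fixed extension; match it by name.
    if not any("classname" in f.name for f in files):
        missing.append("classname")
    return missing
```

Running check_model_dir("/home/nvidia/project/DroneDetection") before docker run catches a missing file early, instead of failing inside the container.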
To run the aarch64-caffe-frcnn container with GPU support in Jetson TX2, we need to specify the following Linux® devices to Docker:
$ docker run \
    -e PYTHONPATH=:/root/project/py-faster-rcnn/caffe-fast-rcnn/python \
    -e LD_LIBRARY_PATH=:/usr/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu/tegra:/usr/local/cuda/lib64:/root/opencv-3.4.1/build/lib \
    --net=host \
    -v /usr/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu \
    -v /usr/local/cuda/lib64:/usr/local/cuda/lib64 \
    --device=/dev/nvhost-ctrl \
    --device=/dev/nvhost-ctrl-gpu \
    --device=/dev/nvhost-prof-gpu \
    --device=/dev/nvmap \
    --device=/dev/nvhost-gpu \
    --device=/dev/nvhost-as-gpu \
    --device=/dev/video0:/dev/video0 \
    -v /home/nvidia/project/DroneDetection:/data \
    -it aarch64-caffe-frcnn:latest bash
To run an inference in the container, we need a simple Python script that reads the trained model and related object names from the JSON file and starts the inference process.
Figure 8. Python script
The VisionDetection class is responsible for parsing the model information based on the user input arguments, running the inference on Caffe with or without GPU acceleration, and detecting the objects in an image, for example:
$ python aivision_tx2_objdetect.py --network_parameters
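As a rough sketch of the loading step, the class below shows how such a loader could read the exported -prop.json and classname files and keep the Caffe network file paths for inference. The class shape is modeled on the script's command-line arguments; the field names are assumptions, not the actual implementation.

```python
# Minimal sketch of a VisionDetection-style loader. It only parses the
# exported model metadata; the actual Caffe network loading and inference
# are omitted. Field names are assumptions for illustration.
import json
from pathlib import Path

class VisionDetection:
    def __init__(self, model_file, net_file, json_file, label_file):
        self.model_file = model_file   # .caffemodel weights (loaded later by Caffe)
        self.net_file = net_file       # deploy .prototxt network definition
        with open(json_file) as f:
            self.props = json.load(f)  # exported model properties
        # classname file: one label per line
        self.labels = [ln.strip() for ln in
                       Path(label_file).read_text().splitlines() if ln.strip()]

    def label_for(self, class_index):
        """Map a network class index to its human-readable label."""
        return self.labels[class_index]
```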
Running single drone detection
$ python tools/aivision_tx2_objdetect.py --model_file=/data/deploy/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2.caffemodel --net_file=/data/deploy/test.prototxt --json_file=/data/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2-prop.json --label_file=/data/deploy/classname.txt /data/TEST-IMAGES/test3.jpg
Figure 9. Inference results
Running multiple drone detection
$ python tools/aivision_tx2_objdetect.py --model_file=/data/deploy/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2.caffemodel --net_file=/data/deploy/test.prototxt --json_file=/data/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2-prop.json --label_file=/data/deploy/classname.txt /data/TEST-IMAGES/test1.jpg
Figure 10. Inference results
Looking at the inference results, we can produce the coordinates in the image where the objects were identified (rather than only an image output with bounding boxes). This opens up the potential for real-time object detection, tracking, and a 3D positioning system for a multiple-camera deployment in a specific area. In addition, these coordinates can be used to guide robot arm movements or to aim a high-powered laser pointer.
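As a sketch of that idea, the bounding boxes returned by the inference can be reduced to image-plane center coordinates, which is what a tracker or a multi-camera triangulation step would consume. The (xmin, ymin, xmax, ymax) box format is an assumption for illustration.

```python
# Sketch: turn detection bounding boxes into image-plane center points for
# downstream tracking or triangulation. Box format (xmin, ymin, xmax, ymax)
# is an assumption about the inference output.

def box_center(box):
    """Return the (x, y) center of a (xmin, ymin, xmax, ymax) bounding box."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)

detections = [(10, 20, 90, 80), (200, 30, 260, 95)]
centers = [box_center(b) for b in detections]
print(centers)  # [(50.0, 50.0), (230.0, 62.5)]
```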
This can be implemented with a Jetson TX2 camera or a USB camera, with or without motion detection. In the author's opinion, for such cases it is worth implementing motion detection and running the inference process only when needed, to save power on the Jetson TX2. This can be accomplished by adding a software element to the gstreamer pipeline that detects when there is movement in the frames coming from the camera.
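A minimal sketch of such a motion gate is plain frame differencing: compare consecutive grayscale frames and trigger inference only when the difference is large enough. In the real pipeline this logic would sit inside a gstreamer element or an appsink callback; here it is shown on NumPy arrays for clarity, and the threshold value is an arbitrary assumption.

```python
# Sketch of a motion gate via frame differencing. Inference runs only when
# the mean absolute pixel difference between consecutive grayscale frames
# exceeds a threshold (the threshold here is an arbitrary assumption).
import numpy as np

def motion_detected(prev_frame, frame, threshold=8.0):
    """True if the mean absolute pixel difference exceeds the threshold."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(diff.mean()) > threshold

# Example: a static frame vs. one with a bright region that appeared
prev = np.zeros((120, 160), dtype=np.uint8)
still = prev.copy()
moving = prev.copy()
moving[40:80, 60:100] = 255           # simulated object entering the frame
print(motion_detected(prev, still))   # False
print(motion_detected(prev, moving))  # True
```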
You can set TX2 to max performance mode before doing the inference with:
$ sudo nvpmodel -m 0 && sudo ~/jetson_clocks.sh
As a result, inference runs faster, and video streaming workloads will most likely benefit even more.
real 0m7.334s vs real 0m8.383s
user 0m5.612s vs user 0m5.812s
sys  0m1.888s vs sys  0m2.432s
Figure 11. Performance change
As of today, the same process and technique can be applied to classification with Maximo Visual Inspection.
In conclusion, we have a way to accelerate the adoption of deep learning in computer vision when deploying on edge or fog computing. We can help companies bring AI to robotic arms, microscopes, and other devices much faster than ever before.
Special thanks to Carl Bender and Srinivas Chitiveli.