Extending AI possibilities beyond the data center
The need for autonomous systems, and for systems that can augment human work, is rapidly increasing across industries, driven by exciting, powerful edge and fog computing solutions. This need is also driven by the availability of cheaper sensors and a wide variety of information types (video, sound, pressure, light, ultrasonic, etc.) that can provide very accurate, high-resolution measurements, along with the network latency constraints of sending all this data to a data center.
Looking at our history, the transformation from Industry 2.0 (based on the division of labor and the use of electricity) to Industry 3.0 was driven by electronics and the IT automation of production. Much of the electronic equipment we use today will need some form of intelligence to filter the information it gathers and to provide better recommendations and initial decisions that augment human work. This transformation can be achieved by embedding AI in those systems (medical devices, vehicles, drones, satellites, etc.).
Many questions related to this transformation arise, such as:
- How will I deploy and manage the algorithms?
- How will I collect relevant data from those systems?
- How will I optimize those algorithms for edge devices?
- How will I explain algorithm recommendations and decisions for audit purposes and improvements (i.e., corner cases)?
- How will I manage the lifecycle (training in the data center, deployment on edge or fog computing)?
Intelligence deals with information, and therefore following the data path is the key to creating a sustainable edge or fog computing AI architecture.
We are interested in an architecture that can easily move trained models from the data center or cloud to edge or fog computing for inference, while at the same time collecting the metadata created during the inference process for audit purposes or the identification of corner cases. All of this needs to be done securely and under a governance model that spans all four scenarios presented below.
For a proof of concept, let's take IBM PowerAI Vision and see how this architecture can be applied to detecting and counting flying drones in a specific area.
In this example, we will:
- Import the dataset
- Train the model
- Test the inference process on-prem running PowerAI Vision on an IBM AC922 system
- Deploy the model for inference outside the data center in a Jetson TX2
Importing and labeling data
You can use any existing dataset that is already labeled. The import process takes a few minutes to complete, depending on the size of the dataset. In my case, the dataset contains images of over 1600 labeled drones.
Training the model
Currently, PowerAI Vision provides Faster R-CNN, Tiny YOLO2, and custom model options for object detection training.
You can find instructions at Training your model in the IBM Knowledge Center.
After many iterations, we obtained a good model with the help of the hyperparameters that can be tuned in the Advanced options of PowerAI Vision, such as learning rate and weight decay. We achieved good IoU and mAP scores, as you can see in the image below provided by PowerAI Vision.
Deploying and testing the model
To validate the Faster R-CNN model, we want to test it with the testing images we have. Therefore, we used the IBM PowerAI Vision inference capabilities to deploy the model in the inference section, allocating the right GPU on the AC922 system and creating an API endpoint to be used. In addition, we are able to count the number of drones detected in an image. The same inference process can be used for videos.
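The counting step can be done by post-processing the detection results returned from the API endpoint. A rough sketch follows; the response field names used here (`classified`, `label`, `confidence`) are assumptions for illustration, not the documented PowerAI Vision response schema:

```python
def count_detections(response, class_name="drone", min_confidence=0.5):
    """Count detected objects of a given class in an inference response.

    `response` is assumed to be a dict with a "classified" list whose entries
    carry a label, a confidence score, and box coordinates (hypothetical
    field names, not the documented schema).
    """
    return sum(
        1
        for det in response.get("classified", [])
        if det.get("label") == class_name
        and det.get("confidence", 0.0) >= min_confidence
    )

# Hypothetical example response shape:
sample = {
    "classified": [
        {"label": "drone", "confidence": 0.92, "xmin": 10, "ymin": 20, "xmax": 80, "ymax": 90},
        {"label": "drone", "confidence": 0.41, "xmin": 200, "ymin": 15, "xmax": 260, "ymax": 70},
        {"label": "bird", "confidence": 0.88, "xmin": 5, "ymin": 5, "xmax": 30, "ymax": 30},
    ]
}
print(count_detections(sample))  # → 1 (only one drone is above the 0.5 threshold)
```

The same counting logic applies frame by frame when running inference on video.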
Exporting the model
After successfully testing the model with several images in the PowerAI Vision inference section, the model can be exported as a zip file from the model section by selecting the trained model and clicking the Export button. The model zip file 23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2.zip can then be transferred to a Jetson TX2 via the secure copy protocol (scp).
JETSON TX2 inference
The NVIDIA Jetson TX2 (or TX2i) is an embedded system-on-module (SoM) with a dual-core NVIDIA Denver2 plus a quad-core ARM Cortex-A57, 8 GB of 128-bit LPDDR4, and an integrated 256-core Pascal GPU providing 1 TFLOPS of FP16 compute performance in less than 8 watts of power.
The SoM is equipped with 32 GB of eMMC storage and a 4Kp60 H.264/H.265 encoder/decoder.
For the proof of concept, I used the Jetson TX2 developer kit with JetPack 3.3, which includes support for TensorRT 4.0, CUDA 9.0, cuDNN v7.1.5, and multimedia APIs. For PowerAI Vision models, we need to run Caffe, TensorFlow, or YOLO2 on the TX2, depending on how the training was done and which models (embedded or user provided) were selected.
In this case, I used Caffe on the TX2. Many guides with build instructions are available (especially on JetsonHacks), but because I trained an FRCNN model, we needed to install:
- OpenCV 3.4.x
- py-faster-rcnn from github.com/rbgirshick/py-faster-rcnn
- Caffe from http://github.com/BVLC/caffe.git
Because I want the ability to run different frameworks such as TensorFlow, Caffe, or YOLO without worrying about library version conflicts, I decided to create Docker containers and run the inference process there. In addition, using Docker containers makes updating the frameworks and libraries easier: it only takes a docker pull from the private or public repository.
I have created two sample containers that you can use on your TX2 by running the following commands on the TX2:
$ docker pull ticlazau/aarch64-caffe-frcnn
$ docker pull ticlazau/aarch64-tensorflow
To verify the Docker image, issue the command "docker images" and check the image name and size.
REPOSITORY TAG IMAGE ID CREATED SIZE
aarch64-caffe-frcnn latest 2ad0f96817f4 8 days ago 5.79 GB
For this container to work, you will need to set the Python path, located in /root/project/py-faster-rcnn/caffe-fast-rcnn/python, before running it.
In addition, you need to establish a policy for the Docker container's persistent data, such as passing in a trained model, obtaining inference logs, etc. In my case, I created a folder called "projects" in my home folder, and then created subfolders named after the various projects (trained models). Each subfolder contains the unzipped exported model:
- .json file
- caffemodel file
- classname file
- prototxt file (i.e. ~nvidia/project/DroneDetection)
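Before mounting a project subfolder into a container, it can help to check that all four exported files are present. A minimal sketch follows; the suffix list mirrors the file types above, and treating the classname file as a `.txt` is an assumption:

```python
import os

# One suffix per required exported-model file; ".txt" stands in for the
# classname file (an assumption about its extension).
REQUIRED_SUFFIXES = (".json", ".caffemodel", ".prototxt", ".txt")


def check_model_folder(path):
    """Return the list of required file suffixes missing from an unzipped
    exported-model folder (e.g. ~nvidia/project/DroneDetection)."""
    names = [name.lower() for name in os.listdir(path)]
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not any(name.endswith(suffix) for name in names)
    ]
```

An empty result means the folder is ready to mount into the container (as /data in the docker run command below).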
To run the aarch64-caffe-frcnn container with GPU support in Jetson TX2, we need to specify the following Linux devices to Docker:
$ docker run -e PYTHONPATH=:/root/project/py-faster-rcnn/caffe-fast-rcnn/python -e LD_LIBRARY_PATH=:/usr/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu/tegra:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/root/opencv-3.4.1/build/lib --net=host -v /usr/lib/aarch64-linux-gnu:/usr/lib/aarch64-linux-gnu -v /usr/local/cuda/lib64:/usr/local/cuda/lib64 --device=/dev/nvhost-ctrl --device=/dev/nvhost-ctrl-gpu --device=/dev/nvhost-prof-gpu --device=/dev/nvmap --device=/dev/nvhost-gpu --device=/dev/nvhost-as-gpu --device=/dev/video0:/dev/video0 -v /home/nvidia/project/DroneDetection:/data -it aarch64-caffe-frcnn:latest bash
To run inference in the container, we need a simple Python script that reads the trained model and the related object names from the JSON file and starts the inference process.
The VisionDetection class is responsible for parsing the model information based on the user's input arguments, running inference on Caffe with or without GPU acceleration, and detecting the objects in the image (for example:
python aivision_tx2_objdetect.py --network_parameters)
To run single DRONE detection:
$ python tools/aivision_tx2_objdetect.py --model_file=/data/deploy/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2.caffemodel --net_file=/data/deploy/test.prototxt --json_file=/data/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2-prop.json --label_file=/data/deploy/classname.txt /data/TEST-IMAGES/test3.jpg
To run multiple DRONE detection:
$ python tools/aivision_tx2_objdetect.py --model_file=/data/deploy/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2.caffemodel --net_file=/data/deploy/test.prototxt --json_file=/data/23a89bf9-e4f9-4444-8ecf-d26f6d2a3aa2-prop.json --label_file=/data/deploy/classname.txt /data/TEST-IMAGES/test1.jpg
Looking at the inference results, we can produce the coordinates in the image where the objects were identified (rather than only an image output with bounding boxes). This opens up the potential for real-time object detection, tracking, and a 3D positioning system across a multi-camera deployment in a specific area. In addition, these coordinates could drive robotic arm movements for a high-powered laser pointer.
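Deriving a point coordinate from a detection is straightforward. A sketch of turning a bounding box into the center point that a tracker or positioning system could consume (the helper is illustrative, not part of the article's script):

```python
def bbox_center(xmin, ymin, xmax, ymax):
    """Center of a detection box in pixel coordinates.

    With a calibrated multi-camera setup, these 2D centers from each camera
    could be triangulated into a 3D position for the tracked object.
    """
    return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)


# A box from xmin=10, ymin=20 to xmax=80, ymax=90 has its center at (45, 55):
print(bbox_center(10, 20, 80, 90))  # → (45.0, 55.0)
```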
This can be implemented with the Jetson TX2 camera or a USB camera, with or without motion detection. In the author's opinion, for such cases it is worth implementing motion detection and running the inference process only when needed, to save power on the Jetson TX2. This can be accomplished by adding a software element into the GStreamer pipeline to detect when there is movement in the frames delivered by the camera.
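A minimal sketch of such a motion gate, using plain frame differencing over grayscale pixel values (the thresholds are illustrative, not tuned values; in a real pipeline this logic would sit in a GStreamer appsink callback):

```python
def motion_detected(prev_frame, frame, pixel_thresh=25, ratio_thresh=0.01):
    """Frame-differencing motion gate for consecutive grayscale frames.

    Frames are flat sequences of 0-255 pixel values of equal length.
    Returns True when the fraction of pixels whose intensity changed by
    more than pixel_thresh exceeds ratio_thresh, signaling that the
    (power-hungry) inference step is worth running on this frame.
    """
    changed = sum(1 for a, b in zip(prev_frame, frame) if abs(a - b) > pixel_thresh)
    return changed / len(frame) > ratio_thresh
```

When this returns False, the frame is dropped and the GPU stays idle; when it returns True, the frame is handed to the detector.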
If you set the TX2 to maximum performance mode before running the inference with:
$ sudo nvpmodel -m 0 && sudo ~/jetson_clocks.sh
the inference will execute faster, and video streaming will most likely benefit even more.
real 0m7.334s vs real 0m8.383s
user 0m5.612s vs user 0m5.812s
sys 0m1.888s vs sys 0m2.432s
As of today, the same process and technique can be applied to classification with IBM PowerAI Vision.
In conclusion, we have a way to accelerate the adoption of deep learning for computer vision when deploying on edge or fog computing. We can help companies bring AI to robotic arms, microscopes, and other devices much faster than ever before.
Special thanks to: Carl Bender and Srinivas Chitiveli