
In this article, Daisuke Takahashi, a data scientist at HACARUS, explains the process of Human Part Segmentation using a high-performance edge device: the Jetson Nano Developer Kit.
What is the Jetson Nano Developer Kit?
The Jetson Nano is a purpose-built edge device by NVIDIA that allows for GPU-based operations in a light, cost-effective package. Much like a standard computer, it can run various applications; for this use case, we installed a Linux OS on a microSD card. Since the Jetson Nano is equipped with not only a CPU but also a GPU, it can perform deep learning inference at high speed.
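As a quick sanity check before running anything heavier, you can confirm from PyTorch that the GPU is visible. This is a minimal sketch and assumes a CUDA-enabled PyTorch build (for example, NVIDIA's prebuilt wheels for Jetson) is installed on the device:

```python
# Minimal sketch: verify that PyTorch can see the Jetson Nano's GPU.
# Assumes a PyTorch build with CUDA support is installed on the device.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
    print("GPU:", torch.cuda.get_device_name(0))  # e.g. a Tegra-class GPU on the Jetson Nano
else:
    device = torch.device("cpu")
    print("No GPU found, falling back to CPU")
```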
We at HACARUS have written a blog post on the topic of Sparse Modeling with the Jetson Nano Developer Kit, so please check it out if you are keen to learn more.
What is Human Part Segmentation?
Human Part Segmentation is a technology that classifies a person's head, arms, legs, and so on in images at the pixel level.
The image on the right below shows the result of running inference on the image on the left: each body part is painted in a different color.
Overview of the process
We aimed to run Human Part Segmentation inference at the highest possible speed on the Jetson Nano. To that end, we first researched lightweight, high-speed models, and then moved on to training and inference.
Research and Selection of Lightweight and High-speed Models
There are not many papers on lightweight, high-speed Human Part Segmentation: even on the Human Part Segmentation leaderboards of “Papers with Code”, a site that compiles papers with usable code, only seven are published.
On the other hand, it is relatively easy to find lightweight, high-speed models for Semantic Segmentation. In fact, a search on “Papers with Code” for lightweight, fast semantic segmentation models turns up nearly 40 papers. This appears to be because research in this area is actively conducted, driven by the high demand for high-speed segmentation in autonomous driving ([1]). Therefore, among the real-time semantic segmentation models on “Papers with Code”, we decided to use HarDNet ([2]), the fastest and most accurate of them.
HarDNet is a network based on DenseNet, designed to keep DRAM traffic as low as possible while maintaining accuracy.
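The key idea is the connection pattern inside a HarDNet block: instead of DenseNet's all-to-all connections, layer k only receives input from layer k − 2^n when 2^n divides k, so far fewer feature maps need to be concatenated and read back from memory. The snippet below is a simplified sketch of that connectivity rule, not the authors' implementation:

```python
def hardblock_links(k):
    """Simplified sketch of HarDBlock connectivity: layer k takes input
    from layer k - 2**n for every non-negative n such that 2**n divides k.
    (DenseNet, by contrast, connects layer k to all previous layers.)"""
    links = []
    n = 0
    while 2 ** n <= k:
        if k % (2 ** n) == 0:
            links.append(k - 2 ** n)
        n += 1
    return links

# Example: layer 6 reads from layers 5 and 4 only,
# while layer 8 reads from layers 7, 6, 4 and 0.
print(hardblock_links(6))  # [5, 4]
print(hardblock_links(8))  # [7, 6, 4, 0]
```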
Model | GPU (ms) | Mobile GPU (ms) | Top-1 Acc (%)
HarDNet 39DS | 17.8 | 32.5 | 72.1
MobileNetV2 | 23.7 | 37.9 | 72.0
HarDNet 68DS | 31.7 | 52.6 | 74.3
MNetV2 1.4x | 33.0 | 57.8 | 74.7
Reference: HarDNet: A Low Memory Traffic Network ([2])
The table above shows ImageNet classification accuracy and inference speed for HarDNet and MobileNet variants. The Mobile GPU (ms) column shows the Jetson Nano's inference speed. You can see that HarDNet is faster while maintaining comparable accuracy.
When using HarDNet, we referred to the following PyTorch implementation: https://github.com/PingoLH/FCHarDNet
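As an illustration, the model can be loaded and applied roughly as in the sketch below. The module path, class name, constructor argument (`ptsemseg.models.hardnet.hardnet`, `n_classes`), class count, and checkpoint file name are assumptions based on typical segmentation codebases, so please check the repository for the exact interface:

```python
# Rough sketch of loading FC-HarDNet for part segmentation (assumed API).
import torch
from ptsemseg.models.hardnet import hardnet  # assumed module path / class name

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

NUM_CLASSES = 59  # assumption: MHP v2.0 part categories plus background
model = hardnet(n_classes=NUM_CLASSES).to(device)
model.load_state_dict(torch.load("hardnet_mhp_v2.pth", map_location=device))  # hypothetical checkpoint
model.eval()

with torch.no_grad():
    x = torch.rand(1, 3, 282, 141, device=device)  # dummy RGB input (H x W assumed)
    logits = model(x)                              # (1, NUM_CLASSES, H, W)
    part_map = logits.argmax(dim=1)                # per-pixel part labels
```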
Training HarDNet with Multi Human Parsing v2.0
Pretrained models are also available in the GitHub repository mentioned above. However, the dataset used to train them was not for Human Part Segmentation, so we decided to retrain the model on a dataset called Multi Human Parsing v2.0.
Multi Human Parsing v2.0 contains images of people together with corresponding images in which each body part is masked. By using these mask images as training data, the model can learn segmentation.
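To give a concrete idea of how such image/mask pairs are consumed, here is a minimal sketch of a PyTorch Dataset for this kind of data. The directory layout, file naming, and resize dimensions are assumptions for illustration and do not reflect the exact structure of Multi Human Parsing v2.0:

```python
# Sketch of a Dataset that pairs person images with per-part mask images,
# as needed for training a segmentation model. Paths and naming are assumptions.
import os
from PIL import Image
import numpy as np
import torch
from torch.utils.data import Dataset

class PartSegmentationDataset(Dataset):
    def __init__(self, image_dir, mask_dir, size=(282, 141)):
        self.image_dir = image_dir
        self.mask_dir = mask_dir
        self.size = size  # (height, width) to resize to
        self.names = sorted(os.listdir(image_dir))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        # Mask image: each pixel value is a part label (0 = background).
        mask = Image.open(os.path.join(self.mask_dir, name)).convert("L")

        h, w = self.size
        image = image.resize((w, h), Image.BILINEAR)
        mask = mask.resize((w, h), Image.NEAREST)  # nearest keeps labels intact

        x = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0
        y = torch.from_numpy(np.array(mask)).long()
        return x, y
```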
Using the trained HarDNet to perform Human Part Segmentation
We used the trained model to run an inference test on the dataset included in Multi Human Parsing v2.0. The results are shown below:
You can see that the hands, legs, face, neck, etc. are correctly segmented, so the training of HarDNet went well.
Furthermore, regarding inference speed, when running inference on the Jetson Nano Developer Kit, we were able to achieve around 20 fps for an image size of 141 × 282.
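A frame rate like this can be measured with a simple timing loop such as the sketch below; the batch size, warm-up, and iteration counts here are illustrative assumptions rather than the exact procedure we used:

```python
# Illustrative sketch for measuring segmentation inference FPS on the Jetson Nano.
import time
import torch

def measure_fps(model, height=282, width=141, iters=100, warmup=10):
    """Time `iters` forward passes on a dummy image and return frames per second."""
    device = next(model.parameters()).device
    x = torch.rand(1, 3, height, width, device=device)

    with torch.no_grad():
        for _ in range(warmup):           # warm-up passes exclude one-time setup cost
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()      # flush queued GPU work before timing
        start = time.time()
        for _ in range(iters):
            model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()
        elapsed = time.time() - start

    return iters / elapsed
```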
Summary
Using a neural network called HarDNet, we performed Human Part Segmentation inference on the Jetson Nano Developer Kit. We were able to confirm that even for segmentation, which is a heavier task than image classification, near-real-time inference is possible.
References
[1] Speeding up Semantic Segmentation for Autonomous Driving (2016)
[2] HarDNet: A Low Memory Traffic Network (2019)