This is Haruyuki Tago, Edge Evangelist at HACARUS’ Tokyo R&D center.
In this series of articles, I will share some insights from my decades of experience in the semiconductor industry and comment on various AI industry-related topics from my unique perspective.
In today’s volume, I will explain a little about sparse modeling and deep learning methods and compare their carbon footprints. Recently, the Sustainable Development Goals (SDGs) have become a hot topic in business, especially in the AI industry. When looking at carbon emissions, deep learning is at a disadvantage due to its high energy consumption.
In the 18th volume of this web-article series, I investigated the energy consumption of AI learning for a model that performed image defect detection. In that example, the energy consumption of sparse modeling was found to be about 1/100 that of deep learning.
In the latter half of this volume, I will focus on the carbon footprint of edge AI during inference. In edge AI, the AI is installed in terminals, including IoT devices and sensors, and the terminals themselves perform the learning and inference. Inference runs on the data collected at the terminal, so decisions are made instantly. This is different from sending the data to the cloud and having an AI located there perform the processing.
In this experiment, I ensured that sparse modeling and deep learning were both tested under the same hardware and software environments.
When selecting a computing device for this experiment, I decided to go with the SENSPIDER from Macnica. This edge computing device is 150 mm wide, 85 mm deep, and 100 mm high. Once installed, it is so compact that you may forget it is there. The SENSPIDER is a compact all-in-one monitoring system that integrates a sensor amplifier, data logger, computing terminal (PC, etc.), sensor power supply, and external communication interface. This 5-in-1 solution is easy to use: just connect it to a vibration sensor and you are ready to go.
Figure 1 shows the experimental flow using vibration data that was collected from a PC cooling fan. The objective of this task is to determine if the fan vibrations are “normal” or “abnormal”.
Moving on to the AI methods, sparse modeling (SpM) and autoencoder (AE) were used, where AE is a type of deep learning. The first step was to create an AI model for both SpM and AE using normal data (shown on the left side of Figure 2 labeled “training”).
This step, commonly referred to as the learning or training stage, is performed by extracting features from the normal data. In this article, I won’t go into detail about it. Instead, we will move on to the inference step shown on the right side of Figure 2. This step, also called the prediction phase, is where the actual data is judged to determine the vibration status. The outputs of both the SpM and AE models are also compared with human judgments to determine the accuracy of the models. Looking again at the right side of Figure 2, we can see the detection accuracy and false alarm rate for both models.
As mentioned above, a major aspect of this experiment was to use identical hardware and software for both AI models, with the same input data. The only differences were the AI models themselves. Although the model parameters were not optimized, I believe that this experiment still gives a fairly accurate comparison of energy consumption.
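The article does not spell out how the two metrics are defined, so the following is only a minimal sketch under assumed definitions: detection accuracy is taken to be the share of human-labeled abnormal segments that the model also flags, and false alarm rate is the share of human-labeled normal segments that the model flags by mistake. The function name and the sample labels are hypothetical.

```python
def detection_and_false_alarm_rates(human_labels, model_labels):
    """Compare model verdicts with human labels (True = abnormal, False = normal).

    Assumed definitions (not taken from the article):
      detection rate   = flagged abnormal segments / all abnormal segments
      false alarm rate = flagged normal segments   / all normal segments
    """
    pairs = list(zip(human_labels, model_labels))
    n_abnormal = sum(1 for h, _ in pairs if h)
    n_normal = len(pairs) - n_abnormal
    detected = sum(1 for h, m in pairs if h and m)
    false_alarms = sum(1 for h, m in pairs if not h and m)
    detection_rate = detected / n_abnormal if n_abnormal else float("nan")
    false_alarm_rate = false_alarms / n_normal if n_normal else float("nan")
    return detection_rate, false_alarm_rate


# Made-up labels for ten segments: 4 abnormal, 6 normal
human = [True, True, True, True, False, False, False, False, False, False]
model = [True, True, True, False, False, False, True, False, False, False]
det, fa = detection_and_false_alarm_rates(human, model)
print(f"detection rate: {det:.3f}, false alarm rate: {fa:.3f}")
```

Under these definitions the two rates are computed over different subsets of the data, which is why a model can lead on one metric and trail on the other.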
Figure 3 shows one example of the vibration data collected from a PC cooling fan. This continuously sampled data is divided into 21.33ms segments (red box) and AI is applied to each segment.
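As a rough illustration of this segmentation step, the sketch below cuts a continuous signal into fixed-length windows. The sampling rate is not given in the article; 48 kHz with 1024 samples per segment is assumed here only because 1024 / 48000 s works out to 21.33 ms. The sine wave is a synthetic stand-in for real fan data.

```python
import numpy as np

SAMPLE_RATE_HZ = 48_000  # assumption: the article does not state the sampling rate
SEGMENT_LEN = 1024       # assumption: 1024 / 48000 s = 21.33 ms


def split_into_segments(signal, segment_len=SEGMENT_LEN):
    """Cut a 1-D signal into non-overlapping segments, dropping any leftover tail."""
    signal = np.asarray(signal, dtype=float)
    n_segments = len(signal) // segment_len
    return signal[: n_segments * segment_len].reshape(n_segments, segment_len)


# One second of synthetic data standing in for the fan vibration waveform
t = np.arange(SAMPLE_RATE_HZ) / SAMPLE_RATE_HZ
vibration = np.sin(2 * np.pi * 120.0 * t)

segments = split_into_segments(vibration)
segment_ms = 1000.0 * SEGMENT_LEN / SAMPLE_RATE_HZ
print(segments.shape, f"{segment_ms:.2f} ms per segment")
```

Each row of `segments` would then be passed to the AI model as one inference input.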
Sparse Modeling Inference Flow
When it comes to sparse modeling, only normal data was analyzed and characteristic waveform fragments were extracted. Put together, these fragments formed what is called a dictionary. This is shown in the “Dictionary Example” of Figure 3 and as X0, X1, and X2 in Figure 4.
During the inference phase, the input data was approximated by a linear combination of dictionary elements. Looking at Figure 4, an example of this reconstruction is shown as a0 x X0 + a1 x X1 + a2 x X2.
In a case where the input data is close to the trained “good” data, the combination of dictionary elements produces an output that is close to the input data. This means that the squared error becomes small and the vibrations are deemed to be normal.
On the other hand, if abnormal input data is introduced, the output will be poorly approximated because it contains waveforms that are not in the dictionary. This difference creates a large squared error and the data is judged as abnormal.
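The flow above can be sketched in a few lines of NumPy. This is not the implementation used in the experiment: the article does not say which sparse coding algorithm was used, so a simple greedy method (orthogonal matching pursuit) is swapped in here. The three sinusoidal dictionary elements, the test signals, and the threshold are all hypothetical.

```python
import numpy as np


def sparse_code_omp(x, dictionary, n_nonzero):
    """Greedy orthogonal matching pursuit (n_nonzero >= 1):
    pick atoms one at a time, then refit the chosen atoms by least squares."""
    residual = x.copy()
    chosen = []
    coeffs = np.zeros(dictionary.shape[1])
    for _ in range(n_nonzero):
        # Atom most correlated with the part of x not yet explained
        k = int(np.argmax(np.abs(dictionary.T @ residual)))
        if k not in chosen:
            chosen.append(k)
        sol, *_ = np.linalg.lstsq(dictionary[:, chosen], x, rcond=None)
        residual = x - dictionary[:, chosen] @ sol
    coeffs[chosen] = sol
    return coeffs


def reconstruction_error(x, dictionary, n_nonzero=2):
    """Squared error between x and its sparse approximation a0*X0 + a1*X1 + ..."""
    a = sparse_code_omp(x, dictionary, n_nonzero)
    return float(np.sum((x - dictionary @ a) ** 2))


def is_abnormal(x, dictionary, threshold):
    return reconstruction_error(x, dictionary) > threshold


# Hypothetical dictionary: three unit-norm sinusoids standing in for X0, X1, X2
n = 64
t = np.arange(n) / n
atoms = np.stack([np.sin(2 * np.pi * f * t) for f in (2, 5, 9)], axis=1)
atoms /= np.linalg.norm(atoms, axis=0)

normal = 0.8 * atoms[:, 0] + 0.3 * atoms[:, 2]  # built only from dictionary elements
abnormal = normal.copy()
abnormal[10] += 1.0                             # impulse that the dictionary cannot express

THRESHOLD = 0.1  # hypothetical; in practice it would be tuned on real data
err_normal = reconstruction_error(normal, atoms)
err_abnormal = reconstruction_error(abnormal, atoms)
print(err_normal, err_abnormal)
```

The normal signal is reconstructed almost exactly, while the impulse leaves a large residual, so only the second signal crosses the threshold.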
Autoencoder Inference Flow
The autoencoder consists of three layers, which are the encoder layer, the hidden layer, and the decoder layer, as shown in Figure 5. Input data is applied to the encoder layer while the hidden layer holds the characteristics of the normal data. The decoder layer produces the output data.
When training an autoencoder, the learning is unsupervised and the goal is to create an output that mimics the input data (Figure 5). Another interesting feature of this network is that the number of hidden layer nodes (NH) is smaller than the number of encoder layer nodes (NEnc) and decoder layer nodes (NDec).
Because of this bottleneck, simply copying from input to output is not possible. Instead, the network learns to extract only the important information needed to restore the sample data to its original state. While this does efficiently generate a matching output, it also eliminates the fluctuations among the normal data and any superimposed noise.
Let’s now look at the inference phase. When normal data is input (Figure 5, left), an output resembling the average normal data is produced, and the difference between the two (squared error) is calculated. For cases where this value is small, like on the left side of Figure 5, the data can be judged as normal. However, for cases where this value is large (Figure 5, right), the data is determined to be abnormal.
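To make the bottleneck idea concrete, here is a minimal sketch that uses a linear autoencoder as a stand-in. Its weights are obtained in closed form from an SVD, which is equivalent to PCA. This is not the network from the experiment, which would normally use nonlinear layers trained by gradient descent. The training data, hidden size, and test signals are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64                                   # input/output width (NEnc = NDec = 64 here)
t = np.arange(n) / n
basis = np.stack([np.sin(2 * np.pi * 2 * t), np.sin(2 * np.pi * 5 * t)])

# Hypothetical "normal" training set: random mixes of two sinusoids plus slight noise
train = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, n))

N_HIDDEN = 2                             # bottleneck: NH is much smaller than NEnc
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
w_enc = vt[:N_HIDDEN].T                  # encoder weights, shape (n, N_HIDDEN)
w_dec = w_enc.T                          # decoder weights, shape (N_HIDDEN, n)


def reconstruct(x):
    """Encode x into the hidden layer, then decode it back to the input width."""
    hidden = (x - mean) @ w_enc
    return mean + hidden @ w_dec


def squared_error(x):
    return float(np.sum((x - reconstruct(x)) ** 2))


normal = 0.7 * basis[0] - 0.4 * basis[1]  # resembles the training data
abnormal = normal.copy()
abnormal[10] += 1.0                       # impulse never seen during training

ae_err_normal = squared_error(normal)
ae_err_abnormal = squared_error(abnormal)
print(ae_err_normal, ae_err_abnormal)
```

Because only two hidden values pass through the bottleneck, the impulse cannot be carried to the output, and the squared error for the abnormal signal is far larger than for the normal one.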
In this experiment, we looked at the inference runtime, memory usage, detection accuracy, and false alarm rate for the SpM and AE models. Starting with the runtime per inference, the sparse modeling model took 3.1 ms and the autoencoder model took 87.3 ms. In other words, sparse modeling was approximately 28 times faster than the autoencoder. Next, sparse modeling also came out ahead for detection accuracy with a rate of 95.9%, while the autoencoder reached 90.0%. Finally, the false alarm rate was 8.2% for sparse modeling and 4.5% for the autoencoder, so the autoencoder did better on this metric (Figure 6).
Digging deeper into these results, we will continue to reference Figure 6. Here, the sparse modeling runtime was shorter than that of the autoencoder. However, it is important to note that this is not always the case and it depends on the nature of the input data.
For this experiment, the input data shown in Figure 3 had a relatively large difference between normal and abnormal data. Even when using a small dictionary size for sparse modeling, the state of the vibration data can be determined fairly accurately with a short execution time.
However, this isn’t always true. For cases where the difference between normal and abnormal data is relatively small, the dictionary size must be increased accordingly. This also increases the execution time of the AI model.
Another key point is that, due to their design, deep learning models have higher expressive power than sparse modeling models. For this reason, if the application requires multiple states of data to be modeled by a single AI program, I believe that an autoencoder using deep learning may be more suitable.
To finish this article, I want to consider the ratio of energy consumed per inference. This relates closely to the carbon footprint ratio. As shown in Figure 7, the energy consumption ratio is roughly equivalent to the inference execution time ratio.
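One way to see why the two ratios line up is the rough calculation below. It assumes that the device draws about the same average power P while running either model. That is an assumption made for illustration, not a figure measured in the article. The runtimes are the per-inference values reported above.

```latex
E = P \cdot t
\qquad\Longrightarrow\qquad
\frac{E_{\mathrm{AE}}}{E_{\mathrm{SpM}}}
  = \frac{P \cdot t_{\mathrm{AE}}}{P \cdot t_{\mathrm{SpM}}}
  = \frac{87.3\ \mathrm{ms}}{3.1\ \mathrm{ms}}
  \approx 28.2
```

If the power draw differed noticeably between the two models, the energy ratio would shift away from the runtime ratio by the same factor.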
- Two types of AI algorithms were implemented on the SENSPIDER edge computing device by Macnica. This device is an all-in-one edge terminal that integrates a sensor amplifier, data logger, computing terminal (PC, etc.), power supply, and external communication interface. This device comes in a compact housing that is 150 mm wide, 85 mm deep, and 100 mm high.
- The models were used to determine if vibration data was considered normal or abnormal. From the analysis, the detection accuracy was 95.9% for sparse modeling and 90.0% for the autoencoder. Additionally, the false alarm rate was 8.2% for sparse modeling and 4.5% for the autoencoder.
- Another test result we looked at was the per-inference execution time, which was 3.1 ms for sparse modeling and 87.3 ms for the autoencoder. This means that sparse modeling was approximately 28 times faster. The ratio of inference runtimes roughly equals the ratio of carbon footprints, so the carbon footprint of sparse modeling is about 1/28 that of the autoencoder. In a world where SDGs are becoming increasingly important, this is an advantage of sparse modeling.
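MACNICA, SENSPIDER Embedded Development PoC Kit for Manufacturers (in Japanese)
Kenshin Fujiwara, Next-Generation AI Utilization Techniques That Work for Problem Solving (in Japanese), CrossMedia Publishing, 2021, ISBN 978-4-295-40611-2
Tago, “A Leading Figure of the Semiconductor Industry Enters the AI Industry!” Vol. 18: Carbon Footprint Cut to 1/100 with Small-Data AI Development / Achieving “Decarbonization” Compared with Deep Learning (in Japanese)
Macnica, Macnica Develops “SENSPIDER,” a Product for the Industrial IoT (IIoT) Market (in Japanese)
MACNICA, Smart Maintenance Starting from Smart Sensing (in Japanese)
MathWorks, What Is an Autoencoder? (in Japanese)
Kanagawa Prefecture DX Project Promotion Program results briefing, video (in Japanese)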
(Experiment results can be found around 8 minutes and 40 seconds)