
Video: The smart sound analysis edge device analyzing the rotation sound of a model motor
Hello everyone, this is Haruyuki Tago, Edge Evangelist at HACARUS’ Tokyo R&D center.
In this series of articles, I will share some insights from my decades of experience in the semiconductor industry and I will comment on various AI industry-related topics from my unique perspective.
In this blog, I use the EFR32xG24 Dev Kit board as an edge device, here called the Smart Sound Analysis edge device, to analyze the frequency of a model motor's rotation sound, focusing on processing time and battery life.
1. Usage Scenarios
The Smart Sound Analysis edge device analyzes the frequency content of the sound produced by motors or robots, determines whether it is normal or abnormal, and sends the result to the cloud through an IoT gateway.
The device is small (30.4 mm x 51.0 mm), includes a microphone, operates on a single coin battery, and requires no wiring work. Figure 1 shows the idea of attaching a smart sound analysis edge device to a motor, robot, or other machine in an existing facility.
Uploading the raw sound data picked up by the microphone to the cloud requires a data transfer rate of 16 kB/s (at a sampling frequency of 8 kHz and 2 bytes per sample).
On the other hand, if the edge device performs the frequency analysis itself and sends only the resulting spectral data to the cloud, the required data rate is only about 320 bytes per second.
That is a 50-fold reduction. Of course, this depends on the nature of the target sound: it is not suitable for constantly fluctuating sounds such as music, but I think it can be used for nearly steady-state sounds such as motor rotation. Furthermore, if the smart sound analysis edge device compares the current spectrum with the normal-sound spectrum and notifies the cloud only when an abnormal sound occurs, the amount of data is reduced even more dramatically, and the processing burden on the cloud is reduced as well.
Figure 1 Usage scene of smart sound analysis edge device
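As a quick check, the 50-fold reduction follows directly from the two rates quoted above:

8,000 samples/s x 2 bytes/sample = 16,000 bytes/s, and 16,000 / 320 = 50.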
2. System Configuration and the Scope of this blog
Figure 2 shows an overview of the Smart Sound Analysis edge device system. We measure the processing time per measurement (the processing duration in Figure 2) and the integrated current consumed by the edge device. The integrated current has the same units as battery capacity, so battery life is easy to calculate from it.
We measure the integrated current of the sound analysis part (IS) and the integrated current of the Bluetooth communication part (IBT) independently, and estimate the battery life from these together with the measurement period chosen by the user. Communication with the IoT gateway and cloud processing are excluded from the scope of this experiment.
Figure 2 System overview of smart sound analysis edge device
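As a rough model (my own formulation; the exact calculation behind Figure 13 is not given in this article), the battery life can be expressed in terms of these quantities:

$$T_{\mathrm{life}} \approx T_{\mathrm{meas}} \times \frac{C_{\mathrm{bat}}}{I_S + N_{\mathrm{adv}} \cdot I_{BT} + Q_{\mathrm{sleep}}}$$

where $T_{\mathrm{meas}}$ is the measurement period, $C_{\mathrm{bat}}$ is the battery capacity converted to mA·s (225 mAh = 810,000 mA·s), $N_{\mathrm{adv}}$ is the number of Bluetooth advertisements per period, and $Q_{\mathrm{sleep}}$ is the charge consumed while sleeping during the rest of the period. The sleep term is an assumption based on the standby current reported in Section 7.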
Figure 3 shows the processing performed during normal-sound learning and during testing. This experiment focuses on (1) measuring the processing time and integrated current of “sound capture and frequency analysis” (L1 and T2 in Figure 3) and (2) measuring the integrated current of Bluetooth communication (T6).
Figure 3 Edge device processing
3. Edge device used
We use Silicon Labs’ xG24-DK2601B EFR32xG24 Dev Kit [1] (hereinafter also referred to as the MCU board) as the edge device. The MCU board’s approximate specifications are as follows.
Mounted MCU Chip: EFR32MG24B310F1536IM48 2.4GHz Wireless SoC [2]
32-bit Arm® Cortex®-M33 core with a maximum operating frequency of 78 MHz, program memory: 1536 kB Flash, data memory: 256 kB RAM, AI/ML hardware accelerator, advanced security (Secure Vault™). Bluetooth maximum transmit power of +10 dBm, with a CR2032 holder. The board runs on a single CR2032 coin battery [13] (capacity 225 mAh). An external battery such as a CR123A [14] (capacity 1150 mAh) can be connected to the external battery connector, which extends the operating time to about 5 times that of the CR2032.
On-board sensors:
Temperature and relative humidity sensor (Si7021 RHT sensor), inertial sensor, stereo microphones, pressure sensor, ambient light sensor, Hall-effect sensor, user LED and push button (Figures 4, 5)
A photo of the MCU board is shown in Figure 4, a demo application video of the on-board sensors can be found in [3], and a screenshot from it is shown in Figure 5.
Left: Figure 4 Photo of the EFR32xG24 Dev Kit, Right: Figure 5 Demo application of the on-board sensors
4. Laboratory equipment
Program development for Silicon Labs’ xG24-DK2601B development kit is normally done by connecting the board to a host Windows PC via its USB Micro-B connector and programming it with Silicon Labs’ free Simplicity Studio [4] installed on the PC.
In this article, however, in order to measure the MCU board’s current and power consumption, we connect it to the Windows PC via Silicon Labs’ BR4001 mainboard.
We use a model motor as the sound source and change its rotation speed by inserting a resistor between the power supply and the motor. The MCU board is placed a few centimeters away from the model motor so that it picks up the motor sound (Figure 6).
Figure 6 Experimental setup
5. Audio Feature Generator
The Audio Feature Generator is the heart of the frequency analysis: it captures sound from the on-board microphone and outputs frequency analysis results [5]. Figure 7 shows its processing steps. The sound acquired from the microphone is first put through an FFT (Fast Fourier Transform) to obtain a frequency spectrum. The frequency [Hz] of a sound is the physical quantity corresponding to its pitch.
Next, the result is brought closer to human auditory characteristics. Human hearing is sensitive to frequency changes in low-frequency sounds and insensitive to frequency changes in high-frequency sounds. To model this, a mel filterbank is applied to the FFT output [7][8].
As shown at the bottom right of Figure 7, the mel filterbank is a group of bandpass filters with triangular passbands: the lower the frequency, the narrower the passband (the base of the triangle is shorter), and the higher the frequency, the wider the passband.
The mel filterbank is widely used in speech recognition tasks because it gives good results. Finally, the amplitude is logarithmically compressed, yielding the log-scaled mel filterbank output, also known as the mel spectrum. The logarithm compresses the dynamic range, making quiet components relatively larger and loud components relatively smaller.
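For reference, one common definition of the mel scale (the HTK convention used by many filterbank implementations; the Audio Feature Generator's exact variant may differ) maps a frequency f in Hz to mels and back as:

$$m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right), \qquad f(m) = 700\left(10^{m/2595} - 1\right)$$

The triangular filters are spaced evenly on this mel axis, which is exactly why they are narrow at low frequencies and wide at high frequencies.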
Silicon Labs provides demo programs for speech recognition using the Audio Feature Generator [5] and TensorFlow Lite for Microcontrollers [6]. For example, “Voice Control Light” [10] recognizes the utterances “on” and “off” and switches the LED accordingly. I tried it, and it worked well.
As a familiar example related to the mel scale, the frequencies of the piano keys [9] are shown in Figure 8. The ratio between the frequencies of adjacent keys is constant at about 1.059 (the twelfth root of 2), whereas the difference in frequency is not constant: the higher the note, the larger the difference. In other words, the frequencies of adjacent keys form a geometric progression, which sounds natural to the human ear.
Left: Figure 7 Audio Feature Generator Flow, Right: Figure 8 Piano keyboard frequency
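For reference, the frequency of the n-th piano key, with key 49 being A4 = 440 Hz as in [9], is:

$$f(n) = 2^{\frac{n-49}{12}} \times 440\ \mathrm{Hz}, \qquad \frac{f(n+1)}{f(n)} = 2^{1/12} \approx 1.059$$

so the ratio between adjacent keys is fixed while the difference in Hz grows toward the high end of the keyboard.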
6. Experiment 1: Audio Feature Generator
Figure 9 shows the mel spectrum output of the Audio Feature Generator, received on a serial terminal on the Windows PC and plotted as a graph. The horizontal axis is the mel spectral component index, and the vertical axis is the spectral intensity. The Audio Feature Generator runs continuously and repeats this output every AFG cycle.
Figure 9 Mel spectrum output of the Audio Feature Generator
One Audio Feature Generator output consists of 32 spectral values. The horizontal axis in Figure 9 is the index (0 to 31) of the spectral output, corresponding to frequency components from 100 Hz to 4000 Hz; Figure 9 contains three consecutive outputs (AFG cycles). Looking at the spectrum, the strong fundamental frequency of the motor rotation is visible around 100 Hz, and harmonics are also present.
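As an illustration of how an index on the horizontal axis maps to a frequency in Hz, the sketch below computes approximate center frequencies for 32 triangular filters spaced evenly on the mel scale between 100 Hz and 4000 Hz. This is a generic mel filterbank calculation under assumed conventions (HTK-style mel formula, evenly spaced interior points), not the Audio Feature Generator's actual code; the real bin placement depends on the AFG configuration.

```c
#include <math.h>
#include <stdio.h>

/* HTK-style mel conversions (assumption: the AFG uses a similar convention). */
static double hz_to_mel(double hz)  { return 2595.0 * log10(1.0 + hz / 700.0); }
static double mel_to_hz(double mel) { return 700.0 * (pow(10.0, mel / 2595.0) - 1.0); }

int main(void)
{
    const int    n_bins = 32;      /* number of spectral outputs per AFG frame */
    const double f_low  = 100.0;   /* lower band limit in Hz (from Figure 9)   */
    const double f_high = 4000.0;  /* upper band limit in Hz (from Figure 9)   */

    const double mel_low  = hz_to_mel(f_low);
    const double mel_high = hz_to_mel(f_high);

    /* Triangular filters are spaced evenly in mel; bin i is centered at the
       (i+1)-th of n_bins interior points between mel_low and mel_high.       */
    for (int i = 0; i < n_bins; i++) {
        double mel_center = mel_low + (i + 1) * (mel_high - mel_low) / (n_bins + 1);
        printf("bin %2d: ~%6.0f Hz\n", i, mel_to_hz(mel_center));
    }
    return 0;
}
```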
Figure 10 shows the current consumption waveform when the Audio Feature Generator function is called 10 times in a row in the experimental program. To average and normalize the frequency spectrum, I judged that about 10 measurements were necessary. One Audio Feature Generator call took 0.199 s, so the 10 calls took 1.99 s in total.
The integrated current per Audio Feature Generator call, read from a detailed current waveform with the time axis expanded to 10 ms, is 0.294 mA·s, which makes 2.94 mA·s for 10 iterations. This is IS in Figure 2.
Figure 10 Current consumption when calling the Audio Feature Generator function 10 times
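Dividing charge by time gives the average current drawn while the analysis is running:

0.294 mA·s / 0.199 s ≈ 1.48 mA.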
7. Experiment 2: Bluetooth communication
Let me briefly explain the program structure. Figure 11 shows the structure of “Bluetooth SoC Empty”, the simplest sample program, written in C for a so-called bare-metal environment without an embedded OS. In main(void) at the upper left, after initialization (lines 2 to 3 in the figure), the program enters a loop (lines 4 to 8) that calls the system process and the user process and then goes to sleep to reduce power consumption. It is an event-driven program whose user process (app_process_action()) does nothing, hence “Empty”. (A minimal code sketch of this structure is given below, after the description of Figure 11.)
In a real application, the measurement code for whatever you want to send over Bluetooth, such as spectrum analysis, vibration measurement, or temperature measurement, goes into this user process. Events such as timer interrupts and external interrupts are handled in the system process. One of these events is Bluetooth advertising. Advertising is an outbound action in which the MCU board (the peripheral in Bluetooth terminology) announces its presence to the IoT gateway (the central) and invites a connection. The Bluetooth specification allows the advertising interval to be set between 0.02 s and 10.0 s. For example, if the advertising interval is set to 1 s, a timer interrupt occurs every 1 s and the Bluetooth stack event handler is called (Figure 11, center).
Figure 11 ”Bluetooth SoC Empty” program structure and EFR connect screen
Depending on the content of the event message, one action is selected and executed from among (1) Bluetooth initialization and start of advertising, (2) connection, and (3) disconnection. Silicon Labs’ “EFR Connect” [12] is used to confirm the operation of the SoC Empty program. This is a smartphone application that detects Bluetooth devices, displays their information, and connects to and disconnects from them. Here, EFR Connect stands in for the IoT gateway of Figure 2. A screenshot of EFR Connect connected to “SoC Empty” is shown on the right side of Figure 11, confirming that Bluetooth communication works normally.
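To make this concrete, here is a minimal sketch of the bare-metal skeleton described above, modeled on the generated “SoC Empty” project layout (see also the author’s earlier post on Bluetooth microcontrollers [11]). The function names follow the Gecko SDK’s usual naming but should be treated as assumptions and checked against Figure 11 and the SDK documentation; in particular, the advertising calls shown use the newer SDK naming, while older SDK versions use sl_bt_advertiser_start().

```c
#include "sl_system_init.h"
#include "sl_system_process_action.h"
#include "sl_power_manager.h"
#include "sl_bluetooth.h"
#include "app.h"

int main(void)
{
  sl_system_init();               /* lines 2-3 in Figure 11: init drivers and Bluetooth stack */
  app_init();                     /* user initialization (empty in "SoC Empty")               */

  while (1) {                     /* lines 4-8: event loop                                    */
    sl_system_process_action();   /* system process: dispatches stack events                  */
    app_process_action();         /* user process: measurement code would go here             */
    sl_power_manager_sleep();     /* sleep until the next event (e.g. advertising timer)      */
  }
}

/* Bluetooth stack event handler, called from the system process. */
void sl_bt_on_event(sl_bt_msg_t *evt)
{
  static uint8_t adv_handle = 0xff;

  switch (SL_BT_MSG_ID(evt->header)) {
    case sl_bt_evt_system_boot_id:
      /* (1) Initialize advertising: 1 s interval = 1600 x 0.625 ms units.
         (The real sample also generates the advertising data here.)        */
      sl_bt_advertiser_create_set(&adv_handle);
      sl_bt_advertiser_set_timing(adv_handle, 1600, 1600, 0, 0);
      sl_bt_legacy_advertiser_start(adv_handle, sl_bt_legacy_advertiser_connectable);
      break;

    case sl_bt_evt_connection_opened_id:
      /* (2) Connected by the central (EFR Connect). */
      break;

    case sl_bt_evt_connection_closed_id:
      /* (3) Disconnected: restart advertising so the device stays visible. */
      sl_bt_legacy_advertiser_start(adv_handle, sl_bt_legacy_advertiser_connectable);
      break;

    default:
      break;
  }
}
```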
Figure 12 Current Consumption of Bluetooth Communication (Advertisement Interval Set to 1 Second)
Next, we measure the integrated current of the Bluetooth communication. Figure 12 shows the current consumption with the advertising interval set to 1 second. From the detailed current waveform with the time axis expanded to 1.25 ms, each advertising transmission is about 3.75 ms wide, with a peak current of about 15 mA and an average current of 6.01 mA (Figure 12). The integrated current per advertisement is 0.00225 mA·s. This is IBT in Figure 2. During the non-advertising period, the current drops to about 5 µA, roughly 1/1000 of the active level.
So far, we have measured the integrated current for sound analysis and the integrated current for Bluetooth communication. To summarize: one sound analysis (10 Audio Feature Generator calls) takes 1.99 s and consumes 2.94 mA·s, while one Bluetooth advertisement consumes 0.00225 mA·s, so the sound analysis dominates. Battery life can be estimated from these values, the measurement period, and the CR2032 battery capacity (225 mAh) (Figure 13). Since the sound analysis takes 1.99 s, the measurement period must be longer than that. For example, the battery life was estimated at 162 days for a measurement period of 1 minute and 691 days (approximately 1 year and 11 months) for a measurement period of 5 minutes. I believe this shows that a smart sound analysis edge device running on a coin battery for over a year is within reach.
Figure 13 Smart Sound Analytics Edge Device Battery Life Estimate
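To cross-check Figure 13, here is a minimal sketch of the battery-life estimate using the measured values above. The sleep-current and advertising assumptions are mine (the article does not spell out Figure 13's exact model), so the result, roughly 167 days for a 1-minute measurement period, only approximately reproduces the 162-day figure.

```c
#include <stdio.h>

int main(void)
{
    /* Measured values from this article */
    const double is_mas   = 2.94;      /* charge per sound analysis (10 AFG calls) [mA*s] */
    const double ibt_mas  = 0.00225;   /* charge per BLE advertisement [mA*s]             */
    const double t_proc_s = 1.99;      /* duration of one sound analysis [s]              */
    const double cap_mas  = 225.0 * 3600.0;  /* CR2032 capacity: 225 mAh -> mA*s          */

    /* Assumptions (not from Figure 13): ~5 uA sleep current outside the analysis,
       and one advertisement per second throughout the period.                            */
    const double sleep_ma = 0.005;
    const double adv_hz   = 1.0;

    const double period_s = 60.0;      /* measurement period chosen by the user: 1 minute */

    double q_period = is_mas
                    + adv_hz * period_s * ibt_mas
                    + sleep_ma * (period_s - t_proc_s);   /* charge per period [mA*s]     */

    double life_days = (cap_mas / q_period) * period_s / 86400.0;

    printf("charge per period: %.3f mA*s, estimated battery life: ~%.0f days\n",
           q_period, life_days);       /* prints roughly 3.365 mA*s and ~167 days         */
    return 0;
}
```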
8. Summary
1. A smart sound analysis edge device was prototyped with the aim of performing sound acquisition, frequency analysis, and normal/abnormal sound discrimination on the edge device itself. Built on the Silicon Labs EFR32xG24 Dev Kit, it is small (30.4 mm x 51.0 mm) including the microphone, operates on a single coin battery, and requires no wiring work, making it well suited to being attached to the motors and robots of existing equipment. It can dramatically reduce the amount of data transferred to the cloud and the processing load on the cloud. For the frequency analysis, we used the mel spectrum, which is widely used in speech recognition.
2. We measured the processing time and integrated current of the sound analysis and the integrated current of the Bluetooth communication, and estimated the battery life from these together with the sound measurement period set by the user. The battery life was estimated at about 162 days for a 1-minute measurement period and about 691 days (1 year and 11 months) for a 5-minute measurement period. We believe this shows that a smart sound analysis edge device operating for more than a year on a single coin battery is realistic.
9. References
[1] xG24-DK2601B EFR32xG24 Dev Kit
https://www.silabs.com/development-tools/wireless/efr32xg24-dev-kit?tab=overview
[2] EFR32MG24B310F1536IM48
https://www.silabs.com/wireless/zigbee/efr32mg24-series-2-socs/device.efr32mg24b310f1536im48?tab=specs
[3] Silicon Labs Thunderboard Sense 2 (SLTB004A) – App Demo | Symmetry Electronics
https://www.youtube.com/watch?v=doL8a6flXY4
Comment: This is a demo video for the Thunderboard Sense 2, not for the EFR32xG24 Dev Kit. The demo content, however, is almost the same between the two.
[4] Simplicity Studio Software
https://www.silabs.com/developers/simplicity-studio
[5] Audio Feature Generator
https://docs.silabs.com/gecko-platform/4.0/machine-learning/api/group-ml-audio-feature-generation
[6] TensorFlow Lite for Microcontrollers
https://docs.silabs.com/gecko-platform/latest/machine-learning/tensorflow/overview
[7] Understanding the Mel Spectrogram
https://medium.com/analytics-vidhya/understanding-the-mel-spectrogram-fca2afa2ce53
[8] Mel Filter Bank
https://siggigue.github.io/pyfilterbank/melbank.html
[9] Piano key frequencies
https://en.wikipedia.org/wiki/Piano_key_frequencies
[10] Sample Applications
https://docs.silabs.com/gecko-platform/4.0/machine-learning/tensorflow/sample-apps
[11] Edge AI Evangelist’s Thoughts Vol.15: Bluetooth Microcomputers – Part 2
https://hacarus.com/ai-lab/20211115-microcomputer-2/
[12] EFR Connect BLE Mobile App
https://www.silabs.com/developers/efr-connect-mobile-app
[13] CR2032 : Lithium Batteries
https://industrial.panasonic.com/ww/products/pt/lithium-batteries/models/CR2032
Nominal Capacity 225mAh, Diameter:20mm, Height 3.2mm
[14] CR123A : Lithium Batteries
https://industrial.panasonic.com/ww/products/pt/lithium-batteries/models/CR123A
Nominal Capacity 1550mAh, CR cylindrical(standard type), Diameter:17mm, Height:34.5mm