Hello everyone, I’m Haruyuki Tago, Edge Evangelist at HACARUS Tokyo R&D center.
In this series of articles, I will share some insights from my decades of experience in the semiconductor industry and I will comment on various AI industry related topics from my unique perspective.
In 2019, HACARUS released a tech blog titled “HACARUS Tech Blog: Sparse Modeling with SPRESENSE”, in which the author gave a brief overview of Sony’s SPRESENSE edge device. The article described a few potential applications, including noise removal, under SPRESENSE’s Arduino development environment.
Today I will explain the findings from an experiment that ran LASSO, a sparse modeling method, on the SPRESENSE.
SPRESENSE’s Development Environment
Starting with a basic overview of the device, Sony itself refers to the SPRESENSE as an IoT single-board computer. The SPRESENSE comes equipped with two main chips (Figure 1). The first is the CXD5602, which has six Arm Cortex-M4F cores and various I/Os. The second is the CXD5247, which is responsible for power management and audio functions.
Focusing on the CXD5602, its inclusion of the six Arm Cortex-M4F cores enables powerful computing performance including floating-point arithmetic, power control of its cores, and support for an inter-core communication framework.
Moving on to the semiconductor manufacturing technology, 28nm FD-SOI (Fully Depleted – Silicon On Insulator) is used, allowing its transistors to switch on and off more sharply than conventional silicon transistors. This difference enables lower supply voltages which leads to lower power consumption.
Before this experiment, I knew that this semiconductor technology was used in wristwatch ICs, where ultra-low power consumption is important. However, this is the first time I have heard of it being used in a high-performance MCU chip built on cores such as the Cortex-M4F. It is definitely a cutting-edge product from Sony that aims to deliver both low power consumption and high computing performance.
The SPRESENSE is an extremely powerful IoT device that comes in a compact package, so it is worth describing exactly how it was set up for the following experiment.
For the development environment, we took a laptop running 64-bit Windows 10 and installed Arduino 1.8.13. In the Arduino screen shown in Figure 2, the C++ program (left) is compiled under the Arduino environment, the compiled binary is transferred to the SPRESENSE, and the SPRESENSE then boots automatically.
Program input and output go through the Arduino serial terminal (right). The LASSO ADMM program executes in less than one second, which is short enough to be practical.
Looking at the Boston Housing Dataset
Now that we have reviewed the basic system specifications, we can try out an edge AI application on the SPRESENSE. In the experiment, we used a housing price dataset for the city of Boston, USA, which associates house prices with 13 parameters that are thought to affect them. The dataset includes 506 cases. Figure 3 shows several of these cases, where the objective variable Y represents the house price (MEDV) and the explanatory variables X(1) through X(13) are the 13 parameters mentioned above.
Figure 4 explains the meanings of the explanatory and objective variables in more detail. For example, X(1), CRIM, represents the number of crimes per capita by town.
The Housing Model
To solve the problem presented, first a predictive model for Y had to be established. Below is the multiple regression equation where β(1) through β(13) are the regression coefficients.
Y = β(1) × X(1) + β(2) × X(2) + … + β(13) × X(13)
With this formula, we can look for the β values that best fit the 506 cases. Goodness of fit is measured by the Sum of Squared Errors of prediction (SSE), shown in Figure 5.
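The prediction equation and the SSE measure described above can be sketched in a few lines of Python. This is only an illustration: the function names `predict` and `sse` and the toy data are hypothetical, the data is not taken from the Boston dataset, and no scaling that Figure 5 might apply is assumed here.

```python
def predict(beta, x_row):
    """Linear model without an intercept, matching the equation above:
    Y = beta(1)*X(1) + ... + beta(p)*X(p)."""
    return sum(b * x for b, x in zip(beta, x_row))


def sse(beta, X, y):
    """Sum of squared differences between observed and predicted Y."""
    return sum((y_i - predict(beta, row)) ** 2 for row, y_i in zip(X, y))


# Hypothetical toy data: 3 cases, 2 explanatory variables (not Boston data).
X_toy = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y_toy = [5.0, 4.0, 8.0]

print(sse([2.0, 1.0], X_toy, y_toy))  # 2.0
print(sse([0.0, 0.0], X_toy, y_toy))  # 105.0
```

A smaller SSE means the chosen β values reproduce the observed Y more closely, which is the sense in which one set of coefficients "fits better" than another.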
Next, using the well-known statistical technique of multiple regression analysis (least squares), we can compute β(1) through β(13). With this method it is easy to find the best-fitting solution using all 13 explanatory variables above, but are they all necessary? Of the 13 variables, which are essential and which, if any, are irrelevant for the prediction? One method for answering this question is LASSO, a sparse modeling technique.
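For reference, the least-squares fit mentioned above can be written out as follows. The case index i and the notation X_i(j), meaning the value of X(j) for case i, are introduced here only for illustration and do not appear in the article.

```latex
\hat{\beta}_{\mathrm{LS}}
  = \arg\min_{\beta} \sum_{i=1}^{506}
    \Bigl( Y_i - \sum_{j=1}^{13} \beta(j)\, X_i(j) \Bigr)^{2}
% In matrix form, when X^{\top} X is invertible:
\hat{\beta}_{\mathrm{LS}} = (X^{\top} X)^{-1} X^{\top} Y
```

Least squares generally gives every one of the 13 coefficients a non-zero value, which is why it cannot by itself say which variables could be dropped.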
The Results from LASSO and their Interpretation
LASSO (Least Absolute Shrinkage and Selection Operator) is a method that has been mentioned throughout this article as well as in previous volumes of this series. In this volume, we will focus only on the analysis and interpretation related to the Boston housing data. If you would like to learn more about this sparse modeling method, please see the references listed at the end of this article.
An important parameter of LASSO is the regularization parameter, lambda (λ), which increases or decreases β(*), the number of β coefficients estimated to be zero. λ, which forms the x-axis in Figures 6, 7, and 8, covers a large range of values, so it is plotted on a natural-logarithm scale as ln(λ).
The LASSO results (Solution Path) for different ln(λ) values are shown in Figure 6, where each of the 13 curves corresponds to one β coefficient. Where ln(λ) is relatively small (e.g., -5.0), the results are similar to the least squares solution and all 13 β coefficients are non-zero.
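To make the role of λ concrete, one common way of writing the LASSO objective is shown below. The factor 1/2 and the index notation X_i(j) (the value of X(j) for case i) are assumptions made for illustration. The article does not state which scaling its program uses, and some implementations also divide the squared-error term by the number of cases, which shifts the λ scale.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta}\;
    \frac{1}{2} \sum_{i=1}^{506}
      \Bigl( Y_i - \sum_{j=1}^{13} \beta(j)\, X_i(j) \Bigr)^{2}
    + \lambda \sum_{j=1}^{13} \lvert \beta(j) \rvert
```

The first term is the squared fitting error and the second penalizes the absolute size of the coefficients. A larger λ weights the penalty more heavily, which pushes more coefficients to exactly zero.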
Another important detail that is shown in the Solution Path is the relationship between the value of ln(λ) and β(*), the number of beta values equal to zero. From Figure 7, let’s start with the case of ln(λ)=-5.0, where the value of β(*)=0. This once again shows that all 13 of the original parameters are non-zero values. As the value of ln(λ) increases, however, the number of parameters equalling zero steadily increases. Using ln(λ)=1.0 as an example, the number of parameters equalling zero increases to seven.
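The way the number of zero coefficients grows with ln(λ) can be reproduced on a small scale. The sketch below is a textbook-style ADMM iteration for LASSO in plain Python, not the HACARUS program that runs on the SPRESENSE. The function names, the penalty parameter `rho`, the iteration count, the 1/2 scaling of the squared error, and the toy data are all assumptions made for illustration.

```python
import math


def soft_threshold(v, k):
    """Shrink v toward zero by k; values within [-k, k] become exactly 0."""
    if v > k:
        return v - k
    if v < -k:
        return v + k
    return 0.0


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def lasso_admm(X, y, lam, rho=1.0, iters=500):
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 with ADMM (rho, iters assumed)."""
    p = len(X[0])
    # Precompute X^T X + rho*I and X^T y once; they do not change per iteration.
    G = [[sum(row[i] * row[j] for row in X) + (rho if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * y_i for row, y_i in zip(X, y)) for i in range(p)]
    z = [0.0] * p
    u = [0.0] * p
    for _ in range(iters):
        x = solve(G, [Xty[i] + rho * (z[i] - u[i]) for i in range(p)])
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(p)]
        u = [u[i] + x[i] - z[i] for i in range(p)]
    return z  # z carries the exact zeros produced by soft-thresholding


def count_zeros(beta):
    """Number of coefficients estimated to be exactly zero."""
    return sum(1 for b in beta if b == 0.0)


# Hypothetical toy data (not the Boston dataset): 4 cases, 3 variables.
X_toy = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.2], [1.0, 1.0, 1.0], [0.5, 0.0, 1.0]]
y_toy = [2.0, 0.3, 2.5, 1.2]

for ln_lam in [-5.0, -1.0, 0.0, 1.0, 2.0]:
    beta = lasso_admm(X_toy, y_toy, math.exp(ln_lam))
    print(ln_lam, count_zeros(beta), [round(b, 3) for b in beta])
```

Returning `z` rather than `x` is a deliberate choice in this sketch: the soft-thresholding step sets small values to exactly 0.0, so counting zeros needs no tolerance.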
Moving on to Figure 8, let’s next look at the relationship between ln(λ) and the fitting error, represented by the Sum of Squared Errors of prediction (SSE). At ln(λ)=-6.0, the SSE value is 24.2. It remains relatively low and steady until ln(λ) reaches 0.5, where β(*)=5 and the SSE value has increased to 30.5. Raising ln(λ) slightly to 1.0 increases β(*) to seven and the SSE value to 33.7, which is 39.3% higher than the original SSE value.
From this point, further increasing ln(λ) causes the SSE value to increase rapidly. As more regression coefficients are set to zero, too few explanatory variables remain and the fit of the prediction model becomes worse.
This relationship shows that there is a trade-off between model accuracy and the number of explanatory variables used. Of course it depends on the purpose, but personally, I think that the region where ln(λ) is between 0.5 and 1.0 is reasonable for practical use. In this region (shaded green), five to seven of the regression coefficients are estimated to be zero, so the number of explanatory variables is reduced from 13 to between six and eight, while the SSE value is only 26% to 39.3% higher than when all 13 explanatory variables are used.
So far, we have looked at how the SSE of the prediction changes with the LASSO regularization parameter. Now, let’s connect this back to the original housing problem. The three most influential explanatory variables, ranked by the absolute magnitude of their coefficients, are:
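The percentage increases can be checked directly from the SSE values quoted above. The helper name `pct_increase` is just for this illustration.

```python
def pct_increase(base, value):
    """Relative increase of value over base, in percent."""
    return (value - base) / base * 100.0


sse_baseline = 24.2  # SSE at ln(lambda) = -6.0, as quoted in the text

print(round(pct_increase(sse_baseline, 30.5), 1))  # 26.0 (ln(lambda) = 0.5)
print(round(pct_increase(sse_baseline, 33.7), 1))  # 39.3 (ln(lambda) = 1.0)
```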
- β(6) (RM) = 2.57 – Average number of rooms per dwelling
- β(13) (LSTAT) = -0.593 – % of the population engaged in low-salary occupations
- β(2) (ZN) = 0.0629 – % of residential parcels larger than 25,000 square feet
These results are consistent with the idea that housing prices are higher for homes with more rooms, in areas with higher average income (note the negative coefficient for LSTAT), and in areas with larger residential parcels. Figure 9 shows a final comparison between LASSO and multiple regression analysis.
To wrap things up, the Sony SPRESENSE is an attractive single-board computer that aims to achieve both high computing performance and low power consumption by using chips that employ 28nm FD-SOI semiconductor technology and an efficient architecture design such as the Cortex-M4F.
In this article, we analyzed the Boston housing data using LASSO ADMM, sparse modeling software from HACARUS that runs on the SPRESENSE. To create a prediction model for housing prices, we adopted a multiple regression equation with 13 explanatory variables. When running the program, we found the execution time to be less than one second, which was sufficient for practical purposes.
By adjusting the LASSO regularization parameter λ, we could flexibly control the trade-off between the number of the 13 regression coefficients estimated to be zero and the sum of squared residuals between predicted and observed values.
In the case where ln(λ)=1.0, six explanatory variables that had a large impact on the prediction model were selected. It is also worth noting that ignoring the remaining seven variables did not have a significant effect on the model fit. In the end, a sparse model was obtained by selecting six explanatory variables from the original 13.
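References
Sony, Spresense product information (ソニー Spresense 製品情報)
Spresense Arduino Getting Started Guide (Spresense Arduino スタートガイド)
GitHub hacarus/spresense-examples
The Boston Housing Dataset
Takashi Someda, Challenges in machine learning projects and the background behind rising expectations for sparse modeling (機械学習プロジェクトにおける課題と，スパースモデリングに期待が高まる背景)
Naoki Kitora, Why was sparse modeling born? The arrival of the representative algorithm “LASSO” (スパースモデリングはなぜ生まれたか？ 代表的なアルゴリズム「LASSO」の登場)
Ryuji Masui, What is sparse modeling? How it works, its strengths, and how it differs from deep learning (スパースモデリングとは｜仕組み・強み・ディープラーニングとの違いを解説)
Shuichi Kawano, Hidetoshi Matsui, and Kei Hirose, Statistical Modeling by Sparse Estimation Methods (スパース推定法による統計モデリング), Statistics One Point series, Kyoritsu Shuppan, ISBN 978-4-320-11257-5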