What is Dictionary Learning?
In dictionary learning, patches of signals such as images and waveforms are represented as a weighted linear sum of a small number of atoms drawn from a dictionary (a collection of atoms) (Figure 1). This extracts the essential components of the signal and enables a variety of applications, including:
・Noise suppression (Figure 1)
・Completion of missing data
Beyond this range of applications, one appealing feature is that the input data is cropped into aligned patches of a fixed size, so images of different sizes and waveforms of different lengths can be used for both learning and prediction.
Figure 1: An example of dictionary learning, based on the “image denoising using dictionary learning” example from scikit-learn, a widely used machine learning library for Python. The image of the raccoon was obtained from PIXINIO and edited.
Mathematically, dictionary learning is a form of matrix decomposition, as is principal component analysis (Figure 2). For this reason, scikit-learn includes dictionary learning alongside principal component analysis in the submodule sklearn.decomposition.
Figure 2: Data preprocessing and learning flow in dictionary learning. Each patch of the training images is vectorized, and the vectors are stacked into a matrix so that matrix decomposition can be applied. In practice, the patches are cut out by shifting the window little by little so that they overlap, which avoids the loss of reconstruction quality at patch boundaries. The eye image was cropped from “The Olivetti faces dataset”, provided by AT&T Laboratories Cambridge.
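The preprocessing described in Figure 2 can be sketched with scikit-learn's patch utilities. This is a minimal illustration, not the exact pipeline behind the figure: the random 32x32 array stands in for a real image, and the 8x8 patch size is an arbitrary choice made for this example.

```python
import numpy as np
from sklearn.feature_extraction.image import (
    extract_patches_2d,
    reconstruct_from_patches_2d,
)

# Synthetic grayscale "image" (an assumption; any 2-D array works here).
rng = np.random.default_rng(0)
image = rng.random((32, 32))

patch_size = (8, 8)  # hypothetical patch size chosen for illustration

# Cut out every overlapping patch by sliding the window one pixel at a time.
patches = extract_patches_2d(image, patch_size)

# Vectorize each patch and stack the vectors into one matrix:
# one row per patch, one column per pixel inside the patch.
X = patches.reshape(len(patches), -1)

# Going back: overlapping patches are averaged pixel by pixel,
# which smooths over the patch boundaries.
restored = reconstruct_from_patches_2d(
    X.reshape(-1, *patch_size), image.shape
)

print(X.shape)  # (number of patches, pixels per patch)
```

Because the patches here are unmodified, averaging them returns the original image. In a denoising setting, each row of X would first be replaced by its sparse reconstruction.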
Principal component analysis selects mutually orthogonal bases, no more in number than the dimensionality of the data, so as to maximize the variance of the data.
Dictionary learning, on the other hand, selects bases (atoms) so that the representation of the data is maximally sparse. The bases do not have to be orthogonal, and their number can exceed the dimensionality of the data, giving an overcomplete dictionary (Tezuka 2014).
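The contrast between the two decompositions can be checked on a small example. The sketch below uses synthetic Gaussian data, and the values of n_components, alpha and max_iter are arbitrary assumptions made only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA, DictionaryLearning

# Synthetic data: 200 samples with 8 features (an assumption for this sketch).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))

# PCA: at most 8 components for 8-dimensional data, and they are orthonormal.
pca = PCA(n_components=8).fit(X)
gram_pca = pca.components_ @ pca.components_.T  # close to the identity matrix

# Dictionary learning: we may ask for more atoms than there are dimensions.
dl = DictionaryLearning(n_components=16, alpha=1.0, max_iter=20, random_state=0)
dl.fit(X)
atoms = dl.components_  # one atom per row

print(pca.components_.shape, atoms.shape)
```

Sixteen nonzero atoms in an 8-dimensional space cannot all be mutually orthogonal, which is why the orthogonality requirement has to be dropped for an overcomplete dictionary.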
Without the sparsity requirement, the problem has infinitely many solutions and cannot be solved uniquely, just like a system with more unknowns than equations. As with other sparse modeling methods such as LASSO, the sparsity constraint helps in many ways, from making the computation tractable to improving the interpretability and applicability of the solution.
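For reference, the sparsity constraint can be written as a penalized matrix decomposition. The form below follows the objective given in the scikit-learn documentation for DictionaryLearning. The symbols are not from the text above: X is the matrix of vectorized patches, U the weight matrix, V the dictionary whose rows V_k are the atoms, and α the strength of the sparsity penalty.

```latex
(U^*, V^*) = \operatorname*{arg\,min}_{U,\,V} \;
  \frac{1}{2} \lVert X - U V \rVert_{\mathrm{Fro}}^{2}
  + \alpha \lVert U \rVert_{1,1}
\quad \text{subject to} \quad
  \lVert V_k \rVert_2 \le 1 \ \text{ for all } 0 \le k < n_{\mathrm{components}}
```

The first term measures the reconstruction error, and the second is the same L1 penalty that LASSO uses to push most weights to zero. The norm constraint on the atoms prevents the trivial escape of shrinking U while inflating V.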
It is also possible to use an existing dictionary, such as a wavelet basis. However, learning a dictionary from the target itself, or from images similar to the target, is known to yield a sparser solution.
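Using an existing dictionary means skipping the learning step and only computing the sparse weights. The sketch below does this with scikit-learn's SparseCoder. As a stand-in for the wavelet basis mentioned above, it uses an orthonormal DCT basis built with SciPy, and the test signal is synthetic, so both are assumptions of this example.

```python
import numpy as np
from scipy.fft import dct
from sklearn.decomposition import SparseCoder

n = 64
# Fixed dictionary: each row is one unit-norm DCT-II basis vector.
# (A DCT basis is used here instead of wavelets purely for simplicity.)
D = dct(np.eye(n), norm="ortho", axis=0)

# Synthetic signal built from exactly two atoms of the dictionary.
signal = 2.0 * D[3] - 1.5 * D[10]

# Sparse coding against the fixed dictionary with Orthogonal Matching Pursuit,
# allowing at most two nonzero weights.
coder = SparseCoder(
    dictionary=D,
    transform_algorithm="omp",
    transform_n_nonzero_coefs=2,
)
code = coder.transform(signal.reshape(1, -1))

# Indices of the atoms that were selected, and the reconstructed signal.
active = np.flatnonzero(code[0])
reconstruction = code @ D
print(active)
```

A fixed basis works well here because the signal was constructed from that basis. For real images, a learned dictionary adapts its atoms to the data instead.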
In practice, we decompose the matrix by alternately fixing one of the dictionary and the weight matrix while updating the other, in order to obtain a maximally sparse weight matrix. In scikit-learn you can use ‘lars’ (Least Angle Regression) or ‘cd’ (Coordinate Descent) to compute the weights in the learning phase. In the prediction phase, ‘omp’ (Orthogonal Matching Pursuit) and ‘threshold’ (thresholding) can be used in addition to ‘lars’ and ‘cd’.
The major methods for updating dictionaries and weight matrices are the ‘Method of Optimal Directions’ (MOD) and ‘k-SVD’. In the case of scikit-learn, the former is used. If you want to use ‘k-SVD’, which is faster than ‘MOD’, try the spm-image module, which we have developed and published on GitHub. Its interface is compatible with scikit-learn, so scikit-learn users should be able to start using it right away.
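The split between the learning phase and the prediction phase maps onto two separate parameters of scikit-learn's DictionaryLearning. Below is a minimal sketch on synthetic data. The number of atoms, alpha, max_iter and the limit of three nonzero weights are arbitrary values picked for this example.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Synthetic stand-in for a matrix of vectorized patches (100 patches, 16 pixels).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 16))

dl = DictionaryLearning(
    n_components=24,              # number of atoms (illustrative value)
    alpha=1.0,                    # sparsity penalty used while learning
    max_iter=20,
    fit_algorithm="lars",         # learning phase: 'lars' or 'cd'
    transform_algorithm="omp",    # prediction phase: e.g. 'omp' or 'threshold'
    transform_n_nonzero_coefs=3,  # OMP keeps at most 3 atoms per sample
    random_state=0,
)

dl.fit(X)                # alternates weight updates and dictionary updates
codes = dl.transform(X)  # sparse weights for each sample
X_hat = codes @ dl.components_  # reconstruction from the sparse weights

print(codes.shape, np.count_nonzero(codes, axis=1).max())
```

Swapping fit_algorithm to "cd" or transform_algorithm to "threshold" changes only how the weights are computed. The rest of the code stays the same.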
We hope that this first overview has given you a helpful glimpse into HACARUS study sessions. We are currently hiring, so please check out our open positions.
Taro Tezuka, “辞書学習によるビッグデータからのパターン発見” (Pattern Discovery from Big Data by Dictionary Learning), CICSJ Bulletin, 2014, Vol. 32, No. 4, p. 76-, Release Date 2014/12/08, Online ISSN 1347-2283, Print ISSN 0913-3747, https://doi.org/10.11546/cicsj.32.76, https://www.jstage.jst.go.jp/article/cicsj/32/4/32_76/_article/-char/ja