In this article, Takahiro Suzuki, a data scientist at HACARUS, explains the convex optimization problem, covering all the basic terms you need to understand it.

## 0. Introduction

My name is Takahiro Suzuki, data scientist in the HACARUS Tohoku team.

As mentioned in a previous blog post, HACARUS has started study sessions to learn more about sparse modeling as part of our everyday business. These Sparse Modeling Study Sessions promote knowledge exchange among HACARUS members, and in this blog series we share some of those insights with outside audiences. Enjoy!

This article covers the second Sparse Modeling Study Session, which focused on methods for solving LASSO and will be introduced through a series of posts. In this post, I will introduce the convex optimization problem, an important concept in finding a LASSO solution. Let's take a look!

A convex optimization problem is a problem of the following form.$\mathrm{minimize}_{\mathbf{x} \ \in \ \mathbb{R}^{n}} \ f(\mathbf{x}) \ \mathrm{subject} \ \mathrm{to} \ \mathbf{x} \in C$

Here, $$f$$ and $$C$$ in the equation can't be arbitrary; they are subject to mathematical conditions such as "$$f$$ is a convex function" and "$$C$$ is a closed convex set". I anticipate these terms are unfamiliar to readers who are new to the subject. Therefore, in this post, I will first explain the definition of mathematical terms such as "convex function" and "convex set", and then define the convex optimization problem.

## 1. Convex Set

First, I will introduce the concept of a convex set. Since this is a technical topic, I'll start by defining the term and then give examples. The following sections follow essentially the same pattern, though specific examples may be skipped and additional remarks may be added.

### Definition

A set $$\mathit{C} \ (\subset \mathbb{R}^{n})$$ is said to be a convex set if, for any $${\bf x} , {\bf y} \in \mathit{C}$$ and any $$\lambda \in [0, 1]$$,
$\lambda \mathbf{x} + (1 - \lambda) \mathbf{y} \in \mathit{C} \tag{1} \label{eq:conv-set}$

### Examples

The defining condition of a convex set is equation (1), but it is difficult to understand by simply looking at the equation. Let's take a look at convex sets by considering the example $$n=2$$.

Figure 1. Specific examples of convex and non-convex sets.

In the figure above, the set represented by the circle on the left is a convex set, and the set represented by the fan shape on the right is not a convex set (such a set is called a non-convex set). The difference between the two is shown by the red parts of the figure. A set is convex when, for any two points $$\mathbf{x}, \mathbf{y}$$ in the set, every point on the line segment connecting them also belongs to the set. In the fan shape, however, the segment shown in red contains points that are not included in the set, so it is non-convex: the condition holds for the choice of $$\mathbf{x}, \mathbf{y}$$ drawn in black but fails for the one drawn in red. With this picture in mind, you can tell at a glance whether a given shape represents a convex set or not.
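Condition (1) can also be spot-checked numerically. Below is a minimal Python sketch of my own (not from the study session) that samples random pairs of points from the unit disk in $$\mathbb{R}^{2}$$ and verifies that every convex combination stays in the set:

```python
import random

def in_unit_disk(p):
    """Membership test for the disk C = {(x1, x2) : x1^2 + x2^2 <= 1}."""
    return p[0] ** 2 + p[1] ** 2 <= 1.0

def convex_combination(x, y, lam):
    """Return lam * x + (1 - lam) * y for 2-D points."""
    return (lam * x[0] + (1 - lam) * y[0],
            lam * x[1] + (1 - lam) * y[1])

random.seed(0)
# Sample pairs of points in the disk and check condition (1) along the segment.
checked = 0
while checked < 1000:
    x = (random.uniform(-1, 1), random.uniform(-1, 1))
    y = (random.uniform(-1, 1), random.uniform(-1, 1))
    if not (in_unit_disk(x) and in_unit_disk(y)):
        continue  # only pairs that actually belong to the set count
    lam = random.random()
    assert in_unit_disk(convex_combination(x, y, lam))
    checked += 1
print("condition (1) held for all", checked, "sampled pairs")
```

Such a random check is of course no proof, but it matches the geometric picture: every segment between two points of the disk stays inside the disk.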

## 2. Indicator Functions

There are often situations in which we want to express numerically whether $$\mathbf{x} \in \mathbb{R}^{n}$$ belongs to a set $$C$$. For example, you could assign the value 1 if $$\mathbf{x}$$ belongs to $$C$$ and 0 if it doesn't. This is in fact the common definition of an indicator function, but in the context of mathematical optimization it is useful to define it with the values $$0, \infty$$ instead.

### Definition

For $$C \subset \mathbb{R}^{n}$$, the function $$I_{C} : \mathbb{R}^{n} \to \mathbb{R} \cup \{ \infty \}$$ defined as follows is called an indicator function (at least, that is the convention we will use here):
$I_{C}(\mathbf{x}) =\left\{\begin{array}{ll} 0 & ( \, \mathbf{x} \in C) \\ \infty & ( \, \mathbf{x} \notin C) \\ \end{array} \right. \tag{2} \label{eq:2}$

As mentioned above, a general indicator function is defined with the values 0 and 1, so why does mathematical optimization use $$0, \infty$$? The primary reason is that an indicator function defined with $$0, \infty$$ lets us rewrite an optimization problem with constraints as an equivalent optimization problem without them. This is shown below.

$\mathrm{minimize} _{\mathbf{x} \in \mathbb{R}^{n}} f(\mathbf{x}) \ \mathrm{s.t.}\ \mathbf{x} \in C \, \Longleftrightarrow \, \mathrm{minimize}_{\mathbf{x} \in \mathbb{R}^{n}}\left\{ f(\mathbf{x}) + I_{C}(\mathbf{x})\right\} \\$

In the optimization problem on the right, the constraint $$\mathbf{x} \in C$$ has been eliminated using the indicator function. The two problems are equivalent because whenever $$\mathbf{x} \notin C$$ the objective $$f(\mathbf{x}) + I_{C}(\mathbf{x})$$ is infinite, so in a minimization problem such points can never be optimal.
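As a small illustration (my own sketch, using a hypothetical objective $$f(x) = (x-2)^{2}$$ and the constraint set $$C = [-1, 1]$$), the rewriting above can be mimicked numerically: adding $$I_C$$ to the objective makes every infeasible point evaluate to infinity, so an unconstrained minimization over a grid automatically respects the constraint.

```python
import math

def indicator_interval(x, lo=-1.0, hi=1.0):
    """I_C for the closed interval C = [lo, hi]: 0 inside, infinity outside."""
    return 0.0 if lo <= x <= hi else math.inf

def f(x):
    """A hypothetical objective whose unconstrained minimum (x = 2) lies
    outside C, so the constraint actually matters."""
    return (x - 2.0) ** 2

def penalized(x):
    """f(x) + I_C(x): infinite outside C, so no infeasible x can win."""
    return f(x) + indicator_interval(x)

# Minimize over a grid that covers far more than C; every infeasible point
# evaluates to infinity and is automatically discarded by min().
grid = [i / 100 for i in range(-300, 301)]
x_best = min(grid, key=penalized)
print(x_best)  # 1.0 -- the constrained minimizer sits on the boundary of C
```

Note that no explicit constraint handling appears in the minimization itself; the indicator term does all the work.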

## 3. Effective Domain

### Definition

For a function $$f : \mathbb{R}^{n} \to \mathbb{R} \cup \{ \infty \}$$, the set defined by the following formula is known as the effective domain of the function $$f$$.

$\mathrm{dom}(f) = \{\mathbf{x} \in \mathbb{R}^{n} \mid f(\mathbf{x}) < \infty\} \tag{3} \label{eq:3}$

### Example

From the defining equation (3), we see that the effective domain is the set of inputs for which the function $$f$$ takes a finite value. For $$n=2$$ and $$C=\{(x_{1}, x_{2}) \mid {x_{1}}^{2} + {x_{2}}^{2} \le 1\}$$, let us consider the effective domain of the indicator function $$I_{C}$$ given by equation (2). Since the indicator function can take only two values, plotting them over the plane looks like the following figure. $$I_{C}$$ takes a finite value in the blue part of the figure (inside the unit circle), so the effective domain of the indicator function $$I_{C}$$ is:

$\mathrm{dom}(I_{C}) = \{(x_{1}, x_{2}) | {x_{1}}^{2} + {x_{2}}^{2} \le 1\} = C$

Figure 2: Effective domain of the indicator function $$I_{C}$$
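In code, the effective domain is simply the set of points where the function evaluates to something finite. A small sketch of my own for the indicator function above:

```python
import math

def indicator_disk(x1, x2):
    """I_C for C = {(x1, x2) : x1^2 + x2^2 <= 1}, as in equation (2)."""
    return 0.0 if x1 ** 2 + x2 ** 2 <= 1.0 else math.inf

def in_effective_domain(func, *args):
    """Definition (3): a point is in dom(f) iff f takes a finite value there."""
    return func(*args) < math.inf

# Points inside the unit circle are in dom(I_C); points outside are not.
print(in_effective_domain(indicator_disk, 0.5, 0.5))  # True
print(in_effective_domain(indicator_disk, 1.5, 0.0))  # False
```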

## 4. Proper

There is a term associated with the effective domain defined in chapter 3. The definition is simple.

### Definition

If the effective domain $$\mathrm{dom}(f)$$ of a function $$f : \mathbb{R}^{n} \to \mathbb{R} \cup \{ \infty \}$$ is non-empty, then $$f$$ is said to be a proper function.

### Example

The indicator function $$I_{C}$$, introduced in the example in chapter 3 is a proper function because its effective domain is not an empty set.

## 5. Convex Functions

This chapter will introduce you to convex functions. As you can see from the definition, it is very similar to that of a convex set.

### Definition

A proper function $$f : \mathbb{R}^{n} \to \mathbb{R} \cup \{ \infty \}$$ is said to be a convex function if, for any $${\bf x} , {\bf y} \in \mathrm{dom}(f)$$ and any $$\lambda \in [0, 1]$$,

$f(\lambda {\bf x} + (1 - \lambda) {\bf y}) \le \lambda f({\bf x}) + (1 - \lambda)f({\bf y}) \tag{4}$

### Example

The parabola $$f(x) = x^{2}$$ is a convex function (we’ll prove it in Chapter 7). Let’s take this parabola as an example and try to understand convex functions from the diagram.

Figure 3. Example of a convex function.

Consider fixed $$s , t \in \mathrm{dom}(f)$$, positioned, for example, as points A and B in the figure. Since the parabola is a convex function, conditional expression (4) holds for $$s , t$$.

What does conditional expression (4) mean in the figure? Let’s consider the left and right sides of the conditional expression separately.

First, consider the right-hand side. Since $$f(s)$$ is the y-coordinate of point A and $$f(t)$$ is the y-coordinate of point B, $$\lambda f(s) + (1-\lambda)f(t)$$ is the y-coordinate of the point $$\lambda \mathrm{A} + (1-\lambda) \mathrm{B}$$ on the segment AB (call this point D).

Next, on the left-hand side, $$\lambda s + (1-\lambda)t$$ is the x-coordinate of point D. Thus, $$f(\lambda s+(1-\lambda)t)$$ is the y-coordinate of the point on the parabola whose x-coordinate is the same as that of point D (call this point C).

From these points, we can understand that conditional expression (4) expresses (the y-coordinate of point C) ≤ (the y-coordinate of point D) in the above figure.

Note that $$\lambda$$ can move freely in the range $$[0,1]$$. That is, any point on the segment AB can be point D, and point C moves accordingly, always sharing D's x-coordinate.

To sum up, for fixed $$s , t \in \mathrm{dom}(f)$$, conditional expression (4) holding means that the segment AB always lies on or above the graph of $$f$$. And since $$s , t$$ are arbitrary, this is true for every choice of segment AB.

In any case, I hope you can now picture a convex function as one whose graph lies below every such line segment.
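Inequality (4) can also be checked numerically. The sketch below (my own, not part of the study session) evaluates the right-hand side minus the left-hand side of (4) for the parabola at many random triples $$(s, t, \lambda)$$ and confirms the gap is never negative:

```python
import random

def f(x):
    """The parabola from Figure 3."""
    return x * x

def convexity_gap(func, s, t, lam):
    """Right-hand side minus left-hand side of inequality (4).
    Non-negative at every sampled point iff (4) holds there."""
    return lam * func(s) + (1 - lam) * func(t) - func(lam * s + (1 - lam) * t)

random.seed(0)
# Spot-check inequality (4) at many random (s, t, lambda) triples.
for _ in range(10_000):
    s, t = random.uniform(-10, 10), random.uniform(-10, 10)
    lam = random.random()
    assert convexity_gap(f, s, t, lam) >= -1e-12  # tiny slack for rounding
print("inequality (4) held at every sampled triple")
```

For a non-convex function such as $$g(x) = -x^{2}$$, the same gap goes negative, which is exactly the chord dipping below the graph.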

## 6. Closed Sets

Next, we will look at the definition of a closed set. Closed sets are defined in general for metric spaces, but in the context of this blog the definition in Euclidean space is sufficient, so we will restrict ourselves to that setting.

### Definition

A subset $$U$$ of Euclidean space $$\mathbb{R}^{n}$$ is an open set if, for any point $$p \in U$$, there exists some $$\epsilon > 0$$ such that $$\{ x \in \mathbb{R}^{n} \mid ||x-p||_{2} < \epsilon \} \subset U$$. A set $$U \subset \mathbb{R}^{n}$$ is a closed set if its complement is an open set.

### Example

It sounds difficult in theory, but it's easy to understand with an example. For example, $$[-1, 1]$$ is a closed set, $$(-1, 1)$$ is an open set, and $$[-1, 1)$$ is neither closed nor open.
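One equivalent way to think about closed sets in $$\mathbb{R}^{n}$$ (not the definition above, but equivalent to it in Euclidean space) is that a closed set contains the limits of all its convergent sequences. A quick sketch of my own:

```python
def in_closed_interval(x):
    """Membership in the closed set [-1, 1]."""
    return -1.0 <= x <= 1.0

def in_open_interval(x):
    """Membership in the open set (-1, 1)."""
    return -1.0 < x < 1.0

# The sequence x_n = 1 - 1/n lies in both sets and converges to 1.
xs = [1.0 - 1.0 / n for n in range(1, 1000)]
assert all(in_closed_interval(x) for x in xs)
assert all(in_open_interval(x) for x in xs)

limit = 1.0
print(in_closed_interval(limit))  # True: [-1, 1] contains the limit
print(in_open_interval(limit))    # False: (-1, 1) does not, so it is not closed
```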

## 7. The Convex Optimization Problem

Now that we’ve covered the mathematical terms, let’s finally introduce the convex optimization problem.

### Definition

If the function $$f : \mathbb{R}^{n} \to \mathbb{R} \cup \{ \infty \}$$ is a proper convex function and the set $$C \subset \mathbb{R}^{n}$$ is a closed convex set, then the problem of finding the $$\mathbf{x} \in C$$ that minimizes $$f(\mathbf{x})$$ is called a convex optimization problem. The convex optimization problem can be written in either of the following ways

$\mathrm{minimize}_{\mathbf{x} \ \in \ \mathbb{R}^{n}} \ f(\mathbf{x}) \ \mathrm{subject} \ \mathrm{to} \ \mathbf{x} \in C$ $\mathrm{min}_{\mathbf{x} \ \in \ \mathbb{R}^{n}} \ f(\mathbf{x}) \ \mathrm{s.t.} \ \mathbf{x} \in C$

### Example

For $$n=1, f(x) = x^{2}, C = \mathbb{R}$$, let's consider the optimization problem of finding the minimum value of the function $$f$$. This can be expressed as a formula:

$\mathrm{minimize}_{x \ \in \ \mathbb{R}} \ x^{2} ( \mathrm{subject} \ \mathrm{to} \ {x} \in\mathbb{R}) \tag{5}$

In this case, you may omit the part after "subject to", since the constraint is trivial. Let us check whether this optimization problem is a convex optimization problem. First, for the function $$f$$, for any $$s , t \in \mathrm{dom}(f)$$ and any $$\lambda \in [0, 1]$$,

\begin{align*}
\lambda f(s) + (1-\lambda)f(t) - f(\lambda s+(1-\lambda)t)
&= \lambda s^{2} + (1-\lambda) t^{2} - (\lambda s+(1-\lambda)t)^{2}\\
&= \lambda s^{2} + (1-\lambda) t^{2} - (\lambda^{2} s^{2} + 2 \lambda (1-\lambda) st + (1-\lambda)^{2} t^{2})\\
&= \lambda (1-\lambda)(s^{2} -2st + t^{2})\\
&= \lambda (1-\lambda)(s - t)^{2}\\
&\ge 0
\end{align*}

and so we know that $$f$$ is a convex function. By a similar procedure, we can show that $$C$$ is a closed convex set. Therefore, we know that the optimization problem (5) is a convex optimization problem.
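The algebraic identity used in the derivation above, $$\lambda f(s) + (1-\lambda)f(t) - f(\lambda s + (1-\lambda)t) = \lambda(1-\lambda)(s-t)^{2}$$, is easy to spot-check numerically. This is my own sanity check, not part of the original argument:

```python
import random

random.seed(1)
for _ in range(10_000):
    s, t = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    lam = random.random()
    lhs = lam * s**2 + (1 - lam) * t**2 - (lam * s + (1 - lam) * t) ** 2
    rhs = lam * (1 - lam) * (s - t) ** 2
    assert abs(lhs - rhs) < 1e-9  # the two sides agree up to rounding
    assert rhs >= 0.0             # and the gap is non-negative, as claimed
print("identity verified at 10,000 random points")
```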

As we can see from the example above, determining whether an optimization problem is a convex optimization problem is generally laborious. Is it worth the trouble of showing that a problem is a convex optimization problem?

The answer is yes, because convex optimization problems have a remarkably useful property.

## 8. An Important Property of the Convex Optimization Problem

### Property

Any local optimal solution is a global optimal solution, and the set of all optimal solutions is a convex set.

This property of the convex optimization problem is important because, in general, it is difficult to find a global optimal solution.

As the term "optimization" problem suggests, the desired solution is the one that minimizes the objective function over the feasible region: the global optimal solution (here we restrict ourselves to minimization, because a maximization problem is essentially equivalent to a minimization problem). However, finding a global optimal solution is difficult, while finding a local optimal solution is comparatively easy, so in practice we often compromise and settle for local optimization. But as soon as an optimization problem turns out to be a convex optimization problem, the local optimal solution, which was originally a "compromise", becomes a genuinely optimal solution.
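To make the point concrete, here is a toy sketch of my own (gradient descent is just one example of a local method) on the convex objective $$f(x) = x^{2}$$ from chapter 7: started from any point, the purely local updates end up at the single global minimizer.

```python
def grad_f(x):
    """Gradient of the convex objective f(x) = x^2."""
    return 2.0 * x

def gradient_descent(x0, step=0.1, iters=200):
    """Plain gradient descent: a purely local method with no global view."""
    x = x0
    for _ in range(iters):
        x -= step * grad_f(x)
    return x

# Whatever the starting point, this local method lands on the one global
# minimizer x = 0: there are no spurious local minima to get trapped in.
for start in (-100.0, -1.0, 0.5, 42.0):
    assert abs(gradient_descent(start)) < 1e-9
print("every start converged to the global minimizer x = 0")
```

On a non-convex objective the same method could stall in a local dip, which is exactly why the property above is so valuable.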

## 9. Summary

I’m going to give a brief summary of what we’ve covered so far.

In this post, we discussed the convex optimization problem, which is important in finding a solution to LASSO, defining and introducing its key concepts. It is a long road to fully understanding the convex optimization problem, especially since "convex set" and "convex function" are not terms we hear every day, but I am glad we can now share a common mental picture.

Also, as you can infer from the word "convex" appearing in key places, convexity is a very important consideration in optimization. If you are interested, I recommend looking into convexity further.

## 10. In Conclusion

It’s been a long article, but thank you very much for reading to the end.

In this post, I introduced the mathematical groundwork for solving LASSO; in my next post, I will introduce an approach to LASSO based on the properties of convex optimization. If you are interested in this topic, please take a look!