This post is a set of reading notes on *Linear Algebra and Its Applications*.
The adjective "least-squares" arises from the fact that $\|\boldsymbol b - A\boldsymbol x\|$ is the square root of a sum of squares.
Apply the Best Approximation Theorem in Section 6.3 to the subspace $\mathrm{Col}\,A$, and let $\hat{\boldsymbol b} = \mathrm{proj}_{\mathrm{Col}\,A}\,\boldsymbol b$.
Because $\hat{\boldsymbol b}$ is in $\mathrm{Col}\,A$, the equation $A\boldsymbol x = \hat{\boldsymbol b}$ is consistent, and there is an $\hat{\boldsymbol x}$ in $\mathbb{R}^n$ such that

$$A\hat{\boldsymbol x} = \hat{\boldsymbol b} \quad (1)$$
Since $\hat{\boldsymbol b}$ is the closest point in $\mathrm{Col}\,A$ to $\boldsymbol b$, a vector $\hat{\boldsymbol x}$ is a least-squares solution of $A\boldsymbol x = \boldsymbol b$ if and only if $\hat{\boldsymbol x}$ satisfies (1). See Figure 2. [There are many solutions of (1) if the equation has free variables.]
Suppose $\hat{\boldsymbol x}$ satisfies $A\hat{\boldsymbol x} = \hat{\boldsymbol b}$. By the Orthogonal Decomposition Theorem in Section 6.3, $\boldsymbol b - \hat{\boldsymbol b}$ is orthogonal to $\mathrm{Col}\,A$, so $\boldsymbol b - A\hat{\boldsymbol x}$ is orthogonal to each column of $A$. If $\boldsymbol a_j$ is any column of $A$, then $\boldsymbol a_j \cdot (\boldsymbol b - A\hat{\boldsymbol x}) = 0$, that is, $\boldsymbol a_j^T(\boldsymbol b - A\hat{\boldsymbol x}) = 0$. Since each $\boldsymbol a_j^T$ is a row of $A^T$,

$$A^T(\boldsymbol b - A\hat{\boldsymbol x}) = \boldsymbol 0 \quad (2)$$

(This equation also follows from Theorem 3 in Section 6.1.) Thus

$$\begin{aligned}A^T\boldsymbol b - A^TA\hat{\boldsymbol x} &= \boldsymbol 0 \\ A^TA\hat{\boldsymbol x} &= A^T\boldsymbol b\end{aligned}$$
These calculations show that each least-squares solution of $A\boldsymbol x = \boldsymbol b$ satisfies the equation

$$A^TA\boldsymbol x = A^T\boldsymbol b \quad (3)$$

This system of equations is called the **normal equations** for $A\boldsymbol x = \boldsymbol b$. A solution of (3) is often denoted by $\hat{\boldsymbol x}$.
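To make this concrete, here is a minimal numpy sketch (the data is illustrative, not taken from the book) that forms the normal equations, solves them, and checks that the residual is orthogonal to $\mathrm{Col}\,A$:

```python
import numpy as np

# A small overdetermined system Ax = b (illustrative data).
A = np.array([[4.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])
b = np.array([2.0, 0.0, 11.0])

# Form and solve the normal equations  A^T A x = A^T b.
AtA = A.T @ A
Atb = A.T @ b
x_hat = np.linalg.solve(AtA, Atb)

print(x_hat)                   # least-squares solution, here [1. 2.]
print(A.T @ (b - A @ x_hat))   # A^T (b - A x_hat) ~ 0, as equation (2) requires
```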
The next theorem gives useful criteria for determining when there is only one least-squares solution of $A\boldsymbol x = \boldsymbol b$. (Of course, the orthogonal projection $\hat{\boldsymbol b}$ is always unique.)

THEOREM 14 Let $A$ be an $m\times n$ matrix. The following statements are logically equivalent:

a. The equation $A\boldsymbol x = \boldsymbol b$ has a unique least-squares solution for each $\boldsymbol b$ in $\mathbb{R}^m$.
b. The columns of $A$ are linearly independent.
c. The matrix $A^TA$ is invertible.

When these statements are true, the least-squares solution $\hat{\boldsymbol x}$ is given by

$$\hat{\boldsymbol x} = (A^TA)^{-1}A^T\boldsymbol b \quad (4)$$
Formula (4) for $\hat{\boldsymbol x}$ is useful mainly for theoretical purposes and for hand calculations when $A^TA$ is a $2\times 2$ invertible matrix.
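For a small well-conditioned problem, formula (4) can also be applied verbatim; a short sketch with made-up data (not the book's):

```python
import numpy as np

A = np.array([[1.0,  3.0],
              [1.0, -1.0],
              [1.0,  1.0]])
b = np.array([5.0, 1.0, 0.0])

# Formula (4): x_hat = (A^T A)^{-1} A^T b.  Explicitly inverting A^T A is
# fine here, but solving the normal equations is preferable in general.
x_hat = np.linalg.inv(A.T @ A) @ (A.T @ b)
print(x_hat)   # [1. 1.]
```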
PROOF If $A\boldsymbol x = \boldsymbol 0$, then $A^TA\boldsymbol x = \boldsymbol 0$, so $\mathrm{Nul}\,A \subseteq \mathrm{Nul}\,A^TA$. Conversely, if $A^TA\boldsymbol x = \boldsymbol 0$, then $\boldsymbol x^TA^TA\boldsymbol x = (A\boldsymbol x)^TA\boldsymbol x = \|A\boldsymbol x\|^2 = 0$, so $A\boldsymbol x = \boldsymbol 0$ and therefore $\mathrm{Nul}\,A^TA \subseteq \mathrm{Nul}\,A$. Hence $\mathrm{Nul}\,A = \mathrm{Nul}\,A^TA$, and by the Rank Theorem,

$$\mathrm{rank}\,A^TA = n - \dim \mathrm{Nul}\,A^TA = n - \dim \mathrm{Nul}\,A = \mathrm{rank}\,A$$
So when $A$ has $n$ linearly independent columns, $\mathrm{rank}\,A^TA = \mathrm{rank}\,A = n$, which means the $n\times n$ matrix $A^TA$ is invertible.
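The rank identity is easy to check numerically; a quick sketch with random matrices (my own illustration, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 6x3 matrix almost surely has linearly independent columns,
# so rank(A^T A) = rank(A) = 3 and A^T A is invertible.
A = rng.standard_normal((6, 3))
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T @ A))   # 3 3

# With a duplicated column, both ranks drop together.
B = np.column_stack([A[:, 0], A[:, 0], A[:, 1]])
print(np.linalg.matrix_rank(B), np.linalg.matrix_rank(B.T @ B))   # 2 2
```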
The next example shows how to find a least-squares solution of $A\boldsymbol x = \boldsymbol b$ when the columns of $A$ are orthogonal. Such matrices often appear in linear regression problems.
EXAMPLE 4 Find a least-squares solution of $A\boldsymbol x = \boldsymbol b$ for
SOLUTION Because the columns $\boldsymbol a_1$ and $\boldsymbol a_2$ of $A$ are orthogonal, the orthogonal projection of $\boldsymbol b$ onto $\mathrm{Col}\,A$ is given by

$$\hat{\boldsymbol b} = \frac{\boldsymbol b\cdot\boldsymbol a_1}{\boldsymbol a_1\cdot\boldsymbol a_1}\,\boldsymbol a_1 + \frac{\boldsymbol b\cdot\boldsymbol a_2}{\boldsymbol a_2\cdot\boldsymbol a_2}\,\boldsymbol a_2 \quad (5)$$
Now that $\hat{\boldsymbol b}$ is known, we can solve $A\hat{\boldsymbol x} = \hat{\boldsymbol b}$. But this is trivial, since we already know what weights to place on the columns of $A$ to produce $\hat{\boldsymbol b}$. It is clear from (5) that

$$\hat{\boldsymbol x} = \begin{bmatrix} \dfrac{\boldsymbol b\cdot\boldsymbol a_1}{\boldsymbol a_1\cdot\boldsymbol a_1} \\[1.5ex] \dfrac{\boldsymbol b\cdot\boldsymbol a_2}{\boldsymbol a_2\cdot\boldsymbol a_2} \end{bmatrix}$$
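Here is the same idea in numpy, with a small matrix of my own whose columns are orthogonal (not the book's Example 4 data); each entry of $\hat{\boldsymbol x}$ is just an independent projection weight:

```python
import numpy as np

a1 = np.array([1.0, 1.0, 1.0, 1.0])
a2 = np.array([-1.0, 1.0, -1.0, 1.0])    # a1 . a2 = 0: orthogonal columns
A = np.column_stack([a1, a2])
b = np.array([3.0, 1.0, 4.0, 2.0])

# Weights from (5): (b . a_j) / (a_j . a_j) for each column.
x_hat = np.array([b @ a1 / (a1 @ a1), b @ a2 / (a2 @ a2)])
print(x_hat)                                  # [ 2.5 -1. ]

# Agrees with the normal-equations solution.
print(np.linalg.solve(A.T @ A, A.T @ b))      # [ 2.5 -1. ]
```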
In some cases, the normal equations for a least-squares problem can be ill-conditioned; that is, small errors in the calculation of the entries of $A^TA$ can sometimes cause relatively large errors in the solution $\hat{\boldsymbol x}$. If the columns of $A$ are linearly independent, the least-squares solution can often be computed more reliably through a $QR$ factorization of $A$ (described in Section 6.4).
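A sketch of the $QR$ route, using synthetic nearly-collinear columns of my own choosing: since $A = QR$ with $Q^TQ = I$, the normal equations reduce to the triangular system $R\boldsymbol x = Q^T\boldsymbol b$, which avoids forming $A^TA$ (whose condition number is roughly the square of that of $A$):

```python
import numpy as np

# Two nearly collinear columns make A^T A poorly conditioned.
t = np.linspace(0.0, 1.0, 50)
noise = 1e-6 * np.random.default_rng(1).standard_normal(50)
A = np.column_stack([t, t + noise])
b = t + 0.001

Q, R = np.linalg.qr(A)               # reduced QR: Q is 50x2, R is 2x2
x_qr = np.linalg.solve(R, Q.T @ b)   # solve R x = Q^T b (R is upper triangular)

print(np.linalg.cond(A.T @ A))       # ~ cond(A)**2: far worse than cond(A)
print(x_qr)
```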