THE RELAXATION METHOD FOR LINEAR INEQUALITIES

SHMUEL AGMON
assuming, of course, that the above system is consistent. Sometimes one has,
in addition, to minimize a given linear form l(x). Thus, in linear programming
one obtains a problem of the latter type. To cite another example, this time
from analysis, the problem of finding the polynomial of best approximation of
degree less than n corresponding to a discrete function defined in N points is of
the latter type. In this paper we shall be dealing only with the simpler problem
of finding a solution of (1.1). Nevertheless, it is known (7, lectures IV and V)
that the more difficult problem of minimization can be reduced to a system of
inequalities involving no minimization by the duality (or minimax) principle.
(However, this will increase considerably the number of unknowns and
inequalities in the equivalent system.)
That the numerical problem of solving a system of inequalities is in general
no easy task could be inferred from the fact that even in the case of equations
the numerical solution is not easy, and that many ingenious methods were
devised (2) in the hope of obtaining at least an approximate solution in a
"reasonable" number of steps. The situation is much worse in the case of
inequalities. Of the existing methods one could mention the double description
method (3) and the simplex method due to Dantzig (1). The elimination
method (proposed already by Fourier) is ruled out in general due to the huge
number of elementary operations involved.
We propose to discuss here an iteration procedure for finding a solution of
(1.1). The idea of the algorithm involved was communicated to the author by
T. S. Motzkin.¹ This method, which uses orthogonal projection, will be seen
later to be intimately connected with the so-called relaxation method in the
case of equations (4; 5; 6), and it could be considered (after a suitable trans-
formation) to be the extension of this method to inequalities. Even in the case
of equations it seems to us that our results are not completely devoid of
interest, for we shall get a simple geometric proof for the convergence of the relaxation method.
Received May 27, 1953. The preparation of this paper was sponsored (in part) by the Office
of the Air Comptroller, U.S.A.F.
¹It was through valuable conversations which the author had with T. S. Motzkin that he
was led to consider the problems treated here.
∂Ω. (The boundary of Ω, denoted by ∂Ω, consists of those points of Ω which lie on at
least one of the hyperplanes π_i.) Let i_k (k = 1, . . . , s) be the subset of indices i for
which l_i(y) = 0, and let Ω_y be the polyhedral cone defined by

(2.4)    l_{i_k}(x) ≥ 0    (k = 1, . . . , s).

Then x is exterior to Ω_y and y is also the nearest point to x on ∂Ω_y.
Proof. Let us assume, on the contrary, that x is not exterior to Ω_y and
consequently is on the right side of all oriented hyperplanes π_{i_k}. It follows from
this and from the fact that y is on the right side of all hyperplanes π_i (i = 1, . . . , m),
that any π_i (i ≠ i_k) having x on its wrong side intersects the open interval xy
at a point y_i. At least one such hyperplane exists since x is exterior to Ω. Let y*
be the nearest y_i to y. Then it is easy to see that y* ∈ ∂Ω. But this leads to the
contradiction |x − y*| < |x − y|, which establishes the first part of our
contention.

Let now ȳ be any point on ∂Ω_y different from y. Obviously the whole segment
yȳ is contained in ∂Ω_y. Also, there exists a spherical neighborhood N_ε in
E_n, around y, such that its points are on the right side of all hyperplanes π_i
with i ≠ i_k. Thus, the segment which is the intersection of N_ε and the segment
yȳ is contained in ∂Ω. In particular there exists a point y′ ∈ ∂Ω which is
between y and ȳ on the segment yȳ. We therefore have |x − y′| ≤ α_1|x − y|
+ α_2|x − ȳ| with α_1 + α_2 = 1, α_1 > 0, α_2 > 0. But since |x − y′| ≥ |x − y|
(y being the nearest point to x on ∂Ω) we conclude that |x − ȳ| ≥ |x − y|.
This proves the second part of the lemma.
LEMMA 2.3. Let a polyhedral cone C be given by:
x("+1) is the orthogonal projection of x(l° on the farthest hyperplane 7r* with
respect to which it is on the wrong side, if x(v) is not a solution of (1.1). (If
this hyperplane is not unique one chooses one of the hyperplanes with respect
to which x(v) is on the wrong side and whose distance from x(v) is maximum.)
Numerically, if x(v) is not a solution we consider all indices i' for which
li>(x(v)) < 0, and among them pick an i0 for which — lv{x{v))/\ar\ has its
greatest value; at being the vector: (aUj . . . , ani). Then, x("+1) = x(l° + tau
where/ = -lio(x^)/\aio\2.
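To make the step concrete, here is a minimal computational sketch of the iteration just described (an illustration of ours in Python with NumPy, not part of the paper; the function name relaxation_step and the storage of the system as an array A whose rows are the vectors a_i, together with a vector b of the constants b_i, are our own choices).

```python
import numpy as np

def relaxation_step(A, b, x):
    """One step of the maximal-distance orthogonal projection method of Section 3.

    A is an (m, n) array whose i-th row is the vector a_i of l_i(x) = a_i.x + b_i,
    b is the (m,) array of constants b_i, and x is the current iterate x^(v).
    Returns x^(v+1); x is returned unchanged when it already satisfies (1.1).
    (Names and data layout are our own, not the paper's.)
    """
    residuals = A @ x + b                   # l_i(x), i = 1, ..., m
    norms = np.linalg.norm(A, axis=1)       # |a_i|
    distances = -residuals / norms          # distance to pi_i where l_i(x) < 0
    i0 = int(np.argmax(distances))          # farthest hyperplane with x on its wrong side
    if residuals[i0] >= 0:                  # every l_i(x) >= 0: x is already a solution
        return x
    t = -residuals[i0] / norms[i0] ** 2     # t = -l_{i0}(x) / |a_{i0}|^2
    return x + t * A[i0]                    # orthogonal projection onto pi_{i0}
```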
We shall establish now:
THEOREM 3. Let (1.1) be a consistent system of linear inequalities and let
{x^{(ν)}} be the sequence of iterates defined above. Then x^{(ν)} → x̄, where x̄ is a solution
of (1.1). Furthermore, if R is the distance of x^{(0)} from the nearest solution, we have

(3.1)    |x^{(ν)} − x̄| ≤ 2Rθ^ν,    ν = 0, 1, . . . ,

where 0 < θ < 1 depends only on the matrix (a_{ij}).
Proof. We first claim that if y is a solution of (1.1) then x^{(ν)} approaches y
steadily, or, more precisely,

(3.2)    |x^{(ν+1)} − y|² ≤ |x^{(ν)} − y|² − |x^{(ν+1)} − x^{(ν)}|².

Indeed, (3.2) is trivial if x^{(ν)} is a solution. If x^{(ν)} is not a solution then (3.2)
follows from the refinement (2.3) of Lemma 2.1 with x = x^{(ν)}, x′ = x^{(ν+1)} and
λ = 1. (We use also the fact that y is a solution and hence is on the right side
of π_{i_0}.)
Let us consider now the polyhedral cone C_{i_1 ⋯ i_s} defined by

(3.3)    Σ_{j=1}^{n} a_{j i_k} x_j ≥ 0    (k = 1, . . . , s),

and let μ = min μ_{i_1 ⋯ i_s}, where μ_{i_1 ⋯ i_s} is the positive constant associated with
the cone C_{i_1 ⋯ i_s} by Lemma 2.3 and the minimum is taken over all possible choices
of i_1, . . . , i_s from i = 1, . . . , m. Let

θ = (1 − μ²)^{1/2} < 1,
and let us denote by y^{(ν)} the nearest point to x^{(ν)} on ∂Ω. We assert that
(3.7)    |x^{(ν+1)} − y^{(ν+1)}|² ≤ |x^{(ν)} − y^{(ν)}|² − μ²|x^{(ν)} − y^{(ν)}|²,

which establishes (3.5).
The remainder of the proof now follows easily. We first note that the iterates
x^{(0)}, x^{(1)}, . . . , x^{(ν)}, . . . are all included in the hypersphere S^{(0)}: |x − y^{(0)}|
≤ |x^{(0)} − y^{(0)}| = R. This follows from (3.2). For the same reason x^{(1)}, . . . , x^{(ν)},
. . . lie in the hypersphere S^{(1)}: |x − y^{(1)}| ≤ |x^{(1)} − y^{(1)}|. But since y^{(1)} is the
nearest point to x^{(1)} on ∂Ω, and on account of (3.5), we have

|x^{(1)} − y^{(1)}| ≤ θ|x^{(0)} − y^{(0)}| ≤ θR.

In the same way we get that x^{(ν)}, x^{(ν+1)}, . . . are contained in the hypersphere
S^{(ν)}: |x − y^{(ν)}| ≤ |x^{(ν)} − y^{(ν)}| ≤ θ^ν R. It is now evident, since we have a
sequence of hyperspheres with non-empty intersection and whose radii tend to
zero, that

∩_{ν=0}^{∞} S^{(ν)} = x̄.

Thus we get lim x^{(ν)} = x̄ = lim y^{(ν)}, which proves the convergence of x^{(ν)} to a
solution of (1.1). Moreover, since both x^{(ν)} and x̄ belong to the hypersphere S^{(ν)},
whose diameter does not exceed 2Rθ^ν, (3.1) follows, and the proof is complete.
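As a quick illustration of the geometric decay asserted in (3.1) and of the monotone approach (3.2), one can run the iteration of §3 on a small consistent system and compare successive distances to a proxy for the limit point. The toy system, the starting point and all names below are our own and are not taken from the paper; the step function simply repeats, in compact form, the projection rule described above.

```python
import numpy as np

# A small consistent system l_i(x) = a_i.x + b_i >= 0 (an arbitrary example of ours):
# x1 >= 1,  x2 >= 1,  x1 + x2 <= 5.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([-1.0, -1.0, 5.0])

def step(x):
    r = A @ x + b                           # residuals l_i(x)
    n = np.linalg.norm(A, axis=1)           # |a_i|
    i = int(np.argmax(-r / n))              # farthest violated hyperplane
    return x if r[i] >= 0 else x - (r[i] / n[i] ** 2) * A[i]

xs = [np.array([10.0, -7.0])]               # starting point x^(0)
for _ in range(60):
    xs.append(step(xs[-1]))

x_bar = xs[-1]                              # proxy for the limit point
err = [np.linalg.norm(z - x_bar) for z in xs[:40]]
ratios = [e2 / e1 for e1, e2 in zip(err, err[1:]) if e1 > 1e-12]
print(max(ratios))                          # largest one-step error ratio; stays below 1
```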
In the above theorem we have established that the rapidity of convergence
of the iterates to the solution is at least linear. However, the positive constant
μ appearing in the definition of θ was obtained from Lemma 2.3, where we had
only an existence statement. More elaborate considerations can give a lower
bound for μ in terms of the matrix (a_{ij}). Let C = (c_{ij}) (i, j = 1, . . . , r) be a
square matrix. We shall denote by |C| the determinant of C, and by Γ(C)
the expression
(3.8)    Γ(C) = |C| / [Σ_i Σ_j C_{ij}²]^{1/2},
where the C_{ij}'s are the cofactors of the elements c_{ij}. With this notation the
following result may be established:
(3.9)    Σ_{j=1}^{n} a_{ji}² = 1    (i = 1, . . . , m),

and if r is the rank of the matrix (a_{ij}), then
is the associated quantity (3.8), while the minimum is to be taken over all different
combinations of the i's which correspond to linearly independent rows.
We omit the somewhat lengthy proof of this theorem. We remark only
that in its proof we make use of the invariance under orthogonal transforma-
tions of the numerator and denominator in (3.10), and use induction with
respect to n.
where x̃^{(ν)} is the projection of x^{(ν)} on the farthest hyperplane with respect to
which it is on the wrong side, and where 0 < |β| < 1. We say that we overproject,
or underproject, according as β > 0 or β < 0.
In connection with the last procedure we remark that it is very plausible that
overprojecting with a small positive constant β will accelerate the convergence
of {x^{(ν)}}. Indeed, slowness in the convergence of the method of orthogonal
projection as described in §3 may arise if some of the "solid angles" of the
polytope Ω are very small. Overprojecting has the effect of opening these angles
and may therefore accelerate the convergence.
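For readers who wish to experiment with this, the relaxed step differs from the plain projection step of §3 only by the factor 1 + β. The sketch below is our reading of procedure (ii), whose defining formula is not reproduced above; the function name, the data layout and the default value of β are assumptions of the illustration.

```python
import numpy as np

def over_under_projection_step(A, b, x, beta=0.2):
    """Relaxed step: project x on the farthest violated hyperplane and overshoot
    (beta > 0) or undershoot (beta < 0) by a fraction |beta| < 1 of the step.
    This is our reading of procedure (ii); names and layout are our own."""
    r = A @ x + b                           # residuals l_i(x)
    n = np.linalg.norm(A, axis=1)           # |a_i|
    i = int(np.argmax(-r / n))              # farthest violated hyperplane
    if r[i] >= 0:                           # already a solution of (1.1)
        return x
    t = -r[i] / n[i] ** 2                   # plain orthogonal projection step
    return x + (1.0 + beta) * t * A[i]      # move (1 + beta) times as far
```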
(iii) Systematic projection. In this procedure the m hyperplanes π_i : l_i(x) = 0
are arranged in a periodic infinite sequence π_ν (ν = 1, 2, . . .), where π_ν = π_i
and l_ν(x) = l_i(x) if ν ≡ i (mod m). The sequence of iterates x^{(ν)} (ν = 0, 1, . . .)
is then defined in the following way: x^{(0)} is arbitrary; x^{(ν+1)} = x^{(ν)} if x^{(ν)} is on
the right side of π_{ν+1} (i.e., if l_{ν+1}(x^{(ν)}) ≥ 0), while if x^{(ν)} is on the wrong side of
π_{ν+1}, then x^{(ν+1)} is the orthogonal projection of x^{(ν)} on π_{ν+1}.
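A sketch of this systematic (cyclic) variant, again written by us for illustration only (the function name, the data layout as rows a_i of A with constants b, and the fixed number of sweeps are our choices):

```python
import numpy as np

def systematic_projection(A, b, x0, sweeps=50):
    """Systematic projection: visit pi_1, ..., pi_m cyclically, projecting x onto
    pi_i only when l_i(x) < 0.  (A sketch in our own notation; 'sweeps' counts
    full passes through the m hyperplanes.)"""
    x = np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        for i in range(len(b)):             # nu = 1, 2, ... with nu = i (mod m)
            r = A[i] @ x + b[i]             # residual l_i(x)
            if r < 0:                       # x on the wrong side of pi_i
                x = x - (r / (A[i] @ A[i])) * A[i]   # orthogonal projection
    return x
```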
or, in matrix notation,

(5.6)    xA* + b ≥ 0.
Obviously if (5.1) is consistent the same will be true for the system (5.6). Let
us define the sequence {x^{(ν)}} by

(5.7)    x^{(ν)} = y^{(ν)}A,    ν = 0, 1, . . . .
We claim that {x^{(ν)}} is also a sequence obtained from (5.6) by the method of
orthogonal projection (i) of §4. Indeed, we note that L_i(y^{(ν)}) = l_i(x^{(ν)}), so that
the residuals are the same for the two systems. Now, if y^{(ν)} is a solution of
(5.1) then x^{(ν)} is a solution of (5.6), and x^{(j)} = x^{(ν)} for j ≥ ν. If y^{(ν)} is not a
solution, then
y^{(ν+1)} = y^{(ν)} + t_ν e_{i_ν},

where

−L_{i_ν}(y^{(ν)}) = max_i (−L_i(y^{(ν)})),    L_{i_ν}(y^{(ν+1)}) = 0.
Obviously, we have also

−l_{i_ν}(x^{(ν)}) = max_i (−l_i(x^{(ν)})),    l_{i_ν}(x^{(ν+1)}) = 0,
so that x^{(ν)} is replaced by the point x^{(ν+1)} situated on the hyperplane π_{i_ν}
corresponding to the negative residual with the largest absolute value. Finally,
we have

x^{(ν+1)} − x^{(ν)} = t_ν e_{i_ν} A = t_ν (a_{1 i_ν}, . . . , a_{n i_ν}),
from which follows the convergence of y^{(ν)} to a point ȳ. Since the solution x̄ of
(5.6) is related to ȳ by (5.5), ȳ is also a solution of (5.1). Finally, we may
write

(5.13)    |ȳ − y^{(ν)}| ≤ Σ_{j=ν}^{∞} |y^{(j+1)} − y^{(j)}| = Σ_{j=ν}^{∞} |t_j|,

where the right-hand side is dominated by a geometric series with ratio θ.
In order to get the exact statement (5.4) we note that

R ≤ |x^{(0)} − x̄| = |(y^{(0)} − ȳ)A| ≤ |y^{(0)} − ȳ| (Σ_i g_{ii})^{1/2},

which, when combined with (5.13), gives (5.4), the constant K there depending
only on m, θ and the matrix A.
We have discussed above one type of relaxation. Similar results hold for
other types, such as relaxation with maximum change of |y^{(ν+1)} − y^{(ν)}| (this
corresponds to the method of orthogonal projection of §3), over- and under-
relaxation, and the method of systematic relaxation. It is also obvious that the
results hold true for the case of equations after we make the necessary change
in the algorithm defining {y^{(ν)}}.
We shall now proceed to show that, conversely, the orthogonal projection
method can be interpreted as a relaxation procedure, at least when the initial
point x^{(0)} is suitably chosen. We shall suppose that a consistent system of
inequalities (1.1) is given, or, in matrix notation,

xA* + b ≥ 0,

where A is an m × n matrix and x = (x_1, . . . , x_n) a row vector. Let us introduce
the new variable y = (y_1, . . . , y_m), which will again be connected with x
by

x = yA.
Let us also consider the associated system

(5.14)    yAA* + b ≥ 0,

which, when expanded, has the form of (5.1), where G = AA* is an m × m
symmetric and positive semi-definite matrix. Let now x^{(0)} be a starting point
of the form

(5.15)    x^{(0)} = y^{(0)}A,
and let us define x^{(1)}, . . . , x^{(ν)}, . . . by the orthogonal projection method (i) of
§4. That is, x^{(ν+1)} is the orthogonal projection of x^{(ν)} on the hyperplane π_{i_ν}:
l_{i_ν}(x) = 0 corresponding to the negative residual l_i(x^{(ν)}) with the largest
absolute value. We shall also define a corresponding sequence y^{(ν)} in E_m as follows:
y^{(0)} is the chosen starting point in (5.15), and y^{(ν+1)} is the projection of y^{(ν)} on the
hyperplane L_{i_ν}(y) = 0 in the direction parallel to the y_{i_ν} axis (i_ν being the
sequence of indices associated with the x^{(ν)}'s). It is now easy to see (using
(5.2)) that

x^{(ν)} = y^{(ν)}A,
and that, moreover, the sequence {y^{(ν)}} may also be obtained by the relaxation
scheme (5.2). Now, since in the proof of Theorem 5 we did not use the consistency
of the system (5.1), but only that of (5.6), we may use the same proof to
obtain again that y^{(ν)} converges to a solution ȳ, and that (5.4) holds. Thus, we
have established the equivalence of the two methods, and have also obtained,
as a by-product, that the two systems (1.1) and (5.14) are either both consistent
or both inconsistent.
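The correspondence x^{(ν)} = y^{(ν)}A is easy to check numerically. The sketch below is our own construction, not the paper's: the random test system, the starting point y^{(0)} = 0, and the use of the unnormalized largest negative residual as the selection rule of method (i) of §4 are assumptions of the illustration. It runs the coordinate relaxation on yG + b ≥ 0 and the orthogonal projection method on xA* + b ≥ 0 side by side and asserts that the two sequences stay related by x^{(ν)} = y^{(ν)}A.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 3
A = rng.standard_normal((m, n))            # rows a_i of l_i(x) = a_i.x + b_i
x_feas = rng.standard_normal(n)            # a planted feasible point
b = rng.random(m) - A @ x_feas             # then A x_feas + b >= 0, so the system is consistent

G = A @ A.T                                # G = AA*, symmetric positive semi-definite
E = np.eye(m)                              # unit vectors e_1, ..., e_m

y = np.zeros(m)                            # y^(0); starting point x^(0) = y^(0) A
x = y @ A

for _ in range(200):
    # Relaxation step in y-space: annul the negative residual L_i(y) of largest size.
    Ly = y @ G + b
    i = int(np.argmax(-Ly))
    if Ly[i] < 0:
        y = y + (-Ly[i] / G[i, i]) * E[i]  # move parallel to the y_i axis
    # Projection step (method (i) of Section 4) in x-space on the same hyperplane.
    lx = A @ x + b
    j = int(np.argmax(-lx))
    if lx[j] < 0:
        x = x + (-lx[j] / (A[j] @ A[j])) * A[j]
    assert np.allclose(x, y @ A)           # x^(v) = y^(v) A at every step
```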
REFERENCES
1. G. B. Dantzig, Maximization of a linear form whose variables are subject to a system of linear
inequalities (U.S.A.F., 1949), 16 pp.
2. G. E. Forsythe, Solving linear algebraic equations can be interesting, Bull. Amer. Math. Soc.,
59 (1953), 299-329.
3. T. S. Motzkin, H. Raiffa, G. L. Thompson, and R. M. Thrall, The double description method,
in Contributions to the Theory of Games, vol. 2, Annals of Mathematics Studies (1953), 51-74.
4. R. V. Southwell, Relaxation methods in engineering science (Oxford, 1940).
5. R. V. Southwell, Relaxation methods in theoretical physics (Oxford, 1946).
6. G. Temple, The general theory of relaxation methods applied to linear systems, Proc. Roy. Soc.
London, 169 (1939), 476-500.
7. Linear programming seminar notes, Institute for Numerical Analysis (Los Angeles, 1950).