Buscar

iter09

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes
Você viu 3, do total de 65 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes
Você viu 6, do total de 65 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes
Você viu 9, do total de 65 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Prévia do material em texto

Iterative Solution of
Linear Systems
being the second part of
M
3
4N4
Computational Linear
Algebra
c© Gerald Moore
Contents
1 Introduction 1
2 The Classical Iterative Methods 3
2.1 Richardson’s method . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Jacobi’s method . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 The Gauss-Seidel method . . . . . . . . . . . . . . . . . . . . 5
2.4 Additional points about iterative methods . . . . . . . . . . . 6
3 Convergence Criteria for Iterative Methods 9
4 Classical Convergence Results 12
4.1 Convergence of Richardson’s method . . . . . . . . . . . . . . 12
4.2 Diagonal Dominance . . . . . . . . . . . . . . . . . . . . . . . 14
4.3 Young’s SOR theory . . . . . . . . . . . . . . . . . . . . . . . 19
4.4 Results for the model problem . . . . . . . . . . . . . . . . . . 24
5 Convergence for Symmetric Positive-Definite Matrices 26
5.1 Optimal scaling for Gauss-Seidel . . . . . . . . . . . . . . . . . 29
5.2 Symmetric iterative methods . . . . . . . . . . . . . . . . . . . 32
6 Stieltjes and M matrices 34
7 Termination Criteria 34
8 Acceleration of Iterative Methods by Chebyshev Polynomi-
als 37
8.1 Richardson’s method . . . . . . . . . . . . . . . . . . . . . . . 43
8.2 Jacobi’s method . . . . . . . . . . . . . . . . . . . . . . . . . . 44
8.3 Gauss-Seidel method . . . . . . . . . . . . . . . . . . . . . . . 44
8.4 SOR method . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
9 The Conjugate Gradient Method 48
9.1 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Iterative Methods 1
1 Introduction
The fundamental idea behind iterative methods is to start with an initial
guess x(0) for the solution x⋆ of Ax = b and then generate a sequence of
approximations x(1),x(2), . . . which (hopefully) converges to x⋆.
Example: The system
2x1 − x2 = 1
−x1 + 2x2 = 1
has solution x⋆ = (1, 1)T . Re-write the equations in the form
x1 =
1 + x2
2
x2 =
1 + x1
2
and generate a sequence of iterates through the formula
x
(k+1)
1 =
1 + x
(k)
2
2
x
(k+1)
2 =
1 + x
(k)
1
2
.
With x(0) = (0, 0)T as the initial guess, the following table of
iterates and errors e(k) ≡ x⋆ − x(k) is obtained.
k 0 1 2 3 4 5
x(k)
0
0
1/2
1/2
3/4
3/4
7/8
7/8
15/16
15/16
31/32
31/32
e(k)
1
1
1/2
1/2
1/4
1/4
1/8
1/8
1/16
1/16
1/32
1/32
So e(k) = 1
2k
(1, 1)T and the error components converge to zero
while the components of x(k) converge to those of x⋆.
An alternative iterative method is derived by simply noting that
the new value x
(k+1)
1 can be used immediately to compute x
(k+1)
2 .
Thus our formula to generate successive iterates now becomes
x
(k+1)
1 =
1 + x
(k)
2
2
x
(k+1)
2 =
1 + x
(k+1)
1
2
.
2 Gerald Moore
With x(0) = (0, 0)T as the initial guess again, a somewhat differ-
ent table of iterates and errors is obtained.
k 0 1 2 3
x(k)
0
0
1/2
3/4
7/8
15/16
31/32
63/64
e(k)
1
1
1/2
1/4
1/8
1/16
1/32
1/64
Not surprisingly, the iterates seem to be converging rather faster
than in the first example.
In general, an iterative method for solving Ax = b depends on a splitting
A = M− N,
where M must be non-singular, and looking for a solution of the re-arranged
problem
Mx = Nx + b.
Then a sequence of iterates is generated by the formula
Mx(k+1) = Nx(k) + b.
[In the above examples, where A =
(
2 −1
−1 2
)
, we first chose M =
(
2 0
0 2
)
and N =
(
0 1
1 0
)
and then M =
(
2 0
−1 2
)
and N =
(
0 1
0 0
)
.] The
logic behind this idea is that if the iterates converge to a vector z, i.e.
limk→∞ x(k) = z, then
Mz = lim
k→∞
Mx(k+1) = lim
k→∞
Nx(k) + b = Nz + b
and so z must be the solution x⋆. For practical reasons, it is necessary to
choose M so that systems of equations with coefficient matrix M can be solved
easily, else M = A and N = 0 would give the exact solution in one iteration!
This means that M will usually be either a diagonal or a triangular matrix.
Different iterative methods are obtained by different choices of M and N.
There is no single efficient iterative method which is guaranteed to converge
for any system Ax = b. The most we can say is that a particular iterative
method will converge if A has certain properties, and that it must converge
at a certain rate.
Iterative Methods 3
2 The Classical Iterative Methods
We shall now describe a number of iterative methods which were devised
before 1950. It will be useful to show how these methods apply to a model
problem and we shall make use of the following simple discretisation of a
partial differential equation. For Poisson’s equation on the unit square with
zero Dirichlet boundary conditions,
−∆u(x, y) = f(x, y) in Ω
u(x, y) = 0 on ∂Ω,
Ω
(0, 0)
(0, 1)
(1, 0)
(1, 1)
ff∂Ω
the simplest finite difference/element approximation leads to the equations
4uij − ui−1,j − ui+1,j − ui,j−1 − ui,j+1 = h2f(ih, jh)
i = 1, . . . ,m− 1
j = 1, . . . ,m− 1
with u0j = umj = 0 j = 0, . . . ,m and ui0 = uim = 0 i = 0, . . . ,m. Here
a uniform mesh of size h = 1
m
has been placed on Ω and uij approximates
u(ih, jh). If the unknowns and equations are ordered in the natural way,
u11, u21, . . . , um−1,1, u12, . . . , um−1,2, . . . , u1,m−1, . . . , um−1,m−1,
then in matrix form the (m− 1)2 equations become
T −I
−I T −I
. . . . . . . . .
. . . . . . . . .
−I T −I
−I T

u = b.
Here I is the (m− 1)×(m− 1) identity matrix and T is the (m− 1)×(m− 1)
tridiagonal matrix
T ≡

4 −1
−1 4 −1
. . . . . . . . .
. . . . . . . . .
−1 4 −1
−1 4

.
4 Gerald Moore
2.1 Richardson’s method
Take M = I and N = I− A, so that the iterative method is
x(k+1) = x(k) + b− Ax(k),
or in component form
x
(k+1)
i = x
(k)
i + bi −
n∑
j=1
aijx
(k)
j i = 1, . . . , n.
Richardson’s method has more chance of working well if we allow M to be a
suitably chosen scaling of the identity matrix, i.e. M = 1
ω
I and N = 1
ω
I − A
with ω 6= 0. Then the iterative method becomes
x(k+1) = x(k) + ω
(
b− Ax(k))
or
x
(k+1)
i = x
(k)
i + ω
(
bi −
n∑
j=1
aijx
(k)
j
)
i = 1, . . . , n.
Applied to the model problem, the scaled Richardson’s method is
u
(k+1)
ij = ωh
2f(ih, jh) + (1− 4ω)u(k)ij
+ ω
{
u
(k)
i−1,j + u
(k)
i+1,j + u
(k)
i,j−1 + u
(k)
i,j+1
} i = 1, . . . ,m− 1
j = 1, . . . ,m− 1 .
2.2 Jacobi’s method
Let A = D − L − U, where D is the diagonal of A, −L is the strict lower
triangular part of A and −U is the strict upper triangular part of A. If the
diagonal elements of A are all non-zero, D will be non-singular and so we can
choose M = D and N = L+U. This is Jacobi’s method, which can be written
x(k+1) =
{
D−1[b + (L + U)x(k)]
x(k) + D−1(b− Ax(k))
or
x
(k+1)
i =
bi − n∑
j=1
j 6=i
aijx
(k)
j
 /aii i = 1, . . . , n.
Iterative Methods 5
[Note that the results given on p.1 are an example of Jacobi’s method.] For
the model problem this iterative method becomes
u
(k+1)
ij =
1
4
h2f(ih, jh)
+
1
4
{
u
(k)
i−1,j + u
(k)
i+1,j + u
(k)
i,j−1 + u
(k)
i,j+1
} i = 1, . . . ,m− 1
j = 1, . . . ,m− 1 .
If the diagonal elements of A are all the same (as they are for the model prob-
lem) then Jacobi’s method is identical with the scaled Richardson’s method
where 1
ω
equals the diagonal element (ω = 1
4
for the model problem.) It is
also possible to introduce a scaling factor into Jacobi’s method, but this is
not often used.
2.3 The Gauss-Seidel method
With A = D − L − U as above, choose M = D − L and N = U. This is the
Gauss-Seidel method,which can be written
x(k+1) =
{
(D− L)−1[b + Ux(k)]
x(k) + (D− L)−1(b− Ax(k))
At each iteration a triangular system of equations must be solved by forward
substitution, using the lower triangular part of A. If the lower triangular
system is solved as in Part 1, then in component form we may write
x
(k+1)
i =
bi − n∑
j=1
j>i
aijx
(k)
j −
n∑
j=1
j<i
aijx
(k+1)
j
 /aii i = 1, . . . , n.
This shows that the Gauss-Seidel method corresponds to Jacobi’s method but
with the new computed values used as soon as possible. [Cf. the numerical
results on p.1.] As with Jacobi’s method, it is necessary that the diagonal
elements of A are all non-zero. For the model problem the Gauss-Seidel
method becomes
u
(k+1)
ij =
1
4
h2f(ih, jh)
+
1
4
{
u
(k+1)
i−1,j + u
(k)
i+1,j + u
(k+1)
i,j−1 + u
(k)
i,j+1
} i = 1, . . . ,m− 1
j = 1, . . . ,m− 1
if the natural ordering is used. This question of ordering will be commented
on later.
6 Gerald Moore
It is very useful to allow a scaling (or relaxation) factor into the Gauss-
Seidel method, i.e. choose M = 1
ω
D−L and N = 1−ω
ω
D+U. Thus the relaxed
Gauss-Seidel iteration is
x(k+1) =
{
( 1
ω
D− L)−1[b + (1−ω
ω
D + U)x(k)]
x(k) + ( 1
ω
D− L)−1(b− Ax(k))
Again we have a lower triangular system to solve at each iteration and again
the algorithm in Part 1 gives the result in component form
x
(k+1)
i =
bi + 1− ωω aiix(k)i −
n∑
j=1
j>i
aijx
(k)
j −
n∑
j=1
j<i
aijx
(k+1)
j
 /(ω−1aii)
= (1− ω)x(k)i + ω
bi − n∑
j=1
j>i
aijx
(k)
j −
n∑
j=1
j<i
aijx
(k+1)
j
 /aii
for i = 1, . . . , n. Thus we have the somewhat surprising result that each
component of the new iterate x(k+1) consists of two parts:-
(1− ω)× “the corresponding component of x(k)”
+ ω× “an ordinary Gauss-Seidel step”.
For our model problem the relaxed Gauss-Seidel method becomes
u
(k+1)
ij = (1− ω)u(k)ij +
ω
4
{
h2f(ih, jh)
+ u
(k+1)
i−1,j + u
(k)
i+1,j + u
(k+1)
i,j−1 + u
(k)
i,j+1
} i = 1, . . . ,m− 1
j = 1, . . . ,m− 1
provided that the natural ordering is used.
Of course, if ω is set equal to 1, we have the Gauss-Seidel iteration again.
We shall see later that it is often better to choose 1 < ω < 2 and in this case
the method is called the Successive Over-Relaxation method or simply SOR.
During the 1950’s and 1960’s, when the optimal value of ω was known for
several important problems, it was regarded as the best iterative method for
systems Ax = b.
2.4 Additional points about iterative methods
• Only the non-zero elements of A need to be stored. If these form a
simple pattern or can be easily generated when required [as with the
Iterative Methods 7
model problem], even this is not necessary. Hence iterative methods
have a great advantage in storage requirements over direct methods for
many large problems.
• Iterative methods can take advantage of good starting values and the
iteration can be terminated when the results are sufficiently accurate.
• Direct methods gain in efficiency when treating several systems with
the same coefficient matrix but with different r.h.s. vectors.
• The Richardson and Jacobi methods are examples of simultaneous
displacement methods. Each component of the new iterate is generated
directly from the components of the old iterate and program storage for
two vectors is required. Gauss-Seidel (and its relaxed version) is an ex-
ample of a successive displacement method. The new iterate can over-
write the old iterate and only program storage for one vector is required.
Apart from their storage advantage, successive displacement methods
often converge faster; however simultaneous displacement methods are
more suitable for parallel computers.
• The Richardson and Jacobi methods, and all simultaneous displace-
ment methods, give the same results no matter in what order the un-
knowns/equations are processed. [Equations and unknowns must be
ordered in the same way however. This is equivalent to both pre-
multiplying and post-multiplying A with the same permutation ma-
trix.] This ordering, however, does alter the iterates (and sometimes
the convergence itself) of the Gauss-Seidel and SOR methods. Various
orderings may have practical or theoretical advantages for particular
systems Ax = b. We shall just end this section by giving an example
of an ordering which is often used with the model problem and similar
systems.
In the red-black, or chequerboard, ordering we divide the mesh points
into two classes:-
1. red points with i+ j even,
2. black points with i+ j odd.
Then all the red points and their associated equations are ordered naturally,
followed by all the black points/equations; i.e.
u11, u13, . . . , u1,m−2, . . . , um−1,2, . . . , um−1,m−1︸ ︷︷ ︸
uR
u12, u14, . . . , u1,m−1, . . . , um−1,1, . . . , um−1,m−2︸ ︷︷ ︸
uB
8 Gerald Moore
assuming for convenience that m is odd. With this ordering, the matrix form
of the (m− 1)2 equations becomes(
4I −G
−GT 4I
)
=
(
uR
uB
)
=
(
bR
bB
)
with u ≡ (uR,uB)T and b ≡ (bR, bB)T , where the r.h.s. vector b is now
h2f(ih, jh) in chequerboard ordering. I is the (m−1)
2
2
× (m−1)2
2
identity matrix
and G is also an (m−1)
2
2
× (m−1)2
2
matrix:
G ≡

B I
I BT I
I B I
. . . . . . . . .
. . . . . . . . .
I B I
I BT

.
Here I is now the m−1
2
×m−1
2
identity matrix and B is the m−1
2
×m−1
2
bidiagonal
matrix
B ≡

1
1 1
1 1
. . . . . .
. . . . . .
1 1

.
Consequently, the Gauss-Seidel iteration for this ordering can be written
u
(k+1)
R =
1
4
(
bR + Gu
(k)
B
)
u
(k+1)
B =
1
4
(
bB + G
Tu
(k+1)
R
)
.
Problems
P–1 Perform one iteration (by hand) for a) Jacobi, b) Gauss-Seidel, ap-
plied to the matrix on p. 77 of the first part of the notes with b =
(0, 1
2
, 1, 1
2
, 0)T . Start from x(0) = 0.
Iterative Methods 9
P–2 Apply the SOR method with ω = 4
2+
√
3
to the example on p.1.
P–3 Describe in component form the red-black, Gauss-Seidel method for
the model problem on p.7.
P–4 Run the programs for Jacobi, Gauss-Seidel and SOR applied to the
model problem [with f(x, y) = 1] and compare their respective conver-
gence rates.
3 Convergence Criteria for Iterative Meth-
ods
The solution x⋆ of Ax = b satisfies
Mx⋆ = Nx⋆ + b
and our iterates satisfy
Mx(k+1) = Nx(k) + b.
Hence, if e(k) ≡ x⋆ − x(k), then, by subtraction, Me(k+1) = Ne(k) and
e(k+1) = M−1Ne(k)
or
e(k+1) =
(
M
−1
N
)k
e(0).
Definition: The matrix C = M−1N is called the iteration matrix
and we say that an iterative method (defined by M and N) is
convergent if the sequence
y(k) = Cky(0)
converges to zero for any initial vector y(0).
Lemma A: If ‖C‖1 < 1 or ‖C‖2 < 1 or ‖C‖∞ < 1 then the itera-
tive method converges. [A sufficient condition for convergence.]
Proof: For p = 1, 2 or ∞, we have
‖y(k)‖p = ‖Cky(0)‖p
≤ ‖Ck‖p‖y(0)‖p
≤ {‖C‖p}k ‖y(0)‖p
and limk→∞ {‖C‖p}k = 0 �
10 Gerald Moore
Definition: The spectral radius of an n×n matrix B is denoted
ρ(B) and defined by
ρ(B) = max
1≤i≤n
{|λi| : λi is an eigenvalue of B.}
Lemma B: An iterative method converges if and only if ρ(C) < 1.
[A necessary and sufficient condition for convergence.]
Proof: We prove the two results in turn.
1. ρ(C) ≥ 1⇒ “iteration is not convergent”.
Let λ be an eigenvalue of C, with |λ| ≥ 1, and v a corre-
sponding eigenvector, normalised so that ‖v‖2 = 1. Then
‖Ckv‖2 = ‖λkv‖2
= |λ|k‖v‖2
≥ 1
and so limk→∞
{
Ckv
} 6= 0.
2. ρ(C) < 1⇒ “iteration is convergent”
Assume C has eigenvalues λ1, . . . , λn and n corresponding
linearly independent eigenvectors v1,. . . ,vn. Any vector
z ∈ Rn can be written in terms of these eigenvectors, i.e.
z =
n∑
i=1
αivi,
and so
C
kz = Ck
(
n∑
i=1
αivi
)
=
n∑
i=1
αiC
kvi
=
n∑
i=1
αiλ
k
i vi.
Since, however, |λi| < 1 for i = 1, . . . , n, we must have
limk→∞
{
λki
}
= 0 and so
lim
k→∞
{
C
kz
}
= 0.
[The result also holds for a general n×n matrix C, but the
proof is more complicated.] �
Iterative Methods 11
Additional points:
1. The two convergence proofs obviously imply
‖B‖p ≥ ρ(B) p = 1, 2,∞
for any matrix B. This is also clearly seen if ρ(B) = |λ| and v is a
normalised eigenvector corresponding to λ, since then
ρ(B) = |λ|‖v‖p
= ‖λv‖p
= ‖Bv‖p
≤ ‖B‖p‖v‖p
= ‖B‖p.
2. If B is symmetric then ρ(B) = ‖B‖2.
3. If the iteration matrix has size < 1 when measured in a certain norm,
then the error in that norm is reduced by at least this amount at each
iteration, i.e.
‖e(k+1)‖p = ‖M−1Ne(k)‖p
≤ ‖M−1N‖p‖e(k)‖p.
If we only have ρ(M−1N) < 1 then the error is not necessarily reduced
in a regular way.
We end this section by proving that, for the scaled Gauss-Seidel method,
only ω ∈ (0, 2) gives any chance of convergence. This is because the relaxed
Gauss-Seidel iteration matrix for A ≡ D− L− U is(
1
ω
D− L
)−1(
1− ω
ω
D + U
)
= (D − ωL)−1([1− ω]D + ωU)
and then we have
det
[
(D− ωL)−1 ([1− ω]D + ωU)]
= det
[
(D− ωL)−1] det [(1− ω)D + ωU]
= det
[
D
−1] det [D] det [(1− ωI) + ωD−1U]
= det [(1− ω)I]
= (1− ω)n.
12 Gerald Moore
(Here we have repeatedly used the result that the determinant of a triangular
matrix is equal to the product of the diagonal elements.) Consequently, if ω
is outside the interval (0, 2), the determinant [product of the eigenvalues!] of
the iteration matrix is greater or equal to 1 in modulus and thus the spectral
radius [largest eigenvalue in modulus!] of the iteration matrix is greater or
equal to 1 and so the iteration cannot converge.
Problems
P–1 Check that, if B =
( 1/4 2
1/8 1/4
)
, then ρ(B) < 1 but ‖B‖1 > 1, ‖B‖2 > 1,
‖B‖∞ > 1.
P–2 Prove that
ρ(B) = lim
k→∞
{‖Bk‖1/kp }
for any symmetric matrix B and p = 1, 2,∞. [HINT: first obtain the
result for p = 2 and then use Problem 4 on p. 37 of the first part of the
notes.]
P–3 The statement in the previous question is actually true for any square
matrix B. Do some computations to verify this result for the matrix
in question 1 above. In what sense does this mean that, if ρ(M−1N) <
1, then the convergence rate of an iterative method, on average, is
ρ(M−1N) for p = 1, 2,∞.
4 Classical Convergence Results
4.1 Convergence of Richardson’s method
The iteration matrix M−1N is equal to I−A. So if λ1, . . . , λn are the eigenval-
ues of A, 1−λ1, . . . , 1−λn will be the eigenvalues of M−1N. Thus Richardson’s
method will converge iff the eigenvalues of A lie strictly within the circle of
radius 1 centred on 1 in the complex plane.
6
-&%
'$
0 1 2
ℑ
ℜ
Iterative Methods 13
If the eigenvalues of A are real, they must lie in the interval (0, 2) for con-
vergence.
The scaled Richardson method has iteration matrix M−1N = I − ωA,
which has eigenvalues 1−ωλ1, . . . , 1−ωλn. Convergence can most easily be
established when the eigenvalues of A are all real. In this case the method
will converge iff all the eigenvalues of A lie in the interval
(
2
ω
, 0) for ω < 0, (0,
2
ω
) for ω > 0.
Consequently, if the eigenvalues of A are all real and of one sign (eg. A
is symmetric, positive-definite), then the scaled Richardson method must
converge for |ω| sufficiently small and ω of the correct sign. The eigenvalues
of M−1N will lie between 1 − ωλ1 and 1 − ωλn and so we can minimise
ρ(M−1N), i.e. make the iteration converge as fast as possible, by choosing
ω =
2
λ1 + λn
.
In this case
ρ(M−1N) =
{
1− 2λ1
λ1+λn
= λn−λ1
λn+λ1
λi > 0 i = 1, . . . , n,
1− 2λn
λ1+λn
= λ1−λn
λ1+λn
λi < 0 i = 1, . . . , n,
and if A were symmetric we could write
‖M−1N‖2 = cond2(A)− 1
cond2(A) + 1
.
Example: It is not difficult to check that the eigenvalues and
eigenvectors for our model problem are
4
{
sin2
kpih
2
+ sin2
lpih
2
}
k = 1, . . . ,m− 1
l = 1, . . . ,m− 1
and
sin ikpih sin jlpih
k, l = 1, . . . ,m− 1
i, j = 1, . . . ,m− 1
respectively, where (k, l) indexes the eigenvalues and (i, j) gives
the components of a corresponding eigenvector.
A is symmetric, positive-definite and so the scaled Richardson
method will converge for suitably chosen ω. Since λmin = 8 sin
2 πh
2
and λmax = 8 cos
2 πh
2
, the optimal value of ω is
ωopt =
2
λmin + λmax
=
1
4
,
14 Gerald Moore
which gives
‖M−1N‖2 =
cot2
(
πh
2
)− 1
cot2
(
πh
2
)
+ 1
= cos pih
= 1− pi
2h2
2
+O(h4).
Thus as h gets smaller, i.e. we approximate the differential equa-
tion more accurately, the convergence becomes slower. This is the
great difficulty with iterative methods applied to discretisations
of differential equations and, as we shall see, it occurs to a greater
or lesser degree for almost all iterative methods.
As a final remark on our model problem, we note that Richardson’s method
with optimal ω is the same as Jacobi’s method.
4.2 Diagonal Dominance
The iteration matrix M−1N for Jacobi’s method is I− D−1A, where D is the
diagonal matrix containing the diagonal terms of A. In one important case
it is easy to see that the method must converge.
Definition: An n×nmatrix B is called strictly diagonally dominant
if
|bii| >
n∑
j=1
j 6=i
|bij| i = 1, . . . , n.
Example: 1 0 02 4 1
1 1 3
 is strictlydiagonally
dominant
 2 −1 0−1 2 −1
0 −1 2
 is not strictlydiagonally
dominant
If a matrix A is strictly diagonally dominant, then not only must it be non-
singular (see Problems) but also its Jacobi iteration matrix must be strictly
less than 1 in the ∞-norm, i.e.
‖I− D−1A‖∞ = max
1≤i≤n

n∑
j=1
j 6=i
|aij|
|aii|
 < 1.
Iterative Methods 15
Hence strict diagonal dominance of A means that the Jacobi method con-
verges for the system Ax = b and there is a continual reduction in the
∞-norm of the error vector at each Jacobi iteration.
In a similar manner, it is also possible to show that the Gauss-Seidel
method is guaranteed to converge.
Theorem: If A is strictly diagonally dominant then the Gauss-
Seidel method converges.
Proof: If y = M−1Nx then, in component form, we have
yi = −
(
i−1∑
j=1
aijyj +
n∑
j=i+1
aijxj
)
/aii
for i = 1, . . . , n. Consequently, if ‖y‖∞ ≡ |yk| say, we can write
‖y‖∞ ≤
k−1∑
j=1
∣∣∣∣akjakk
∣∣∣∣ |yj|+ n∑
j=k+1
∣∣∣∣akjakk
∣∣∣∣ |xj|
≤ αk‖y‖∞ + βk‖x‖∞,
where
αi ≡
i−1∑
j=1
∣∣∣∣aijaii
∣∣∣∣ and βi ≡ n∑
j=i+1
∣∣∣∣aijaii
∣∣∣∣ .
Hence
‖y‖∞ ≤ γ‖x‖∞,
where
γ ≡ max
1≤i≤n
{
βi
1− αi
}
and γ < 1 because the strict diagonal dominance of A implies
that αi + βi < 1 for each row i. Therefore
‖M−1N‖∞ ≤ γ < 1
and so the Gauss-Seidel method must converge �
Many matrices are only diagonally dominant, i.e. with an inequality re-
placing the strict inequality in the above definition.
16 Gerald Moore
Example:
2 −1
−1 2 −1
. . . . . . . . .
. . . . . . . . .
−1 2 −1
−1 2

is diagonally dominant
but not
strictly diagonally dominant
Diagonal dominance on its own is not sufficient for the Jacobi or Gauss-Seidel
methods to converge, even if at least one row of A has a strict inequality. It
is still possible, however, to prove that both methods will converge, provided
that the conditions on the coefficient matrix A are slightly strengthened.
Thus we call A irreducibly diagonal dominant if
1. A is diagonally dominant,
2. at least one row of A must have the strict inequality,
3.it is not possible to divide the set of integers {1, 2, . . . , n} into two
non-empty disjoint sets I and J so that aij = 0 whenever i ∈ I and
j ∈ J .
This last condition [irreducibility!] means that it is not possible to re-arrange
the rows and corresponding columns of A so that the matrix looks like(
AII 0
AJI AJJ
)
,
i.e. Ax = b does not contain a smaller system of equations hidden inside it.
Example: The matrix 2 −1 0−1 2 −1
0 −1 1

is irreducibly diagonally dominant.
Now we have the key theorem.
Theorem: If B is irreducibly diagonally dominant then B is non-
singular.
Iterative Methods 17
Proof: Suppose Bz = 0 with z 6= 0. It is impossible for
|zj| = ‖z‖∞ 1 ≤ j ≤ n
, because then we would have
|bkk||zk| =
∣∣∣∣∣∑
j 6=k
bkjzj
∣∣∣∣∣
≤ |zk|
∑
j 6=k
|bkj|
for each row k of B, and this would give a contradiction for the
particular row where
|bkk| >
∑
j 6=k
|bkj|.
Hence we may split the integers {1, 2, . . . , n} into the two non-
empty disjoint sets
I ≡ {i : |zi| = ‖z‖∞} and J ≡ {j : |zj| < ‖z‖∞} .
Then, if i ∈ I, we may write
|bii| ≤
∑
k 6=i
|bik| |zk||zi|
=
∑
k 6=i
k∈I
|bik|+
∑
k∈J
|bik| |zk||zi| ,
and bik 6= 0 for any k ∈ J enables us to say that
|zk|
|zi| < 1 ⇒ |bii| <
∑
k 6=i
|bik|
which contradicts the diagonal dominance of B. Hence we must
have
i ∈ I, j ∈ J ⇒ bij = 0,
which contradicts the irreducibility of B �
18 Gerald Moore
Corollary I: If A is irreducibly diagonally dominant then the
Jacobi method converges.
Proof: Suppose CJ ≡ I−D−1A has an eigenvalue µ with |µ| ≥ 1,
so that
I− 1
µ
CJ ≡ D−1
{
D− 1
µ
(L + U)
}
is singular and hence so is D− 1
µ
(L+U). But |µ| ≥ 1 means that
this last matrix inherits the irreducible diagonal dominance of A,
which contradicts the theorem �
Corollary II: If A is irreducibly diagonally dominant then the
Gauss-Seidel method converges.
Proof: Suppose CGS ≡ I− (D− L)−1A has an eigenvalue µ with
|µ| ≥ 1, so that
I− 1
µ
CGS ≡ (D− L)−1
{
D− L− 1
µ
U
}
is singular and hence so is D − L − 1
µ
U. But |µ| ≥ 1 means that
this last matrix inherits the irreducible diagonal dominance of A,
which contradicts the theorem �
This type of proof can also be used to show that the Jacobi and Gauss-
Seidel methods also converge when A satisfies the weaker condition of essential
diagonal dominance. This means that
1. A is diagonally dominant,
2. at least one row of A must have the strict inequality,
3. if it is possible to divide the set of integers {1, 2, . . . , n} into two non-
empty disjoint sets I and J so that aij = 0 whenever i ∈ I and j ∈ J
then one of the rows of A with index in I must have the strict inequality.
This last condition means that if it is possible to re-arrange the rows and
corresponding columns of A so that the matrix looks like(
AII 0
AJI AJJ
)
,
then one of the rows in AII must have the strict inequality.
Iterative Methods 19
4.3 Young’s SOR theory
If a scaling factor is introduced into the Gauss-Seidel method, the significant
questions are:-
1. Does varying ω away from 1 (the Gauss-Seidel method) lead to faster
convergence?
2. What is the optimal value of ω to use?
In this section we shall prove that, for a number of simple (but important!)
coefficient matrices A, which include our model problem, it is possible to
state exactly what value of ω should be chosen in terms of ρJ , the spectral
radius of the Jacobi iteration matrix. For this class of coefficient matrices it
is also possible to give a formula connecting ρJ with ρω, the spectral radius
of the scaled Gauss-Seidel iteration matrix.
We shall use our usual splitting
A ≡ D− L− U,
assuming that the diagonal elements of A are non-zero, and write
CJ ≡ I− D−1A = D−1 (L + U)
and
Cω ≡ I−
(
1
ω
D− L
)−1
A = (D− ωL)−1 ([1− ω]D + ωU) .
The key property which we require our splitting to possess is:-
The characteristic equation, and hence the
eigenvalues, of
B(z) ≡ z−1D−1L + zD−1U
are independent of non-zero z ∈ C.
This property is established for a several types of coefficient matrix A through
similarity transformations S(z), i.e. we show that
B(z) = S(z)−1B(1)S(z).
1. If A is a tridiagonal matrix then S(z) is a diagonal matrix with iith
element zi−1.
20 Gerald Moore
2. If
A ≡
(
D1 C1
C2 D2
)
,
where D1 and D2 are diagonal matrices, then we take
S(z) ≡
(
I1
zI2
)
.
3. If A is a block tridiagonal matrix with diagonal diagonal blocks, then
we take S(z) to be a block diagonal matrix with the ith diagonal block
equal to zi−1 times the identity matrix. [Thus this is a generalisation
of the last two types.]
4. If A is a block tridiagonal matrix with tridiagonal diagonal blocks and
diagonal off-diagonal blocks, then we take S(z) to be a block diagonal
matrix with the ith diagonal block equal to zi−1 times the diagonal
matrix whose jjth element is zj−1.
For matrices satisfying this property, we can connect the characteristic
equations for CJ and Cω.
• By putting z = ±1, we see that the eigenvalues of
D
−1 (L + U) and − D−1 (L + U)
are identical, and the only way this can occur is for
det (µI− CJ) = µp
r∏
i=1
(
µ2 − µ2i
)
where µi ∈ C and p+ 2r = n.
• The characteristic polynomial for Cω can be written
det (λI− Cω) = det
(
λI− (D− ωL)−1 ([1− ω]D + ωU))
= det (D− ωL)−1 det (λ (D− ωL)− [1− ω]D− ωU)
= det
(
[λ+ ω − 1] I− λωD−1L− ωD−1U) .
If, however, λω 6= 0 we may set z = √ω
λ
and conclude that the eigen-
values of
λωD−1L + ωD−1U and
√
λωD−1 (L + U)
Iterative Methods 21
are identical, and hence
(†) det (λI− Cω) = det
(
[λ+ ω − 1] I−
√
λωD−1 (L + U)
)
.
If, on the other hand, λω = 0 then
det ([λ+ ω − 1] I −λωD−1L− ωD−1U)
= [λ+ ω − 1]n
= det
(
[λ+ ω − 1] I−
√
λωD−1 (L + U)
)
which means that (†) still holds.
• Since CJ ≡ D−1 (L + U), we see that
det (λI− Cω) = det
(
[λ+ ω − 1] I−
√
λωCJ
)
= (λ+ ω − 1)p
r∏
i=1
(
[λ+ ω − 1]2 − λω2µ2i
)
We may therefore pair-off the eigenvalues of CJ , C1 ≡ CGS and Cω, i.e.
Jacobi Gauss-Seidel Scaled Gauss-Seidel
p 0 0 1− ω
2r ±µi 0, µ2i (‡)
with (‡) denoting the two roots of
(‡) (λ+ ω − 1)2 = λω2µ2i .
This immediately tells us that
ρGS = [ρJ ]
2 ,
so that Gauss-Seidel converges if and only if Jacobi converges and if they
both converge Gauss-Seidel is twice as fast.
To make any further progress we need to assume that the eigenvalues of
CJ are real, and so from now on we take
µi ∈ R and 0 < |µi| < 1 i = 1, . . . , r.
We also know that only ω ∈ (0, 2) is of interest.
22 Gerald Moore
For the p zero eigenvalues of CJ , we know that the corresponding p eigen-
values 1 − ω of Cω decrease in modulus for ω ∈ (0, 1) and then increase in
modulus for ω ∈ (1, 2).
6
-
0 1 2
1
|1− ω|
ω
@
@
@
@
�
�
�
�
For each pair of non-zero eigenvalues of CJ , ±µi, we must examine the
behaviour of the two roots of (‡). Using the formula for the roots of a
quadratic, we obtain
ω Roots of (‡)
0 < ω < 2
1+
√
1−µ2i
2 distinct non-negative real roots
ω = 2
1+
√
1−µ2i
Double real root ω − 1
2
1+
√
1−µ2i
< ω < 2 Complex conjugate roots, modulus ω − 1
The behaviour of the two distinct real roots of (‡), for 0 < ω < 2
1+
√
1−µ2i
, is
shown in the graph below.
6
-
−1
1
1
ω = 2
1+
√
1−µ2i
ω = 1
0 < ω < 1
y = λ+ω−1ω
y = |µi|
√
λ
y
λ
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
�
��
Thus, for each ω ∈
(
0, 2
1+
√
1−µ2i
)
, the two intersections of the curve y =
µi
√
λ and the line y = λ+ω−1
ω
give the two real roots of (‡). Hence we see
that the intersection on the upperbranch of y = µi
√
λ always gives the larger
root, but this root decreases in size as ω increases.
Iterative Methods 23
For ω ∈
(
2
1+
√
1−µ2i
, 2
)
, we may write (‡) as
λ2 +
(
2[ω − 1]− ω2µ2i
)
λ+ [ω − 1]2 = 0
in order to see that the modulus of the complex conjugate roots is ω − 1,
which increases with ω.
Hence, for each individual µi, the best choice of ω to minimize the mod-
ulus of the largest corresponding eigenvalue of Cω is
ω =
2
1 +
√
1− µ2i
.
In particular, for µi ≡ ρJ the best choice of ω is 2
1+
√
1−ρ2
J
. Since, however,
µi > µj ⇒ 2
1 +
√
1− µ2i
>
2
1 +
√
1− µ2j
,
this choice of ω will make all the eigenvalues of Cω have modulus ω − 1. So
we finally see that the optimal choice of ω is
ωopt =
2
1 +
√
1− ρ2J
and this makes
ρωopt = ωopt − 1 ≡
1−
√
1− ρ2J
1 +
√
1− ρ2J
.
6
-
0 1 ωopt 2
1
ρ
ω
�
�
[Note that ωopt is always in the interval (1, 2).] Of course, we must be able
to calculate (or estimate) the largest eigenvalue in modulus of I − D−1A in
order to use these results.
A most important point is that, if the Jacobi method is converging slowly
because ρJ = 1− ε with small ε > 0, then
ρGS = (1− ε)2 ≈ 1− 2ε
ρSOR =
1−√1− (1− ε)2
1 +
√
1− (1− ε)2 ≈ 1− 2
√
2ε
and there is an order-of-magnitude improvement in the convergence of the
SOR method.
24 Gerald Moore
4.4 Results for the model problem
For our model problem, we know from p.14 that
ρJ = cospih = 1− pi
2h2
2
+O(h4).
Consequently,
ρGS = cos
2 pih = 1− pi2h2 +O(h4)
ωopt =
2
1 + sinpih
= 2− 2pih+O(h2)
ρSOR =
1− sin pih
1 + sinpih
= 1− 2pih+O(h2)
The table below shows how the spectral radius of our three methods varies
with h = 1/m, and also gives the approximate number of iterations required
to reduce the error by a factor of 1000, i.e. from 1 to 0.001.
m = 1/h 10 20 50 100 200
Jacobi .95016 .98769 .99803 .99951 .99988
Gauss-Seidel .90451 .97553 .99606 .99901 .99975
SOR .52786 .72954 .88184 .93909 .96907
Jacobi 138 558 3497 13996 55990
Gauss-Seidel 69 279 1749 6998 27995
SOR 17 35 92 195 413
Problems
P–1 What is the optimal scaling for Richardson’s method applied to the
coefficient matrix
( −4 2
−3 1
)
.
P–2 If A has complex eigenvalues, determine when it is possible for the
scaled Richardson method to converge and work out the optimal scal-
ing.
P–3 If A is strictly diagonally dominant, prove that A must be non-singular
by deducing a contradiction from the kth equation of
Az = 0
if zk = ‖z‖∞ > 0.
Iterative Methods 25
P–4 Consider the matrix  1 α αα 1 α
α α 1
 ,
used in Problem 5 on p. 67 of the first part of the notes. For what
values of α:-
1. is the matrix diagonally dominant?
2. does the Jacobi method converge?
P–5 Illustrate the possible slowness of convergence with Jacobi’s method
applied to irreducibly diagonally dominant matrices by computing the
first few iterates for the tridiagonal system
2 −1
−1 2 −1
. . . . . . . . .
. . . . . . . . .
−1 2 −1
−1 2


x1
x2
...
...
x9
x10

=

1
0
...
...
0
1

starting from x(0) = 0.
P–6 Adapt the proof in the notes for irreducibly diagonally dominant ma-
trices, to show that Jacobi and Gauss-Seidel will also converge for es-
sentially diagonally dominant matrices.
P–7 For Jacobi’s method applied to the model problem, why is it only nec-
essary to compute the “red” values for even iterates and the “black”
values for odd iterates? Is this true for Gauss-Seidel as well?
P–8 If a matrix B is column diagonally dominant,i.e.
|bjj| >
n∑
i=1
i6=j
|bij| j = 1, . . . , n,
prove that Jacobi converges. [HINT: I− BD−1 = D(I− D−1B)D−1.]
P–9 Check that both Jacobi and Gauss-Seidel converge for exactly one (but
a different one!) of the matrices 1 −2 2−1 1 −1
−2 −2 1
 and
 2 −1 12 2 2
−1 −1 2
 .
26 Gerald Moore
P–10 Multiply out the similarity transformations to check that the key eigen-
value property on p.19 holds for the four given types of matrix.
P–11 Why is the optimal value of ω for the model problem the same for both
the natural and red-black orderings?
P–12 Compute ρJ and ρGS for the matrix 4 −1 −1−1 4 −1
−1 −1 4
 ;
then find the optimal value of ω for SOR and compute ρSOR.
5 Convergence for Symmetric Positive-Definite
Matrices
When the coefficient matrix A is symmetric positive-definite, we can define
the vector norm
‖x‖A ≡
√
xTAx ∀x ∈ Rn
and the corresponding matrix norm
‖B‖A ≡ ‖A 12BA− 12‖2 ∀B ∈ Rn×n.
For most iterative methods, based on a splitting A = M−N, it is easiest and
most natural to discuss convergence in terms of proving that
ρ(I−M−1A) ≤ ‖I−M−1A‖A < 1.
Our main result is the following theorem.
Theorem: If the symmetric matrix
M + MT − A
is positive-definite then
‖I−M−1A‖A < 1.
Proof: First we note that M cannot be singular because
Mz = 0 z 6= 0
Iterative Methods 27
would imply
zT (M + MT − A)z = −zTAz < 0.
Now if y = (I−M−1A)x then
‖y‖2
A
≡ yTAy = (x−w)TA(x−w) w ≡ M−1Ax
= xTAx− 2wTAx + wTAw
= xTAx− 2wTMw + wTAw
= xTAx−wT (MT + M)w + wTAw
= xTAx−wT (MT + M− A)w
Then
wT (MT + M− A)w ≥ µmin‖w‖22,
where µmin > 0 is the smallest eigenvalue of M + M
T − A, and
‖w‖22 = xTAM−TM−1Ax
= (A
1
2x)TA
1
2M
−T
M
−1
A
1
2 (A
1
2x)
≥ µˆmin‖A 12x‖22
= µˆmin‖x‖2A,
where µˆmin > 0 is the smallest eigenvalue of (M
−1A
1
2 )TM−1A
1
2 ,
i.e. 1
‖A− 12M‖2
2
. Hence
‖(I−M−1A)x‖2
A
≤ (1− µminµˆmin)‖x‖2A
and so
‖I−M−1A‖A ≤
√
1− µminµˆmin < 1 �
Corollary: If, in addition, M is a symmetric positive-definite
matrix, then all the eigenvalues of I−M−1A are real and
ρ(I−M−1A) = ‖I−M−1A‖A = ‖I−M−1A‖M.
Proof: I − M−1A is similar to the two symmetric matrices I −
A
1
2M−1A
1
2 and I−M− 12AM− 12 , i.e.
I− A 12M−1A 12 = A 12 (I−M−1A)A− 12
and
I−M− 12AM− 12 = M 12 (I−M−1A)M− 12 .
28 Gerald Moore
Consequently,
ρ(I−M−1A) = ρ(I− A 12M−1A 12 )
= ‖I− A 12M−1A 12‖2 = ‖I−M−1A‖A
and
ρ(I−M−1A) = ρ(I−M− 12AM− 12 )
= ‖I−M− 12AM− 12‖2 = ‖I−M−1A‖M �
Hence, if M + MT − A is positive-definite there is a guaranteed reduction in
the A-norm of the error at each iteration. If, in addition, M is symmetric
positive-definite then there is also a guaranteed reduction in the M-norm of
the error at each iteration.
Now we wish to apply these results to the Jacobi and Gauss-Seidel meth-
ods, and we note that both of these iterations are defined since aii > 0 1 ≤
i ≤ n.
• Jacobi
M + MT − A = 2D− A
This matrix need not be positive-definite and, in fact, the Jacobi method
need not converge for symmetric, positive-definite A, e.g. 1 0.8 0.80.8 1 0.8
0.8 0.8 1
 .
If, however, Jacobi does converge then the corollary applies since D is
symmetric, positive-definite. It is easy to check, see Problems, that
Jacobi must converge for n = 2.
• scaled Jacobi
M + MT − A = 2
ω
D− A
If λmax > 0 is the largest eigenvalue of the symmetric matrix D
− 1
2AD
− 1
2 ,
equivalently the largest eigenvalue of the similar matrix D−1A or for
the generalised problem A − λD, then 2
ω
D − A is positive-definite iff
2
ω
> λmax, i.e.
0 < ω <
2
λmax
.
The corollary will then apply.
Iterative Methods 29
• Gauss-Seidel
M + MT − A = D
Thus the Gauss-Seidel method converges for all symmetric, positive-
definite A.
• scaled Gauss-Seidel
M + MT − A = 2− ω
ω
D
Thus the scaled Gauss-Seidel method converges for all symmetric, positive-
definite A provided that 0 < ω < 2. Together with the previous result
on p.11 this means that we have proved the famous Ostrowski-Reich
theorem:-
A symmetric, positive-definite ⇒
{
scaled Gauss-Seidelconverges
iff 0 < ω < 2.
We may also apply these theorems to the block Jacobi and block Gauss-Seidel
methods, which are well-defined since the diagonal blocks of a symmetric,
positive-definite matrix are also symmetric and positive-definite. If D now
denotes a block diagonal matrix, we have exactly the same results as above:
e.g. block Jacobi with only two blocks converges for all symmetric, positive-
definite A, see Problems.
5.1 Optimal scaling for Gauss-Seidel
If we denote the scaled Gauss-Seidel iteration matrix by
Cω ≡ I−
(
1
ω
D− L
)−1
A,
then it is possible to estimate the value of ω ∈ (0, 2) which will minimise
‖Cω‖A.
First we note that
‖Cω‖2A = ‖A
1
2CωA
− 1
2‖22 = ρ
(
[A−
1
2C
T
ωA
1
2 ][A
1
2CωA
− 1
2 ]
)
30 Gerald Moore
and so we are concerned with the largest eigenvalue of
[A−
1
2C
T
ωA
1
2 ][A
1
2CωA
− 1
2 ]
= [I− A 12 ( 1
ω
D− L)−TA− 12 ][I− A 12 ( 1
ω
D− L)−1A 12 ]
= I + A
1
2 (
1
ω
D− L)−T{
A− ( 1
ω
D− L)T − ( 1
ω
D− L)
}
(
1
ω
D− L)−1A 12
= I− ( 2
ω
− 1)A 12 ( 1
ω
D− L)−TD( 1
ω
D− L)−1A 12 .
Hence
ρ
(
[A−
1
2C
T
ωA
1
2 ][A
1
2CωA
− 1
2 ]
)
= 1− ( 2
ω
− 1)µmin,
where µmin > 0 is the smallest eigenvalue of
A
1
2 (
1
ω
D− L)−TD( 1
ω
D− L)−1A 12
=
{
A
− 1
2 (
1
ω
D− L)D−1( 1
ω
D− L)TA− 12
}−1
=
{
[A−
1
2 (
1
ω
D− L)D− 12 ][A− 12 ( 1
ω
D− L)D− 12 ]T
}−1
.
Thus
µmin =
1
‖A− 12 ( 1
ω
D− L)D− 12‖22
and so
‖Cω‖A =
√
1−
2
ω
− 1
‖A− 12 ( 1
ω
D− L)D− 12‖22
.
Secondly, we know that ‖A− 12 ( 1
ω
D− L)D− 12‖22 is the largest eigenvalue of
(†) A− 12
{
(
1
ω
D− L)D−1( 1
ω
D− L)T
}
A
− 1
2
Iterative Methods 31
and so, putting ωˆ ≡ 2−ω
2ω
, we can write
(
1
ω
D− L)D−1( 1
ω
D− L)T
= [ωˆD + (
1
2
D− L)]D−1[ωˆD + (1
2
D− LT )]
= ωˆ2D + ωˆ[
1
2
D− L + 1
2
D− LT ] + (1
2
D− L)D−1(1
2
D− LT )
= ωˆ2D + ωˆA + (
1
2
D− L)D−1(1
2
D− LT ).
Hence (†) equals
ωˆ2A−
1
2DA
− 1
2 + ωˆ + A−
1
2 (
1
2
D− L)D−1(1
2
D− LT )A− 12 ,
which means that
‖A− 12 ( 1
ω
D− L)D− 12‖22 ≤ ωˆ2‖A−
1
2D
1
2‖22 + ωˆ + ‖A−
1
2 (
1
2
D− L)D− 12‖22.
Finally we must assume that we can bound the remaining matrix terms,
i.e. determine γ,Γ > 0 so that
(‡)
{
‖A− 12D 12‖22 ≤ 1γ
‖A− 12 (1
2
D− L)D− 12‖22 ≤ 14Γ,
and then the bound
‖A− 12 ( 1
ω
D− L)D− 12‖22 ≤
ωˆ2
γ
+ ωˆ +
Γ
4
means that we obtain
‖Cω‖A ≤
√
1− 2ωˆ
ωˆ2
γ
+ ωˆ + Γ
4
.
As ω varies in (0, 2), ωˆ varies in (0,∞) and ωˆ
ωˆ2
γ
+ωˆ+Γ
4
achieves its maximum
at ωˆ = 1
2
√
γΓ, i.e. ω′ = 2
1+
√
γΓ
with
‖Cω′‖A ≤
√√
Γ−√γ√
Γ +
√
γ
.
32 Gerald Moore
5.2 Symmetric iterative methods
If the condition
M + MT − A positive-definite
tells us that the splitting A = M − N gives a convergent iterative method,
then it equally shows us that the splitting A = MT −NT leads to a convergent
iteration. There is no reason to prefer one of these splittings to the other
and thus, from symmetry considerations, it is natural to combine them in
the single iteration
x(k+
1
2
) = x(k) + M−1(b− Ax(k))
x(k+1) = x(k+
1
2
) + M−T (b− Ax(k+ 12 ))
with matrix
(I−M−TA)(I−M−1A) ≡ I− [MS]−1 A
where
M
S = M(M + MT − A)−1MT .
Since we already know that ‖I−M−1A‖A < 1 and ‖I−M−TA‖A < 1, it is
clear that
‖I− [MS]−1 A‖A ≤ ‖I−M−TA‖A‖I−M−1A‖A < 1.
In fact we have more than this.
• I− [MS]−1 A is similar to
A
1
2 (I−M−TA)(I−M−1A)A− 12 = (I− A 12M−1A 12 )T (I− A 12M−1A 12 )
and so its eigenvalues are real and non-negative.
• The spectral radius of the symmetric iteration
ρ
(
I− [MS]−1 A) = ρ([I− A 12M−1A 12 ]T [I− A 12M−1A 12 ])
= ‖I− A 12M−1A 12‖22
= ‖I−M−1A‖2
A
(
≤ ‖I− [MS]−1 A‖A) .
is exactly the square of the A-norm of the original iteration.
Iterative Methods 33
The most obvious application of symmetric iteration methods is for the
point/block (scaled) Gauss-Seidel methods. For example, with the point
method we have
M =
1
ω
D− L and MT = 1
ω
D− LT .
The analysis in the previous subsection can be used to estimate the value of
ω which minimises ρ
(
I− [MSω]−1 A), i.e. ω′ = 21+√γΓ with
ρ
(
I− [MSω′]−1 A) ≤ √Γ−√γ√
Γ +
√
γ
.
Problems
P–1 If ‖x‖A ≡ ‖A 12x‖2, prove that the corresponding matrix operator norm
is characterised by
‖B‖A = ‖A 12BA− 12‖2.
P–2 Just use simple algebra to show that the Jacobi method converges for
every 2× 2 symmetric positive-definite matrix.
P–3 Expand out
M(M + MT − A)−1MT {I− (I−M−TA)(I−M−1A)}
to verify the formula for MS on p.32.
P–4 Instead of solving Ax = b to obtain x⋆, it is also possible to solve
• ATAx = c where c = ATb, or
• AATy = b with x = ATy.
What are the advantages/disadvantages ot iterative methods applied
to these systems? Write out, in component form, the Gauss-Seidel
iteration for the second case in terms of the variable x.
34 Gerald Moore
6 Stieltjes and M matrices
Most important symmetric, positive-definite matrices that arise from discreti-
sations of differential equations, however, have their off-diagonal elements
nonpositive.
Definition: A symmetric, positive-definite matrix B with
|bij| ≤ 0 if i 6= j
is called a Stieltjes matrix.
It can be proved that Jacobi’s method will converge for all systems of equa-
tions whose coefficient matrix is a Stieltjes matrix.
If the coefficient matrix is a Stieltjes matrix, and so Jacobi’s method
converges, it can be proved that the Gauss-Seidel method converges faster,
i.e.
ρG-S ≤ ρJ
with equality only if A is a diagonal matrix. Here ρJ denotes the spectral
radius of the Jacobi iteration matrix and ρG-S denotes the spectral radius
of the Gauss-Seidel iteration matrix. For general coefficient matrices it is
possible for Gauss-Seidel to converge and Jacobi to diverge or vice-versa.
7 Termination Criteria
Ideally we would like to stop our iterative method when a suitable norm of
the error,
‖e(k)‖ ≡ ‖x⋆ − x(k)‖,
is sufficiently small. The error, however, is not known (because x⋆ is not
known!) and so we must be content with noting that both the residual,
r(k) ≡ b− Ax(k),
and the pseudo-residual,
s(k) ≡ x(k+1) − x(k),
should tend to the zero vector as x⋆ is approached. These are both com-
putable quantities and the key question is:- how do the sizes of r(k) and s(k)
relate to the size of e(k).
Iterative Methods 35
The relationship between the residual and the error has nothing to do
with the particular iterative method being used, as can be seen from
e(k) = x⋆ − x(k)
= A−1(Ax⋆ − Ax(k))
= A−1(b− Ax(k))
= A−1r(k).
Thus to relate the error to the residual we need to estimate or bound a norm
of A−1.
Example: For the model problem we have
‖A−1‖2 = 1
8 sin2(pih/2)
and thus
‖e(k)‖2 ≤ ‖A−1‖2‖r(k)‖2
=
‖r(k)‖2
8 sin2(pih/2)
.
Consequently, for small h, ‖e(k)‖2 can be significantly larger than
‖r(k)‖2 and this should be taken into account when deciding at
which point to stop an iteration.
The relationship between the pseudo-residual and the error is given by
s(k) = e(k+1) − e(k)
= (M−1N− I)e(k)
and so
e(k) = (M−1N− I)−1s(k).
[Since we are assuming that the iteration is convergent, i.e. ρ(M−1N) < 1,
M−1N− I cannot have a zero eigenvalue and thus must be non-singular.] We
illustrate the significance of this relationship with two examples.
• Richardson’s method with optimal ω for A symmetric, positive-definite.
Here we have M = 1
ω
I and N = 1
ω
I − A with ω = 2
λ1+λn
. Consequently
(M−1N− I)−1 = −1
ω
A−1 and
‖e(k)‖2 ≤ ‖(M−1N− I)−1‖2‖s(k)‖2
=
λ1 + λn
2λ1
‖s(k)‖2
=
1
2
(1 + cond2(A)) ‖s(k)‖2.
36 Gerald MooreFor the model problem this becomes
‖e(k)‖2 ≤ ‖s
(k)‖2
2 sin2(pih/2)
.
• Jacobi’s method for strictly diagonally dominant A.
For Jacobi’s method we have M−1N = I − D−1A, and if A is strictly
diagonally dominant p.14 shows that
‖M−1N‖∞ < 1
and this value can easily be computed. Hence, using the result on p.
36 of the first part of the notes, I−M−1N is non-singular with
‖(M−1N− I)−1‖∞ ≤ 1
1− ‖M−1N‖∞ .
Thus our relationship between the error and the pseudo-residual is
given by
‖e(k)‖∞ ≤ ‖(M−1N− I)−1‖∞‖s(k)‖∞
≤ ‖s
(k)‖∞
1− ‖M−1N‖∞
and if ‖M−1N‖∞ is close to 1 then ‖e(k)‖∞ may be much larger than
‖s(k)‖∞.
In other situations the idea is always the same: to estimate or bound a norm
of (M−1N− I)−1 and thus to relate e(k) and s(k). This, however, may require
a great deal of ingenuity.
Problems
P–1 If the residual and pseudo-residual are both monitored during a run
of Jacobi’s method applied to the model problem, when should the
iteration be stopped with the guarantee that
‖x⋆ − x(k)‖2 ≤ 10−3?
Iterative Methods 37
8 Acceleration of Iterative Methods by Cheby-
shev Polynomials
Suppose we are trying to solve Ax = b using the iterative method
x(j+1) = M−1
(
Nx(j) + b
)
,
where A ≡ M − N and M is non-singular. Although ρ (M−1N) < 1 and the
method is converging, we would like to find a way of reaching the solution
x⋆ faster, i.e. accelerating the convergence and obtaining a more efficient
method. If the matrix M−1N has real eigenvalues, there is a general technique
for doing this.
Assume that the iterates x(0),x(1), . . . ,x(k) have been computed from the
basic method above. We would like to find constants c0(k), c1(k), . . . , ck(k)
so that
y(k) =
k∑
j=0
cj(k)x
(j)
is the best possible approximation to the unknown solution x⋆. [Much bet-
ter than x(k), which would correspond to taking ck(k) = 1 and the other
constants as zero.] We see at once that there must be a condition on the
choice of constants, because if x(0) were actually equal to x⋆ then also
x(j) = x⋆ j = 1, . . . , k and so we must have
k∑
j=0
cj(k) = 1
to ensure that y(k) also equals x⋆ in this case. Apart from this condition, the
constants should be chosen to make the error
x⋆ − y(k) = x⋆ −
k∑
j=0
cj(k)x
(j)
=
k∑
j=0
cj(k)
{
x⋆ − x(j)}
as small as possible.
38 Gerald Moore
Looking more closely at this error, we see that
x⋆ − y(k) =
k∑
j=0
cj(k)
{
x⋆ − x(j)}
=
k∑
j=0
cj(k)
(
M
−1
N
)j {
x⋆ − x(0)}
= pk
(
M
−1
N
)
e(0),
where pk(t) is the k
th degree polynomial
ck(k)t
k + ck−1(k)tk−1 + · · ·+ c1(k)t+ c0(k)
and our condition above means that pk(1) must equal 1. The way we shall
make x⋆ − y(k) small is by choosing the constants c0(k), c1(k), . . . , ck(k) so
that the eigenvalues of the matrix
pk(M
−1
N) ≡ c0(k)I + c1(k)M−1N + · · ·+ ck(k)
(
M
−1
N
)k
are small. [It is a simple mathematical result that, if λ1, . . . , λn are the eigen-
values of M−1N, then the eigenvalues of pk(M−1N) are pk(λ1), . . . , pk(λn).] It
would be impractical, however, to assume that all the eigenvalues of M−1N
are known, and we only make the more realistic assumption that an interval
[α, β], with
−1 < α ≤ β < 1,
is known, within which all the eigenvalues of M−1N must lie, i.e.
α ≤ λi ≤ β i = 1, . . . , n.
Thus the question we want to answer is:-
how to choose the kth degree polynomial pk(t), i.e. the constants
c0(k), c1(k), . . . , ck(k), so that pk(1) = 1 but
max
α≤t≤β
{|pk(t)|}
is as small as possible?
Fortunately there is a ready-made mathematical solution to this question.
The Chebyshev polynomial of degree k, Tk(t), is defined by the recurrence
relation
T0(t) = 1
T1(t) = t
Tk(t) = 2tTk−1(t)− Tk−2(t) k = 2, 3, 4, . . .
Iterative Methods 39
−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1
−1
−0.8
−0.6
−0.4
−0.2
0
0.2
0.4
0.6
0.8
1
t
T n
Chebyshev Polynomials
T
n+1(t) = 2t Tn(t) − Tn−1(t)
n=1
n=2
n=3
n=4
It is known that Tk(t) has the smallest maximum modulus over [−1, 1],
max
−1≤t≤1
{|Tk(t)|} ,
from amongst all kth degree polynomials satisfying the normalising condition
that the coefficient of tk equals 2k−1, and that this maximum is 1. [See the
graph for the first few Chebyshev polynomials.] Consequently, by a change-
of-variable,
Tk
(
2t− β − α
β − α
)
is the kth degree polynomial which has smallest maximum modulus over [α, β]
subject to the condition that the coefficient of tk is 22k−1/(β−α)k; and hence
p⋆k(t) =
Tk(
2t−β−α
β−α )
Tk(
2−β−α
β−α )
is the kth degree polynomial which has smallest maximum modulus over [α, β]
subject to the condition that its value at t = 1 is 1. Our coefficients can thus
be obtained.
Example: If α = −1
2
and β = 1
2
then our polynomial is Tk(2t)
Tk(2)
.
k Tk(2t)/Tk(2) cj(k)
1 t c0(1) = 0 c1(1) = 1
2 (8t2 − 1)/7 c0(2) = −1/7 c1(2) = 0 c2(2) = 8/7
3 (32t3 − 6t)/26 c0(3) = c2(3) = 0 c1(3) = −3/13 c3(3) = 16/13
40 Gerald Moore
To give a bound on the rate of convergence of this method we note that
ρ
(
p⋆k(M
−1
N)
)
= max
1≤i≤n
{|p⋆k(λi)|}
≤ max
α≤λ≤β
{|p⋆k(λ)|}
=
1
Tk(
2−β−α
β−α )
,
because
max
α≤t≤β
{
Tk(
2t− β − α
β − α )
}
= max
−1≤t≤1
{Tk(t)} = 1.
Thus, because k iterations are required to obtain x(0), . . . ,x(k) and y(k), our
average rate of convergence for the accelerated method is{
1
Tk(
2−β−α
β−α )
} 1
k
.
We hope that this will be significantly better than the convergence rate of
the basic method, i.e. ρ(M−1N) = max{|α|, |β|}.
Example: With Jacobi’s method applied to the model problem
(which is the same as Richardson’s method with optimal scaling),
we have the tight bounds
α = 1− 2 cos2 pih
2
β = 1− 2 sin2 pih
2
= − cos(pih) = cos(pih)
for the eigenvalues of M−1N on p.13. Consequently 2−β−α
β−α =
1
cos(πh)
and we can generate the following table for {1/Tk(sec(pih))} 1k .
k : m 10 20 50 100 200
2 .90866 .97582 .99607 .99901 .99975
5 .82790 .94492 .99032 .99754 .99938
10 .77859 .91157 .98159 .99515 .99877
20 .75216 .88412 .96841 .99076 .99757
These results should be compared with ρJ given on p.24.
Of course if M−1N is symmetric, we also have an error bound in the 2-norm,
i.e.
‖x⋆ − y(k)‖2 ≤
1
Tk(
2−β−α
β−α )
‖x⋆ − x(0)‖2.
Iterative Methods 41
At first sight a practical drawback of this accelerated iterative method
seems to be that it is necessary to keep and store x(0),x(1), . . . ,x(k) in order
to compute y(k). By using the three-term recurrence relation between the
Chebyshev polynomials
Tk(t) = 2tTk−1(t)− Tk−2(t),
however, we can obtain a very efficient formula for y(k) in terms of y(k−1)
and y(k−2). From the formula
Tk(
2t− β − α
β − α ) = 2
{
2t− β − α
β − α
}
Tk−1(
2t− β − α
β − α )− Tk−2(
2t− β − α
β − α ),
it follows that
Tk(
2t−β−α
β−α )
Tk(
2−β−α
β−α )
= 2
{
2t− β − α
β − α
}
Tk−1(
2−β−α
β−α )
Tk(
2−β−α
β−α )
Tk−1(
2t−β−α
β−α )
Tk−1(
2−β−α
β−α )
− Tk−2(
2−β−α
β−α )
Tk(
2−β−α
β−α )
Tk−1(
2t−β−α
β−α )
Tk−1(
2−β−α
β−α )
and so
p⋆k(t) = 2
{
2t− β − α
β − α
}
Tk−1(θ)
Tk(θ)
p⋆k−1(t)−
Tk−2(θ)
Tk(θ)
p⋆k−2(t),
where θ = 2−β−α
β−α . Consequently
x⋆ − y(k) = p⋆k(M−1N)(x⋆ − x(0))
= 2
{
2M−1N− β − α
β − α
}
Tk−1(θ)
Tk(θ)
p⋆k−1(M
−1
N)(x⋆ − x(0))
−Tk−2(θ)
Tk(θ)
p⋆k−2(M
−1
N)(x⋆ − x(0))
= 2
{
2M−1N− β − α
β − α
}
Tk−1(θ)
Tk(θ)
(x⋆ − y(k−1))
−Tk−2(θ)
Tk(θ)
(x⋆ − y(k−2)).
Now, using the relation
1 = 2θ
Tk−1(θ)
Tk(θ)
− Tk−2(θ)
Tk(θ)
42 Gerald Moore
and the notation
ωk = 2θ
Tk−1(θ)
Tk(θ)
,
we have
x⋆ − y(k) = 2
{
2M−1N− 2I
β − α
}
Tk−1(θ)
Tk(θ)
(x⋆ − y(k−1)) + ωk(x⋆ − y(k−1))
+(1− ωk)(x⋆− y(k−2))
=
2
θ(β − α)ωk(M
−1
N− I)(x⋆ − y(k−1))− ωky(k−1)
−(1− ωk)y(k−2) + x⋆.
Since(
M
−1
N− I) (x⋆ − y(k−1)) = M−1 (Ay(k−1) − Ax⋆) = M−1 (Ay(k−1) − b) ,
the final neat formula is
y(k) = ωk
(
y(k−1) − y(k−2) + γz(k−1))+ y(k−2)
for k ≥ 2, where γ = 2
θ(β−α) =
2
2−β−α and Mz
(k−1) = b − Ay(k−1). The
starting values for this accelerated iteration are
y(0) = x(0)
and
y(1) =
2
2− β − αx
(1) − β + α
2− β − αx
(0) p⋆1(t) =
2t−β−α
2−β−α
=
2
2− β − α
{
x(0) + M−1(b− Ax(0))}− β + α
2− β − αx
(0)
= x(0) + γM−1(b− Ax(0)).
There is also a simple formula for the {ωk} since the relation between Cheby-
shev polynomials means that
Tk(θ)
Tk−1(θ)
= 2θ − Tk−2(θ)
Tk−1(θ)
and thus
ωk = 2θ
Tk−1(θ)
Tk(θ)
=
1
1− ωk−1/(4θ2) ,
Iterative Methods 43
with
ω1 = 2θ
T0(θ)
T1(θ)
= 2.
The following algorithm shows how the accelerated iteration proceeds.
read its;
read α, β;
γ ← 2
2−β−α ;
θ ← 2−β−α
β−α ;
ω ← 2;
read yO;
z ← M−1(b− AyO);
yN ← yO + γz;
print yN ;
for j = 1, . . . , its do
ω ← 1
1−ω/(4θ2) ;
z ← M−1(b− AyN);
z ← ω(yN − yO + γz) + yO;
yO ← yN ;
yN ← z;
print yN ;
end j.
Notice that the original iteration, with splitting A = M − N, only appears
through solutions of systems of equations with coefficient matrix M.
Finally, we shall briefly talk about the application of the acceleration
technique to particular iterative methods.
8.1 Richardson’s method
If A is symmetric, positive-definite [with eigenvalues 0 < λ1 ≤ . . . ≤ λn] and
the optimal scaling for Richardson’s method is used, see p.13, then
α =
λ1 − λn
λ1 + λn
, β =
λn − λ1
λn + λ1
and so
γ =
2
2− β − α = 1 , θ =
2− β − α
β − α =
λn + λ1
λn − λ1 .
44 Gerald Moore
Hence the Chebyshev acceleration of the optimally scaled Richardson’s method
is
y(k) = ωk
{
y(k−1) − y(k−2) + 2
λn + λ1
(b− Ay(k−1))
}
+ y(k−2).
8.2 Jacobi’s method
If Jacobi’s method converges and A is symmetric with positive diagonal el-
ements [e.g. A could be symmetric, positive-definite] then Chebyshev accel-
eration can be applied, because
M
−1
N = I− D−1A
has real eigenvalues. This is shown in three steps:-
1. D−
1
2AD
− 1
2 is symmetric and so has real eigenvalues,
2. D−1A = D−
1
2
(
D
− 1
2AD
− 1
2
)
D
1
2 has the same eigenvalues as D−
1
2AD
− 1
2
and so D−1A has real eigenvalues,
3. if λ is an eigenvalue of I − D−1A then 1 − λ is an eigenvalue of D−1A
and so I− D−1A has real eigenvalues.
To accelerate Jacobi’s method, we replace M by D in the algorithm on p.43.
8.3 Gauss-Seidel method
In general the eigenvalues of the Gauss-Seidel iteration matrix cannot be
guaranteed real, even for symmetric, positive-definite A. The eigenvalues will
be real, however, for the small but important class of problems [including the
model problem] mentioned on p.21, for which the optimal choice of relaxation
parameter is known. This is true for both the natural ordering and the red-
black ordering, although usually the latter will give better results. In either
case it can be proved mathematically that

α = 0 ,   β = ρJ²

are sharp bounds for the eigenvalues of the iteration matrix.
Example: For the model problem we would have α = 0 and
β = cos²(πh). Consequently

γ = 2/(2 − cos²(πh)) ,   θ = (2 − cos²(πh))/cos²(πh)

and the following table can be generated for the average rate of
convergence { 1/Tk(2 sec²(πh) − 1) }^(1/k).
k : m 10 20 50 100 200
1 .82566 .95223 .99215 .99803 .99951
2 .71912 .91070 .98447 .99607 .99901
5 .60615 .83095 .96351 .99033 .99755
10 .56575 .78167 .93781 .98160 .99516
20 .54648 .75518 .91264 .96843 .99076
These results should be compared with those in the tables given
on p.24 & p.40.
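The entries in this table follow directly from the formula {1/Tk(2 sec²(πh) − 1)}^(1/k); the short check below (illustrative only, using the identity Tk(t) = cosh(k cosh⁻¹ t) for t ≥ 1) reproduces them.

import numpy as np

def average_rate(k, m):
    """Average rate of convergence {1/T_k(2 sec^2(pi h) - 1)}^(1/k), h = 1/m."""
    h = 1.0 / m
    theta = 2.0 / np.cos(np.pi * h) ** 2 - 1.0    # 2 sec^2(pi h) - 1
    Tk = np.cosh(k * np.arccosh(theta))           # T_k(theta) for theta >= 1
    return (1.0 / Tk) ** (1.0 / k)

for k in (1, 2, 5, 10, 20):
    print(k, [round(average_rate(k, m), 5) for m in (10, 20, 50, 100, 200)])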
8.4 SOR method
Unfortunately, even for the model problem, the SOR iteration matrix with the
optimal relaxation parameter does not have real eigenvalues and so Chebyshev
acceleration cannot be applied. There is, nevertheless, a closely related iterative
method which consists of two parts:-
• an ordinary SOR iteration with the splitting

  M = (1/ω) D − L ,   N = ((1−ω)/ω) D + U,

• a backwards SOR iteration with the splitting

  M = (1/ω) D − U ,   N = ((1−ω)/ω) D + L.
Thus the total iteration consists of
x(k+1/2) = x(k) + ((1/ω) D − L)−1 (b − Ax(k))
x(k+1) = x(k+1/2) + ((1/ω) D − U)−1 (b − Ax(k+1/2))
or in compact form [solving the triangular systems by substitution]
xi(k+1/2) = (1 − ω) xi(k) + ω ( bi − ∑_{j>i} aij xj(k) − ∑_{j<i} aij xj(k+1/2) ) / aii
for i = 1, . . . , n and
xi(k+1) = (1 − ω) xi(k+1/2) + ω ( bi − ∑_{j<i} aij xj(k+1/2) − ∑_{j>i} aij xj(k+1) ) / aii
for i = n, . . . , 1. This is called the SSOR method (Symmetric Successive
Over-Relaxation). As we saw in section 5:-
1. 0 < ω < 2 is a necessary condition for SSOR to converge,
2. if A is symmetric positive-definite then 0 < ω < 2 is sufficient for SSOR
to converge.
The choice of ω, however, is not so critical here and for many problems the
value
ω = 2/(1 + √(2(1 − ρJ)))

is almost optimal and gives

ρSSOR ≤ (1 − √((1 − ρJ)/2)) / (1 + √((1 − ρJ)/2)).
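A plain (dense, unoptimised) implementation of one SSOR step, mirroring the two componentwise sweeps written out above, might look as follows; it is meant only as an illustration of the update order.

import numpy as np

def ssor_iteration(A, b, x, omega):
    """One SSOR step: a forward SOR sweep followed by a backward SOR sweep."""
    x = x.copy()
    n = len(b)
    # forward sweep (i = 1, ..., n): entries x[:i] already hold the (k+1/2) values
    for i in range(n):
        sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    # backward sweep (i = n, ..., 1): entries x[i+1:] already hold the (k+1) values
    for i in range(n - 1, -1, -1):
        sigma = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
        x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x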
Example: For the model problem the SSOR iteration, with nat-
ural ordering, is
ui,j(k+1/2) = (1 − ω) ui,j(k) + (ω/4) { h² f(ih, jh) + ui−1,j(k+1/2) + ui+1,j(k) + ui,j−1(k+1/2) + ui,j+1(k) }
for i, j = 1, . . . ,m− 1 and
ui,j(k+1) = (1 − ω) ui,j(k+1/2) + (ω/4) { h² f(ih, jh) + ui−1,j(k+1/2) + ui+1,j(k+1) + ui,j−1(k+1/2) + ui,j+1(k+1) }
for i, j = m− 1, . . . , 1, with
ω = 2/(1 + √(2(1 − ρJ))) = 2/(1 + 2 sin(πh/2))

almost the optimal choice of relaxation factor. This gives

ρSSOR ≤ (1 − sin(πh/2))/(1 + sin(πh/2)) = 1 − πh + O(h²).
Although, if used on its own, SSOR is generally slower than SOR [about twice
as many iterations are required for the model problem: 1 − πh v. 1 − 2πh], the
important advantage of the SSOR method is that it often has real eigenvalues
and so can be accelerated by Chebyshev polynomials. E.g. if A is symmetric
positive-definite (A = D − L − LT) then the SSOR iteration matrix M−1N can
be written as

(D − ωLT)−1 ([1−ω]D + ωL) (D − ωL)−1 ([1−ω]D + ωLT)
  = D−1/2 (I − ωLT1)−1 ([1−ω]I + ωL1) (I − ωL1)−1 ([1−ω]I + ωLT1) D1/2
  = D−1/2 { (I − ωLT1)−1 ([1−ω]I + ωL1) } { (I − ωL1)−1 ([1−ω]I + ωLT1) } D1/2,

where L1 ≡ D−1/2LD−1/2. Since ([1−ω]I + ωL1) and (I − ωL1)−1 commute, the
product in braces can be regrouped as

[(I − ωL1)(I − ωLT1)]−1 [([1−ω]I + ωL1)([1−ω]I + ωLT1)],

a product of a symmetric positive-definite and a symmetric positive semi-definite
matrix; M−1N is therefore similar to a symmetric positive semi-definite matrix
and so has real, non-negative eigenvalues.
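This can be checked numerically. The small script below (purely illustrative) assembles the SSOR iteration matrix for a random symmetric positive-definite A and confirms that its eigenvalues are real and non-negative.

import numpy as np

rng = np.random.default_rng(0)
n, omega = 8, 1.3
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                 # symmetric positive-definite
D = np.diag(np.diag(A))
L = -np.tril(A, -1)                         # so that A = D - L - L^T

M1 = D / omega - L                          # forward SOR splitting
N1 = (1 - omega) / omega * D + L.T
M2 = D / omega - L.T                        # backward SOR splitting
N2 = (1 - omega) / omega * D + L
G = np.linalg.solve(M2, N2) @ np.linalg.solve(M1, N1)   # SSOR iteration matrix

eigs = np.linalg.eigvals(G)
print(np.max(np.abs(eigs.imag)), eigs.real.min())       # ~0 and >= 0 (up to round-off)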
Example: If SSOR with Chebyshev acceleration is applied to
the model problem, it is possible to choose
α = 0 ,   β = (1 − sin(πh/2))/(1 + sin(πh/2)),

although these may not be the tightest possible bounds on the
eigenvalues of M−1N. Consequently,

γ = 2/(2−β−α) = 2(1 + sin(πh/2))/(1 + 3 sin(πh/2))

θ = (2−β−α)/(β−α) = (1 + 3 sin(πh/2))/(1 − sin(πh/2))
and the following table can be generated for the average rate of
convergence { 1/Tk(θ) }^(1/k).
k : m 10 20 50 100 200
1 .57413 .74596 .88518 .94000 .96931
2 .44422 .62087 .80257 .88964 .94128
5 .36261 .51430 .69305 .80056 .87986
10 .33832 .47989 .64747 .75112 .83332
20 .32680 .46355 .62542 .72559 .80546
These results should be compared with those in the tables given
on p.24, p.40 & p.45.
To conclude this section, we would like to emphasise that, in general, accurate
values for the eigenvalue bounds α and β will not be known a priori. Either
they should be computed before the iteration is started, or an adaptive
strategy should be followed in which estimates of α and β are improved as
the iteration proceeds.

Problems
P–1 Why are the values for k = 1 in the table on p.45 different from the
values of ρGS in the table on p.24?
P–2 Use the programs for Chebyshev acceleration of Jacobi and SSOR ap-
plied to the model problem to compare results with the basic Jacobi
and SOR methods.
P–3 Explain why only the “red” values for even iterates and “black” val-
ues for odd iterates need be computed when Chebyshev acceleration is
applied to Jacobi’s method for the model problem.
9 The Conjugate Gradient Method
This is an iterative method for solving Ax = b when A is an n×n, symmetric,
positive-definite matrix, although the method is guaranteed to converge in
at most n iterations (with exact arithmetic) and so may also be considered
a direct method. When used with a “preconditioner” (see later), it is today
one of the most popular iterative methods.
The basic idea is to choose a set
{p1,p2, . . . ,pn}
of linearly independent vectors to form a basis for Rn and then to express
the unknown solution x⋆ of Ax = b in terms of this basis, i.e.
x⋆ = ∑_{j=1}^{n} α⋆j pj.
In order to compute the coefficients α⋆j easily, the vectors {p1, . . . ,pn} are
chosen to be A-conjugate, i.e.
pTi Apl = 0 ,   i ≠ l.
It is then easy to derive the formula

α⋆j = (pTj b)/(pTj Apj) ,   j = 1, . . . , n

from

pTj b = pTj Ax⋆ = pTj A ( ∑_{i=1}^{n} α⋆i pi ) = ∑_{i=1}^{n} α⋆i pTj Api = α⋆j pTj Apj.
The iterates x(k) of the conjugate gradient method are defined by
x(0) = 0 ,   x(k) = ∑_{j=1}^{k} α⋆j pj   for k ≥ 1

and so it is clear that convergence must occur in at most n iterations. It is
also useful to note the connection between x(k) and x(k−1), i.e.

x(k) = x(k−1) + α⋆k pk = x(k−1) + ((pTk b)/(pTk Apk)) pk.
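The expansion idea can be tried out directly. The fragment below (illustrative only) A-orthogonalises a set of random vectors to obtain an A-conjugate basis — this is not yet the conjugate gradient choice of the pj, which is described below — and checks that the coefficients α⋆j = pTj b / pTj Apj reassemble the solution of Ax = b.

import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                 # symmetric positive-definite
b = rng.standard_normal(n)

# Build an A-conjugate basis by A-orthogonalising random vectors.
P = []
for _ in range(n):
    v = rng.standard_normal(n)
    for p in P:
        v = v - (p @ A @ v) / (p @ A @ p) * p
    P.append(v)

# alpha_j = p_j^T b / p_j^T A p_j, and the sum of alpha_j p_j is x_star.
x = sum((p @ b) / (p @ A @ p) * p for p in P)
print(np.allclose(x, np.linalg.solve(A, b)))    # True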
The fundamental questions which have yet to be answered are:-
a) how to choose {p1, . . . ,pn} so that they are A-conjugate;
b) how to choose {p1, . . . ,pn} so that the iterates x(k) will get close to x⋆
quickly, i.e. n iterations is a lot for large matrices and we would like to
terminate earlier.
As the first step towards answering these questions, we note that the
solution x⋆ of Ax = b is also given by minimising the real-valued function
φ(x) ≡ (1/2) xTAx − bTx.

This is most easily seen by re-writing φ in the form

φ(x) ≡ (1/2) (b − Ax)T A−1 (b − Ax) − (1/2) bTA−1b,

when it is clear that the unique minimum is given by the solution of Ax = b
because

z ≠ 0 ⇒ zTA−1z > 0.
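As a quick numerical sanity check (illustrative only, not part of the notes), both the rewritten form of φ and the fact that its minimiser is the solution of Ax = b can be verified for a small random symmetric positive-definite matrix:

import numpy as np

rng = np.random.default_rng(2)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                 # symmetric positive-definite
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

phi = lambda x: 0.5 * x @ A @ x - b @ x

# The rewritten form of phi agrees with the original ...
x = rng.standard_normal(n)
r = b - A @ x
print(np.isclose(phi(x), 0.5 * r @ np.linalg.solve(A, r) - 0.5 * b @ x_star))

# ... and phi increases when we move away from x_star in any direction.
print(all(phi(x_star + 0.1 * rng.standard_normal(n)) > phi(x_star) for _ in range(5)))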
This idea links up with the conjugate gradient iterates because
x(k) = ∑_{j=1}^{k} α⋆j pj
is the minimum of φ(x) over all vectors of the form
x = ∑_{j=1}^{k} αj pj ,   α1, . . . , αk arbitrary.
This follows from
φ ( ∑_{j=1}^{k} αj pj ) = ∑_{j=1}^{k} { (αj²/2) pTj Apj − αj bTpj }
and the condition that
∂φ/∂αj = αj pTj Apj − bTpj = 0 ,   j = 1, . . . , k
at a minimum. Hence
αj = (bTpj)/(pTj Apj) = α⋆j ,   j = 1, . . . , k
at the minimum, which means that φ is minimised by x(k). Furthermore, if
we define the residual
rk ≡ b− Ax(k)
then
pTj rk = pTj b − pTj A ( ∑_{i=1}^{k} α⋆i pi ) = pTj b − α⋆j pTj Apj = 0 ,   j = 1, . . . , k,
a result which will often be used later. (It is interesting to note that the min-
imisation ideas in this paragraph are almost identical to the formulation of
a differential equation as a minimisation problem in finite element analysis.)
Now that we have linked the problem Ax = b with the minimisation of
(1/2) xTAx − bTx, it is possible to develop a strategy for choosing the vectors
{p1, . . . ,pn} one-by-one. Having obtained x(k−1) we would like to choose pk
so that φ(x) is decreased as much as possible when moving from x(k−1) to
x(k). The most “down-hill” direction for φ(x) at x(k−1) is given by
−∇φ(x(k−1)) = b− Ax(k−1)
= rk−1
and so we would like to choose pk ≡ rk−1. There is, however, no reason for
rk−1 to satisfy the key A-conjugacy property. Thus, instead, we choose
pk ≡ rk−1 − ∑_{j=1}^{k−1} βj pj   with   βj ≡ (pTj Ark−1)/(pTj Apj),

which is the closest vector to rk−1 (in the A-norm ‖v‖A ≡ (vTAv)^(1/2)) which does
satisfy the A-conjugacy property. Hence, beginning with
x(0) = 0
p1 = r0 ≡ b,
we may compute the conjugate gradient iterates. Note that the iteration will
only break down if pk = 0, but this means that

rk−1 = ∑_{j=1}^{k−1} βj pj.
We have already seen, however, that

pTj rk−1 = 0 ,   j = 1, . . . , k − 1,

from which it follows that

‖rk−1‖2² = rTk−1 rk−1 = ( ∑_{j=1}^{k−1} βj pj )T rk−1 = ∑_{j=1}^{k−1} βj pTj rk−1 = 0.
Hence rk−1 = 0, i.e. Ax(k−1) = b and the solution x⋆ has already been
reached. Finally, note that our choice of pk means that the residuals are
orthogonal, i.e.
rTj rk = 0 j < k,
because
rj = pj+1 + ∑_{i=1}^{j} βi pi
and
pTj rk = 0 j ≤ k.
The present state of our conjugate gradient algorithm is given in the
following pseudo-code.
x(0) ← 0;
for k = 1, . . . , n do
rk−1 ← b− Ax(k−1);
if rk−1 = 0 then output x(k−1) and quit;
if k = 1 then p1 ← r0;
else pk ← rk−1 − ∑_{j=1}^{k−1} ((pTj Ark−1)/(pTj Apj)) pj;
α⋆k ← (bTpk)/(pTk Apk);
x(k) ← x(k−1) + α⋆kpk;
end k;
output x(n).
At the moment, however, this algorithm is still very inefficient. Although
only matrix-vector multiplications with A are required, there are a number
of these at each iteration. Also p1,p2, . . . need to be kept throughout the
iteration, which is totally impractical with regard to storage. We shall now
show how only the current p needs to be known and how only one matrix-
vector product is required at each iteration.
The first great gain in efficiency is obtained by noticing that
pTj Ark−1 = 0 j < k − 1.
Hence the previous p vectors do not need to be kept, and the most time-
consuming expression in our algorithm is reduced to
pk ← rk−1 − ((pTk−1 Ark−1)/(pTk−1 Apk−1)) pk−1.
The result pTj Ark−1 = 0, j < k − 1, comes from

pTj Ark−1 = (α⋆j Apj)T rk−1 / α⋆j
          = (Ax(j) − Ax(j−1))T rk−1 / α⋆j
          = (Ax(j) − b + b − Ax(j−1))T rk−1 / α⋆j
          = (rj−1 − rj)T rk−1 / α⋆j
          = 0,                                   (†)
because the residuals are orthogonal. Other gains in efficiency are achieved
as follows.
a) Since

   • x(k−1)T Apk = ( ∑_{j=1}^{k−1} α⋆j pj )T Apk = 0 by A-conjugacy,

   • rTk−1 pk = rTk−1 ( rk−1 − ∑_{j=1}^{k−1} βj pj ) = rTk−1 rk−1,

   we have

   α⋆k = (bTpk)/(pTk Apk) = ((rk−1 + Ax(k−1))T pk)/(pTk Apk) = (rTk−1 pk)/(pTk Apk) = (rTk−1 rk−1)/(pTk Apk).
b) Since pk−1 = rk−2 − ∑_{j=1}^{k−2} βj pj, we have

   (pTk−1 Ark−1)/(pTk−1 Apk−1) = (pTk−1 Ark−1)/(pTk−1 Ark−2)
                               = ((rk−2 − rk−1)T rk−1)/((rk−2 − rk−1)T rk−2)   [as in (†) above]
                               = − (rTk−1 rk−1)/(rTk−2 rk−2),

   using the orthogonality of the residuals.
c) rk = rk−1 − α⋆kApk from x(k) = x(k−1) + α⋆kpk.
Now our algorithm can be written in the following much more efficient
form.
x(0) ← 0;
r0 ← b;
for k = 1, . . . , n do
if rk−1 = 0 then output x(k−1) and quit;
if k = 1 then p1 ← r0;
else pk ← rk−1 + ((rTk−1 rk−1)/(rTk−2 rk−2)) pk−1;
α⋆k ← (rTk−1 rk−1)/(pTk Apk);
x(k) ← x(k−1) + α⋆kpk;
rk ← rk−1 − α⋆kApk;
end k;
output x(n).
The only matrix-vector product formed at the kth iteration is Apk, and only
the vectors x(k), rk, pk, Apk and the scalars rTk rk, rTk−1 rk−1 need to be carried.
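This efficient form translates almost line-for-line into Python/NumPy. The sketch below is only an illustration: the stopping tolerance and the small test problem at the end are my own additions, not part of the notes.

import numpy as np

def conjugate_gradient(A, b, tol=1e-12):
    """Conjugate gradient method for symmetric positive-definite A,
    following the pseudo-code above with x^(0) = 0."""
    n = len(b)
    x = np.zeros(n)
    r = b.copy()                       # r_0 = b
    p = r.copy()                       # p_1 = r_0
    rs_old = r @ r
    for _ in range(n):
        if np.sqrt(rs_old) <= tol:     # r_{k-1} = 0 (up to round-off)
            break
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # alpha_k = r^T r / p^T A p
        x = x + alpha * p
        r = r - alpha * Ap             # r_k = r_{k-1} - alpha_k A p_k
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p  # next search direction
        rs_old = rs_new
    return x

# Example usage on a small random symmetric positive-definite system.
rng = np.random.default_rng(3)
B = rng.standard_normal((20, 20))
A = B @ B.T + 20 * np.eye(20)
b = rng.standard_normal(20)
print(np.allclose(conjugate_gradient(A, b), np.linalg.solve(A, b)))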
We shall briefly explain some of the surprising convergence properties of
the conjugate gradient method. These are based on the expression
φ(x) = (1/2) (x⋆ − x)T A (x⋆ − x) − (1/2) bTA−1b,
which shows that x(k) not only minimises φ(x) over all vectors of the form
x ≡ ∑_{j=1}^{k} αj pj   with α1, . . . , αk arbitrary,

but also minimises (1/2) (x⋆ − x)T A (x⋆ − x) over these vectors. If we denote
this subspace by Pk then

(x⋆ − x(k))T A (x⋆ − x(k)) = min_{x∈Pk} { (x⋆ − x)T A (x⋆ − x) }
                           = min_{x∈Pk} { (b − Ax)T A−1 (b − Ax) },
with the left-hand side giving the error in the kth iterate and the right-hand
side showing how small this can be. To make further progress, we observe
that Pk is also the set of all vectors of the form
∑_{j=1}^{k} γj A^{j−1} b ,   γ1, . . . , γk arbitrary,
because

p1 = b
p2 = r1 + [ p1 term ] = [ Ab term ] + [ b term ]
p3 = r2 + [ p2 terms ] = [ A^2 b term ] + [ b, Ab terms ]
...
(I.e. we are saying that {p1, . . . ,pk} and {b,Ab, . . . ,Ak−1b} span the same
subspace Pk.) Consequently

(x⋆ − x(k))T A (x⋆ − x(k)) = min_{x∈Pk} { (b − Ax)T A−1 (b − Ax) }
                           = min_{γ1,...,γk} { (b − ∑_{j=1}^{k} γj A^j b)T A−1 (b − ∑_{j=1}^{k} γj A^j b) }
                           = min { [pk(A)b]T A−1 [pk(A)b] },
where the minimum in the last expression is taken over all polynomials pk(t)
of degree k or less which satisfy pk(0) = 1, i.e. pk(t) ≡ 1 + ∑_{j=1}^{k} γj t^j.
If {λi}_{i=1}^{n} are the eigenvalues of A, {ui}_{i=1}^{n} the corresponding orthonormal
eigenvectors, and the expansion for b in terms of the eigenvectors is

b = ∑_{i=1}^{n} ηi ui;

then

pk(A)b = ∑_{i=1}^{n} pk(λi) ηi ui,
A−1[pk(A)b] = ∑_{i=1}^{n} (pk(λi)/λi) ηi ui,
[pk(A)b]T A−1 [pk(A)b] = ∑_{i=1}^{n} (pk(λi)²/λi) ηi².
Hence it follows that

(x⋆ − x(k))T A (x⋆ − x(k)) = min { ∑_{i=1}^{n} (pk(λi)²/λi) ηi² },
with the minimum taken over all polynomials pk(t) of degree k or less satis-
fying pk(0) = 1. This formula immediately enables us to make a number of
comments about the convergence of the conjugate gradient method.
a) If A has only s distinct eigenvalues λ1, . . . , λs then we can find an s-th
degree polynomial which vanishes at each eigenvalue and equals 1 at
zero, i.e.

ps(t) ≡ ((λ1 − t)(λ2 − t) . . . (λs − t)) / (λ1 λ2 . . . λs).

So (x⋆ − x(s))T A (x⋆ − x(s)) = 0 and the solution x⋆ is found in at
most s iterations.
b) If b has only s non-zero eigenvector components then, by the same
reasoning, x⋆ is found in at most s iterations.
c) If the eigenvalues of A cluster in s groups, we can find an s-th degree
polynomial which is small on each of these groups and equals 1 at zero:
the graph shows an example with s = 3 [figure: "Fitting three clusters of
eigenvalues", p3(t) plotted against t]. So (x⋆ − x(s))T A (x⋆ − x(s))
must be small and x(s) will be close to the solution x⋆.
d) If the eigenvalues of A are 0 < λ1 ≤ · · · ≤ λn, we know that the
polynomial pk(t) of degree k or less which has the smallest maximum
modulus on the interval [λ1, λn], subject to the condition pk(0) = 1, is
the scaled Chebyshev polynomial
p⋆k(t) ≡ Tk((2t − λn − λ1)/(λn − λ1)) / Tk((−λn − λ1)/(λn − λ1)).
Additionally, the maximum modulus of this polynomial on [λ1, λn] is
1/Tk((λn + λ1)/(λn − λ1)).

So

(x⋆ − x(k))T A (x⋆ − x(k)) ≤ [1/Tk((λn + λ1)/(λn − λ1))²] ∑_{i=1}^{n} ηi²/λi
                            = [1/Tk((λn + λ1)/(λn − λ1))²] bTA−1b
                            = [1/Tk((λn + λ1)/(λn − λ1))²] x⋆TAx⋆.
Using the result that
λ1 xTx ≤ xTAx ≤ λn xTx   ∀x ∈ Rn,

we can therefore write down the relative error result

‖x⋆ − x(k)‖2 / ‖x⋆‖2 ≤ (√λn/√λ1) [1/Tk((λn + λ1)/(λn − λ1))].
It can be proved mathematically that

Tk(t) > (t + √(t² − 1))^k / 2   if t > 1

and, since

(λn + λ1)/(λn − λ1) + √(((λn + λ1)/(λn − λ1))² − 1) = (√λn + √λ1)/(√λn − √λ1),
our relative error result can be written

‖x⋆ − x(k)‖2 / ‖x⋆‖2 ≤ 2 (√λn/√λ1) ((√λn − √λ1)/(√λn + √λ1))^k
                      = 2 √(cond2(A)) ((√(cond2(A)) − 1)/(√(cond2(A)) + 1))^k,

where cond2(A) ≡ λn/λ1.
This last comment is generally a pessimistic result, although it does show
that the conjugate gradient method is expected to converge faster for ma-
trices A with smaller condition numbers, i.e. a smaller relative spread of
eigenvalues. It is important to realise that, unlike the Chebyshev accelera-
tion method, it is not necessary to know anything about the eigenvalues in
order to apply the conjugate gradient method, i.e. there are no parameters
to estimate.
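To get a feel for how pessimistic the bound is, one can record the relative errors of the conjugate gradient iterates for a small problem and compare them with 2√(cond2(A)) ((√(cond2(A)) − 1)/(√(cond2(A)) + 1))^k. The self-contained script below is purely illustrative (random test matrix, ten iterations).

import numpy as np

rng = np.random.default_rng(4)
n = 30
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                    # symmetric positive-definite
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

lam = np.linalg.eigvalsh(A)
kappa = lam[-1] / lam[0]                       # cond_2(A)
bound = lambda k: 2 * np.sqrt(kappa) * ((np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)) ** k

# plain conjugate gradients, printing the relative error against the bound
x = np.zeros(n); r = b.copy(); p = r.copy(); rs = r @ r
for k in range(1, 11):
    Ap = A @ p
    alpha = rs / (p @ Ap)
    x = x + alpha * p
    r = r - alpha * Ap
    rs, rs_old = r @ r, rs
    p = r + (rs / rs_old) * p
    err = np.linalg.norm(x_star - x) / np.linalg.norm(x_star)
    print(k, err, bound(k))                    # err stays below bound(k)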
Finally, it is not necessary for the starting value x(0) in the conjugate
gradient method to be zero. If a good approximation to x⋆ is known, then
this can be used in the algorithm on page 54. Only the first two statements
need to be changed, i.e.
r0 ← b− Ax(0),
and all the results go across with minor adjustments.
9.1 Preconditioning
Unfortunately, for large n, inexact arithmetic leads to a loss of A-conjugacy
for the vectors {pj} and exact convergence in at most n iterations can no
longer be expected. In any case, for large n we would like x(k) to be close
to the solution x⋆ for k much smaller than n. This leads us to consider the
conjugate gradient method as a genuine iterative method, and even with in-
exact arithmetic it remains true that the convergence rate is governed by the
condition of A, or more precisely the grouping of the eigenvalues as discussed
earlier. Unfortunately, the coefficient matrix obtained after discretising a
standard differential equation is usually unpleasant in this respect.
Example: Our model problem has coefficient matrix A with
eigenvalues
4 { sin²(kπh/2) + sin²(lπh/2) } ,   k = 1, . . . , m − 1 ,   l = 1, . . . , m − 1,
as described on page 13. As h decreases, the eigenvalues spread
out further and
cond2(A) = cot²(πh/2) = (4/(π²h²)) (1 + O(h²))
increases. The convergence rate on page 57 depends on

(√(cond2(A)) − 1)/(√(cond2(A)) + 1) = (cot(πh/2) − 1)/(cot(πh/2) + 1) = 1 − πh + O(h²),
and thus deteriorates as h decreases. Looking at the results on
page 24, we see that the convergence is twice as slow as SOR.
Thus the conjugate gradient method does not converge very fast for large
practical problems. It is, however, possible to improve matters.
In many cases it is better to solve

(M−1A) x = (M−1b)
rather than Ax = b, with the idea that M is a symmetric positive-definite
matrix chosen so that the conjugate gradient method works well on M−1A. M
is called a preconditioning matrix and should approximate A in the sense that
the eigenvalues of M−1A are clustered together in a small number of groups.
(In the extreme case M ≡ A is possible, but this would be impractical!) We
shall assume that the Cholesky factorisation of M is known, M = UTU, with
U containing relatively few non-zero elements and thus the direct solution of
systems
Mz = d
is easy and efficient. For example M ≡ D, the diagonal part of A, is pos-
sible but superior choices will be given later. There remains, however, one
difficulty in applying the conjugate gradient method to the system

(M−1A) x = (M−1b),
and this is that M−1A will not usually be a symmetric matrix. Thus instead
we apply the conjugate gradient method to the system
Ây = c,                                    (‡)

where

Â ≡ U−TAU−1 ,   c ≡ U−Tb;
noting the two key points
a) U−TAU−1 is symmetric,
b) U−TAU−1 has the same eigenvalues as M−1A because

   U−1 (U−TAU−1) U = (UTU)−1 A = M−1A.
If the solution of (‡) is y⋆ then the solution of Ax = b is x⋆ = U−1y⋆.
Now we apply the standard conjugate gradient method on page 54 to (‡).
Using the notation p̂k and r̂k ≡ c − Ây(k), this becomes the algorithm below.
y(0) ← 0;
r̂0 ← c;
for k = 1, . . . , n do
if r̂k−1 = 0 then output U−1y(k−1) and quit;
if k = 1 then p̂1 ← r̂0;
else p̂k ← r̂k−1 + ((r̂Tk−1 r̂k−1)/(r̂Tk−2 r̂k−2)) p̂k−1;
α⋆k ← (r̂Tk−1 r̂k−1)/(p̂Tk Âp̂k);
y(k) ← y(k−1) + α⋆k p̂k;
r̂k ← r̂k−1 − α⋆k Âp̂k;
end k;
output U−1y(n).
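A direct transcription of this transformed algorithm is sketched below; it is only an illustration, assuming an upper-triangular Cholesky factor U of the preconditioner M = UTU is available, and the function names are my own. Solves with U and UT replace the explicit inverses; for instance, the simple choice M ≡ D corresponds to U = np.diag(np.sqrt(np.diag(A))).

import numpy as np
from scipy.linalg import solve_triangular

def preconditioned_cg_transformed(A, b, U, its):
    """Conjugate gradients applied to (U^{-T} A U^{-1}) y = U^{-T} b,
    as in the algorithm above; returns x = U^{-1} y.
    U is the upper-triangular Cholesky factor of the preconditioner M = U^T U."""
    def A_hat(v):                                   # v -> U^{-T} A U^{-1} v
        w = solve_triangular(U, v)                  # U^{-1} v
        return solve_triangular(U, A @ w, trans='T')

    c = solve_triangular(U, b, trans='T')           # c = U^{-T} b
    y = np.zeros(len(b))
    r = c.copy()
    p = r.copy()
    rs_old = r @ r
    for _ in range(its):
        if rs_old == 0.0:                           # hat r_{k-1} = 0
            break
        Ap = A_hat(p)
        alpha = rs_old / (p @ Ap)
        y = y + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return solve_triangular(U, y)                   # x = U^{-1} y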
Now, although we needed to introduce Â, y, c, r̂ and p̂ in order to obtain
the conjugate gradient method applied
