Mathematical Analysis, Volume II
Teo Lee Peng
January 1, 2024

Contents

Preface
Chapter 1  Euclidean Spaces
  1.1 The Euclidean Space Rn as a Vector Space
  1.2 Convergence of Sequences in Rn
  1.3 Open Sets and Closed Sets
  1.4 Interior, Exterior, Boundary and Closure
  1.5 Limit Points and Isolated Points
Chapter 2  Limits of Multivariable Functions and Continuity
  2.1 Multivariable Functions
    2.1.1 Polynomials and Rational Functions
    2.1.2 Component Functions of a Mapping
    2.1.3 Invertible Mappings
    2.1.4 Linear Transformations
    2.1.5 Quadratic Forms
  2.2 Limits of Functions
  2.3 Continuity
  2.4 Uniform Continuity
  2.5 Contraction Mapping Theorem
Chapter 3  Continuous Functions on Connected Sets and Compact Sets
  3.1 Path-Connectedness and Intermediate Value Theorem
  3.2 Connectedness and Intermediate Value Property
  3.3 Sequential Compactness and Compactness
  3.4 Applications of Compactness
    3.4.1 The Extreme Value Theorem
    3.4.2 Distance Between Sets
    3.4.3 Uniform Continuity
    3.4.4 Linear Transformations and Quadratic Forms
    3.4.5 Lebesgue Number Lemma
Chapter 4  Differentiating Functions of Several Variables
  4.1 Partial Derivatives
  4.2 Differentiability and First Order Approximation
    4.2.1 Differentiability
    4.2.2 First Order Approximations
    4.2.3 Tangent Planes
    4.2.4 Directional Derivatives
  4.3 The Chain Rule and the Mean Value Theorem
  4.4 Second Order Approximations
  4.5 Local Extrema
Chapter 5  The Inverse and Implicit Function Theorems
  5.1 The Inverse Function Theorem
  5.2 The Proof of the Inverse Function Theorem
  5.3 The Implicit Function Theorem
  5.4 Extrema Problems and the Method of Lagrange Multipliers
Chapter 6  Multiple Integrals
  6.1 Riemann Integrals
  6.2 Properties of Riemann Integrals
  6.3 Jordan Measurable Sets and Riemann Integrable Functions
  6.4 Iterated Integrals and Fubini's Theorem
  6.5 Change of Variables Theorem
    6.5.1 Translations and Linear Transformations
    6.5.2 Polar Coordinates
    6.5.3 Spherical Coordinates
    6.5.4 Other Examples
  6.6 Proof of the Change of Variables Theorem
  6.7 Some Important Integrals and Their Applications
Chapter 7  Fourier Series and Fourier Transforms
  7.1 Orthogonal Systems of Functions and Fourier Series
  7.2 The Pointwise Convergence of a Fourier Series
  7.3 The L2 Convergence of a Fourier Series
  7.4 The Uniform Convergence of a Trigonometric Series
  7.5 Fourier Transforms
Appendix A  Sylvester's Criterion
Appendix B  Volumes of Parallelepipeds
Appendix C  Riemann Integrability
References

Preface

Mathematical analysis is a standard course which introduces students to rigorous reasoning in mathematics, as well as the theories needed for advanced analysis courses. It is a compulsory course for all mathematics majors. It is also strongly recommended for students majoring in computer science, physics, data science, financial analysis, and other areas that require strong analytical skills. Standard textbooks in mathematical analysis include the classical ones by Apostol [Apo74] and Rudin [Rud76], and the modern ones by Bartle [BS92], Fitzpatrick [Fit09], Abbott [Abb15], Tao [Tao16, Tao14] and Zorich [Zor15, Zor16].

This book is the second volume of a set of textbooks intended for a one-year course in mathematical analysis. We introduce the fundamental concepts in a pedagogical way, with plenty of examples to illustrate the theory. We assume that students are familiar with the material of calculus, such as that in the book [SCW20], so we do not emphasize computational techniques. The emphasis is on building up analytical skills through rigorous reasoning. Besides calculus, it is also assumed that students have taken introductory courses in discrete mathematics and linear algebra, covering topics such as logic, sets, functions, vector spaces, inner products, and quadratic forms. Whenever needed, these concepts are briefly reviewed.

In this book, we have defined all the mathematical terms we use carefully.
While most of the terms have standard definitions, some may be defined differently from author to author. Readers are advised to check the definitions of the terms used in this book when they encounter them. This can easily be done using the search function provided by any PDF viewer. Readers are also encouraged to fully utilize the hyper-referencing provided.

Teo Lee Peng

Chapter 1  Euclidean Spaces

In this second volume of mathematical analysis, we study functions defined on subsets of Rn. For this, we first need to study the structure and topology of Rn. We start with a review of Rn as a vector space. In the sequel, n is a fixed positive integer reserved for Rn.

1.1 The Euclidean Space Rn as a Vector Space

If S_1, S_2, ..., S_n are sets, the cartesian product of these n sets is defined as the set

S = S_1 × · · · × S_n = {(a_1, ..., a_n) | a_i ∈ S_i, 1 ≤ i ≤ n}

that contains all n-tuples (a_1, ..., a_n), where a_i ∈ S_i for all 1 ≤ i ≤ n. The set Rn is the cartesian product of n copies of R. Namely,

Rn = {(x_1, x_2, ..., x_n) | x_1, x_2, ..., x_n ∈ R}.

The point (x_1, x_2, ..., x_n) is denoted by x, and x_1, x_2, ..., x_n are called the components of the point x.

We can define an addition and a scalar multiplication on Rn. If x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) are in Rn, the addition of x and y is defined as

x + y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n).

In other words, it is a componentwise addition. Given a real number α, the scalar multiplication of α with x is given by the componentwise multiplication

αx = (αx_1, αx_2, ..., αx_n).

The set Rn with the addition and scalar multiplication operations is a vector space. It satisfies the 10 axioms for a real vector space V.

The 10 Axioms for a Real Vector Space V

Let V be a set that is equipped with two operations: the addition and the scalar multiplication.
For any two vectors u and v in V, their addition is denoted by u + v. For a vector u in V and a scalar α ∈ R, the scalar multiplication of u by α is denoted by αu. We say that V with the addition and scalar multiplication is a real vector space provided that the following 10 axioms are satisfied for any u, v and w in V, and any α and β in R.

Axiom 1  If u and v are in V, then u + v is in V.
Axiom 2  u + v = v + u.
Axiom 3  (u + v) + w = u + (v + w).
Axiom 4  There is a zero vector 0 in V such that 0 + v = v = v + 0 for all v ∈ V.
Axiom 5  For any v in V, there is a vector w in V such that v + w = 0 = w + v. The vector w satisfying this equation is called the negative of v, and is denoted by −v.
Axiom 6  For any v in V and any α ∈ R, αv is in V.
Axiom 7  α(u + v) = αu + αv.
Axiom 8  (α + β)v = αv + βv.
Axiom 9  α(βv) = (αβ)v.
Axiom 10  1v = v.

Rn is a real vector space. The zero vector is the point 0 = (0, 0, ..., 0) with all components equal to 0. Sometimes we also call a point x = (x_1, ..., x_n) in Rn a vector, and identify it with the vector from the origin 0 to the point x.

Definition 1.1  Standard Unit Vectors
In Rn, there are n standard unit vectors e_1, ..., e_n given by
e_1 = (1, 0, ..., 0), e_2 = (0, 1, ..., 0), ..., e_n = (0, ..., 0, 1).

Let us review some concepts from linear algebra which will be useful later. Given that v_1, ..., v_k are vectors in a vector space V, a linear combination of v_1, ..., v_k is a vector v in V of the form

v = c_1 v_1 + · · · + c_k v_k

for some scalars c_1, ..., c_k, which are known as the coefficients of the linear combination.

A subspace of a vector space V is a subset of V that is itself a vector space. There is a simple way to construct subspaces.

Proposition 1.1
Let V be a vector space, and let v_1, ..., v_k be vectors in V. The subset
W = {c_1 v_1 + · · · + c_k v_k | c_1, ..., c_k ∈ R}
of V that contains all linear combinations of v_1, ..., v_k is itself a vector space.
It is called the subspace of V spanned by v_1, ..., v_k.

Example 1.1
In R3, the subspace spanned by the vectors e_1 = (1, 0, 0) and e_3 = (0, 0, 1) is the set W that contains all points of the form
x(1, 0, 0) + z(0, 0, 1) = (x, 0, z),
which is the xz-plane.

Next, we recall the concept of linear independence.

Definition 1.2  Linear Independence
Let V be a vector space, and let v_1, ..., v_k be vectors in V. We say that the set {v_1, ..., v_k} is a linearly independent set of vectors, or the vectors v_1, ..., v_k are linearly independent, if the only k-tuple of real numbers (c_1, ..., c_k) which satisfies
c_1 v_1 + · · · + c_k v_k = 0
is the trivial k-tuple (c_1, ..., c_k) = (0, ..., 0).

Example 1.2
In Rn, the standard unit vectors e_1, ..., e_n are linearly independent.

Example 1.3
If V is a vector space, a vector v in V is linearly independent if and only if v ≠ 0.

Example 1.4
Let V be a vector space. Two vectors u and v in V are linearly independent if and only if u ≠ 0, v ≠ 0, and there does not exist a constant α such that v = αu.

Let us recall the following definition for two vectors to be parallel.

Definition 1.3  Parallel Vectors
Let V be a vector space. Two vectors u and v in V are parallel if either u = 0 or there exists a constant α such that v = αu.

In other words, two vectors u and v in V are linearly independent if and only if they are not parallel.

Example 1.5
If S = {v_1, ..., v_k} is a linearly independent set of vectors, then for any S′ ⊂ S, S′ is also a linearly independent set of vectors.

Now we discuss the concept of dimension and basis.

Definition 1.4  Dimension and Basis
Let V be a vector space, and let W be a subspace of V. If W can be spanned by k linearly independent vectors v_1, ..., v_k in V, we say that W has dimension k. The set {v_1, ..., v_k} is called a basis of W.

Example 1.6
In Rn, the n standard unit vectors e_1, ..., e_n are linearly independent and they span Rn.
Hence, the dimension of Rn is n.

Example 1.7
In R3, the subspace spanned by the two linearly independent vectors e_1 = (1, 0, 0) and e_3 = (0, 0, 1) has dimension 2.

Next, we introduce the translate of a set.

Definition 1.5  Translate of a Set
If A is a subset of Rn and u is a point in Rn, the translate of the set A by the vector u is the set
A + u = {a + u | a ∈ A}.

Example 1.8
In R3, the translate of the set A = {(x, y, 0) | x, y ∈ R} by the vector u = (0, 0, −2) is the set B = A + u = {(x, y, −2) | x, y ∈ R}.

In Rn, the lines and the planes are of particular interest. They are closely related to the concept of subspaces.

Definition 1.6  Lines in Rn
A line L in Rn is a translate of a subspace of Rn that has dimension 1. As a set, it contains all the points x of the form
x = x_0 + tv,  t ∈ R,
where x_0 is a fixed point in Rn, and v is a nonzero vector in Rn. The equation x = x_0 + tv, t ∈ R, is known as the parametric equation of the line.

A line is determined by two points.

Example 1.9
Given two distinct points x_1 and x_2 in Rn, the line L that passes through these two points has parametric equation
x = x_1 + t(x_2 − x_1),  t ∈ R.
When 0 ≤ t ≤ 1, x = x_1 + t(x_2 − x_1) describes all the points on the line segment with x_1 and x_2 as endpoints.

Figure 1.1: A line between two points.

Definition 1.7  Planes in Rn
A plane W in Rn is a translate of a subspace of dimension 2. As a set, it contains all the points x of the form
x = x_0 + t_1 v_1 + t_2 v_2,  t_1, t_2 ∈ R,
where x_0 is a fixed point in Rn, and v_1 and v_2 are two linearly independent vectors in Rn.

Besides being a real vector space, Rn has an additional structure, whose definition is motivated as follows. Let P(x_1, x_2, x_3) and Q(y_1, y_2, y_3) be two points in R3. By the Pythagorean theorem, the distance between P and Q is given by
PQ = √((x_1 − y_1)^2 + (x_2 − y_2)^2 + (x_3 − y_3)^2).

Figure 1.2: Distance between two points in R2.
Consider the triangle OPQ with vertices O, P, Q, where O is the origin. Then
OP = √(x_1^2 + x_2^2 + x_3^2),  OQ = √(y_1^2 + y_2^2 + y_3^2).
Let θ be the minor angle between OP and OQ. By the cosine rule,
PQ^2 = OP^2 + OQ^2 − 2 × OP × OQ × cos θ.
A straightforward computation gives
OP^2 + OQ^2 − PQ^2 = 2(x_1 y_1 + x_2 y_2 + x_3 y_3).

Figure 1.3: Cosine rule.

Hence,
cos θ = (x_1 y_1 + x_2 y_2 + x_3 y_3) / ( √(x_1^2 + x_2^2 + x_3^2) √(y_1^2 + y_2^2 + y_3^2) ).   (1.1)
It is the quotient of x_1 y_1 + x_2 y_2 + x_3 y_3 by the product of the lengths of OP and OQ.

Generalizing the expression x_1 y_1 + x_2 y_2 + x_3 y_3 from R3 to Rn defines the dot product. For any two vectors x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) in Rn, the dot product of x and y is defined as
x · y = Σ_{i=1}^n x_i y_i = x_1 y_1 + x_2 y_2 + · · · + x_n y_n.
This is a special case of an inner product.

Definition 1.8  Inner Product Space
A real vector space V is an inner product space if for any two vectors u and v in V, an inner product ⟨u, v⟩ of u and v is defined, and the following conditions are satisfied for any u, v, w in V and α, β ∈ R.
1. ⟨u, v⟩ = ⟨v, u⟩.
2. ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩.
3. ⟨v, v⟩ ≥ 0, and ⟨v, v⟩ = 0 if and only if v = 0.

Proposition 1.2  Euclidean Inner Product on Rn
On Rn,
⟨x, y⟩ = x · y = Σ_{i=1}^n x_i y_i = x_1 y_1 + x_2 y_2 + · · · + x_n y_n
defines an inner product, called the standard inner product or the Euclidean inner product.

Definition 1.9  Euclidean Space
The vector space Rn with the Euclidean inner product is called the Euclidean n-space. In the future, when we do not specify, Rn always means the Euclidean n-space.

One can deduce some useful identities from the three axioms of an inner product space.

Proposition 1.3
If V is an inner product space, then the following hold.
(a) For any v ∈ V, ⟨0, v⟩ = 0 = ⟨v, 0⟩.
(b) For any vectors v_1, ..., v_k, w_1, ..., w_l in V, and for any real numbers α_1, ..., α_k, β_1, ..., β_l,
⟨ Σ_{i=1}^k α_i v_i, Σ_{j=1}^l β_j w_j ⟩ = Σ_{i=1}^k Σ_{j=1}^l α_i β_j ⟨v_i, w_j⟩.
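The dot product and the angle formula (1.1) are easy to experiment with numerically. The following Python sketch is only an illustration; the helper names `dot`, `norm` and `angle` are ours, not notation from the text.

```python
import math

def dot(x, y):
    # Euclidean inner product: the sum of componentwise products
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    # ||x|| = sqrt(<x, x>)
    return math.sqrt(dot(x, x))

def angle(x, y):
    # Formula (1.1): cos(theta) = <x, y> / (||x|| ||y||), for nonzero x and y
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

x = (1.0, 0.0, 0.0)
y = (1.0, 1.0, 0.0)
print(dot(x, y))    # 1.0
print(angle(x, y))  # pi/4, about 0.7854
```

Here ⟨x, y⟩ = 1, ∥x∥ = 1 and ∥y∥ = √2, so cos θ = 1/√2 and θ = π/4, matching the geometric picture of the two vectors in the xy-plane.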
Given that V is an inner product space, ⟨v, v⟩ ≥ 0 for any v in V. For example, for any x = (x_1, x_2, ..., x_n) in Rn, under the Euclidean inner product,
⟨x, x⟩ = Σ_{i=1}^n x_i^2 = x_1^2 + x_2^2 + · · · + x_n^2 ≥ 0.
When n = 3, the length of the vector OP from the point O(0, 0, 0) to the point P(x_1, x_2, x_3) is
OP = √(x_1^2 + x_2^2 + x_3^2) = √⟨x, x⟩,
where x = (x_1, x_2, x_3). This motivates us to define the norm of a vector in an inner product space as follows.

Definition 1.10  Norm of a Vector
Given that V is an inner product space, the norm of a vector v is defined as
∥v∥ = √⟨v, v⟩.

The norm of a vector in an inner product space satisfies some properties, which follow from the axioms for an inner product space.

Proposition 1.4
Let V be an inner product space.
1. For any v in V, ∥v∥ ≥ 0, and ∥v∥ = 0 if and only if v = 0.
2. For any α ∈ R and v ∈ V, ∥αv∥ = |α| ∥v∥.

Motivated by the distance between two points in R3, we make the following definition.

Definition 1.11  Distance Between Two Points
Given that V is an inner product space, the distance between u and v in V is defined as
d(u, v) = ∥v − u∥ = √⟨v − u, v − u⟩.

For example, the distance between the points x = (x_1, ..., x_n) and y = (y_1, ..., y_n) in the Euclidean space Rn is
d(x, y) = √( Σ_{i=1}^n (x_i − y_i)^2 ) = √( (x_1 − y_1)^2 + · · · + (x_n − y_n)^2 ).

For analysis in R, an important inequality is the triangle inequality, which says that |x + y| ≤ |x| + |y| for any x and y in R. To generalize this inequality to Rn, we need the celebrated Cauchy-Schwarz inequality, which holds on any inner product space.

Proposition 1.5  Cauchy-Schwarz Inequality
Given that V is an inner product space, for any u and v in V,
|⟨u, v⟩| ≤ ∥u∥ ∥v∥.
The equality holds if and only if u and v are parallel.

Proof
It is obvious that if either u = 0 or v = 0, then |⟨u, v⟩| = 0 = ∥u∥ ∥v∥, and so the equality holds. Now assume that both u and v are nonzero vectors.
Consider the quadratic function f : R → R defined by
f(t) = ∥tu − v∥^2 = ⟨tu − v, tu − v⟩.
Notice that f(t) = at^2 + bt + c, where
a = ⟨u, u⟩ = ∥u∥^2,  b = −2⟨u, v⟩,  c = ⟨v, v⟩ = ∥v∥^2.
The third axiom of an inner product says that f(t) ≥ 0 for all t ∈ R. Hence, we must have b^2 − 4ac ≤ 0. This gives
⟨u, v⟩^2 ≤ ∥u∥^2 ∥v∥^2.
Thus, we obtain the Cauchy-Schwarz inequality
|⟨u, v⟩| ≤ ∥u∥ ∥v∥.
The equality holds if and only if b^2 − 4ac = 0. The latter means that f(t) = 0 for some t = α, which can happen if and only if αu − v = 0, or equivalently, v = αu.

Now we can prove the triangle inequality.

Proposition 1.6  Triangle Inequality
Let V be an inner product space. For any vectors v_1, v_2, ..., v_k in V,
∥v_1 + v_2 + · · · + v_k∥ ≤ ∥v_1∥ + ∥v_2∥ + · · · + ∥v_k∥.

Proof
It is sufficient to prove the statement when k = 2. The general case follows from induction. Given v_1 and v_2 in V,
∥v_1 + v_2∥^2 = ⟨v_1 + v_2, v_1 + v_2⟩ = ⟨v_1, v_1⟩ + 2⟨v_1, v_2⟩ + ⟨v_2, v_2⟩ ≤ ∥v_1∥^2 + 2∥v_1∥ ∥v_2∥ + ∥v_2∥^2 = (∥v_1∥ + ∥v_2∥)^2.
This proves that ∥v_1 + v_2∥ ≤ ∥v_1∥ + ∥v_2∥.

From the triangle inequality, we can deduce the following.

Corollary 1.7
Let V be an inner product space. For any vectors u and v in V,
| ∥u∥ − ∥v∥ | ≤ ∥u − v∥.

Expressed in terms of distances, the triangle inequality takes the following form.

Proposition 1.8  Triangle Inequality
Let V be an inner product space. For any three points v_1, v_2, v_3 in V,
d(v_1, v_2) ≤ d(v_1, v_3) + d(v_2, v_3).
More generally, if v_1, v_2, ..., v_k are k vectors in V, then
d(v_1, v_k) ≤ Σ_{i=2}^k d(v_{i−1}, v_i) = d(v_1, v_2) + · · · + d(v_{k−1}, v_k).

Since we can define the distance function on an inner product space, an inner product space is a special case of a metric space.

Definition 1.12  Metric Space
Let X be a set, and let d : X × X → R be a function defined on X × X. We say that d is a metric on X provided that the following conditions are satisfied.
1. For any x and y in X, d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y.
2. d(x, y) = d(y, x) for any x and y in X.
3.
For any x, y and z in X, d(x, y) ≤ d(x, z) + d(y, z).

If d is a metric on X, we say that (X, d) is a metric space.

Metric spaces play important roles in advanced analysis. If V is an inner product space, it is a metric space with metric d(u, v) = ∥v − u∥.

Using the Cauchy-Schwarz inequality, one can generalize the concept of angles to any two vectors in a real inner product space. If u and v are two nonzero vectors in a real inner product space V, the Cauchy-Schwarz inequality implies that
⟨u, v⟩ / (∥u∥ ∥v∥)
is a real number between −1 and 1. Generalizing the formula (1.1), we define the angle θ between u and v as
θ = cos^{−1} ( ⟨u, v⟩ / (∥u∥ ∥v∥) ).
This is an angle between 0° and 180°. A necessary and sufficient condition for two vectors u and v to make a 90° angle is ⟨u, v⟩ = 0.

Definition 1.13  Orthogonality
Let V be a real inner product space. We say that the two vectors u and v in V are orthogonal if ⟨u, v⟩ = 0.

Lemma 1.9  Generalized Pythagoras Theorem
Let V be an inner product space. If u and v are orthogonal vectors in V, then
∥u + v∥^2 = ∥u∥^2 + ∥v∥^2.

Now we discuss the projection theorem.

Theorem 1.10  Projection Theorem
Let V be an inner product space, and let w be a nonzero vector in V. If v is a vector in V, there is a unique way to write v as a sum of two vectors v_1 and v_2, such that v_1 is parallel to w and v_2 is orthogonal to w. Moreover, for any real number α,
∥v − αw∥ ≥ ∥v − v_1∥,
and the equality holds if and only if α is equal to the unique real number β such that v_1 = βw.

Figure 1.4: The projection theorem.

Proof
Assume that v can be written as a sum of two vectors v_1 and v_2, such that v_1 is parallel to w and v_2 is orthogonal to w. Since w is nonzero, there is a real number β such that v_1 = βw. Since v_2 = v − v_1 = v − βw is orthogonal to w, we have
0 = ⟨v − βw, w⟩ = ⟨v, w⟩ − β⟨w, w⟩.
This implies that we must have
β = ⟨v, w⟩ / ⟨w, w⟩,
and
v_1 = (⟨v, w⟩ / ⟨w, w⟩) w,  v_2 = v − (⟨v, w⟩ / ⟨w, w⟩) w.
It is easy to check that v_1 and v_2 given by these formulas indeed satisfy the requirements that v_1 is parallel to w and v_2 is orthogonal to w. This establishes the existence and uniqueness of v_1 and v_2.

Now for any real number α,
v − αw = v − v_1 + (β − α)w.
Since v − v_1 = v_2 is orthogonal to (β − α)w, the generalized Pythagoras theorem implies that
∥v − αw∥^2 = ∥v − v_1∥^2 + ∥(β − α)w∥^2 ≥ ∥v − v_1∥^2.
This proves that ∥v − αw∥ ≥ ∥v − v_1∥. The equality holds if and only if
∥(β − α)w∥ = |α − β| ∥w∥ = 0.
Since ∥w∥ ≠ 0, we must have α = β.

The vector v_1 in this theorem is called the projection of v onto the subspace spanned by w. There is a more general projection theorem where the subspace W spanned by w is replaced by a general subspace. We say that a vector v is orthogonal to the subspace W if it is orthogonal to each vector w in W.

Theorem 1.11  General Projection Theorem
Let V be an inner product space, and let W be a finite dimensional subspace of V. If v is a vector in V, there is a unique way to write v as a sum of two vectors v_1 and v_2, such that v_1 is in W and v_2 is orthogonal to W. The vector v_1 is denoted by proj_W v. For any w ∈ W,
∥v − w∥ ≥ ∥v − proj_W v∥,
and the equality holds if and only if w = proj_W v.

Sketch of Proof
If W is a k-dimensional vector space, it has a basis consisting of k linearly independent vectors w_1, ..., w_k. Since the vector v_1 is in W, there are constants c_1, ..., c_k such that
v_1 = c_1 w_1 + · · · + c_k w_k.
The condition that v_2 = v − v_1 is orthogonal to W gives rise to the k equations
c_1⟨w_1, w_1⟩ + · · · + c_k⟨w_k, w_1⟩ = ⟨v, w_1⟩,
...
c_1⟨w_1, w_k⟩ + · · · + c_k⟨w_k, w_k⟩ = ⟨v, w_k⟩.   (1.2)
Using the fact that w_1, ..., w_k are linearly independent, one can show that the k × k matrix
A = [ ⟨w_1, w_1⟩ · · · ⟨w_k, w_1⟩ ; ... ; ⟨w_1, w_k⟩ · · · ⟨w_k, w_k⟩ ]
is invertible. This shows that there is a unique c = (c_1, ..., c_k) satisfying the linear system (1.2).
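The formulas in the proof of Theorem 1.10 can be checked numerically. The sketch below is only an illustration under the Euclidean inner product; the helper names `dot` and `proj` are ours.

```python
def dot(x, y):
    # Euclidean inner product on R^n
    return sum(a * b for a, b in zip(x, y))

def proj(v, w):
    # v1 = beta * w with beta = <v, w> / <w, w>  (w must be nonzero)
    beta = dot(v, w) / dot(w, w)
    return tuple(beta * wi for wi in w)

v = (3.0, 4.0)
w = (1.0, 0.0)
v1 = proj(v, w)                           # (3.0, 0.0), parallel to w
v2 = tuple(a - b for a, b in zip(v, v1))  # (0.0, 4.0)
print(dot(v2, w))  # 0.0: the remainder v2 is orthogonal to w
```

With these values β = 3, so v_1 = 3w and v_2 = v − v_1 is orthogonal to w, exactly as the theorem asserts.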
If V is an inner product space, a basis that consists of mutually orthogonal vectors is of special interest.

Definition 1.14  Orthogonal Set and Orthonormal Set
Let V be an inner product space. A subset of vectors S = {u_1, ..., u_k} is called an orthogonal set if any two distinct vectors u_i and u_j in S are orthogonal. Namely,
⟨u_i, u_j⟩ = 0 if i ≠ j.
S is called an orthonormal set if it is an orthogonal set of unit vectors. Namely,
⟨u_i, u_j⟩ = 0 if i ≠ j, and ⟨u_i, u_i⟩ = 1 for each i.

If S = {u_1, ..., u_k} is an orthogonal set of nonzero vectors, it is a linearly independent set of vectors. One can construct an orthonormal set from it by normalizing each vector in the set.

There is a standard algorithm, known as the Gram-Schmidt process, which can turn any linearly independent set of vectors {v_1, ..., v_k} into an orthogonal set {u_1, ..., u_k} of nonzero vectors. We start with the following lemma.

Lemma 1.12
Let V be an inner product space, and let S = {u_1, ..., u_k} be an orthogonal set of nonzero vectors in V that spans the subspace W. Given any vector v in V,
proj_W v = Σ_{i=1}^k (⟨v, u_i⟩ / ⟨u_i, u_i⟩) u_i.

Proof
By the general projection theorem, v = v_1 + v_2, where v_1 = proj_W v is in W and v_2 is orthogonal to W. Since S is a basis for W, there exist scalars c_1, c_2, ..., c_k such that
v_1 = c_1 u_1 + · · · + c_k u_k.
Therefore,
v = c_1 u_1 + · · · + c_k u_k + v_2.
Since S is an orthogonal set of vectors and v_2 is orthogonal to each u_i, we find that for 1 ≤ i ≤ k,
⟨v, u_i⟩ = c_i ⟨u_i, u_i⟩.
This proves the lemma.

Theorem 1.13  Gram-Schmidt Process
Let V be an inner product space, and assume that S = {v_1, ..., v_k} is a linearly independent set of vectors in V. Define the vectors u_1, ..., u_k inductively by u_1 = v_1, and for 2 ≤ j ≤ k,
u_j = v_j − Σ_{i=1}^{j−1} (⟨v_j, u_i⟩ / ⟨u_i, u_i⟩) u_i.
Then S′ = {u_1, ..., u_k} is an orthogonal set of nonzero vectors. Moreover, for each 1 ≤ j ≤ k, the set {u_i | 1 ≤ i ≤ j} spans the same subspace as the set {v_i | 1 ≤ i ≤ j}.
Sketch of Proof
For 1 ≤ j ≤ k, let W_j be the subspace spanned by the set {v_i | 1 ≤ i ≤ j}. The vectors u_1, ..., u_k are constructed by letting u_1 = v_1, and for 2 ≤ j ≤ k,
u_j = v_j − proj_{W_{j−1}} v_j.
Since {v_1, ..., v_j} is a linearly independent set, u_j ≠ 0. Using induction, one can show that
span {u_1, ..., u_j} = span {v_1, ..., v_j}.
By the projection theorem, u_j is orthogonal to W_{j−1}. Hence, it is orthogonal to u_1, ..., u_{j−1}. This proves the theorem.

A mapping between two vector spaces that respects the linear structures is called a linear transformation.

Definition 1.15  Linear Transformation
Let V and W be real vector spaces. A mapping T : V → W is called a linear transformation provided that for any v_1, ..., v_k in V and any real numbers c_1, ..., c_k,
T(c_1 v_1 + · · · + c_k v_k) = c_1 T(v_1) + · · · + c_k T(v_k).

Linear transformations play important roles in multivariable analysis. In the following, we first define a special class of linear transformations associated to special projections. For 1 ≤ i ≤ n, let L_i be the subspace of Rn spanned by the unit vector e_i. For the point x = (x_1, ..., x_n),
proj_{L_i} x = x_i e_i.
The number x_i is the ith component of x. The mapping from x to x_i is a function from Rn to R, which will play an important role later.

Definition 1.16  Projection Functions
For 1 ≤ i ≤ n, the ith projection function on Rn is the function π_i : Rn → R defined by
π_i(x) = π_i(x_1, ..., x_n) = x_i.

Figure 1.5: The projection functions.

The following is obvious.

Proposition 1.14
For 1 ≤ i ≤ n, the ith projection function on Rn is a linear transformation. Namely, for any x_1, ..., x_k in Rn and any real numbers c_1, ..., c_k,
π_i(c_1 x_1 + · · · + c_k x_k) = c_1 π_i(x_1) + · · · + c_k π_i(x_k).

The following is a useful inequality.

Proposition 1.15
Let x be a vector in Rn. Then |π_i(x)| ≤ ∥x∥.

At the end of this section, let us introduce the concept of hyperplanes.
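Before moving on to hyperplanes, the Gram-Schmidt process of Theorem 1.13 can be sketched numerically. This is only an illustration under the Euclidean inner product; the helper names `dot` and `gram_schmidt` are ours.

```python
def dot(x, y):
    # Euclidean inner product on R^n
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vs):
    # Turn a linearly independent list vs into an orthogonal list us:
    # u_j = v_j - sum over i < j of (<v_j, u_i> / <u_i, u_i>) u_i
    us = []
    for v in vs:
        u = list(v)
        for w in us:
            c = dot(v, w) / dot(w, w)
            u = [ui - c * wi for ui, wi in zip(u, w)]
        us.append(u)
    return us

u1, u2 = gram_schmidt([(1.0, 1.0, 0.0), (1.0, 0.0, 1.0)])
print(u1)           # [1.0, 1.0, 0.0]
print(u2)           # [0.5, -0.5, 1.0]
print(dot(u1, u2))  # 0.0: the output vectors are orthogonal
```

Here u_1 = v_1 and u_2 = v_2 − (1/2)u_1, and the two output vectors span the same plane as the two inputs, as the theorem guarantees.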
Definition 1.17  Hyperplanes
In Rn, a hyperplane is a translate of a subspace of dimension n − 1. In other words, H is a hyperplane if there is a point x_0 in Rn and n − 1 linearly independent vectors v_1, v_2, ..., v_{n−1} such that H contains all points x of the form
x = x_0 + t_1 v_1 + · · · + t_{n−1} v_{n−1},  (t_1, ..., t_{n−1}) ∈ R^{n−1}.

A hyperplane in R1 is a point. A hyperplane in R2 is a line. A hyperplane in R3 is a plane.

Definition 1.18  Normal Vectors
Let v_1, v_2, ..., v_{n−1} be linearly independent vectors in Rn, and let H be the hyperplane
H = { x_0 + t_1 v_1 + · · · + t_{n−1} v_{n−1} | (t_1, ..., t_{n−1}) ∈ R^{n−1} }.
A nonzero vector n that is orthogonal to all the vectors v_1, ..., v_{n−1} is called a normal vector to the hyperplane. If x_1 and x_2 are two points on H, then n is orthogonal to the vector v = x_2 − x_1. Any two normal vectors of a hyperplane are scalar multiples of each other.

Proposition 1.16
If H is a hyperplane with normal vector n = (a_1, a_2, ..., a_n), and x_0 = (u_1, u_2, ..., u_n) is a point on H, then the equation of H is given by
a_1(x_1 − u_1) + a_2(x_2 − u_2) + · · · + a_n(x_n − u_n) = n · (x − x_0) = 0.
Conversely, any equation of the form
a_1 x_1 + a_2 x_2 + · · · + a_n x_n = b
is the equation of a hyperplane with normal vector n = (a_1, a_2, ..., a_n).

Example 1.10
Given 1 ≤ i ≤ n, the equation x_i = c is a hyperplane with normal vector e_i. It is a hyperplane parallel to the coordinate plane x_i = 0, and perpendicular to the x_i-axis.

Exercises 1.1

Question 1
Let V be an inner product space. If u and v are vectors in V, show that
| ∥u∥ − ∥v∥ | ≤ ∥u − v∥.

Question 2
Let V be an inner product space. If u and v are orthogonal vectors in V, show that
∥u + v∥^2 = ∥u∥^2 + ∥v∥^2.

Question 3
Let V be an inner product space, and let u and v be vectors in V. Show that
⟨u, v⟩ = ( ∥u + v∥^2 − ∥u − v∥^2 ) / 4.

Question 4
Let V be an inner product space, and let {u_1, ..., u_k} be an orthonormal set of vectors in V. For any real numbers α_1, . . .
, α_k, show that
∥α_1 u_1 + · · · + α_k u_k∥^2 = α_1^2 + · · · + α_k^2.

Question 5
Let x_1, x_2, ..., x_n be real numbers. Show that
(a) √(x_1^2 + x_2^2 + · · · + x_n^2) ≤ |x_1| + |x_2| + · · · + |x_n|;
(b) |x_1 + x_2 + · · · + x_n| ≤ √n √(x_1^2 + x_2^2 + · · · + x_n^2).

1.2 Convergence of Sequences in Rn

A point in the Euclidean space Rn is denoted by x = (x_1, x_2, ..., x_n). When n = 1, we just denote it by x. When n = 2 and n = 3, it is customary to denote a point in R2 and R3 by (x, y) and (x, y, z) respectively.

The Euclidean inner product between the vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n) is
⟨x, y⟩ = x · y = Σ_{i=1}^n x_i y_i.
The norm of x is
∥x∥ = √⟨x, x⟩ = √( Σ_{i=1}^n x_i^2 ),
while the distance between x and y is
d(x, y) = ∥x − y∥ = √( Σ_{i=1}^n (x_i − y_i)^2 ).

A sequence in Rn is a function f : Z+ → Rn. For k ∈ Z+, let a_k = f(k). Then we can also denote the sequence by {a_k}_{k=1}^∞, or simply by {a_k}.

Example 1.11
The sequence {( k/(k + 1), (2k + 3)/k )} is a sequence in R2 with
a_k = ( k/(k + 1), (2k + 3)/k ).

In Volume I, we have seen that a sequence of real numbers {a_k}_{k=1}^∞ is said to converge to a real number a provided that for any ε > 0, there is a positive integer K such that |a_k − a| < ε for all k ≥ K. Notice that |a_k − a| is the distance between a_k and a. To define the convergence of a sequence in Rn, we use the Euclidean distance.

Definition 1.19  Convergence of Sequences
A sequence {a_k} in Rn is said to converge to the point a in Rn provided that for any ε > 0, there is a positive integer K so that for all k ≥ K,
∥a_k − a∥ = d(a_k, a) < ε.
If {a_k} is a sequence that converges to a point a, we say that the sequence {a_k} is convergent. A sequence that does not converge to any point in Rn is said to be divergent.

Figure 1.6: The convergence of a sequence.

As in the n = 1 case, we have the following.

Proposition 1.17
A sequence in Rn cannot converge to two different points.
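Definition 1.19 can be explored numerically for the sequence of Example 1.11. The Python sketch below (the helper names are ours) tabulates ∥a_k − a∥ for a = (1, 2) and a few values of k; the distances shrink toward 0, consistent with convergence to (1, 2).

```python
import math

def a(k):
    # The sequence a_k = (k/(k+1), (2k+3)/k) from Example 1.11
    return (k / (k + 1), (2 * k + 3) / k)

def dist(x, y):
    # Euclidean distance d(x, y) = ||x - y||
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

limit = (1.0, 2.0)
for k in (10, 100, 1000):
    # ||a_k - a|| = sqrt(1/(k+1)^2 + 9/k^2), which tends to 0
    print(k, dist(a(k), limit))
```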
Definition 1.20  Limit of a Sequence
If {a_k} is a sequence in Rn that converges to the point a, we call a the limit of the sequence. This can be expressed as
lim_{k→∞} a_k = a.

The following is easy to establish.

Proposition 1.18
Let {a_k} be a sequence in Rn. Then {a_k} converges to a if and only if
lim_{k→∞} ∥a_k − a∥ = 0.

Proof
By definition, the sequence {a_k} converges to a if and only if for any ε > 0, there is a positive integer K so that for all k ≥ K, ∥a_k − a∥ < ε. This is precisely the definition of lim_{k→∞} ∥a_k − a∥ = 0.

As in the n = 1 case, {a_{k_j}}_{j=1}^∞ is a subsequence of {a_k} if k_1, k_2, k_3, ... is a strictly increasing sequence of positive integers.

Corollary 1.19
If {a_k} is a sequence in Rn that converges to the point a, then any subsequence of {a_k} also converges to a.

Example 1.12
Let us investigate the convergence of the sequence {a_k} in R2 with
a_k = ( k/(k + 1), (2k + 3)/k )
that is defined in Example 1.11. Notice that
lim_{k→∞} π_1(a_k) = lim_{k→∞} k/(k + 1) = 1,
lim_{k→∞} π_2(a_k) = lim_{k→∞} (2k + 3)/k = 2.
It is natural for us to speculate that the sequence {a_k} converges to the point a = (1, 2). For k ∈ Z+,
a_k − a = ( −1/(k + 1), 3/k ).
Thus,
∥a_k − a∥ = √( 1/(k + 1)^2 + 9/k^2 ).
By the squeeze theorem, lim_{k→∞} ∥a_k − a∥ = 0. This proves that the sequence {a_k} indeed converges to the point a = (1, 2).

In the example above, we guessed the limit of the sequence by looking at each component of the sequence. This in fact works for any sequence.

Theorem 1.20  Componentwise Convergence of Sequences
A sequence {a_k} in Rn converges to the point a if and only if for each 1 ≤ i ≤ n, the sequence {π_i(a_k)} converges to π_i(a).

Proof
Given 1 ≤ i ≤ n,
π_i(a_k) − π_i(a) = π_i(a_k − a).
Thus,
|π_i(a_k) − π_i(a)| = |π_i(a_k − a)| ≤ ∥a_k − a∥.
If the sequence {a_k} converges to the point a, then lim_{k→∞} ∥a_k − a∥ = 0. By the squeeze theorem,
lim_{k→∞} |π_i(a_k) − π_i(a)| = 0.
This proves that the sequence {π_i(a_k)} converges to π_i(a).
Euclidean Spaces 27 Conversely, assume that for each 1 ≤ i ≤ n, the sequence {πi(ak)} converges to πi(a). Then lim k→∞ |πi(ak)− πi(a)| = 0 for 1 ≤ i ≤ n. Since ∥ak − a∥ ≤ n∑ i=1 |πi(ak − a)| , the squeeze theorem implies that lim k→∞ ∥ak − a∥ = 0. This proves that the sequence {ak} converges to the point a. Theorem 1.20 reduces the investigation of convergence of sequences in Rn to sequences in R. Let us look at a few examples. Example 1.13 Find the following limit. lim k→∞ ( 2k + 1 3k , ( 1 + 1 k )k , k√ k2 + 1 ) . Solution We compute the limit componentwise. lim k→∞ 2k + 1 3k = lim k→∞ [( 2 3 )k + ( 1 3 )k ] = 0 + 0 = 0, lim k→∞ ( 1 + 1 k )k = e, lim k→∞ k√ k2 + 1 = lim k→∞ k k √ 1 + 1 k2 = 1. Chapter 1. Euclidean Spaces 28 Hence, lim k→∞ ( 2k + 1 3k , ( 1 + 1 k )k , k√ k2 + 1 ) = (0, e, 1). Example 1.14 Let {ak} be the sequence with ak = ( (−1)k, (−1)k k ) . Is the sequence convergent? Justify your answer. Solution The sequence {π1(ak)} is the sequence {(−1)k}, which is divergent. Hence, the sequence {ak} is divergent. Using the componentwise convergence theorem, it is easy to establish the following. Proposition 1.21 Linearity Let {ak} and {bk} be sequences in Rn that converge to a and b respectively. For any real numbers α and β, the sequence {αak + βbk} converges to αa+ βb. Namely, lim k→∞ (αak + βbk) = αa+ βb. Example 1.15 If {ak} is a sequence in Rn that converges to a, show that lim k→∞ ∥ak∥ = ∥a∥. Chapter 1. Euclidean Spaces 29 Solution Notice that ∥ak∥ = √ π1(ak)2 + · · ·+ πn(ak)2. For 1 ≤ i ≤ n, lim k→∞ πi(ak) = πi(a). Using limit laws for sequences in R, we have lim k→∞ ( π1(ak) 2 + · · ·+ πn(ak) 2 ) = π1(a) 2 + · · ·+ πn(a) 2. Using the fact that the square root function is continuous, we find that lim k→∞ ∥ak∥ = lim k→∞ √ π1(ak)2 + · · ·+ πn(ak)2 = √ π1(a)2 + · · ·+ πn(a)2 = ∥a∥. There is also a Cauchy criterion for convergence of sequences in Rn. 
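The componentwise computation in Example 1.13 can also be checked numerically. Below is a small sketch, assuming the three components are (2^k + 1)/3^k, (1 + 1/k)^k and k/√(k² + 1) as in the example (the function name `a` is ours): each component at a large index is compared against its claimed limit.

```python
import math

def a(k):
    # a_k = ((2^k + 1)/3^k, (1 + 1/k)^k, k/sqrt(k^2 + 1)), as in Example 1.13
    return ((2**k + 1) / 3**k, (1 + 1 / k) ** k, k / math.sqrt(k**2 + 1))

claimed_limit = (0.0, math.e, 1.0)

# by Theorem 1.20, it suffices to examine each component separately
for i, (component, limit_i) in enumerate(zip(a(10_000), claimed_limit), start=1):
    print(i, abs(component - limit_i))
```

The middle component converges rather slowly (the error behaves like e/(2k)), which is consistent with the familiar rate of the limit defining e.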
Definition 1.21 Cauchy Sequences A sequence {ak} in Rn is a Cauchy sequence if for every ε > 0, there is a positive integer K such that for all l ≥ k ≥ K, ∥al − ak∥ < ε. Theorem 1.22 Cauchy Criterion A sequence {ak} in Rn is convergent if and only if it is a Cauchy sequence. Similar to the n = 1 case, the Cauchy criterion allows us to determine whether a sequence in Rn is convergent without having to guess what is the limit first. Chapter 1. Euclidean Spaces 30 Proof Assume that the sequence {ak} converges to a. Given ε > 0, there is a positive integer K such that for all k ≥ K, ∥ak − a∥ < ε/2. Then for all l ≥ k ≥ K, ∥al − ak∥ ≤ ∥al − a∥+ ∥ak − a∥ < ε. This proves that {ak} is a Cauchy sequence. Conversely, assume that {ak} is a Cauchy sequence. Given ε > 0, there is a positive integer K such that for all l ≥ k ≥ K, ∥al − ak∥ < ε. For each 1 ≤ i ≤ n, |πi(al)− πi(ak)| = |πi (al − ak)| ≤ ∥al − ak∥. Hence, {πi(ak)} is a Cauchy sequence in R. Therefore, it is convergent. By componentwise convergence theorem, the sequence {ak} is convergent. Chapter 1. Euclidean Spaces 31 Exercises 1.2 Question 1 Show that a sequence in Rn cannot converge to two different points. Question 2 Find the limit of the sequence {ak}, where ak = ( 2k + 1 k + 3 , √ 2k2 + k k , ( 1 + 2 k )k ) . Question 3 Let {ak} be the sequence with ak = ( 1 + (−1)k−1k 1 + k , 1 2k ) . Determine whether the sequence is convergent. Question 4 Let {ak} be the sequence with ak = ( k 1 + k , k√ k + 1 ) . Determine whether the sequence is convergent. Question 5 Let {ak} and {bk} be sequences in Rn that converges to a and b respectively. Show that lim k→∞ ⟨ak,bk⟩ = ⟨a,b⟩. Here ⟨x,y⟩ = x · y is the standard inner product on Rn. Chapter 1. Euclidean Spaces 32 Question 6 Suppose that {ak} is a sequence in Rn that converges to a, and {ck} is a sequence of real numbers that converges to c, show that lim k→∞ ckak = ca. 
Question 7 Suppose that {ak} is a sequence of nonzero vectors in Rn that converges to a and a ̸= 0, show that lim k→∞ ak ∥ak∥ = a ∥a∥ . Question 8 Let {ak} and {bk} be sequences in Rn. If {ak} is convergent and {bk} is divergent, show that the sequence {ak + bk} is divergent. Question 9 Suppose that {ak} is a sequence in Rn that converges to a. If r = ∥a∥ ≠ 0, show that there is a positive integer K such that ∥ak∥ > r 2 for all k ≥ K. Question 10 Let {ak} be a sequence in Rn and let b be a point in Rn. Assume that the sequence {ak} does not converge to b. Show that there is an ε > 0 and a subsequence {akj} of {ak} such that ∥akj − b∥ ≥ ε for all j ∈ Z+. Chapter 1. Euclidean Spaces 33 1.3 Open Sets and Closed Sets In volume I, we call an interval of the form (a, b) an open interval. Given a point x in R, a neighbourhood of x is an open interval (a, b) that contains x. Given a subset S of R, we say that x is an interior point of S if there is a neighbourhood of x that is contained in S. We say that S is closed in R provided that if {ak} is a sequence of points in S that converges to a, then a is also in S. These notions describe the topology of R, which is relatively simple. For n ≥ 2, the topological features of Rn are much more complicated. An open interval (a, b) in R can be described as a set of the form B = {x ∈ R | |x− x0| < r} , where x0 = a+ b 2 and r = b− a 2 . Figure 1.7: An open interval. Generalizing this, we define open balls in Rn. Definition 1.22 Open Balls Given x0 in Rn and r > 0, an open ball B(x0, r) of radius r with center at x0 is a subset of Rn of the form B(x0, r) = {x ∈ Rn | ∥x− x0∥ < r} . It consists of all points of Rn whose distance to the center x0 is less than r. Obviously, if 0 < r1 ≤ r2, then B(x0, r1) ⊂ B(x0, r2). The following is a useful lemma for balls with different centers. Chapter 1. Euclidean Spaces 34 Figure 1.8: An open ball. Lemma 1.23 Let x1 be a point in the open ball B(x0, r). Then ∥x1 − x0∥ < r. 
If r1 is a positive number satisfying r1 ≤ r − ∥x1 − x0∥, then the open ball B(x1, r1) is contained in the open ball B(x0, r). Figure 1.9: An open ball containing another open ball with different center. Proof Let x be a point in B(x1, r1). Then ∥x− x1∥ < r1 ≤ r − ∥x1 − x0∥. Chapter 1. Euclidean Spaces 35 By the triangle inequality, ∥x− x0∥ ≤ ∥x− x1∥+ ∥x1 − x0∥ < r. Therefore, x is a point in B(x0, r). This proves the assertion. Now we define open sets in Rn. Definition 1.23 Open Sets Let S be a subset of Rn. We say that S is an open set if for each x ∈ S, there is a ball B(x, r) centered at x that is contained in S. The following example justifies that an open interval of the form (a, b) is an open set. Example 1.16 Let S be the open interval S = (a, b) in R. If x ∈ S, then a < x < b. Hence, x − a and b − x are positive. Let r = min{x − a, b − x}. Then r > 0, r ≤ x− a and r ≤ b− x. These imply that a ≤ x− r < x+ r ≤ b. Hence, B(x, r) = (x− r, x+ r) ⊂ (a, b) = S. This shows that the interval (a, b) is an open set. Figure 1.10: The interval (a, b) is an open set. The following example justifies that an open ball is indeed an open set. Example 1.17 Let S = B(x0, r) be the open ball with center at x0 and radius r > 0 in Rn. Show that S is an open set. Chapter 1. Euclidean Spaces 36 Solution Given x ∈ S, d = ∥x − x0∥ < r. Let r1 = r − d. Then r1 > 0. Lemma 1.23 implies that the ball B(x, r1) is inside S. Hence, S is an open set. Example 1.18 As subsets of Rn, ∅ and Rn are open sets. Example 1.19 A one-point set S = {a} in Rn cannot be open, for there is no r > 0 such that B(a, r) is contained in S. Let us look at some other examples of open sets. Definition 1.24 Open Rectangles A set of the form U = n∏ i=1 (ai, bi) = (a1, b1)× · · · × (an, bn) in Rn, which is a Cartesian product of open bounded intervals, is called an open rectangle. Figure 1.11: A rectangle in R2. Chapter 1. Euclidean Spaces 37 Example 1.20 Let U = n∏ i=1 (ai, bi) be an open rectangle in Rn. 
Show that U is an open set. Solution Let x = (x1, . . . , xn) be a point in U . Then for 1 ≤ i ≤ n, ri = min{xi − ai, bi − xi} > 0 and (xi − ri, xi + ri) ⊂ (ai, bi). Let r = min{r1, . . . , rn}. Then r > 0. We claim that B(x, r) is contained in U . If y ∈ B(x, r), then ∥y − x∥ < r. This implies that |yi − xi| ≤ ∥y − x∥ < r ≤ ri for all 1 ≤ i ≤ n. Hence, yi ∈ (xi − ri, xi + ri) ⊂ (ai, bi) for all 1 ≤ i ≤ n. This proves that y ∈ U , and thus, completes the proof that B(x, r) is contained in U . Therefore, U is an open set. Figure 1.12: An open rectangle is an open set. Chapter 1. Euclidean Spaces 38 Next, we define closed sets. The definition is a straightforward generalization of the n = 1 case. Definition 1.25 Closed Sets Let S be a subset of Rn. We say that S is closed in Rn provided that if {ak} is a sequence of points in S that converges to the point a, the point a is also in S. Example 1.21 As subsets of Rn, ∅ and Rn are closed sets. Since ∅ and Rn are also open, a subset S of Rn can be both open and closed. Example 1.22 Let S = {a} be a one-point set in Rn. A sequence {ak} in S is just the constant sequence where ak = a for all k ∈ Z+. Hence, it converges to a which is in S. Thus, a one-point set S is a closed set. In volume I, we have proved the following. Proposition 1.24 Let I be intervals of the form (−∞, a], [a,∞) or [a, b]. Then I is a closed subset of R. Definition 1.26 Closed Rectangles A set of the form R = n∏ i=1 [ai, bi] = [a1, b1]× · · · × [an, bn] in Rn, which is a cartesian product of closed and bounded intervals, is called a closed rectangle. The following justifies that a closed rectangle is indeed a closed set. Chapter 1. Euclidean Spaces 39 Example 1.23 Let R = n∏ i=1 [ai, bi] = [a1, b1]× · · · × [an, bn] be a closed rectangle in Rn. Show that R is a closed set. Solution Let {ak} be a sequence in R that converges to a point a. For each 1 ≤ i ≤ n, {πi(ak)} is a sequence in [ai, bi] that converges to πi(a). 
Since [ai, bi] is a closed set in R, πi(a) ∈ [ai, bi]. Hence, a is in R. This proves that R is a closed set. It is not true that a set that is not open is closed. Example 1.24 Show that an interval of the form I = (a, b] in R is neither open nor closed. Solution If I is open, since b is in I , there is an r > 0 such that (b − r, b + r) = B(b, r) ⊂ I . But then b+r/2 is a point in (b−r, b+r) but not in I = (a, b], which gives a contradiction. Hence, I is not open. For k ∈ Z+, let ak = a+ b− a k . Then {ak} is a sequence in I that converges to a, but a is not in I . Hence, I is not closed. Thus, we have seen that a subset S of Rn can be both open and closed, and it can also be neither open nor closed. Let us look at some other examples of closed sets. Chapter 1. Euclidean Spaces 40 Definition 1.27 Closed Balls Given x0 in Rn and r > 0, a closed ball of radius r with center at x0 is a subset of Rn of the form CB(x0, r) = {x ∈ Rn | ∥x− x0∥ ≤ r} . It consists of all points of Rn whose distance to the center x0 is less than or equal to r. The following justifies that a closed ball is indeed a closed set. Example 1.25 Given x0 ∈ Rn and r > 0, show that the closed ball CB(x0, r) = {x ∈ Rn | ∥x− x0∥ ≤ r} is a closed set. Solution Let {ak} be a sequence in CB(x0, r) that converges to the point a. Then lim k→∞ ∥ak − a∥ = 0. For each k ∈ Z+, ∥ak − x0∥ ≤ r. By triangle inequality, ∥a− x0∥ ≤ ∥ak − x0∥+ ∥ak − a∥ ≤ r + ∥ak − a∥. Taking the k → ∞ limit, we find that ∥a− x0∥ ≤ r. Hence, a is in CB(x0, r). This proves that CB(x0, r) is a closed set. The following theorem gives the relation between open and closed sets. Chapter 1. Euclidean Spaces 41 Theorem 1.25 Let S be a subset of Rn and let A = Rn \ S be its complement in Rn. Then S is open if and only if A is closed. Proof Assume that S is open. Let {ak} be a sequence in A that converges to the point a. We want to show that a is in A. Assume to the contrary that a is not in A. Then a is in S. 
Since S is open, there is an r > 0 such that B(a, r) is contained in S. Since the sequence {ak} converges to a, there is a positive integer K such that for all k ≥ K, ∥ak − a∥ < r. But then this implies that aK ∈ B(a, r) ⊂ S. This contradicts the fact that aK is in A = Rn \ S. Hence, a must be in A, which proves that A is closed. Conversely, assume that A is closed. We want to show that S is open. Assume to the contrary that S is not open. Then there is a point a in S such that for every r > 0, B(a, r) is not contained in S. For every k ∈ Z+, since B(a, 1/k) is not contained in S, there is a point ak in B(a, 1/k) such that ak is not in S. Thus, {ak} is a sequence in A and ∥ak − a∥ < 1 k . This shows that {ak} converges to a. Since A is closed, a is in A, which contradicts the fact that a is in S. Thus, S must be open. Figure 1.13: A sequence outside an open set cannot converge to a point in the open set. Chapter 1. Euclidean Spaces 42 Next, we consider unions and intersections of sets. Theorem 1.26 1. An arbitrary union of open sets is open. Namely, if {Uα |α ∈ J} is a collection of open sets in Rn, then their union U = ⋃ α∈J Uα is also an open set. 2. A finite intersection of open sets is open. Namely, if V1, . . . , Vk are open sets in Rn, then their intersection V = k⋂ i=1 Vi is also an open set. Proof To prove the first statement, let x be a point in U = ⋃ α∈J Uα. Then there is an α ∈ J such that x is in Uα. Since Uα is open, there is an r > 0 such that B(x, r) ⊂ Uα ⊂ U . Hence, U is open. For the second statement, let x be a point in V = k⋂ i=1 Vi. Then for each 1 ≤ i ≤ k, x is in the open set Vi. Hence, there is an ri > 0 such that B(x, ri) ⊂ Vi. Let r = min{r1, . . . , rk}. Then for 1 ≤ i ≤ k, r ≤ ri and so B(x, r) ⊂ B(x, ri) ⊂ Vi. Hence, B(x, r) ⊂ V . This proves that V is open. As an application of this theorem, let us show that any open interval in R is indeed an open set. Proposition 1.27 Let I be an interval of the form (−∞, a), (a,∞) or (a, b). 
Then I is an open subset of R. Chapter 1. Euclidean Spaces 43 Proof We have shown in Example 1.16 that if I is an interval of the form (a, b), then I is an open subset of R. Now (a,∞) = ∞⋃ k=1 (a, a+ k) is a union of open sets. Hence, (a,∞) is open. In the same way, one can show that an interval of the form (−∞, a) is open. The next example shows that an arbitrary intersection of open sets is not necessarily open. Example 1.26 For k ∈ Z+, let Uk be the open set in R given by Uk = ( −1 k , 1 k ) . Notice that the set U = ∞⋂ k=1 Uk = {0} is a one-point set. Hence, it is not open in R. De Morgan’s law in set theory says that if {Uα |α ∈ J} is a collection of sets in Rn, then Rn \ ⋃ α∈J Uα = ⋂ α∈J (Rn \ Uα) , Rn \ ⋂ α∈J Uα = ⋃ α∈J (Rn \ Uα) . Thus, we obtain the counterpart of Theorem 1.26 for closed sets. Chapter 1. Euclidean Spaces 44 Theorem 1.28 1. An arbitrary intersection of closed sets is closed. Namely, if {Aα |α ∈ J} is a collection of closed sets in Rn, then their intersection A = ⋂ α∈J Aα is also a closed set. 2. A finite union of closed sets is closed. Namely, if C1, . . . , Ck are closed sets in Rn, then their union C = k⋃ i=1 Ci is also a closed set. Proof We prove the first statement. The proof of the second statement is similar. Given that {Aα |α ∈ J} is a collection of closed sets in Rn, for each α ∈ J , let Uα = Rn \ Aα. Then {Uα |α ∈ J} is a collection of open sets in Rn. By Theorem 1.26, the set ⋃ α∈J Uα is open. By Theorem 1.25, Rn \ ⋃ α∈J Uα is closed. By De Morgan’s law, Rn \ ⋃ α∈J Uα = ⋂ α∈J (Rn \ Uα) = ⋂ α∈J Aα. This proves that ⋂ α∈J Aα is a closed set. The following example says that any finite point set is a closed set. Example 1.27 Let S = {x1, . . . ,xk} be a finite point set in Rn. Then S = k⋃ i=1 {xi} is a finite union of one-point sets. Since each one-point set is closed, S is closed. Chapter 1. Euclidean Spaces 45 Exercises 1.3 Question 1 Let A be the subset of R2 given by A = {(x, y) |x > 0, y > 0} . Show that A is an open set. 
Question 2 Let A be the subset of R2 given by A = {(x, y) |x ≥ 0, y ≥ 0} . Show that A is a closed set. Question 3 Let A be the subset of R2 given by A = {(x, y) |x > 0, y ≥ 0} . Is A open? Is A closed? Justify your answers. Question 4 Let C and U be subsets of Rn. Assume that C is closed and U is open, show that U \ C is open and C \ U is closed. Question 5 Let A be a subset of Rn, and let B = A + u be the translate of A by the vector u. (a) Show that A is open if and only if B is open. (b) Show that A is closed if and only if B is closed. Chapter 1. Euclidean Spaces 46 1.4 Interior, Exterior, Boundary and Closure First, we introduce the interior of a set. Definition 1.28 Interior Let S be a subset of Rn. We say that x ∈ Rn is an interior point of S if there exists r > 0 such that B(x, r) ⊂ S. The interior of S, denoted by intS, is defined to be the collection of all the interior points of S. Figure 1.14: The interior point of a set. The following gives a characterization of the interior of a set. Theorem 1.29 Let S be a subset of Rn. Then we have the followings. 1. intS is a subset of S. 2. intS is an open set. 3. S is an open set if and only if S = intS. 4. If U is an open set that is contained in S, then U ⊂ intS. These imply that intS is the largest open set that is contained in S. Chapter 1. Euclidean Spaces 47 Proof Let x be a point in intS. By definition, there exists r > 0 such that B(x, r) ⊂ S. Since x ∈ B(x, r) and B(x, r) ⊂ S, x is a point in S. Since we have shown that every point in intS is in S, intS is a subset of S. If y ∈ B(x, r), Lemma 1.23 says that there is an r′ > 0 such that B(y, r′) ⊂ B(x, r) ⊂ S. Hence, y is also in intS. This proves thatB(x, r) is contained in intS. Since we have shown that for any x ∈ intS, there is an r > 0 such that B(x, r) is contained in intS, this shows that intS is open. If S = intS, S is open. Conversely, if S is open, for every x in S, there is an r > 0 such that B(x, r) ⊂ S. Then x is in intS. Hence, S ⊂ intS. 
Since we have shown that intS ⊂ S is always true, we conclude that if S is open, S = intS. If U is a subset of S and U is open, for every x in U , there is an r > 0 such that B(x, r) ⊂ U . But then B(x, r) ⊂ S. This shows that x is in intS. Since every point of U is in intS, this proves that U ⊂ intS. Example 1.28 Find the interior of each of the following subsets of R. (a) A = (a, b) (b) B = (a, b] (c) C = [a, b] (d) Q Solution (a) Since A is an open set, intA = A = (a, b). (b) Since A is an open set that is contained in B, A = (a, b) is contained in intB. Since intB ⊂ B, it only remains to determine whether b is in intB. The same argument as given in Example 1.24 shows that b is not an interior point of B. Hence, intB = A = (a, b). Chapter 1. Euclidean Spaces 48 (c) Similar arguments as given in (b) show that A ⊂ intC, and both a and b are not interior points of C. Hence, intC = A = (a, b). (d) For any x ∈ R and any r > 0, B(x, r) = (x − r, x + r) contains an irrational number. Hence, B(x, r) is not contained in Q. This shows that Q does not have interior points. Hence, intQ = ∅. Definition 1.29 Neighbourhoods Let x be a point in Rn and let U be a subset of Rn. We say that U is a neighbourhood of x if U is an open set that contains x. Notice that this definition is slightly different from the one we use in volume I for the n = 1 case. Neighbourhoods By definition, if U is a neighbourhood of x, then x is an interior point of U , and there is an r > 0 such that B(x, r) ⊂ U . Example 1.29 Consider the point x = (1, 2) and the sets U = { (x1, x2) |x21 + x22 < 9 } , V = {(x1, x2) | 0 < x1 < 2,−1 < x2 < 3} in R2. The sets U and V are neighbourhoods of x. Next, we introduce the exterior and boundary of a set. Definition 1.30 Exterior Let S be a subset of Rn. We say that x ∈ Rn is an exterior point of S if there exists r > 0 such that B(x, r) ⊂ Rn \ S. The exterior of S, denoted by extS, is defined to be the collection of all the exterior points of S. Chapter 1. 
Euclidean Spaces 49 Figure 1.15: The sets U and V are neighbourhoods of the point x. Definition 1.31 Boundary Let S be a subset of Rn. We say that x ∈ Rn is a boundary point of S if for every r > 0, the ball B(x, r) intersects both S and Rn \S. The boundary of S, denoted by bdS or ∂S, is defined to be the collection of all the boundary points of S. Figure 1.16: P is an interior point, Q is an exterior point, E is a boundary point. Chapter 1. Euclidean Spaces 50 Theorem 1.30 Let S be a subset of Rn. We have the followings. (a) ext (S) = int (Rn \ S). (b) bd (S) = bd (Rn \ S). (c) intS, extS and bdS are mutually disjoint sets. (d) Rn = intS ∪ extS ∪ bdS. Proof (a) and (b) are obvious from definitions. For parts (c) and (d), we notice that for a point x ∈ Rn, exactly one of the following three statements holds. (i) There exists r > 0 such that B(x, r) ⊂ S. (ii) There exists r > 0 such that B(x, r) ⊂ Rn \ S. (iii) For every r > 0, B(x, r) intersects both S and Rn \ S. Thus, intS, extS and bdS are mutually disjoint sets, and their union is Rn. Example 1.30 Find the exterior and boundary of each of the following subsets of R. (a) A = (a, b) (b) B = (a, b] (c) C = [a, b] (d) Q Solution We have seen in Example 1.28 that intA = intB = intC = (a, b). Chapter 1. Euclidean Spaces 51 For any r > 0, the ball B(a, r) = (a − r, a + r) contains a point less than a, and a point larger than a. Hence, a is a boundary point of the sets A, B and C. Similarly, b is a boundary point of the sets A, B and C. For every point x which satisfies x < a, let r = a − x. Then r > 0. Since x+r = a, the ballB(x, r) = (x−r, x+r) is contained in (−∞, a). Hence, x is an exterior point of the sets A, B and C. Similarly every point x such that x > b is an exterior point of the sets A, B and C. Since the interior, exterior and boundary of a set in R are three mutually disjoint sets whose union is R, we conclude that bdA = bdB = bdC = {a, b}, extA = extB = extC = (−∞, a) ∪ (b,∞). 
For every x ∈ R and every r > 0, the ball B(x, r) = (x−r, x+r) contains a point in Q and a point not in Q. Therefore, x is a boundary point of Q. This shows that bdQ = R, and thus, extQ = ∅. Example 1.31 Let A = B(x0, r), where x0 is a point in Rn, and r is a positive number. Find the interior, exterior and boundary of A. Solution We have shown that A is open. Hence, intA = A. Let U = {x ∈ Rn | ∥x− x0∥ > r} , C = {x ∈ Rn | ∥x− x0∥ = r} . Notice that A, U and C are mutually disjoint sets whose union is Rn. If x is in U , d = ∥x−x0∥ > r. Let r′ = d−r. Then r′ > 0. If y ∈ B(x, r′), then ∥y − x∥ < r′. It follows that ∥y − x0∥ ≥ ∥x− x0∥ − ∥y − x∥ > d− r′ = r. This proves that y ∈ U . Hence, B(x, r′) ⊂ U ⊂ Rn \ A, which shows that x is an exterior point of A. Thus, U ⊂ extA. Chapter 1. Euclidean Spaces 52 Now if x ∈ C, ∥x − x0∥ = r. For every r′ > 0, let a = 1 2 min{r′/r, 1}. Then a ≤ 1 2 and a ≤ r′ 2r . Consider the point v = x− a(x− x0). Notice that ∥v − x∥ = ar ≤ r′ 2 < r′. Thus, v is in B(x, r′). On the other hand, ∥v − x0∥ = (1− a)r < r. Thus, v is in A. This shows that B(x, r′) intersects A. Since x is in B(x, r′) but not in A, we find that B(x, r′) intersects Rn \ A. Hence, x is a boundary point of A. This shows that C ⊂ bdA. Since intA, extA and bdA are mutually disjoint sets, we conclude that intA = A, extA = U and bdA = C. Now we introduce the closure of a set. Definition 1.32 Closure Let S be a subset of Rn. The closure of S, denoted by S, is defined as S = intS ∪ bdS. Example 1.32 Example 1.31 shows that the closure of the open ball B(x0, r) is the closed ball CB(x0, r). Example 1.33 Consider the sets A = (a, b), B = (a, b] and C = [a, b] in Example 1.28 and Example 1.30. We have shown that intA = intB = intC = (a, b), and bdA = bdB = bdC = {a, b}. Therefore, A = B = C = [a, b]. Chapter 1. Euclidean Spaces 53 Since Rn is a disjoint union of intS, bdS and extS, we obtain the following immediately from the definition. Theorem 1.31 Let S be a subset of Rn. 
Then S and extS are complements of each other in Rn. The following theorem gives a characterization of the closure of a set. Theorem 1.32 Let S be a subset of Rn, and let x be a point in Rn. The following statements are equivalent. (a) x ∈ S. (b) For every r > 0, B(x, r) intersects S. (c) There is a sequence {xk} in S that converges to x. Proof If x is in S, x is not in int (Rn \ S). Thus, for every r > 0, B(x, r) is not contained in Rn \ S. Then it must intersect S. This proves (a) implies (b). If (b) holds, for every k ∈ Z+, take r = 1/k. The ball B(x, 1/k) intersects S at some point xk. This gives a sequence {xk} satisfying ∥xk − x∥ < 1 k . Thus, {xk} is a sequence in S that converges to x. This proves (b) implies (c). If (c) holds, for every r > 0, there is a positive integer K such that for all k ≥ K, ∥xk − x∥ < r, and thus xk ∈ B(x, r). This shows that B(x, r) is not contained in Rn \ S. Hence, x /∈ extS, and thus we must have x ∈ S. This proves (c) implies (a). The following theorem gives further properties of the closure of a set. Chapter 1. Euclidean Spaces 54 Theorem 1.33 Let S be a subset of Rn. 1. S is a closed set that contains S. 2. S is closed if and only if S = S. 3. If C is a closed subset of Rn and S ⊂ C, then S ⊂ C. These imply that S is the smallest closed set that contains S. Proof These statements are counterparts of the statements in Theorem 1.29. Since extS = int (Rn \ S), and the interior of a set is open, extS is open. Since S = Rn \ extS, S is a closed set. Since extS ⊂ Rn \ S, we find that S = Rn \ extS ⊃ S. If S = S, then S must be closed since S is closed. Conversely, if S is closed, Rn \ S is open, and so extS = int (Rn \ S) = Rn \ S. It follows that S = Rn \ extS = S. If C is a closed set that contains S, then Rn \ C is an open set that is contained in Rn \S. Thus, Rn \C ⊂ int (Rn \S) = extS. This shows that C ⊃ Rn \ extS = S. Corollary 1.34 If S is a subset of Rn, S = S ∪ bdS. Proof Since intS ⊂ S, S = intS ∪ bdS ⊂ S ∪ bdS. 
Since S and bdS are both subsets of S, S ∪ bdS ⊂ S. This proves that S = S ∪ bdS. Chapter 1. Euclidean Spaces 55 Example 1.34 Let U be the open rectangle U = n∏ i=1 (ai, bi) in Rn. Show that the closure of U is the closed rectangle R = n∏ i=1 [ai, bi]. Solution Since R is a closed set that contains U , U ⊂ R. If x = (x1, . . . , xn) is a point in R, then xi ∈ [ai, bi] for each 1 ≤ i ≤ n. Since [ai, bi] is the closure of (ai, bi) in R, there is a sequence {xi,k}∞k=1 in (ai, bi) that converges to xi. For k ∈ Z+, let xk = (x1,k, . . . , xn,k). Then {xk} is a sequence in U that converges to x. This shows that x ∈ U , and thus completes the proof that U = R. The proof of the following theorem shows the usefulness of the characterization of intS as the largest open set that is contained in S, and S is the smallest closed set that contains S. Theorem 1.35 If A and B are subsets of Rn such that A ⊂ B, then (a) intA ⊂ intB; and (b) A ⊂ B. Proof Since intA is an open set that is contained in A, it is an open set that is contained in B. By the fourth statement in Theorem 1.29, intA ⊂ intB. Since B is a closed set that contains B, it is a closed set that contains A. By the third statement in Theorem 1.33, A ⊂ B. Notice that as subsets of R, (a, b) ⊂ (a, b] ⊂ [a, b]. We have shown that Chapter 1. Euclidean Spaces 56 (a, b) = (a, b] = [a, b]. In general, we have the following. Theorem 1.36 If A and B are subsets of Rn such that A ⊂ B ⊂ A, then A = B. Proof By Theorem 1.35, A ⊂ B implies that A ⊂ B, while B ⊂ A implies that B is contained in A = A. Thus, we have A ⊂ B ⊂ A, which proves that B = A. Example 1.35 In general, if S is a subset of Rn, it is not necessarily true that intS = intS, even when S is an open set. For example, take S = (−1, 0) ∪ (0, 1) in R. Then S is an open set and S = [−1, 1]. Notice that intS = S = (−1, 0) ∪ (0, 1), but intS = (−1, 1). Chapter 1. Euclidean Spaces 57 Exercises 1.4 Question 1 Let S be a subset of Rn. Show that bdS is a closed set. 
Question 2 Let A be the subset of R2 given by A = {(x, y) |x < 0, y ≥ 0} . Find the interior, exterior, boundary and closure of A. Question 3 Let x0 be a point in Rn, and let r be a positive number. Consider the subset of Rn given by A = {x ∈ Rn | 0 < ∥x− x0∥ ≤ r} . Find the interior, exterior, boundary and closure of A. Question 4 Let A be the subset of R2 given by A = {(x, y) | 1 ≤ x < 3,−2 < y ≤ 5} ∪ {(0, 0), (2,−3)}. Find the interior, exterior, boundary and closure of A. Question 5 Let S be a subset of Rn. Show that bdS = S ∩ Rn \ S. Chapter 1. Euclidean Spaces 58 Question 6 Let S be a subset of Rn. Show that bdS ⊂ bdS. Give an example where bdS ̸= bdS. Question 7 Let S be a subset of Rn. (a) Show that S is open if and only if S does not contain any of its boundary points. (b) Show that S is closed if and only if S contains all its boundary points. Question 8 Let S be a subset of Rn, and let x be a point in Rn. (a) Show that x is an interior point of S if and only if there is a neighbourhood of x that is contained in S. (b) Show that x ∈ S if and only if every neighbourhood of x intersects S. (c) Show that x is a boundary point of S if and only if every neighbourhood of x contains a point in S and a point not in S. Question 9 Let S be a subset of Rn, and let x = (x1, . . . , xn) be a point in the interior of S. (a) Show that there is an r1 > 0 such that CB(x, r1) ⊂ S. (b) Show that there is an r2 > 0 such that n∏ i=1 (xi − r2, xi + r2) ⊂ S. (c) Show that there is an r3 > 0 such that n∏ i=1 [xi − r3, xi + r3] ⊂ S. Chapter 1. Euclidean Spaces 59 1.5 Limit Points and Isolated Points In this section, we generalize the concepts of limit points and isolated points to subsets of Rn. Definition 1.33 Limit Points Let S be a subset of Rn. A point x in Rn is a limit point of S provided that there is a sequence {xk} in S \ {x} that converges to x. The set of limit points of S is denoted by S ′. By Theorem 1.32, we obtain the following immediately. 
Theorem 1.37 Let S be a subset of Rn, and let x be a point in Rn. The following are equivalent. (a) x is a limit point of S. (b) x is in S \ {x}. (c) For every r > 0, B(x, r) intersects S at a point other than x. Corollary 1.38 If S is a subset of Rn, then S ′ ⊂ S. Proof If x ∈ S ′, x∈ S \ {x}. Since S \ {x} ⊂ S, we have S \ {x} ⊂ S. Therefore, x ∈ S. The following theorem says that the closure of a set is the union of the set with all its limit points. Theorem 1.39 If S is a subset of Rn, then S = S ∪ S ′. Chapter 1. Euclidean Spaces 60 Proof By Corollary 1.38, S ′ ⊂ S. Since we also have S ⊂ S, we find that S ∪ S ′ ⊂ S. Conversely, if x ∈ S, then by Theorem 1.32, there is a sequence {xk} in S that converges to x. If x is not in S, then the sequence {xk} is in S \ {x}. In this case, x is a limit point of S. This shows that S \ S ⊂ S ′, and hence, S ⊂ S ∪ S ′. In the proof above, we have shown the following. Corollary 1.40 Let S be a subset of Rn. Every point in S that is not in S is a limit point of S. Namely, S \ S ⊂ S ′. Now we introduce the definition of isolated points. Definition 1.34 Isolated Points Let S be a subset of Rn. A point x in Rn is an isolated point of S if (a) x is in S; (b) x is not a limit point of S. Remark 1.1 By definition, a point x in S is either an isolated point of S or a limit point of S. Theorem 1.37 gives the following immediately. Theorem 1.41 Let S be a subset of Rn and let x be a point in S. Then x is an isolated point of S if and only if there is an r > 0 such that the ball B(x, r) does not contain other points of S except the point x. Chapter 1. Euclidean Spaces 61 Example 1.36 Find the set of limit points and isolated points of the set A = Z2 as a subset of R2. Solution If {xk} is a sequence inA that converges to a point x, then there is a positive integer K such that for all l ≥ k ≥ K, ∥xl − xk∥ < 1. This implies that xk = xK for all k ≥ K. Hence, x = xK ∈ A. This shows that A is closed. Hence, A = A. Therefore, A′ ⊂ A. 
For every x = (k, l) ∈ Z2, B(x, 1) intersects A only at the point x itself. Hence, x is an isolated point of A. This shows that every point of A is an isolated point. Since A′ ⊂ A, we must have A′ = ∅. Figure 1.17: The set Z2 does not have limit points. Let us prove the following useful fact. Theorem 1.42 If S is a subset of Rn, every interior point of S is a limit point of S. Chapter 1. Euclidean Spaces 62 Proof If x is an interior point of S, there exists r0 > 0 such that B(x, r0) ⊂ S. Given r > 0, let r′ = 1 2 min{r, r0}. Then r′ > 0. Since r′ < r and r′ < r0, the point x′ = x+ r′e1 is inB(x, r) and S. Obviously, x′ ̸= x. Therefore, for every r > 0, B(x, r) intersects S at a point other than x. This proves that x is a limit point of S. Since S ⊂ intS ∪ bdS, and intS and bdS are disjoint, we deduce the following. Corollary 1.43 Let S be a subset of Rn. An isolated point of S must be a boundary point. Since every point in an open set S is an interior point of S, we obtain the following. Corollary 1.44 If S is an open subset of Rn, every point of S is a limit point. Namely, S ⊂ S ′. Example 1.37 If I is an interval of the form (a, b), (a, b], [a, b) or [a, b] in R, then bd I = {a, b}. It is easy to check that a and b are not isolated points of I . Hence, I has no isolated points. Since I = I ∪ I ′ and I ⊂ I ′, we find that I ′ = I = [a, b]. In fact, we can prove a general theorem. Theorem 1.45 Let A and B be subsets of Rn such that A is open and A ⊂ B ⊂ A. Then B′ = A. In particular, the set of limit points of A is A. Chapter 1. Euclidean Spaces 63 Proof By Theorem 1.36, A = B. Since A is open, A ⊂ A′. Since A = A ∪ A′, we find that A = A′. In the exercises, one is asked to show that A ⊂ B implies A′ ⊂ B′. Therefore, A = A′ ⊂ B′ ⊂ B. Since A = B, we must have B′ = B = A. Example 1.38 Let A be the subset of R2 given by A = [−1, 1]× (−2, 2] = {(x, y) | − 1 ≤ x ≤ 1,−2 < y ≤ 2} . 
Since U = (−1, 1) × (−2, 2) is open, Ū = [−1, 1] × [−2, 2], and U ⊂ A ⊂ Ū, the set of limit points of A is Ū = [−1, 1] × [−2, 2].

Exercises 1.5

Question 1
Let A and B be subsets of Rn such that A ⊂ B. Show that A′ ⊂ B′.

Question 2
Let x0 be a point in Rn and let r be a positive number. Find the set of limit points of the open ball B(x0, r).

Question 3
Let A be the subset of R2 given by A = {(x, y) | x < 0, y ≥ 0}. Find the set of limit points of A.

Question 4
Let x0 be a point in Rn, and let r be a positive number. Consider the subset of Rn given by A = {x ∈ Rn | 0 < ∥x − x0∥ ≤ r}.
(a) Find the set of limit points of A.
(b) Find the set of isolated points of the set S = Rn \ A.

Question 5
Let A be the subset of R2 given by A = {(x, y) | 1 ≤ x < 3, −2 < y ≤ 5} ∪ {(0, 0), (2, −3)}. Determine the set of isolated points and the set of limit points of A.

Question 6
Let A = Q2 as a subset of R2.
(a) Find the interior, exterior, boundary and closure of A.
(b) Determine the set of isolated points and the set of limit points of A.

Question 7
Let S be a subset of Rn. Show that S is closed if and only if it contains all its limit points.

Question 8
Let S be a subset of Rn, and let x be a point in Rn. Show that x is a limit point of S if and only if every neighbourhood of x intersects S at a point other than x.

Question 9
Let x1, x2, . . ., xk be points in Rn and let A = Rn \ {x1, x2, . . ., xk}. Find the set of limit points of A.

Chapter 2
Limits of Multivariable Functions and Continuity

We are interested in functions F : D → Rm that are defined on subsets D of Rn, taking values in Rm. When n ≥ 2, these are called multivariable functions. When m ≥ 2, they are called vector-valued functions. When m = 1, we usually write the function as f : D → R.

2.1 Multivariable Functions

In this section, let us define some special classes of multivariable functions.
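Before turning to the special classes, here is a small numerical sketch of the notation (the particular formula for F below is our own illustrative choice, not an example from the text): a function F : R2 → R3 is multivariable, since it has n = 2 inputs, and vector-valued, since it has m = 3 output coordinates.

```python
# A sample multivariable, vector-valued function F : R^2 -> R^3.
# The formula for F is an illustrative choice, not taken from the text.
def F(x, y):
    # Each coordinate of the output is a real-valued function of (x, y).
    return (x + y, x - y, x * y)

value = F(2.0, 3.0)
print(value)       # (5.0, -1.0, 6.0), a point in R^3
print(len(value))  # 3, the dimension m of the codomain
```
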
2.1.1 Polynomials and Rational Functions A special class of functions is the set of polynomials in n variables. Definition 2.1 Polynomials Let k = (k1, . . . , kn) be an n-tuple of nonnegative integers. Associated to this n-tuple k, there is a monomial pk : Rn → R of degree |k| = k1 + · · ·+ kn of the form pk(x) = xk11 · · ·xknn . A polynomial in n variables is a function p : Rn → R that is a finite linear combination of monomials in n variables. It takes the form p(x) = m∑ j=1 ckj pkj (x), where k1,k2, . . . ,km are distinct n-tuples of nonnegative integers, and ck1 , ck2 , . . . , ckm are nonzero real numbers. The degree of the polynomial p(x) is max{|k1|, |k2|, . . . , |km|}. Chapter 2. Limits of Multivariable Functions and Continuity 67 Example 2.1 The following are examples of polynomials in three variables. (a) p(x1, x2, x3) = x21 + x22 + x23 (b) p(x1, x2, x3) = 4x21x2 − 3x1x3 + x1x2x3 Example 2.2 The function f : Rn → R, f(x) = ∥x∥ = √ x21 + · · ·+ x2n is not a polynomial. When the domain of a function is not specified, we always assume that the domain is the largest set on which the function can be defined. Definition 2.2 Rational Functions A rational function f : D → R is the quotient of two polynomials p : Rn → R and q : Rn → R. Namely, f(x) = p(x) q(x) . Its domain D is the set D = {x ∈ Rn | q(x) ̸= 0} . Example 2.3 The function f(x1, x2) = x1x2 + 3x21 x1 − x2 is a rational function defined on the set D = { (x1, x2) ∈ R2 |x1 ̸= x2 } . Chapter 2. Limits of Multivariable Functions and Continuity 68 2.1.2 Component Functions of a Mapping If the codomain Rm of the function F : D → Rm has dimension m ≥ 2, we usually call the function a mapping. In this case, it would be good to consider the component functions. For 1 ≤ j ≤ m, the projection function πj : Rm → R is the function πj(x1, . . . , xm) = xj. Definition 2.3 Component Functions Let F : D → Rm be a function defined on D ⊂ Rn. 
For 1 ≤ j ≤ m, the j th component function of F is the function Fj : D → R defined as Fj = (πj ◦ F) : D → R. For each x ∈ D, F(x) = (F1(x), . . . , Fm(x)). Example 2.4 For the function F : R3 → R3, F(x) = −3x, the component functions are F1(x1, x2, x3) = −3x1, F2(x1, x2, x3) = −3x2, F3(x1, x2, x3) = −3x3. For convenience, we also define the notion of polynomialmappings. Definition 2.4 Polynomial Mappings We call a function F : Rn → Rm a polynomial mapping if each of its components Fj : Rn → R, 1 ≤ j ≤ m, is a polynomial function. The degree of the polynomial mapping F is the maximum of the degrees of the polynomials F1, F2, . . . , Fm. Example 2.5 The mapping F : R3 → R2, F(x, y, z) = (x2y + 3xz, 8yz3 − 7x) is a polynomial mapping of degree 4. Chapter 2. Limits of Multivariable Functions and Continuity 69 2.1.3 Invertible Mappings The invertibility of a function F : D → Rm is defined in the following way. Definition 2.5 Inverse Functions Let D be a subset of Rn, and let F : D → Rm be a function defined on D. We say that F is invertible if F is one-to-one. In this case, the inverse function F−1 : F(D) → D is defined so that for each y ∈ F(D), F−1(y) = x if and only if F(x) = y. Example 2.6 Let D = {(x, y) |x > 0, y > 0} and let F : D → R2 be the function defined as F(x, y) = (x− y, x+ y). Show that F is invertible and find its inverse. Solution Let u = x− y and v = x+ y. Then x = u+ v 2 , y = v − u 2 . This shows that for any (u, v) ∈ R2, there is at most one pair of (x, y) such that F(x, y) = (u, v). Thus, F is one-to-one, and hence, it is invertible. Observe that F(D) = {(u, v) | v > 0,−v < u < v.} . The inverse mapping is given by F−1 : F(D) → R2, F−1(u, v) = ( u+ v 2 , v − u 2 ) . Chapter 2. Limits of Multivariable Functions and Continuity 70 2.1.4 Linear Transformations Another special class of functions consists of linear transformations. A function T : Rn → Rm is a linear transformation if for any x1, . . . ,xk in Rn, and for any c1, . . . 
, ck in R, T(c1x1 + · · ·+ ckxk) = c1T(x1) + · · ·+ ckT(xk). Linear transformations are closely related to matrices. An m × n matrix A is an array with m rows and n columns of real numbers. It has the form A = [aij] = a11 a12 · · · a1n a21 a22 · · · a2n ... ... . . . ... am1 am2 · · · amn . IfA = [aij] andB = [bij] arem×nmatrices, α and β are real numbers, αA+βB is defined to be the m× n matrix C = αA+ βB = [cij] with cij = αaij + βbij. If A = [ail] is a m × k matrix, B = [blj] is a k × n matrix, the product AB is defined to be the m× n matrix C = AB = [cij], where cij = k∑ l=1 ailblj. It is easy to verify that matrix multiplications are associative. Given x = (x1, . . . , xn) in Rn, we identify it with the column vector x = x1 x2 ... xn , which is an n × 1 matrix. If A is an m × n matrix, and x is a vector in Rn, then y = Ax is the vector in Rm given by y = Ax = a11 a12 · · · a1n a21 a22 · · · a2n ... ... . . . ... am1 am2 · · · amn x1 x2 ... xn = a11x1 + a12x2 + · · ·+ a1nxn a21x1 + a22x2 + · · ·+ a2nxn ... am1x1 + am2x2 + · · ·+ amnxn . Chapter 2. Limits of Multivariable Functions and Continuity 71 The following is a standard result in linear algebra. Theorem 2.1 A function T : Rn → Rm is a linear transformation if and only if there exists an m× n matrix A = [aij] such that T(x) = Ax. In this case, A is called the matrix associated to the linear transformation T : Rn → Rm. Sketch of Proof It is easy to verify that the mapping T : Rn → Rm, T(x) = Ax is a linear transformation if A is an m× n matrix. Conversely, if T : Rn → Rm is a linear transformation, then for any x ∈ Rn, T(x) = T(x1e1+x2e2+· · ·+xnen) = x1T(e1)+x2T(e2)+· · ·+xnT(en). Define the vectors a1, a2, . . ., an in Rm by a1 = T(e1), a2 = T(e2), . . . , an = T(en). Let A be the m× n matrix with column vectors a1, a2, . . ., an. Namely, A = [ a1 a2 · · · an ] . Then we have T(x) = Ax. Example 2.7 Let F : R2 → R2 be the function defined as F(x, y) = (x− y, x+ y). 
Then F is a linear transformation with matrix A = [ 1 −1 1 1 ] . Chapter 2. Limits of Multivariable Functions and Continuity 72 For the linear transformation T : Rn → Rm, T(x) = Ax, the component functions are T1(x) = a11x1 + a12x2 + · · ·+ a1nxn, T2(x) = a21x1 + a22x2 + · · ·+ a2nxn, ... Tm(x) = am1x1 + am2x2 + · · ·+ amnxn. Each of them is a polynomial of degree at most one. Thus, a linear transformation is a polynomial mapping of degree at most one. It is easy to deduce the following. Corollary 2.2 A mapping T : Rn → Rm is a linear transformation if and only if each component function is a linear transformation. The followings are some standard results about linear transformations. Theorem 2.3 If S : Rn → Rm and T : Rn → Rm are linear transformations with matrices A and B respectively, then for any real numbers α and β, αS + βT : Rn → Rm is a linear transformation with matrix αA+ βB. Theorem 2.4 If S : Rn → Rm and T : Rm → Rk are linear transformations with matrices A and B, then T ◦S : Rn → Rk is a linear transformation with matrix BA. Sketch of Proof This follows from (T ◦ S)(x) = T(S(x)) = B(Ax) = (BA)x. In the particular case when m = n, we have the following. Chapter 2. Limits of Multivariable Functions and Continuity 73 Theorem 2.5 Let T : Rn → Rn be a linear transformation represented by the matrix A. The following are equivalent. (a) The mapping T : Rn → Rn is one-to-one. (b) The mapping T : Rn → Rn is onto. (c) The matrix A is invertible. (d) detA ̸= 0. In other words, if the linear transformation T : Rn → Rn is one-to-one or onto, then it is bijective. In this case, the linear transformation is invertible, and we can define the inverse function T−1 : Rn → Rn. Theorem 2.6 Let T : Rn → Rn be an invertible linear transformation represented by the matrix A. Then the inverse mapping T−1 : Rn → Rn is also a linear transformation and T−1(x) = A−1x. Example 2.8 Let T : R2 → R2 be the linear transformation T(x, y) = (x− y, x+ y). 
The matrix associated with T is
A = [ 1 −1
      1  1 ].
Since det A = 2 ≠ 0, T is invertible. Since
A⁻¹ = (1/2) [  1  1
              −1  1 ],
we have T⁻¹(x, y) = ((x + y)/2, (−x + y)/2).

2.1.5 Quadratic Forms
Given an m × n matrix A = [aij], its transpose is the n × m matrix Aᵀ = [bij], where bij = aji for all 1 ≤ i ≤ n, 1 ≤ j ≤ m. An n × n matrix A is symmetric if A = Aᵀ. An n × n matrix P is orthogonal if PᵀP = PPᵀ = I. If the column vectors of P are v1, v2, . . ., vn, so that
P = [ v1 v2 · · · vn ],  (2.1)
then P is orthogonal if and only if {v1, . . ., vn} is an orthonormal set of vectors in Rn.

If A is an n × n symmetric matrix, its characteristic polynomial p(λ) = det(λIn − A) is a monic polynomial of degree n with n real roots λ1, λ2, . . ., λn, counted with multiplicities. These roots are called the eigenvalues of A. There is an orthonormal set of vectors {v1, . . ., vn} in Rn such that
Avi = λivi for all 1 ≤ i ≤ n.  (2.2)
Let D be the diagonal matrix
D = diag(λ1, λ2, . . ., λn),  (2.3)
and let P be the orthogonal matrix (2.1). Then (2.2) is equivalent to AP = PD, or equivalently,
A = PDPᵀ = PDP⁻¹.
This is known as the orthogonal diagonalization of the real symmetric matrix A.

A quadratic form in Rn is a polynomial function Q : Rn → R of the form
Q(x) = Σ_{1 ≤ i ≤ j ≤ n} cij xi xj.
An n × n symmetric matrix A = [aij] defines a quadratic form QA : Rn → R by
QA(x) = xᵀAx = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj.

Example 2.9
The symmetric matrix
A = [  1 −2
      −2  5 ]
defines the quadratic form QA(x, y) = x² − 4xy + 5y².

Conversely, given a quadratic form Q(x) = Σ_{1 ≤ i ≤ j ≤ n} cij xi xj, then Q = QA, where the entries of A = [aij] are
aij = cii if i = j, cij/2 if i < j, and cji/2 if i > j.
Thus, there is a one-to-one correspondence between quadratic forms and symmetric matrices.
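This correspondence is easy to verify numerically. The sketch below (Python, using the data of Example 2.9; the helper names are our own) evaluates QA(x) = xᵀAx directly from the matrix entries and checks that it reproduces the polynomial x² − 4xy + 5y²:

```python
# Quadratic form Q(x, y) = x^2 - 4xy + 5y^2 from Example 2.9 and its
# associated symmetric matrix: a_ii = c_ii, a_ij = a_ji = c_ij / 2 for i < j.
A = [[1.0, -2.0],
     [-2.0, 5.0]]

def Q_A(x):
    """Evaluate x^T A x for a vector x = (x_1, ..., x_n)."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def Q(x, y):
    return x * x - 4 * x * y + 5 * y * y

# The two expressions agree at sample points.
for (x, y) in [(1.0, 0.0), (0.0, 1.0), (2.0, -3.0), (0.5, 0.25)]:
    assert abs(Q_A([x, y]) - Q(x, y)) < 1e-12
print("Q_A(x) = x^T A x matches the polynomial at all sample points")
```
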
If A = PDPᵀ is an orthogonal diagonalization of A, then under the change of variables y = Pᵀx, or equivalently, x = Py, we find that
QA(x) = yᵀDy = λ1y1² + · · · + λnyn².  (2.4)
A consequence of (2.4) is the following.

Theorem 2.7
Let A be an n × n symmetric matrix, and let QA(x) = xᵀAx be the associated quadratic form. Let λ1, λ2, . . ., λn be the eigenvalues of A. Assume that λn ≤ · · · ≤ λ2 ≤ λ1. Then for any x ∈ Rn,
λn∥x∥² ≤ QA(x) ≤ λ1∥x∥².

Sketch of Proof
Given x ∈ Rn, let y = Pᵀx. Then
∥y∥² = yᵀy = xᵀPPᵀx = xᵀx = ∥x∥².
By (2.4), QA(x) = λ1y1² + · · · + λnyn². Since λn ≤ · · · ≤ λ2 ≤ λ1, we find that
λn(y1² + · · · + yn²) ≤ QA(x) ≤ λ1(y1² + · · · + yn²).
The assertion follows.

At the end of this section, let us recall the classification of quadratic forms.

Definiteness of Symmetric Matrices
Given an n × n symmetric matrix A = [aij], let QA : Rn → R,
QA(x) = xᵀAx = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj,
be the associated quadratic form.
1. We say that the matrix A is positive definite, or the quadratic form QA is positive definite, if QA(x) > 0 for all x ≠ 0 in Rn.
2. We say that the matrix A is negative definite, or the quadratic form QA is negative definite, if QA(x) < 0 for all x ≠ 0 in Rn.
3. We say that the matrix A is indefinite, or the quadratic form QA is indefinite, if there exist u and v in Rn such that QA(u) > 0 and QA(v) < 0.
4. We say that the matrix A is positive semi-definite, or the quadratic form QA is positive semi-definite, if QA(x) ≥ 0 for all x in Rn.
5. We say that the matrix A is negative semi-definite, or the quadratic form QA is negative semi-definite, if QA(x) ≤ 0 for all x in Rn.

Obviously, a symmetric matrix A is negative definite if and only if −A is positive definite. The following is a standard result in linear algebra, which can be deduced from (2.4).
Theorem 2.8 Let A be an n × n symmetric matrix, and let QA(x) = xTAx be the associated quadratic form. Let {λ1, . . . , λn} be the set of eigenvalues of A, repeated with multiplicities. (a) QA is positive definite if and only if λi > 0 for all 1 ≤ i ≤ n. (b) QA is negative definite if and only if λi < 0 for all 1 ≤ i ≤ n. (c) QA is indefinite if there exist i and j so that λi > 0 and λj < 0. (d) QA is positive semi-definite if and only if λi ≥ 0 for all 1 ≤ i ≤ n. (e) QA is negative semi-definite if and only if λi ≤ 0 for all 1 ≤ i ≤ n. From Theorem 2.7 and Theorem 2.8, we obtain the following. Chapter 2. Limits of Multivariable Functions and Continuity 78 Corollary 2.9 Let Q : Rn → R be a quadratic form. If Q is positive definite, then there exists a positive constant c such that Q(x) ≥ c∥x∥2 for all x ∈ Rn. In fact, c can be any positive number that is less than or equal to the smallest eigenvalue of the symmetric matrix A associated to the quadratic form Q. Chapter 2. Limits of Multivariable Functions and Continuity 79 2.2 Limits of Functions In this section, we study limits of multivariable functions. Definition 2.6 Limits of Functions Let D be a subset of Rn and let x0 be a limit point of D. Given a function F : D → Rm, we say that the limit of F(x) as x approaches x0 is v, provided that whenever {xk} is a sequence of points in D \ {x0} that converges to x0, the sequence {F(xk)} of points in Rm converges to the point v. If the limit of F : D → Rm as x approaches x0 is v, we write lim x→x0 F(x) = v. Example 2.10 For 1 ≤ i ≤ n, let πi : Rn → R be the projection function πi(x1, . . . , xn) = xi. By the theorem on componentwise convergence of sequences, if {xk} is a sequence in Rn \ {x0} that converges to the point x0, then lim k→∞ πi(xk) = πi(x0). This means that lim x→x0 πi(x) = πi(x0). From the theorem on componentwise convergence of sequences, we also obtain the following immediately. Chapter 2. 
Proposition 2.10
Let D be a subset of Rn and let x0 be a limit point of D. Given a function F : D → Rm,
lim x→x0 F(x) = v
if and only if for each 1 ≤ j ≤ m,
lim x→x0 Fj(x) = πj(v).

Example 2.11
Let f : Rn → R be the function defined as f(x) = ∥x∥. If x0 is a point in Rn, find lim x→x0 f(x).

Solution
We have shown in Example 1.15 that if {xk} is a sequence in Rn \ {x0} that converges to x0, then lim k→∞ ∥xk∥ = ∥x0∥. Therefore, lim x→x0 f(x) = ∥x0∥.

By the limit laws for sequences, we also have the following.

Proposition 2.11
Let F : D → Rm and G : D → Rm be functions defined on D ⊂ Rn. If x0 is a limit point of D and
lim x→x0 F(x) = u,  lim x→x0 G(x) = v,
then for any real numbers α and β,
lim x→x0 (αF + βG)(x) = αu + βv.

Proposition 2.12
Let f : D → R and g : D → R be functions defined on D ⊂ Rn. If x0 is a limit point of D and
lim x→x0 f(x) = u,  lim x→x0 g(x) = v,
then
lim x→x0 (fg)(x) = uv.
If g(x) ≠ 0 for all x ∈ D, and v ≠ 0, then
lim x→x0 (f/g)(x) = u/v.

Example 2.12
If k = (k1, . . ., kn) is an n-tuple of nonnegative integers, the monomial pk : Rn → R, pk(x) = x1^k1 · · · xn^kn, can be written as a product of the projection functions πi : Rn → R, πi(x) = xi, 1 ≤ i ≤ n. By Proposition 2.12,
lim x→x0 pk(x) = pk(x0)
for any x0 in Rn. If p : Rn → R is a polynomial, it is a finite linear combination of monomials. Proposition 2.11 then implies that for any x0 in Rn,
lim x→x0 p(x) = p(x0).
If f : D → R, f(x) = p(x)/q(x), is a rational function which is equal to the quotient of the polynomial p(x) by the polynomial q(x), then Proposition 2.12 implies that
lim x→x0 f(x) = f(x0)
for any x0 ∈ D = {x ∈ Rn | q(x) ≠ 0}.

Example 2.13
Find
lim (x,y)→(1,−1) (x² + 3xy + 2y²)/(x² + y²).
Solution Since lim (x,y)→(1,−1) (x2 + 3xy + 2y2) = 1− 3 + 2 = 0, lim (x,y)→(1,−1) (x2 + y2) = 1 + 1 = 2, we find that lim (x,y)→(1,−1) x2 + 3xy + 2y2 x2 + y2 = 0 2 = 0. It is easy to deduce the limit law for composite functions. Proposition 2.13 Let D be a subset of Rn, and let U be a subset of Rk. Given the two functions F : D → Rk and G : U → Rm, if F(D) ⊂ U , we can define the composite function H = G ◦ F : D → Rm by H(x) = G(F(x)). If x0 is a limit point of D, y0 is a limit point of U , F(D \ {x0}) ⊂ U \ {y0}, lim x→x0 F(x) = y0, lim y→y0 G(y) = v, then lim x→x0 H(x) = lim x→x0 (G ◦ F)(x) = v. The proof repeats verbatim the proof of the corresponding theorem for single variable functions. Example 2.14 Find the limit lim (x,y)→(0,0) sin(2x2 + 3y2) 2x2 + 3y2 . Chapter 2. Limits of Multivariable Functions and Continuity 83 Figure 2.1: The function f(x, y) = x2 + 3xy + 2y2 x2 + y2 in Example 2.13. Solution Since lim (x,y)→(0,0) (2x2 + 3y2) = 2× 0 + 3× 0 = 0, lim u→0 sinu u = 1, the limit law for composite functions implies that lim (x,y)→(0,0) sin(2x2 + 3y2) 2x2 + 3y2 = 1. Figure 2.2: The function f(x, y) = sin(2x2 + 3y2) 2x2 + 3y2 in Example 2.14. Let us look at some examples where the rules we have studied cannot be applied. Chapter 2. Limits of Multivariable Functions and Continuity 84 Example 2.15 Determine whether the limit lim (x,y)→(0,0) x2 − 2y2 x2 + y2 exists. Solution Let f(x, y) = x2 − 2y2 x2 + y2 = p(x, y) q(x, y) . When (x, y) → (0, 0), q(x, y) = x2 + y2 → 0. Hence, we cannot apply limit law for quotients of functions. Consider the sequences of points {uk} and {vk} in R2 \ {0, 0} given by uk = ( 1 k , 0 ) , vk = ( 0, 1 k ) . Notice that both the sequences {uk} and {vk} converge to (0, 0). If lim (x,y)→(0,0) f(x, y) = a, then both the sequences {f(uk)} and {f(vk)} should converge to a. Since f(uk) = 1, f(vk) = −2 for all k ∈ Z+, the sequence {f(uk)} converges to 1, while the sequence {f(vk)} converges to −2. 
These imply that a = 1 and a = −2, which is a contradiction. Hence, the limit lim (x,y)→(0,0) x2 − 2y2 x2 + y2 does not exist. Example 2.16 Determine whether the limit lim (x,y)→(0,0) xy x2 + 2y2 exists. Chapter 2. Limits of Multivariable Functions and Continuity 85 Figure 2.3: The function f(x, y) = x2 − 2y2 x2 + y2 in Example 2.15. Solution Let f(x, y) = xy x2 + 2y2 . Consider the sequences of points {uk} and {vk} in R2 \ {0, 0}given by uk = ( 1 k , 0 ) , vk = ( 1 k , 1 k ) , Notice that both the sequences {uk} and {vk} converge to (0, 0). If lim (x,y)→(0,0) f(x, y) = a, then both the sequences {f(uk)} and {f(vk)} should converge to a. Since f(uk) = 0, f(vk) = 1 3 for all k ∈ Z+, the sequence {f(uk)} converges to 0, while the sequence {f(vk)} converges to 1/3. These imply that a = 0 and a = 1/3, which is a contradiction. Hence, the limit lim (x,y)→(0,0) xy x2 + 2y2 does not exist. Chapter 2. Limits of Multivariable Functions and Continuity 86 Figure 2.4: The function f(x, y) = xy x2 + 2y2 in Example 2.16. Example 2.17 Determine whether the limit lim (x,y)→(0,0) xy2 x2 + 2y4 exists. Solution Let f(x, y) = xy2 x2 + 2y4 . Consider the sequences of points {uk} and {vk} in R2 \ {0, 0} given by uk = ( 1 k , 0 ) , vk = ( 1 k2 , 1 k ) , Notice that both the sequences {uk} and {vk} converge to (0, 0). If lim (x,y)→(0,0) f(x, y) = a, then both the sequences {f(uk)} and {f(vk)} should converge to a. Since f(uk) = 0, f(vk) = 1 3 for all k ∈ Z+, the sequence {f(uk)} converges to 0, while the sequence {f(vk)} converges to 1/3. These imply that a = 0 and a = 1/3, which is a contradiction. Hence, the limit lim (x,y)→(0,0) xy2 x2 + 2y4 does not exist. Chapter 2. Limits of Multivariable Functions and Continuity 87 Figure 2.5: The function f(x, y) = xy2 x2 + 2y4 in Example 2.17. Example 2.18 Determine whether the limit lim (x,y)→(0,0) xy2 x2 + 2y2 exists. Solution Let f(x, y) = xy2 x2 + 2y2 . 
If {(xk, yk)} is a sequence of points in R2 \ {(0, 0)} that converges to (0, 0), then
|f(xk, yk)| = |xk| · yk²/(xk² + 2yk²) ≤ |xk|.
The sequence {xk} converges to 0. By the squeeze theorem, the sequence {f(xk, yk)} also converges to 0. This proves that
lim (x,y)→(0,0) xy²/(x² + 2y²) = 0.

Similar to the single variable case, there is an equivalent definition of limits in terms of ε and δ.

Figure 2.6: The function f(x, y) = xy²/(x² + 2y²) in Example 2.18.

Theorem 2.14 Equivalent Definitions for Limits
Let D be a subset of Rn, and let x0 be a limit point of D. Given a function F : D → Rm, the following two definitions for lim x→x0 F(x) = v are equivalent.
(i) Whenever {xk} is a sequence of points in D \ {x0} that converges to x0, the sequence {F(xk)} converges to v.
(ii) For any ε > 0, there is a δ > 0 such that if the point x is in D and 0 < ∥x − x0∥ < δ, then ∥F(x) − v∥ < ε.

Proof
We will prove that if (ii) holds, then (i) holds; and if (ii) does not hold, then (i) does not hold either.
First assume that (ii) holds. If {xk} is a sequence in D \ {x0} that converges to the point x0, we need to show that the sequence {F(xk)} converges to v. Given ε > 0, (ii) implies that there is a δ > 0 such that for every x in D \ {x0} with ∥x − x0∥ < δ, we have ∥F(x) − v∥ < ε. Since {xk} converges to x0, there is a positive integer K such that for all k ≥ K, ∥xk − x0∥ < δ. Therefore, for all k ≥ K, ∥F(xk) − v∥ < ε. This shows that the sequence {F(xk)} indeed converges to v.
Now assume that (ii) does not hold. Then there is an ε > 0 such that for every δ > 0, there is a point x in D \ {x0} with ∥x − x0∥ < δ but ∥F(x) − v∥ ≥ ε. For this ε > 0, we construct a sequence {xk} in D \ {x0} in the following way. For each positive integer k, there is a point xk in D \ {x0} such that ∥xk − x0∥ < 1/k but ∥F(xk) − v∥ ≥ ε. Then {xk} is a sequence in D \ {x0} that satisfies ∥xk − x0∥ < 1/k for all k ∈ Z+.
Hence, it converges to x0. Since ∥F(xk) − v∥ ≥ ε for all k ∈ Z+, the sequence {F(xk)} cannot converge to v. This proves that (i) does not hold. We can give an alternative solution to Example 2.18 as follows. Alternative Solution to Example 2.18 Let f(x, y) = xy2 x2 + 2y2 . Given ε > 0, let δ = ε. If (x, y) is a point in R2 \ {(0, 0)} such that√ x2 + y2 = ∥(x, y)− (0, 0)∥ < δ = ε, then |x| < ε. This implies that |f(x, y)− 0| = |x| y2 x2 + 2y2 ≤ |x| < ε. Hence, lim (x,y)→(0,0) xy2 x2 + 2y2 = 0. Chapter 2. Limits of Multivariable Functions and Continuity 90 Exercises 2.2 Question 1 Determine whether the limit exists. If it exists, find the limit. (a) lim (x,y)→(1,2) 4x2 − y2 x2 + y2 (b) lim (x,y)→(1,2) √ 4x2 − y2 x2 + y2 (c) lim (x,y)→(1,2) √ 4x2 + y2 x2 + y2 Question 2 Determine whether the limit exists. If it exists, find the limit. (a) lim (x,y)→(0,0) x3 + y3 x2 + y2 (b) lim (x,y)→(0,0) x2 + y3 x2 + y2 (c) lim (x,y)→(0,0) e4x 2+y2 − 1 4x2 + y2 (d) lim (x,y)→(0,0) ex 2+y2 − 1 4x2 + y2 Question 3 Determine whether the limit lim (x,y)→(0,0) x2 + 4y4 4x2 + y4 exists. If it exists, find the limit. Chapter 2. Limits of Multivariable Functions and Continuity 91 Question 4 Determine whether the limit lim (x,y)→(1,1) cos(x2 + y2 − 2)− 1 (x2 + y2 − 2)2 exists. If it exists, find the limit. Question 5 Let x0 be a point in Rn. Find the limit lim x→x0 x ∥x∥ . Question 6 Let D be a subset of Rn, and let f : D → R and G : D → Rm be functions defined on D. We can define the function H : D → Rm by H(x) = f(x)G(x) for all x ∈ D. If x0 is a point in D and lim x→x0 f(x) = a, lim x→x0 G(x) = v, show that lim x→x0 H(x) = av. Chapter 2. Limits of Multivariable Functions and Continuity 92 2.3 Continuity The definition of continuity is a direct generalization of the single variable case. Definition 2.7 Continuity Let D be a subset of Rn that contains the point x0, and let F : D → Rm be a function defined on D. 
We say that the function F is continuous at x0 provided that whenever {xk} is a sequence of points in D that converges to x0, the sequence {F(xk)} converges to F(x0). We say that F : D → Rm is a continuous function if it is continuous at every point of its domain D. From the definition, we obtain the following immediately. Proposition 2.15 Limits and Continuity Let D be a subset of Rn that contains the point x0, and let F : D → Rm be a function defined on D. 1. If x0 is an isolated point of D, then F is continuous at x0. 2. If x0 is a limit point of D, then F is continuous at x0 if and only if lim x→x0 F(x) = F(x0). Example 2.19 Example 2.10 says that for each 1 ≤ i ≤ n, the projection function πi : Rn → R, πi(x) = xi, is a continuous function. Example 2.20 Example 2.11 says that the norm function f : Rn → R, f(x) = ∥x∥, is a continuous function. From Proposition 2.10, we have the following. Chapter 2. Limits of Multivariable Functions and Continuity 93 Proposition 2.16 Let D be a subset of Rn that contains the point x0, and let F : D → Rm be a function defined on D. The function F : D → Rm is continuous at x0 if and only if each of the component functions Fj = (πj ◦ F) : D → R, 1 ≤ j ≤ m, is continuous at x0. Example 2.21 The function F : R3 → R2, F(x, y, z) = (x, z), is a continuous function since each component function is continuous. Proposition 2.11 gives the following. Proposition 2.17 Let F : D → Rm and G : D → Rm be functions defined on D ⊂ Rn, and let x0 be a point in D. If F : D → Rm and G : D → Rm are continuous at x0, then for any real numbers α and β, the function (αF+ βG) : D → Rm is continuous at x0. Proposition 2.12 gives the following. Proposition 2.18 Let f : D → R and g : D → R be functions defined on D ⊂ Rn, and let x0 be a point in D. Assume that the functions f : D → R and g : D → R are continuous at x0. 1. The function (fg) : D → R is continuous at x0. 2. If g(x) ̸= 0 for all x ∈ D, then the function (f/g) : D → R is continuous at x0. 
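The sequential definition of continuity is easy to probe numerically. In the sketch below (Python; the functions f, g and the base point are our own illustrative choices, with g chosen to be nonvanishing so that the quotient of Proposition 2.18 is defined everywhere), the values of f/g along a sequence converging to x0 approach (f/g)(x0):

```python
# Sequential continuity of a quotient f/g at x0 = (1, 2).
# f, g and the sample sequence are illustrative choices; g > 0 everywhere.
def f(x, y): return x * y
def g(x, y): return x * x + 2 * y * y + 1.0

def quotient(x, y): return f(x, y) / g(x, y)

x0, y0 = 1.0, 2.0
limit = quotient(x0, y0)  # (f/g)(x0) = 2/10 = 0.2

# The points x_k = (x0 + 1/k, y0 - 1/k) converge to x0 as k grows.
errors = [abs(quotient(x0 + 1.0 / k, y0 - 1.0 / k) - limit)
          for k in (1, 10, 100, 1000)]

# The error shrinks as k grows, consistent with continuity at x0.
assert all(e2 < e1 for e1, e2 in zip(errors, errors[1:]))
print(errors)
```
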
Example 2.12 gives the following. Chapter 2. Limits of Multivariable Functions and Continuity 94 Proposition 2.19 Polynomials and rational functions are continuous functions. Since each component of a linear transformation T : Rn → Rm is a polynomial, we have the following. Proposition 2.20 A linear transformation T : Rn → Rm is a continuous function. Since a quadratic form Q : Rn → R is a polynomial, we have the following. Proposition 2.21 A quadraticform Q : Rn → R given by Q(x) = n∑ i=1 n∑ j=1 aijxixj is a continuous function. The following is obvious from the definition of continuity. Proposition 2.22 Let D be a subset of Rn, and let F : D → Rm be a function that is continuous at the point x0 ∈ D. If D1 is a subset of D that contains x0, then the function F : D1 → Rm is also continuous at x0. Example 2.22 Let D be the set D = { (x, y) |x2 + y2 < 1 } , and let f : D → R be the function defined as f(x, y) = xy 1− x2 − y2 . Chapter 2. Limits of Multivariable Functions and Continuity 95 Since f1(x, y) = xy and f2(x, y) = 1 − x2 − y2 are polynomials, they are continuous. Since f2(x, y) ̸= 0 for all (x, y) ∈ D, f : D → R is a continuous function. Figure 2.7: The function f(x, y) = xy 1− x2 − y2 in Example 2.22. Proposition 2.13 implies the following. Proposition 2.23 Let D be a subset of Rn, and let U be a subset of Rk. If F : D → Rk and G : U → Rm are functions such that F(D) ⊂ U , F : D → Rk is continuous at x0, G : U → Rm is continuous at y0, then the composite function H = (G ◦ F) : D → Rm is continuous at x0. A direct proof of this theorem using the definition of continuity is actually much simpler. Proof If {xk} is a sequence of points in D that converges to x0, then since F : D → Rk is continuous at x0, {F(xk)} is a sequence of points in U that converges to y0. Since G : U → Rm is continuous at y0, {G(F(xk))} is a sequence of points in Rm that converges to G(y0) = G(F(x0)). Chapter 2. 
Limits of Multivariable Functions and Continuity 96 In other words, the sequence {H(xk)} converges to H(x0). This shows that the function H = (G ◦ F) : D → Rm is continuous at x0. Figure 2.8: Composition of functions. Corollary 2.24 Let D be a subset of Rn, and let x0 be a point in D. If the function F : D → Rm is continuous at x0 ∈ D, then the function ∥F∥ : D → R is also continuous at x0. Figure 2.9: The function f(x, y) = |x2 − y2|. Chapter 2. Limits of Multivariable Functions and Continuity 97 Example 2.23 The function f : R2 → R, f(x, y) = |x2 − y2| is a continuous function since f(x, y) = |p(x, y)|, where p(x, y) = x2−y2 is a polynomial function, which is continuous. Example 2.24 Consider the function f : R2 → R, f(x, y) = √ e2xy + x2 + y2. Notice that f(x, y) = ∥F(x, y)∥, where F : R2 → R3 is the function given by F(x, y) = (exy, x, y) . Since g(x, y) = xy is a polynomial function, it is continuous. Being a composition of the continuous function h(x) = ex with the continuous function g(x, y) = xy, F1(x, y) = (h ◦ g)(x, y) = exy is a continuous function. The functions F2(x, y) = x and F3(x, y) = y are continuous functions. Hence, F : R2 → R3 is a continuous function. This implies that f : R2 → R is also a continuous function. Figure 2.10: The function f(x, y) = √ e2xy + x2 + y2. Chapter 2. Limits of Multivariable Functions and Continuity 98 Example 2.25 We have shown in volume I that the function f : R → R, f(x) = sinx x , if x ̸= 0, 1, if x = 0, is a continuous function. Define the function h : R3 → R by h(x, y, z) = sin(x2 + y2 + z2) x2 + y2 + z2 , if (x, y, z) ̸= (0, 0, 0), 1, if (x, y, z) = (0, 0, 0). Since h = f ◦ g, where g : R3 → R is the polynomial function g(x, y, z) = x2 + y2 + z2, which is continuous, the function h : R3 → R is continuous. The following gives an equivalent definition of continuity in terms of ε and δ. Theorem 2.25 Equivalent Definitions of Continuity Let D be a subset of Rn, and let x0 be a limit point of D. 
Given a function F : D → Rm, the following two definitions for the continuity of F at x0 are equivalent.
(i) Whenever {xk} is a sequence of points in D that converges to x0, the sequence {F(xk)} converges to F(x0).
(ii) For any ε > 0, there is a δ > 0 such that if the point x is in D and ∥x − x0∥ < δ, then ∥F(x) − F(x0)∥ < ε.

The proof is left as an exercise. Notice that statement (ii) can be reformulated as follows. For any ε > 0, there is a δ > 0 such that if the point x is in D and x ∈ B(x0, δ), then F(x) ∈ B(F(x0), ε).

Now we want to explore another important property of continuity.

Figure 2.11: The definition of continuity in terms of ε and δ.

Theorem 2.26
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O. The following are equivalent.
(a) F : O → Rm is continuous.
(b) For every open subset V of Rm, F⁻¹(V) is an open subset of Rn.

Note that for this theorem to hold, it is important that the domain of the function F is an open set.

Proof
Assume that (a) holds. Let V be an open subset of Rm, and let U = F⁻¹(V) = {x ∈ O | F(x) ∈ V}. We need to show that U is an open subset of Rn. If x0 is in U, then it is in O. Since O is open, there exists r0 > 0 such that B(x0, r0) ⊂ O. Since y0 = F(x0) is in V and V is open, there exists ε > 0 such that B(y0, ε) ⊂ V. By (a), there exists δ > 0 such that for any x ∈ O, if ∥x − x0∥ < δ, then ∥F(x) − F(x0)∥ < ε. Take r = min{δ, r0}. Then r > 0, r ≤ r0 and r ≤ δ. If x is in B(x0, r), then x ∈ O and ∥x − x0∥ < r ≤ δ. It follows that ∥F(x) − F(x0)∥ < ε. This implies that F(x) ∈ B(y0, ε) ⊂ V. Thus, x ∈ U. In other words, we have shown that B(x0, r) is contained in U. This proves that U is open, which is the assertion of (b).
Conversely, assume that (b) holds. Let x0 be a point in O, and let y0 = F(x0). Given ε > 0, the ball V = B(y0, ε) is an open subset of Rm.
By (b), U = F⁻¹(V) is open in Rⁿ. By definition, U is a subset of O. Since F(x0) is in V, x0 is in U. Since U is open and it contains x0, there is an r > 0 such that B(x0, r) ⊂ U.

Take δ = r. Then if x is a point in O and ∥x − x0∥ < r, x ∈ B(x0, r) ⊂ U. This implies that F(x) ∈ V = B(y0, ε). Namely, ∥F(x) − F(x0)∥ < ε. This proves that F : O → Rᵐ is continuous at x0. Since x0 is an arbitrary point in O, F : O → Rᵐ is continuous.

Using the fact that a set is open if and only if its complement is closed, it is natural to expect the following.

Theorem 2.27
Let A be a closed subset of Rⁿ, and let F : A → Rᵐ be a function defined on A. The following are equivalent.

(a) F : A → Rᵐ is continuous.
(b) For every closed subset C of Rᵐ, F⁻¹(C) is a closed subset of Rⁿ.

Proof
Assume that (a) holds. Let C be a closed subset of Rᵐ, and let

    D = F⁻¹(C) = {x ∈ A | F(x) ∈ C}.

We need to show that D is a closed subset of Rⁿ. If {xk} is a sequence in D that converges to the point x0 in Rⁿ, since D ⊂ A and A is closed, x0 is in A. Since F is continuous at x0, the sequence {F(xk)} is a sequence in C that converges to the point F(x0) in Rᵐ. Since C is closed, F(x0) is in C. Therefore, x0 is in D. This proves that D is closed.

Conversely, assume that (a) does not hold. Then F : A → Rᵐ is not continuous at some x0 ∈ A. Thus, there exists ε > 0 such that for any δ > 0, there exists a point x in A ∩ B(x0, δ) such that ∥F(x) − F(x0)∥ ≥ ε. For k ∈ Z⁺, let xk be a point in A ∩ B(x0, 1/k) such that ∥F(xk) − F(x0)∥ ≥ ε. Since ∥xk − x0∥ < 1/k for all k ∈ Z⁺, the sequence {xk} is a sequence in A that converges to x0. Let

    C = {y ∈ Rᵐ | ∥y − F(x0)∥ ≥ ε}.

Then C is the complement of the open set B(F(x0), ε). Hence, C is closed. It contains F(xk) for all k ∈ Z⁺, but it does not contain F(x0). Thus, the set D = F⁻¹(C) contains the sequence {xk}, but does not contain its limit x0. This means D is not closed. Therefore, (b) does not hold.
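The sequential criterion used in this proof can be checked numerically. The sketch below (a hypothetical check, not part of the text) feeds points xk approaching x0 into the continuous map F(x, y) = (e^(xy), x, y) of Example 2.24 and watches ∥F(xk) − F(x0)∥ shrink:

```python
import math

# F(x, y) = (e^(xy), x, y), the continuous map from Example 2.24.
def F(x, y):
    return (math.exp(x * y), x, y)

# Euclidean distance in R^3.
def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

x0 = (1.0, 2.0)
# x_k = x_0 + (1/k, -1/k) converges to x_0 as k grows.
for k in [10, 100, 1000, 10000]:
    xk = (x0[0] + 1 / k, x0[1] - 1 / k)
    # ||F(x_k) - F(x_0)|| shrinks as x_k approaches x_0.
    print(k, dist(F(*xk), F(*x0)))
```

The printed distances decrease roughly like 1/k, consistent with F(xk) → F(x0).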
There is a much easier proof of Theorem 2.27 if A = Rⁿ, using Theorem 2.26 and the fact that a set is closed if and only if its complement is open.

Theorem 2.26 and Theorem 2.27 provide useful tools to justify that a set is open or closed in Rⁿ, using our known library of continuous functions.

Example 2.26
Let A be the subset of R² given by

    A = {(x, y) | x² + y² < 20, y > x²}.

Show that A is open.

Solution
Let O = {(x, y) | x² + y² < 20}. This is a ball of radius √20 centered at the origin. Hence, O is open. Define the function f : O → R by f(x, y) = y − x². Since f is a polynomial, it is continuous. Notice that y > x² if and only if f(x, y) > 0, if and only if f(x, y) ∈ (0, ∞). This shows that A = f⁻¹((0, ∞)). Since (0, ∞) is open in R, Theorem 2.26 implies that A is an open set.

Figure 2.12: The set A in Example 2.26.

Example 2.27
Let C be the subset of R³ given by

    C = {(x, y, z) | x ≥ 0, y ≥ 0, y² + z² ≤ 20}.

Show that C is closed.

Solution
Let πx : R³ → R and πy : R³ → R be the projection functions πx(x, y, z) = x and πy(x, y, z) = y, and consider the function g : R³ → R defined as g(x, y, z) = 20 − (y² + z²). Notice that y² + z² ≤ 20 if and only if g(x, y, z) ≥ 0, if and only if g(x, y, z) ∈ I = [0, ∞). The projection functions πx and πy are continuous. Since g is a polynomial, it is also continuous. The set I = [0, ∞) is closed in R. Therefore, the sets πx⁻¹(I), πy⁻¹(I) and g⁻¹(I) are closed in R³. Since

    C = πx⁻¹(I) ∩ πy⁻¹(I) ∩ g⁻¹(I),

being an intersection of three closed sets, C is closed in R³.

Using the same reasoning, we obtain the following.

Theorem 2.28
Let I1, . . ., In be intervals in R.

1. If each of I1, . . ., In is an open interval of the form (a, b), (a, ∞), (−∞, a) or R, then I1 × · · · × In is an open subset of Rⁿ.

2. If each of I1, . . .
, In is a closed interval of the form [a, b], [a, ∞), (−∞, a] or R, then I1 × · · · × In is a closed subset of Rⁿ.

Sketch of Proof
Use the fact that

    I1 × · · · × In = ⋂_{i=1}^{n} πi⁻¹(Ii),

where πi : Rⁿ → R is the projection function πi(x1, . . ., xn) = xi.

Example 2.28
The set A = {(x, y, z) | x < 0, y > 2, −10 < z < −3} is open in R³, since A = (−∞, 0) × (2, ∞) × (−10, −3). The set C = {(x, y, z) | x ≤ 0, y ≥ 2, −10 ≤ z ≤ −3} is closed in R³, since C = (−∞, 0] × [2, ∞) × [−10, −3].

We also have the following.

Theorem 2.29
Let a and b be real numbers, and assume that f : Rⁿ → R is a continuous function. Define the sets A, B, C, D, E and F as follows.

(a) A = {x ∈ Rⁿ | f(x) > a}
(b) B = {x ∈ Rⁿ | f(x) ≥ a}
(c) C = {x ∈ Rⁿ | f(x) < a}
(d) D = {x ∈ Rⁿ | f(x) ≤ a}
(e) E = {x ∈ Rⁿ | a < f(x) < b}
(f) F = {x ∈ Rⁿ | a ≤ f(x) ≤ b}

Then A, C and E are open sets, while B, D and F are closed sets.

The proof is left as an exercise.

Example 2.29
Find the interior, exterior and boundary of each of the following sets.

(a) A = {(x, y) | 0 < x² + 4y² < 4}
(b) B = {(x, y) | 0 < x² + 4y² ≤ 4}
(c) C = {(x, y) | x² + 4y² ≤ 4}

Figure 2.13: The sets A, B and C defined in Example 2.29.

Solution
Let D = {(x, y) | x² + 4y² < 4}, E = {(x, y) | x² + 4y² > 4}, and let f : R² → R be the function defined as f(x, y) = x² + 4y². Since f is a polynomial, it is continuous. By Theorem 2.29, A, D and E are open sets and C is a closed set.

Since A ⊂ B and D ⊂ C, we have

    A = int A ⊂ int B ⊂ B,    D ⊂ int C.

Since E = R² \ C ⊂ R² \ B ⊂ R² \ A, we have

    E = ext C ⊂ ext B ⊂ ext A.

Let F = {(x, y) | x² + 4y² = 4}. Then R² is a disjoint union of D, E and F. If u0 = (x0, y0) ∈ F, then at least one of x0 and y0 is nonzero. If x0 ≠ 0, define the sequences {uk} and {vk} by

    uk = ( (k/(k+1)) x0, y0 ),    vk = ( ((k+1)/k) x0, y0 ).

If x0 = 0, then y0 ≠ 0.
Define the sequences {uk} and {vk} by

    uk = ( x0, (k/(k+1)) y0 ),    vk = ( x0, ((k+1)/k) y0 ).

In either case, {uk} is a sequence of points in A that converges to u0, while {vk} is a sequence of points in E that converges to u0. This proves that u0 is a boundary point of A, B and C.

For the point 0, since it is not in A and B, it is not an interior point of A and B, but it is the limit of the sequence {(1/k, 0)}, which lies in both A and B. Hence, 0 is in the closure of A and B, and hence is a boundary point of A and B. We conclude that

    int A = int B = {(x, y) | 0 < x² + 4y² < 4},
    int C = {(x, y) | x² + 4y² < 4},
    ext A = ext B = ext C = {(x, y) | x² + 4y² > 4},
    bd A = bd B = {(x, y) | x² + 4y² = 4} ∪ {0},
    bd C = {(x, y) | x² + 4y² = 4}.

Remark 2.1
Let f : Rⁿ → R be a continuous function and let

    C = {x ∈ Rⁿ | a ≤ f(x) ≤ b}.

One is tempted to say that

    bd C = {x ∈ Rⁿ | f(x) = a or f(x) = b}.

This is not necessarily true. For example, consider the set C in Example 2.29. It can be written as

    C = {(x, y) | 0 ≤ x² + 4y² ≤ 4}.

However, the point where f(x, y) = x² + 4y² = 0 is not a boundary point of C.

Now we return to continuous functions.

Theorem 2.30 Pasting of Continuous Functions
Let A and B be closed subsets of Rⁿ, and let S = A ∪ B. If F : S → Rᵐ is a function such that F_A = F|_A : A → Rᵐ and F_B = F|_B : B → Rᵐ are both continuous, then F : S → Rᵐ is continuous.

Proof
Since S is a union of two closed sets, it is closed. Applying Theorem 2.27, it suffices to show that if C is a closed subset of Rᵐ, then F⁻¹(C) is closed in Rⁿ. Notice that

    F⁻¹(C) = {x ∈ S | F(x) ∈ C}
           = {x ∈ A | F(x) ∈ C} ∪ {x ∈ B | F(x) ∈ C}
           = F_A⁻¹(C) ∪ F_B⁻¹(C).

Since F_A : A → Rᵐ and F_B : B → Rᵐ are both continuous functions, F_A⁻¹(C) and F_B⁻¹(C) are closed subsets of Rⁿ. Being a union of two closed subsets, F⁻¹(C) is closed. This completes the proof.
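The boundary argument of Example 2.29 can also be checked numerically. The sketch below (a hypothetical check) takes the boundary point u0 = (√2, √(1/2)) on the ellipse x² + 4y² = 4, where both coordinates are nonzero, and verifies that the sequences uk and vk defined in that example land in A and E respectively:

```python
import math

# f(x, y) = x^2 + 4y^2, the continuous function from Example 2.29.
def g(x, y):
    return x * x + 4 * y * y

# A hypothetical boundary point u0 on the ellipse: g(x0, y0) = 2 + 2 = 4.
x0, y0 = math.sqrt(2), math.sqrt(0.5)
assert abs(g(x0, y0) - 4) < 1e-12

for k in [1, 10, 100, 1000]:
    uk = (k / (k + 1) * x0, y0)   # shrink the x-coordinate: lands in A
    vk = ((k + 1) / k * x0, y0)   # stretch the x-coordinate: lands in E
    assert 0 < g(*uk) < 4         # uk is in A = {0 < g < 4}
    assert g(*vk) > 4             # vk is in E = {g > 4}

# Both sequences converge to u0, so u0 is a boundary point of A.
print(g(1000 / 1001 * x0, y0))    # approaches 4 as k grows
```

Since both sequences converge to u0, this illustrates why every point of the ellipse is a boundary point of A, B and C.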
Example 2.30
Let f : R² → R be the function defined as

    f(x, y) = { x² + y², if x² + y² < 1,
              { 1,       if x² + y² ≥ 1.

Show that f is a continuous function.

Solution
Let A = {(x, y) | x² + y² ≤ 1} and B = {(x, y) | x² + y² ≥ 1}. Then A and B are closed subsets of R² and R² = A ∪ B. Notice that f|_A : A → R is the function f(x, y) = x² + y², which is continuous since it is a polynomial. By definition, f|_B : B → R is the constant function f_B(x, y) = 1, which is also continuous. By Theorem 2.30, the function f : R² → R is continuous.

Given positive integers n and m, there is a natural bijective correspondence between Rⁿ × Rᵐ and R^(n+m) given by

    T : Rⁿ × Rᵐ → R^(n+m),    (x, y) ↦ (x1, . . ., xn, y1, . . ., ym),

where x = (x1, . . ., xn) and y = (y1, . . ., ym). Hence, sometimes we will denote a point in R^(n+m) as (x, y), where x ∈ Rⁿ and y ∈ Rᵐ. By the generalized Pythagoras theorem,

    ∥(x, y)∥² = ∥x∥² + ∥y∥².

If A is a subset of Rⁿ and B is a subset of Rᵐ, A × B can be considered as a subset of R^(n+m) given by

    A × B = {(x, y) | x ∈ A, y ∈ B}.

The following is more general than Proposition 2.16.

Proposition 2.31
Let D be a subset of Rⁿ, and let F : D → Rᵏ and G : D → Rˡ be functions defined on D. Define the function H : D → R^(k+l) by H(x) = (F(x), G(x)). Then the function H : D → R^(k+l) is continuous if and only if the functions F : D → Rᵏ and G : D → Rˡ are continuous.

Sketch of Proof
This proposition follows immediately from Proposition 2.16, since

    H(x) = (F1(x), . . ., Fk(x), G1(x), . . ., Gl(x)).

For a function defined on a subset of Rⁿ, we can define its graph in the following way.

Definition 2.8 The Graph of a Function
Let F : D → Rᵐ be a function defined on D ⊂ Rⁿ. The graph of F, denoted by G_F, is the subset of R^(n+m) defined as

    G_F = {(x, y) | x ∈ D, y = F(x)}.

Example 2.31
Let D = {(x, y) | x² + y² ≤ 1}, and let f : D → R be the function defined as f(x, y) = √(1 − x² − y²).
The graph of f is

    G_f = {(x, y, z) | x² + y² ≤ 1, z = √(1 − x² − y²)},

which is the upper hemisphere.

Figure 2.14: The upper hemisphere is the graph of a function.

Notice that if D is a subset of Rⁿ, then the graph of the function F : D → Rᵐ is the image of the function H : D → R^(n+m) defined as H(x) = (x, F(x)). From Proposition 2.31, we obtain the following.

Corollary 2.32
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. The image of the function H : D → R^(n+m), H(x) = (x, F(x)), is the graph of F. If the function F : D → Rᵐ is continuous, then the function H : D → R^(n+m) is continuous.

Now we consider a special class of functions called Lipschitz functions.

Definition 2.9
Let D be a subset of Rⁿ. A function F : D → Rᵐ is Lipschitz provided that there exists a positive constant c such that

    ∥F(u) − F(v)∥ ≤ c∥u − v∥ for all u, v ∈ D.

The constant c is called a Lipschitz constant of the function. If c < 1, then F : D → Rᵐ is called a contraction.

The following is easy to establish.

Proposition 2.33
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a Lipschitz function. Then F : D → Rᵐ is continuous.

Example 2.32
A linear transformation of the form T : Rⁿ → Rⁿ, T(x) = ax, is a Lipschitz function with Lipschitz constant |a|.

In fact, we have the following.

Theorem 2.34
A linear transformation T : Rⁿ → Rᵐ is a Lipschitz function.

Proof
Let A be the m × n matrix such that T(x) = Ax. When x is in Rⁿ,

    ∥T(x)∥² = (Ax)ᵀ(Ax) = xᵀ(AᵀA)x.

The matrix B = AᵀA is a positive semi-definite n × n symmetric matrix. By Theorem 2.7,

    xᵀ(AᵀA)x ≤ λmax∥x∥²,

where λmax is the largest eigenvalue of AᵀA. Therefore, for any x ∈ Rⁿ,

    ∥T(x)∥ ≤ √λmax ∥x∥.

It follows that for any u and v in Rⁿ,

    ∥T(u) − T(v)∥ = ∥T(u − v)∥ ≤ √λmax ∥u − v∥.

Hence, T : Rⁿ → Rᵐ is a Lipschitz mapping with Lipschitz constant √λmax.
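Theorem 2.34 identifies √λmax, where λmax is the largest eigenvalue of AᵀA, as a Lipschitz constant of T(x) = Ax. The following numeric sketch estimates this constant for a hypothetical matrix using a plain power iteration on AᵀA (the iteration itself is an illustration, not a method from the text):

```python
import math

# A hypothetical 2x2 matrix; A^T A = [[25, 20], [20, 25]] has eigenvalues 45 and 5,
# so the optimal Lipschitz constant of T(x) = Ax is sqrt(45).
A = [[3.0, 0.0], [4.0, 5.0]]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

# B = A^T A is symmetric positive semi-definite.
B = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

x = [1.0, 0.0]
for _ in range(200):              # power iteration converges to the top eigenvector
    x = matvec(B, x)
    n = norm(x)
    x = [t / n for t in x]

lam_max = norm(matvec(B, x))      # for a unit eigenvector x, ||Bx|| = lambda_max
c = math.sqrt(lam_max)
print(round(c, 4))                # prints 6.7082, i.e. sqrt(45)
```

The iteration converges quickly here because the two eigenvalues of AᵀA are well separated.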
Example 2.33
Let T : R² → R² be the mapping defined as T(x, y) = (x − 3y, 7x + 4y). Find the smallest constant c such that ∥T(u) − T(v)∥ ≤ c∥u − v∥ for all u and v in R².

Solution
Notice that T(u) = Au, where A is the 2 × 2 matrix A = [1 −3; 7 4]. Hence, ∥T(u)∥² = uᵀAᵀAu = uᵀCu, where

    C = [1 7; −3 4][1 −3; 7 4] = [50 25; 25 25] = 25 [2 1; 1 1].

For the matrix G = [2 1; 1 1], the eigenvalues are the solutions of λ² − 3λ + 1 = 0, which are λ1 = (3 + √5)/2 and λ2 = (3 − √5)/2. Hence,

    ∥T(u)∥² ≤ (25(3 + √5)/2) ∥u∥².

The smallest c such that ∥T(u) − T(v)∥ ≤ c∥u − v∥ for all u and v in R² is

    c = √(25(3 + √5)/2) ≈ 8.0902.

Remark 2.2
If A is an m × n matrix, the matrix B = AᵀA is a positive semi-definite n × n symmetric matrix. Thus, all its eigenvalues are nonnegative. Let λ1, . . ., λn be its eigenvalues with

    0 = λn = · · · = λ_{r+1} < λr ≤ λ_{r−1} ≤ · · · ≤ λ1.

Then λ1, . . ., λr are the nonzero eigenvalues of AᵀA. The singular values of A are the numbers σ1, . . ., σr, where σi = √λi, 1 ≤ i ≤ r. Theorem 2.34 says that σ1 is a Lipschitz constant of the linear transformation T(x) = Ax.

At the end of this section, we want to discuss the vector space of m × n matrices M_{m,n}. There is a natural vector space isomorphism between M_{m,n} and R^(mn), mapping the matrix A = [aij] to x = (xk), where x_{(i−1)n+j} = aij for 1 ≤ i ≤ m, 1 ≤ j ≤ n. In other words, if

    a1 = (a11, a12, . . ., a1n),
    a2 = (a21, a22, . . ., a2n),
    ...
    am = (am1, am2, . . ., amn)

are the row vectors of A, then A is mapped to the vector (a1, a2, . . ., am) in R^(mn). Under this isomorphism, the norm of a matrix A = [aij] is

    ∥A∥ = √( Σ_{i=1}^{m} Σ_{j=1}^{n} aij² ) = √( Σ_{i=1}^{m} ∥ai∥² ),

and the distance between two matrices A = [aij] and B = [bij] is

    d(A, B) = ∥A − B∥ = √( Σ_{i=1}^{m} Σ_{j=1}^{n} (aij − bij)² ).

The following proposition can be used to give an alternative proof of Theorem 2.34.
Proposition 2.35
Let A be an m × n matrix. If x is in Rⁿ, then ∥Ax∥ ≤ ∥A∥∥x∥.

Proof
Let a1, . . ., am be the row vectors of A, and let w = Ax. Then wi = ⟨ai, x⟩ for 1 ≤ i ≤ m. By the Cauchy-Schwarz inequality, |wi| ≤ ∥ai∥∥x∥ for 1 ≤ i ≤ m. Thus,

    ∥w∥ = √(w1² + w2² + · · · + wm²) ≤ ∥x∥ √(∥a1∥² + ∥a2∥² + · · · + ∥am∥²) = ∥A∥∥x∥.

The difference between the proofs of Theorem 2.34 and Proposition 2.35 is that, in the proof of Theorem 2.34, we find that the smallest possible c such that ∥Ax∥ ≤ c∥x∥ for all x in Rⁿ is the largest singular value of the matrix A. In Proposition 2.35, we find a candidate for c, which is the norm of the matrix A, but this is usually not the optimal one.

When m = n, we denote the space of n × n matrices M_{n,n} simply as M_n. The determinant of the matrix A = [aij] ∈ M_n is given by

    det A = Σ_σ sgn(σ) a1σ(1) a2σ(2) · · · anσ(n).

Here the summation is over all the n! permutations σ of the set Sn = {1, 2, . . ., n}, and sgn(σ) is the sign of the permutation σ, which is equal to 1 or −1, depending on whether σ can be written as the product of an even number or an odd number of transpositions. For example, when n = 1, det[a] = a. When n = 2,

    det [a11 a12; a21 a22] = a11 a22 − a12 a21.

When n = 3,

    det [a11 a12 a13; a21 a22 a23; a31 a32 a33]
      = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a13 a22 a31 − a12 a21 a33.

The determinant function det : M_n → R is a polynomial function in the variables (aij). Hence, it is a continuous function. Recall that a matrix A ∈ M_n is invertible if and only if det A ≠ 0. Let

    GL(n, R) = {A ∈ M_n | det A ≠ 0}

be the subset of M_n that consists of invertible n × n matrices. It is a group under matrix multiplication, called the general linear group. By definition, GL(n, R) = det⁻¹(R \ {0}). Since R \ {0} is an open subset of R, GL(n, R) is an open subset of M_n. This gives the following.
Proposition 2.36
Given that A is an invertible n × n matrix, there exists r > 0 such that if B is an n × n matrix with ∥B − A∥ < r, then B is also invertible.

Sketch of Proof
This is simply a rephrasing of the statement that if A is a point in the open set GL(n, R), then there is a ball B(A, r) with center at A that is contained in GL(n, R).

Let A be an n × n matrix. For 1 ≤ i, j ≤ n, the (i, j)-minor of A, denoted by M_{i,j}, is the determinant of the (n − 1) × (n − 1) matrix obtained by deleting the i-th row and j-th column of A. Using the same reasoning as above, we find that the function M_{i,j} : M_n → R is a continuous function. The (i, j) cofactor C_{i,j} of A is given by C_{i,j} = (−1)^(i+j) M_{i,j}. The cofactor matrix of A is C_A = [C_{ij}]. Since each of its components is continuous, the function C : M_n → M_n taking A to C_A is a continuous function. If A is invertible,

    A⁻¹ = (1/det A) (C_A)ᵀ.

Since both C : M_n → M_n and det : M_n → R are continuous functions, and det : GL(n, R) → R is a function that is never equal to 0, we obtain the following.

Theorem 2.37
The map I : GL(n, R) → GL(n, R) that takes A to A⁻¹ is continuous.

Exercises 2.3

Question 1
Let x0 be a point in Rⁿ. Define the function f : Rⁿ → R by f(x) = ∥x − x0∥. Show that f is a continuous function.

Question 2
Let O = R³ \ {(0, 0, 0)} and define the function F : O → R² by

    F(x, y, z) = ( y/(x² + y² + z²), z/(x² + y² + z²) ).

Show that F is a continuous function.

Question 3
Let f : Rⁿ → R be the function defined as

    f(x) = { 1, if at least one of the xi is rational,
           { 0, otherwise.

At which point of Rⁿ is the function f continuous?

Question 4
Let f : Rⁿ → R be the function defined as

    f(x) = { x1² + · · · + xn², if at least one of the xi is rational,
           { 0,                otherwise.

At which point of Rⁿ is the function f continuous?
Question 5
Let f : R³ → R be the function defined by

    f(x, y, z) = { sin(x² + 4y² + z²) / (x² + 4y² + z²), if (x, y, z) ≠ (0, 0, 0),
                 { a,                                    if (x, y, z) = (0, 0, 0).

Show that there exists a value a such that f is a continuous function, and find this value of a.

Question 6
Let a and b be positive numbers, and let O be the subset of Rⁿ defined as O = {x ∈ Rⁿ | a < ∥x∥ < b}. Show that O is open.

Question 7
Let A be the subset of R² given by A = {(x, y) | sin(x + y) + xy > 1}. Show that A is an open set.

Question 8
Let A be the subset of R³ given by A = {(x, y, z) | x ≥ 0, y ≤ 1, e^(xy) ≤ z}. Show that A is a closed set.

Question 9
A plane in R³ is the set of all points (x, y, z) satisfying an equation of the form ax + by + cz = d, where (a, b, c) ≠ (0, 0, 0). Show that a plane is a closed subset of R³.

Question 10
Define the sets A, B, C and D as follows.
(a) A = {(x, y, z) | x² + 4y² + 9z² < 36}
(b) B = {(x, y, z) | x² + 4y² + 9z² ≤ 36}
(c) C = {(x, y, z) | 0 < x² + 4y² + 9z² < 36}
(d) D = {(x, y, z) | 0 < x² + 4y² + 9z² ≤ 36}
For each of these sets, find its interior, exterior and boundary.

Question 11
Let a and b be real numbers, and assume that f : Rⁿ → R is a continuous function. Consider the following subsets of Rⁿ.
(a) A = {x ∈ Rⁿ | f(x) > a}
(b) B = {x ∈ Rⁿ | f(x) ≥ a}
(c) C = {x ∈ Rⁿ | f(x) < a}
(d) D = {x ∈ Rⁿ | f(x) ≤ a}
(e) E = {x ∈ Rⁿ | a < f(x) < b}
(f) F = {x ∈ Rⁿ | a ≤ f(x) ≤ b}
Show that A, C and E are open sets, while B, D and F are closed sets.

Question 12
Let f : R² → R be the function defined as

    f(x, y) = { x² + y²,     if x² + y² < 4,
              { 8 − x² − y², if x² + y² ≥ 4.

Show that f is a continuous function.

Question 13
Show that the distance function on Rⁿ, d : Rⁿ × Rⁿ → R, d(u, v) = ∥u − v∥, is continuous in the following sense.
If {uk} and {vk} are sequences in Rⁿ that converge to u and v respectively, then the sequence {d(uk, vk)} converges to d(u, v).

Question 14
Let T : R² → R³ be the mapping T(x, y) = (x + y, 3x − y, 6x + 5y). Show that T : R² → R³ is a Lipschitz mapping, and find the smallest Lipschitz constant for this mapping.

Question 15
Given that A is a subset of Rᵐ and B is a subset of Rⁿ, let C = A × B. Then C is a subset of R^(m+n).
(a) If A is open in Rᵐ and B is open in Rⁿ, show that A × B is open in R^(m+n).
(b) If A is closed in Rᵐ and B is closed in Rⁿ, show that A × B is closed in R^(m+n).

Question 16
Let D be a subset of Rⁿ, and let f : D → R be a continuous function defined on D. Let A = D × R and define the function g : A → R by g(x, y) = y − f(x). Show that g : A → R is continuous.

Question 17
Let U be an open subset of Rⁿ, and let f : U → R be a continuous function defined on U. Show that the sets

    O1 = {(x, y) | x ∈ U, y < f(x)},    O2 = {(x, y) | x ∈ U, y > f(x)}

are open subsets of R^(n+1).

Question 18
Let C be a closed subset of Rⁿ, and let f : C → R be a continuous function defined on C. Show that the sets

    A1 = {(x, y) | x ∈ C, y ≤ f(x)},    A2 = {(x, y) | x ∈ C, y ≥ f(x)}

are closed subsets of R^(n+1).

2.4 Uniform Continuity

In Volume I, we have seen that uniform continuity plays an important role in single variable analysis. In this section, we extend this concept to multivariable functions.

Definition 2.10 Uniform Continuity
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. We say that the function F is uniformly continuous provided that for any ε > 0, there exists δ > 0 such that if u and v are points in D and ∥u − v∥ < δ, then ∥F(u) − F(v)∥ < ε.

The following two propositions are obvious.

Proposition 2.38
A uniformly continuous function is continuous.
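The definition can be illustrated with the scalar map T(x) = 2x of Example 2.32: the single choice δ = ε/2 serves every pair of points, no matter where they sit, which is exactly uniform continuity. The sample points below are hypothetical:

```python
import math

# T(x) = 2x on R^2, a Lipschitz map with Lipschitz constant 2 (Example 2.32).
def F(x):
    return [2.0 * t for t in x]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

eps = 1e-3
delta = eps / 2
# hypothetical sample pairs closer than delta, near and far from the origin
pairs = [([0.0, 0.0], [delta / 2, 0.0]),
         ([1e6, -1e6], [1e6 + delta / 3, -1e6 + delta / 3])]
for u, v in pairs:
    assert dist(u, v) < delta
    assert dist(F(u), F(v)) < eps   # the same delta works everywhere in R^2
print("delta = eps/2 works for all sampled pairs")
```

The point of the sketch is that δ depends only on ε, not on the location of the pair, in contrast with ordinary continuity.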
Proposition 2.39
Given that D is a subset of Rⁿ, and D′ is a subset of D, if the function F : D → Rᵐ is uniformly continuous, then the function F : D′ → Rᵐ is also uniformly continuous.

A special class of uniformly continuous functions is the class of Lipschitz functions.

Theorem 2.40
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. If F : D → Rᵐ is Lipschitz, then it is uniformly continuous.

The proof is straightforward.

Remark 2.3
Theorem 2.34 and Theorem 2.40 imply that a linear transformation is uniformly continuous.

There is an equivalent definition for uniform continuity in terms of sequences.

Theorem 2.41
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. Then the following are equivalent.

(i) F : D → Rᵐ is uniformly continuous. Namely, given ε > 0, there exists δ > 0 such that if u and v are points in D and ∥u − v∥ < δ, then ∥F(u) − F(v)∥ < ε.

(ii) If {uk} and {vk} are two sequences in D such that lim_{k→∞} (uk − vk) = 0, then lim_{k→∞} (F(uk) − F(vk)) = 0.

Let us give a proof of this theorem here.

Proof
Assume that (i) holds, and {uk} and {vk} are two sequences in D such that lim_{k→∞} (uk − vk) = 0. Given ε > 0, (i) implies that there exists δ > 0 such that if u and v are points in D and ∥u − v∥ < δ, then ∥F(u) − F(v)∥ < ε. Since lim_{k→∞} (uk − vk) = 0, there is a positive integer K such that for all k ≥ K, ∥uk − vk∥ < δ. It follows that ∥F(uk) − F(vk)∥ < ε for all k ≥ K.

This shows that lim_{k→∞} (F(uk) − F(vk)) = 0, and thus completes the proof that (i) implies (ii).

Conversely, assume that (i) does not hold. This means there exists an ε > 0 such that for every δ > 0, there exist points u and v in D with ∥u − v∥ < δ and ∥F(u) − F(v)∥ ≥ ε. Thus, for every k ∈ Z⁺, there exist uk and vk in D such that

    ∥uk − vk∥ < 1/k,    (2.5)

and ∥F(uk) − F(vk)∥ ≥ ε. Notice that {uk} and {vk} are sequences in D. Eq.
(2.5) implies that lim_{k→∞} (uk − vk) = 0. Since ∥F(uk) − F(vk)∥ ≥ ε,

    lim_{k→∞} (F(uk) − F(vk)) ≠ 0.

This shows that if (i) does not hold, then (ii) does not hold.

From Theorem 2.41, we can deduce the following.

Proposition 2.42
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. Then F : D → Rᵐ is uniformly continuous if and only if each of the component functions Fj = (πj ◦ F) : D → R, 1 ≤ j ≤ m, is uniformly continuous.

Let us look at some more examples.

Example 2.34
Let D be the open rectangle D = (0, 5) × (0, 7), and consider the function f : D → R defined by f(x, y) = xy. Determine whether f : D → R is uniformly continuous.

Solution
For any two points u1 = (x1, y1) and u2 = (x2, y2) in D, 0 < x1, x2 < 5 and 0 < y1, y2 < 7. Since

    f(u1) − f(u2) = x1y1 − x2y2 = x1(y1 − y2) + y2(x1 − x2),

we find that

    |f(u1) − f(u2)| ≤ |x1||y1 − y2| + |y2||x1 − x2| ≤ 5∥u1 − u2∥ + 7∥u1 − u2∥ = 12∥u1 − u2∥.

This shows that f : D → R is a Lipschitz function. Hence, it is uniformly continuous.

Example 2.35
Consider the function f : R² → R defined by f(x, y) = xy. Determine whether f : R² → R is uniformly continuous.

Solution
For k ∈ Z⁺, let

    uk = (k + 1/k, k),    vk = (k, k).

Then {uk} and {vk} are sequences of points in R² and

    lim_{k→∞} (uk − vk) = lim_{k→∞} (1/k, 0) = (0, 0).

However,

    f(uk) − f(vk) = k(k + 1/k) − k² = 1.
Determine whether f is uniformly continuous. Question 3 Let D = (1,∞)× (1,∞). Consider the function f : D → R defined as f(x, y) = √ x+ y. Determine whether f is uniformly continuous. Question 4 Let D = (0, 1)× (0, 2). Consider the function f : D → R defined as f(x, y) = 1√ x+ y . Determine whether f is uniformly continuous. Chapter 2. Limits of Multivariable Functions and Continuity 127 2.5 Contraction Mapping Theorem Among the Lipschitz functions, there is a subset called contractions. Definition 2.11 Contractions Let D be a subset of Rn. A function F : D → Rm is called a contraction if there exists a constant 0 ≤ c < 1 such that ∥F(u)− F(v)∥ ≤ c∥u− v∥ for all u,v ∈ D.In other words, a contraction is a Lipschitz function which has a Lipschitz constant that is less than 1. Example 2.36 Let b be a point in Rn, and let F : Rn → Rn be the function defined as F(x) = cx+ b. The mapping F is a contraction if and only if |c| < 1. The contraction mapping theorem is an important result in analysis. Extended to metric spaces, it is an important tool to prove the existence and uniqueness of solutions of ordinary differential equations. Theorem 2.43 Contraction Mapping Theorem Let D be a closed subset of Rn, and let F : D → D be a contraction. Then F has a unique fixed point. Namely, there is a unique u in D such that F(u) = u. Proof By definition, there is a constant c ∈ [0, 1) such that ∥F(u)− F(v)∥ ≤ c∥u− v∥ for all u,v ∈ D. Chapter 2. Limits of Multivariable Functions and Continuity 128 We start with any point x0 in D and construct the sequence {xk} inductively by xk+1 = F(xk) for all k ≥ 0. Notice that for all k ∈ Z+, ∥xk+1 − xk∥ = ∥F(xk)− F(xk−1)∥ ≤ c∥xk − xk−1∥. By iterating, we find that ∥xk+1 − xk∥ ≤ ck∥x1 − x0∥. Therefore, if l > k ≥ 0, triangle inequality implies that ∥xl − xk∥ ≤ ∥xl − xl−1∥+ · · ·+ ∥xk+1 − xk∥ ≤ (cl−1 + . . .+ ck)∥x1 − x0∥. Since c ∈ [0, 1), cl−1 + . . .+ ck = ck(1 + c+ · · ·+ cl−k−1) < ck 1− c . 
Therefore, for all l > k ≥ 0,

    ∥xl − xk∥ < (cᵏ/(1 − c))∥x1 − x0∥.

Given ε > 0, there exists a positive integer K such that for all k ≥ K,

    (cᵏ/(1 − c))∥x1 − x0∥ < ε.

This implies that for all l > k ≥ K, ∥xl − xk∥ < ε. In other words, we have shown that {xk} is a Cauchy sequence. Therefore, it converges to a point u in Rⁿ. Since D is closed, u is in D. Since F is continuous, the sequence {F(xk)} converges to F(u). But F(xk) = x_{k+1}. Being a subsequence of {xk}, the sequence {x_{k+1}} converges to u as well. This shows that F(u) = u, which says that u is a fixed point of F.

Now if v is another point in D such that F(v) = v, then

    ∥u − v∥ = ∥F(u) − F(v)∥ ≤ c∥u − v∥.

Since c ∈ [0, 1), this can only be true if ∥u − v∥ = 0, which implies that v = u. Hence, the fixed point of F is unique.

As an application of the contraction mapping theorem, we prove the following.

Theorem 2.44
Let r be a positive number and let G : B(0, r) → Rⁿ be a mapping such that G(0) = 0, and

    ∥G(u) − G(v)∥ ≤ (1/2)∥u − v∥ for all u, v ∈ B(0, r).

If F : B(0, r) → Rⁿ is the function defined as F(x) = x + G(x), then F is a one-to-one continuous mapping whose image contains the open ball B(0, r/2).

Proof
By definition, G is a contraction. Hence, it is continuous. Therefore, F : B(0, r) → Rⁿ is also continuous. If F(u) = F(v), then u − v = G(v) − G(u). Therefore,

    ∥u − v∥ = ∥G(v) − G(u)∥ ≤ (1/2)∥u − v∥.

This implies that ∥u − v∥ = 0, and thus u = v. Hence, F is one-to-one.

Given y ∈ B(0, r/2), let r1 = 2∥y∥. Then r1 < r. Consider the map H : CB(0, r1) → Rⁿ defined as H(x) = y − G(x). For any u and v in CB(0, r1),

    ∥H(u) − H(v)∥ = ∥G(u) − G(v)∥ ≤ (1/2)∥u − v∥.

Therefore, H is also a contraction. Notice that if x ∈ CB(0, r1),

    ∥H(x)∥ ≤ ∥y∥ + ∥G(x) − G(0)∥ ≤ r1/2 + (1/2)∥x∥ ≤ r1/2 + r1/2 = r1.

Therefore, H is a contraction that maps the closed set CB(0, r1) into itself.
By the contraction mapping theorem, there exists u in CB(0, r1) such that H(u) = u. This gives y − G(u) = u, or equivalently,

    y = u + G(u) = F(u).

In other words, we have shown that there exists u ∈ CB(0, r1) ⊂ B(0, r) such that F(u) = y. This proves that the image of the map F : B(0, r) → Rⁿ contains the open ball B(0, r/2).

Exercises 2.5

Question 1
Let Sⁿ = {(x1, . . ., xn, x_{n+1}) ∈ R^(n+1) | x1² + · · · + xn² + x_{n+1}² = 1} be the n-sphere, and let F : Sⁿ → Sⁿ be a mapping such that

    ∥F(u) − F(v)∥ ≤ (2/3)∥u − v∥ for all u, v ∈ Sⁿ.

Show that there is a unique w ∈ Sⁿ such that F(w) = w.

Question 2
Let r be a positive number, and let c be a positive number less than 1. Assume that G : B(0, r) → Rⁿ is a mapping such that G(0) = 0, and

    ∥G(u) − G(v)∥ ≤ c∥u − v∥ for all u, v ∈ B(0, r).

If F : B(0, r) → Rⁿ is the function defined as F(x) = x + G(x), show that F is a one-to-one continuous mapping whose image contains the open ball B(0, ar), where a = 1 − c.

Chapter 3
Continuous Functions on Connected Sets and Compact Sets

In Volume I, we have seen that the intermediate value theorem and the extreme value theorem play important roles in analysis. In order to extend these two theorems to multivariable functions, we need to consider two topological properties of sets – connectedness and compactness.

3.1 Path-Connectedness and Intermediate Value Theorem

We want to extend the intermediate value theorem to multivariable functions. For this, we need to consider a topological property called connectedness. In this section, we first discuss the topological property called path-connectedness, which is a more natural concept.

Definition 3.1 Path
Let S be a subset of Rⁿ, and let u and v be two points in S. A path in S joining u to v is a continuous function γ : [a, b] → S such that γ(a) = u and γ(b) = v.
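A minimal concrete instance of Definition 3.1 is the straight-line path γ(t) = (1 − t)u + tv on [0, 1], continuous because each component is an affine function of t. The endpoints u and v below are hypothetical:

```python
# gamma(t) = (1 - t)u + tv on [0, 1] joins u to v; each component is affine
# in t, hence continuous.
def gamma(t, u, v):
    return tuple((1 - t) * a + t * b for a, b in zip(u, v))

u, v = (1.0, 0.0), (-1.0, 2.0)   # hypothetical endpoints in R^2
assert gamma(0.0, u, v) == u     # gamma(a) = u
assert gamma(1.0, u, v) == v     # gamma(b) = v
print(gamma(0.5, u, v))          # prints (0.0, 1.0), the midpoint
```

Any set containing the segment between u and v thus contains a path joining them.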
For any real numbers a and b with a < b, the map u : [0, 1] → [a, b] defined by u(t) = a + t(b − a) is a continuous bijection. Its inverse u⁻¹ : [a, b] → [0, 1] is u⁻¹(t) = (t − a)/(b − a), which is also continuous. Hence, in the definition of a path, we can let the domain be any [a, b] with a < b.

Figure 3.1: A path in S joining u to v.

Example 3.1
Given a set S and a point x0 in S, the constant function γ : [a, b] → S, γ(t) = x0, is a path in S. If γ : [a, b] → S is a path in S ⊂ Rⁿ, and S′ is any other subset of Rⁿ that contains the image of γ, then γ is also a path in S′.

Example 3.2
Let R be the rectangle R = [−2, 2] × [−2, 2]. The function γ : [0, 1] → R², γ(t) = (cos(πt), sin(πt)), is a path in R joining u = (1, 0) to v = (−1, 0).

Figure 3.2: The path in Example 3.2.

Example 3.3
Let S be a subset of Rⁿ. If γ : [a, b] → S is a path in S joining u to v, then γ̃ : [−b, −a] → S, γ̃(t) = γ(−t), is a path in S joining v to u.

Now we define path-connectedness.

Definition 3.2 Path-Connected
Let S be a subset of Rⁿ. We say that S is path-connected if any two points u and v in S can be joined by a path in S.

It is easy to characterize the path-connected subsets of R. In Volume I, we have defined the concept of convex sets. A subset S of R is a convex set provided that for any u and v in S and any t ∈ [0, 1], (1 − t)u + tv is also in S. Equivalently, if u and v are points in S with u < v, then all the points w satisfying u < w < v are also in S. We have shown that a subset S of R is a convex set if and only if it is an interval. The following theorem characterizes the path-connected subsets of R.

Theorem 3.1
Let S be a subset of R. Then S is path-connected if and only if S is an interval.

Proof
If S is an interval, then for any u and v in S, and for any t ∈ [0, 1], (1 − t)u + tv is in S.
Hence, the function γ : [0, 1] → S, γ(t) = (1 − t)u + tv, is a path in S that joins u to v. Conversely, assume that S is a path-connected subset of R. To show that S is an interval, we need to show that for any u and v in S with u < v, any w that is in the interval [u, v] is also in S. Since S is path-connected, there is a path γ : [0, 1] → S such that γ(0) = u and γ(1) = v. Since γ is continuous, and w is in between γ(0) and γ(1), the intermediate value theorem implies that there is a c ∈ [0, 1] so that γ(c) = w. Thus, w is in S. To explore path-connected subsets of Rn with n ≥ 2, we first extend the concept of convex sets to Rn. Given two points u and v in Rn, when t runs through all the points in the interval [0, 1], (1 − t)u + tv describes all the points on the line segment between u and v. Definition 3.3 Convex Sets Let S be a subset of Rn. We say that S is convex if for any two points u and v in S, the line segment between u and v lies entirely in S. Equivalently, S is convex provided that for any two points u and v in S, the point (1 − t)u + tv is in S for any t ∈ [0, 1]. Figure 3.3: A is a convex set, B is not. If u = (u1, . . . , un) and v = (v1, . . . , vn) are two points in Rn, the map γ : [0, 1] → Rn, γ(t) = (1 − t)u + tv = ((1 − t)u1 + tv1, . . . , (1 − t)un + tvn), is a continuous function, since each of its components is continuous. Thus, we have the following. Theorem 3.2 Let S be a subset of Rn. If S is convex, then it is path-connected. Let us look at some examples of convex sets. Example 3.4 Let I1, . . ., In be intervals in R. Show that the set S = I1 × · · · × In is path-connected. Solution We claim that S is convex. Then Theorem 3.2 implies that S is path-connected. Given that u = (u1, . . . , un) and v = (v1, . . . , vn) are two points in S, for each 1 ≤ i ≤ n, ui and vi are in Ii.
Since Ii is an interval, for any t ∈ [0, 1], (1 − t)ui + tvi is in Ii. Hence, (1 − t)u + tv = ((1 − t)u1 + tv1, . . . , (1 − t)un + tvn) is in S. This shows that S is convex. Special cases of sets of the form S = I1 × · · · × In are open and closed rectangles. Example 3.5 An open rectangle U = (a1, b1) × · · · × (an, bn) and its closure R = [a1, b1] × · · · × [an, bn] are convex sets. Hence, they are path-connected. Example 3.6 Let x0 be a point in Rn, and let r be a positive number. Show that the open ball B(x0, r) and the closed ball CB(x0, r) are path-connected sets. Solution Let u and v be two points in B(x0, r). Then ∥u − x0∥ < r and ∥v − x0∥ < r. For any t ∈ [0, 1], t ≥ 0 and 1 − t ≥ 0. By the triangle inequality, ∥(1 − t)u + tv − x0∥ ≤ ∥(1 − t)(u − x0)∥ + ∥t(v − x0)∥ = (1 − t)∥u − x0∥ + t∥v − x0∥ < (1 − t)r + tr = r. This shows that (1 − t)u + tv is in B(x0, r). Hence, B(x0, r) is convex. Replacing < by ≤, one can show that CB(x0, r) is convex. By Theorem 3.2, the open ball B(x0, r) and the closed ball CB(x0, r) are path-connected sets. Not all path-connected sets are convex. Before we give an example, let us first prove the following useful lemma. Lemma 3.3 Let A and B be path-connected subsets of Rn. If A ∩ B is nonempty, then S = A ∪ B is path-connected. Proof Let u and v be two points in S. If both u and v are in the set A, then they can be joined by a path in A, which is also a path in S. Similarly, if both u and v are in the set B, then they can be joined by a path in S. If u is in A and v is in B, let x0 be any point in A ∩ B. Then u and x0 are both in the path-connected set A, and v and x0 are both in the path-connected set B. Therefore, there exist continuous functions γ1 : [0, 1] → A and γ2 : [1, 2] → B such that γ1(0) = u, γ1(1) = x0, γ2(1) = x0 and γ2(2) = v. Define the function γ : [0, 2] → A ∪ B by γ(t) = γ1(t) if 0 ≤ t ≤ 1, and γ(t) = γ2(t) if 1 ≤ t ≤ 2.
Since [0, 1] and [1, 2] are closed subsets of R and γ1(1) = x0 = γ2(1), the function γ : [0, 2] → S obtained by gluing γ1 and γ2 is continuous. Thus, γ is a path in S from u to v. This proves that S is path-connected. Now we can give an example of a path-connected set that is not convex. Figure 3.4: If two sets A and B are path-connected and A ∩ B is nonempty, then A ∪ B is also path-connected. Example 3.7 Show that the set S = {(x, y) | 0 ≤ x ≤ 1, −2 ≤ y ≤ 2} ∪ { (x, y) | (x − 2)^2 + y^2 ≤ 1 } is path-connected, but not convex. Solution The set A = {(x, y) | 0 ≤ x ≤ 1, −2 ≤ y ≤ 2} = [0, 1] × [−2, 2] is a closed rectangle. Therefore, it is path-connected. The set B = { (x, y) | (x − 2)^2 + y^2 ≤ 1 } is a closed ball with center at (2, 0) and radius 1. Hence, it is also path-connected. Since the point x0 = (1, 0) is in both A and B, S = A ∪ B is path-connected. The points u = (1, 2) and v = (2, 1) are in S. Consider the point w = (1/2)u + (1/2)v = (3/2, 3/2). It is not in S. This shows that S is not convex. Figure 3.5: The set A ∪ B is path-connected but not convex. Let us now prove the following important theorem, which says that continuous functions preserve path-connectedness. Theorem 3.4 Let D be a path-connected subset of Rn. If F : D → Rm is a continuous function, then F(D) is path-connected. Proof Let v1 and v2 be two points in F(D). Then there exist u1 and u2 in D such that F(u1) = v1 and F(u2) = v2. Since D is path-connected, there is a continuous function γ : [0, 1] → D such that γ(0) = u1 and γ(1) = u2. The map α = (F ◦ γ) : [0, 1] → F(D) is then a continuous map with α(0) = v1 and α(1) = v2. This shows that F(D) is path-connected. From Theorem 3.4, we obtain the following. Theorem 3.5 Intermediate Value Theorem for Path-Connected Sets Let D be a path-connected subset of Rn, and let f : D → R be a function defined on D. If f is continuous, then f(D) is an interval.
Proof By Theorem 3.4, f(D) is a path-connected subset of R. By Theorem 3.1, f(D) is an interval. We can also use Theorem 3.4 to establish more examples of path-connected sets. Let us first look at an example. Example 3.8 Show that the circle S1 = { (x, y) | x^2 + y^2 = 1 } is path-connected. Solution Define the function f : [0, 2π] → R2 by f(t) = (cos t, sin t). Notice that S1 = f([0, 2π]). Since each component of f is a continuous function, f is a continuous function. Since [0, 2π] is an interval, it is path-connected. By Theorem 3.4, S1 = f([0, 2π]) is path-connected. A more general theorem is as follows. Theorem 3.6 Let D be a path-connected subset of Rn, and let F : D → Rm be a function defined on D. If F : D → Rm is continuous, then the graph of F, GF = {(x, y) | x ∈ D, y = F(x)}, is a path-connected subset of Rn+m. Proof By Corollary 2.32, the function H : D → Rn+m, H(x) = (x, F(x)), is continuous. Since H(D) = GF, Theorem 3.4 implies that GF is a path-connected subset of Rn+m. Now let us consider spheres, which are boundaries of balls. Definition 3.4 The Standard Unit n-Sphere Sn The standard unit n-sphere Sn is the subset of Rn+1 consisting of all points x = (x1, . . . , xn, xn+1) in Rn+1 satisfying the equation ∥x∥ = 1, namely, x1^2 + · · · + xn^2 + x_{n+1}^2 = 1. The n-sphere Sn is the boundary of the (n + 1)-dimensional open ball B^{n+1} = B(0, 1) with center at the origin and radius 1. Figure 3.6: A sphere. Example 3.9 Show that the standard unit n-sphere Sn is path-connected. Solution Notice that Sn = Sn+ ∪ Sn−, where Sn+ and Sn− are the upper and lower hemispheres, with xn+1 ≥ 0 and xn+1 ≤ 0 respectively. If x ∈ Sn+, then xn+1 = √(1 − x1^2 − · · · − xn^2); whereas if x ∈ Sn−, then xn+1 = −√(1 − x1^2 − · · · − xn^2). Let CBn = { (x1, . . .
, xn) | x1^2 + · · · + xn^2 ≤ 1 } be the closed ball in Rn with center at the origin and radius 1. Define the functions f± : CBn → R by f±(x1, . . . , xn) = ±√(1 − x1^2 − · · · − xn^2). Notice that Sn+ and Sn− are respectively the graphs of f+ and f−. Since they are compositions of the square root function and a polynomial function, which are both continuous, f+ and f− are continuous functions. The closed ball CBn is path-connected. Theorem 3.6 then implies that Sn+ and Sn− are path-connected. Since both Sn+ and Sn− contain the unit vector e1 in Rn+1, the set Sn+ ∩ Sn− is nonempty. By Lemma 3.3, Sn = Sn+ ∪ Sn− is path-connected. Remark 3.1 There is an alternative way to prove that the n-sphere Sn is path-connected. Given two distinct points u and v in Sn, they are unit vectors in Rn+1. We want to show that there is a path in Sn joining u to v. Notice that the line segment L = {(1 − t)u + tv | 0 ≤ t ≤ 1} in Rn+1 contains the origin if and only if u and v are parallel, if and only if v = −u. Thus, we discuss two cases. Case 1: v ̸= −u. In this case, let γ : [0, 1] → Rn+1 be the function defined as γ(t) = ((1 − t)u + tv)/∥(1 − t)u + tv∥. Since (1 − t)u + tv ̸= 0 for all 0 ≤ t ≤ 1, γ is a continuous function. It is easy to check that its image lies in Sn. Hence, γ is a path in Sn joining u to v. Case 2: v = −u. In this case, let w be a unit vector orthogonal to u, and let γ : [0, π] → Rn+1 be the function defined as γ(t) = (cos t)u + (sin t)w. Since sin t and cos t are continuous functions, γ is a continuous function. Since u and w are orthogonal, the generalized Pythagoras theorem implies that ∥γ(t)∥^2 = cos^2 t ∥u∥^2 + sin^2 t ∥w∥^2 = cos^2 t + sin^2 t = 1. Therefore, the image of γ lies in Sn. It is easy to see that γ(0) = u and γ(π) = −u = v. Hence, γ is a path in Sn joining u to v. Example 3.10 Let f : Sn → R be a continuous function. Show that there is a point u0 on Sn such that f(u0) = f(−u0).
Solution The function g : Rn+1 → Rn+1, g(u) = −u, is a linear transformation. Hence, it is continuous. Restricted to Sn, g(Sn) = Sn. Thus, the function f1 : Sn → R, f1(u) = f(−u), is also continuous. It follows that the function h : Sn → R defined by h(u) = f(u) − f(−u) is continuous. Notice that h(−u) = f(−u) − f(u) = −h(u). This implies that if the number a is in the range of h, so is the number −a. Since the number 0 is in between a and −a for any a, and Sn is path-connected, the intermediate value theorem implies that the number 0 is also in the range of h. This means that there is a point u0 on Sn such that h(u0) = 0. Equivalently, f(u0) = f(−u0). Theorem 3.5 says that a continuous function defined on a path-connected set satisfies the intermediate value theorem. We make the following definition. Definition 3.5 Intermediate Value Property Let S be a subset of Rn. We say that S has the intermediate value property provided that whenever f : S → R is a continuous function, f(S) is an interval. Theorem 3.5 says that if S is a path-connected set, then it has the intermediate value property. It is natural to ask whether any set S that has the intermediate value property must be path-connected. It turns out that the answer is yes only when S is a subset of R. If S is a subset of Rn with n ≥ 2, this is not true. This leads us to define a new property of sets called connectedness in the next section. Exercises 3.1 Question 1 Is the set A = (−1, 2) ∪ (2, 5] path-connected? Justify your answer. Question 2 Let a and b be positive numbers, and let A be the subset of R2 given by A = { (x, y) | x^2/a^2 + y^2/b^2 ≤ 1 }. Show that A is convex, and deduce that it is path-connected.
Question 3 Let (a, b, c) be a nonzero vector, and let P be the plane in R3 given by P = {(x, y, z) | ax + by + cz = d}, where d is a constant. Show that P is convex, and deduce that it is path-connected. Question 4 Let S be the subset of R3 given by S = {(x, y, z) | x > 0, y ≤ 1, 2 ≤ z < 7}. Show that S is path-connected. Question 5 Let a, b and c be positive numbers, and let S be the subset of R3 given by S = { (x, y, z) | x^2/a^2 + y^2/b^2 + z^2/c^2 = 1 }. Show that S is path-connected. Question 6 Let u = (3, 0) and let A be the subset of R2 given by A = { (x, y) | x^2 + y^2 ≤ 1 }. Define the function f : A → R by f(x) = d(x, u). (a) Find f(x1) and f(x2), where x1 = (1, 0) and x2 = (−1, 0). (b) Use the intermediate value theorem to justify that there is a point x0 in A such that d(x0, u) = π. Question 7 Let A and B be subsets of Rn. If A and B are convex, show that A ∩ B is also convex. 3.2 Connectedness and Intermediate Value Property In this section, we study a property of sets which is known as connectedness. Let us first look at the path-connected subsets of R from a different perspective. We have shown in the previous section that a subset of R is path-connected if and only if it is an interval. A set of the form A = (−2, 2] \ {0} = (−2, 0) ∪ (0, 2] is not path-connected, since it contains the points −1 and 1, but it does not contain the point 0 that is in between. Intuitively, there is no way to go from the point −1 to 1 continuously without leaving the set A. Let U = (−∞, 0) and V = (0, ∞). Notice that U and V are open subsets of R which both intersect the set A. Moreover, A = (A ∩ U) ∪ (A ∩ V ), or equivalently, A ⊂ U ∪ V. We say that A is separated by the open sets U and V. Definition 3.6 Separation of a Set Let A be a subset of Rn. A separation of A is a pair (U, V ) of subsets of Rn which satisfies the following conditions.
(a) U and V are open sets. (b) A ∩ U ̸= ∅ and A ∩ V ̸= ∅. (c) A ⊂ U ∪ V, or equivalently, A is the union of A ∩ U and A ∩ V. (d) A is disjoint from U ∩ V, or equivalently, A ∩ U and A ∩ V are disjoint. If (U, V ) is a separation of A, we say that A is separated by the open sets U and V, or that the open sets U and V separate A. Example 3.11 Let A = (−2, 0) ∪ (0, 2], and let U = (−∞, 0) and V = (0, ∞). Then the open sets U and V separate A. Let U1 = (−3, 0) and V1 = (0, 3). The open sets U1 and V1 also separate A. Now we define connectedness. Definition 3.7 Connected Sets Let A be a subset of Rn. We say that A is connected if there does not exist a pair of open sets U and V that separate A. Example 3.12 Determine whether the set A = {(x, y) | y = 0} ∪ { (x, y) | y = 2/(1 + x^2) } is connected. Solution Let f : R2 → R be the function defined as f(x, y) = y(x^2 + 1). Since f is a polynomial function, it is continuous. The intervals V1 = (−1, 1) and V2 = (1, 3) are open sets in R. Hence, the sets U1 = f−1(V1) and U2 = f−1(V2) are disjoint and they are open in R2. Notice that A ∩ U1 = {(x, y) | y = 0} and A ∩ U2 = { (x, y) | y = 2/(1 + x^2) }. Thus, A ∩ U1 and A ∩ U2 are nonempty, A ∩ U1 and A ∩ U2 are disjoint, and A is the union of A ∩ U1 and A ∩ U2. This shows that the open sets U1 and U2 separate A. Hence, A is not connected. Figure 3.7: The set A defined in Example 3.12 is not connected. Now let us explore the relation between path-connectedness and connectedness. We first prove the following. Theorem 3.7 Let A be a subset of Rn, and assume that the open sets U and V separate A. Define the function f : A → R by f(x) = 0 if x ∈ A ∩ U, and f(x) = 1 if x ∈ A ∩ V. Then f is continuous. Notice that the function f is well defined since A ∩ U and A ∩ V are disjoint. Proof Let x0 be a point in A. We want to prove that f is continuous at x0.
Since A is contained in U ∪ V, x0 is in U or in V. It suffices to consider the case where x0 is in U. The case where x0 is in V is similar. If x0 is in U, since U is open, there is an r > 0 such that B(x0, r) ⊂ U. If {xk} is a sequence in A that converges to x0, there exists a positive integer K such that for all k ≥ K, ∥xk − x0∥ < r. Thus, for all k ≥ K, xk ∈ B(x0, r) ⊂ U, and hence, f(xk) = 0. This proves that the sequence {f(xk)} converges to 0, which is f(x0). Therefore, f is continuous at x0. Now we can prove the theorem which says that a path-connected set is connected. Theorem 3.8 Let A be a subset of Rn. If A is path-connected, then it is connected. Proof We prove the contrapositive, which says that if A is not connected, then it is not path-connected. If A is not connected, there is a pair of open sets U and V that separate A. By Theorem 3.7, the function f : A → R defined by f(x) = 0 if x ∈ A ∩ U, and f(x) = 1 if x ∈ A ∩ V, is continuous. Since f(A) = {0, 1} is not an interval, by the contrapositive of the intermediate value theorem for path-connected sets, A is not path-connected. Theorem 3.8 provides us with a large library of connected sets. Example 3.13 The following sets are path-connected. Hence, they are also connected. 1. A set S in Rn of the form S = I1 × · · · × In, where I1, . . . , In are intervals in R. 2. Open rectangles and closed rectangles. 3. Open balls and closed balls. 4. The n-sphere Sn. The following theorem says that path-connectedness and connectedness are equivalent in R. Theorem 3.9 Let S be a subset of R. Then the following are equivalent. (a) S is an interval. (b) S is path-connected. (c) S is connected. Proof We have proved (a) ⇐⇒ (b) in the previous section. In particular, (a) implies (b). Theorem 3.8 says that (b) implies (c). Now we only need to prove that (c) implies (a).
Assume that (a) is not true. Namely, S is not an interval. Then there are points u and v in S with u < v, such that there is a w ∈ (u, v) that is not in S. Let U = (−∞, w) and V = (w, ∞). Then U and V are disjoint open subsets of R. Since w /∈ S, S ⊂ U ∪ V. Since u ∈ S ∩ U and v ∈ S ∩ V, S ∩ U and S ∩ V are nonempty. Hence, U and V are open sets that separate S. This shows that S is not connected. Thus, we have proved that if (a) is not true, then (c) is not true. This is equivalent to (c) implies (a). Connectedness is also preserved by continuous functions. Theorem 3.10 Let D be a connected subset of Rn. If F : D → Rm is a continuous function, then F(D) is connected. Proof We prove the contrapositive. Assume that F(D) is not connected. Then there are open sets V1 and V2 in Rm that separate F(D). Let D1 = {x ∈ D | F(x) ∈ V1} and D2 = {x ∈ D | F(x) ∈ V2}. Since F(D) ∩ V1 and F(D) ∩ V2 are nonempty, D1 and D2 are nonempty. Since F(D) ⊂ V1 ∪ V2, D = D1 ∪ D2. Since V1 ∩ V2 is disjoint from F(D), D1 and D2 are disjoint. However, D1 and D2 are not necessarily open sets. We will define two open sets U1 and U2 in Rn such that D1 = D ∩ U1 and D2 = D ∩ U2. Then U1 and U2 are open sets that separate D. For each x0 in D1, F(x0) ∈ V1. Since V1 is open, there exists εx0 > 0 such that the ball B(F(x0), εx0) is contained in V1. By the continuity of F at x0, there exists δx0 > 0 such that for all x in D, if x ∈ B(x0, δx0), then F(x) ∈ B(F(x0), εx0) ⊂ V1. In other words, D ∩ B(x0, δx0) ⊂ F−1(V1) = D1. Notice that B(x0, δx0) is an open set. Define U1 = ⋃x0∈D1 B(x0, δx0). Being a union of open sets, U1 is open. Since D ∩ U1 = ⋃x0∈D1 (D ∩ B(x0, δx0)) ⊂ D1, and D1 = ⋃x0∈D1 {x0} ⊂ ⋃x0∈D1 (D ∩ B(x0, δx0)) = D ∩ U1, we find that D ∩ U1 = D1. Similarly, define U2 = ⋃x0∈D2 B(x0, δx0). Then U2 is an open set and D ∩ U2 = D2. This completes the construction of the open sets U1 and U2 that separate D. Thus, D is not connected.
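The intermediate value results in this section can be made concrete numerically: given a continuous f on a path-connected set and a path γ joining u to v, bisection on t ↦ f(γ(t)) locates a point where f attains any value between f(u) and f(v). The sketch below uses a hypothetical function f(x, y) = x^2 + y^2 and the straight-line path from (0, 0) to (3, 4); both choices are illustrative assumptions.

```python
def find_level(f, gamma, target, tol=1e-10):
    """Bisection on t -> f(gamma(t)); assumes target lies between
    f(gamma(0)) and f(gamma(1)), as the intermediate value theorem requires."""
    lo, hi = 0.0, 1.0
    if f(gamma(lo)) > f(gamma(hi)):
        lo, hi = hi, lo  # orient so f(gamma(lo)) <= target <= f(gamma(hi))
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2
        if f(gamma(mid)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# f(gamma(t)) = 25 t^2 here, so the level f = 4 is attained at t = 0.4
f = lambda p: p[0] ** 2 + p[1] ** 2
gamma = lambda t: (3 * t, 4 * t)  # segment path in the (convex) plane
t = find_level(f, gamma, target=4.0)
print(t, f(gamma(t)))
```

Bisection is exactly the constructive core of the one-dimensional intermediate value theorem that the proofs above reduce to.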
From Theorem 3.9 and Theorem 3.10, we also have an intermediate value theorem for connected sets. Theorem 3.11 Intermediate Value Theorem for Connected Sets Let D be a connected subset of Rn, and let f : D → R be a function defined on D. If f is continuous, then f(D) is an interval. Proof By Theorem 3.10, f(D) is a connected subset of R. By Theorem 3.9, f(D) is an interval. Now we can prove the following. Theorem 3.12 Let S be a subset of Rn. Then S is connected if and only if it has the intermediate value property. Proof If S is connected and f : S → R is continuous, Theorem 3.11 implies that f(S) is an interval. Hence, S has the intermediate value property. If S is not connected, Theorem 3.7 gives a continuous function f : S → R such that f(S) = {0, 1} is not an interval. Thus, S does not have the intermediate value property. To give an example of a connected set that is not path-connected, we need a lemma. Lemma 3.13 Let A be a subset of Rn that is separated by the open sets U and V. If C is a connected subset of A, then C ∩ U = ∅ or C ∩ V = ∅. Proof Since C ⊂ A, C ⊂ U ∪ V, and C is disjoint from U ∩ V. If C ∩ U ̸= ∅ and C ∩ V ̸= ∅, then the open sets U and V also separate C. This contradicts the assumption that C is connected. Thus, we must have C ∩ U = ∅ or C ∩ V = ∅. Theorem 3.14 Let A be a connected subset of Rn. If B is a subset of Rn such that A ⊂ B ⊂ A̅, then B is also connected. Proof If B is not connected, there exist open sets U and V in Rn that separate B. Since A is connected, Lemma 3.13 says that A ∩ U = ∅ or A ∩ V = ∅. Without loss of generality, assume that A ∩ V = ∅. Then A ⊂ Rn \ V. Thus, Rn \ V is a closed set that contains A. This implies that A̅ ⊂ Rn \ V. Hence, we also have B ⊂ Rn \ V, which contradicts the fact that the set B ∩ V is nonempty.
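The closure computation in Example 3.14 below can be checked numerically: for any u ∈ [−1, 1], the points (x_k, sin(1/x_k)) with x_k = 1/(sin^{−1} u + 2πk) lie on the sine curve and approach (0, u). A minimal sketch (the choice u = 0.7 is an arbitrary illustration):

```python
import math

def approach_point(u, k):
    # k-th point of A = {(x, sin(1/x)) : 0 < x <= 1} converging to (0, u)
    a = math.asin(u)                  # a in [-pi/2, pi/2], with sin(a) = u
    x = 1.0 / (a + 2 * math.pi * k)   # x decreases to 0 as k grows
    return x, math.sin(1.0 / x)       # second coordinate equals sin(a) = u

for k in (1, 10, 1000):
    x, y = approach_point(0.7, k)
    print(k, x, y)  # x shrinks toward 0 while y stays at 0.7
```

This is precisely the sequence used in part (a) of the example: the first coordinate tends to 0 while the second is constant, so the limit (0, u) lies in the closure of A.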
Example 3.14 The Topologist’s Sine Curve Let S be the subset of R2 given by S = A ∪ L, where A = { (x, y) | 0 < x ≤ 1, y = sin(1/x) } and L = {(x, y) | x = 0, −1 ≤ y ≤ 1}. (a) Show that S ⊂ A̅. (b) Show that S is connected. (c) Show that S is not path-connected. Solution (a) Since A ⊂ A̅, it suffices to show that L ⊂ A̅. Given (0, u) ∈ L, −1 ≤ u ≤ 1. Thus, a = sin^{−1} u ∈ [−π/2, π/2]. Let xk = 1/(a + 2πk) for k ∈ Z+. Notice that xk ∈ (0, 1] and sin(1/xk) = sin a = u. Thus, {(xk, sin(1/xk))} is a sequence of points in A that converges to (0, u). This proves that (0, u) ∈ A̅. Hence, L ⊂ A̅. (b) The interval (0, 1] is path-connected and the function f : (0, 1] → R, f(x) = sin(1/x), is continuous. Thus, A = Gf is path-connected, and hence it is connected. Since A ⊂ S ⊂ A̅, Theorem 3.14 implies that S is connected. (c) If S is path-connected, there is a path γ : [0, 1] → S such that γ(0) = (0, 0) and γ(1) = (1, sin 1). Let γ(t) = (γ1(t), γ2(t)). Then γ1 : [0, 1] → R and γ2 : [0, 1] → R are continuous functions. Consider the sequence {xk} with xk = 1/(π/2 + πk), k ∈ Z+. Notice that {xk} is a decreasing sequence of points in [0, 1] that converges to 0. For each k ∈ Z+, (xk, yk) ∈ S if and only if yk = sin(1/xk). Since γ1 : [0, 1] → R is continuous, γ1(0) = 0 and γ1(1) = 1, the intermediate value theorem implies that there exists t1 ∈ [0, 1] such that γ1(t1) = x1. Similarly, there exists t2 ∈ [0, t1] such that γ1(t2) = x2. Continuing this argument gives a decreasing sequence {tk} in [0, 1] such that γ1(tk) = xk for all k ∈ Z+. Since the sequence {tk} is decreasing and bounded below, it converges to some t0 in [0, 1]. Since γ2 : [0, 1] → R is also continuous, the sequence {γ2(tk)} should converge to γ2(t0). Since γ(tk) ∈ S and γ1(tk) = xk, we must have γ2(tk) = yk = (−1)^k. But then the sequence {γ2(tk)} is not convergent. This gives a contradiction.
Hence, there does not exist a path in S that joins the point (0, 0) to the point (1, sin 1). This proves that S is not path-connected. Figure 3.8: The topologist’s sine curve. Remark 3.2 Example 3.14 gives a set that is connected but not path-connected. 1. One can in fact show that S = A̅. 2. To show that A is connected, we can also use the fact that if D is a connected subset of Rn, and F : D → Rm is a continuous function, then the graph of F is connected. The proof of this fact is left as an exercise. At the end of this section, we want to give a sufficient condition for a connected subset of Rn to be path-connected. First we define the meaning of a polygonal path. Definition 3.8 Polygonal Path Let S be a subset of Rn, and let u and v be two points in S. A path γ : [a, b] → S in S that joins u to v is a polygonal path provided that there is a partition P = {t0, t1, . . . , tk} of [a, b] and points x0, x1, . . . , xk in S with xi = γ(ti), such that for 1 ≤ i ≤ k, γ(t) = x_{i−1} + ((t − t_{i−1})/(t_i − t_{i−1}))(x_i − x_{i−1}) when t_{i−1} ≤ t ≤ t_i. Obviously, we have the following. Proposition 3.15 If S is a convex subset of Rn, then any two points in S can be joined by a polygonal path in S. Figure 3.9: A polygonal path. If γ1 : [a, c] → A is a polygonal path in A that joins u to w, and γ2 : [c, b] → B is a polygonal path in B that joins w to v, then the path γ : [a, b] → A ∪ B defined by γ(t) = γ1(t) if a ≤ t ≤ c, and γ(t) = γ2(t) if c ≤ t ≤ b, is a polygonal path in A ∪ B that joins u to v. Using this, we can prove the following useful theorem. Theorem 3.16 Let S be a connected subset of Rn. If S is an open set, then any two points in S can be joined by a polygonal path in S. In particular, S is path-connected. Proof We use proof by contradiction. Suppose that S is open but there are two points u and v in S that cannot be joined by a polygonal path in S.
Consider the sets U = {x ∈ S | there is a polygonal path in S that joins u to x} and V = {x ∈ S | there is no polygonal path in S that joins u to x}. Obviously u is in U and v is in V, and S = U ∪ V. We claim that both U and V are open sets. If x is in the open set S, there is an r > 0 such that B(x, r) ⊂ S. Since B(x, r) is convex, any point w in B(x, r) can be joined to x by a polygonal path in B(x, r). Hence, if x is in U, then w is in U; if x is in V, then w is in V. This shows that if x is in U, then B(x, r) ⊂ U, and if x is in V, then B(x, r) ⊂ V. Hence, U and V are open sets. Since U and V are nonempty disjoint open sets and S = U ∪ V, they form a separation of S. This contradicts the assumption that S is connected. Hence, any two points in S can be joined by a polygonal path in S. Exercises 3.2 Question 1 Determine whether the set A = {(x, y) | y = 0} ∪ { (x, y) | x > 0, y = 2/x } is connected. Question 2 Let D be a connected subset of Rn, and let F : D → Rm be a function defined on D. If F : D → Rm is continuous, show that the graph of F, GF = {(x, y) | x ∈ D, y = F(x)}, is also connected. Question 3 Determine whether the set A = {(x, y) | 0 ≤ x < 1, −1 < y ≤ 1} ∪ {(1, 0), (1, 1)} is connected. Question 4 Assume that A is a connected subset of R3 that contains the points u = (0, 2, 0) and v = (2, −6, 3). (a) Show that there is a point x = (x, y, z) in A that lies in the plane y = 0. (b) Show that there exists a point x = (x, y, z) in A that lies on the sphere x^2 + y^2 + z^2 = 25. Question 5 Let A and B be connected subsets of Rn. If A ∩ B is nonempty, show that S = A ∪ B is connected.
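Definition 3.8 translates directly into code. The sketch below builds the piecewise-affine path through a given list of vertices, using the uniform partition t_i = i/k of [0, 1]; the uniform partition is an illustrative choice, since any partition satisfies the definition.

```python
def polygonal_path(vertices):
    """gamma : [0, 1] -> R^n through the given vertices, affine on each
    subinterval [i/k, (i+1)/k], in the spirit of Definition 3.8."""
    verts = [tuple(map(float, v)) for v in vertices]
    k = len(verts) - 1
    def gamma(t):
        if t >= 1.0:
            return verts[-1]
        i = int(t * k)          # index of the segment containing t
        s = t * k - i           # local parameter in [0, 1)
        a, b = verts[i], verts[i + 1]
        return tuple((1 - s) * ai + s * bi for ai, bi in zip(a, b))
    return gamma

g = polygonal_path([(0, 0), (1, 0), (1, 1)])
print(g(0.0), g(0.5), g(1.0))  # (0.0, 0.0) (1.0, 0.0) (1.0, 1.0)
```

Each piece is the convex-combination formula (1 − s)x_{i−1} + s x_i, so the path stays inside any convex set containing the vertices, which is how Proposition 3.15 and Theorem 3.16 use polygonal paths.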
3.3 Sequential Compactness and Compactness In volume I, we have seen that sequential compactness plays an important role in the extreme value theorem. In this section, we extend the definition of sequential compactness to subsets of Rn. We will also consider another concept called compactness. Let us start with the definition of bounded sets. Definition 3.9 Bounded Sets Let S be a subset of Rn. We say that S is bounded if there exists a positive number M such that ∥x∥ ≤ M for all x ∈ S. Remark 3.3 Let S be a subset of Rn. If S is bounded and S′ is a subset of S, then it is obvious that S′ is also bounded. Example 3.15 Show that a ball B(x0, r) in Rn is bounded. Solution Given x ∈ B(x0, r), ∥x − x0∥ < r. Thus, ∥x∥ ≤ ∥x0∥ + ∥x − x0∥ < ∥x0∥ + r. Since M = ∥x0∥ + r is a constant independent of the points in the ball B(x0, r), the ball B(x0, r) is bounded. Notice that if x1 and x2 are points in Rn, and S is a set in Rn such that ∥x − x1∥ < r1 for all x ∈ S, then ∥x − x2∥ < r1 + ∥x2 − x1∥ for all x ∈ S. Thus, we have the following. Proposition 3.17 Let S be a subset of Rn. The following are equivalent. (a) S is bounded. (b) There is a point x0 in Rn and a positive constant M such that ∥x − x0∥ ≤ M for all x ∈ S. (c) For any x0 in Rn, there is a positive constant M such that ∥x − x0∥ ≤ M for all x ∈ S. Figure 3.10: The set S is bounded. We say that a sequence {xk} is bounded if the set {xk | k ∈ Z+} is bounded. The following is a standard theorem about convergent sequences. Proposition 3.18 If {xk} is a sequence in Rn that is convergent, then it is bounded. Proof Assume that the sequence {xk} converges to the point x0. Then there is a positive integer K such that ∥xk − x0∥ < 1 for all k ≥ K. Let M = max{∥xk − x0∥ | 1 ≤ k ≤ K − 1} + 1. Then M is finite and ∥xk − x0∥ ≤ M for all k ∈ Z+.
Hence, the sequence {xk} is bounded. Figure 3.11: A convergent sequence is bounded. Let us now define the diameter of a bounded set. If S is a subset of Rn that is bounded, there is a positive number M such that ∥x∥ ≤ M for all x ∈ S. It follows from the triangle inequality that for any u and v in S, ∥u − v∥ ≤ ∥u∥ + ∥v∥ ≤ 2M. Thus, the set DS = {d(u, v) | u, v ∈ S} = {∥u − v∥ | u, v ∈ S} (3.1) is a set of nonnegative real numbers that is bounded above. In fact, for any subset S of Rn, one can define the set of real numbers DS by (3.1). Then S is a bounded set if and only if the set DS is bounded above. Definition 3.10 Diameter of a Bounded Set Let S be a bounded subset of Rn. The diameter of S, denoted by diam S, is defined as diam S = sup {d(u, v) | u, v ∈ S} = sup {∥u − v∥ | u, v ∈ S}. Example 3.16 Consider the rectangle R = [a1, b1] × · · · × [an, bn]. If u and v are two points in R, then for each 1 ≤ i ≤ n, ui, vi ∈ [ai, bi]. Thus, |ui − vi| ≤ bi − ai. It follows that ∥u − v∥ ≤ √((b1 − a1)^2 + · · · + (bn − an)^2). If u0 = a = (a1, . . . , an) and v0 = b = (b1, . . . , bn), then u0 and v0 are in R, and ∥u0 − v0∥ = √((b1 − a1)^2 + · · · + (bn − an)^2). This shows that the diameter of R is diam R = ∥b − a∥ = √((b1 − a1)^2 + · · · + (bn − an)^2). Figure 3.12: The diameter of a rectangle. Intuitively, the diameter of the open rectangle U = (a1, b1) × · · · × (an, bn) is also equal to d = √((b1 − a1)^2 + · · · + (bn − an)^2). However, the points a = (a1, . . . , an) and b = (b1, . . . , bn) are not in U. There do not exist two points in U whose distance is d, but there are sequences of points {uk} and {vk} in U such that the sequence of distances {∥uk − vk∥} approaches d as k → ∞. We will formulate this as a more general theorem. Theorem 3.19 Let S be a subset of Rn. If S is bounded, then its closure S̅ is also bounded. Moreover, diam S̅ = diam S.
Proof If u and v are two points in S̅, there exist sequences {uk} and {vk} in S that converge to u and v respectively. Then d(u, v) = lim_{k→∞} d(uk, vk). (3.2) For each k ∈ Z+, since uk and vk are in S, d(uk, vk) ≤ diam S. Eq. (3.2) implies that d(u, v) ≤ diam S. Since this is true for any u and v in S̅, S̅ is bounded and diam S̅ ≤ diam S. Since S ⊂ S̅, we also have diam S ≤ diam S̅. We conclude that diam S̅ = diam S. The following example justifies that the diameter of a ball of radius r is indeed 2r. Example 3.17 Find the diameter of the open ball B(x0, r) in Rn. Solution By Theorem 3.19, the diameter of the open ball B(x0, r) is the same as the diameter of its closure, the closed ball CB(x0, r). Given u and v in CB(x0, r), ∥u − x0∥ ≤ r and ∥v − x0∥ ≤ r. Therefore, ∥u − v∥ ≤ ∥u − x0∥ + ∥v − x0∥ ≤ 2r. This shows that diam CB(x0, r) ≤ 2r. The points u0 = x0 + re1 and v0 = x0 − re1 are in the closed ball CB(x0, r). Since ∥u0 − v0∥ = ∥2re1∥ = 2r, diam CB(x0, r) ≥ 2r. Therefore, the diameter of the closed ball CB(x0, r) is exactly 2r. By Theorem 3.19, the diameter of the open ball B(x0, r) is also 2r. Figure 3.13: The diameter of a ball. In volume I, we have shown that a bounded sequence in R has a convergent subsequence. This is achieved by using the monotone convergence theorem, which says that a bounded monotone sequence in R is convergent. For points in Rn with n ≥ 2, we cannot apply the monotone convergence theorem, as we cannot define a simple order on the points in Rn when n ≥ 2. Nevertheless, we can use the result for n = 1 and the componentwise convergence theorem to show that a bounded sequence in Rn has a convergent subsequence. Theorem 3.20 Let {uk} be a sequence in Rn. If {uk} is bounded, then there is a subsequence that is convergent. Sketch of Proof The n = 1 case is already established in volume I. Here we prove the n = 2 case.
The n ≥ 3 case can be proved by induction using the same reasoning. For k ∈ Z+, let uk = (xk, yk). Since |xk| ≤ ∥uk∥ and |yk| ≤ ∥uk∥, the sequences {xk} and {yk} are bounded sequences. Thus, there is a subsequence {xkj}∞j=1 of {xk}∞k=1 that converges to a point x0 in R. Consider the subsequence {ykj}∞j=1 of the sequence {yk}∞k=1. It is also bounded. Hence, there is a subsequence {ykjl} ∞ l=1 that converges to a point y0 in R. Notice that the subsequence {xkjl} ∞ l=1 of {xk}∞k=1 is also a subsequence of {xkj}∞j=1. Hence, it also converges to x0. By componentwise convergence theorem, {ukjl }∞l=1 is a subsequence of {uk}∞k=1 that converges to (x0, y0). This proves the theorem when n = 2. Now we study the concept of sequential compactness. It is the same as the n = 1 case. Definition 3.11 Sequentially Compact Let S be a subset of Rn. We say that S is sequentially compact provided that every sequence in S has a subsequence that converges to a point in S. In volume I, we proved the Bolzano-Weierstrass theorem, which says that a subset of R is sequentially compact if and only if it is closed and bounded. In fact, the same is true for the n ≥ 2 case. Let us first look at some examples. Example 3.18 Show that the set A = {(x, y) |x2 + y2 < 1} is not sequentially compact. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 168 Solution For k ∈ Z+, let uk = ( k k + 1 , 0 ) . Then {uk} is a sequence in A that converges to the point u0 = (1, 0) that is not in A. Thus, every subsequence of {uk} converges to the point u0, which is not in A. This means the sequence {uk} in A does not have a subsequence that converges to a point in A. Hence, A is not sequentially compact. Note that the set A in Example 3.18 is not closed. Example 3.19 Show that the set C = {(x, y) | 1 ≤ x ≤ 3, y ≥ 0} is not sequentially compact. Solution For k ∈ Z+, let uk = (2, k). Then {uk} is a sequence in C. If {ukj}∞j=1 is a subsequence of {uk}, then k1, k2, k3, . . . 
is a strictly increasing sequence of positive integers. Therefore kj ≥ j for all j ∈ Z+. It follows that ∥ukj∥ = ∥(2, kj)∥ ≥ kj ≥ j for all j ∈ Z+. Hence, the subsequence {ukj} is not bounded. Therefore, it is not convergent. This means that the sequence {uk} in C does not have a convergent subsequence. Therefore, C is not sequentially compact. Note that the set C in Example 3.19 is not bounded. Now we prove the main theorem. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 169 Theorem 3.21 Bolzano-Weierstrass Theorem Let S be a subset of Rn. The following are equivalent. (a) S is closed and bounded. (b) S is sequentially compact. Proof First assume that S is closed and bounded. Let {xk} be a sequence in S. Then {xk} is also bounded. By Theorem 3.20, there is subsequence {xkj} that converges to some x0. Since S is closed, we must have x0 is in S. This proves that every sequence in S has a subsequence that converges to a point in S. Hence, S is sequentially compact. This completes the proof of (a) implies (b). To prove that (b) implies (a), it suffices to show that if S is not closed or S is not bounded, then S is not sequentially compact. If S is not closed, there is a sequence {xk} in S that converges to a point x0, but x0 is not in S. Then every subsequence of {xk} converges to the point x0, which is not in S. Thus, {xk} is a sequence in S that does not have any subsequence that converges to a point in S. This shows that S is not sequentially compact. If S is not bounded, for each positive integer k, there is a point xk in S such that ∥xk∥ ≥ k. If {xkj}∞j=1 is a subsequence of {xk}, then k1, k2, k3, . . . is a strictly increasing sequence of positive integers. Therefore kj ≥ j for all j ∈ Z+. It follows that ∥xkj∥ ≥ kj ≥ j for al j ∈ Z+. Hence, the subsequence {xkj} is not bounded. Therefore, it is not convergent. This means that the sequence {xk} in S does not have a convergent subsequence. Therefore, S is not sequentially compact. 
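The failure mode in Example 3.18 can be checked numerically. The sketch below (the helper name in_open_disk and the cutoff 2000 are our own illustration, not from the text) verifies that every term of the sequence uk = (k/(k+1), 0) lies in the open disk A, while the terms approach the limit (1, 0), which lies outside A.

```python
import math

def in_open_disk(p):
    # membership test for A = {(x, y) : x^2 + y^2 < 1}
    x, y = p
    return x * x + y * y < 1.0

# the sequence u_k = (k/(k+1), 0) from Example 3.18
u = [(k / (k + 1), 0.0) for k in range(1, 2001)]
limit = (1.0, 0.0)

# every term lies in A ...
all_in_A = all(in_open_disk(p) for p in u)
# ... yet the terms approach a point that is NOT in A,
# so no subsequence can converge to a point of A
dist_to_limit = math.dist(u[-1], limit)
```

Since every subsequence inherits the same limit, this is exactly why A fails to be sequentially compact.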
Corollary 3.22 A closed rectangle R = [a1, b1] × · · · × [an, bn] in Rn is sequentially compact. Proof We have shown in Chapter 1 that R is closed. Example 3.16 shows that R is bounded. Thus, R is sequentially compact. An interesting consequence of Theorem 3.19 is the following. Corollary 3.23 If S is a bounded subset of Rn, then its closure S̄ is sequentially compact. Example 3.20 Determine whether the following subsets of R3 are sequentially compact. (a) A = {(x, y, z) |xyz = 1}. (b) B = {(x, y, z) |x2 + 4y2 + 9z2 ≤ 36}. (c) C = {(x, y, z) | 1 ≤ x ≤ 2, 1 ≤ y ≤ 3, 0 < xyz ≤ 4}. Solution (a) For any k ∈ Z+, let uk = (k, 1/k, 1). Then {uk} is a sequence in A, and ∥uk∥ ≥ k. Therefore, A is not bounded. Hence, A is not sequentially compact. (b) For any u = (x, y, z) ∈ B, ∥u∥2 = x2 + y2 + z2 ≤ x2 + 4y2 + 9z2 ≤ 36. Hence, B is bounded. The function f : R3 → R, f(x, y, z) = x2 + 4y2 + 9z2 is a polynomial. Hence, it is continuous. Since the set I = (−∞, 36] is closed in R, and B = f−1(I), B is closed in R3. Since B is closed and bounded, it is sequentially compact. (c) For any k ∈ Z+, let uk = (1, 1, 1/k). Then {uk} is a sequence of points in C that converges to the point u0 = (1, 1, 0), which is not in C. Thus, C is not closed, and so C is not sequentially compact. The following theorem asserts that continuous functions preserve sequential compactness. Theorem 3.24 Let D be a sequentially compact subset of Rn. If the function F : D → Rm is continuous, then F(D) is a sequentially compact subset of Rm. The proof of this theorem is identical to the n = 1 case. Proof Let {yk} be a sequence in F(D). For each k ∈ Z+, there exists xk ∈ D such that F(xk) = yk. Since D is sequentially compact, there is a subsequence {xkj} of {xk} that converges to a point x0 in D. Since F is continuous, the sequence {F(xkj)} converges to F(x0).
Note that F(x0) is in F(D). In other words, {ykj} is a subsequence of the sequence {yk} that converges to F(x0) in F(D). This shows that every sequence in F(D) has a subsequence that converges to a point in F(D). Thus, F(D) is a sequentially compact subset of Rm. We are going to discuss important consequences of Theorem 3.24 in the coming section. For the rest of this section, we introduce the concept of compactness, which plays a central role in modern analysis. We start with the definition of an open covering. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 172 Definition 3.12 Open Covering Let S be a subset of Rn, and let A = {Uα |α ∈ J} be a collection of open sets in Rn indexed by the set J . We say that A is an open covering of S provided that S ⊂ ⋃ α∈J Uα. Example 3.21 For each k ∈ Z+, let Uk = (1/k, 1). Then Uk is an open set in R and ∞⋃ k=1 Uk = (0, 1). Hence, A = {Uk | k ∈ Z+} is an open covering of the set S = (0, 1). Remark 3.4 If A = {Uα |α ∈ J} is an open covering of S and S ′ is a subset of S, then A = {Uα |α ∈ J} is also an open covering of S ′. Example 3.22 For each k ∈ Z+, let Uk = B(0, k) be the ball in Rn centered at the origin and having radius k. Then ∞⋃ k=1 Uk = Rn. Thus, A = {Uk | k ∈ Z+} is an open covering of any subset S ofRn. Definition 3.13 Subcover Let S be a subset of Rn, and let A = {Uα |α ∈ J} be an open covering of S. A subcover is a subcollection of A which is also a covering of S. A finite subcover is a subcover that contains only finitely many elements. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 173 Example 3.23 For each k ∈ Z, let Uk = (k, k + 2). Then ∞⋃ k=−∞ Uk = R. Thus, A = {Uk | k ∈ Z} is an open covering of the set S = [−3, 4). There is a finite subcover of S given by B = {U−4, U−3, U−2, U−1, U0, U1, U2}. Definition 3.14 Compact Sets Let S be a subset of Rn. We say that S is compact provided that every open covering of S has a finite subcover. 
Namely, if A = {Uα |α ∈ J} is an open covering of S, then there exist α1, . . . , αk ∈ J such that S ⊂ k⋃ j=1 Uαj . Example 3.24 The subset S = (0, 1) of R is not compact. For k ∈ Z+, let Uk = (1/k, 1). Example 3.21 says that A = {Uk | k ∈ Z+} is an open covering of the set S. We claim that there is no finite subcollection of A that covers S. Assume to the contrary that there exists a finite subcollection of A that covers S. Then there are positive integers k1, . . . , km such that (0, 1) ⊂ m⋃ j=1 Ukj = m⋃ j=1 (1/kj , 1) . Notice that if ki ≤ kj , then Uki ⊂ Ukj . Thus, if K = max{k1, . . . , km}, then m⋃ j=1 Ukj = UK = (1/K, 1) , and so S = (0, 1) is not contained in UK . This gives a contradiction. Hence, S is not compact. Example 3.25 As a subset of itself, Rn is not compact. For k ∈ Z+, let Uk = B(0, k) be the ball in Rn centered at the origin and having radius k. Example 3.22 says that A = {Uk | k ∈ Z+} is an open covering of Rn. We claim that there is no finite subcover. Assume to the contrary that there is a finite subcover. Then there exist positive integers k1, . . . , km such that Rn = m⋃ j=1 Ukj . Notice that if ki ≤ kj , then Uki ⊂ Ukj . Thus, if K = max{k1, . . . , km}, then m⋃ j=1 Ukj = UK = B(0, K). Obviously, B(0, K) is not equal to Rn. This gives a contradiction. Hence, Rn is not compact. Our goal is to prove the Heine-Borel theorem, which says that a subset of Rn is compact if and only if it is closed and bounded. We first prove the easier direction. Theorem 3.25 Let S be a subset of Rn. If S is compact, then it is closed and bounded. Proof We show that if S is compact, then it is bounded; and if S is compact, then it is closed. First we prove that if S is compact, then it is bounded. For k ∈ Z+, let Uk = B(0, k) be the ball in Rn centered at the origin and having radius k. Example 3.22 says that A = {Uk | k ∈ Z+} is an open covering of S.
Since S is compact, there exist positive integers k1, . . . , km such that S ⊂ m⋃ j=1 Ukj = UK = B(0, K), where K = max{k1, . . . , km}. This shows that ∥x∥ ≤ K for all x ∈ S. Hence, S is bounded. Now we prove that if S is compact, then it is closed. For this, it suffices to show that S̄ ⊂ S, or equivalently, that any point that is not in S is not in S̄. Assume that x0 is not in S. For each k ∈ Z+, let Vk = extB(x0, 1/k) = { x ∈ Rn ∣∣ ∥x− x0∥ > 1/k } . Then Vk is open in Rn. If x is a point in Rn and x ̸= x0, then r = ∥x − x0∥ > 0. There is a k ∈ Z+ such that 1/k < r. Then x is in Vk. This shows that ∞⋃ k=1 Vk = Rn \ {x0}. Therefore, A = {Vk | k ∈ Z+} is an open covering of S. Since S is compact, there is a finite subcover. Namely, there exist positive integers k1, . . . , km such that S ⊂ m⋃ j=1 Vkj = VK , where K = max{k1, . . . , km}. Since B(x0, 1/K) is disjoint from VK , it does not contain any point of S. This shows that x0 is not in S̄, and thus the proof is completed. Example 3.26 The set A = {(x, y, z) |xyz = 1} in Example 3.20 is not compact because it is not bounded. The set C = {(x, y, z) | 1 ≤ x ≤ 2, 1 ≤ y ≤ 3, 0 < xyz ≤ 4} is not compact because it is not closed. We are now left to show that a closed and bounded subset of Rn is compact. We start by proving a special case. Theorem 3.26 A closed rectangle R = [a1, b1]× · · · × [an, bn] in Rn is compact. Proof We prove by contradiction. Assume that R is not compact, and we show that this will lead to a contradiction. The idea is to use the bisection method. If R is not compact, there is an open covering A = {Uα |α ∈ J} of R which does not have a finite subcover. Let R1 = R, and let d1 = diamR1. For 1 ≤ i ≤ n, let ai,1 = ai and bi,1 = bi, and let mi,1 be the midpoint of the interval [ai,1, bi,1].
The hyperplanes xi = mi,1, 1 ≤ i ≤ n, divides the rectangle R1 into 2n subrectangles. Notice that A is also an open covering of each of these subrectangles. If each of these subrectangles can be covered by a finite subcollection of open sets in A , then R also can be covered by a finite subcollection of open sets in A . Since we assume R cannot be covered by any finite subcollection of open sets in A , there is at least one of the 2n subrectangles which cannot be covered by any finite subcollection of open sets in A . Choose one of these, and denote it by R2. Define ai,2, bi,2 for 1 ≤ i ≤ n so that R2 = [a1,2, b1,2]× · · · × [an,2, bn,2]. Note that bi,2 − ai,2 = bi,1 − ai,1 2 for 1 ≤ i ≤ n. Therefore, d2 = diamR2 = d1/2. We continue this bisection process to obtain the rectangles R1, R2, · · · , so that Rk+1 ⊂ Rk for all k ∈ Z+, and Rk cannot be covered by any finite subcollections of A . Chapter 3. Continuous Functions on Connected Sets and Compact Sets 177 Figure 3.14: Bisection method. Define ai,k, bi,k for 1 ≤ i ≤ n so that Rk = [a1,k, b1,k]× · · · × [an,k, bn,k]. Then for all k ∈ Z+, bi,k+1 − ai,k+1 = bi,k − ai,k 2 for 1 ≤ i ≤ n. It follows that dk+1 = diamRk+1 = dk/2. For any 1 ≤ i ≤ n, {ai,k}∞k=1 is an increasing sequence that is bounded above by bi, and {bi,k}∞k=1 is a decreasing sequence that is bounded below by ai. By monotone convergence theorem, the sequence {ai,k}∞k=1 converges to ai,0 = sup k∈Z+ ai,k; while the sequence {bi,k}∞k=1 converges to bi,0 = inf k∈Z+ bi,k. Since bi,k − ai,k = bi − ai 2k−1 for all k ∈ Z+, we find that ai,0 = bi,0. Let ci = ai,0 = bi,0. Then ai,k ≤ ci ≤ bi,k for all 1 ≤ i ≤ n and all k ∈ Z+. Thus, c = (c1, . . . , cn) is a point in Rk for all k ∈ Z+. By assumption that A is an open covering of R = R1, there exists β ∈ J such that c ∈ Uβ . Since Uβ is an open set, there is an r > 0 such that B(c, r) ⊂ Uβ . Since dk = diamRk = d1 2k−1 for all k ∈ Z+, Chapter 3. 
we find that lim k→∞ dk = 0. Hence, there is a positive integer K such that dK < r. If x ∈ RK , then ∥x− c∥ ≤ diamRK = dK < r. This implies that x is in B(c, r). Thus, we have shown that RK ⊂ B(c, r). Therefore, RK is contained in the single element Uβ of A , which contradicts the fact that RK cannot be covered by any finite subcollection of open sets in A . We conclude that R must be compact. Now we can prove the Heine-Borel theorem. Theorem 3.27 Heine-Borel Theorem Let S be a subset of Rn. Then S is compact if and only if it is closed and bounded. Proof We have shown in Theorem 3.25 that if S is compact, then it must be closed and bounded. Now assume that S is closed and bounded, and let A = {Uα |α ∈ J} be an open covering of S. Since S is bounded, there exists a positive number M such that ∥x∥ ≤ M for all x ∈ S. Thus, if x = (x1, . . . , xn) is in S, then for all 1 ≤ i ≤ n, |xi| ≤ ∥x∥ ≤ M . This implies that S is contained in the closed rectangle R = [−M,M ]× · · · × [−M,M ]. Let V = Rn \S. Since S is closed, V is an open set. Then Ã = A ∪{V } is an open covering of Rn, and hence it is an open covering of R. By Theorem 3.26, R is compact. Thus, there exists B̃ ⊂ Ã which is a finite subcover of R. Then B = B̃ \{V } is a finite subcollection of A that covers S. This proves that S is compact. Example 3.27 We have shown in Example 3.20 that the set B = { (x, y, z) |x2 + 4y2 + 9z2 ≤ 36 } is closed and bounded. Hence, it is compact. Now we can conclude our main theorem from the Bolzano-Weierstrass theorem and the Heine-Borel theorem. Theorem 3.28 Let S be a subset of Rn. Then the following are equivalent. (a) S is sequentially compact. (b) S is closed and bounded. (c) S is compact. Remark 3.5 Henceforth, when we say a subset S of Rn is compact, we mean it is a closed and bounded set, and it is sequentially compact.
By Theorem 3.19, a subset S of Rn has compact closure if and only if it is a bounded set. Finally, we can conclude the following, which says that continuous functions preserve compactness. Theorem 3.29 Let D be a compact subset of Rn. If the function F : D → Rm is continuous, then F(D) is a compact subset of Rm. Proof Since D is compact, it is sequentially compact. By Theorem 3.24, F(D) is a sequentially compact subset of Rm. Hence, F(D) is a compact subset of Rm. Exercises 3.3 Question 1 Determine whether the following subsets of R2 are sequentially compact. (a) A = {(x, y) |x2 + y2 = 9}. (b) B = {(x, y) | 0 < x2 + 4y2 ≤ 36}. (c) C = {(x, y) |x ≥ 0, 0 ≤ y ≤ x2}. Question 2 Determine whether the following subsets of R3 are compact. (a) A = {(x, y, z) | 1 ≤ x ≤ 2}. (b) B = {(x, y, z) | |x|+ |y|+ |z| ≤ 10}. (c) C = {(x, y, z) | 4 ≤ x2 + y2 + z2 ≤ 9}. Question 3 Given that A is a compact subset of Rn and B is a subset of A, show that B is compact if and only if it is closed. Question 4 If S1, . . . , Sk are compact subsets of Rn, show that S = S1 ∪ · · · ∪ Sk is also compact. Question 5 If A is a compact subset of Rm, B is a compact subset of Rn, show that A×B is a compact subset of Rm+n. 3.4 Applications of Compactness In this section, we consider the applications of compactness. We are going to use repeatedly the fact that a subset S of Rn is compact if and only if it is closed and bounded, if and only if it is sequentially compact. 3.4.1 The Extreme Value Theorem First we define bounded functions. Definition 3.15 Bounded Functions Let D be a subset of Rn, and let F : D → Rm be a function defined on D. We say that the function F is bounded if the set F(D) is a bounded subset of Rm. In other words, the function F : D → Rm is bounded if there is a positive number M such that ∥F(x)∥ ≤ M for all x ∈ D.
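As a quick sanity check of this definition, the sketch below samples a map F : R2 → R2 that is bounded with M = √2 (the map, the sample region, and the helper names are our own illustration, not from the text).

```python
import math
import random

def F(x, y):
    # a sample bounded map F : R^2 -> R^2; each component lies in [-1, 1],
    # so ||F(x, y)|| <= sqrt(1 + 1) = sqrt(2) for every (x, y)
    return (math.sin(x + y), math.cos(x * y))

M = math.sqrt(2.0)

random.seed(0)
samples = [(random.uniform(-100, 100), random.uniform(-100, 100))
           for _ in range(10_000)]
# empirically confirm the bound ||F(x)|| <= M on the sampled points
max_norm = max(math.hypot(*F(x, y)) for x, y in samples)
```

Here F is bounded even though its domain R2 is not; boundedness of F concerns the image F(D), not D itself.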
Example 3.28 Let D = {(x, y, z) | 0 < x2 + y2 + z2 < 4}, and let F : D → R2 be the function defined as F(x, y, z) = ( 1 x2 + y2 + z2 , x+ y + z ) . For k ∈ Z+, the point uk = (1/k, 0, 0) is in D and F(uk) = ( k2, 1 k ) . Thus, ∥F(uk)∥ ≥ k2. This shows that F is not bounded, even though D is a bounded set. Theorem 3.24 gives the following. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 182 Theorem 3.30 Let D be a compact subset of Rn. If the function F : D → Rm is continuous, then it is bounded. Proof By Theorem 3.29, F(D) is compact. Hence, it is bounded. Example 3.29 Let D = {(x, y, z) | 1 < x2 + y2 + z2 < 4}, and let F : D → R2 be the function defined as F(x, y, z) = ( 1 x2 + y2 + z2 , x+ y + z ) . Show that F : D → R2 is a bounded function. Solution Notice that the set D is not closed. Therefore, we cannot apply Theorem 3.30 directly. Consider the set U = {(x, y, z) | 1 ≤ x2 + y2 + z2 ≤ 4}. For any u = (x, y, z) in U , ∥u∥ ≤ 2. Hence, U is bounded. The function f : R3 → R defined as f(x, y, z) = x2 + y2 + z2 is continuous, and U = f−1([1, 4]). Since [1, 4] is closed in R, U is closed in R3. Since f(x, y, z) ̸= 0 on U , F1(x, y, z) = 1 x2 + y2 + z2 is continuous on U . Being a polynomial function, F2(x, y, z) = x + y + z is continuous. Thus, F : U → R2 is continuous. Since U is closed and bounded, Theorem 3.30 implies that F : U → R2 is bounded. Since D ⊂ U , F : D → R2 is also a bounded function. Recall that if S is a subset of R, S has maximum value if and only if S is bounded above and supS is in S; while S has minimum value if and only if S is bounded below and inf S is in S. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 183 Definition 3.16 Extremizer and Extreme Values Let D be a subset of Rn, and let f : D → R be a function defined on D. 1. The function f has maximum value if there is a point x0 in D such that f(x0) ≥ f(x) for all x ∈ D. The point x0 is called a maximizer of f ; and f(x0) is the maximum value of f . 
2. The function f has minimum value if there is a point x0 in D such that f(x0) ≤ f(x) for all x ∈ D. The point x0 is called a minimizer of f ; and f(x0) is the minimum value of f . We have proved in volume I that a sequentially compact subset of R has a maximum value and a minimum value. This gives us the extreme value theorem. Theorem 3.31 Extreme Value Theorem Let D be a compact subset of Rn. If the function f : D → R is continuous, then it has a maximum value and a minimum value. Proof By Theorem 3.24, f(D) is a sequentially compact subset of R. Therefore, f has a maximum value and a minimum value. Example 3.30 Let D = {(x, y) |x2 + 2x+ y2 ≤ 3}, and let f : D → R be the function defined by f(x, y) = x2 + xy3 + ex−y. Show that f has a maximum value and a minimum value. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 184 Solution Notice that D = { (x, y) |x2 + 2x+ y2 ≤ 3 } = { (x, y) | (x+ 1)2 + y2 ≤ 4 } is a closed ball. Thus, it is closed and bounded. The function f1(x, y) = x2+xy3 and the function g(x, y) = x−y are polynomial functions. Hence, they are continuous. The exponential function h(x) = ex is continuous. Hence, the function f2(x, y) = (h ◦ g)(x, y) = ex−y is continuous. Since f = f1 + f2, the function f : D → R is continuous. Since D is compact, the function f : D → R has a maximum value and a minimum value. Remark 3.6 Extreme Value Property Let S be a subset of Rn. We say that S has extreme value property provided that whenever f : S → R is a continuous function, then f has maximum and minimum values. The extreme value theorem says that if S is compact, then it has extreme value property. Now let us show the converse. Namely, if S has extreme value property, then it is compact, or equivalently, it is closed and bounded. If S is not bounded, the function f : S → R, f(x) = ∥x∥ is continuous, but it does not have maximum value. If S is not closed, there is a sequence {xk} in S that converges to a point x0 that is not in S. 
The function g : S → R, g(x) = ∥x− x0∥ is continuous and g(x) ≥ 0 for all x ∈ S. Since lim k→∞ g(xk) = 0, we find that inf g(S) = 0. Since x0 is not in S, there is no point x in S such that g(x) = 0. Hence, g does not have a minimum value. This shows that for S to have extreme value property, it is necessary that S is closed and bounded. Therefore, a subset S of Rn has extreme value property if and only if it is compact. 3.4.2 Distance Between Sets The distance between two sets is defined in the following way. Definition 3.17 Distance Between Two Sets Let A and B be two subsets of Rn. The distance between A and B is defined as d(A,B) = inf {d(a,b) | a ∈ A,b ∈ B} . The distance between two sets is always well-defined and nonnegative. If A and B are not disjoint, then their distance is 0. Example 3.31 Let A = {(x, y) |x2 + y2 < 1} and let B = [1, 3] × [−1, 1]. Find the distance between the two sets A and B. Solution For k ∈ Z+, let ak be the point in A given by ak = (1− 1/k, 0) . Let b = (1, 0). Then b is in B. Notice that d(ak,b) = ∥ak − b∥ = 1/k. Hence, d(A,B) ≤ 1/k for all k ∈ Z+. This shows that the distance between A and B is 0. Figure 3.15: The sets A and B in Example 3.31. In Example 3.31, we find that the distance between two disjoint sets can be 0, even though they are both bounded. Example 3.32 Let A = {(x, y) | y = 0} and let B = {(x, y) |xy = 1}. Find the distance between the two sets A and B. Solution For k ∈ Z+, let ak = (k, 0) and bk = (k, 1/k). Then ak is in A and bk is in B. Notice that d(ak,bk) = ∥ak − bk∥ = 1/k. Hence, d(A,B) ≤ 1/k for all k ∈ Z+. This shows that the distance between A and B is 0. Figure 3.16: The sets A and B in Example 3.32. In Example 3.32, we find that the distance between two disjoint sets can be 0, even though both of them are closed.
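Example 3.32 can be illustrated numerically by sampling the witness pairs ak = (k, 0) on the axis and bk = (k, 1/k) on the hyperbola (the cutoff 10 000 and variable names are our own choices for the sketch).

```python
import math

# witnesses from Example 3.32: a_k = (k, 0) in A, b_k = (k, 1/k) in B
gaps = [math.dist((k, 0.0), (k, 1.0 / k)) for k in range(1, 10_001)]

# d(a_k, b_k) = 1/k, so the sampled infimum shrinks toward 0
inf_estimate = min(gaps)
```

The sampled gaps never reach 0 (the sets are disjoint), but their infimum is 0, matching d(A, B) = 0.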
When B is the one-point set {x0}, the distance between A and B is the distance from the point x0 to the set A. We denote this distance as d(x0, A). In other words, d(x0, A) = inf {d(a,x0) | a ∈ A} . If x0 is a point in A, then d(x0, A) = 0. However, the distance from a point x0 to a set A can be 0 even though x0 is not in A. For example, the distance between the point x0 = (1, 0) and the set A = {(x, y) |x2 + y2 < 1} is 0, even though x0 is not in A. The following proposition says that this cannot happen if A is closed. Proposition 3.32 Let A be a closed subset of Rn and let x0 be a point in Rn. Then d(x0, A) = 0 if and only if x0 is in A. Proof If x0 is in A, it is obvious that d(x0, A) = 0. Conversely, if x0 is not in A, x0 is in the open set Rn \ A. Therefore, there is an r > 0 such that B(x0, r) ⊂ Rn \ A. For any a ∈ A, a /∈ B(x0, r). Therefore, ∥x0 − a∥ ≥ r. Taking infimum over a ∈ A, we find that d(x0, A) ≥ r. Hence, d(x0, A) ̸= 0. Figure 3.17: A point outside a closed set has positive distance from the set. Proposition 3.33 Given a subset A of Rn, define the function f : Rn → R by f(x) = d(x, A). Then f is a continuous function. Proof We prove something stronger. For any u and v in Rn, we claim that |f(u)− f(v)| ≤ ∥u− v∥. This means that f is a Lipschitz function with Lipschitz constant 1, which implies that it is continuous. Given u and v in Rn, if a is in A, then d(u, A) ≤ ∥u− a∥ ≤ ∥v − a∥+ ∥u− v∥. This shows that ∥v − a∥ ≥ d(u, A)− ∥u− v∥. Taking infimum over a ∈ A, we find that d(v, A) ≥ d(u, A)− ∥u− v∥. Therefore, f(u)− f(v) ≤ ∥u− v∥. Interchanging u and v, we obtain f(v)− f(u) ≤ ∥u− v∥. This proves that |f(u)− f(v)| ≤ ∥u− v∥. Now we can prove the following. Theorem 3.34 Let A and C be disjoint subsets of Rn. If A is compact and C is closed, then the distance between A and C is positive.
Proof By Proposition 3.33, the function f : A → R, f(a) = d(a, C) is continuous. Since A is compact, f has a minimum value. Namely, there is a point a0 in A such that d(a0, C) ≤ d(a, C) for all a ∈ A. For any a in A and c ∈ C, d(a, c) ≥ d(a, C) ≥ d(a0, C). Taking infimum over all a ∈ A and c ∈ C, we find that d(A,C) ≥ d(a0, C). By definition, we also have d(A,C) ≤ d(a0, C). Thus, d(A,C) = d(a0, C). Since A and C are disjoint and C is closed, Proposition 3.32 implies that d(A,C) = d(a0, C) > 0. An equivalent form of Theorem 3.34 is the following important theorem. Theorem 3.35 Let A be a compact subset of Rn, and let U be an open subset of Rn that contains A. Then there is a positive number δ such that if x is a point in Rn that has a distance less than δ from the set A, then x is in U . Figure 3.18: A compact set has a positive distance from the boundary of the open set that contains it. Proof Let C = Rn \ U . Then C is a closed subset of Rn that is disjoint from A. By Theorem 3.34, δ = d(A,C) > 0. If x is in Rn and d(x, A) < δ, then x cannot be in C. Therefore, x is in U . As a corollary, we have the following. Corollary 3.36 Let A be a compact subset of Rn, and let U be an open subset of Rn that contains A. Then there is a positive number r and a compact set K such that A ⊂ K ⊂ U , and if x is a point in Rn that has a distance less than r from the set A, then x is in K. Proof By Theorem 3.35, there is a positive number δ such that if x is a point in Rn that has a distance less than δ from the set A, then x is in U . Take r = δ/2, and let K = V̄ , where V = ⋃ u∈A B(u, r). Since A is compact, it is bounded. There is a positive number M such that ∥u∥ ≤ M for all u ∈ A. If x ∈ V , then there is a u ∈ A such that ∥x − u∥ < r. This implies that ∥x∥ ≤ M + r. Hence, the set V is also bounded.
Since K is the closure of a bounded set, K is compact. Since A ⊂ V , A ⊂ K. If w ∈ K, since K is the closure of V , there is a point v in V that lies in B(w, r). By the definition of V , there is a point u in A such that v ∈ B(u, r). Thus, ∥w − u∥ ≤ ∥w − v∥+ ∥v − u∥ < r + r = δ. This implies that w has a distance less than δ from A. Hence, w is in U . This shows that K ⊂ U . Now if x is a point that has distance less than r from the set A, there is a point u in A such that ∥x− u∥ < r. This implies that x ∈ B(u, r) ⊂ V ⊂ K. 3.4.3 Uniform Continuity In Section 2.4, we have discussed uniform continuity. Let D be a subset of Rn and let F : D → Rm be a function defined on D. We say that F : D → Rm is uniformly continuous provided that for any ε > 0, there exists δ > 0 such that for any points u and v in D, if ∥u− v∥ < δ, then ∥F(u)− F(v)∥ < ε. If a function is uniformly continuous, it is continuous. The converse is not true. However, a continuous function that is defined on a compact subset of Rn is uniformly continuous. This is an important theorem in analysis. Theorem 3.37 Let D be a subset of Rn, and let F : D → Rm be a continuous function defined on D. If D is compact, then F : D → Rm is uniformly continuous. Proof Assume to the contrary that F : D → Rm is not uniformly continuous. Then there exists an ε > 0 such that for any δ > 0, there exist points u and v in D such that ∥u − v∥ < δ and ∥F(u) − F(v)∥ ≥ ε. This implies that for any k ∈ Z+, there exist uk and vk in D such that ∥uk − vk∥ < 1/k and ∥F(uk) − F(vk)∥ ≥ ε. Since D is sequentially compact, there is a subsequence {ukj} of {uk} that converges to a point u0 in D. Consider the sequence {vkj} in D. It has a subsequence {vkjl } that converges to a point v0 in D. Being a subsequence of {ukj}, the sequence {ukjl } also converges to u0. Since F : D → Rm is continuous, the sequences {F(ukjl )} and {F(vkjl )} converge to F(u0) and F(v0) respectively.
Notice that by construction, ∥F(ukjl )− F(vkjl )∥ ≥ ε for all l ∈ Z+. Thus, ∥F(u0) − F(v0)∥ ≥ ε. This implies that F(u0) ̸= F(v0), and so u0 ̸= v0. Since kj1 , kj2 , . . . is a strictly increasing sequence of positive integers, kjl ≥ l. Thus, ∥ukjl − vkjl ∥ < 1/kjl ≤ 1/l. Taking l → ∞ implies that u0 = v0. This gives a contradiction. Thus, F : D → Rm must be uniformly continuous. Example 3.33 Let D = (−1, 4)× (−7, 5] and let F : D → R3 be the function defined as F(x, y) = ( sin(x+ y), √ x+ y + 8, exy ) . Show that F is uniformly continuous. Solution Let U = [−1, 4] × [−7, 5]. Then U is a closed and bounded subset of R2 that contains D. The functions f1(x, y) = x+ y, f2(x, y) = x+ y + 8 and f3(x, y) = xy are polynomial functions. Hence, they are continuous. If (x, y) ∈ U , then x ≥ −1, y ≥ −7, and so f2(x, y) = x+ y + 8 ≥ 0. Thus, f2(U) is contained in the domain of the square root function. Since the square root function, the sine function and the exponential function are continuous on their domains, we find that the functions F1(x, y) = sin(x+ y), F2(x, y) = √ x+ y + 8, F3(x, y) = exy are continuous on U . Since U is closed and bounded, F : U → R3 is uniformly continuous. Since D ⊂ U , F : D → R3 is uniformly continuous. 3.4.4 Linear Transformations and Quadratic Forms In Chapter 2, we have seen that a linear transformation T : Rn → Rm is a matrix transformation. Namely, there exists an m × n matrix A such that T(x) = Ax for all x ∈ Rn. A linear transformation is continuous. Theorem 2.34 says that a linear transformation is Lipschitz. More precisely, there exists a positive constant c > 0 such that ∥T(x)∥ ≤ c∥x∥ for all x ∈ Rn. Theorem 2.5 says that when m = n, a linear transformation T : Rn → Rn is invertible if and only if it is one-to-one, if and only if the matrix A is invertible, if and only if detA ̸= 0.
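The Lipschitz bound ∥T(x)∥ ≤ c∥x∥ can be checked numerically. The sketch below takes c to be the Frobenius norm of A, which is a valid (though generally not optimal) Lipschitz constant by the Cauchy–Schwarz inequality applied row by row; the 2 × 2 matrix is our own sample, not one from the text.

```python
import math
import random

A = [[2.0, -1.0], [0.5, 3.0]]  # a sample 2x2 matrix (our choice)

def T(x):
    # the matrix transformation T(x) = A x
    return tuple(sum(A[i][j] * x[j] for j in range(2)) for i in range(2))

# c = Frobenius norm of A; Cauchy-Schwarz gives ||T(x)|| <= c ||x||
c = math.sqrt(sum(A[i][j] ** 2 for i in range(2) for j in range(2)))

random.seed(1)
ok = all(
    math.hypot(*T(x)) <= c * math.hypot(*x) + 1e-9
    for x in [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
)
```

The optimal constant is the largest singular value of A, which is never larger than the Frobenius norm used here.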
Here we want to give a stronger characterization of a linear transformation T : Rn → Rn that is invertible. Recall that to show that a linear transformation T : Rn → Rm is one-to-one, it is sufficient to show that T(x) = 0 implies that x = 0. Theorem 3.38 Let T : Rn → Rn be a linear transformation. The following are equivalent. (a) T is invertible. (b) There is a positive constant a such that ∥T(x)∥ ≥ a∥x∥ for all x ∈ Rn. Proof (b) implies (a) is easy. Notice that (b) says that ∥x∥ ≤ 1 a ∥T(x)∥ for all x ∈ Rn. (3.3) If T(x) = 0, then ∥T(x)∥ = 0. Eq. (3.3) implies that ∥x∥ = 0. Thus, x = 0. This proves that T is one-to-one. Hence, it is invertible. Conversely, assume that T : Rn → Rn is invertible. Let Sn−1 = { (x1, . . . , xn) |x21 + · · ·+ x2n = 1 } be the standard unit (n − 1)-sphere in Rn. We have seen that Sn−1 is compact. For any u ∈ Sn−1, u ̸= 0. Therefore, T(u) ̸= 0 and so ∥T(u)∥ > 0. The function f : Sn−1 → R, f(u) = ∥T(u)∥ is continuous. Hence, by the extreme value theorem, it has a minimum value at some u0 on Sn−1. Let a = ∥T(u0)∥. Then a > 0. Since a is the minimum value of f , ∥T(u)∥ ≥ a for all u ∈ Sn−1. Notice that if x = 0, ∥T(x)∥ ≥ a∥x∥ holds trivially. If x is in Rn and x ̸= 0, let u = αx, where α = 1/∥x∥. Then u is in Sn−1. Therefore, ∥T(u)∥ ≥ a. Since T(u) = αT(x), and α > 0, we find that ∥T(u)∥ = α∥T(x)∥. Hence, α∥T(x)∥ ≥ a. This gives ∥T(x)∥ ≥ a/α = a∥x∥. In Section 2.1.5, we have reviewed some theories of quadratic forms from linear algebra. In Theorem 2.7, we state that for a quadratic form QA : Rn → R, QA(x) = xTAx defined by the symmetric matrix A, we have λn∥x∥2 ≤ QA(x) ≤ λ1∥x∥2 for all x ∈ Rn. Here λn is the smallest eigenvalue of A, and λ1 is the largest eigenvalue of A. We have used Theorem 2.7 to prove that a linear transformation is Lipschitz in Theorem 2.34. It boils down to the fact that if T(x) = Ax, then ∥T(x)∥2 = xT (ATA)x, and ATA is a positive semi-definite quadratic form.
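The constant a in Theorem 3.38, the minimum of ∥T(u)∥ over the unit sphere, equals the smallest singular value of A, that is, the square root of the smallest eigenvalue of ATA. The sketch below computes it in closed form for a sample invertible 2 × 2 matrix (our own choice, not from the text) and checks the lower bound ∥T(x)∥ ≥ a∥x∥ on random points.

```python
import math
import random

A = [[3.0, 1.0], [1.0, 2.0]]  # a sample invertible matrix (det A = 5)

def T(x):
    return (A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1])

# a = min ||T(u)|| over the unit circle = smallest singular value of A;
# for a 2x2 matrix, compute it from the eigenvalues of A^T A in closed form
m00 = A[0][0] ** 2 + A[1][0] ** 2
m01 = A[0][0] * A[0][1] + A[1][0] * A[1][1]
m11 = A[0][1] ** 2 + A[1][1] ** 2
tr, det = m00 + m11, m00 * m11 - m01 * m01
lam_min = (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0
a = math.sqrt(lam_min)

random.seed(2)
ok = all(
    math.hypot(*T(x)) >= a * math.hypot(*x) - 1e-9
    for x in [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
)
```

For a non-invertible A the same computation would give a = 0, in which case no positive constant a in part (b) of the theorem exists.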
In fact, we can also use Theorem 2.7 to prove Theorem 3.38, using the fact that if A is invertible, then ATA is positive definite. Let us prove a weaker version of Theorem 2.7 here, which is sufficient to establish Theorem 3.38 and the theorem which says that a linear transformation is Lipschitz.

Theorem 3.39
Let A be an n× n symmetric matrix, and let QA : Rn → R be the quadratic form QA(x) = xTAx defined by A. There exist constants a and b such that a∥x∥2 ≤ QA(x) ≤ b∥x∥2 for all x ∈ Rn, and QA(u) = a∥u∥2, QA(v) = b∥v∥2 for some u and v in Rn. Therefore,
(i) if A is positive semi-definite, b ≥ a ≥ 0;
(ii) if A is positive definite, b ≥ a > 0.

Proof
As in the proof of Theorem 3.38, consider the continuous function QA : Sn−1 → R. Since Sn−1 is compact, there exist u and v in Sn−1 such that QA(u) ≤ QA(w) ≤ QA(v) for all w ∈ Sn−1. Let a = QA(u) and b = QA(v). If x = 0, a∥x∥2 ≤ QA(x) ≤ b∥x∥2 holds trivially. Now if x is in Rn and x ̸= 0, let w = αx, where α = 1/∥x∥. Then w is on Sn−1. Notice that QA(w) = α2QA(x). Hence, a ≤ QA(x)/∥x∥2 ≤ b. This proves that a∥x∥2 ≤ QA(x) ≤ b∥x∥2.

3.4.5 Lebesgue Number Lemma

Now let us prove the following important theorem.

Theorem 3.40 Lebesgue Number Lemma
Let A be a subset of Rn, and let A = {Uα | α ∈ J} be an open covering of A. If A is compact, there exists a positive number δ such that if S is a subset of A and diam S < δ, then S is contained in one of the elements of A . Such a positive number δ is called a Lebesgue number of the covering A .

We give two proofs of this theorem.

First Proof of the Lebesgue Number Lemma
We use proof by contradiction. Assume that there does not exist a positive number δ such that any subset S of A that has diameter less than δ lies inside an open set in A .
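Theorem 3.39 can be illustrated numerically for n = 2, where the extreme values a and b of QA on the unit circle are exactly the eigenvalues of A, computable in closed form from the trace and determinant. The symmetric matrix below is an arbitrary positive definite choice (not from the text), matching case (ii).

```python
import math
import random

# A hypothetical symmetric 2x2 matrix defining Q(x) = x^T A x.
A = [[3.0, 1.0],
     [1.0, 2.0]]

def Q(x):
    # Expanded quadratic form for a symmetric 2x2 matrix.
    return A[0][0]*x[0]*x[0] + 2*A[0][1]*x[0]*x[1] + A[1][1]*x[1]*x[1]

# For a symmetric 2x2 matrix, the eigenvalues (= min and max of Q on the
# unit circle) follow from the characteristic polynomial.
tr = A[0][0] + A[1][1]
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
disc = math.sqrt(tr*tr - 4*det)
a, b = (tr - disc)/2, (tr + disc)/2   # smallest and largest eigenvalue

random.seed(2)
for _ in range(1000):
    x = [random.uniform(-4, 4), random.uniform(-4, 4)]
    n2 = x[0]*x[0] + x[1]*x[1]
    # The two-sided bound of Theorem 3.39, up to floating-point tolerance.
    assert a*n2 - 1e-9 <= Q(x) <= b*n2 + 1e-9

assert a > 0  # this A is positive definite, so a > 0 as in case (ii)
```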
Then for any k ∈ Z+, there is a subset Sk of A whose diameter is less than 1/k, but Sk is not contained in any element of A . For each k ∈ Z+, the set Sk cannot be empty. Let xk be any point in Sk. Then {xk} is a sequence of points in A. Since A is sequentially compact, there is a subsequence {xkm} that converges to a point x0 in A. Since A is an open covering of A, there exists β ∈ J such that x0 ∈ Uβ. Since Uβ is open, there exists r > 0 such that B(x0, r) ⊂ Uβ. Since the sequence {xkm} converges to x0, there is a positive integer M such that for all m ≥ M, xkm ∈ B(x0, r/2). There exists an integer j ≥ M such that 1/kj < r/2. If x ∈ Skj, then ∥x− xkj∥ ≤ diam Skj < 1/kj < r/2. Since xkj ∈ B(x0, r/2), ∥xkj − x0∥ < r/2. Therefore, ∥x− x0∥ < r. This proves that x ∈ B(x0, r) ⊂ Uβ. Thus, we have shown that Skj ⊂ Uβ. This contradicts the fact that Skj is not contained in any element of A .

Second Proof of the Lebesgue Number Lemma
Since A is compact, there are finitely many indices α1, . . . , αm in J such that A ⊂ Uα1 ∪ · · · ∪ Uαm. For 1 ≤ j ≤ m, let Cj = Rn \ Uαj. Then Cj is a closed set, and C1 ∩ · · · ∩ Cm is disjoint from A. By Theorem 3.33, the function fj : A → R, fj(x) = d(x, Cj) is continuous. Define f : A → R by f(x) = (f1(x) + · · ·+ fm(x))/m. Then f is also a continuous function. Since A is compact, there is a point a0 in A such that f(a0) ≤ f(a) for all a ∈ A. Notice that fj(a0) ≥ 0 for all 1 ≤ j ≤ m. Since C1 ∩ · · · ∩ Cm is disjoint from A, there is a k, 1 ≤ k ≤ m, such that a0 /∈ Ck. Proposition 3.32 says that fk(a0) = d(a0, Ck) > 0. Hence, f(a0) > 0. Let δ = f(a0). It is the minimum value of the function f : A → R. Now let S be a nonempty subset of A such that diam S < δ. Take a point x0 in S. Let 1 ≤ l ≤ m be an integer such that fl(x0) ≥ fj(x0) for all 1 ≤ j ≤ m. Then δ ≤ f(x0) ≤ fl(x0) = d(x0, Cl). For any u ∈ Cl, d(x0,u) ≥ d(x0, Cl) ≥ δ. If x ∈ S, then d(x,x0) ≤ diam S < δ. This implies that x is not in Cl.
Hence, it must be in Uαl. This shows that S is contained in Uαl, which is an element of A . This completes the proof of the theorem.

The Lebesgue number lemma can be used to give an alternative proof of Theorem 3.37, which says that a continuous function defined on a compact subset of Rn is uniformly continuous.

Alternative Proof of Theorem 3.37
Fix ε > 0. We want to show that there exists δ > 0 such that if u and v are in D and ∥u− v∥ < δ, then ∥F(u)− F(v)∥ < ε. We will construct an open covering of D indexed by J = D. Since F : D → Rm is continuous, for each x ∈ D, there is a positive number δx (depending on x) such that if u is in D and ∥u− x∥ < δx, then ∥F(u)− F(x)∥ < ε/2. Let Ux = B(x, δx). Then Ux is an open set. If u and v are points of D in Ux, then ∥F(u)− F(x)∥ < ε/2 and ∥F(v)− F(x)∥ < ε/2. Thus, ∥F(u)− F(v)∥ < ε. Now A = {Ux | x ∈ D} is an open covering of D. Since D is compact, the Lebesgue number lemma implies that there exists a number δ > 0 such that if S is a subset of D that has diameter less than δ, then S is contained in Ux for some x ∈ D. We claim that this is the δ that we need. If u and v are two points in D and ∥u− v∥ < δ, then S = {u,v} is a set with diameter less than δ. Hence, there is an x ∈ D such that S ⊂ Ux. This implies that u and v are in Ux. Hence, ∥F(u)− F(v)∥ < ε. This completes the proof.

Exercises 3.4

Question 1
Let D = {(x, y) | 2 < x2 + 4y2 < 10}, and let F : D → R3 be the function defined as F(x, y) = ( x/(x2 + y2), y/(x2 + y2), (x2 − y2)/(x2 + y2) ). Show that the function F : D → R3 is bounded.

Question 2
Let D = {(x, y, z) | 1 ≤ x2 + 4y2 ≤ 10, 0 ≤ z ≤ 5}, and let f : D → R be the function defined as f(x, y, z) = (x2 − y2)/(x2 + y2 + z2). Show that the function f : D → R has a maximum value and a minimum value.

Question 3
Let A = {(x, y) | x2 + 4y2 ≤ 16} and B = {(x, y) | x+ y ≥ 10}.
Show that the distance between the sets A and B is positive.

Question 4
Let D = {(x, y, z) | x2 + y2 + z2 ≤ 20} and let f : D → R be the function defined as f(x, y, z) = ex2+4z2. Show that f : D → R is uniformly continuous.

Question 5
Let D = (−1, 2)× (−6, 0) and let f : D → R be the function defined as f(x, y) = √(x+ y + 7) + ln(x2 + y2 + 1). Show that f : D → R is uniformly continuous.

Chapter 4 Differentiating Functions of Several Variables

In this chapter, we study differential calculus of functions of several variables.

4.1 Partial Derivatives

When f : (a, b) → R is a function defined on an open interval (a, b), the derivative of the function at a point x0 in (a, b) is defined as f′(x0) = lim h→0 (f(x0 + h)− f(x0))/h, provided that the limit exists. The derivative gives the instantaneous rate of change of the function at the point x0. Geometrically, it is the slope of the tangent line to the graph of the function f : (a, b) → R at the point (x0, f(x0)).

Figure 4.1: Derivative as slope of tangent line.

Now consider a function f : O → R that is defined on an open subset O of Rn, where n ≥ 2. What is the natural way to extend the concept of derivatives to this function? From the perspective of rate of change, we need to consider the change of f in various different directions. This leads us to consider directional derivatives. Another perspective is to regard the existence of the derivative as differentiability, that is, as the existence of a good first order approximation. Later we will see that all these are closely related. First let us consider the rates of change of the function f : O → R at a point x0 in O along the directions of the coordinate axes. These are called partial derivatives.
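The limit defining f′(x0) can be watched numerically: as h shrinks, the difference quotient approaches the derivative. A small supplementary sketch (the choices f = sin and x0 = 0.3 are arbitrary, not from the text):

```python
import math

# Difference quotients (f(x0+h) - f(x0))/h approach f'(x0) as h -> 0.
# Here f = sin, whose derivative at x0 is cos(x0).
f, x0 = math.sin, 0.3
exact = math.cos(x0)
for h in [1e-2, 1e-4, 1e-6]:
    quotient = (f(x0 + h) - f(x0)) / h
    # For a twice-differentiable f, the error is of order h.
    assert abs(quotient - exact) < 10 * h
```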
Definition 4.1 Partial Derivatives Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. For 1 ≤ i ≤ n, we say that the function f : O → R has a partial derivative with respect to its ith component at the point x0 if the limit lim h→0 f(x0 + hei)− f(x0) h exists. In this case, we denote the limit by ∂f ∂xi (x0), and call it the partial derivative of f : O → R with respect to xi at x0. We say that the function f : O → R has partial derivatives at x0 if ∂f ∂xi (x0) exists for all 1 ≤ i ≤ n. Remark 4.1 When we consider partial derivatives of a function, we always assume that the domain of the function is an open set O, so that each point x0 in the domain is an interior point of O, and a limit point of O\{x0}. By definition of open sets, there exists r > 0 such that B(x0, r) is contained in O. This allows us to compare the function values of f in a neighbourhood of x0 from various different directions. By definition, ∂f ∂xi (x0) measures the rate of change of f at x0 in the direction of ei. It can also be interpreted as the slope of a curve at the point (x0, f(x0)) on the surface xn+1 = f(x), as shown in Figure 4.2 Chapter 4. Differentiating Functions of Several Variables 203 Notations for Partial Derivatives An alternative notation for ∂f ∂xi (x0) is fxi (x0). Figure 4.2: Partial derivative. Remark 4.2 Partial Derivatives Let x0 = (a1, a2, . . . , an) and define the function g : (−r, r) → R by g(h) = f(x0 + hei) = f(a1, . . . , ai−1, ai + h, ai+1, . . . , an). Then lim h→0 f(x0 + hei)− f(x0) h = lim h→0 g(h)− g(0) h = g′(0). Thus, fxi (x0) exists if and only if g(h) is differentiable at h = 0. Moreover, to find fxi (x0), we regard the variables x1, . . . , xi−1, xi+1, . . . , xn as constants, and differentiate with respect to xi. 
Hence, the derivative rules such as sum rule, product rule and quotient rule still work for partial derivatives, as long as one is clear which variable to take derivative, which variable to be regarded as constant. Chapter 4. Differentiating Functions of Several Variables 204 Example 4.1 Let f : R2 → R be the function defined as f(x, y) = x2y. Find fx(1, 2) and fy(1, 2). Solution ∂f ∂x = 2xy, ∂f ∂y = x2. Therefore, fx(1, 2) = 4, fy(1, 2) = 1. Example 4.2 Let f : R2 → R be the function defined as f(x, y) = |x + y|. Determine whether fx(0, 0) exists. Solution By definition, fx(0, 0) is given by the limit lim h→0 f(h, 0)− f(0, 0) h if it exists. Since lim h→0 f(h, 0)− f(0, 0) h = lim h→0 |h| h , and lim h→0− |h| h = −1 and lim h→0+ |h| h = 1, the limit lim h→0 f(h, 0)− f(0, 0) h does not exist. Hence, fx(0, 0) does not exist. Chapter 4. Differentiating Functions of Several Variables 205 Definition 4.2 Let O be an open subset of Rn, and let f : O → R be a function defined on O. If the function f : O → R has partial derivative with respect to xi at every point of O, this defines the function fxi : O → R. In this case, we say that the partial derivative of f with respect to xi exists. If fxi : O → R exists for all 1 ≤ i ≤ n, we say that the function f : O → R has partial derivatives. Example 4.3 Find the partial derivatives of the function f : R3 → R defined as f(x, y, z) = sin(xy + z) + 3x y2 + z2 + 1 . Solution ∂f ∂x (x, y, z) = y cos(xy + z) + 3 y2 + z2 + 1 , ∂f ∂y (x, y, z) = x cos(xy + z)− 6xy (y2 + z2 + 1)2 , ∂f ∂z (x, y, z) = cos(xy + z)− 6xz (y2 + z2 + 1)2 . For a function defined on an open subset of Rn, there are n partial derivatives with respect to the n directions defined by the coordinate axes. These define a vector in Rn. Definition 4.3 Gradient Let O be an open subset of Rn, and let x0 be a point in O. 
If the function f : O → R has partial derivatives at x0, we define the gradient of the function f at x0 as the vector in Rn given by ∇f(x0) = ( ∂f/∂x1(x0), ∂f/∂x2(x0), . . . , ∂f/∂xn(x0) ).

Let us revisit Example 4.3.

Example 4.4
The gradient of the function f : R3 → R defined as f(x, y, z) = sin(xy + z) + 3x/(y2 + z2 + 1) in Example 4.3 is the function ∇f : R3 → R3 given by ∇f(x, y, z) = ( y cos(xy + z) + 3/(y2 + z2 + 1), x cos(xy + z)− 6xy/(y2 + z2 + 1)2, cos(xy + z)− 6xz/(y2 + z2 + 1)2 ). In particular, ∇f(1,−1, 1) = ( 0, 5/3, 1/3 ).

It is straightforward to extend the definition of partial derivative to a function F : O → Rm whose codomain is Rm with m ≥ 2.

Definition 4.4
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O. Given x0 in O and 1 ≤ i ≤ n, we say that F : O → Rm has partial derivative with respect to xi at the point x0 if the limit ∂F/∂xi(x0) = lim h→0 (F(x0 + hei)− F(x0))/h exists. We say that F : O → Rm has partial derivatives at the point x0 if ∂F/∂xi(x0) exists for each 1 ≤ i ≤ n. We say that F : O → Rm has partial derivatives if it has partial derivatives at each point of O.

Since the limit of a function G : (−r, r) → Rm as h → 0 exists if and only if the limit of each component function Gj : (−r, r) → R, 1 ≤ j ≤ m, as h → 0 exists, we have the following.

Proposition 4.1
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O. Given x0 in O and 1 ≤ i ≤ n, F : O → Rm has partial derivative with respect to xi at the point x0 if and only if each component function Fj : O → R, 1 ≤ j ≤ m, has partial derivative with respect to xi at the point x0. In this case, we have ∂F/∂xi(x0) = ( ∂F1/∂xi(x0), . . . , ∂Fm/∂xi(x0) ).
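The value ∇f(1,−1, 1) = (0, 5/3, 1/3) computed in Example 4.4 can be double-checked with central difference quotients. A supplementary sketch (the step size h is an arbitrary choice):

```python
import math

# The function of Examples 4.3-4.4.
def f(x, y, z):
    return math.sin(x*y + z) + 3*x / (y*y + z*z + 1)

def grad(f, p, h=1e-6):
    # Approximate each partial derivative by a central difference quotient.
    g = []
    for i in range(3):
        q1 = list(p); q1[i] += h
        q2 = list(p); q2[i] -= h
        g.append((f(*q1) - f(*q2)) / (2*h))
    return g

g = grad(f, [1.0, -1.0, 1.0])
expected = [0.0, 5/3, 1/3]
assert all(abs(gi - ei) < 1e-5 for gi, ei in zip(g, expected))
```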
To capture all the partial derivatives, we define a derivative matrix.

Definition 4.5 The Derivative Matrix
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If F : O → Rm has partial derivatives at the point x0, the derivative matrix of F : O → Rm at x0 is the m× n matrix DF(x0) whose rows are ∇F1(x0), ∇F2(x0), . . . , ∇Fm(x0); that is, its (j, i)-entry is ∂Fj/∂xi(x0). When m = 1, the derivative matrix is just the gradient of the function, written as a row matrix.

Example 4.5
Let F : R3 → R2 be the function defined as F(x, y, z) = ( xy2z3, x+ 3y − 7z ). Find the derivative matrix of F at the point (1,−1, 2).

Solution
DF(x, y, z) = [ y2z3  2xyz3  3xy2z2 ; 1  3  −7 ]. Thus, the derivative matrix of F at the point (1,−1, 2) is DF(1,−1, 2) = [ 8  −16  12 ; 1  3  −7 ].

Since the partial derivatives of a function are defined componentwise, we can focus on functions f : O → R whose codomain is R. One might wonder why we have not mentioned the word "differentiable" so far. For single variable functions, we have seen in Volume I that if a function is differentiable at a point, then it is continuous at that point. For multivariable functions, the existence of partial derivatives is not enough to guarantee continuity, as is shown in the next example.

Example 4.6
Let f : R2 → R be the function defined as f(x, y) = xy/(x2 + y2) if (x, y) ̸= (0, 0), and f(x, y) = 0 if (x, y) = (0, 0). Show that f is not continuous at (0, 0), but it has partial derivatives at (0, 0).

Figure 4.3: The function f(x, y) defined in Example 4.6.

Solution
Consider the sequence {uk} with uk = ( 1/k, 1/k ). It is a sequence in R2 that converges to (0, 0). Since f(uk) = 1/2 for all k ∈ Z+, the sequence {f(uk)} converges to 1/2. But f(0, 0) = 0 ̸= 1/2.
Since there is a sequence {uk} that converges to (0, 0), but the sequence {f(uk)} does not converge to f(0, 0), f is not continuous at (0, 0). To find the partial derivatives at (0, 0), we use the definition. fx(0, 0) = lim h→0 (f(h, 0)− f(0, 0))/h = lim h→0 (0− 0)/h = 0, and fy(0, 0) = lim h→0 (f(0, h)− f(0, 0))/h = lim h→0 (0− 0)/h = 0. These show that f has partial derivatives at (0, 0), and fx(0, 0) = fy(0, 0) = 0.

The function defined in Example 4.6 in fact has partial derivatives at all points. When (x, y) ̸= (0, 0), we can apply the derivative rules directly and find that ∂f/∂x(x, y) = ((x2 + y2)y − 2x2y)/(x2 + y2)2 = y(y2 − x2)/(x2 + y2)2. Similarly, ∂f/∂y(x, y) = x(x2 − y2)/(x2 + y2)2.

Let us highlight again our conclusion.

Partial Derivative vs Continuity
The existence of partial derivatives does not imply continuity.

This prompts us to find a better definition of differentiability, one which does imply continuity. This will be considered in a later section.

When the function f : O → R has partial derivative with respect to xi, we obtain the function fxi : O → R. Then we can discuss whether the function fxi has partial derivative at a point in O.

Definition 4.6 Second Order Partial Derivatives
Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. Given that 1 ≤ i ≤ n, 1 ≤ j ≤ n, we say that the second order partial derivative ∂2f/∂xj∂xi exists at x0 provided that there exists an open ball B(x0, r) that is contained in O such that ∂f/∂xi : B(x0, r) → R exists, and it has partial derivative with respect to xj at the point x0. In this case, we define the second order partial derivative ∂2f/∂xj∂xi(x0) of f at x0 as ∂2f/∂xj∂xi(x0) = ∂fxi/∂xj(x0) = lim h→0 (fxi(x0 + hej)− fxi(x0))/h. We say that the function f : O → R has second order partial derivatives at x0 provided that ∂2f/∂xj∂xi(x0) exists for all 1 ≤ i ≤ n, 1 ≤ j ≤ n.
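Both halves of Example 4.6 (discontinuity at the origin, yet vanishing partial derivatives there) can be observed directly with a short supplementary computation:

```python
# The function of Example 4.6: partial derivatives exist at (0,0)
# even though f is not continuous there.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x*y / (x*x + y*y)

# Along the diagonal sequence u_k = (1/k, 1/k), f(u_k) = 1/2 for every k,
# which does not approach f(0,0) = 0.
for k in range(1, 100):
    assert abs(f(1.0/k, 1.0/k) - 0.5) < 1e-12

# Yet both difference quotients at the origin vanish identically, because
# f is zero on both coordinate axes.
for h in [0.1, 0.01, 0.001]:
    assert (f(h, 0.0) - f(0.0, 0.0)) / h == 0.0   # consistent with f_x(0,0) = 0
    assert (f(0.0, h) - f(0.0, 0.0)) / h == 0.0   # consistent with f_y(0,0) = 0
```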
In the same way, one can also define second order partial derivatives for a function F : O → Rm with codomain Rm when m ≥ 2. Chapter 4. Differentiating Functions of Several Variables 211 Remark 4.3 In the definition of the second order partial derivative ∂2f ∂xj∂xi (x0), instead of assuming fxi (x) exists for all x in a ball of radius r centered at x0, it is sufficient to assume that there exists r > 0 such that fxi (x0 + hej) exists for all |h| < r. Definition 4.7 Given 1 ≤ i ≤ n, 1 ≤ j ≤ n, we say that the function f : O → R has the second order partial derivative ∂2f ∂xj∂xi provided that ∂2f ∂xj∂xi (x0) exists for all x0 in O. We say that the function f : O → R has second order partial derivatives provided that ∂2f ∂xj∂xi exists for all 1 ≤ i ≤ n, 1 ≤ j ≤ n. Notations for Second Order Partial Derivatives Alternative notations for second order partial derivatives are ∂2f ∂xj∂xi = (fxi )xj = fxixj . Notice that the orders of xi and xj are different in different notations. Remark 4.4 Given 1 ≤ i ≤ n, 1 ≤ j ≤ n, the function f : O → R has the second order partial derivative ∂2f ∂xj∂xi provided that fxi : O → R exists, and fxi has partial derivative with respect to xj . Example 4.7 Find the second order partial derivatives of the function f : R2 → R defined as f(x, y) = xe2x+3y. Chapter 4. Differentiating Functions of Several Variables 212 Solution We find the first order partial derivatives first. ∂f ∂x (x, y) = e2x+3y + 2xe2x+3y = (1 + 2x)e2x+3y, ∂f ∂y (x, y) = 3xe2x+3y. Then we compute the second order partial derivatives. ∂2f ∂x2 (x, y) = 2e2x+3y + 2(1 + 2x)e2x+3y = (4 + 4x)e2x+3y, ∂2f ∂y∂x (x, y) = 3(1 + 2x)e2x+3y = (3 + 6x)e2x+3y, ∂2f ∂x∂y (x, y) = 3e2x+3y + 6xe2x+3y = (3 + 6x)e2x+3y, ∂2f ∂y2 (x, y) = 9xe2x+3y. Definition 4.8 The Hessian Matrix Let O be an open subset of Rn that contains the point x0. 
If f : O → R is a function that has second order partial derivatives at x0, the Hessian matrix of f at x0 is the n× n matrix defined as Hf (x0) = [ ∂2f ∂xi∂xj (x0) ] = ∂2f ∂x21 (x0) ∂2f ∂x1∂x2 (x0) · · · ∂2f ∂x1∂xn (x0) ∂2f ∂x2∂x1 (x0) ∂2f ∂x22 (x0) · · · ∂2f ∂x2∂xn (x0) ... ... . . . ... ∂2f ∂xn∂x1 (x0) ∂2f ∂xn∂x2 (x0) · · · ∂2f ∂x2n (x0) . We do not define Hessian matrix for a function F : O → Rm with codomain Rm when m ≥ 2. Chapter 4. Differentiating Functions of Several Variables 213 Example 4.8 For the function f : R2 → R defined as f(x, y) = xe2x+3y in Example 4.7, Hf (x, y) = [ (4 + 4x)e2x+3y (3 + 6x)e2x+3y (3 + 6x)e2x+3y 9xe2x+3y ] . In Example 4.7, we notice that ∂2f ∂y∂x (x, y) = ∂2f ∂x∂y (x, y) for all (x, y) ∈ R2. The following example shows that this is not always true. Example 4.9 Consider the function f : R2 → R defined as f(x, y) = xy(x2 − y2) x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). Find fxy(0, 0) and fyx(0, 0). Figure 4.4: The function f(x, y) defined in Example 4.9. Chapter 4. Differentiating Functions of Several Variables 214 Solution To compute fxy(0, 0), we need to compute fx(0, h) for all h in a neighbourhood of 0. To compute fyx(0, 0), we need to compute fy(h, 0) for all h in a neighbourhood of 0. Notice that for any h ∈ R, f(0, h) = f(h, 0) = 0. By considering h = 0 and h ̸= 0 separately, we find that fx(0, h) = lim t→0 f(t, h)− f(0, h) t = lim t→0 h(t2 − h2) t2 + h2 = −h, fy(h, 0) = lim t→0 f(h, t)− f(h, 0) t = lim t→0 h(h2 − t2) h2 + t2 = h. It follows that fxy(0, 0) = lim h→0 fx(0, h)− fx(0, 0) h = lim h→0 −h h = −1, fyx(0, 0) = lim h→0 fy(h, 0)− fy(0, 0) h = lim h→0 h h = 1. Example 4.9 shows that there exists a function f : R2 → R which has second order partial derivatives at (0, 0) but ∂2f ∂x∂y (0, 0) ̸= ∂2f ∂y∂x (0, 0). Remark 4.5 If O is an open subset of Rn that contains the point x0, there exists r > 0 such that B(x0, r) ⊂ O. 
Given that f : O → R is a function defined on O, and 1 ≤ i < j ≤ n, let D be the ball with center at (0, 0) and radius r in R2. Define the function g : D → R by g(u, v) = f(x0 + uei + vej). Then ∂2f/∂xj∂xi(x0) exists if and only if ∂2g/∂v∂u(0, 0) exists. In that case, we have ∂2f/∂xj∂xi(x0) = ∂2g/∂v∂u(0, 0).

The following gives a sufficient condition to interchange the order of taking partial derivatives.

Theorem 4.2 Clairaut's Theorem or Schwarz's Theorem
Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. Assume that 1 ≤ i < j ≤ n, and the second order partial derivatives ∂2f/∂xj∂xi : O → R and ∂2f/∂xi∂xj : O → R exist. If the functions ∂2f/∂xj∂xi and ∂2f/∂xi∂xj are continuous at x0, then ∂2f/∂xj∂xi(x0) = ∂2f/∂xi∂xj(x0).

Proof
Since O is an open set that contains the point x0, there exists r > 0 such that B(x0, r) ⊂ O. Let D = { (u, v) | u2 + v2 < r2 }, and define the function g : D → R by g(u, v) = f(x0 + uei + vej). By Remark 4.5, g has second order partial derivatives, and ∂2g/∂v∂u and ∂2g/∂u∂v are continuous at (0, 0). We need to show that ∂2g/∂v∂u(0, 0) = ∂2g/∂u∂v(0, 0). Consider the function G(u, v) = g(u, v)− g(u, 0)− g(0, v) + g(0, 0). Notice that G(u, v) = Hv(u)−Hv(0) = Su(v)− Su(0), where Hv(u) = g(u, v)− g(u, 0) and Su(v) = g(u, v)− g(0, v). For fixed v with |v| < r, the function Hv(u) is defined for those u with |u| < √(r2 − v2), so that (u, v) is in D. It is differentiable with H′v(u) = ∂g/∂u(u, v)− ∂g/∂u(u, 0). Hence, if (u, v) is in D, the mean value theorem for single variable functions implies that there exists cu,v ∈ (0, 1) such that G(u, v) = Hv(u)−Hv(0) = uH′v(cu,vu) = u( ∂g/∂u(cu,vu, v)− ∂g/∂u(cu,vu, 0) ).
Regard this now as a function of v, the mean value theorem for single variable functions implies that there exists du,v ∈ (0, 1) such that G(u, v) = uv ∂2g ∂v∂u (cu,vu, du,vv). (4.1) Using the same reasoning, we find that for (u, v) ∈ D, there exists d̃u,v ∈ (0, 1) such that G(u, v) = vS ′ u(d̃u,vv) = v ( ∂g ∂v (u, d̃u,vv)− ∂g ∂v (0, d̃u,vv) ) . Regard this as a function of u, mean value theorem implies that there exists c̃u,v ∈ (0, 1) such that G(u, v) = uv ∂2g ∂u∂v (c̃u,vu, d̃u,vv). (4.2) Comparing (4.1) and (4.2), we find that ∂2g ∂v∂u (cu,vu, du,vv) = ∂2g ∂u∂v (c̃u,vu, d̃u,vv). Chapter 4. Differentiating Functions of Several Variables 217 When (u, v) → (0, 0), (cu,vu, du,vv) → (0, 0) and (c̃u,vu, d̃u,vv) → (0, 0). The continuities of guv and gvu at (0, 0) then imply that ∂2g ∂v∂u (0, 0) = ∂2g ∂u∂v (0, 0). This completes the proof. Example 4.10 Consider the function f : R2 → R in Example 4.9 defined as f(x, y) = xy(x2 − y2) x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). When (x, y) ̸= (0, 0), we find that ∂f ∂x (x, y) = y(x4 + 4x2y2 − y4) (x2 + y2)2 , ∂f ∂y (x, y) = x(x4 − 4x2y2 − y4) (x2 + y2)2 . It follows that ∂2f ∂y∂x (x, y) = x6 + 9x4y2 − 9x2y4 − y6 (x2 + y2)3 = ∂2f ∂x∂y (x, y). Indeed, both fxy and fyx are continuous on R2 \ {(0, 0)}. Corollary 4.3 Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. If all the second order partial derivatives of the function f : O → R at x0 are continuous, then the Hessian matrix Hf (x0) of f at x0 is a symmetric matrix. Chapter 4. Differentiating Functions of Several Variables 218 Remark 4.6 One can define partial derivatives of higher orders following the same rationale as we define the second order partial derivatives. Extension of Clairaut’s theorem to higher order partial derivatives is straightforward. The key point is the continuity of the partial derivatives involved. Chapter 4. 
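As a supplementary numerical check (not part of the text's argument), the contrast between Example 4.7, where the mixed partials are continuous and hence equal by Clairaut's theorem, and Example 4.9, where they disagree at the origin, can be observed with difference quotients. The step sizes below are arbitrary choices.

```python
import math

# Example 4.7: g(x,y) = x e^{2x+3y} has continuous mixed partials, so
# Clairaut's theorem applies; from the worked solution, g_xy(0,0) = 3.
def g(x, y):
    return x * math.exp(2*x + 3*y)

def mixed(f, x, y, h=1e-4):
    # Central second difference approximating f_xy(x, y).
    return (f(x+h, y+h) - f(x+h, y-h) - f(x-h, y+h) + f(x-h, y-h)) / (4*h*h)

assert abs(mixed(g, 0.0, 0.0) - 3.0) < 1e-4

# Example 4.9: the mixed partials at the origin disagree (-1 vs 1).
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x*y*(x*x - y*y) / (x*x + y*y)

def fx(x, y, d=1e-7):
    return (f(x+d, y) - f(x-d, y)) / (2*d)

def fy(x, y, d=1e-7):
    return (f(x, y+d) - f(x, y-d)) / (2*d)

h = 1e-4
fxy = (fx(0.0, h) - fx(0.0, -h)) / (2*h)  # derivative of f_x in y at (0,0)
fyx = (fy(h, 0.0) - fy(-h, 0.0)) / (2*h)  # derivative of f_y in x at (0,0)
assert abs(fxy - (-1.0)) < 1e-3 and abs(fyx - 1.0) < 1e-3
```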
Differentiating Functions of Several Variables 219 Exercises 4.1 Question 1 Let f : R3 → R be the function defined as f(x, y, z) = xz ey + 1 . Find ∇f(1, 0,−1), the gradient of f at the point (1, 0,−1). Question 2 Let F : R2 → R3 be the function defined as F(x, y) = ( x2y, xy2, 3x2 + 4y2 ) . Find DF(2,−1), the derivative matrix of F at the point (2,−1). Question 3 Let f : R3 → R be the function defined as f(x, y, z) = x2 + 3xyz + 2y2z3. Find Hf (1,−1, 2), the Hessian matrix of f at the point (1,−1, 2). Question 4 Let f : R2 → R be the function defined as f(x, y) = 3xy x2 + 4y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). Show that f is not continuous at (0, 0), but it has partial derivatives at (0, 0). Question 5 Let f : R2 → R be the function defined as f(x, y) = |x2 + y|. Determine whether fy(1,−1) exists. Chapter 4. Differentiating Functions of Several Variables 220 Question 6 Let f : R2 → R be the function defined as f(x, y) = x2y x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). Show that f is continuous, it has partial derivatives, but the partial derivatives are not continuous. Question 7 Consider the function f : R2 → R defined as f(x, y) = xy(x2 + 9y2) 4x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). Find the Hessian matrix Hf (0, 0) of f at (0, 0). Chapter 4. Differentiating Functions of Several Variables 221 4.2 Differentiability and First Order Approximation Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. As we have seen in the previous section, even if F has partial derivatives at x0, it does not imply that F is continuous at x0. Heuristically, this is because the partial derivatives only consider the change of the function along the n directions defined by the coordinate axes, while continuity of F requires us to consider the change of F along all directions. 
4.2.1 Differentiability

In this section, we will give a suitable definition of differentiability to ensure that we can capture the change of F in all directions. Let us first revisit an alternative perspective of differentiability for a single variable function f : (a, b) → R, which we have discussed in Volume I. If x0 is a point in (a, b), then the function f : (a, b) → R is differentiable at x0 if and only if there is a number c such that lim h→0 (f(x0 + h)− f(x0)− ch)/h = 0. (4.3) In fact, if f is differentiable at x0, then this number c has to equal f′(x0). Now for a function F : O → Rm defined on an open subset O of Rn, to consider the differentiability of F at x0 ∈ O, we should compare F(x0) to F(x0 + h) for all h in a neighbourhood of 0. But then a reasonable substitute for the number c is a linear transformation T : Rn → Rm, so that for each h in a neighbourhood of 0, it gives a vector T(h) in Rm. As h is now a vector in Rn, we cannot divide by h in (4.3). It should be replaced by ∥h∥, the norm of h.

Definition 4.9 Differentiability
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. The function F : O → Rm is differentiable at x0 provided that there exists a linear transformation T : Rn → Rm so that lim h→0 (F(x0 + h)− F(x0)−T(h))/∥h∥ = 0. F : O → Rm is differentiable if it is differentiable at each point of O.
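Definition 4.9 can be tested numerically on a concrete smooth function. For f(x, y) = x2y at x0 = (1, 2), taking T(h) = ⟨∇f(x0),h⟩ with ∇f(x0) = (4, 1), the error quotient shrinks with ∥h∥. The function and point below are arbitrary illustrative choices, not from the text.

```python
import math
import random

# A smooth function: for f(x,y) = x^2 y, grad f = (2xy, x^2),
# so grad f(1,2) = (4, 1).
def f(x, y):
    return x*x*y

x0, grad0 = (1.0, 2.0), (4.0, 1.0)

random.seed(3)
for k in [1e-1, 1e-2, 1e-3, 1e-4]:
    # Random direction, shrinking length k = ||h||; the error quotient
    # |f(x0+h) - f(x0) - <grad f(x0), h>| / ||h|| must shrink as well.
    t = random.uniform(0, 2*math.pi)
    h = (k*math.cos(t), k*math.sin(t))
    err = f(x0[0]+h[0], x0[1]+h[1]) - f(*x0) - (grad0[0]*h[0] + grad0[1]*h[1])
    # For this polynomial the remainder is 2ab + 2a^2 + a^2 b with |a|,|b| <= k,
    # so the quotient is at most 5k.
    assert abs(err) / k <= 5*k + 1e-12
```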
Proof Let the components of the function ε(h) = F(x0 + h)− F(x0)−T(h) ∥h∥ be ε1(h), ε2(h), . . . , εm(h). Then for 1 ≤ j ≤ m, εj(h) = Fj(x0 + h)− Fj(x0)− Tj(h) ∥h∥ . The assertion of the proposition follows from the fact that lim h→0 ε(h) = 0 if and only if lim h→0 εj(h) = 0 for all 1 ≤ j ≤ m, while lim h→0 εj(h)= 0 if and only if Fj : O → R is differentiable at x0. Let us look at a simple example of differentiable functions. Chapter 4. Differentiating Functions of Several Variables 223 Example 4.11 Let A be an m× n matrix, and let b be a point in Rm. Define the function F : Rn → Rm by F(x) = Ax+ b. Show that F : Rn → Rm is differentiable. Solution Given x0 and h in Rn, notice that F(x0 + h)− F(x0) = A(x0 + h) + b− Ax0 − b = Ah. (4.4) The map T : Rn → Rm defined as T(h) = Ah is a linear transformation. Eq. (4.4) says that F(x0 + h)− F(x0)−T(h) = 0. Thus, lim h→0 F(x0 + h)− F(x0)−T(h) ∥h∥ = 0. Therefore, F is differentiable at x0. Since the point x0 is arbitrary, the function F : Rn → Rm is differentiable. The next theorem says that differentiability implies continuity. Theorem 4.5 Differentiability Implies Continuity Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If the function F : O → Rm is differentiable at x0, then it is continuous at x0. Proof Since F : O → Rm is differentiable at x0, there exists a linear transformation T : Rn → Rm such that ε(h) = F(x0 + h)− F(x0)−T(h) ∥h∥ h→0−−−−→ 0. Chapter 4. Differentiating Functions of Several Variables 224 By Theorem 2.34, there is a positive constant c such that ∥T(h)∥ ≤ c∥h∥ for all h ∈ Rn. Therefore, ∥F(x0 + h)− F(x0)∥ ≤ ∥T(h)∥+ ∥h∥∥ε(h)∥ ≤ ∥h∥ (c+ ∥ε(h)∥) . This implies that lim h→0 F(x0 + h) = F(x0). Thus, F : O → Rm is continuous at x0. Example 4.12 The function f : R2 → R defined as f(x, y) = xy x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0) in Example 4.6 is not differentiable at (0, 0) since it is not continuous at (0, 0). 
However, we have shown that it has partial derivatives at (0, 0). Let us study the function F : Rn → Rm, F(x) = Ax + b that is defined in Example 4.11. The component functions of F are F1(x1, x2, . . . , xn) = a11x1 + a12x2 + · · ·+ a1nxn + b1, F2(x1, x2, . . . , xn) = a21x1 + a22x2 + · · ·+ a2nxn + b2, ... Fm(x1, x2, . . . , xn) = am1x1 + am2x2 + · · ·+ amnxn + bm. Notice that ∇F1(x) = a1 = (a11, a12, . . . , a1n) , ∇F2(x) = a2 = (a21, a22, . . . , a2n) , ... ∇Fm(x) = am = (am1, am2, . . . , amn) Chapter 4. Differentiating Functions of Several Variables 225 are the row vectors of A. Hence, the derivative matrix of F is a given by DF(x) = ∇F1(x) ∇F2(x) ... ∇Fm(x) = a11 a12 · · · a1n a21 a22 · · · a2n ... ... . . . ... am1 am2 · · · amn , which is the matrix A itself. Observe that DF(x)h = a11h1 + a12h2 + · · ·+ a1nhn a21h1 + a22h2 + · · ·+ a2nhn ... am1h1 + am2h2 + · · ·+ amnhn = ⟨∇F1(x),h⟩ ⟨∇F2(x),h⟩ ... ⟨∇Fm(x),h⟩ . From Example 4.11, we suspect that the linear transformation T : Rn → Rm that appears in the definition of differentiability of a function should be the linear transformation defined by the derivative matrix. In fact, this is the case. Theorem 4.6 Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. The following are equivalent. (a) The function F : O → Rm is differentiable at x0. (b) The function F : O → Rm has partial derivatives at x0, and lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ = 0. (4.5) (c) For each 1 ≤ j ≤ m, the component function Fj : O → R has partial derivatives at x0, and lim h→0 Fj(x0 + h)− Fj(x0)− ⟨∇Fj(x0),h⟩ ∥h∥ = 0. Chapter 4. Differentiating Functions of Several Variables 226 Proof The equivalence of (b) and (c) is Proposition 4.4, the componentwise differentiability. Thus, we are left to prove the equivalence of (a) and (b). First, we prove (b) implies (a). If (b) holds, let T : Rn → Rm be the linear transformation defined by the derivative matrix DF(x0). 
Then (4.5) says that F : O → Rm is differentiable at x0. Conversely, assume that F : O → Rm is differentiable at x0. Then there exists a linear transformation T : Rn → Rm such that lim h→0 F(x0 + h)− F(x0)−T(h) ∥h∥ = 0. (4.6) Let A be a m × n matrix so that T(h) = Ah. For 1 ≤ i ≤ n, eq. (4.6) implies that lim h→0 F(x0 + hei)− F(x0)− A(hei) h = 0. This gives Aei = lim h→0 F(x0 + hei)− F(x0) h . This shows that ∂F ∂xi (x0) exists and ∂F ∂xi (x0) = Aei. Therefore, F : O → Rm has partial derivatives at x0. Since A = [ Ae1 Ae2 · · · Aen ] = [ ∂F ∂x1 (x0) ∂F ∂x2 (x0) · · · ∂F ∂xn (x0) ] = DF(x0), eq. (4.6) says that lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ = 0. This proves (a) implies (b). Chapter 4. Differentiating Functions of Several Variables 227 Corollary 4.7 Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If the partial derivatives of F : O → Rm exist at x0, but lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ ̸= 0, then F is not differentiable at x0. Proof If F is differentiable at x0, Theorem 4.6 says that we must have lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ = 0. By contrapositive, since lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ ̸= 0, we find that F is not differentiable at x0. Example 4.13 Let f : R2 → R be the function defined as f(x, y) = x3 x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). Determine whether f is differentiable at (0, 0). Solution One can show that f is continuous at 0 = (0, 0). Hence, we cannot use continuity to determine whether f is differentiable at x0. Notice that fx(0, 0) = lim h→0 f(h, 0)− f(0, 0) h = lim h→0 h− 0 h = 1, Chapter 4. Differentiating Functions of Several Variables 228 Figure 4.5: The function f(x, y) defined in Example 4.13. fy(0, 0) = lim h→0 f(0, h)− f(0, 0) h = lim h→0 0− 0 h = 0. Therefore, f has partial derivatives at 0, and ∇f(0) = (1, 0). Now we consider the function ε(h) = f(h)− f(0)− ⟨∇f(0),h⟩ ∥h∥ = − h1h 2 2 (h21 + h22) 3/2 . Let {hk} be the sequence with hk = ( 1 k , 1 k ) . 
It converges to 0. Since

ε(hk) = −1/(2√2) for all k ∈ Z+,

the sequence {ε(hk)} does not converge to 0. Hence,

lim_{h→0} (f(h) − f(0) − ⟨∇f(0), h⟩)/∥h∥ ≠ 0.

Therefore, f is not differentiable at (0, 0).

Example 4.13 gives a function which is continuous and has partial derivatives at a point, yet it fails to be differentiable at that point. In the following, we are going to give a sufficient condition for differentiability. We begin with a lemma.

Lemma 4.8
Let x0 be a point in Rn and let f : B(x0, r) → R be a function defined on an open ball centered at x0. Assume that f : B(x0, r) → R has first order partial derivatives. For each h in Rn with ∥h∥ < r, there exist z1, . . . , zn in B(x0, r) such that

f(x0 + h) − f(x0) = Σ_{i=1}^{n} hi (∂f/∂xi)(zi),

and ∥zi − x0∥ < ∥h∥ for all 1 ≤ i ≤ n.

Proof
We will take a zigzag path from x0 to x0 + h, which is a union of paths parallel to the coordinate axes. For 1 ≤ i ≤ n, let

xi = x0 + Σ_{k=1}^{i} hk ek = x0 + h1e1 + · · · + hiei.

Then xi is in B(x0, r). Notice that B(x0, r) is a convex set. Therefore, for any 1 ≤ i ≤ n, the line segment between xi−1 and xi = xi−1 + hiei lies entirely inside B(x0, r). Since f : B(x0, r) → R has a first order partial derivative with respect to xi, the function gi : [0, 1] → R, gi(t) = f(xi−1 + thiei), is differentiable, and

g′i(t) = hi (∂f/∂xi)(xi−1 + thiei).

By the mean value theorem, there exists ci ∈ (0, 1) such that

f(xi) − f(xi−1) = gi(1) − gi(0) = g′i(ci) = hi (∂f/∂xi)(xi−1 + cihiei).

Let

zi = xi−1 + cihiei = x0 + Σ_{k=1}^{i−1} hk ek + cihiei.

Then zi is a point in B(x0, r). Moreover,

f(x0 + h) − f(x0) = Σ_{i=1}^{n} (f(xi) − f(xi−1)) = Σ_{i=1}^{n} hi (∂f/∂xi)(zi).

For 1 ≤ i ≤ n, since ci ∈ (0, 1), we have

∥zi − x0∥ = √(h1² + · · · + hi−1² + ci²hi²) < √(h1² + · · · + hi−1² + hi²) ≤ ∥h∥.

This completes the proof.

Figure 4.6: A zigzag path from x0 to x0 + h.
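The decomposition in Lemma 4.8 can be made concrete numerically. The sketch below is a hypothetical illustration, not part of the text: it takes f(x, y) = x² + y², for which the mean value point on each axis-parallel segment happens to be the segment's midpoint, so z1 and z2 can be written down explicitly; the base point x0 and the increment h are arbitrary choices.

```python
import math

# Zigzag decomposition of Lemma 4.8 for f(x, y) = x^2 + y^2.
# Because f is quadratic in each variable separately, the mean value
# point on each axis-parallel segment is its midpoint, so the z_i can
# be written down explicitly (this choice is specific to this f).

def f(x, y):
    return x * x + y * y

def fx(x, y):
    return 2 * x

def fy(x, y):
    return 2 * y

x0, y0 = 0.3, -0.2        # centre of the ball (arbitrary)
h1, h2 = 0.05, 0.04       # increment h with small norm (arbitrary)

z1 = (x0 + h1 / 2, y0)          # mean value point on the first segment
z2 = (x0 + h1, y0 + h2 / 2)     # mean value point on the second segment

lhs = f(x0 + h1, y0 + h2) - f(x0, y0)
rhs = h1 * fx(*z1) + h2 * fy(*z2)
norm_h = math.hypot(h1, h2)

assert abs(lhs - rhs) < 1e-12            # f(x0+h) - f(x0) = sum h_i f_{x_i}(z_i)
assert math.hypot(z1[0] - x0, z1[1] - y0) < norm_h   # ||z_i - x0|| < ||h||
assert math.hypot(z2[0] - x0, z2[1] - y0) < norm_h
```

For a general f with first order partial derivatives, the zi are only guaranteed to exist by the mean value theorem; they are not usually computable in closed form.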
Theorem 4.9
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If the partial derivatives of F : O → Rm exist and are continuous at x0, then F is differentiable at x0.

Proof
By Proposition 4.4, it suffices to prove the theorem for a function f : O → R with codomain R. Since O is an open set that contains the point x0, there exists r > 0 such that B(x0, r) ⊂ O. By Lemma 4.8, for each h that satisfies 0 < ∥h∥ < r, there exist z1, z2, . . . , zn such that

f(x0 + h) − f(x0) = Σ_{i=1}^{n} hi (∂f/∂xi)(zi),

and ∥zi − x0∥ < ∥h∥ for all 1 ≤ i ≤ n. Therefore,

(f(x0 + h) − f(x0) − ⟨∇f(x0), h⟩)/∥h∥ = Σ_{i=1}^{n} (hi/∥h∥) ((∂f/∂xi)(zi) − (∂f/∂xi)(x0)).

Fix ε > 0. For 1 ≤ i ≤ n, since fxi : B(x0, r) → R is continuous at x0, there exists 0 < δi ≤ r such that if 0 < ∥z − x0∥ < δi, then

|fxi(z) − fxi(x0)| < ε/n.

Take δ = min{δ1, . . . , δn}. Then δ > 0. If ∥h∥ < δ, then for 1 ≤ i ≤ n, ∥zi − x0∥ < ∥h∥ < δ ≤ δi. Thus,

|fxi(zi) − fxi(x0)| < ε/n.

Since |hi| ≤ ∥h∥ for each i, this implies that

|(f(x0 + h) − f(x0) − ⟨∇f(x0), h⟩)/∥h∥| ≤ Σ_{i=1}^{n} (|hi|/∥h∥) |(∂f/∂xi)(zi) − (∂f/∂xi)(x0)| < ε.

Hence,

lim_{h→0} (f(x0 + h) − f(x0) − ⟨∇f(x0), h⟩)/∥h∥ = 0.

This proves that f is differentiable at x0.

Theorem 4.9 says that a function which has continuous partial derivatives is differentiable. This prompts us to make the following definition.

Definition 4.10 Continuously Differentiable
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O. We say that F : O → Rm is continuously differentiable, or C1, provided that it has partial derivatives that are continuous.

Theorem 4.9 says that a continuously differentiable function is differentiable. Analogously, we define Ck for any k ≥ 1.

Definition 4.11 Ck Functions
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O.
We say that F : O → Rm is k-times continuously differentiable, or Ck, provided that it has all partial derivatives of order k, and each of them is continuous.

Definition 4.12 C∞ Functions
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O. We say that F : O → Rm is infinitely differentiable, or C∞, provided that it is Ck for all positive integers k.

Proposition 4.10
Polynomials and rational functions are infinitely differentiable functions.

Sketch of Proof
A partial derivative of a rational function is still a rational function, which is continuous. Iterating, all higher order partial derivatives exist, are rational functions, and hence are continuous.

Obviously, for any k ∈ Z+, a Ck+1 function is Ck.

Remark 4.8 Higher Order Differentiability
We can define second order differentiability in the following way. We say that a function F : O → R is twice differentiable at a point x0 in O if there is a neighbourhood of x0 on which F has first order partial derivatives, and each of them is differentiable at the point x0. Theorem 4.9 says that a C2 function is twice differentiable. Similarly, we can define higher order differentiability.

4.2.2 First Order Approximations

First we extend the concept of order of approximation to multivariable functions.

Definition 4.13 Order of Approximation
Let O be an open subset of Rn that contains the point x0, and let k be a positive integer. We say that the two functions F : O → Rm and G : O → Rm are kth-order approximations of each other at x0 provided that

lim_{h→0} (F(x0 + h) − G(x0 + h))/∥h∥^k = 0.

Recall that a mapping G : O → Rm is a polynomial mapping of degree at most one if it has the form

G(x) = (a11x1 + a12x2 + · · · + a1nxn + b1, a21x1 + a22x2 + · · · + a2nxn + b2, . . . , am1x1 + am2x2 + · · · + amnxn + bm) = Ax + b,

where A = [aij] and b = (b1, . . . , bm). The mapping G is a linear transformation if and only if b = 0. The following theorem shows that first order approximation is closely related to differentiability. It is a consequence of Theorem 4.6.
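Definition 4.13 with k = 1 can be illustrated numerically. The sketch below is a hypothetical example, not part of the text: for the differentiable function f(x, y) = sin(x)·e^y (an arbitrary choice, as are the base point and step sizes), the degree-one polynomial G(x) = f(x0) + ⟨∇f(x0), x − x0⟩ is a first order approximation of f at x0, so the quotient in the definition shrinks together with ∥h∥.

```python
import math

# Numerical illustration of Definition 4.13 with k = 1: the degree-one
# polynomial G(x) = f(x0) + <grad f(x0), x - x0> is a first order
# approximation of f(x, y) = sin(x) * exp(y) at an arbitrary point x0.

def f(x, y):
    return math.sin(x) * math.exp(y)

x0, y0 = 0.5, 0.2
grad = (math.cos(x0) * math.exp(y0), math.sin(x0) * math.exp(y0))

def ratio(t):
    # |f(x0 + h) - G(x0 + h)| / ||h|| along h = (t, t)
    hx = hy = t
    err = f(x0 + hx, y0 + hy) - (f(x0, y0) + grad[0] * hx + grad[1] * hy)
    return abs(err) / math.hypot(hx, hy)

ratios = [ratio(10.0 ** -k) for k in range(1, 6)]

# The quotient tends to 0 as h -> 0, consistent with f being differentiable.
assert all(r2 < r1 for r1, r2 in zip(ratios, ratios[1:]))
assert ratios[-1] < 1e-4
```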
Chapter 4. Differentiating Functions of Several Variables 234 Theorem 4.11 First Order Approximation Theorem Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. (a) If F : O → Rm is continuous at x0, and there is a polynomial mapping G : O → Rm of degree at most one which is a first order approximation of F : O → Rm at the point x0, then F : O → Rm is differentiable at x0. (b) If F : O → Rm is differentiable at x0, then there is a unique polynomial mapping G : O → Rm of degree at most one which is a first order approximation of F at x0. It is given by G(x) = F(x0) +DF(x0)(x− x0). Proof First we prove (a). Assume that G : O → Rm is a polynomial mapping of degree at most one which is a first order approximation of F : O → Rm at the point x0. There exists an m × n matrix A and a vector b in Rm such that G(x) = Ax+ b. By assumption, lim h→0 F(x0 + h)− A(x0 + h)− b ∥h∥ = 0. (4.7) This implies that lim h→0 (F(x0 + h)− A(x0 + h)− b) = 0, which gives Ax0 + b = lim h→0 F(x0 + h) = F(x0). Substitute back into (4.7), we find that lim h→0 F(x0 + h)− F(x0)− Ah ∥h∥ = 0. Chapter 4. Differentiating Functions of Several Variables 235 Since T(h) = Ah is a linear transformation, this shows that F : O → Rm is differentiable at x0. Next, we prove (b). If F : O → Rm is differentiable at x0, Theorem 4.6 says that lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ = 0. This precisely means that the polynomial mapping G : O → Rm, G(x) = F(x0) +DF(x0)(x− x0), is a first order approximation of F : O → Rm at x0. By definition, the polynomial mapping G has degree at most one. The uniqueness of G is also asserted in Theorem 4.6. Remark 4.9 The first order approximation theorem says that if the function F : O → Rm is differentiable at the point u, then there is a unique polynomial mapping G : O → Rm of degree at most one which is a first order approximation of F : O → Rm at the point u. The components of the mapping G : O → Rm are given by Gj(x1, . . . 
, xn) = Fj(u1, . . . , un) + n∑ i=1 ∂Fj ∂xi (u1, . . . , un)(xi − ui). Notice that this is a (generalization) of Taylor polynomial of order 1. Example 4.14 Let F : R3 → R2 be the function defined as F(x, y, z) = (xyz2, x+ 2y + 3z), and let x0 = (1,−1, 1). Find a vector b in R2 and a 2 × 3 matrix A such that lim h→0 F(x0 + h)− Ah− b ∥h∥ = 0. Chapter 4. Differentiating Functions of Several Variables 236 Solution The function F : R3 → R2 is itself a polynomial mapping. Hence, it is differentiable. The derivative matrix is given by DF(x) = [ yz2 xz2 2xyz 1 2 3 ] . By the first order approximation theorem, b = F(x0) = (−1, 2) and A = DF(1,−1, 1) = [ −1 1 −2 1 2 3 ] . Example 4.15 Determine whether the limit lim (x,y)→(0,0) ex+2y − 1− x− 2y√ x2 + y2 exists. Solution Let f(x, y) = ex+2y. Then ∂f ∂x (x, y) = ex+2y, ∂f ∂y (x, y) = 2ex+2y. It follows that f(0, 0) = 1, ∂f ∂x (0, 0) = 1, ∂f ∂y (0, 0) = 2. Since the function g(x, y) = x + 2y is continuous and the exponential function is also continuous, f has continuous first order partial derivatives. Hence, f is differentiable. By first order approximation theorem, lim (x,y)→(0,0) f(x, y)− f(0, 0)− x ∂f ∂x (0, 0)− y ∂f ∂y (0, 0)√ x2 + y2 = 0. Chapter 4. Differentiating Functions of Several Variables 237 Since f(x, y)− f(0, 0)− x ∂f ∂x (0, 0)− y ∂f ∂y (0, 0) = ex+2y − 1− x− 2y, we find that lim (x,y)→(0,0) ex+2y − 1− x− 2y√ x2 + y2 = 0. 4.2.3 Tangent Planes The tangent plane to a graph is closely related to the concept of differentiability and first order approximations. Recall that the graph of a function f : O → R defined on a subset of Rn is the subset of Rn+1 consists of all the points of the form (x, f(x)) where x ∈ O. Definition 4.14 Tangent Planes Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. The graph of f has a tangent plane at x0 if it is differentiable at x0. 
In this case, the tangent plane is the hyperplane of Rn+1 that satisfies the equation

xn+1 = f(x0) + ⟨∇f(x0), x − x0⟩,

where x = (x1, . . . , xn).

The tangent plane is the graph of the polynomial function of degree at most one which is the first order approximation of the function f at the point x0.

Example 4.16
Find the equation of the tangent plane to the graph of the function f : R2 → R, f(x, y) = x² + 4xy + 5y², at the point where (x, y) = (1, −1).

Solution
The function f is a polynomial. Hence, it is a differentiable function with

∇f(x, y) = (2x + 4y, 4x + 10y).

Figure 4.7: The tangent plane to the graph of a function.

From this, we find that ∇f(1, −1) = (−2, −6). Together with f(1, −1) = 2, we find that the equation of the tangent plane to the graph of f at the point where (x, y) = (1, −1) is

z = 2 − 2(x − 1) − 6(y + 1) = −2x − 6y − 2.

4.2.4 Directional Derivatives

As we mentioned before, the partial derivatives measure the rate of change of the function when it varies along the directions of the coordinate axes. To capture the rate of change of a function along other directions, we define the concept of directional derivatives. Notice that a direction in Rn is specified by a unit vector.

Definition 4.15 Directional Derivatives
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. Given a unit vector u in Rn, we say that F has a directional derivative in the direction of u at the point x0 provided that the limit

lim_{h→0} (F(x0 + hu) − F(x0))/h

exists. This limit, denoted as DuF(x0), is called the directional derivative of F in the direction of u at the point x0.

When m = 1, it is customary to denote the directional derivative of f : O → R in the direction of u at the point x0 as Duf(x0).

Remark 4.10
For any nonzero vector v in Rn, we can also define DvF(x0) as

DvF(x0) = lim_{h→0} (F(x0 + hv) − F(x0))/h.
However, we will not call it a directional derivative unless v is a unit vector.

Remark 4.11
From the definition, it is obvious that when u is one of the standard unit vectors e1, . . ., en, the directional derivative in the direction of u is a partial derivative. More precisely,

DeiF(x0) = (∂F/∂xi)(x0), 1 ≤ i ≤ n.

The following is obvious.

Proposition 4.12
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. Given a nonzero vector v in Rn, DvF(x0) exists if and only if DvFj(x0) exists for all 1 ≤ j ≤ m. Moreover,

DvF(x0) = (DvF1(x0), DvF2(x0), . . . , DvFm(x0)).

Example 4.17
Let f : R2 → R be the function defined as f(x, y) = x²y. Given that v = (v1, v2) is a nonzero vector in R2, find Dvf(3, 2).

Solution
By definition,

Dvf(3, 2) = lim_{h→0} (f(3 + hv1, 2 + hv2) − f(3, 2))/h = g′(0),

where g(h) = f(3 + hv1, 2 + hv2) = (3 + hv1)²(2 + hv2). Since

g′(h) = 2v1(3 + hv1)(2 + hv2) + v2(3 + hv1)²,

we find that

Dvf(3, 2) = g′(0) = 12v1 + 9v2.

Taking v = e1 = (1, 0) and v = e2 = (0, 1) respectively, we find that fx(3, 2) = 12 and fy(3, 2) = 9. For general v = (v1, v2), we notice that Dvf(3, 2) = ⟨∇f(3, 2), v⟩.

Example 4.18
Consider the function f : R2 → R defined as

f(x, y) = xy/(x² + y²), if (x, y) ≠ (0, 0); 0, if (x, y) = (0, 0),

in Example 4.6. Find all the nonzero vectors v for which Dvf(0, 0) exists.

Solution
Given a nonzero vector v = (v1, v2), v1² + v2² ≠ 0. By definition,

Dvf(0, 0) = lim_{h→0} (f(hv1, hv2) − f(0, 0))/h = lim_{h→0} (1/h) · v1v2/(v1² + v2²).

This limit exists if and only if v1v2 = 0, which is the case if v1 = 0 or v2 = 0.

Figure 4.8: The function f(x, y) in Example 4.18.

Example 4.19
Let f : R2 → R be the function defined as

f(x, y) = y√(x² + y²)/|x|, if x ≠ 0; 0, if x = 0.

Find all the nonzero vectors v for which Dvf(0, 0) exists.

Figure 4.9: The function f(x, y) in Example 4.19.

Chapter 4.
Differentiating Functions of Several Variables 242 Solution Given a nonzero vector v = (v1, v2), we consider two cases. Case I: v1 = 0. Then v = (0, v2). In this case, Dvf(0, 0) = lim h→0 f(0, hv2)− f(0, 0) h = lim h→0 0− 0 h = 0. Case 2: v1 ̸= 0. Dvf(0, 0) = lim h→0 f(hv1, hv2)− f(0, 0) h = lim h→0 1 h hv2 |hv1| √ h2(v21 + v22) = v2 √ v21 + v22 |v1| . We conclude that Dvf(0, 0) exists for all nonzero vectors v. Remark 4.12 For the function considered in Example 4.19, by taking v to be (1, 0) and (0, 1) respectively, we find that fx(0, 0) = 0 and fy(0, 0) = 0. Notice that lim h→0 f(h)− f(0)− ⟨∇f(0),h⟩ ∥h∥ = lim h→0 h2 |h1| . This limit does not exist. By Corollary 4.7, f is not differentiable at (0, 0). This gives an example of a function which is not differentiable at (0, 0) but has directional derivatives at (0, 0) in all directions. In fact, one can show that f is not continuous at (0, 0). The following theorem says that differentiability of a function implies existence of directional derivatives. Chapter 4. Differentiating Functions of Several Variables 243 Theorem 4.13 Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If F is differentiable at x0, then for any nonzero vector v, DvF(x0) exists and DvF(x0) = DF(x0)v = ⟨∇F1(x0),v⟩ ⟨∇F2(x0),v⟩ ... ⟨∇Fm(x0),v⟩ . Proof Again, it is sufficient to consider a function f : O → R with codomain R. By definition, Dvf(x0) is given by the limit lim h→0 f(x0 + hv)− f(x0) h if it exists. Since f is differentiable at x0, it has partial derivatives at x0 and lim h→0 f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ ∥h∥ = 0. As h→ 0, hv → 0. By limit law for composite functions, we find that lim h→0 f(x0 + hv)− f(x0)− ⟨∇f(x0), hv⟩ |h|∥v∥ = 0. This implies that lim h→0 f(x0 + hv)− f(x0)− h⟨∇f(x0),v⟩ h = 0. Thus, Dvf(x0) = lim h→0 f(x0 + hv)− f(x0) h = ⟨∇f(x0),v⟩. Chapter 4. 
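The formula DvF(x0) = DF(x0)v of Theorem 4.13 can be checked with a finite-difference quotient. The sketch below is an illustration, not part of the text; it reuses f(x, y) = x²y at the point (3, 2) from Example 4.17, where Dvf(3, 2) = 12v1 + 9v2 was computed by hand, and the step size h is an arbitrary choice.

```python
# Finite-difference check of Theorem 4.13 for f(x, y) = x^2 * y at
# x0 = (3, 2), using the (non-unit) vector v = (-1, 2); Example 4.17
# computed D_v f(3, 2) = 12*v1 + 9*v2 directly from the definition.

def f(x, y):
    return x * x * y

x0, y0 = 3.0, 2.0
v1, v2 = -1.0, 2.0

grad = (2 * x0 * y0, x0 * x0)            # grad f = (2xy, x^2) = (12, 9)
predicted = grad[0] * v1 + grad[1] * v2  # <grad f(x0), v>

h = 1e-6
difference_quotient = (f(x0 + h * v1, y0 + h * v2) - f(x0, y0)) / h

assert predicted == 12 * v1 + 9 * v2     # agrees with Example 4.17
assert abs(difference_quotient - predicted) < 1e-4
```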
Differentiating Functions of Several Variables 244 Example 4.20 Consider the function F : R2 → R2 defined as F(x, y) = (x2y, xy2). Find DvF(2, 3) when v = (−1, 2). Solution Since F is a polynomial mapping, it is differentiable. The derivative matrix is DF(x, y) = [ 2xy x2 y2 2xy ] . Therefore, DvF(2, 3) = DF(2, 3) [ −1 2 ] = [ 12 4 9 12 ][ −1 2 ] = [ −4 15 ] . Theorem 4.13 can be used to determine the direction which a differentiable function increase fastest at a point. Corollary 4.14 Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. If f is differentiable at x0 and ∇f(x0) ̸= 0, then at the point x0, the function f increases fastest in the direction of ∇f(x0). Proof Let u be a unit vector. Then the rate of change of the function f at the point x0 in the direction of u is given by Duf(x0) = ⟨∇f(x0),u⟩. By Cauchy-Schwarz inequality, ⟨∇f(x0),u⟩ ≤ ∥∇f(x0)∥∥u∥ = ∥∇f(x0)∥, and the equality holds if and only if u has the same direction as ∇f(x0). Chapter 4. Differentiating Functions of Several Variables 245 Exercises 4.2 Question 1 Let f : R3 → R be the function defined as f(x, y, z) = xey 2+4z. Find a vector c in R3 and a constant b such that lim h→0 f(x0 + h)− ⟨c,h⟩ − b ∥h∥ = 0, where x0 = (3, 2,−1). Question 2 Let F : R2 → R3 be the function defined as F(x, y) = (x2 + 4y2, 7xy, 2x+ y). Find a polynomial mapping G : R2 → R3 of degree at most one which is a first order approximation of F : R2 → R3 at the point (1,−1). Question 3 Let x0 = (1, 2, 0,−1), and let F : R4 → R3 be the function defined as F(x1, x2, x3, x4) = ( x2x 2 3, x3x 3 4 + x2, x4 + 2x1 + 1 ) . Find a 3× 4 matrix A and a vector b in R3 such that lim x→x0 F(x)− Ax− b ∥x− x0∥ = 0. Chapter 4. Differentiating Functions of Several Variables 246 Question 4 Let f : R2 → R be the function defined as f(x, y) = sin(x2 + y) + 5xy2. Find Dvf(1,−1) for any nonzero vector v = (v1, v2). 
Question 5 Let f : R2 → R be the function defined as f(x, y) = x2y2 x2 + y2 , if (x, y) ̸= (0, 0) 0, if (x, y) = (0, 0). Show that f : R2 → R is continuously differentiable. Question 6 Find the equation of the tangent plane to the graph of the functionf : R2 → R, f(x, y) = 4x2 + 3xy − y2 at the point where (x, y) = (2,−1). Question 7 Let f : R2 → R be the function defined as f(x, y) = x2y x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). (a) Show that f : R2 → R is continuous. (b) Show that f : R2 → R has partial derivatives. (c) Show that f : R2 \ {(0, 0)} → R is differentiable. (d) Show that f : R2 → R is not differentiable at (0, 0). (e) Find all the nonzero vectors v = (v1, v2) for which Dvf(0, 0) exists. Chapter 4. Differentiating Functions of Several Variables 247 Question 8 Let f : R2 → R be the function defined as f(x, y) = |x| √ x2 + y2 y , if y ̸= 0, 0, if y = 0. (a) Show that f : R2 → R is not continuous at (0, 0). (b) Show that Dvf(0, 0) exists for all nonzero vectors v. Question 9 Let f : R2 → R be the function defined as f(x, y) = (x2 + y2) sin ( 1√ x2 + y2 ) , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). (a) Show that f : R2 → R is differentiable at (0, 0). (b) Show that f : R2 → R is not continuously differentiable at (0, 0). Chapter 4. Differentiating Functions of Several Variables 248 4.3 The Chain Rule and the Mean Value Theorem In volume I, we have seen that the chain rule plays an important role in calculating the derivative of a composite function. Given that f : (a, b) → R and g : (c, d) → R are functions such that f((a, b)) ⊂ (c, d), the chain rule says that if f is differentiable at x0, g is differentiable at y0 = f(x0), then the composite function (g ◦ f) : (a, b) → R is differentiable at x0, and (g ◦ f)′(x0) = g′(f(x0))f ′(x0). For multivariable functions, the chain rule takes the following form. Theorem 4.15 The Chain Rule Let O be an open subset of Rn, and let U be an open subset of Rk. 
Assume that F : O → Rk and G : U → Rm are functions such that F(O) ⊂ U . If F is differentiable at x0, G is differentiable at y0 = F(x0), then the composite function H = (G ◦ F) : O → Rm is differentiable at x0 and DH(x0) = D(G ◦ F)(x0) = DG(F(x0))DF(x0). Notice that on the right hand side, DG(F(x0)) is an m× k matrix, DF(x0) is an k × n matrix. Hence, the product DG(F(x0))DF(x0) makes sense, and it is an m× n matrix, which is the correct size for the derivative matrix DH(x0). Let us spell out more explicitly. Assume that F(x1, x2, . . . , xn) = (F1(x1, x2, . . . , xn), F2(x1, x2, . . . , xn), . . . , Fk(x1, x2, . . . , xn)), G(y1, y2, . . . , yk) = (G1(y1, y2, . . . , yk), G2(y1, y2, . . . , yk), . . . , Gm(y1, y2, . . . , yk)), H(x1, x2, . . . , xn) = (H1(x1, x2, . . . , xn), H2(x1, x2, . . . , xn), . . . , Hm(x1, x2, . . . , xn)). Then for 1 ≤ j ≤ m, Hj(x1, x2, . . . , xn) = Gj (F1(x1, x2, . . . , xn), F2(x1, x2, . . . , xn), . . . , Fk(x1, x2, . . . , xn)) . Chapter 4. Differentiating Functions of Several Variables 249 For 1 ≤ l ≤ k, let yl = Fl (x1, x2, . . . , xn) . The chain rule says that if 1 ≤ q ≤ n, ∂Hj ∂xq (x1, x2, . . . , xn) = k∑ l=1 ∂Gj ∂yl (y1, y2, . . . , yk) ∂Fl ∂xq (x1, x2, . . . , xn) = ∂Gj ∂y1 (y1, y2, . . . , yk) ∂F1 ∂xq (x1, x2, . . . , xn) + ∂Gj ∂y2 (y1, y2, . . . , yk) ∂F2 ∂xq (x1, x2, . . . , xn) ... + ∂Gj ∂yk (y1, y2, . . . , yk) ∂Fk ∂xq (x1, x2, . . . , xn). Namely, to differentiate Hj = Gj ◦ F with respect to xq, we differentiate Gj with respect to each of the variables y1, . . . , yk, multiply each by the partial derivatives of F1, . . . , Fk with respect to xq, then take the sum. Let us illustrate this with a simple example. Example 4.21 Consider the function h : R2 → R defined as h(x, y) = sin(2x+ 3y) + exy. It is straightforward to find that ∂h ∂x = 2 cos(2x+ 3y) + yexy, ∂h ∂y = 3 cos(2x+ 3y) + xexy. 
Notice that we can write h = g ◦ F, where F : R2 → R2 is the function F(x, y) = (2x+ 3y, xy), and g : R2 → R is the function g(u, v) = sinu+ ev. Chapter 4. Differentiating Functions of Several Variables 250 Obviously, F and g are continuously differentiable functions. DF(x, y) = [ 2 3 y x ] , Dg(u, v) = [ cosu ev ] . Taking u = 2x+ 3y and v = xy, we find that Dg(u, v)DF(x, y) = [ cos(2x+ 3y) exy ] [2 3 y x ] = [ 2 cos(2x+ 3y) + yexy 3 cos(2x+ 3y) + xexy ] = Dh(x, y). Now let us prove the chain rule. Proof of the Chain Rule Since F is differentiable at x0 and G is differentiable at y0 = F(x0), DF(x0) and DG(y0) exist. There exists positive numbers r1 and r2 such that B(x0, r1) ⊂ O and B(y0, r2) ⊂ U . Let ε1(h) = F(x0 + h)− F(x0)−DF(x0)h ∥h∥ , h ∈ B(0, r1), ε2(v) = G(y0 + v)−G(y0)−DG(y0)v ∥v∥ , v ∈ B(0, r2). Since F is differentiable at x0 and G is differentiable at y0, lim h→0 ε1(h) = 0, lim v→0 ε2(v) = 0. There exist positive constants c1 and c2 such that ∥DF(x0)h∥ ≤ c1∥h∥ for all h ∈ Rn, ∥DG(y0)v∥ ≤ c2∥v∥ for all v ∈ Rk. Now since F is differentiable at x0, it is continuous at x0. Hence, there exists a positive number r such that r ≤ r1 and F(B(x0, r)) ⊂ B(y0, r2). Chapter 4. Differentiating Functions of Several Variables 251 For h ∈ B(0, r), let v = F(x0 + h)− F(x0). Then v ∈ B(0, r2) and v = DF(x0)h+ ∥h∥ε1(h). It follows that ∥v∥ ≤ ∥DF(x0)h∥+ ∥h∥∥ε1(h)∥ ≤ ∥h∥ (c1 + ∥ε1(h)∥) . In particular, we find that when h → 0, v → 0. Now, H(x0 + h)−H(x0) = G(F(x0 + h))−G(F(x0)) = G(y0 + v)−G(y0) = DG(y0)v + ∥v∥ε2(v) = DG(y0)DF(x0)h+ ∥h∥DG(y0)ε1(h) + ∥v∥ε2(v). Therefore, for h ∈ B(0, r) \ {0}, H(x0 + h)−H(x0)−DG(y0)DF(x0)h ∥h∥ = DG(y0)ε1(h) + ∥v∥ ∥h∥ ε2(v). This implies that∥∥∥∥H(x0 + h)−H(x0)−DG(y0)DF(x0)h ∥h∥ ∥∥∥∥ ≤ ∥DG(y0)ε1(h)∥+ ∥v∥ ∥h∥ ∥ε2(v)∥ ≤ c2∥ε1(h)∥+ (c1 + ∥ε1(h)∥) ∥ε2(v)∥. Since v → 0 when h → 0, we find that ε2(v) → 0 when h → 0. Thus, we find that lim h→0 H(x0 + h)−H(x0)−DG(y0)DF(x0)h ∥h∥ = 0. Chapter 4. 
Differentiating Functions of Several Variables 252 This concludes that H is differentiable at x0 and DH(x0) = DG(y0)DF(x0). Example 4.22 Let F : R3 → R2 be the function defined as F(x, y, z) = (x2 + 4y2 + 9z2, xyz). Find a vector b in R2 and a 2× 3 matrix A such that lim (u,v,w)→(1,−1,0) F(2u+ v, v + w, u+ w)− b− Ap√ (u− 1)2 + (v + 1)2 + w2 = 0, where p = uv w . Solution Let p0 = (1,−1, 0), and let G : R3 → R3 be the mapping G(u, v, w) = (2u+ v, v + w, u+ w). Then H(p) = H(u, v, w) = F(2u+ v, v + w, u+ w) = (F ◦G)(u, v, w). Notice that F and G are polynomial mappings. Hence, they are infinitely differentiable. To have lim p→p0 H(v)− b− Ap ∥p− p0∥ = lim (u,v,w)→(1,−1,0) F(2u+ v, v + w, u+ w)− b− Ap√ (u− 1)2 + (v + 1)2 + w2 = 0, the first order approximation theorem says that b+ Ap = H(p0) +DH(p0) (p− p0) . Therefore, A = DH(p0) and b = H(p0)− Ap0. Chapter 4. Differentiating Functions of Several Variables 253 Notice that G(p0) = G(1,−1, 0) = (1,−1, 1), H(p0) = H(1,−1, 0) = F(1,−1, 1) = (14,−1), DG(u, v, w) = 2 1 0 0 1 1 1 0 1 , DF(x, y, z) = [ 2x 8y 18z yz xz xy ] . By chain rule, A = DF(1,−1, 1)DG(1,−1, 0) = [ 2 −8 18 −1 1 −1 ]2 1 0 0 1 1 1 0 1 = [ 22 −6 10 −3 0 0 ] . It follows that b = [ 14 −1 ] − [ 22 −6 10 −3 0 0 ] 1 −1 0 = [ −14 2 ] . Example 4.23 Let α be a positive number, and let f : Rn → R be the function defined as f(x) = ∥x∥α. Find the values of α so that f is differentiable. Solution Let g : Rn → R be the function g(x) = ∥x∥2 = x21 + x22 + · · ·+ x2n. Then g(Rn) = [0,∞), and g(x) = 0 if and only if x = 0. Chapter 4. Differentiating Functions of Several Variables 254 Since g is a polynomial, it is infinitely differentiable. Let h : [0,∞) → R be the function h(u) = uα/2. Then h is differentiable on (0,∞). Since f(x) = (h ◦ g)(x), chain rule implies that for all x0 ∈ Rn \ {0}, f is differentiable at x0. Now consider the point x = 0. Notice that for 1 ≤ i ≤ n, fxi (0) exists provided that the limit lim h→0 f(hei)− f(0) h = lim h→0 |h|α h exists. 
This is the case if and only if α > 1. Therefore, f is not differentiable at x = 0 if α ≤ 1. If α > 1, we find that fxi(0) = 0 for all 1 ≤ i ≤ n. Hence, ∇f(0) = 0. Since

lim_{h→0} (f(h) − f(0) − ⟨∇f(0), h⟩)/∥h∥ = lim_{h→0} ∥h∥^{α−1} = 0,

we conclude that when α > 1, f is differentiable at x = 0. Therefore, f is differentiable if and only if α > 1.

Example 4.24
Let f : R2 → R be a twice continuously differentiable function, and let g : R2 → R be the function defined as g(r, θ) = f(r cos θ, r sin θ). Show that

∂²g/∂r² + (1/r) ∂g/∂r + (1/r²) ∂²g/∂θ² = ∂²f/∂x² + ∂²f/∂y².

Solution
Let H : R2 → R2 be the mapping defined by H(r, θ) = (r cos θ, r sin θ). Then H is infinitely differentiable, and g = f ◦ H. Let x = H1(r, θ) = r cos θ and y = H2(r, θ) = r sin θ. By the chain rule,

∂g/∂r = (∂f/∂x)(∂x/∂r) + (∂f/∂y)(∂y/∂r) = cos θ ∂f/∂x + sin θ ∂f/∂y,

∂g/∂θ = (∂f/∂x)(∂x/∂θ) + (∂f/∂y)(∂y/∂θ) = −r sin θ ∂f/∂x + r cos θ ∂f/∂y.

Using the product rule and the chain rule, we then have

∂²g/∂r² = cos θ ((∂²f/∂x²)(∂x/∂r) + (∂²f/∂y∂x)(∂y/∂r)) + sin θ ((∂²f/∂x∂y)(∂x/∂r) + (∂²f/∂y²)(∂y/∂r)).

Since f has continuous second order partial derivatives, fxy = fyx. Therefore,

∂²g/∂r² = cos²θ ∂²f/∂x² + 2 sin θ cos θ ∂²f/∂x∂y + sin²θ ∂²f/∂y².

Similarly, we have

∂²g/∂θ² = −r sin θ ((∂²f/∂x²)(∂x/∂θ) + (∂²f/∂y∂x)(∂y/∂θ)) + r cos θ ((∂²f/∂x∂y)(∂x/∂θ) + (∂²f/∂y²)(∂y/∂θ)) − r cos θ ∂f/∂x − r sin θ ∂f/∂y
= r² sin²θ ∂²f/∂x² − 2r² sin θ cos θ ∂²f/∂x∂y + r² cos²θ ∂²f/∂y² − r ∂g/∂r.

From these, we obtain

∂²g/∂r² + (1/r) ∂g/∂r + (1/r²) ∂²g/∂θ² = ∂²f/∂x² + ∂²f/∂y².

Example 4.24 gives the Laplacian ∆f = ∂²f/∂x² + ∂²f/∂y² of f in polar coordinates. It is customary to abuse notation and write g = f, so that the formula takes the form

∂²f/∂x² + ∂²f/∂y² = ∂²f/∂r² + (1/r) ∂f/∂r + (1/r²) ∂²f/∂θ².

Remark 4.13
We can use the chain rule to prove Theorem 4.13.
Given that O is an open subset of Rn that contains the point x0, and F : O → Rm is a function that is differentiable at x0, we want to show that DvF(x0) exists for any nonzero vector v, and DvF(x0) = DF(x0)v. Since O is an open set that contains the point x0, there is an r > 0 such that B(x0, r) ⊂ O. By definition, DvF(x0) = lim h→0 F(x0 + hv)− F(x0) h = g′(0), where g : (−r, r) → Rm is the function g(h) = F(x0 + hv). Let γ : (−r, r) → Rn be the function defined as γ(h) = x0 + hv. Then γ is a differentiable function with γ ′(h) = v. Since g = F ◦ γ, and γ(0) = x0, the chain rule implies that g is differentiable at h = 0 and g′(0) = DF(x0)γ ′(0) = DF(x0)v. This completes the proof. Definition 4.16 Tangent Line to a Curve A curve in Rn is a continuous function γ : [a, b] → Rn. Let c0 be a point in (a, b). If the curve γ is differentiable at c0, the tangent vector to the curve γ at the point γ(c0) is the vector γ ′(c0) in Rn, while the tangent line to the curve γ at the point γ(c0) is the line in Rn given by x : R → Rn, x(t) = γ(c0) + tγ ′(c0). Remark 4.14 Tangent Lines and Tangent Planes Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function that is differentiable at x0. We have seen that the tangent plane to the graph of f at the point (x0, f(x0)) has equation Chapter 4. Differentiating Functions of Several Variables 257 xn+1 = f(x0) + ⟨∇f(x0),x− x0⟩. Now assume that r > 0 and γ : (−r, r) → Rn+1 is a differentiable curve in Rn+1 that lies on the graph of f , and γ(0) = (x0, f(x0)). For all t ∈ (−r, r), γn+1(t) = f(γ1(t), . . . , γn(t)). By chain rule, we find that γ′n+1(0) = ⟨∇f(x0),v⟩, where v = (γ′1(0), . . . , γ ′ n(0)). The vector w = (v, γ′n+1(0)) is the tangent vector to the curve γ at the point (x0, f(x0)). The equation of the tangent line is (x1(t), . . . , xn(t), xn+1(t)) = (x0, f(x0)) + t(γ′1(0), . . . , γ ′ n(0), γ ′ n+1(0)). Thus, we find that (x1(t), . . . , xn(t)) = x(t) = x0 + tv, and xn+1(t) = f(x0) + tγ′n+1(0). 
These imply that

xn+1(t) = f(x0) + t⟨∇f(x0), v⟩ = f(x0) + ⟨∇f(x0), x(t) − x0⟩.

Thus, the tangent line to the curve γ lies in the tangent plane. In fact, the tangent plane to the graph of a function f at a point can be characterized as the unique plane that contains all the tangent lines to the differentiable curves that lie on the graph and pass through that point.

Now we turn to the mean value theorem. For a single variable function, the mean value theorem says that given that f : I → R is a differentiable function defined on the open interval I, if x0 and x0 + h are two points in I, there exists c ∈ (0, 1) such that

f(x0 + h) − f(x0) = hf′(x0 + ch).

Notice that the point x0 + ch is a point strictly in between x0 and x0 + h. To generalize this theorem to multivariable functions, one natural question to ask is the following. If F : O → Rm is a differentiable function defined on the open subset O of Rn, and x0 and x0 + h are points in O such that the line segment between them lies entirely in O, does there exist a constant c ∈ (0, 1) such that

F(x0 + h) − F(x0) = DF(x0 + ch)h?

When m ≥ 2, the answer is no in general. Let us look at the following example.

Example 4.25
Consider the function F : R2 → R2 defined as F(x, y) = (x²y, xy). Show that there does not exist a constant c ∈ (0, 1) such that

F(x0 + h) − F(x0) = DF(x0 + ch)h,

when x0 = (0, 0) and h = (1, 1).

Solution
Notice that

DF(x, y) = [ 2xy  x²
             y    x ].

When x0 = (0, 0) and h = (1, 1), x0 + ch = (c, c). If there exists a constant c ∈ (0, 1) such that

F(x0 + h) − F(x0) = DF(x0 + ch)h,

then

(1, 1) = (2c² + c², c + c) = (3c², 2c).

This gives 3c² = 1 and 2c = 1. But 2c = 1 gives c = 1/2, and when c = 1/2, 3c² = 3/4 ≠ 1. Hence, no such c can exist.

However, when m = 1, we indeed have a mean value theorem.
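The inconsistency in Example 4.25 can also be confirmed numerically. The sketch below is an illustration, not part of the text (the grid resolution is an arbitrary choice): the candidate identity reads (1, 1) = (3c², 2c), and no single c ∈ (0, 1) satisfies both components.

```python
# Numerical companion to Example 4.25: F(x, y) = (x^2 * y, x * y),
# x0 = (0, 0), h = (1, 1).  The candidate mean value identity
# F(x0 + h) - F(x0) = DF(ch) h reads (1, 1) = (3c^2, 2c), and the
# two components pin down incompatible values of c.

c_first = (1 / 3) ** 0.5   # 3c^2 = 1  =>  c = 1/sqrt(3) ~ 0.577
c_second = 0.5             # 2c = 1    =>  c = 1/2
assert abs(c_first - c_second) > 0.07   # no single c satisfies both

# Brute-force confirmation on a fine grid of c in (0, 1): the larger of
# the two component errors |3c^2 - 1| and |2c - 1| never gets close to 0.
worst = min(
    max(abs(3 * c * c - 1), abs(2 * c - 1))
    for c in (k / 10000 for k in range(1, 10000))
)
assert worst > 0.01
```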
Theorem 4.16 The Mean Value Theorem Let O be an open subset of Rn, and let x0 and x0 + h be two points in O such that the line segment between them lies entirely in O. If f : O → R is a differentiable function, there exists a constant c ∈ (0, 1) such that f(x0 + h)− f(x0) = ⟨∇f(x0 + ch),h⟩ = n∑ i=1 hi ∂f ∂xi (x0 + ch). Proof Define the function γ : [0, 1] → Rn by γ(t) = x0 + th. Then γ is a differentiable function with γ′(t) = h. Let g = (f ◦ γ) : [0, 1] → R. Then g(t) = (f ◦ γ)(t) = f(x0 + th). Since f and γ are differentiable, the chain rule implies that g is also differentiable and g′(t) = ⟨∇f(x0 + th), γ′(t)⟩ = ⟨∇f(x0 + th),h⟩. By the mean value theorem for single variable functions, there exists c ∈ (0, 1) such that g(1)− g(0) = g′(c). In other words, there exists c ∈ (0, 1) such that f(x0 + h)− f(x0) = ⟨∇f(x0 + ch),h⟩. This completes the proof. As in the single variable case, the mean value theorem has the following application. Corollary 4.17 Let O be an open connected subset of Rn, and let f : O → R be a function defined on O. If f is differentiable and ∇f(x) = 0 for all x ∈ O, then f is a constant function. Proof If u and v are two points in O such that the line segment between them lies entirely in O, then the mean value theorem implies that f(u) = f(v). Since O is an open connected subset of Rn, Theorem 3.16 says that any two points u and v in O can be joined by a polygonal path in O. In other words, there are points x0,x1, . . . ,xk in O such that x0 = u, xk = v, and for 1 ≤ i ≤ k, the line segment between xi−1 and xi lies entirely in O. Therefore, f(xi−1) = f(xi) for all 1 ≤ i ≤ k. This proves that f(u) = f(v). Hence, f is a constant function. Exercises 4.3 Question 1 Let F : R2 → R3 be the function defined as F(x, y) = (x2 + y2, xy, x+ y).
Find a vector b in R3 and a 3× 2 matrix A such that lim (u,v)→(1,−1) F(5u+ 3v, u− 2v)− b− Aw√ (u− 1)2 + (v + 1)2 = 0, where w = [ u v ] . Question 2 Let ϕ : R → R and ψ : R → R be functions that have continuous second order derivatives, and let c be a constant. Define the function f : R2 → R by f(t, x) = ϕ(x+ ct) + ψ(x− ct). Show that ∂2f ∂t2 − c2 ∂2f ∂x2= 0. Question 3 Let α be a constant, and let f : Rn \ {0} → R be the function defined by f(x) = ∥x∥α. Find the value(s) of α such that ∆f(x) = n∑ i=1 ∂2f ∂x2i (x) = ∂2f ∂x21 (x) + ∂2f ∂x22 (x) + · · ·+ ∂2f ∂x2n (x) = 0. Chapter 4. Differentiating Functions of Several Variables 262 Question 4 Let f : R2 → R be a function such that f(0, 0) = 2 and ∂f ∂x (x, y) = 11 and ∂f ∂y = −7 for all (x, y) ∈ R2. Show that f(x, y) = 2 + 11x− 7y for all (x, y) ∈ R2. Question 5 Let O be an open subset of R2, and let u : O → R and v : O → R be twice continuously differentiable functions. Define the function F : O → R2 by F(x, y) = (u(x, y), v(x, y)). Let U be an open subset of R2 that contains F(O), and let f : U → R be a twice continuously differentiable function. Define the function g : O → R by g(x, y) = (f ◦ F)(x, y) = f(u(x, y), v(x, y)). Find gxx, gxy and gyy in terms of the first and second order partial derivatives of u, v and f . Chapter 4. Differentiating Functions of Several Variables 263 4.4 Second Order Approximations In this section, we turn to consider second order approximations. We only consider a function f : O → R defined on an open subset O of Rn and whose codomain is R. The function is said to be twice differentiable if it has first order partial derivatives, and each fxi : O → R, 1 ≤ i ≤ n, is a differentiable function. Notice that a twice differentiable function has continuous first order partial derivatives. Hence, it is differentiable. The differentiability of each fxi , 1 ≤ i ≤ n also implies that f has second order partial derivatives. 
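Before stating the second order Taylor estimate, it is worth noting that second order partial derivatives can be probed numerically by central differences, and for a twice continuously differentiable function the mixed partials agree, as Clairaut's theorem asserts. The following Python sketch is an illustration added here, not part of the text; the function f and the evaluation point are hypothetical choices.

```python
import math

# Central-difference approximation of second order partial derivatives for
# the sample function f(x, y) = x^2 * y + sin(x*y) (a hypothetical choice).
def f(x, y):
    return x**2 * y + math.sin(x * y)

x0, y0, h = 1.0, 0.5, 1e-4

f_xx = (f(x0 + h, y0) - 2 * f(x0, y0) + f(x0 - h, y0)) / h**2
f_yy = (f(x0, y0 + h) - 2 * f(x0, y0) + f(x0, y0 - h)) / h**2
# four-point cross stencil for the mixed partial f_xy
f_xy = (f(x0 + h, y0 + h) - f(x0 + h, y0 - h)
        - f(x0 - h, y0 + h) + f(x0 - h, y0 - h)) / (4 * h**2)

# Exact values: f_xx = 2y - y^2 sin(xy), f_yy = -x^2 sin(xy),
# f_xy = 2x + cos(xy) - xy sin(xy)
s, c0 = math.sin(x0 * y0), math.cos(x0 * y0)
assert abs(f_xx - (2 * y0 - y0**2 * s)) < 1e-5
assert abs(f_yy - (-(x0**2) * s)) < 1e-5
assert abs(f_xy - (2 * x0 + c0 - x0 * y0 * s)) < 1e-5
```

The same cross stencil evaluated with the roles of x and y exchanged returns the same value, consistent with the symmetry of the Hessian.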
Lemma 4.18 Let O be an open subset of Rn, and let f : O → R be a twice differentiable function defined on O. If x0 and x0 + h are two points in O such that the line segment between them lies entirely in O, then there is a c ∈ (0, 1) such that f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ = 1 2 hTHf (x0 + ch)h = 1 2 n∑ i=1 n∑ j=1 hihj ∂2f ∂xj∂xi (x0 + ch). Proof Since the line segment between x0 and x0 + h is a compact subset of the open set O, there is an ε > 0 such that x0 + th ∈ O for all t in the open interval I = (−ε, 1 + ε). Define the function g : I → R by g(t) = f(x0 + th). Since f : O → R is differentiable, the chain rule implies that g : I → R is differentiable and g′(t) = n∑ i=1 hi ∂f ∂xi (x0 + th) = ⟨∇f(x0 + th),h⟩. Since each fxi : O → R, 1 ≤ i ≤ n, is differentiable, the chain rule again implies that g′ is differentiable and g′′(t) = n∑ i=1 n∑ j=1 hihj ∂2f ∂xj∂xi (x0 + th) = hTHf (x0 + th)h. By Lagrange’s remainder theorem, there is a c ∈ (0, 1) such that g(1)− g(0)− g′(0)(1− 0) = g′′(c) 2 (1− 0)2. This gives f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ = 1 2 n∑ i=1 n∑ j=1 hihj ∂2f ∂xj∂xi (x0 + ch). If a function has continuous second order partial derivatives, then it is twice differentiable, and Clairaut’s theorem implies that its Hessian matrix is symmetric. For such a function, we can prove the second order approximation theorem. Theorem 4.19 Second Order Approximation Theorem Let O be an open subset of Rn that contains the point x0, and let f : O → R be a twice continuously differentiable function defined on O. We have the following. (a) lim h→0 f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ − 1 2 hTHf (x0)h ∥h∥2 = 0. (b) If Q(x) is a polynomial of degree at most two such that lim h→0 f(x0 + h)−Q(x0 + h) ∥h∥2 = 0, then Q(x) = f(x0)+⟨∇f(x0),x−x0⟩+ 1 2 (x−x0) THf (x0)(x−x0). (4.8) Combining (a) and (b), the second order approximation theorem says that for a twice continuously differentiable function, there exists a unique polynomial of degree at most 2 which is a second order approximation of the function. Chapter 4.
Differentiating Functions of Several Variables 265 Proof Let us prove part (a) first. Since O is open, there is an r > 0 such that B(x0, r) ⊂ O. For each h in Rn with ∥h∥ < r, Lemma 4.18 says that there is a ch ∈ (0, 1) such that f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ = 1 2 hTHf (x0 + ch)h. Therefore, if 0 < ∥h∥ < r,∣∣∣∣∣∣∣ f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ − 1 2 hTHf (x0)h ∥h∥2 ∣∣∣∣∣∣∣ = 1 2 ∣∣∣∣∣ n∑ i=1 n∑ j=1 hihj ∥h∥2 ( ∂2f ∂xj∂xi (x0 + chh)− ∂2f ∂xj∂xi (x0) )∣∣∣∣∣ ≤ 1 2 n∑ i=1 n∑ j=1 |hi||hj| ∥h∥2 ∣∣∣∣ ∂2f ∂xj∂xi (x0 + chh)− ∂2f ∂xj∂xi (x0) ∣∣∣∣ ≤ 1 2 n∑ i=1 n∑ j=1 ∣∣∣∣ ∂2f ∂xj∂xi (x0 + chh)− ∂2f ∂xj∂xi (x0) ∣∣∣∣ . Since ch ∈ (0, 1), lim h→0 (x0 + chh) = x0. For all 1 ≤ i ≤ n, 1 ≤ j ≤ n, fxjxi is continuous. Hence, lim h→0 ∂2f ∂xj∂xi (x0 + chh) = ∂2f ∂xj∂xi (x0). This proves that lim h→0 f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ − 1 2 hTHf (x0)h ∥h∥2 = 0. To prove part (b), let P (x) = f(x0) + ⟨∇f(x0),x− x0⟩+ 1 2 (x− x0) THf (x0)(x− x0). Part (a) says that lim h→0 f(x0 + h)− P (x0 + h) ∥h∥2 = 0. (4.9) Chapter 4. Differentiating Functions of Several Variables 266 Since Q(x) is a polynomial of degree at most two in x, Q(x0 + h) is a polynomial of degree at most two in h. Therefore, we can write Q(x0 +h) as Q(x0 + h) = c+ n∑ i=1 bihi + 1 2 n∑ i=1 aiih 2 i + ∑ 1≤i<j≤n aijhihj. Since lim h→0 f(x0 + h)−Q(x0 + h) ∥h∥2 = 0, subtracting (4.9) gives lim h→0 P (x0 + h)−Q(x0 + h) ∥h∥2 = 0. (4.10) It follows that lim h→0 (P (x0 + h)−Q(x0 + h)) = 0, (4.11) and lim h→0 P (x0 + h)−Q(x0 + h) ∥h∥ = 0. (4.12) Since f has continuous second order partial derivatives, fxjxi (x0) = fxixj (x0). Thus, P (x0 + h)−Q(x0 + h) = (f(x0)− c) + n∑ i=1 hi ( ∂f ∂xi (x0)− bi ) + 1 2 n∑ i=1 h2i ( ∂2f ∂x2i (x0)− aii ) + ∑ 1≤i<j≤n hihj ( ∂2f ∂xj∂xi (x0)− aij ) . Eq. (4.11) implies that c = f(x0). Then eq. (4.12) implies that bi = ∂f ∂xi (x0) for all 1 ≤ i ≤ n. Finally, (4.10) implies that for any 1 ≤ i ≤ j ≤ n, aij = ∂2f ∂xi∂xj (x0). This completes the proof that Q(x) = P (x). Chapter 4. 
Differentiating Functions of Several Variables 267 Example 4.26 Find a polynomial Q(x, y) of degree at most 2 such that lim (x,y)→(1,2) sin(4x2 − y2)−Q(x, y) (x− 1)2 + (y − 2)2 = 0. Solution Since g(x, y) = 4x2 − y2 is a polynomial function, it is infinitely differentiable. Since the sine function is also infinitely differentiable, the function f(x, y) = sin(4x2 − y2) is infinitely differentiable. fx(x, y) = 8x cos(4x2 − y2), fy(x, y) = −2y cos(4x2 − y2), fxx(x, y) = 8 cos(4x2 − y2)− 64x2 sin(4x2 − y2), fxy(x, y) = fyx(x, y) = 16xy sin(4x2 − y2), fyy(x, y) = −2 cos(4x2 − y2)− 4y2 sin(4x2 − y2). Hence, f(1, 2) = 0, fx(1, 2) = 8, fy(1, 2) = −4, fxx(1, 2) = 8, fxy(1, 2) = 0, fyy(1, 2) = −2. By the second order approximation theorem, Q(x, y) = f(1, 2) + fx(1, 2)(x− 1) + fy(1, 2)(y − 2) + 1 2 fxx(1, 2)(x− 1)2 + fxy(1, 2)(x− 1)(y − 2) + 1 2 fyy(1, 2)(y − 2)2 = 8(x− 1)− 4(y − 2) + 4(x− 1)2 − (y − 2)2 = 4x2 − y2. Example 4.27 Determine whether the limit lim (x,y)→(0,0) ex+y − 1− x− y x2 + y2 exists. If yes, find the limit. Solution Since the exponential function and the function g(x, y) = x+ y are infinitely differentiable, the function f(x, y) = ex+y is infinitely differentiable. By the second order approximation theorem, lim (x,y)→(0,0) f(x, y)−Q(x, y) x2 + y2 = 0, where Q(x, y) = f(0, 0) + x ∂f ∂x (0, 0) + y ∂f ∂y (0, 0) + 1 2 x2 ∂2f ∂x2 (0, 0) + xy ∂2f ∂x∂y (0, 0) + 1 2 y2 ∂2f ∂y2 (0, 0). Now ∂f ∂x (x, y) = ∂f ∂y (x, y) = ∂2f ∂x2 (x, y) = ∂2f ∂x∂y (x, y) = ∂2f ∂y2 (x, y) = ex+y. Thus, f(0, 0) = ∂f ∂x (0, 0) = ∂f ∂y (0, 0) = ∂2f ∂x2 (0, 0) = ∂2f ∂x∂y (0, 0) = ∂2f ∂y2 (0, 0) = 1. It follows that Q(x, y) = 1 + x+ y + 1 2 x2 + xy + 1 2 y2. Hence, lim (x,y)→(0,0) ex+y − 1− x− y − 1 2 x2 − xy − 1 2 y2 x2 + y2 = 0. (4.13) If lim (x,y)→(0,0) ex+y − 1− x− y x2 + y2 = a exists, subtracting (4.13) shows that a = lim (x,y)→(0,0) h(x, y), where h(x, y) = 1 2 x2 + xy + 1 2 y2 x2 + y2 . Chapter 4.
Differentiating Functions of Several Variables 269 This implies that if {wk} is a sequence in R2 \ {0} that converges to (0, 0), then the sequence {h(wk)} converges to a. For k ∈ Z+, let uk = ( 1 k , 0 ) , vk = ( 1 k , 1 k ) . Then {uk} and {vk} are sequences in R2 \ {0} that converge to (0, 0). Hence, the sequences {h(uk)} and {h(vk)} both converge to a. Since h(uk) = 1 2 , h(vk) = 1 for all k ∈ Z+, the sequence {h(uk)} converges to 1 2 , while the sequence {h(vk)} converges to 1. This gives a contradiction. Hence, the limit lim (x,y)→(0,0) ex+y − 1− x− y x2 + y2 does not exist. Exercises 4.4 Question 1 Let f : R2 → R be the function f(x, y) = x2y + 4xy2. Find a polynomial Q(x, y) of degree at most 2 such that lim (x,y)→(1,−1) f(x, y)−Q(x, y) (x− 1)2 + (y + 1)2 = 0. Question 2 Determine whether the limit lim (x,y)→(0,0) sin(x+ y)− x− y x2 + y2 exists. If yes, find the limit. Question 3 Determine whether the limit lim (x,y)→(0,0) cos(x+ y)− 1 x2 + y2 exists. If yes, find the limit. 4.5 Local Extrema In this section, we use differential calculus to study local extrema of a function f : O → R that is defined on an open subset O of Rn. The definition of local extrema that we give here is restricted to such functions. Definition 4.17 Local Maximum and Local Minimum Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. 1. The point x0 is called a local maximizer of f provided that there is a δ > 0 such that B(x0, δ) ⊂ O and for all x ∈ B(x0, δ), f(x) ≤ f(x0). The value f(x0) is called a local maximum value of f . 2. The point x0 is called a local minimizer of f provided that there is a δ > 0 such that B(x0, δ) ⊂ O and for all x ∈ B(x0, δ), f(x) ≥ f(x0). The value f(x0) is called a local minimum value of f . 3.
The point x0 is called a local extremizer if it is either a local maximizer or a local minimizer. The value f(x0) is called a local extreme value if it is either a local maximum value or a local minimum value. From the definition, it is obvious that x0 is a local minimizer of the function f : O → R if and only if it is a local maximizer of the function −f : O → R. Example 4.28 (a) For the function f : R2 → R, f(x, y) = x2 + y2, (0, 0) is a local minimizer. (b) For the function g : R2 → R, g(x, y) = −x2 − y2, (0, 0) is a local maximizer. Chapter 4. Differentiating Functions of Several Variables 272 (c) For the function h : R2 → R, h(x, y) = x2 − y2, 0 = (0, 0) is neither a local maximizer nor a local minimizer. For any δ > 0, let r = δ/2. The points u = (r, 0) and v = (0, r) are in B(0, δ), but h(u) = r2 > 0 = h(0), h(v) = −r2 < 0 = h(0). Figure 4.10: The functions f(x, y), g(x, y) and h(x, y) defined in Example 4.28. The following theorem gives a necessary condition for a point to be a local extremum if the function has partial derivatives at that point. Theorem 4.20 Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. If x0 is a local extremizer and f has partial derivatives at x0, then the gradient of f at x0 is the zero vector, namely, ∇f(x0) = 0. Proof Without loss of generality, assume that x0 is a local minimizer. Then there is a δ > 0 such that B(x0, δ) ⊂ O and f(x) ≥ f(x0) for all x ∈ B(x0, δ). (4.14) For 1 ≤ i ≤ n, consider the function gi : (−δ, δ) → R defined by gi(t) = f(x0 + tei). By the definition of partial derivatives, gi is differentiable at t = 0 and Chapter 4. Differentiating Functions of Several Variables 273 g′i(0) = ∂f ∂xi (x0). Eq. (4.14) implies that gi(t) ≥ gi(0) for all t ∈ (−δ, δ). In other words, t = 0 is a local minimizer of the function gi : (−δ, δ) → R. From the theory of single variable analysis, we must have g′i(0) = 0. Hence, fxi (x0) = 0 for all 1 ≤ i ≤ n. 
This proves that ∇f(x0) = 0. Theorem 4.20 prompts us to make the following definition. Definition 4.18 Stationary Points Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. If f has partial derivatives at x0 and ∇f(x0) = 0, we call x0 a stationary point of f . Theorem 4.20 says that if f : O → R has partial derivatives at x0, a necessary condition for x0 to be a local extremizer is that it is a stationary point. Example 4.29 For all three functions f , g and h defined in Example 4.28, the point 0 = (0, 0) is a stationary point. However, 0 is a local minimizer of f , a local maximizer of g, but neither a local maximizer nor a local minimizer of h. The behavior of the function h(x, y) = x2 − y2 in Example 4.28 prompts us to make the following definition. Definition 4.19 Saddle Points Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. The point x0 is a saddle point of the function f if it is a stationary point of f , but it is not a local extremizer. In other words, ∇f(x0) = 0, but for any δ > 0, there exist x1 and x2 in B(x0, δ) ∩ O such that f(x1) > f(x0) and f(x2) < f(x0). Example 4.30 (0, 0) is a saddle point of the function h : R2 → R, h(x, y) = x2 − y2. By definition, if x0 is a stationary point of the function f : O → R, then it is either a local maximizer, a local minimizer, or a saddle point. If f : O → R has continuous second order partial derivatives at x0, we can use the second derivative test to partially determine whether x0 is a local maximizer, a local minimizer, or a saddle point. When n = 1, we have seen that a stationary point x0 of a function f is a local minimizer if f ′′(x0) > 0, and a local maximizer if f ′′(x0) < 0. For multivariable functions, it is natural to expect that whether x0 is a local extremizer depends on the definiteness of the Hessian matrix Hf (x0).
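One concrete way to see how the definiteness of Hf (x0) governs the local behavior is to sample the quadratic form hTHf (x0)h over unit directions h. The Python sketch below does this for h(x, y) = x2 − y2 at (0, 0), whose Hessian is the diagonal matrix with entries 2 and −2; it is an illustration added here, not part of the text.

```python
import math

# Probing definiteness by sampling v^T H v over unit directions v, for the
# Hessian of h(x, y) = x^2 - y^2 at (0, 0), which is diag(2, -2).
H = [[2.0, 0.0], [0.0, -2.0]]

def quad_form(H, v):
    return sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))

values = []
for k in range(360):
    theta = math.pi * k / 180
    values.append(quad_form(H, (math.cos(theta), math.sin(theta))))

# Both signs occur, so the matrix is indefinite and (0, 0) is a saddle point.
assert min(values) < 0 < max(values)
```

For the Hessian diag(2, 2) of f(x, y) = x2 + y2 the same sampling produces only positive values, matching positive definiteness.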
In Section 2.1, we have discussed the classification of a symmetric matrix. It is either positive semi-definite, negative semi-definite or indefinite. Among the positive semi-definite ones, there are those that are positive definite. Among the negative semi-definite matrices, there are those which are negative definite. Theorem 4.21 Second Derivative Test Let O be an open subset of Rn, and let f : O → R be a twice continuously differentiable function defined on O. Assume that x0 is a stationary point of f : O → R. (i) If Hf (x0) is positive definite, then x0 is a local minimizer of f . (ii) If Hf (x0) is negative definite, then x0 is a local maximizer of f . (iii) If Hf (x0) is indefinite, then x0 is a saddle point. The cases that are not covered in the second derivative test are the cases where Hf (x0) is positive semi-definite but not positive definite, or Hf (x0) is negative semi-definite but not negative definite. These are the inconclusive cases. Proof of the Second Derivative Test Notice that (i) and (ii) are equivalent since x0 is a local minimizer of f if and only if it is a local maximizer of −f , and H−f = −Hf . A symmetric matrix A is positive definite if and only if −A is negative definite. Thus, we only need to prove (i) and (iii). Since x0 is a stationary point, ∇f(x0) = 0. It follows from the second order approximation theorem that lim h→0 f(x0 + h)− f(x0)− 1 2 hTHf (x0)h ∥h∥2 = 0. (4.15) To prove (i), assume that Hf (x0) is positive definite. By Theorem 2.9, there is a positive number c such that hTHf (x0)h ≥ c∥h∥2 for all h ∈ Rn. Eq. (4.15) implies that there is a δ > 0 such that B(x0, δ) ⊂ O and for all h with 0 < ∥h∥ < δ, ∣∣f(x0 + h)− f(x0)− 1 2 hTHf (x0)h ∣∣ / ∥h∥2 < c 3 . Therefore, ∣∣f(x0 + h)− f(x0)− 1 2 hTHf (x0)h ∣∣ ≤ c 3 ∥h∥2 for all ∥h∥ < δ. This implies that for all h with ∥h∥ < δ, f(x0 + h)− f(x0) ≥ 1 2 hTHf (x0)h− c 3 ∥h∥2 ≥ c 6 ∥h∥2 ≥ 0.
Thus, f(x) ≥ f(x0) for all x ∈ B(x0, δ). This shows that x0 is a local minimizer of f . Now to prove (iii), assume that Hf (x0) is indefinite. Then there exist unit vectors u1 and u2 so that ε1 = uT 1Hf (x0)u1 < 0, ε2 = uT 2Hf (x0)u2 > 0. Let ε = 1 2 min{|ε1|, ε2}. Eq. (4.15) implies that there is a δ0 > 0 such that B(x0, δ0) ⊂ O and for all h with 0 < ∥h∥ < δ0, ∣∣f(x0 + h)− f(x0)− 1 2 hTHf (x0)h ∣∣ < ε∥h∥2. (4.16) For any δ > 0, let r = 1 2 min{δ, δ0}. Then the points x1 = x0 + ru1 and x2 = x0 + ru2 are in the ball B(x0, δ) and the ball B(x0, δ0). Eq. (4.16) implies that for i = 1, 2, −r2ε ≤ f(x0 + rui)− f(x0)− r2 2 uT i Hf (x0)ui < r2ε. Therefore, f(x0 + ru1)− f(x0) < r2 ( 1 2 uT 1Hf (x0)u1 + ε ) = r2 ( 1 2 ε1 + ε ) ≤ 0 since ε ≤ −1 2 ε1; while f(x0 + ru2)− f(x0) > r2 ( 1 2 uT 2Hf (x0)u2 − ε ) = r2 ( 1 2 ε2 − ε ) ≥ 0 since ε ≤ 1 2 ε2. Thus, x1 and x2 are points in B(x0, δ), but f(x1) < f(x0) while f(x2) > f(x0). These show that x0 is a saddle point. A symmetric matrix is positive definite if and only if all its eigenvalues are positive. It is negative definite if and only if all its eigenvalues are negative. It is indefinite if it has at least one positive eigenvalue, and at least one negative eigenvalue. For a diagonal matrix, its eigenvalues are the entries on the diagonal. Let us revisit Example 4.28. Example 4.31 For the functions considered in Example 4.28, we have seen that (0, 0) is a stationary point of each of them. Notice that Hf (0, 0) = [ 2 0 0 2 ] is positive definite, Hg(0, 0) = [ −2 0 0 −2 ] is negative definite, Hh(0, 0) = [ 2 0 0 −2 ] is indefinite. Therefore, (0, 0) is a local minimizer of f , a local maximizer of g, and a saddle point of h.
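For a 2 × 2 symmetric matrix the eigenvalue criterion is easy to carry out explicitly, since the eigenvalues are the roots of a quadratic. The following Python sketch, an illustration added here and not part of the text, classifies the three Hessians of Example 4.31 this way.

```python
import math

# Classifying a 2x2 symmetric matrix [[a, b], [b, d]] by its eigenvalues,
# computed from the closed-form roots of the characteristic polynomial.
def classify(a, b, d):
    tr, det = a + d, a * d - b * b
    s = math.sqrt(tr * tr / 4 - det)   # half the gap between the eigenvalues
    lam1, lam2 = tr / 2 - s, tr / 2 + s
    if lam1 > 0:
        return "positive definite"
    if lam2 < 0:
        return "negative definite"
    if lam1 < 0 < lam2:
        return "indefinite"
    return "semi-definite"

# The three Hessians of Example 4.31:
assert classify(2, 0, 2) == "positive definite"    # Hf(0, 0)
assert classify(-2, 0, -2) == "negative definite"  # Hg(0, 0)
assert classify(2, 0, -2) == "indefinite"          # Hh(0, 0)
```

The quantity under the square root is ((a − d)/2)2 + b2, which is nonnegative for every symmetric input, so the square root is always defined.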
Now let us look at an example which shows that when the Hessian matrix is positive semi-definite but not positive definite, we cannot make any conclusion about the nature of a stationary point. Example 4.32 Consider the functions f : R2 → R and g : R2 → R given respectively by f(x, y) = x2 + y4, g(x, y) = x2 − y4. These are infinitely differentiable functions. It is easy to check that (0, 0) is a stationary point of both of them. Now, Hf (0, 0) = Hg(0, 0) = [ 2 0 0 0 ] is a positive semi-definite matrix. However, (0, 0) is a local minimizer of f , but a saddle point of g. To determine the definiteness of an n × n symmetric matrix by looking at the sign of its eigenvalues is ineffective when n ≥ 3. There is an easier way to determine whether a symmetric matrix is positive definite. Let us first introduce the definition of principal submatrices. Chapter 4. Differentiating Functions of Several Variables 278 Definition 4.20 Principal Submatrices Let A be an n × n matrix. For 1 ≤ k ≤ n, the kth-principal submatrix Mk of A is the k × k matrix consists of the first k rows and first k columns of A. Example 4.33 For the matrix A = 1 2 3 4 5 6 7 8 9 , the first, second and third principal submatrices are M1 = [ 1 ] , M2 = [ 1 2 4 5 ] , M3 = 1 2 3 4 5 6 7 8 9 respectively. Theorem 4.22 Sylvester’s Criterion for Positive Definiteness An n× n symmetric matrix A is positive definite if and only if detMk > 0 for all 1 ≤ k ≤ n, where Mk is its kth principal submatrix. The proof of this theorem is given in Appendix A. Using the fact that a symmetric matrix A is negative definite if and only if −A is positive definite, it is easy to obtain a criterion for a symmetric matrix to be negative definite in terms of the determinants of its principal submatrices. Theorem 4.23 Sylvester’s Criterion for Negative Definiteness An n × n symmetric matrix A is negative definite if and only if (−1)k detMk > 0 for all 1 ≤ k ≤ n, where Mk is its kth principal submatrix. Chapter 4. 
Differentiating Functions of Several Variables 279 Example 4.34 Consider the symmetric matrix A = 2 −1 0 −1 2 −1 0 −1 2 . Since detM1 = 2, detM2 = 3, detM3 = detA = 4 are all positive, A is positive definite. For a function f : O → R defined on an open subset O of R2, we have the following. Theorem 4.24 Let O be an open subset of R2. Suppose that (x0, y0) is a stationary point of the twice continuously differentiable function f : O → R. Let D(x0, y0) = ∂2f ∂x2 (x0, y0) ∂2f ∂y2 (x0, y0)− [ ∂2f ∂x∂y (x0, y0) ]2 . (i) If ∂2f ∂x2 (x0, y0) > 0 and D(x0, y0) > 0, then the point (x0, y0) is a local minimizer of f . (ii) If ∂2f ∂x2 (x0, y0) < 0 and D(x0, y0) > 0, then the point (x0, y0) is a local maximizer of f . (iii) If D(x0, y0) < 0, the point (x0, y0) is a saddle point of f . Proof We notice that Hf (x0, y0) = ∂2f ∂x2 (x0, y0) ∂2f ∂x∂y (x0, y0) ∂2f ∂x∂y (x0, y0) ∂2f ∂y2 (x0, y0) . Hence, ∂2f ∂x2 (x0, y0) is the determinant of the first principal submatrix of Hf (x0, y0), while D(x0, y0) is the determinant of Hf (x0, y0), the second principal submatrix of Hf (x0, y0). Thus, (i) and (ii) follow from the Sylvester criteria as well as the second derivative test. For (iii), we notice that the 2× 2 matrix Hf (x0, y0) is indefinite if and only if it has one positive eigenvalue and one negative eigenvalue, if and only if D(x0, y0) = detHf (x0, y0) < 0. Now we look at some examples of the applications of the second derivative test. Example 4.35 Let f : R2 → R be the function defined as f(x, y) = x4 + y4 + 4xy. Find the stationary points of f and classify them. Solution Since f is a polynomial function, it is infinitely differentiable. ∇f(x, y) = (4x3 + 4y, 4y3 + 4x). To find the stationary points, we need to solve the system of equations x3 + y = 0, y3 + x = 0. From the first equation, we have y = −x3. Substituting into the second equation gives −x9 + x = 0, or equivalently, x(x8 − 1) = 0. Chapter 4.
Differentiating Functions of Several Variables 281 Thus, x = 0 or x = ±1. When x = 0, y = 0. When x = ±1, y = ∓1. Therefore, the stationary points of f are u1 = (0, 0), u2 = (1,−1) and u3 = (−1, 1). Now, Hf (x, y) = [ 12x2 4 4 12y2 ] . Therefore, Hf (u1) = [ 0 4 4 0 ] , Hf (u2) = Hf (u3) = [ 12 4 4 12 ] . It follows that D(u1) = −16 < 0, D(u2) = D(u3) = 128 > 0. Since fxx(u2) = fxx(u3) = 12 > 0, we conclude that u1 is a saddle point, u2 and u3 are local minimizers. Figure 4.11: The function f(x, y) = x4 + y4 + 4xy. Chapter 4. Differentiating Functions of Several Variables 282 Example 4.36 Consider the function f : R3 → R defined as f(x, y, z) = x3 − xy2 + 5x2 − 4xy − 2xz + y2 + 6yz + 37z2. Show that (0, 0, 0) is a local minimizer of f . Solution Since f is a polynomial function, it is infinitely differentiable. Since ∇f(x, y, z) = (3x2−y2+10x−4y−2z,−2xy−4x+2y+6z,−2x+6y+74z), we find that ∇f(0, 0, 0) = (0, 0, 0). Hence, (0, 0, 0) is a stationary point. Now, Hf (x, y, z) = 6x+ 10 −2y − 4 −2 −2y − 4 −2x+ 2 6 −2 6 74 . Therefore, Hf (0, 0, 0) = 10 −4 −2 −4 2 6 −2 6 74 . The determinants of the three principal submatrices of Hf (0, 0, 0) are detM1 = 10, detM2 = ∣∣∣∣∣10 −4 −4 2 ∣∣∣∣∣ = 4, detM3 = ∣∣∣∣∣∣∣ 10 −4 −2 −4 2 6 −2 6 74 ∣∣∣∣∣∣∣ = 24. This shows that Hf (0, 0, 0) is positive definite. Hence, (0, 0, 0) is a local minimizer of f . Chapter 4. Differentiating Functions of Several Variables 283 Exercises 4.5 Question 1 Let f : R2 → R be the function defined as f(x, y) = x2 + 4y2 + 5xy − 8x− 11y + 7. Find the stationary points of f and classify them. Question 2 Let f : R2 → R be the function defined as f(x, y) = x2 + 4y2 + 3xy − 5x− 18y + 1. Find the stationary points of f and classify them. Question 3 Let f : R2 → R be the function defined as f(x, y) = x3 + y3 + 12xy. Find the stationary points of f and classify them. Question 4 Consider the function f : R3 → R defined as f(x, y, z) = z3 − 2z2 − x2 − y2 − xy + x− y. 
Show that (1,−1, 0) is a stationary point of f and determine the nature of this stationary point. Chapter 4. Differentiating Functions of Several Variables 284 Question 5 Consider the function f : R3 → R defined as f(x, y, z) = z3 + 2z2 − x2 − y2 − xy + x− y. Show that (1,−1, 0) is a stationary point of f and determine the nature of this stationary point. Chapter 5. The Inverse and Implicit Function Theorems 285 Chapter5 The Inverse and Implicit Function Theorems In this chapter, we discuss the inverse function theorem and implicit function theorem, which are two important theorems in multivariable analysis. Given a function that maps a subset of Rn to Rn, the inverse function theorem gives sufficient conditions for the existence of a local inverse and its differentiability. Given a system ofm equations with n+m variables, the implicit function theorem gives sufficient conditions to solve m of the variables in terms of the other n variables locally such that the solutions are differentiable functions. We want to emphasize that these theorems are local, in the sense that each of them asserts the existence of a function defined in a neighbourhood of a point. In some sense, the two theorems are equivalent, which means one can deduce one from the other. In this book, we will prove the inverse function theorem first, and use it to deduce the implicit function theorem. 5.1 The Inverse Function Theorem Let D be a subset of Rn. If the function F : D → Rn is one-to-one, we can define the inverse function F−1 : F(D) → Rn. The question we want to study here is the following. If D is an open set and F is differentiable at the point x0 in D, is the inverse function F−1 differentiable at y0 = F(x0)? For this, we also want the point y0 to be an interior point of F(D). More precisely, is there a neighbourhood U of x0 that is mapped bijectively by F to a neighbourhood V of y0? 
If the answer is yes, and F−1 is differentiable at y0, then the chain rule would imply that DF−1(y0)DF(x0) = In. Hence, a necessary condition for F−1 to be differentiable at y0 is that the derivative matrix DF(x0) has to be invertible. Chapter 5. The Inverse and Implicit Function Theorems 286 Let us study the map f : R → R given by f(x) = x2. The range of the function is [0,∞). Notice that if x0 > 0, then I = (0,∞) is a neighbourhood of x0 that is mapped bijectively by f to the neighbourhood J = (0,∞) of f(x0). If x0 < 0, then I = (−∞, 0) is a neighbourhood of x0 that is mapped bijectively by f to the neighbourhood J = (0,∞) of f(x0). However, if x0 = 0, the point f(x0) = 0 is not an interior point of f(R) = [0,∞). Notice that f ′(x) = 2x. Therefore, x = 0 is the point which f ′(x) = 0. If x0 > 0, take I = (0,∞) and J = (0,∞). Then f : I → J has an inverse given by f−1 : J → I , f−1(x) = √ x. It is a differentiable function with (f−1)′(x) = 1 2 √ x . In particular, at y0 = f(x0) = x20, (f−1)′(y0) = 1 2 √ y0 = 1 2x0 = 1 f ′(x0) . Similarly, if x0 < 0, take I = (−∞, 0) and J = (0,∞). Then f : I → J has an inverse given by f−1 : J → I , f−1(x) = − √ x. It is a differentiable function with (f−1)′(x) = − 1 2 √ x . In particular, at y0 = f(x0) = x20, (f−1)′(y0) = − 1 2 √ y0 = 1 2x0 = 1 f ′(x0) . For a single variable function, the inverse function theorem takes the following form. Theorem 5.1 (Single Variable) Inverse Function Theorem Let O be an open subset of R that contains the point x0, and let f : O → R be a continuously differentiable function defined on O. Suppose that f ′(x0) ̸= 0. Then there exists an open interval I containing x0 such that f maps I bijectively onto the open interval J = f(I). The inverse function f−1 : J → I is continuously differentiable. For any y ∈ J , if x is the point in I such that f(x) = y, then (f−1)′(y) = 1 f ′(x) . Chapter 5. The Inverse and Implicit Function Theorems 287 Figure 5.1: The function f : R → R, f(x) = x2. 
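Before turning to the proof, the derivative formula in Theorem 5.1 can be checked numerically on the example f(x) = x2 discussed above, whose local inverse near a point x0 > 0 is the square root. The Python sketch below is an illustration added here, not part of the text; the point x0 = 3 is a hypothetical choice.

```python
import math

# Numerical check of (f^{-1})'(y0) = 1 / f'(x0) for f(x) = x^2 at x0 = 3,
# where the local inverse on (0, infinity) is f^{-1}(y) = sqrt(y).
x0 = 3.0
y0 = x0 * x0
h = 1e-6

# difference quotient of the inverse at y0
inv_deriv = (math.sqrt(y0 + h) - math.sqrt(y0)) / h

# compare with 1 / f'(x0) = 1 / (2 x0)
assert abs(inv_deriv - 1 / (2 * x0)) < 1e-6
```

The same check with x0 < 0 and the branch f−1(y) = −sqrt(y) gives agreement with 1/(2x0), which is now negative.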
Proof Without loss of generality, assume that f ′(x0) > 0. Since O is an open set and f ′ is continuous at x0, there is an r1 > 0 such that (x0−r1, x0+r1) ⊂ O and for all x ∈ (x0 − r1, x0 + r1), |f ′(x)− f ′(x0)| < f ′(x0) 2 . This implies that f ′(x) > f ′(x0) 2 > 0 for all x ∈ (x0 − r1, x0 + r1). Therefore, f is strictly increasing on (x0− r1, x0+ r1). Take any r > 0 that is less that r1. Then [x − r, x + r] ⊂ (x0 − r1, x0 + r1). By intermediate value theorem, the function f maps [x − r, x + r] bijectively onto [f(x − r), f(x + r)]. Let I = (x− r, x + r) and J = (f(x− r), f(x + r)). Then f : I → J is a bijection and f−1 : J → I exists. In volume I, we have proved that f−1 is differentiable, and (f−1)′(y) = 1 f ′(f−1(y)) for all y ∈ J. This formula shows that (f−1)′ : J → R is continuous. Chapter 5. The Inverse and Implicit Function Theorems 288 Remark 5.1 In the inverse function theorem, we determine the invertibility of the function in a neighbourhood of a point x0. The theorem says that if f is continuously differentiable and f ′(x0) ̸= 0, then f is locally invertible at x0. Here the assumption that f ′ is continuous is essential. In volume I, we have seen that for a continuous function f : I → R defined on an open interval I to be one-to-one, it is necessary that it is strictly monotonic. The function f : R → R, f(x) = x+ x2 sin ( 1 x ) , if x ̸= 0, 0, if x = 0, is an example of a differentiable function where f ′(0) = 1 ̸= 0, but f fails to be strictly monotonic in any neighbourhood of the point x = 0. This annoying behavior can be removed if we assume that f ′ is continuous. If f ′(x0) ̸= 0 and f ′ is continuous, there is a neighbourhood I of x0 such that f ′(x) has the same sign as f ′(x0) for all x ∈ I . This implies that f is strictly monotonic on I . Example 5.1 Let f : R → R be the function defined as f(x) = 2x+ 4 cosx. 
Show that there is an open interval I containing 0 such that f : I → R is one-to-one, and f−1 : f(I) → R is continuously differentiable. Determine (f−1)′(f(0)). Chapter 5. The Inverse and Implicit Function Theorems 289 Solution The function f is infinitely differentiable and f ′(x) = 2 − 4 sinx. Since f ′(0) = 2 ̸= 0, the inverse function theorem says that there is an open interval I containing 0 such that f : I → R is one-to-one, and f−1 : f(I) → R is continuously differentiable. Moreover, (f−1)′(f(0)) = 1 f ′(0) = 1 2 . Now let us consider functions defined on open subsets of Rn, where n ≥ 2. We first consider a linear transformation T : Rn → Rn. There is an n× n matrix A such that T(x) = Ax. The mapping T : Rn → Rn is one-to-one if and only if A is invertible, if and only if detA ̸= 0. In this case, T is a bijection and T−1 : Rn → Rn is the linear transformation given by T−1(x) = A−1x. Notice that for any x and y in Rn, DT(x) = A, DT−1(y) = A−1. The content of the inverse function theorem is to extend this to nonlinear mappings. Theorem 5.2 Inverse Function Theorem Let O be an open subset of Rn that contains the point x0, and let F : O → Rn be a continuously differentiable function defined on O. If detDF(x0) ̸= 0, then we have the followings. (i) There exists a neighbourhood U of x0 such that F maps U bijectively onto the open set V = F(U). (ii) The inverse function F−1 : V → U is continuously differentiable. (iii) For any y ∈ V , if x is the point in U such that F(x) = y, then DF−1(y) = DF(F−1(y))−1 = DF(x)−1. Chapter 5. The Inverse and Implicit Function Theorems 290 Figure 5.2: The inverse function theorem. For a linear transformation which is a degree one polynomial mapping, the inverse function theorem holds globally. For a general continuously differentiable mapping, the inverse function theorem says that the first order approximation of the function at a point can determine the local invertibility of the function at that point. 
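Returning to the single variable Example 5.1, its conclusion (f−1)′(f(0)) = 1/2 can be confirmed numerically by inverting f by bisection on a small interval around 0 where f is strictly increasing. The Python sketch below is an illustration added here, not part of the text; the bracketing interval [−0.5, 0.5] is a hypothetical choice on which f ′(x) = 2 − 4 sinx stays positive.

```python
import math

# Numerical check of Example 5.1: for f(x) = 2x + 4 cos x, the local inverse
# near x = 0 satisfies (f^{-1})'(f(0)) = 1 / f'(0) = 1/2.
def f(x):
    return 2 * x + 4 * math.cos(x)

y0 = f(0.0)   # = 4
h = 1e-6

# Invert f by bisection on [-0.5, 0.5], where f is strictly increasing.
def f_inv(y, lo=-0.5, hi=0.5):
    for _ in range(80):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# central difference quotient of the inverse at y0
inv_deriv = (f_inv(y0 + h) - f_inv(y0 - h)) / (2 * h)
assert abs(inv_deriv - 0.5) < 1e-3
```

Bisection is used only as a convenient numerical inverter; the inverse function theorem is what guarantees that a differentiable inverse exists near f(0).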
When n ≥ 2, the proof of the inverse function theorem is substantially more complicated than the n = 1 case, as we do not have the monotonicity argument used in the n = 1 case. The proof will be presented in Section 5.2. We will discuss the examples and applications in this section.

Example 5.2
Let F : R2 → R2 be the mapping defined by
F(x, y) = (3x − 2y + 7, 4x + 5y − 2).
Show that F is a bijection, and find F−1(x, y) and DF−1(x, y).

Solution
The mapping F : R2 → R2 can be written as F(x) = T(x) + b, where T : R2 → R2 is the linear transformation
T(x, y) = (3x − 2y, 4x + 5y),
and b = (7, −2). For u = (x, y), T(u) = Au, where
A = [ 3 −2
      4  5 ].
Since detA = 23 ̸= 0, the linear transformation T : R2 → R2 is one-to-one. Hence, F : R2 → R2 is also one-to-one. Given v ∈ R2, let u = A−1(v − b). Then F(u) = v. Hence, F is also onto. The inverse F−1 : R2 → R2 is given by F−1(v) = A−1(v − b). Since
A−1 = 1/23 [  5 2
             −4 3 ],
we find that
F−1(x, y) = ( (5(x − 7) + 2(y + 2))/23, (−4(x − 7) + 3(y + 2))/23 )
          = ( (5x + 2y − 31)/23, (−4x + 3y + 34)/23 ),
and
DF−1(x, y) = 1/23 [  5 2
                    −4 3 ].

Example 5.3
Determine the values of a such that the mapping F : R3 → R3 defined by
F(x, y, z) = (2x + y + az, x − y + 3z, 3x + 2y + z + 7)
is invertible.

Solution
The mapping F : R3 → R3 can be written as F(x) = T(x) + b, where T : R3 → R3 is the linear transformation
T(x, y, z) = (2x + y + az, x − y + 3z, 3x + 2y + z),
and b = (0, 0, 7). Thus, F is a degree one polynomial mapping with
DF(x) = [ 2  1  a
          1 −1  3
          3  2  1 ].
The mapping F is invertible if and only if it is one-to-one, if and only if T is one-to-one, if and only if detDF(x) ̸= 0. Since detDF(x) = 5a − 6, the mapping F is invertible if and only if a ̸= 6/5.

Example 5.4
Let Φ : R2 → R2 be the mapping defined as
Φ(r, θ) = (r cos θ, r sin θ).
Determine the points (r, θ) ∈ R2 where the inverse function theorem can be applied to this mapping.
Explain the significance of this result. Solution Since sin θ and cos θ are infinitely differentiable functions, the mapping Φ is infinitely differentiable with DΦ(r, θ) = [ cos θ −r sin θ sin θ r cos θ ] . Since detDΦ(r, θ) = r cos2 θ + r sin2 θ = r, the inverse function theorem is not applicable at the point (r, θ) if r = 0. The mapping Φ is a change from polar coordinates to rectangular coordinates. The result above shows that the change of coordinates is locally one-to-one away from the origin of the xy-plane. Chapter 5. The Inverse and Implicit Function Theorems 293 Example 5.5 Consider the mapping F : R2 → R2 given by F(x, y) = (x2 − y2, xy). Show that there is a neighbourhood U of the point u0 = (1, 1) such that F : U → R2 is one-to-one, V = F(U) is an open set, and G = F−1 : V → U is continuously differentiable. Then find ∂G1 ∂y (0, 1). Solution The mapping F is a polynomial mapping. Thus, it is continuously differentiable. Notice that F(u0) = (0, 1) and DF(x, y) = [ 2x −2y y x ] , DF(u0) = [ 2 −2 1 1 ] . Since detDF(u0) = 4 ̸= 0, the inverse function theorem implies that there is a neighbourhood U of the point u0 such that F : U → R2 is one-to- one, V = F(U) is an open set, and G = F−1 : V → U is continuously differentiable. Moreover, DG(0, 1) = DF(1, 1)−1 = 1 4 [ 1 2 −1 2 ] . From here, we find that ∂G1 ∂y (0, 1) = 2 4 = 1 2 . Example 5.6 Consider the system of equations sin(x+ y) + x2y + 3xy2 = 2, 2xy + 5x2 − 2y2 = 1. Chapter 5. The Inverse and Implicit Function Theorems 294 Observe that (x, y) = (1,−1) is a solution of this system. Show that there is a neighbourhood U of u0 = (1,−1) and an r > 0 such that for all (a, b) satisfying (a− 2)2 + (b− 1)2 < r2, the system sin(x+ y) + x2y + 3xy2 = a, 2xy + 5x2 − 2y2 = b has a unique solution (x, y) that lies in U . Solution Let F : R2 → R2 be the function defined by F(x, y) = ( sin(x+ y) + x2y + 3xy2, 2xy + 5x2 − 2y2 ) . 
Since the sine function is infinitely differentiable, sin(x + y) is infinitely differentiable. The functions g(x, y) = x2y + 3xy2 and F2(x, y) = 2xy + 5x2 − 2y2 are polynomial functions. Hence, they are also infinitely differentiable. This shows that F is infinitely differentiable. Since DF(x, y) = [ cos(x+ y) + 2xy + 3y2 cos(x+ y) + x2 + 6xy 2y + 10x 2x− 4y ] , we find that DF(1,−1) = [ 2 −4 8 6 ] . It follows that detDF(1,−1) = 44 ̸= 0. By the inverse function theorem, there exists a neighbourhood U1 of u0 such that F : U1 → R2 is one-to-one and V = F(U1) is an open set. Since F(u0) = (2, 1), the point v0 = (2, 1) is a point in the open set V . Hence, there exists r > 0 such that B(v0, r) ⊂ V . Since B(v0, r) is open and F is continuous, U = F−1 (B(v0, r)) is an open subset of R2. The map F : U → B(v0, r) is a bijection. For all (a, b) satisfying (a − 2)2 + (b − 1)2 < r2, (a, b) is in B(v0, r). Hence, there is a unique (x, y) in U such that F(x, y) = (a, b). This means that the system Chapter 5. The Inverse and Implicit Function Theorems 295 sin(x+ y) + x2y + 3xy2 = a, 2xy + 5x2 − 2y2 = b has a unique solution (x, y) that lies in U . At the end of this section, let us prove the following theorem. Theorem 5.3 Let A be an n×n matrix, and let x0 and y0 be two points in Rn. Define the mapping F : Rn → Rn by F(x) = y0 + A (x− x0) . Then F is infinitely differentiable with DF(x) = A. It is one-to-one and onto if and only if detA ̸= 0. In this case, F−1(y) = x0 + A−1 (y − y0) , and DF−1(y) = A−1. In particular, F−1 is also infinitely differentiable. Proof Obviously, F is a polynomial mapping. Hence, F is infinitely differentiable. By a straightforward computation, we find that DF = A. Notice that F = F2 ◦ T ◦ F1, where F1 : Rn → Rn is the translation F1(x) = x − x0, T : Rn → Rn is the linear transformation T(x) = Ax, and F2 : Rn → Rn is the translation F2(y) = y+y0. 
Since translations are bijective mappings, F is one-to-one and onto if and only if T : Rn → Rn is one-to-one and onto, if and only if detA ̸= 0. If y = y0 + A (x− x0) , then x = x0 + A−1 (y − y0) . This gives the formula for F−1(y). The formula for DF−1(y) follows. Chapter 5. The Inverse and Implicit Function Theorems 296 Exercises 5.1 Question 1 Let f : R → R be the function defined as f(x) = e2x + 4x sinx+ 2 cosx. Show that there is an open interval I containing 0 such that f : I → R is one-to-one, and f−1 : f(I) → R is continuously differentiable. Determine (f−1)′(f(0)). Question 2 Let F : R2 → R2 be the mapping defined by F(x, y) = (3x+ 2y − 5, 7x+ 4y − 3). Show that F is a bijection, and find F−1(x, y) and DF−1(x, y). Question 3 Consider the mapping F : R2 → R2 given by F(x, y) = (x2 + y2, xy). Show that there is a neighbourhood U of the point u0 = (2, 1) such that F : U → R2 is one-to-one, V = F(U) is an open set, and G = F−1 : V → U is continuously differentiable. Then find ∂G2 ∂x (5, 2). Question 4 Let Φ : R3 → R3 be the mapping defined as Φ(ρ, ϕ, θ) = (ρ sinϕ cos θ, ρ sinϕ sin θ, ρ cosϕ). Determine the points (ρ, ϕ, θ) ∈ R3 where the inverse function theorem can be applied to this mapping. Explain the significance of this result. Chapter 5. The Inverse and Implicit Function Theorems 297 Question 5 Consider the system of equations 4x+ y − 5xy = 2, x2 + y2 − 3xy2 = 5. Observe that (x, y) = (−1, 1) is a solution of this system. Show that there is a neighbourhood U of u0 = (−1, 1) and an r > 0 such that for all (a, b) satisfying (a− 2)2 + (b− 5)2 < r2, the system 4x+ y − 5xy = a, x2 + y2 − 3xy2 = b has a unique solution (x, y) that lies in U . Chapter 5. The Inverse and Implicit Function Theorems 298 5.2 The Proof of the Inverse Function Theorem In this section, we prove the inverse function theorem stated in Theorem 5.2. 
The hardest part of the proof is the first statement, which asserts that there is a neighbourhood U of x0 such that, restricted to U , F is one-to-one, and the image of U under F is open in Rn. In the statement of the inverse function theorem, we assume that the derivative matrix of the continuously differentiable mapping F : O → Rn is invertible at the point x0. The continuity of the partial derivatives of F then implies that there is a neighbourhood N of x0 such that the derivative matrix of F at any x in N is also invertible. Theorem 3.38 asserts that a linear transformation T : Rn → Rn is invertible if and only if there is a positive constant c such that
∥T(u) − T(v)∥ ≥ c∥u − v∥ for all u, v ∈ Rn.

Definition 5.1 Stable Mappings
A mapping F : D → Rn is stable if there is a positive constant c such that
∥F(u) − F(v)∥ ≥ c∥u − v∥ for all u, v ∈ D.
In other words, a linear transformation T : Rn → Rn is invertible if and only if it is stable.

Remark 5.2 Stable Mappings vs Lipschitz Mappings
Let D be a subset of Rn. Observe that if F : D → Rn is a stable mapping, there is a constant c > 0 such that
∥F(u1) − F(u2)∥ ≥ c∥u1 − u2∥ for all u1, u2 ∈ D.
This implies that F is one-to-one, and thus the inverse F−1 : F(D) → Rn exists. Notice that for any v1 and v2 in F(D),
∥F−1(v1) − F−1(v2)∥ ≤ (1/c)∥v1 − v2∥.
This means that F−1 : F(D) → Rn is a Lipschitz mapping.

For a mapping F : D → Rn that satisfies the assumptions in the statement of the inverse function theorem, it is stable in a neighbourhood of x0.

Theorem 5.4
Let O be an open subset of Rn that contains the point x0, and let F : O → Rn be a continuously differentiable function defined on O. If detDF(x0) ̸= 0, then there exists a neighbourhood U of x0 such that DF(x) is invertible for all x ∈ U , F maps U bijectively onto the open set V = F(U), and the map F : U → V is stable.
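The stability conclusion of Theorem 5.4 can be observed numerically. The sketch below is our own illustration, not from the text: it takes the mapping F(x, y) = (x2 − y2, xy) of Example 5.5, for which detDF(1, 1) ̸= 0, and checks that the ratios ∥F(u) − F(v)∥/∥u − v∥ stay bounded away from 0 on a small grid of points around (1, 1).

```python
import itertools
import math

def F(x, y):
    # the mapping of Example 5.5; DF(1, 1) is invertible
    return (x*x - y*y, x*y)

# a small grid of sample points around (1, 1)
pts = [(1 + 0.05*i, 1 + 0.05*j) for i in range(-2, 3) for j in range(-2, 3)]

ratios = []
for u, v in itertools.combinations(pts, 2):
    Fu, Fv = F(*u), F(*v)
    dist_uv = math.hypot(u[0] - v[0], u[1] - v[1])
    dist_F = math.hypot(Fu[0] - Fv[0], Fu[1] - Fv[1])
    ratios.append(dist_F / dist_uv)

c = min(ratios)
print(c > 0.5)   # True: an empirical stability constant on this grid
```

The positive lower bound observed here is consistent with the theorem; of course, a finite grid only illustrates, and does not prove, stability on the neighbourhood.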
Recall that when A is a subset of Rn and u is a point in Rn,
A + u = {a + u | a ∈ A}
is the translate of the set A by the vector u. The set A is open if and only if A + u is open, and A is closed if and only if A + u is closed.

Lemma 5.5
It is sufficient to prove Theorem 5.4 when x0 = 0, F(x0) = 0 and DF(x0) = In.

Proof of Lemma 5.5
Assume that Theorem 5.4 holds when x0 = 0, F(x0) = 0 and DF(x0) = In. Now given that F : O → Rn is a continuously differentiable mapping with detDF(x0) ̸= 0, let y0 = F(x0) and A = DF(x0). Then A is invertible. Define the open set D as D = O − x0. It is a neighbourhood of the point 0. Let G : D → Rn be the mapping
G(x) = A−1 (F(x + x0) − y0).
Then G(0) = 0. Using the same reasoning as the proof of Theorem 5.3, we find that G is continuously differentiable and
DG(x) = A−1DF(x + x0).
This gives
DG(0) = A−1DF(x0) = In.
By assumption, Theorem 5.4 holds for the mapping G. Namely, there exist neighbourhoods Ũ and Ṽ of 0 such that G : Ũ → Ṽ is a bijection and DG(x) is invertible for all x ∈ Ũ. Moreover, there is a positive constant a such that
∥G(u1) − G(u2)∥ ≥ a∥u1 − u2∥ for all u1, u2 ∈ Ũ.
Let U be the neighbourhood of x0 given by U = Ũ + x0. By Theorem 5.3, the mapping H : Rn → Rn, H(y) = A−1(y − y0) is a continuous bijection. Therefore, V = H−1(Ṽ) is an open subset of Rn that contains y0. By definition, F maps U bijectively to V . Since
F(x) = y0 + AG(x − x0),
we find that
DF(x) = A (DG(x − x0)).
Since A is invertible, DF(x) is invertible for all x ∈ U . Theorem 3.38 says that there is a positive constant α such that ∥Ax∥ ≥ α∥x∥ for all x ∈ Rn. Therefore, for any u1 and u2 in U ,
∥F(u1) − F(u2)∥ = ∥A (G(u1 − x0) − G(u2 − x0))∥ ≥ α∥G(u1 − x0) − G(u2 − x0)∥ ≥ aα∥u1 − u2∥.
This shows that F : U → V is stable, and thus completes the proof of the lemma.

Now we prove Theorem 5.4.
Proof of Theorem 5.4
By Lemma 5.5, we only need to consider the case where x0 = 0, F(x0) = 0 and DF(x0) = In. Since F : O → Rn is continuously differentiable, the map DF : O → Mn is continuous. Since det : Mn → R is also continuous, and detDF(0) = 1, there is an r0 > 0 such that B(0, r0) ⊂ O and for all x ∈ B(0, r0),
detDF(x) > 1/2.
In particular, DF(x) is invertible for all x ∈ B(0, r0). Let G : O → Rn be the mapping defined as
G(x) = F(x) − x,
so that F(x) = x + G(x). The mapping G is continuously differentiable. It satisfies G(0) = 0 and
DG(0) = DF(0) − In = 0.
Since G is continuously differentiable, for any 1 ≤ i ≤ n, 1 ≤ j ≤ n, there exists ri,j > 0 such that B(0, ri,j) ⊂ O and for all x ∈ B(0, ri,j),
|∂Gi/∂xj (x)| = |∂Gi/∂xj (x) − ∂Gi/∂xj (0)| < 1/(2n).
Let
r = min ({ri,j | 1 ≤ i ≤ n, 1 ≤ j ≤ n} ∪ {r0}).
Then r > 0, B(0, r) ⊂ B(0, r0) and B(0, r) ⊂ B(0, ri,j) for all 1 ≤ i ≤ n, 1 ≤ j ≤ n. The ball B(0, r) is a convex set. If u and v are two points in B(0, r), mean value theorem implies that for 1 ≤ i ≤ n, there exists zi ∈ B(0, r) such that
Gi(u) − Gi(v) = ∑_{j=1}^{n} (uj − vj) ∂Gi/∂xj (zi).
It follows that
|Gi(u) − Gi(v)| ≤ ∑_{j=1}^{n} |uj − vj| |∂Gi/∂xj (zi)| ≤ (1/(2n)) ∑_{j=1}^{n} |uj − vj| ≤ ∥u − v∥/(2√n),
where the last step uses ∑_{j=1}^{n} |uj − vj| ≤ √n ∥u − v∥, a consequence of the Cauchy–Schwarz inequality. Therefore,
∥G(u) − G(v)∥ = ( ∑_{i=1}^{n} (Gi(u) − Gi(v))² )^{1/2} ≤ (1/2)∥u − v∥.
This shows that G : B(0, r) → Rn is a map satisfying G(0) = 0, and
∥G(u) − G(v)∥ ≤ (1/2)∥u − v∥ for all u, v ∈ B(0, r).
By Theorem 2.44, the map F : B(0, r) → Rn is one-to-one, and its image contains the open ball B(0, r/2). Let V = B(0, r/2). Then V is an open subset of Rn that is contained in the image of F. Since F : B(0, r) → Rn is continuous, U = (F|B(0,r))−1(V ) is an open set. By definition, F : U → V is a bijection. Since U is contained in B(0, r0), DF(x) is invertible for all x in U . Finally, since F(x) = x + G(x), the triangle inequality gives, for any u and v in U ,
∥F(u) − F(v)∥ ≥ ∥u − v∥ − ∥G(u) − G(v)∥ ≥ (1/2)∥u − v∥.
This completes the proof of the theorem.

To complete the proof of the inverse function theorem, it remains to prove that F−1 : V → U is continuously differentiable, and DF−1(y) = DF(F−1(y))−1.

Theorem 5.6
Let O be an open subset of Rn that contains the point x0, and let F : O → Rn be a continuously differentiable function defined on O. If detDF(x0) ̸= 0, then there exists a neighbourhood U of x0 such that F maps U bijectively onto the open set V = F(U), the inverse function F−1 : V → U is continuously differentiable, and for any y ∈ V , if x is the point in U such that F(x) = y, then DF−1(y) = DF(x)−1.

Proof
Theorem 5.4 asserts that there exists a neighbourhood U of x0 such that F maps U bijectively onto the open set V = F(U), DF(x) is invertible for all x in U , and there is a positive constant c such that
∥F(u1) − F(u2)∥ ≥ c∥u1 − u2∥ for all u1, u2 ∈ U. (5.1)
Now given y in V , we want to show that F−1 is differentiable at y and DF−1(y) = DF(x)−1, where x = F−1(y). Since V is open, there is an r > 0 such that B(y, r) ⊂ V . For k ∈ Rn such that ∥k∥ < r, let
h(k) = F−1(y + k) − F−1(y).
Then F(x) = y and F(x + h) = y + k. Eq. (5.1) implies that
∥h∥ ≤ (1/c)∥k∥. (5.2)
Let A = DF(x). By assumption, A is invertible. Notice that
F−1(y + k) − F−1(y) − A−1k = −A−1 (k − Ah) = −A−1 (F(x + h) − F(x) − Ah).
There is a positive constant β such that ∥A−1w∥ ≤ β∥w∥ for all w ∈ Rn. Therefore,
∥F−1(y + k) − F−1(y) − A−1k∥ / ∥k∥ ≤ (β/∥k∥) ∥F(x + h) − F(x) − Ah∥ ≤ (β/c) ∥F(x + h) − F(x) − Ah∥ / ∥h∥, (5.3)
where the second inequality uses (5.2). Since F is differentiable at x,
lim_{h→0} (F(x + h) − F(x) − Ah)/∥h∥ = 0.
Eq. (5.2) implies that lim_{k→0} h = 0. Eq. (5.3) then implies that
lim_{k→0} (F−1(y + k) − F−1(y) − A−1k)/∥k∥ = 0.
This proves that F−1 is differentiable at y and DF−1(y) = A−1 = DF(x)−1.
Now the map DF−1 : V → GL(n,R) is the composition of the maps F−1 : V → U , DF : U → GL(n,R) and I : GL(n,R) → GL(n,R) which takes A to A−1. Since each of these maps is continuous, the map DF−1 : V → GL(n,R) is continuous. This completes the proof that F−1 : V → U is continuously differentiable.

At the end of this section, let us give a brief discussion about the concepts of homeomorphism and diffeomorphism.

Definition 5.2 Homeomorphism
Let A be a subset of Rm and let B be a subset of Rn. We say that A and B are homeomorphic if there exists a continuous bijective function F : A → B whose inverse F−1 : B → A is also continuous. Such a function F is called a homeomorphism between A and B.

Definition 5.3 Diffeomorphism
Let O and U be open subsets of Rn. We say that U and O are diffeomorphic if there exists a homeomorphism F : O → U between O and U such that F and F−1 are differentiable.

Example 5.7
Let A = {(x, y) | x2 + y2 < 1} and B = {(x, y) | 4x2 + 9y2 < 36}. Define the map F : R2 → R2 by
F(x, y) = (3x, 2y).
Then F is an invertible linear transformation with
F−1(x, y) = (x/3, y/2).
The mappings F and F−1 are continuously differentiable. It is easy to show that F maps A bijectively onto B. Hence, F : A → B is a diffeomorphism between A and B.

Figure 5.3: A = {(x, y) | x2 + y2 < 1} and B = {(x, y) | 4x2 + 9y2 < 36} are diffeomorphic.

Theorem 5.3 gives the following.

Theorem 5.7
Let A be an invertible n × n matrix, and let x0 and y0 be two points in Rn. Define the mapping F : Rn → Rn by
F(x) = y0 + A (x − x0).
If O is an open subset of Rn, then F : O → F(O) is a diffeomorphism.

The inverse function theorem gives the following.

Theorem 5.8
Let O be an open subset of Rn, and let F : O → Rn be a continuously differentiable mapping such that DF(x) is invertible for all x ∈ O.
If U is an open subset contained in O such that F : U → Rn is one-to-one, then F : U → F(U) is a diffeomorphism. The proof of this theorem is left as an exercise. Chapter 5. The Inverse and Implicit Function Theorems 307 Exercises 5.2 Question 1 Let F : R2 → R2 be the mapping given by F(x, y) = (xey + xy, 2x2 + 3y2). Show that there is a neighbourhood U of (−1, 0) such that the mapping F : U → R2 is stable. Question 2 Let O be an open subset of Rn, and let F : O → Rn be a continuously differentiable mapping such that detDF(x) ̸= 0 for all x ∈ O. Show that F(O) is an open set. Question 3 Let O be an open subset of Rn, and let F : O → Rn be a continuously differentiable mapping such that DF(x) is invertible for all x ∈ O. If U is an open subset contained in O such that F : U → Rn is one-to-one, then F : U → F(U) is a diffeomorphism. Question 4 Let O be an open subset of Rn, and let F : O → Rn be a differentiable mapping. Assume that there is a positive constant c such that ∥F(u)− F(v)∥ ≥ c∥u− v∥ for all u,v ∈ O. Use first order approximation theorem to show that for any x ∈ O and any h ∈ Rn, ∥DF(x)h∥ ≥ c∥h∥. Chapter 5. The Inverse and Implicit Function Theorems 308 Question 5 Let O be an open subset of Rn, and let F : O → Rn be a continuously differentiable mapping. (a) If F : O → Rn is stable, show that the derivative matrix DF(x) is invertible at every x in O. (b) Assume that the derivative matrix DF(x) is invertible at every x in O. If C is a compact subset of O, show that the mapping F : C → Rn is stable. Chapter 5. The Inverse and Implicit Function Theorems 309 5.3 The Implicit Function Theorem The implicit function theorem is about the possibility of solving m variables from a system of m equations with n+m variables. Let us study some special cases. Consider the function f : R2 → R given by f(x, y) = x2 + y2 − 1. 
For a point (x0, y0) that satisfies f(x0, y0) = 0, we want to ask whether there is a neighbourhood I of x0, a neighbourhood J of y0, and a function g : I → R such that for (x, y) ∈ I × J , f(x, y) = 0 if and only if y = g(x). Figure 5.4: The points in the (x, y) plane satisfying x2 + y2 − 1 = 0. If (x0, y0) is a point with y0 > 0 and f(x0, y0) = 0, then we can take the neighbourhoods I = (−1, 1) and J = (0,∞) of x0 and y0 respectively, and define the function g : I → R by g(x) = √ 1− x2. We then find that for (x, y) ∈ I × J , f(x, y) = 0 if and only if y = √ 1− x2 = g(x). If (x0, y0) is a point with y0 < 0 and f(x0, y0) = 0, then we can take the neighbourhoods I = (−1, 1) and J = (−∞, 0) of x0 and y0 respectively, and define the function g : I → R by g(x) = − √ 1− x2. We then find that for (x, y) ∈ I × J , f(x, y) = 0 if and only if y = − √ 1− x2 = g(x). However, if (x0, y0) = (1, 0), any neighbourhood J of y0 must contain an interval of the form (−r, r). If I is a neighbourhood of 1, (x, y) is a point in Chapter 5. The Inverse and Implicit Function Theorems 310 I × (−r, r) such that f(x, y) = 0, then (x,−y) is another point in I × (−r, r) satisfying f(x,−y) = 0. This shows that there does not exist any function g : I → R such that when (x, y) ∈ I × J , f(x, y) = 0 if and only if y = g(x). We say that we cannot solve y as a function of x in a neighbourhood of the point (1, 0). Similarly, we cannot solve y as a function of x in a neighbourhood of the point (−1, 0). However, in a neighbourhood of the points (1, 0) and (−1, 0), we can solve x as a function of y. For a function f : O → R defined on an open subset O of R2, the implicit function theorem takes the following form. Theorem 5.9 Dini’s Theorem Let O be an open subset of R2 that contains the point (x0, y0), and let f : O → R be a continuously differentiable function defined on O such that f(x0, y0) = 0. 
If ∂f/∂y (x0, y0) ̸= 0, then there is a neighbourhood I of x0, a neighbourhood J of y0, and a continuously differentiable function g : I → J such that for any (x, y) ∈ I × J ,
f(x, y) = 0 if and only if y = g(x).
Moreover, for any x ∈ I ,
∂f/∂x (x, g(x)) + ∂f/∂y (x, g(x)) g′(x) = 0.

Dini's theorem says that to be able to solve y as a function of x, a sufficient condition is that the function f has continuous partial derivatives, and fy does not vanish. By interchanging the roles of x and y, we see that if fx does not vanish, we can solve x as a function of y. For the function f : R2 → R, f(x, y) = x2 + y2 − 1, the points on the set x2 + y2 = 1 at which fy(x, y) = 2y vanishes are the points (1, 0) and (−1, 0). In fact, we have seen that we cannot solve y as a function of x in neighbourhoods of these two points.

Proof of Dini's Theorem
Without loss of generality, assume that fy(x0, y0) > 0. Let u0 = (x0, y0). Since fy : O → R is continuous, there is an r1 > 0 such that the closed rectangle R = [x0 − r1, x0 + r1] × [y0 − r1, y0 + r1] lies in O, and for all (x, y) ∈ R,
fy(x, y) > fy(x0, y0)/2 > 0.
For any x ∈ [x0 − r1, x0 + r1], define the function hx : [y0 − r1, y0 + r1] → R by hx(y) = f(x, y). Its derivative h′x(y) = fy(x, y) is positive. Hence, hx(y) = f(x, y) is strictly increasing in y. This implies that
f(x, y0 − r1) < f(x, y0) < f(x, y0 + r1).
When x = x0, we find that
f(x0, y0 − r1) < 0 < f(x0, y0 + r1).
Since f is continuously differentiable, it is continuous. Hence, there is an r2 > 0 such that r2 ≤ r1, and for all x ∈ [x0 − r2, x0 + r2],
f(x, y0 − r1) < 0 and f(x, y0 + r1) > 0.
Let I = (x0 − r2, x0 + r2). For x ∈ I , since hx : [y0 − r1, y0 + r1] → R is continuous, and hx(y0 − r1) < 0 < hx(y0 + r1), intermediate value theorem implies that there is a y ∈ (y0 − r1, y0 + r1) such that hx(y) = 0. Since hx is strictly increasing, this y is unique, and we denote it by g(x). This defines the function g : I → R.
Let J = (y0 − r1, y0 + r1). By our argument, for each x ∈ I , y = g(x) is the unique y ∈ J such that f(x, y) = 0. Thus, for any (x, y) ∈ I × J ,
f(x, y) = 0 if and only if y = g(x).
It remains to prove that g : I → R is continuously differentiable. By the construction above, there is a positive constant c such that
∂f/∂y (x, y) ≥ c for all (x, y) ∈ I × J.
Fix x ∈ I . There exists an r > 0 such that (x − r, x + r) ⊂ I . For h satisfying 0 < |h| < r, x + h is in I . By mean value theorem, there is a ch ∈ (0, 1) such that
f(x + h, g(x + h)) − f(x, g(x)) = h ∂f/∂x (uh) + (g(x + h) − g(x)) ∂f/∂y (uh),
where
uh = (x, g(x)) + ch (h, g(x + h) − g(x)). (5.4)
Since f(x + h, g(x + h)) = 0 = f(x, g(x)), we find that
(g(x + h) − g(x))/h = −fx(uh)/fy(uh). (5.5)
Since fx is continuous on the compact set R, it is bounded. Namely, there exists a constant M such that |fx(x, y)| ≤ M for all (x, y) ∈ R. Eq. (5.5) then implies that
|g(x + h) − g(x)| ≤ (M/c)|h|.
Taking h → 0 proves that g is continuous at x. From (5.4), we find that
lim_{h→0} uh = (x, g(x)).
Since fx and fy are continuous at (x, g(x)), eq. (5.5) gives
lim_{h→0} (g(x + h) − g(x))/h = − lim_{h→0} fx(uh)/fy(uh) = −fx(x, g(x))/fy(x, g(x)).
This proves that g is differentiable at x and
∂f/∂x (x, g(x)) + ∂f/∂y (x, g(x)) g′(x) = 0.

Figure 5.5: Proof of Dini's Theorem.

Example 5.8
Consider the equation
xy3 + sin(x + y) + 4x2y = 3.
Show that in a neighbourhood of (−1, 1), this equation defines y as a function of x. If this function is denoted as y = g(x), find g′(−1).

Solution
Let f : R2 → R be the function defined as
f(x, y) = xy3 + sin(x + y) + 4x2y − 3.
Since the sine function and polynomial functions are infinitely differentiable, f is infinitely differentiable.
∂f/∂y (x, y) = 3xy2 + cos(x + y) + 4x2, ∂f/∂y (−1, 1) = 2 ̸= 0.
By Dini's theorem, there is a neighbourhood of (−1, 1) such that y can be solved as a function of x.
Now,
∂f/∂x (x, y) = y3 + cos(x + y) + 8xy, ∂f/∂x (−1, 1) = −6.
Hence,
g′(−1) = −(−6)/2 = 3.

Now we turn to the general case. First we consider polynomial mappings of degree at most one. Let A = [aij] be an m × n matrix, and let B = [bij] be an m × m matrix. Given x ∈ Rn, y ∈ Rm, c ∈ Rm, the system of equations
Ax + By = c
is the following m equations in m + n variables x1, . . . , xn, y1, . . . , ym.
a11x1 + a12x2 + · · · + a1nxn + b11y1 + b12y2 + · · · + b1mym = c1,
a21x1 + a22x2 + · · · + a2nxn + b21y1 + b22y2 + · · · + b2mym = c2,
...
am1x1 + am2x2 + · · · + amnxn + bm1y1 + bm2y2 + · · · + bmmym = cm.
Let us look at an example.

Example 5.9
Consider the linear system
2x1 + 3x2 − 5x3 + 2y1 − y2 = 1,
3x1 − x2 + 2x3 − 3y1 + y2 = 0.
Show that y = (y1, y2) can be solved as a function of x = (x1, x2, x3). Write down the function G : R3 → R2 such that the solution is given by y = G(x), and find DG(x).

Solution
Let
A = [ 2  3 −5
      3 −1  2 ], B = [  2 −1
                       −3  1 ].
Then the system can be written as Ax + By = c, where c = (1, 0). This implies that
By = c − Ax. (5.6)
For every x ∈ R3, c − Ax is a vector in R2. Since detB = −1 ̸= 0, B is invertible. Therefore, there is a unique y satisfying (5.6). It is given by
G(x) = y = B−1 (c − Ax)
= − [ 1 1
      3 2 ] [ 1
              0 ] + [ 1 1
                      3 2 ] [ 2  3 −5
                              3 −1  2 ] x
= − [ 1
      3 ] + [  5 2  −3
              12 7 −11 ] x
= [  5x1 + 2x2 −  3x3 − 1
    12x1 + 7x2 − 11x3 − 3 ].
It follows that
DG = [  5 2  −3
       12 7 −11 ].

The following theorem gives a general scenario.

Theorem 5.10
Let A = [aij] be an m × n matrix, and let B = [bij] be an m × m matrix. Define the function F : Rm+n → Rm by
F(x, y) = Ax + By − c,
where c is a constant vector in Rm. The equation F(x, y) = 0 defines the variable y = (y1, . . . , ym) as a function of x = (x1, . . . , xn) if and only if the matrix B is invertible. If we denote this function as G : Rn → Rm, then
G(x) = B−1 (c − Ax), and DG(x) = −B−1A.
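Since Example 5.9 is linear, the formula y = G(x) can be verified by direct substitution. The short sketch below is a check we add here, not part of the text (the function names G and residual are ours): substituting G(x) back into the system leaves zero residuals for every x.

```python
def G(x1, x2, x3):
    # the solution found in Example 5.9: y = B^{-1}(c - Ax)
    return (5*x1 + 2*x2 - 3*x3 - 1, 12*x1 + 7*x2 - 11*x3 - 3)

def residual(x1, x2, x3):
    # left-hand sides of the system minus the right-hand sides
    y1, y2 = G(x1, x2, x3)
    eq1 = 2*x1 + 3*x2 - 5*x3 + 2*y1 - y2 - 1
    eq2 = 3*x1 - x2 + 2*x3 - 3*y1 + y2
    return (eq1, eq2)

print(residual(3, -2, 5))   # (0, 0)
```

Integer inputs are used so the cancellation is exact; the residual vanishes identically in x1, x2, x3.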
Proof
The equation F(x, y) = 0 defines the variables y as a function of x if and only if, for each x ∈ Rn, there is a unique y ∈ Rm satisfying
By = c − Ax.
This is a linear system for the variable y. By the theory of linear algebra, a unique solution y exists if and only if B is invertible. In this case, the solution is given by
y = B−1 (c − Ax).
The rest of the assertion follows.

Write a point in Rm+n as (x, y), where x ∈ Rn and y ∈ Rm. If F : Rm+n → Rm is a function that is differentiable at the point (x, y), the m × (m + n) derivative matrix DF(x, y) can be written as
DF(x, y) = [ DxF(x, y)  DyF(x, y) ],
where
DxF(x, y) = [ ∂F1/∂x1  · · ·  ∂F1/∂xn
              ...
              ∂Fm/∂x1  · · ·  ∂Fm/∂xn ],
DyF(x, y) = [ ∂F1/∂y1  · · ·  ∂F1/∂ym
              ...
              ∂Fm/∂y1  · · ·  ∂Fm/∂ym ],
with all partial derivatives evaluated at (x, y). Notice that DyF(x, y) is a square matrix. When A = [aij] is an m × n matrix, B = [bij] is an m × m matrix, c is a vector in Rm, and F : Rm+n → Rm is the function defined as F(x, y) = Ax + By − c, it is easy to compute that
DxF(x, y) = A, DyF(x, y) = B.
Theorem 5.10 says that we can solve y as a function of x from the system of m equations F(x, y) = 0 if and only if B = DyF(x, y) is invertible. In this case, if G : Rn → Rm is the function so that y = G(x) is the solution, then
DG(x) = −B−1A = −DyF(x, y)−1 DxF(x, y).
In fact, this last formula follows from F(x, G(x)) = 0 and the chain rule. The special case of degree one polynomial mappings gives us sufficient insight into the general implicit function theorem. However, for nonlinear mappings, the conclusions can only be made locally.

Theorem 5.11 Implicit Function Theorem
Let O be an open subset of Rm+n, and let F : O → Rm be a continuously differentiable function defined on O.
Assume that x0 is a point in Rn and y0 is a point in Rm such that the point (x0, y0) is in O and F(x0, y0) = 0. If detDyF(x0, y0) ̸= 0, then we have the following.
(i) There is a neighbourhood U of x0, a neighbourhood V of y0, and a continuously differentiable function G : U → Rm such that for any (x, y) ∈ U × V , F(x, y) = 0 if and only if y = G(x).
(ii) For any x ∈ U ,
DxF(x, G(x)) + DyF(x, G(x)) DG(x) = 0.

Here we will give a proof of the implicit function theorem using the inverse function theorem. The idea of the proof is to construct a mapping to which one can apply the inverse function theorem. Let us look at an example first.

Example 5.10
Let F : R5 → R2 be the function defined as
F(x1, x2, x3, y1, y2) = (x1y2^2, x2x3y1^2 + x1y2).
Define the mapping H : R5 → R5 as
H(x, y) = (x, F(x, y)) = (x1, x2, x3, x1y2^2, x2x3y1^2 + x1y2).
Then we find that
DH(x, y) = [ 1     0       0       0        0
             0     1       0       0        0
             0     0       1       0        0
             y2^2  0       0       0        2x1y2
             y2    x3y1^2  x2y1^2  2x2x3y1  x1 ].
Notice that
DH(x, y) = [ I3          0
             DxF(x, y)   DyF(x, y) ].

Proof of the Implicit Function Theorem
Let H : O → Rm+n be the mapping defined as
H(x, y) = (x, F(x, y)).
Notice that F(x, y) = 0 if and only if H(x, y) = (x, 0). Since the first n components of H are infinitely differentiable functions, the mapping H : O → Rm+n is continuously differentiable. Now,
DH(x, y) = [ In          0
             DxF(x, y)   DyF(x, y) ].
Therefore,
detDH(x0, y0) = detDyF(x0, y0) ̸= 0.
By the inverse function theorem, there is a neighbourhood W of (x0, y0) and a neighbourhood Z of H(x0, y0) = (x0, 0) such that H : W → Z is a bijection and H−1 : Z → W is continuously differentiable. For u ∈ Rn, v ∈ Rm so that (u, v) ∈ Z, let
H−1(u, v) = (Φ(u, v), Ψ(u, v)),
where Φ is a map from Z to Rn and Ψ is a map from Z to Rm. Since H−1 is continuously differentiable, Φ and Ψ are continuously differentiable. Given r > 0, let Dr be the open cube Dr = ∏_{i=1}^{m+n} (−r, r).
Since W and Z are open sets that contain (x0, y0) and (x0, 0) respectively, there exists r > 0 such that
(x0, y0) + Dr ⊂ W, (x0, 0) + Dr ⊂ Z.
If
Ar = ∏_{i=1}^{n} (−r, r), Br = ∏_{i=1}^{m} (−r, r), U = x0 + Ar, V = y0 + Br,
then
(x0, y0) + Dr = U × V, (x0, 0) + Dr = U × Br.
Hence, U × V ⊂ W and U × Br ⊂ Z. Define G : U → Rm by
G(x) = Ψ(x, 0).
Since Ψ is continuously differentiable, G is continuously differentiable. If x ∈ U , y ∈ V , then (x, y) ∈ W . For such (x, y), F(x, y) = 0 implies H(x, y) = (x, 0). Since H : W → Z is a bijection, (x, 0) ∈ Z and H−1(x, 0) = (x, y). Comparing the last m components gives
y = Ψ(x, 0) = G(x).
Conversely, since H(H−1(u, v)) = (u, v) for all (u, v) ∈ Z, we find that
(Φ(u, v), F(Φ(u, v), Ψ(u, v))) = (u, v) for all (u, v) ∈ Z.
For all u ∈ U , (u, 0) is in Z. Therefore,
Φ(u, 0) = u, F(Φ(u, 0), Ψ(u, 0)) = 0.
This implies that if u ∈ U , then F(u, G(u)) = 0. In other words, if (x, y) is in U × V and y = G(x), we must have F(x, y) = 0. Since we have shown that G : U → Rm is continuously differentiable, the formula
DxF(x, G(x)) + DyF(x, G(x)) DG(x) = 0
follows from F(x, G(x)) = 0 and the chain rule.

Example 5.11
Consider the system of equations
2x2y + 3xy2u + xyv + uv = 7,
4xu − 5yv + u2y + v2x = 1. (5.7)
Notice that when (x, y) = (1, 1), (u, v) = (1, 1) is a solution of this system. Show that there are neighbourhoods U and V of (1, 1), and a continuously differentiable function G : U → R2 such that if (x, y, u, v) ∈ U × V , then (x, y, u, v) is a solution of the system of equations above if and only if u = G1(x, y) and v = G2(x, y). Also, find the values of ∂G1/∂x (1, 1), ∂G1/∂y (1, 1), ∂G2/∂x (1, 1) and ∂G2/∂y (1, 1).

Solution
Define the function F : R4 → R2 by
F(x, y, u, v) = (2x2y + 3xy2u + xyv + uv − 7, 4xu − 5yv + u2y + v2x − 1).
This is a polynomial mapping. Hence, it is continuously differentiable. It is easy to check that F(1, 1, 1, 1) = 0.
Now,
\[ D_{(u,v)}\mathbf{F}(x, y, u, v) = \begin{bmatrix} 3xy^2 + v & xy + u \\ 4x + 2uy & -5y + 2vx \end{bmatrix}. \]
Thus,
\[ \det D_{(u,v)}\mathbf{F}(1, 1, 1, 1) = \det \begin{bmatrix} 4 & 2 \\ 6 & -3 \end{bmatrix} = -24 \neq 0. \]
By the implicit function theorem, there are neighbourhoods $U$ and $V$ of $(1, 1)$, and a continuously differentiable function $\mathbf{G} : U \to \mathbb{R}^2$ such that, if $(x, y, u, v) \in U \times V$, then $(x, y, u, v)$ is a solution of the system of equations (5.7) if and only if $u = G_1(x, y)$ and $v = G_2(x, y)$.
Finally,
\[ D_{(x,y)}\mathbf{F}(x, y, u, v) = \begin{bmatrix} 4xy + 3y^2 u + yv & 2x^2 + 6xyu + xv \\ 4u + v^2 & -5v + u^2 \end{bmatrix}, \qquad D_{(x,y)}\mathbf{F}(1, 1, 1, 1) = \begin{bmatrix} 8 & 9 \\ 5 & -4 \end{bmatrix}. \]
The chain rule gives
\[ D\mathbf{G}(1, 1) = -D_{(u,v)}\mathbf{F}(1, 1, 1, 1)^{-1} D_{(x,y)}\mathbf{F}(1, 1, 1, 1) = \frac{1}{24} \begin{bmatrix} -3 & -2 \\ -6 & 4 \end{bmatrix} \begin{bmatrix} 8 & 9 \\ 5 & -4 \end{bmatrix} = \frac{1}{24} \begin{bmatrix} -34 & -19 \\ -28 & -70 \end{bmatrix}. \]
Therefore,
\[ \frac{\partial G_1}{\partial x}(1, 1) = -\frac{17}{12}, \quad \frac{\partial G_1}{\partial y}(1, 1) = -\frac{19}{24}, \quad \frac{\partial G_2}{\partial x}(1, 1) = -\frac{7}{6}, \quad \frac{\partial G_2}{\partial y}(1, 1) = -\frac{35}{12}. \]

Remark 5.3 The Rank of a Matrix
In the formulation of the implicit function theorem, the assumption that $\det D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0$ can be replaced by the assumption that there are $m$ variables $u_1, \ldots, u_m$ among the $n + m$ variables $x_1, \ldots, x_n, y_1, \ldots, y_m$ such that $\det D_{(u_1, \ldots, u_m)}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0$.
Recall that the rank $r$ of an $m \times k$ matrix $A$ is the dimension of its row space, which equals the dimension of its column space. Thus, the rank $r$ of an $m \times k$ matrix $A$ is the maximum number of column vectors of $A$ which are linearly independent, or the maximum number of row vectors of $A$ that are linearly independent. Hence, the maximum possible value of $r$ is $\min\{m, k\}$. If $r = \min\{m, k\}$, we say that the matrix $A$ has maximal rank. For an $m \times k$ matrix where $m \leq k$, it has maximal rank if $r = m$. In this case, there is an $m \times m$ submatrix of $A$ consisting of $m$ linearly independent column vectors in $\mathbb{R}^m$, and the determinant of this submatrix is nonzero. Thus, the condition $\det D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0$ in the formulation of the implicit function theorem can be replaced by the condition that the $m \times (m + n)$ matrix $D\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0)$ has maximal rank.
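As a sanity check (not part of the text), the matrix arithmetic for $D\mathbf{G}(1,1)$ in Example 5.11 can be reproduced in exact rational arithmetic; all names below are ours.

```python
# Check of DG(1,1) = -A^{-1} B in Example 5.11, where
# A = D_{(u,v)}F(1,1,1,1) and B = D_{(x,y)}F(1,1,1,1).
from fractions import Fraction as Fr

A = [[Fr(4), Fr(2)], [Fr(6), Fr(-3)]]   # D_{(u,v)}F(1,1,1,1)
B = [[Fr(8), Fr(9)], [Fr(5), Fr(-4)]]   # D_{(x,y)}F(1,1,1,1)

detA = A[0][0]*A[1][1] - A[0][1]*A[1][0]
assert detA == -24                      # invertible, as claimed
# inverse of a 2x2 matrix [[a, b], [c, d]] is (1/det) [[d, -b], [-c, a]]
Ainv = [[ A[1][1]/detA, -A[0][1]/detA],
        [-A[1][0]/detA,  A[0][0]/detA]]
DG = [[-(Ainv[i][0]*B[0][j] + Ainv[i][1]*B[1][j]) for j in range(2)]
      for i in range(2)]
assert DG == [[Fr(-17, 12), Fr(-19, 24)],
              [Fr(-7, 6),   Fr(-35, 12)]]
print(DG)
```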
Example 5.12
Consider the system
\[ 2x^2 y + 3xy^2 u + xyv + uv = 7, \qquad 4xu - 5yv + u^2 y + v^2 x = 1 \tag{5.8} \]
defined in Example 5.11. Show that there are neighbourhoods $U$ and $V$ of $(1, 1)$, and a continuously differentiable function $\mathbf{H} : V \to \mathbb{R}^2$ such that if $(x, y, u, v) \in U \times V$, then $(x, y, u, v)$ is a solution of the system of equations if and only if $x = H_1(u, v)$ and $y = H_2(u, v)$. Find $D\mathbf{H}(1, 1)$.

Solution
Define the function $\mathbf{F} : \mathbb{R}^4 \to \mathbb{R}^2$ as in the solution of Example 5.11. Since
\[ \det D_{(x,y)}\mathbf{F}(1, 1, 1, 1) = \det \begin{bmatrix} 8 & 9 \\ 5 & -4 \end{bmatrix} = -77 \neq 0, \]
the implicit function theorem implies that there are neighbourhoods $U$ and $V$ of $(1, 1)$, and a continuously differentiable function $\mathbf{H} : V \to \mathbb{R}^2$ such that if $(x, y, u, v) \in U \times V$, then $(x, y, u, v)$ is a solution of the system of equations (5.8) if and only if $x = H_1(u, v)$ and $y = H_2(u, v)$. Moreover,
\[ D\mathbf{H}(1, 1) = -D_{(x,y)}\mathbf{F}(1, 1, 1, 1)^{-1} D_{(u,v)}\mathbf{F}(1, 1, 1, 1) = \frac{1}{77} \begin{bmatrix} -4 & -9 \\ -5 & 8 \end{bmatrix} \begin{bmatrix} 4 & 2 \\ 6 & -3 \end{bmatrix} = \frac{1}{77} \begin{bmatrix} -70 & 19 \\ 28 & -34 \end{bmatrix}. \]

Remark 5.4
The function $\mathbf{G} : U \to \mathbb{R}^2$ in Example 5.11 and the function $\mathbf{H} : V \to \mathbb{R}^2$ in Example 5.12 are in fact inverses of each other. Notice that $D\mathbf{G}(1, 1)$ is invertible. By the inverse function theorem, there is a neighbourhood $U'$ of $(1, 1)$ such that $V' = \mathbf{G}(U')$ is open, and $\mathbf{G} : U' \to V'$ is a bijection with continuously differentiable inverse. By shrinking the sets $U$ and $V$, we can assume that $U = U'$ and $V = V'$. If $(x, y) \in U$ and $(u, v) \in V$, then $\mathbf{F}(x, y, u, v) = \mathbf{0}$ if and only if $(u, v) = \mathbf{G}(x, y)$, if and only if $(x, y) = \mathbf{H}(u, v)$. This implies that $\mathbf{G} : U \to V$ and $\mathbf{H} : V \to U$ are inverses of each other. In particular, $D\mathbf{H}(1, 1) = D\mathbf{G}(1, 1)^{-1}$.

At the end of this section, let us consider a geometric application of the implicit function theorem. First let us revisit the example where $f(x, y) = x^2 + y^2 - 1$. At each point $(x_0, y_0)$ such that $f(x_0, y_0) = 0$, we have $x_0^2 + y_0^2 = 1$. Hence, $\nabla f(x_0, y_0) = (2x_0, 2y_0) \neq \mathbf{0}$. Notice that the vector $\nabla f(x_0, y_0) = (2x_0, 2y_0)$
is normal to the circle $x^2 + y^2 = 1$ at the point $(x_0, y_0)$.

Figure 5.6: The tangent vector and normal vector at a point on the circle $x^2 + y^2 - 1 = 0$.

If $y_0 > 0$, let $U = (-1, 1) \times (0, \infty)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph of the function $g : (-1, 1) \to \mathbb{R}$, $g(x) = \sqrt{1 - x^2}$. If $y_0 < 0$, let $U = (-1, 1) \times (-\infty, 0)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph of the function $g : (-1, 1) \to \mathbb{R}$, $g(x) = -\sqrt{1 - x^2}$. If $y_0 = 0$, then $x_0 = 1$ or $-1$. In fact, we can consider more generally the cases where $x_0 > 0$ and $x_0 < 0$. If $x_0 > 0$, let $U = (0, \infty) \times (-1, 1)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph of the function $g : (-1, 1) \to \mathbb{R}$, $g(y) = \sqrt{1 - y^2}$. If $x_0 < 0$, let $U = (-\infty, 0) \times (-1, 1)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph of the function $g : (-1, 1) \to \mathbb{R}$, $g(y) = -\sqrt{1 - y^2}$.

Definition 5.4 Surfaces
Let $S$ be a subset of $\mathbb{R}^k$ for some positive integer $k$. We say that $S$ is an $n$-dimensional surface if for each $\mathbf{x}_0$ on $S$, there is an open subset $D$ of $\mathbb{R}^n$, an open neighbourhood $U$ of $\mathbf{x}_0$ in $\mathbb{R}^k$, and a one-to-one differentiable mapping $\mathbf{G} : D \to \mathbb{R}^k$ such that $\mathbf{G}(D) \subset S$, $\mathbf{G}(D) \cap U = S \cap U$, and $D\mathbf{G}(\mathbf{u})$ has rank $n$ at each $\mathbf{u} \in D$.

Example 5.13
We claim that the $n$-sphere
\[ S^n = \{(x_1, \ldots, x_n, x_{n+1}) \mid x_1^2 + \cdots + x_n^2 + x_{n+1}^2 = 1\} \]
is an $n$-dimensional surface. Let $(a_1, \ldots, a_n, a_{n+1})$ be a point on $S^n$. Then at least one of the components $a_1, \ldots, a_n, a_{n+1}$ is nonzero. Without loss of generality, assume that $a_{n+1} > 0$. Let
\[ D = \{(x_1, \ldots, x_n) \mid x_1^2 + \cdots + x_n^2 < 1\}, \qquad U = \{(x_1, \ldots, x_n, x_{n+1}) \mid x_{n+1} > 0\}, \]
and define the mapping $\mathbf{G} : D \to U$ by
\[ \mathbf{G}(x_1, \ldots, x_n) = \left(x_1, \ldots, x_n, \sqrt{1 - x_1^2 - \cdots - x_n^2}\right). \]
Then $\mathbf{G}$ is a differentiable mapping, $\mathbf{G}(D) \subset S^n$ and $\mathbf{G}(D) \cap U = S^n \cap U$. Now,
\[ D\mathbf{G}(x_1, \ldots, x_n) = \begin{bmatrix} I_n \\ \mathbf{v} \end{bmatrix}, \]
where $\mathbf{v} = \nabla G_{n+1}(x_1, \ldots, x_n)$. Since the first $n$ rows of $D\mathbf{G}(x_1, \ldots, x_n)$ form the $n \times n$ identity matrix, it has rank $n$. Thus, $S^n$ is an $n$-dimensional surface.

Generalizing Example 5.13, we find that a large class of surfaces is provided by graphs of differentiable functions.

Theorem 5.12
Let $D$ be an open subset of $\mathbb{R}^n$, and let $g : D \to \mathbb{R}$ be a differentiable mapping. Then the graph of $g$ given by
\[ G_g = \{(x_1, \ldots, x_n, x_{n+1}) \mid (x_1, \ldots, x_n) \in D,\; x_{n+1} = g(x_1, \ldots, x_n)\} \]
is an $n$-dimensional surface.

A hyperplane in $\mathbb{R}^{n+1}$ is the set of points in $\mathbb{R}^{n+1}$ which satisfy an equation of the form
\[ a_1 x_1 + \cdots + a_n x_n + a_{n+1} x_{n+1} = b, \]
where $\mathbf{a} = (a_1, \ldots, a_n, a_{n+1})$ is a nonzero vector in $\mathbb{R}^{n+1}$. By definition, if $\mathbf{u}$ and $\mathbf{v}$ are two points on the plane, then $\langle \mathbf{a}, \mathbf{u} - \mathbf{v} \rangle = 0$. This shows that $\mathbf{a}$ is a vector normal to the plane.

When $D$ is an open subset of $\mathbb{R}^n$ and $g : D \to \mathbb{R}$ is a differentiable mapping, the graph $G_g$ of $g$ is an $n$-dimensional surface. If $\mathbf{u} = (u_1, \ldots, u_n)$ is a point in $D$, so that $(\mathbf{u}, g(\mathbf{u}))$ is a point on $G_g$, we have seen that the equation of the tangent plane at the point $(\mathbf{u}, g(\mathbf{u}))$ is given by
\[ x_{n+1} = g(\mathbf{u}) + \sum_{i=1}^n \frac{\partial g}{\partial x_i}(\mathbf{u})(x_i - u_i). \]
The implicit function theorem gives the following.

Theorem 5.13
Let $O$ be an open subset of $\mathbb{R}^{n+1}$, and let $f : O \to \mathbb{R}$ be a continuously differentiable function. If $\mathbf{x}_0$ is a point in $O$ such that $f(\mathbf{x}_0) = 0$ and $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$, then there is a neighbourhood $U$ of $\mathbf{x}_0$ contained in $O$ such that, restricted to $U$, $f(\mathbf{x}) = 0$ is the graph of a continuously differentiable function $g : D \to \mathbb{R}$, and $\nabla f(\mathbf{x})$ is a vector normal to the tangent plane of the graph at the point $\mathbf{x}$.

Proof
Assume that $\mathbf{x}_0 = (a_1, \ldots, a_n, a_{n+1})$. Since $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$, there is a $1 \leq k \leq n+1$ such that $\dfrac{\partial f}{\partial x_k}(\mathbf{x}_0) \neq 0$. Without loss of generality, assume that $k = n+1$. Given a point $\mathbf{x} = (x_1, \ldots, x_n, x_{n+1})$ in $\mathbb{R}^{n+1}$, let $\mathbf{u} = (x_1, \ldots, x_n)$ so that $\mathbf{x} = (\mathbf{u}, x_{n+1})$. By the implicit function theorem, there is a neighbourhood $D$ of $\mathbf{u}_0 = (a_1, \ldots, a_n)$, an $r > 0$, and a continuously differentiable function $g : D \to \mathbb{R}$ such that if $U = D \times (a_{n+1} - r, a_{n+1} + r)$ and $(\mathbf{u}, u_{n+1}) \in U$, then $f(\mathbf{u}, u_{n+1}) = 0$ if and only if $u_{n+1} = g(\mathbf{u})$. In other words, in the neighbourhood $U$ of $\mathbf{x}_0 = (\mathbf{u}_0, a_{n+1})$, $f(\mathbf{u}, u_{n+1}) = 0$ if and only if $(\mathbf{u}, u_{n+1})$ is a point on the graph of the function $g$. The equation of the tangent plane at the point $(\mathbf{u}, u_{n+1})$ is
\[ x_{n+1} - u_{n+1} = \sum_{i=1}^n \frac{\partial g}{\partial x_i}(\mathbf{u})(x_i - u_i). \]
By the chain rule,
\[ \frac{\partial g}{\partial x_i}(\mathbf{u}) = -\frac{\dfrac{\partial f}{\partial x_i}(\mathbf{u}, u_{n+1})}{\dfrac{\partial f}{\partial x_{n+1}}(\mathbf{u}, u_{n+1})}. \]
Hence, the equation of the tangent plane can be rewritten as
\[ \sum_{i=1}^{n+1} (x_i - u_i) \frac{\partial f}{\partial x_i}(\mathbf{u}, u_{n+1}) = 0. \]
This shows that $\nabla f(\mathbf{u}, u_{n+1})$ is a vector normal to the tangent plane.

Example 5.14
Find the equation of the tangent plane to the surface $x^2 + 4y^2 + 9z^2 = 49$ at the point $(6, 1, -1)$.

Solution
Let $f(x, y, z) = x^2 + 4y^2 + 9z^2$. Then $\nabla f(x, y, z) = (2x, 8y, 18z)$. It follows that $\nabla f(6, 1, -1) = 2(6, 4, -9)$. Hence, the equation of the tangent plane to the surface at $(6, 1, -1)$ is
\[ 6x + 4y - 9z = 36 + 4 + 9 = 49. \]

Exercises 5.3

Question 1
Consider the equation $4yz^2 + 3xz^3 - 11xyz = 14$. Show that in a neighbourhood of $(-1, 1, 2)$, this equation defines $z$ as a function of $(x, y)$. If this function is denoted as $z = g(x, y)$, find $\nabla g(-1, 1)$.

Question 2
Consider the system of equations
\[ 2xu^2 + vyz + 3uv = 2, \qquad 5x + 7yzu - v^2 = 1. \]
(a) Show that when $(x, y, z) = (-1, 1, 1)$, $(u, v) = (1, 1)$ is a solution of this system.
(b) Show that there are neighbourhoods $U$ and $V$ of $(-1, 1, 1)$ and $(1, 1)$, and a continuously differentiable function $\mathbf{G} : U \to \mathbb{R}^2$ such that, if $(x, y, z, u, v) \in U \times V$, then $(x, y, z, u, v)$ is a solution of the system of equations above if and only if $u = G_1(x, y, z)$ and $v = G_2(x, y, z)$.
(c) Find the values of $\dfrac{\partial G_1}{\partial x}(-1, 1, 1)$, $\dfrac{\partial G_2}{\partial x}(-1, 1, 1)$ and $\dfrac{\partial G_2}{\partial z}(-1, 1, 1)$.

Question 3
Let $O$ be an open subset of $\mathbb{R}^{2n}$, and let $\mathbf{F} : O \to \mathbb{R}^n$ be a continuously differentiable function.
Assume that $\mathbf{x}_0$ and $\mathbf{y}_0$ are points in $\mathbb{R}^n$ such that $(\mathbf{x}_0, \mathbf{y}_0)$ is a point in $O$, $\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) = \mathbf{0}$, and $D_{\mathbf{x}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0)$ and $D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0)$ are invertible. Show that there exist neighbourhoods $U$ and $V$ of $\mathbf{x}_0$ and $\mathbf{y}_0$, and a continuously differentiable bijective function $\mathbf{G} : U \to V$ such that, if $(\mathbf{x}, \mathbf{y})$ is in $U \times V$, then $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ if and only if $\mathbf{y} = \mathbf{G}(\mathbf{x})$.

5.4 Extrema Problems and the Method of Lagrange Multipliers

Optimization problems are very important in our daily life and in the mathematical sciences. Given a function $f : D \to \mathbb{R}$, we would like to know whether it has a maximum value or a minimum value. In Chapter 3, we have discussed the extreme value theorem, which asserts that a continuous function defined on a compact set must have maximum and minimum values. In Chapter 4, we showed that if a function $f : D \to \mathbb{R}$ has a (local) extremum at an interior point $\mathbf{x}_0$ of its domain $D$ and it is differentiable at $\mathbf{x}_0$, then $\mathbf{x}_0$ must be a stationary point. Namely, $\nabla f(\mathbf{x}_0) = \mathbf{0}$. Combining these various results, we can formulate a strategy for solving a special type of optimization problem. Let us first consider the following example.

Example 5.15
Let $K = \{(x, y) \mid x^2 + 4y^2 \leq 100\}$, and let $f : K \to \mathbb{R}$ be the function defined as $f(x, y) = x^2 + y^2$. Find the maximum and minimum values of $f : K \to \mathbb{R}$, and the points where these values appear.

Solution
Let $g : \mathbb{R}^2 \to \mathbb{R}$ be the function defined as $g(x, y) = x^2 + 4y^2 - 100$. It is a polynomial function. Hence, it is continuous. Since $K = g^{-1}((-\infty, 0])$ and $(-\infty, 0]$ is closed in $\mathbb{R}$, $K$ is a closed set. By a previous exercise,
\[ O = \operatorname{int} K = \{(x, y) \mid x^2 + 4y^2 < 100\} \quad \text{and} \quad C = \operatorname{bd} K = \{(x, y) \mid x^2 + 4y^2 = 100\}. \]
For any $(x, y) \in K$,
\[ \|(x, y)\|^2 = x^2 + y^2 \leq x^2 + 4y^2 \leq 100. \]
Therefore, $K$ is bounded. Since $K$ is closed and bounded, and the function $f : K \to \mathbb{R}$, $f(x, y) = x^2 + y^2$ is continuous, the extreme value theorem says that $f$ has maximum and minimum values.
These values appear either in O or on C. Since f : O → R is differentiable, if (x0, y0) is an extremizer of f : O → R, we must have ∇f(x0, y0) = (0, 0), which gives (x0, y0) = (0, 0). The other candidates of extremizers are on C. Therefore, we need to find the maximum and minimum values of f(x, y) = x2 + y2 subject to the constraint x2 + 4y2 = 100. From x2 + 4y2 = 100, we find that x2 = 100 − 4y2, and y can only take values in the interval [−5, 5]. Hence, we want to find the maximum and minimum values of h : [−5, 5] → R, h(y) = 100− 4y2 + y2 = 100− 3y2. When y = 0, h has maximum value 100, and when y = ±5, it has minimum value 100 − 3 × 25 = 25. Notice that when y = 0, x = ±10; while when y = ±5, x = 0. Hence, we have five candidates for the extremizers of f . Namely, u1 = (0, 0), u2 = (10, 0), u3 = (−10, 0), u4 = (0, 5) and u5 = (0,−5). The function values at these 5 points are f(u1) = 0, f(u2) = f(u3) = 100, f(u4) = f(u5) = 25. Therefore, the minimum value of f : K → R is 0, and the maximum value is 100. The minimum value appears at the point (0, 0) ∈ intK, while the maximum value appears at (±10, 0) ∈ bdK. Example 5.15 gives a typical scenario of the optimization problems that we want to study in this section. Chapter 5. The Inverse and Implicit Function Theorems 331 Figure 5.7: The extreme values of f(x, y) = x2 + y2 on the sets K = {(x, y) |x2 + 4y2 ≤ 100} and C = {(x, y) |x2 + 4y2 = 100}. Optimization Problem Let K be a compact subset of Rn with interior O, and let f : K → R be a function continuous on K, differentiable on O. We want to find the maximum and minimum values of f : K → R. (i) By the extreme value theorem, f : K → R has maximum and minimum values. (ii) Since K is closed, K is a disjoint union of its interior O and its boundary C. Since C is a subset of K, it is bounded. On the other hand, being the boundary of a set, C is closed. Therefore, C is compact. (iii) The extreme values of f can appear in O or on C. 
(iv) If x0 is an extremizer of f : K → R and it is in O, we must have ∇f(x0) = 0. Namely, x0 is a stationary point of f : O → R. (v) If x0 is an extremizer of f : K → R and it is not in O, it is an extremizer of f : C → R. (vi) Since C is compact, f : C → R has maximum and minimum values. Chapter 5. The Inverse and Implicit Function Theorems 332 Therefore, the steps to find the maximum and minimum values of f : K → R are as follows. Step 1 Find the stationary points of f : O → R. Step 2 Find the extremizers of f : C → R. Step 3 Compare the values of f at the stationary points of f : O → R and the extremizers of f : C → R to determine the extreme values of f : K → R. Of particular interest is when the boundary ofK can be expressed as g(x) = 0, where g : D → R is a continuously differentiable function defined on an open subset D of Rn. If f is also defined and differentiable on D, the problem of finding the extreme values of f : C → R becomes finding the extreme values of f : D → R subject to the constraint g(x) = 0. In Example 5.15, we have used g(x) = 0 to solve one of the variables in terms of the others and substitute into f to transform the optimization problem to a problem with fewer variables. However, this strategy can be quite complicated because it is often not possible to solve one variable in terms of the others explicitly from the constraint g(x) = 0. The method of Lagrange multipliers provides a way to solve constraint optimization problems without having to explicitly solve some variables in terms ofthe others. The validity of this method is justified by the implicit function theorem. Theorem 5.14 The Method of Lagrange Multiplier (One Constraint) Let O be an open subset of Rn+1 and let f : O → R and g : O → R be continuously differentiable functions defined on O. Consider the subset of O defined as C = {x ∈ O | g(x) = 0} . 
If x0 is an extremizer of the function f : C → R and ∇g(x0) ̸= 0, then there is a constant λ, known as the Lagrange multiplier, such that ∇f(x0) = λ∇g(x0). Chapter 5. The Inverse and Implicit Function Theorems 333 Proof Without loss of generality, assume that x0 is a maximizer of f : C → R. Namely, f(x) ≤ f(x0) for all x ∈ C. (5.9) Given that ∇g(x0) ̸= 0, there exists a 1 ≤ k ≤ n + 1 such that ∂g ∂xk (x0) ̸= 0. Without loss of generality, assume that k = n + 1. Let x0 = (a1, . . . , an, an+1). Given a point x = (x1, . . . , xn, xn+1) in Rn+1, let u = (x1, . . . , xn) so that x = (u, xn+1). By implicit function theorem, there is a neighbourhood D of u0 = (a1, . . . , an), an r > 0, and a continuously differentiable function h : D → R such that for (u, xn+1) ∈ D × (an+1 − r, an+1 + r), g(u, xn+1) = 0 if and only if xn+1 = h(u). Consider the function F : D → R defined as F (u) = f(u, h(u)). By (5.9), we find that F (u0) ≥ F (u) for all u ∈ D. In other words, u0 is a maximizer of the function F : D → R. Since u0 is an interior point of D and F : D → R is continuously differentiable, ∇F (u0) = 0. Since F (u) = f(u, h(u)), we find that for 1 ≤ i ≤ n, ∂F ∂xi (u0) = ∂f ∂xi (u0, an+1) + ∂f ∂xn+1 (u0, an+1) ∂h ∂xi (u0) = 0. (5.10) On the other hand, applying chain rule to g(u, h(u)) = 0 and set u = u0, we find that ∂g ∂xi (u0, an+1) + ∂g ∂xn+1 (u0, an+1) ∂h ∂xi (u0) = 0 for 1 ≤ i ≤ n. (5.11) By assumption, ∂g ∂xn+1 (x0) ̸= 0. Let λ = ∂f ∂xn+1 (x0) ∂g ∂xn+1 (x0) . Chapter 5. The Inverse and Implicit Function Theorems 334 Then ∂f ∂xn+1 (x0) = λ ∂g ∂xn+1 (x0). (5.12) Eqs. (5.10) and (5.11) show that for 1 ≤ i ≤ n, ∂f ∂xi (x0) = −λ ∂g ∂xn+1 (x0) ∂h ∂xi (u0) = λ ∂g ∂xi (x0). (5.13) Eqs. (5.12) and (5.13) together imply that ∇f(x0) = λ∇g(x0). This completes the proof of the theorem. 
Remark 5.5 Theorem 5.14 says that if x0 is an extremizer of the constraint optimization problem max /min f(x) subject to g(x) = 0, then the gradient of f at x0 should be parallel to the gradient of g at x0 if the latter is nonzero. One can refer to Figure 5.7 for an illustration. Recall that the gradient of f gives the direction where f changes most rapidly, while the gradient of g here represents the normal vector to the curve g(x) = 0. Using the method of Lagrange multiplier, there are n + 2 variables x1, . . . , xn+1 and λ to be solved. The equation ∇f(x) = λ∇g(x) gives n + 1 equations, while the equation g(x) = 0 gives one. Therefore, we need to solve n+ 2 variables from n+ 2 equations. Example 5.16 Let us solve the constraint optimization problem that appears in Example 5.15 using the Lagrange multiplier method. Let f : R2 → R and g : R2 → R be respectively the functions f(x, y) = x2 + y2 and g(x, y) = x2+4y2− 100. They are both continuously differentiable. We want to find the maximum and minimum values of the function f(x, y) subject to the constraint g(x, y) = 0. Notice that ∇g(x, y) = (2x, 8y) is the zero vector if and only if (x, y) = (0, 0), but (0, 0) is not on the curve g(x, y) = 0. Hence, for any (x, y) satisfying g(x, y) = 0, ∇g(x, y) ̸= 0. Chapter 5. The Inverse and Implicit Function Theorems 335 By the method of Lagrange multiplier, we need to find (x, y) satisfying ∇f(x, y) = λ∇g(x, y) and g(x, y) = 0. Therefore, 2x = 2λx, 2y = 8λy. This gives x(1− λ) = 0, y(1− 4λ) = 0. The first equation says that either x = 0 or λ = 1. If x = 0, from x2 + 4y2 = 100, we must have y = ±5. If λ = 1, then y(1− 4λ) = 0 implies that y = 0. From x2 +4y2 = 100, we then obtain x = ±10. Hence, we find that the candidates for the extremizers are (±10, 0) and (0,±5). Since f(±10, 0) = 100 and f(0,±5) = 25, we conclude that subject to x2+4y2 = 100, the maximum value of f(x, y) = x2+ y2 is 100, and the minimum value of f(x, y) = x2 + y2 is 25. 
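The candidates found in Example 5.16 can also be double-checked computationally. The following sketch (ours, not part of the text) verifies, for each candidate, the constraint $g(x, y) = 0$ and the multiplier equation $\nabla f = \lambda \nabla g$; the multiplier values $\lambda = 1$ and $\lambda = \tfrac{1}{4}$ are read off from the equations $2x = 2\lambda x$ and $2y = 8\lambda y$.

```python
# Verify the Lagrange candidates of Example 5.16 for
# f(x, y) = x^2 + y^2 subject to g(x, y) = x^2 + 4y^2 - 100 = 0.
candidates = {(10, 0): 1.0, (-10, 0): 1.0, (0, 5): 0.25, (0, -5): 0.25}
for (x, y), lam in candidates.items():
    assert x*x + 4*y*y == 100                        # g(x, y) = 0
    assert 2*x == lam * (2*x) and 2*y == lam * (8*y) # grad f = lam * grad g

values = {p: p[0]**2 + p[1]**2 for p in candidates}
assert max(values.values()) == 100 and min(values.values()) == 25
print(values)
```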
Example 5.17
Use the Lagrange multiplier method to find the maximum and minimum values of the function $f(x, y, z) = 8x + 24y + 27z$ on the set $S = \{(x, y, z) \mid x^2 + 4y^2 + 9z^2 = 289\}$, and the points where each of them appears.

Solution
Let $g : \mathbb{R}^3 \to \mathbb{R}$ be the function $g(x, y, z) = x^2 + 4y^2 + 9z^2 - 289$. The functions $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = 8x + 24y + 27z$ and $g : \mathbb{R}^3 \to \mathbb{R}$ are both continuously differentiable. Notice that $\nabla g(x, y, z) = (2x, 8y, 18z) = \mathbf{0}$ if and only if $(x, y, z) = \mathbf{0}$, and $\mathbf{0}$ does not lie on $S$. By the Lagrange multiplier method, to find the maximum and minimum values of $f : S \to \mathbb{R}$, we need to solve the equations $\nabla f(x, y, z) = \lambda \nabla g(x, y, z)$ and $g(x, y, z) = 0$. These give
\[ 8 = 2\lambda x, \qquad 24 = 8\lambda y, \qquad 27 = 18\lambda z, \qquad x^2 + 4y^2 + 9z^2 = 289. \]
To satisfy the first three equations, none of $\lambda$, $x$, $y$ and $z$ can be zero. We find that
\[ x = \frac{4}{\lambda}, \qquad y = \frac{3}{\lambda}, \qquad z = \frac{3}{2\lambda}. \]
Substituting into the last equation, we have
\[ \frac{64 + 144 + 81}{4\lambda^2} = 289. \]
This gives $4\lambda^2 = 1$. Hence, $\lambda = \pm\dfrac{1}{2}$. When $\lambda = \dfrac{1}{2}$, $(x, y, z) = (8, 6, 3)$. When $\lambda = -\dfrac{1}{2}$, $(x, y, z) = (-8, -6, -3)$. These are the two candidates for the extremizers of $f : S \to \mathbb{R}$. Since $f(8, 6, 3) = 289$ and $f(-8, -6, -3) = -289$, we find that the maximum and minimum values of $f : S \to \mathbb{R}$ are $289$ and $-289$ respectively; the maximum value appears at $(8, 6, 3)$, and the minimum value appears at $(-8, -6, -3)$.

Now we consider more general constraint optimization problems, which can have more than one constraint.

Theorem 5.15 The Method of Lagrange Multiplier (General)
Let $O$ be an open subset of $\mathbb{R}^{m+n}$ and let $f : O \to \mathbb{R}$ and $\mathbf{G} : O \to \mathbb{R}^m$ be continuously differentiable functions defined on $O$. Consider the subset of $O$ defined as
\[ C = \{\mathbf{x} \in O \mid \mathbf{G}(\mathbf{x}) = \mathbf{0}\}. \]
If $\mathbf{x}_0$ is an extremizer of the function $f : C \to \mathbb{R}$ and the matrix $D\mathbf{G}(\mathbf{x}_0)$ has (maximal) rank $m$, then there are constants $\lambda_1, \ldots, \lambda_m$, known as the Lagrange multipliers, such that
\[ \nabla f(\mathbf{x}_0) = \sum_{i=1}^m \lambda_i \nabla G_i(\mathbf{x}_0). \]
Proof Without loss of generality, assume that x0 is a maximizer of f : C → R. Namely, f(x) ≤ f(x0) for all x ∈ C. (5.14) Given that the matrix DG(x0) has rank m, m of the column vectors are linearly independent. Without loss of generality, assume that the column vectors in the last m columns are linearly independent. Write a point x in Rm+n as x = (u,v), where u = (u1, . . . , un) is in Rn and v = (v1, . . . , vm) is in Rm. By our assumption, DvG(u0,v0) is invertible. By implicit function theorem, there is a neighbourhood D of u0, a neighbourhood V of v0, and a continuously differentiable function H : D → Rm such that for (u,v) ∈ D × V , G(u,v) = 0 if and only if v = H(u). Consider the function F : D → R defined as F (u) = f(u,H(u)). By (5.14), we find that F (u0) ≥ F (u) for all u ∈ D. Chapter 5. The Inverse and Implicit Function Theorems 338 In other words, u0 is a maximizer of the function F : D → R. Since u0 is an interior point of D and F : D → R is continuously differentiable, ∇F (u0) = 0. Since F (u) = f(u,H(u)), we find that ∇F (u0) = Duf(u0,v0) +Dvf(u0,v0)DH(u0) = 0. (5.15) On the other hand, applying chain rule to G(u,H(u)) = 0 and set u = u0, we find that DuG(u0,v0) +DvG(u0,v0)DH(u0) = 0. (5.16) Take [ λ1 λ2 · · · λm ] = λ = Dvf(x0)DvG(x0) −1. Then Dvf(x0) = λDvG(x0). (5.17) Eqs. (5.15) and (5.16) show that Duf(x0) = −λDvG(x0)DH(u0) = λDuG(x0). (5.18) Eqs. (5.17) and (5.18) together imply that ∇f(x0) = λDG(x0) = m∑ i=1 λi∇Gi(x0). This completes the proof of the theorem. In the general constraint optimization problem proposed in Theorem 5.15, there are n + 2m variables u1, . . . , un, v1, . . . , vm and λ1, . . . , λm to be solved. The components of ∇f(x) = m∑ i=1 λi∇Gi(x) give n + m equations, while the components of G(x) = 0 give m equations. Hence, we have to solve n+ 2m variablesfrom n+ 2m equations. Let us look at an example. Chapter 5. 
Example 5.18
Let $K$ be the subset of $\mathbb{R}^3$ given by
\[ K = \{(x, y, z) \mid x^2 + y^2 \leq 4,\; x + y + z = 1\}. \]
Find the maximum and minimum values of the function $f : K \to \mathbb{R}$, $f(x, y, z) = x + 3y + z$.

Solution
Notice that $K$ is the intersection of the two closed sets $K_1 = \{(x, y, z) \mid x^2 + y^2 \leq 4\}$ and $K_2 = \{(x, y, z) \mid x + y + z = 1\}$. Hence, $K$ is a closed set. If $(x, y, z)$ is in $K$, then $x^2 + y^2 \leq 4$. Thus, $|x| \leq 2$, $|y| \leq 2$, and hence $|z| \leq 1 + |x| + |y| \leq 5$. This shows that $K$ is bounded. Since $K$ is closed and bounded and $f : K \to \mathbb{R}$ is continuous, $f : K \to \mathbb{R}$ has maximum and minimum values.
Let
\[ D = \{(x, y, z) \mid x^2 + y^2 < 4,\; x + y + z = 1\}, \qquad C = \{(x, y, z) \mid x^2 + y^2 = 4,\; x + y + z = 1\}. \]
Then $K = C \cup D$. We can consider the extremizers of $f : D \to \mathbb{R}$ and $f : C \to \mathbb{R}$ separately.
To find the extremizers of $f : D \to \mathbb{R}$, we can regard this as a constraint optimization problem where we want to find the extreme values of $f : O \to \mathbb{R}$, $f(x, y, z) = x + 3y + z$ on $O = \{(x, y, z) \mid x^2 + y^2 < 4\}$, subject to the constraint $g(x, y, z) = 0$, where $g : O \to \mathbb{R}$ is the function $g(x, y, z) = x + y + z - 1$. Now $\nabla g(x, y, z) = (1, 1, 1) \neq \mathbf{0}$. Hence, at an extremizer, we must have $\nabla f(x, y, z) = \lambda \nabla g(x, y, z)$, which gives $(1, 3, 1) = \lambda(1, 1, 1)$. This says that the two vectors $(1, 3, 1)$ and $(1, 1, 1)$ must be parallel, which is a contradiction. Hence, $f : D \to \mathbb{R}$ does not have extremizers.
Now, to find the extremizers of $f : C \to \mathbb{R}$, we can consider it as finding the extreme values of $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = x + 3y + z$, subject to $\mathbf{G}(x, y, z) = \mathbf{0}$, where $\mathbf{G}(x, y, z) = (x^2 + y^2 - 4,\; x + y + z - 1)$. Now
\[ D\mathbf{G}(x, y, z) = \begin{bmatrix} 2x & 2y & 0 \\ 1 & 1 & 1 \end{bmatrix}. \]
This matrix has rank less than $2$ if and only if $(2x, 2y, 0)$ is parallel to $(1, 1, 1)$, which forces $(2x, 2y, 0) = (0, 0, 0)$, i.e. $x = y = 0$. But no point with $x = y = 0$ lies on $C$, since then $x^2 + y^2 = 0 \neq 4$. Therefore, $D\mathbf{G}(x, y, z)$ has maximal rank for every $(x, y, z) \in C$.
Using the Lagrange multiplier method, to solve for the extremizers of $f : C \to \mathbb{R}$, we need to solve the system
\[ \nabla f(x, y, z) = \lambda \nabla G_1(x, y, z) + \mu \nabla G_2(x, y, z), \qquad \mathbf{G}(x, y, z) = \mathbf{0}. \]
This gives
\[ 1 = 2\lambda x + \mu, \qquad 3 = 2\lambda y + \mu, \qquad 1 = \mu, \qquad x^2 + y^2 = 4, \qquad x + y + z = 1. \]
From $\mu = 1$, we have $2\lambda x = 0$ and $2\lambda y = 2$. The latter implies that $\lambda \neq 0$. Hence, we must have $x = 0$. Then $x^2 + y^2 = 4$ gives $y = \pm 2$. When $(x, y) = (0, 2)$, $z = -1$. When $(x, y) = (0, -2)$, $z = 3$. Hence, we only have two candidates for extremizers, which are $(0, 2, -1)$ and $(0, -2, 3)$. Since
\[ f(0, 2, -1) = 5, \qquad f(0, -2, 3) = -3, \]
we find that $f : K \to \mathbb{R}$ has maximum value $5$ at the point $(0, 2, -1)$, and minimum value $-3$ at the point $(0, -2, 3)$.

Exercises 5.4

Question 1
Find the extreme values of the function $f(x, y, z) = 4x^2 + y^2 + yz + z^2$ on the set $S = \{(x, y, z) \mid 2x^2 + y^2 + z^2 \leq 8\}$.

Question 2
Find the points in the set $S = \{(x, y) \mid 4x^2 + y^2 \leq 36,\; x^2 + 4y^2 \geq 4\}$ that are closest to and farthest from the point $(1, 0)$.

Question 3
Use the Lagrange multiplier method to find the maximum and minimum values of the function $f(x, y, z) = x + 2y - z$ on the set $S = \{(x, y, z) \mid x^2 + y^2 + 4z^2 \leq 84\}$, and the points where each of them appears.

Question 4
Find the extreme values of the function $f(x, y, z) = x$ on the set $S = \{(x, y, z) \mid x^2 = y^2 + z^2,\; 7x + 3y + 4z = 60\}$.

Question 5
Let $K$ be the subset of $\mathbb{R}^3$ given by $K = \{(x, y, z) \mid 4x^2 + z^2 \leq 68,\; y + z = 12\}$. Find the maximum and minimum values of the function $f : K \to \mathbb{R}$, $f(x, y, z) = x + 2y$.

Question 6
Let $A$ be an $n \times n$ symmetric matrix, and let $Q_A : \mathbb{R}^n \to \mathbb{R}$ be the quadratic form $Q_A(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$ defined by $A$. Show that the minimum and maximum values of $Q_A : S^{n-1} \to \mathbb{R}$ on the unit sphere $S^{n-1}$ are the smallest and largest eigenvalues of $A$.
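Returning to Example 5.18 for a moment: the two candidate points found there can be double-checked numerically. The following sketch (ours, not part of the text) confirms the constraint equations, the multiplier equations with $\mu = 1$ and $\lambda = \pm\tfrac{1}{2}$ (read off from $3 = 2\lambda y + 1$), and the extreme values.

```python
# Check of Example 5.18: f(x, y, z) = x + 3y + z on
# C = {x^2 + y^2 = 4, x + y + z = 1}.
candidates = {
    (0, 2, -1): 0.5,    # lam solving 3 = 2*lam*y + 1 with y = 2
    (0, -2, 3): -0.5,   # lam solving 3 = 2*lam*y + 1 with y = -2
}
for (x, y, z), lam in candidates.items():
    assert x*x + y*y == 4 and x + y + z == 1    # (x, y, z) lies on C
    grad_f = (1, 3, 1)
    grad_G1 = (2*x, 2*y, 0)
    grad_G2 = (1, 1, 1)
    mu = 1
    for gf, g1, g2 in zip(grad_f, grad_G1, grad_G2):
        assert gf == lam * g1 + mu * g2         # multiplier equations

values = {p: p[0] + 3*p[1] + p[2] for p in candidates}
assert max(values.values()) == 5 and min(values.values()) == -3
print(values)
```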
Chapter 6 Multiple Integrals

For a single variable function, we have discussed the Riemann integrability of a function $f : [a, b] \to \mathbb{R}$ defined on a compact interval $[a, b]$. In this chapter, we consider the theory of Riemann integrals for multivariable functions. For a function $\mathbf{F} : D \to \mathbb{R}^m$ that takes values in $\mathbb{R}^m$ with $m \geq 2$, we define the integral componentwise. Namely, we say that the function $\mathbf{F} : D \to \mathbb{R}^m$ is Riemann integrable if and only if each of the component functions $F_j : D \to \mathbb{R}$, $1 \leq j \leq m$, is Riemann integrable, and we define
\[ \int_D \mathbf{F} = \left( \int_D F_1, \int_D F_2, \ldots, \int_D F_m \right). \]
Thus, in this chapter, we will only discuss the theory of integration for functions $f : D \to \mathbb{R}$ that take values in $\mathbb{R}$.
A direct generalization of a compact interval $[a, b]$ to $\mathbb{R}^n$ is a product of compact intervals $I = \prod_{i=1}^n [a_i, b_i]$, which is a closed rectangle. In this chapter, when we say $I$ is a rectangle, it means $I$ can be written as $\prod_{i=1}^n [a_i, b_i]$ with $a_i < b_i$ for all $1 \leq i \leq n$. The edges of $I = \prod_{i=1}^n [a_i, b_i]$ are $[a_1, b_1], [a_2, b_2], \ldots, [a_n, b_n]$.
We first discuss the integration theory of functions defined on closed rectangles of the form $\prod_{i=1}^n [a_i, b_i]$. For applications, we need to consider functions defined on other subsets $D$ of $\mathbb{R}^n$. One of the most useful theoretical tools for evaluating single integrals is the fundamental theorem of calculus. To apply this tool to multiple integrals, we need to consider iterated integrals. Another useful tool is the change of variables formula. For multivariable functions, the change of variables theorem is much more complicated. Nevertheless, we will discuss these in this chapter.

6.1 Riemann Integrals

In this section, we define the Riemann integral of a function $f : D \to \mathbb{R}$ defined on a subset $D$ of $\mathbb{R}^n$. We first consider the case where $D = \prod_{i=1}^n [a_i, b_i]$. Let us first consider partitions. We say that $P = \{x_0, x_1, \ldots$
$\ldots, x_k\}$ is a partition of the interval $[a, b]$ if
\[ a = x_0 < x_1 < \cdots < x_{k-1} < x_k = b. \]
It divides $[a, b]$ into $k$ subintervals $J_1, \ldots, J_k$, where $J_i = [x_{i-1}, x_i]$.

Definition 6.1 Partitions
A partition $P$ of a closed rectangle $I = \prod_{i=1}^n [a_i, b_i]$ is achieved by having a partition $P_i$ of $[a_i, b_i]$ for each $1 \leq i \leq n$. We write $P = (P_1, P_2, \ldots, P_n)$ for such a partition. The partition $P$ divides the rectangle $I$ into a collection $J_P$ of rectangles, any two of which have disjoint interiors. A closed rectangle $J$ in $J_P$ can be written as $J = J_1 \times J_2 \times \cdots \times J_n$, where $J_i$, $1 \leq i \leq n$, is a subinterval in the partition $P_i$. If the partition $P_i$ divides $[a_i, b_i]$ into $k_i$ subintervals, then the partition $P = (P_1, \ldots, P_n)$ divides the rectangle $I = \prod_{i=1}^n [a_i, b_i]$ into $|J_P| = k_1 k_2 \cdots k_n$ rectangles.

Example 6.1
Consider the rectangle $I = [-2, 9] \times [1, 6]$. Let $P_1 = \{-2, 0, 4, 9\}$ and $P_2 = \{1, 3, 6\}$. The partition $P_1$ divides the interval $I_1 = [-2, 9]$ into the three subintervals $[-2, 0]$, $[0, 4]$ and $[4, 9]$. The partition $P_2$ divides the interval $I_2 = [1, 6]$ into the two subintervals $[1, 3]$ and $[3, 6]$. Therefore, the partition $P = (P_1, P_2)$ divides the rectangle $I$ into the following six rectangles:
\[ [-2, 0] \times [1, 3], \quad [0, 4] \times [1, 3], \quad [4, 9] \times [1, 3], \quad [-2, 0] \times [3, 6], \quad [0, 4] \times [3, 6], \quad [4, 9] \times [3, 6]. \]

Figure 6.1: A partition of the rectangle $[-2, 9] \times [1, 6]$ given in Example 6.1.

Definition 6.2 Regular and Uniformly Regular Partitions
Let $I = \prod_{i=1}^n [a_i, b_i]$ be a rectangle in $\mathbb{R}^n$. We say that $P = (P_1, \ldots, P_n)$ is a regular partition of $I$ if for each $1 \leq i \leq n$, $P_i$ is a regular partition of $[a_i, b_i]$ into $k_i$ intervals. We say that $P$ is a uniformly regular partition of $I$ into $k^n$ rectangles if for each $1 \leq i \leq n$, $P_i$ is a regular partition of $[a_i, b_i]$ into $k$ intervals.

Example 6.2
Consider the rectangle $I = [-2, 7] \times [-4, 8]$.
(a) The partition $P = (P_1, P_2)$ where $P_1 = \{-2, 1, 4, 7\}$ and $P_2 = \{-4, -1, 2, 5, 8\}$ is a regular partition of $I$.
(b) The partition $P = (P_1, P_2)$ where $P_1 = \{-2, 1, 4, 7\}$ and $P_2 = \{-4, 0, 4, 8\}$ is a uniformly regular partition of $I$ into $3^2 = 9$ rectangles.

Figure 6.2: A regular and a uniformly regular partition of $[-2, 7] \times [-4, 8]$ discussed in Example 6.2.

The length of an interval $[a, b]$ is $b - a$. The area of a rectangle $[a, b] \times [c, d]$ is $(b - a) \times (d - c)$. In general, we define the volume of a closed rectangle of the form $I = \prod_{i=1}^n [a_i, b_i]$ in $\mathbb{R}^n$ as follows.

Definition 6.3 Volume of a Rectangle
The volume of the closed rectangle $I = \prod_{i=1}^n [a_i, b_i]$ is defined as the product of the lengths of all its edges. Namely,
\[ \operatorname{vol}(I) = \prod_{i=1}^n (b_i - a_i). \]

Example 6.3
The volume of the rectangle $I = [-2, 9] \times [1, 6]$ is $\operatorname{vol}(I) = 11 \times 5 = 55$.

When $P = \{x_0, x_1, \ldots, x_k\}$ is a partition of $[a, b]$, it divides $[a, b]$ into $k$ subintervals $J_1, \ldots, J_k$, where $J_i = [x_{i-1}, x_i]$. Notice that
\[ \sum_{i=1}^k \operatorname{vol}(J_i) = \sum_{i=1}^k (x_i - x_{i-1}) = b - a. \]
Assume that $P = (P_1, \ldots, P_n)$ is a partition of the rectangle $I = \prod_{i=1}^n [a_i, b_i]$ in $\mathbb{R}^n$. Then for $1 \leq i \leq n$, $P_i$ is a partition of $[a_i, b_i]$. If $P_i$ divides $[a_i, b_i]$ into the $k_i$ subintervals $J_{i,1}, J_{i,2}, \ldots, J_{i,k_i}$, then the collection of rectangles in the partition $P$ is
\[ J_P = \{J_{1,m_1} \times \cdots \times J_{n,m_n} \mid 1 \leq m_i \leq k_i \text{ for } 1 \leq i \leq n\}. \]
Notice that
\[ \operatorname{vol}(J_{1,m_1} \times \cdots \times J_{n,m_n}) = \operatorname{vol}(J_{1,m_1}) \times \cdots \times \operatorname{vol}(J_{n,m_n}). \]
From this, we obtain the sum of volumes formula:
\[ \sum_{J \in J_P} \operatorname{vol}(J) = \sum_{m_n=1}^{k_n} \cdots \sum_{m_1=1}^{k_1} \operatorname{vol}(J_{1,m_1}) \times \cdots \times \operatorname{vol}(J_{n,m_n}) = \left[\sum_{m_1=1}^{k_1} \operatorname{vol}(J_{1,m_1})\right] \times \cdots \times \left[\sum_{m_n=1}^{k_n} \operatorname{vol}(J_{n,m_n})\right] = (b_1 - a_1) \times \cdots \times (b_n - a_n) = \operatorname{vol}(I). \]

Proposition 6.1
Let $P$ be a partition of $I = \prod_{i=1}^n [a_i, b_i]$. Then the sum of the volumes of the rectangles $J$ in the partition $P$ is equal to the volume of the rectangle $I$.

One of the motivations to define the integral $\int_I f$ for a nonnegative function $f : I \to \mathbb{R}$ is to find the volume bounded between the graph of $f$ and the rectangle $I$ in $\mathbb{R}^{n+1}$.
To find the volume, we partition I into small rectangles, pick a point ξJ in each of these rectangles J, and approximate the function on J as a constant given by the value f(ξJ). The volume between the rectangle J and the graph of f over J is then approximated by f(ξJ) vol (J). This leads us to the concept of Riemann sums.

If P is a partition of I = n∏ i=1 [ai, bi], we say that A is a set of intermediate points for the partition P if A = {ξJ | J ∈ JP} is a subset of I indexed by JP, such that ξJ ∈ J for each J ∈ JP.

Definition 6.4 Riemann Sums

Let I = n∏ i=1 [ai, bi], and let f : I → R be a function defined on I. Given a partition P of I and a set A = {ξJ | J ∈ JP} of intermediate points for the partition P, the Riemann sum of f with respect to the partition P and the set of intermediate points A = {ξJ} is the sum R(f,P, A) = ∑ J∈JP f(ξJ) vol (J).

Example 6.4

Let I = [−2, 9] × [1, 6], and let P = (P1, P2) be the partition of I with P1 = {−2, 0, 4, 9} and P2 = {1, 3, 6}. Let f : I → R be the function defined as f(x, y) = x^2 + y. Consider a set of intermediate points A as follows.

J | ξJ | f(ξJ) | vol (J)
[−2, 0] × [1, 3] | (−1, 1) | 2 | 4
[−2, 0] × [3, 6] | (0, 3) | 3 | 6
[0, 4] × [1, 3] | (1, 1) | 2 | 8
[0, 4] × [3, 6] | (2, 4) | 8 | 12
[4, 9] × [1, 3] | (4, 2) | 18 | 10
[4, 9] × [3, 6] | (9, 3) | 84 | 15

The Riemann sum R(f,P, A) is equal to 2× 4 + 3× 6 + 2× 8 + 8× 12 + 18× 10 + 84× 15 = 1578.

Example 6.5

If f : I → R is the constant function f(x) = c, then for any partition P of I and any set of intermediate points A = {ξJ}, R(f,P, A) = c vol (I). When c > 0, this is the volume of the rectangle I × [0, c] in Rn+1.

As in the single variable case, Darboux sums provide bounds for Riemann sums.

Definition 6.5 Darboux Sums

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. Given a partition P of I, let JP be the collection of rectangles in the partition P.
For each J in JP, let mJ = inf {f(x) | x ∈ J} and MJ = sup {f(x) | x ∈ J}. The Darboux lower sum L(f,P) and the Darboux upper sum U(f,P) are defined as L(f,P) = ∑ J∈JP mJ vol (J) and U(f,P) = ∑ J∈JP MJ vol (J).

Example 6.6

If f : I → R is the constant function f(x) = c, then L(f,P) = c vol (I) = U(f,P) for any partition P of I.

Example 6.7

Consider the function f : I → R, f(x, y) = x^2 + y defined in Example 6.4, where I = [−2, 9] × [1, 6]. For the partition P = (P1, P2) with P1 = {−2, 0, 4, 9} and P2 = {1, 3, 6}, we have the following.

J | mJ | MJ | vol (J)
[−2, 0] × [1, 3] | 0^2 + 1 = 1 | (−2)^2 + 3 = 7 | 4
[−2, 0] × [3, 6] | 0^2 + 3 = 3 | (−2)^2 + 6 = 10 | 6
[0, 4] × [1, 3] | 0^2 + 1 = 1 | 4^2 + 3 = 19 | 8
[0, 4] × [3, 6] | 0^2 + 3 = 3 | 4^2 + 6 = 22 | 12
[4, 9] × [1, 3] | 4^2 + 1 = 17 | 9^2 + 3 = 84 | 10
[4, 9] × [3, 6] | 4^2 + 3 = 19 | 9^2 + 6 = 87 | 15

Therefore, the Darboux lower sum is L(f,P) = 1× 4 + 3× 6 + 1× 8 + 3× 12 + 17× 10 + 19× 15 = 521, while the Darboux upper sum is U(f,P) = 7× 4 + 10× 6 + 19× 8 + 22× 12 + 84× 10 + 87× 15 = 2649.

Notice that we can only define Darboux sums if the function f : I → R is bounded. This means that there are constants m and M such that m ≤ f(x) ≤ M for all x ∈ I. If P is a partition of the rectangle I, J is a rectangle in the partition P, and ξJ is a point in J, then m ≤ mJ ≤ f(ξJ) ≤ MJ ≤ M. Multiplying throughout by vol (J) and summing over J ∈ JP, we obtain the following.

Proposition 6.2

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. If m ≤ f(x) ≤ M for all x ∈ I, then for any partition P of I, and for any choice of intermediate points A = {ξJ} for the partition P, we have m vol (I) ≤ L(f,P) ≤ R(f,P, A) ≤ U(f,P) ≤ M vol (I).

To study the behaviour of the Darboux sums when we modify the partitions, we first extend the concept of refinement of a partition to rectangles in Rn. Recall that if P and P ∗ are partitions of the interval [a, b], P ∗ is a refinement of P if each partition point of P is also a partition point of P ∗.
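The numbers in Examples 6.4 and 6.7 can be checked by a short computation. The Python sketch below is illustrative and not part of the text; the partition, the intermediate points and the function f(x, y) = x^2 + y are transcribed from the examples. Since f splits as a sum of one-variable functions, the exact infimum and supremum on each subrectangle are easy to compute (the infimum of x^2 on [a, b] is 0 when the interval contains 0).

```python
from itertools import product

# Data transcribed from Examples 6.4 and 6.7: I = [-2, 9] x [1, 6].
P1 = [-2, 0, 4, 9]
P2 = [1, 3, 6]

def f(x, y):
    return x * x + y

def subintervals(P):
    return [(P[i], P[i + 1]) for i in range(len(P) - 1)]

def sq_range(a, b):
    """Exact min and max of x^2 on [a, b]."""
    lo = 0 if a <= 0 <= b else min(a * a, b * b)
    return lo, max(a * a, b * b)

L = U = 0
for (a, b), (c, d) in product(subintervals(P1), subintervals(P2)):
    vol = (b - a) * (d - c)
    mx, Mx = sq_range(a, b)
    # f(x, y) = x^2 + y splits as a sum, so on J = [a,b] x [c,d]:
    # m_J = (min of x^2) + c  and  M_J = (max of x^2) + d.
    L += (mx + c) * vol
    U += (Mx + d) * vol

# Riemann sum with the intermediate points chosen in Example 6.4.
A = {((-2, 0), (1, 3)): (-1, 1), ((-2, 0), (3, 6)): (0, 3),
     ((0, 4), (1, 3)): (1, 1),   ((0, 4), (3, 6)): (2, 4),
     ((4, 9), (1, 3)): (4, 2),   ((4, 9), (3, 6)): (9, 3)}
R = sum(f(*xi) * (J1[1] - J1[0]) * (J2[1] - J2[0])
        for (J1, J2), xi in A.items())

print(L, R, U)  # L = 521, R = 1578, U = 2649
```

Running it reproduces L(f,P) = 521, R(f,P, A) = 1578 and U(f,P) = 2649, consistent with the inequality in Proposition 6.2.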
Chapter 6. Multiple Integrals 351 Definition 6.6 Refinement of a Partition Let I = n∏ i=1 [ai, bi], and let P = (P1, . . . , Pn) and P∗ = (P ∗ 1 , . . . , P ∗ n) be partitions of I. We say that P∗ is a refinement of P if for each 1 ≤ i ≤ n, P ∗ i is a refinement of Pi. Figure 6.3: A refinement of the partition of the rectangle [−2, 9] × [1, 6] given in Figure 6.1. Example 6.8 Let us consider the partition P = (P1, P2) of the rectangle I = [−2, 9] × [1, 6] given in Example 6.1, with P1 = {−2, 0, 4, 9} and P2 = {1, 3, 6}. Let P ∗ 1 = {−2, 0, 1, 4, 6, 9} and P ∗ 2 = {1, 3, 4, 6}. Then P∗ = (P ∗ 1 , P ∗ 2 ) is a refinement of P. If the partition P∗ is a refinement of the partition P, then for each J in JP, P∗ induces a partition of J, which we denote by P∗(J). Example 6.9 The partition P∗ in Example 6.8 induces the partition P∗(J) = (P ∗ 1 (J), P ∗ 2 (J)) of the rectangle J = [0, 4]× [3, 6], where P ∗ 1 (J) = {0, 1, 4} and P ∗ 2 (J) = {3, 4, 6}. The partition P∗(J) divides the rectangle J into 4 rectangles, as shown in Figure 6.3. Chapter 6. Multiple Integrals 352 If the partition P∗ is a refinement of the partition P, then the collection of rectangles in P∗ is the union of the collection of rectangles in P∗(J) when J ranges over the collection of rectangles in P. Namely, JP∗ = ⋃ J∈JP JP∗(J). Using this, we can deduce the following. Proposition 6.3 Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. If P and P∗ are partitions of I and P∗ is a refinement of P, then L(f,P∗) = ∑ J∈JP L(f,P∗(J)), U(f,P∗) = ∑ J∈JP U(f,P∗(J)). From this, we can show that a refinement improves the Darboux sums, in the sense that a lower sum increases, and an upper sum decreases. Theorem 6.4 Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. If P and P∗ are partitions of I and P∗ is a refinement of P, then L(f,P) ≤ L(f,P∗) ≤ U(f,P∗) ≤ U(f,P). Proof For each rectangle J in the partition P, mJ ≤ f(x) ≤MJ for all x ∈ J. 
Applying Proposition 6.2 to the function f : J → R and the partition P∗(J), we find that mJ vol (J) ≤ L(f,P∗(J)) ≤ U(f,P∗(J)) ≤ MJ vol (J). Summing over J ∈ JP, we find that L(f,P) ≤ ∑ J∈JP L(f,P∗(J)) ≤ ∑ J∈JP U(f,P∗(J)) ≤ U(f,P). The assertion follows from Proposition 6.3.

It is difficult to visualize the Darboux sums of a multivariable function. Hence, we illustrate how refinements improve Darboux sums using single variable functions, as shown in Figure 6.4 and Figure 6.5.

Figure 6.4: A refinement of the partition increases the Darboux lower sum.

Figure 6.5: A refinement of the partition decreases the Darboux upper sum.

As a consequence of Theorem 6.4, we can prove the following.

Corollary 6.5

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. For any two partitions P1 and P2 of I, L(f,P1) ≤ U(f,P2).

Proof

Let P1 = (P1,1, P1,2, . . . , P1,n) and P2 = (P2,1, P2,2, . . . , P2,n). For 1 ≤ i ≤ n, let P ∗ i be the common refinement of P1,i and P2,i obtained by taking the union of the partition points in P1,i and P2,i. Then P∗ = (P ∗ 1 , . . . , P ∗ n) is a common refinement of the partitions P1 and P2. By Theorem 6.4, L(f,P1) ≤ L(f,P∗) ≤ U(f,P∗) ≤ U(f,P2).

Now we define lower and upper integrals of a bounded function f : I → R.

Definition 6.7 Lower Integrals and Upper Integrals

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. Let SL(f) be the set of Darboux lower sums of f , and let SU(f) be the set of Darboux upper sums of f .

1. The lower integral of f , denoted by ∫ I f , is defined as the least upper bound of the Darboux lower sums. ∫ I f = sup SL(f) = sup {L(f,P) | P is a partition of I} .

2. The upper integral of f , denoted by ∫ I f , is defined as the greatest lower bound of the Darboux upper sums. ∫ I f = inf SU(f) = inf {U(f,P) | P is a partition of I} .
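The common-refinement construction in the proof of Corollary 6.5 can be sketched numerically. In the Python illustration below (not part of the text), the function and the partition PA come from Example 6.7, while the second partition PB is made up for the sake of illustration; the common refinement is formed by coordinate-wise unions of partition points, and the inequalities of Theorem 6.4 and Corollary 6.5 are checked.

```python
from itertools import product

def f_min_max(a, b, c, d):
    """Exact inf and sup of f(x, y) = x^2 + y on [a, b] x [c, d]."""
    mx = 0 if a <= 0 <= b else min(a * a, b * b)
    return mx + c, max(a * a, b * b) + d

def darboux(P1, P2):
    """Darboux lower and upper sums for the partition (P1, P2)."""
    ivs = lambda P: [(P[i], P[i + 1]) for i in range(len(P) - 1)]
    L = U = 0
    for (a, b), (c, d) in product(ivs(P1), ivs(P2)):
        m, M = f_min_max(a, b, c, d)
        vol = (b - a) * (d - c)
        L += m * vol
        U += M * vol
    return L, U

# Two partitions of I = [-2, 9] x [1, 6]; PB is made up for illustration.
PA = ([-2, 0, 4, 9], [1, 3, 6])
PB = ([-2, 3, 9], [1, 2, 4, 6])

# Common refinement: coordinate-wise union of the partition points.
PC = tuple(sorted(set(p) | set(q)) for p, q in zip(PA, PB))

LA, UA = darboux(*PA)
LB, UB = darboux(*PB)
LC, UC = darboux(*PC)
assert LA <= LC <= UC <= UA and LB <= LC <= UC <= UB  # Theorem 6.4
assert LA <= UB and LB <= UA                          # Corollary 6.5
```

The design mirrors the proof: refining never decreases a lower sum or increases an upper sum, so any lower sum is bounded by any upper sum.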
Example 6.10

If f : I → R is the constant function f(x) = c, then for any partition P of I, L(f,P) = c vol (I) = U(f,P). Therefore, both SL(f) and SU(f) are the one-element set {c vol (I)}. This shows that ∫ I f = ∫ I f = c vol (I).

For a constant function, the lower integral and the upper integral are the same. For a general bounded function, we have the following.

Theorem 6.6

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. Then we have ∫ I f ≤ ∫ I f.

Proof

By Corollary 6.5, every element of SL(f) is less than or equal to any element of SU(f). This implies that ∫ I f = sup SL(f) ≤ inf SU(f) = ∫ I f.

Example 6.11 The Dirichlet Function

Let I = n∏ i=1 [ai, bi], and let f : I → R be the function defined as

f(x) = 1, if all components of x are rational,
       0, otherwise.

This is known as the Dirichlet function. Find the lower integral and the upper integral of f : I → R.

Solution

Let P = (P1, . . . , Pn) be a partition of I. A rectangle J in the partition P can be written in the form J = n∏ i=1 [ui, vi]. By the denseness of the rational numbers and the irrational numbers, there exist a rational number αi and an irrational number βi in (ui, vi). Let α = (α1, . . . , αn) and β = (β1, . . . , βn). Then α and β are points in J, and 0 = f(β) ≤ f(x) ≤ f(α) = 1 for all x ∈ J. Therefore, mJ = inf x∈J f(x) = 0, MJ = sup x∈J f(x) = 1. It follows that L(f,P) = ∑ J∈JP mJ vol (J) = 0, U(f,P) = ∑ J∈JP MJ vol (J) = ∑ J∈JP vol (J) = vol (I). Therefore, SL(f) = {0}, while SU(f) = {vol (I)}. This shows that the lower integral and the upper integral of f : I → R are given respectively by ∫ I f = 0 and ∫ I f = vol (I).

As we mentioned before, one of the motivations to define the integral of a function f : I → R is to calculate volumes. Given that f : I → R is a nonnegative continuous function defined on the rectangle I in Rn, let S = {(x, y) | x ∈ I, 0 ≤ y ≤ f(x)}, which is the solid bounded between I and the graph of f.
It is reasonable to expect that S has a volume, which we denote by vol (S). We want to define the integral ∫ I f so that it gives vol (S). Notice that if P is a partition of I, then the Darboux lower sum L(f,P) = ∑ J∈JP mJ vol (J) is the sum of volumes of the collection of rectangles {J× [0,mJ] | J ∈ JP} in Rn+1, each of which is contained in S. Since any two of these rectangles can only intersect on the boundaries, it is reasonable to expect that L(f,P) ≤ vol (S). Similarly, the Darboux upper sum U(f,P) = ∑ J∈JP MJ vol (J) is the sum of volumes of the collection of rectangles {J× [0,MJ] | J ∈ JP} in Rn+1, the union of which contains S. Therefore, it is reasonable to expect that vol (S) ≤ U(f,P).

Hence, the volume of S should be a number between L(f,P) and U(f,P) for any partition P. To make the volume well-defined, there should be only one number between L(f,P) and U(f,P) for all partitions P. By definition, any number between the lower integral and the upper integral is in between L(f,P) and U(f,P) for any partition P. Hence, to have the volume well-defined, we must require the lower integral and the upper integral to be the same. This motivates the following definition of integrability for a general bounded function.

Definition 6.8 Riemann Integrability

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. We say that f : I → R is Riemann integrable, or simply integrable, if ∫ I f = ∫ I f. In this case, we define the integral of f over the rectangle I as ∫ I f = ∫ I f = ∫ I f. It is the unique number larger than or equal to all Darboux lower sums, and smaller than or equal to all Darboux upper sums.

Example 6.12

Example 6.10 says that a constant function f : I → R, f(x) = c is integrable and ∫ I f = c vol (I).

Example 6.13

The Dirichlet function defined in Example 6.11 is not Riemann integrable since the lower integral and the upper integral are not equal.
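For a well-behaved function, the squeezing of the lower and upper sums can be observed numerically. The following Python sketch is illustrative and not part of the text: it computes Darboux sums of f(x, y) = x^2 + y on I = [−2, 9] × [1, 6] over finer and finer uniformly regular partitions, and checks that both sums bracket the exact value 8525/6, which is obtained by hand from elementary antiderivatives.

```python
from itertools import product

def darboux_uniform(k):
    """Darboux sums of f(x, y) = x^2 + y on I = [-2, 9] x [1, 6]
    for the uniformly regular partition of I into k^2 rectangles."""
    xs = [-2 + 11 * i / k for i in range(k + 1)]
    ys = [1 + 5 * j / k for j in range(k + 1)]
    L = U = 0.0
    for i, j in product(range(k), range(k)):
        a, b, c, d = xs[i], xs[i + 1], ys[j], ys[j + 1]
        # Exact inf/sup of x^2 on [a, b] (it is 0 when 0 lies inside).
        mx = 0.0 if a <= 0 <= b else min(a * a, b * b)
        vol = (b - a) * (d - c)
        L += (mx + c) * vol
        U += (max(a * a, b * b) + d) * vol
    return L, U

exact = 8525 / 6  # integral of x^2 + y over I, computed by hand
gaps = []
for k in (10, 100, 1000):
    L, U = darboux_uniform(k)
    assert L <= exact <= U   # both sums bracket the integral
    gaps.append(U - L)
assert gaps[0] > gaps[1] > gaps[2]  # the gap shrinks as k grows
```

The shrinking gap U(f,P) − L(f,P) is precisely the condition singled out in the equivalent criteria for integrability discussed next.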
Leibniz Notation for Riemann Integrals

The Leibniz notation of the Riemann integral of f : I → R is ∫ I f(x)dx, or equivalently, ∫ I f(x1, . . . , xn)dx1 · · · dxn.

As in the single variable case, there are some criteria for Riemann integrability which follow directly from the requirement that the lower integral and the upper integral be the same.

Theorem 6.7

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. The following are equivalent.

(a) The function f : I → R is Riemann integrable.

(b) For every ε > 0, there is a partition P of the rectangle I such that U(f,P)− L(f,P) < ε.

We define an Archimedes sequence of partitions exactly the same as in the single variable case.

Definition 6.9 Archimedes Sequence of Partitions

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. If {Pk} is a sequence of partitions of the rectangle I such that lim k→∞ (U(f,Pk)− L(f,Pk)) = 0, we call {Pk} an Archimedes sequence of partitions for the function f .

Then we have the following theorem.

Theorem 6.8 The Archimedes-Riemann Theorem

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. The function f : I → R is Riemann integrable if and only if f has an Archimedes sequence of partitions {Pk}. In this case, the integral ∫ I f can be computed by ∫ I f = lim k→∞ L(f,Pk) = lim k→∞ U(f,Pk).

A candidate for an Archimedes sequence of partitions is the sequence {Pk}, where Pk is the uniformly regular partition of I into k^n rectangles.

Example 6.14

Let I = [0, 1]× [0, 1]. Consider the function f : I → R defined as

f(x, y) = 1, if x ≥ y,
          0, if x < y.

For k ∈ Z+, let Pk be the uniformly regular partition of I into k^2 rectangles.

(a) For each k ∈ Z+, compute the Darboux lower sum L(f,Pk) and the Darboux upper sum U(f,Pk).

(b) Show that f : I → R is Riemann integrable and find the integral ∫ I f .

Solution

Fix k ∈ Z+, and let Pk = {u0, u1, . . .
, uk}, where ui = i/k for 0 ≤ i ≤ k. Then Pk = (Pk, Pk), and it divides I = [0, 1]× [0, 1] into the k^2 rectangles Ji,j , 1 ≤ i ≤ k, 1 ≤ j ≤ k, where Ji,j = [ui−1, ui]× [uj−1, uj]. We have vol (Ji,j) = 1/k^2. Let mi,j = inf (x,y)∈Ji,j f(x, y) and Mi,j = sup (x,y)∈Ji,j f(x, y).

Notice that if i < j − 1, then x ≤ ui < uj−1 ≤ y for all (x, y) ∈ Ji,j. Hence, f(x, y) = 0 for all (x, y) ∈ Ji,j. This implies that mi,j = Mi,j = 0 when i < j − 1.

If i ≥ j + 1, then x ≥ ui−1 ≥ uj ≥ y for all (x, y) ∈ Ji,j. Hence, f(x, y) = 1 for all (x, y) ∈ Ji,j. This implies that mi,j = Mi,j = 1 when i ≥ j + 1.

When i = j − 1, if (x, y) is in Ji,j , x ≤ ui = uj−1 ≤ y, and x = y if and only if (x, y) is the point (ui, uj−1). Hence, f(x, y) = 0 for all (x, y) ∈ Ji,j , except for (x, y) = (ui, uj−1), where f(ui, uj−1) = 1. Hence, mi,j = 0, Mi,j = 1 when i = j − 1.

When i = j, 0 ≤ f(x, y) ≤ 1 for all (x, y) ∈ Ji,j . Since (ui−1, uj) and (ui, uj) are in Ji,j , and f(ui−1, uj) = 0 while f(ui, uj) = 1, we find that mi,j = 0, Mi,j = 1 when i = j.

It follows that

L(f,Pk) = k∑ i=1 k∑ j=1 mi,j vol (Ji,j) = k∑ i=2 i−1∑ j=1 1/k^2 = (1/k^2) k∑ i=2 (i − 1) = (1/k^2) k−1∑ i=1 i = k(k − 1)/(2k^2).

U(f,Pk) = k∑ i=1 k∑ j=1 Mi,j vol (Ji,j) = k−1∑ i=1 i+1∑ j=1 1/k^2 + k∑ j=1 1/k^2 = 1/k + (1/k^2) k−1∑ i=1 (i + 1) = (1/k^2) ( k(k + 1)/2 − 1 + k ) = (k^2 + 3k − 2)/(2k^2).

Since U(f,Pk)− L(f,Pk) = (2k − 1)/k^2 for all k ∈ Z+, we find that lim k→∞ (U(f,Pk)− L(f,Pk)) = 0. Hence, {Pk} is an Archimedes sequence of partitions for f . By the Archimedes-Riemann theorem, f : I → R is Riemann integrable, and ∫ I f = lim k→∞ L(f,Pk) = lim k→∞ k(k − 1)/(2k^2) = 1/2.

Figure 6.6: This figure illustrates the different cases considered in Example 6.14 when k = 8.

As in the single variable case, there is an equivalent definition for Riemann integrability using Riemann sums. For a partition P = {x0, x1, . . .
, xk} of an interval [a, b], we define the gap of the partition P as |P | = max {xi − xi−1 | 1 ≤ i ≤ k}. For a closed rectangle I = n∏ i=1 [ai, bi], we replace the length xi − xi−1 of an interval in the partition by the diameter of a rectangle in the partition. Recall that the diameter of a rectangle J = n∏ i=1 [ui, vi] is diam J = √( (v1 − u1)^2 + · · ·+ (vn − un)^2 ).

Definition 6.10 Gap of a Partition

Let P be a partition of the rectangle I = n∏ i=1 [ai, bi]. Then the gap of the partition P is defined as |P| = max {diam J | J ∈ JP}.

Example 6.15

Find the gap of the partition P = (P1, P2) of the rectangle I = [−2, 9] × [1, 6] defined in Example 6.1, where P1 = {−2, 0, 4, 9} and P2 = {1, 3, 6}.

Solution

The lengths of the three intervals in the partition P1 = {−2, 0, 4, 9} of the interval [−2, 9] are 2, 4 and 5 respectively. The lengths of the two intervals in the partition P2 = {1, 3, 6} of the interval [1, 6] are 2 and 3 respectively. Therefore, the diameters of the 6 rectangles in the partition P are √(2^2 + 2^2), √(4^2 + 2^2), √(5^2 + 2^2), √(2^2 + 3^2), √(4^2 + 3^2), √(5^2 + 3^2). From this, we see that the gap of P is √(5^2 + 3^2) = √34.

In the example above, notice that |P1| = 5 and |P2| = 3. In general, it is not difficult to see the following.

Proposition 6.9

Let P = (P1, . . . , Pn) be a partition of the closed rectangle I = n∏ i=1 [ai, bi]. Then |P| = √( |P1|^2 + · · ·+ |Pn|^2 ).

The following theorem gives equivalent definitions of Riemann integrability of a bounded function.

Theorem 6.10 Equivalent Definitions for Riemann Integrability

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. The following three statements are equivalent for saying that f : I → R is Riemann integrable.

(a) The lower integral and the upper integral are the same. Namely, ∫ I f = ∫ I f.

(b) There exists a number I that satisfies the following.
For any ε > 0, there exists a δ > 0 such that if P is a partition of the rectangle I with |P| < δ, then |R(f,P, A)− I| < ε for any choice of intermediate points A = {ξJ} for the partition P.

(c) For any ε > 0, there exists a δ > 0 such that if P is a partition of the rectangle I with |P| < δ, then U(f,P)− L(f,P) < ε.

The most useful definition is in fact the second one, in terms of Riemann sums. It says that a bounded function f : I → R is Riemann integrable if the limit lim |P|→0 R(f,P, A) exists. As a consequence of Theorem 6.10, we have the following.

Theorem 6.11

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. If f : I → R is Riemann integrable, then for any sequence {Pk} of partitions of I satisfying lim k→∞ |Pk| = 0, we have

(i) ∫ I f = lim k→∞ L(f,Pk) = lim k→∞ U(f,Pk).

(ii) ∫ I f = lim k→∞ R(f,Pk, Ak), where for each k ∈ Z+, Ak is a choice of intermediate points for the partition Pk.

The proof is exactly the same as the single variable case. The contrapositive of Theorem 6.11 gives the following.

Theorem 6.12

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. Assume that {Pk} is a sequence of partitions of I such that lim k→∞ |Pk| = 0.

(a) If for each k ∈ Z+, there exists a choice of intermediate points Ak for the partition Pk such that the limit lim k→∞ R(f,Pk, Ak) does not exist, then f : I → R is not Riemann integrable.

(b) If for each k ∈ Z+, there exist two choices of intermediate points Ak and Bk for the partition Pk so that the two limits lim k→∞ R(f,Pk, Ak) and lim k→∞ R(f,Pk, Bk) are not the same, then f : I → R is not Riemann integrable.

Theorem 6.12 is useful for justifying that a bounded function is not Riemann integrable, without having to compute the lower integral or the upper integral. To
apply this theorem, we usually consider the sequence of partitions {Pk}, where Pk is the uniformly regular partition of I into k^n rectangles.

Example 6.16

Let I = [0, 1]× [0, 1], and let f : I → R be the function defined as

f(x, y) = 0, if x is rational,
          y, if x is irrational.

Show that f : I → R is not Riemann integrable.

Solution

For k ∈ Z+, let Pk be the uniformly regular partition of I into k^2 rectangles. Then Pk = (Pk, Pk), where Pk = {u0, u1, . . . , uk} with ui = i/k when 0 ≤ i ≤ k. Notice that |Pk| = √2/k, and so lim k→∞ |Pk| = 0.

The partition Pk divides the square I into k^2 squares Ji,j , 1 ≤ i ≤ k, 1 ≤ j ≤ k, where Ji,j = [ui−1, ui] × [uj−1, uj]. For 1 ≤ i ≤ k, since irrational numbers are dense, there is an irrational number ci in the interval (ui−1, ui). For 1 ≤ i ≤ k, 1 ≤ j ≤ k, let αi,j and βi,j be the points in Ji,j given respectively by αi,j = (ui, uj), βi,j = (ci, uj). Then f(αi,j) = 0, f(βi,j) = uj. Let Ak = {αi,j} and Bk = {βi,j}. Then the Riemann sums R(f,Pk, Ak) and R(f,Pk, Bk) are given respectively by R(f,Pk, Ak) = k∑ i=1 k∑ j=1 f(αi,j) vol (Ji,j) = 0, and R(f,Pk, Bk) = k∑ i=1 k∑ j=1 f(βi,j) vol (Ji,j) = k∑ i=1 k∑ j=1 (j/k) × (1/k^2) = k × k(k + 1)/(2k^3) = (k + 1)/(2k). Therefore, we find that lim k→∞ R(f,Pk, Ak) = 0, lim k→∞ R(f,Pk, Bk) = 1/2. Since the two limits are not the same, we conclude that f : I → R is not Riemann integrable.

Now we return to the proof of Theorem 6.10. To prove this theorem, it is easier to show that (a) is equivalent to (c), and (b) is equivalent to (c). We will prove the equivalence of (a) and (c). The proof of the equivalence of (b) and (c) is left to the exercises. It is a consequence of the inequality L(f,P) ≤ R(f,P, A) ≤ U(f,P), which holds for any partition P of the rectangle I, and any choice of intermediate points A for the partition P.

By Theorem 6.7, (a) is equivalent to

(a′) For every ε > 0, there is a partition P of I such that U(f,P)− L(f,P) < ε.
Thus, to prove the equivalence of (a) and (c), it is sufficient to show the equivalence of (a′) and (c). That (c) implies (a′) is obvious. Hence, we are left with the most technical part, which is the proof that (a′) implies (c). We formulate this as a standalone theorem.

Theorem 6.13

Let I = n∏ i=1 [ai, bi], and let P0 be a fixed partition of I. Given that f : I → R is a bounded function defined on I, for any ε > 0, there is a δ > 0 such that for all partitions P of I, if |P| < δ, then U(f,P)− L(f,P) < U(f,P0)− L(f,P0) + ε. (6.1)

If Theorem 6.13 is proved, we can show that (a′) implies (c) in Theorem 6.10 as follows. Given ε > 0, (a′) implies that we can choose a P0 such that U(f,P0)− L(f,P0) < ε/2. By Theorem 6.13, there is a δ > 0 such that for all partitions P of I, if |P| < δ, then U(f,P)− L(f,P) < U(f,P0)− L(f,P0) + ε/2 < ε. This proves that (a′) implies (c). Hence, it remains for us to prove Theorem 6.13.

Let us introduce some additional notation. Given the rectangle I = n∏ i=1 [ai, bi], for 1 ≤ i ≤ n, let Si = vol (I)/(bi − ai) = (b1 − a1)× · · · × (bi−1 − ai−1)× (bi+1 − ai+1)× · · · × (bn − an). (6.2)

This is the area of the boundary of I that is contained in the hyperplane xi = ai or xi = bi. For example, when n = 2, I = [a1, b1] × [a2, b2], S1 = b2 − a2 is the length of the vertical side, while S2 = b1 − a1 is the length of the horizontal side of the rectangle I.

Proof of Theorem 6.13

Since f : I → R is bounded, there is a positive number M such that |f(x)| ≤ M for all x ∈ I. Assume that P0 = (P̃1, . . . , P̃n). For 1 ≤ i ≤ n, let ki be the number of intervals in the partition P̃i. Let K = max{k1, . . . , kn}, and S = S1 + · · ·+ Sn, where Si, 1 ≤ i ≤ n, are defined by (6.2). Given ε > 0, let δ = ε/(4MKS). Then δ > 0. If P = (P1, . . . , Pn) is a partition of I with |P| < δ, we want to show that (6.1) holds. Let P∗ = (P ∗ 1 , . . .
, P ∗ n) be the common refinement of P0 and P such that P ∗ i is the partition of [ai, bi] that contains all the partition points of P̃i and Pi.

For 1 ≤ i ≤ n, let Ui be the collection of intervals in Pi which contain partition points of P̃i, and let Vi be the collection of the intervals of Pi that are not in Ui. Each interval in Vi must be in the interior of one of the intervals in P̃i. Thus, each interval in Vi is an interval in the partition P ∗ i . Since each partition point of P̃i can be contained in at most two intervals of Pi, but the first and last partition points of Pi and P̃i are the same, we find that |Ui| ≤ 2ki. Since |Pi| ≤ |P| < δ, each interval in Pi has length less than δ. Therefore, the sum of the lengths of the intervals in Ui is less than 2kiδ. Let Qi = { J ∈ JP | the i-th edge of J is from Ui }. Then ∑ J∈Qi vol (J) < 2kiδSi ≤ 2KδSi.

Figure 6.7: The partitions P0 and P in the proof of Theorem 6.13; P0 is the partition with red grids, while P is the partition with blue grids. The shaded rectangles are rectangles in P that contain partition points of P0.

Now let Q = n⋃ i=1 Qi. Then ∑ J∈Q vol (J) < 2Kδ n∑ i=1 Si = 2KδS. For each of the rectangles J that is in Q, we use the simple estimate MJ −mJ ≤ 2M. Therefore, ∑ J∈Q (MJ −mJ) vol (J) < 4MKδS = ε.

For the rectangles J that are in JP \ Q, each of them is a rectangle in the partition P∗. Therefore, ∑ J∈JP\Q (MJ −mJ) vol (J) ≤ U(f,P∗)− L(f,P∗) ≤ U(f,P0)− L(f,P0).

Hence, U(f,P)− L(f,P) = ∑ J∈JP (MJ −mJ) vol (J) = ∑ J∈JP\Q (MJ −mJ) vol (J) + ∑ J∈Q (MJ −mJ) vol (J) < U(f,P0)− L(f,P0) + ε. This completes the proof.

Finally, we extend Riemann integrals to functions f : D → R that are defined on bounded subsets D of Rn. If D is bounded, there is a positive number L such that ∥x∥ ≤ L for all x ∈ D. This implies that D is contained in the closed rectangle IL = n∏ i=1 [−L,L].
To define the Riemann integral of f : D → R, we need to extend the domain of f from D to IL. To avoid affecting the integral, we should extend by zero.

Definition 6.11 Zero Extension

Let D be a subset of Rn, and let f : D → R be a function defined on D. The zero extension of f : D → R is the function f̌ : Rn → R which is defined as

f̌(x) = f(x), if x ∈ D,
        0, if x /∈ D.

If U is any subset of Rn that contains D, then the zero extension of f to U is the function f̌ : U → R.

Obviously, if f : D → R is a bounded function, its zero extension f̌ : Rn → R is also bounded. Since we have defined Riemann integrability for a bounded function g : I → R that is defined on a closed rectangle I, it is natural to say that a function f : D → R is Riemann integrable if its zero extension f̌ : I → R to a closed rectangle I is Riemann integrable, and define ∫ D f = ∫ I f̌ .

For this to be unambiguous, we have to check that if I1 and I2 are closed rectangles that contain the bounded set D, the zero extension f̌ : I1 → R is Riemann integrable if and only if the zero extension f̌ : I2 → R is Riemann integrable. Moreover, ∫ I1 f̌ = ∫ I2 f̌ . This small technicality will be proved in Section 6.2. Assuming this, we can give the following formal definition for Riemann integrability of a bounded function defined on a bounded domain.

Definition 6.12 Riemann Integrals of General Functions

Let D be a bounded subset of Rn, and let I = n∏ i=1 [ai, bi] be a closed rectangle in Rn that contains D. Given that f : D → R is a bounded function defined on D, we say that f : D → R is Riemann integrable if its zero extension f̌ : I → R is Riemann integrable. If this is the case, we define the integral of f over D as ∫ D f = ∫ I f̌ .

Example 6.17

Let I = [0, 1]× [0, 1], and let f : I → R be the function defined as

f(x, y) = 1, if x ≥ y,
          0, if x < y,

which is considered in Example 6.14. Let D = {(x, y) ∈ I | y ≤ x}, and let g : D → R be the constant function g(x) = 1.
Then f : I → R is the zero extension of g to the square I that contains D. In Example 6.14, we have shown that f : I → R is Riemann integrable and ∫ I f(x)dx = 1/2. Therefore, g : D → R is Riemann integrable and ∫ D g(x)dx = 1/2.

Remark 6.1

Here we make two remarks about the Riemann integrals.

1. When f : D → R is the constant function f(x) = 1, we should expect that it is Riemann integrable if and only if D has a volume, which should be defined as vol (D) = ∫ D dx.

2. If f : D → R is a nonnegative continuous function defined on the bounded set D that has a volume, we would expect that f : D → R is Riemann integrable, and the integral ∫ D f(x)dx gives the volume of the solid bounded between D and the graph of f .

In Section 6.3, we will give a characterization of sets D that have volumes. We will also prove that if f : D → R is a continuous function defined on a set D that has volume, then f : D → R is Riemann integrable.

Exercises 6.1

Question 1

Let I = [−5, 8] × [2, 5], and let P = (P1, P2) be the partition of I with P1 = {−5,−1, 2, 7, 8} and P2 = {2, 4, 5}. Find the gap of the partition P.

Question 2

Let I = [−5, 8] × [2, 5], and let f : I → R be the function defined as f(x, y) = x^2 + 2y. Consider the partition P = (P1, P2) of I with P1 = {−5,−1, 2, 7, 8} and P2 = {2, 4, 5}. Find the Darboux lower sum L(f,P) and the Darboux upper sum U(f,P).

Question 3

Let I = [−5, 8] × [2, 5], and let f : I → R be the function defined as f(x, y) = x^2 + 2y. Consider the partition P = (P1, P2) of I with P1 = {−5,−1, 2, 7, 8} and P2 = {2, 4, 5}. For each rectangle J = [a, b] × [c, d] in the partition P, let αJ = (a, c) and βJ = (b, d). Find the Riemann sums R(f,P, A) and R(f,P, B), where A = {αJ} and B = {βJ}.

Question 4

Let I = [−1, 1]× [2, 5], and let f : I → R be the function defined as

f(x, y) = 1, if x and y are rational,
          0, otherwise.
(a) Given that P is a partition of I, find the Darboux lower sum L(f,P) and the Darboux upper sum U(f,P). (b) Find the lower integral ∫ I f and the upper integral ∫ I f . (c) Explain why f : I → R is not Riemann integrable. Chapter 6. Multiple Integrals 375 Question 5 Let I = [0, 4]× [0, 2]. Consider the function f : I → R defined as f(x, y) = 2x+ 3y + 1. For k ∈ Z+, let Pk be the uniformly regular partition