Mathematical Analysis, Volume II
Teo Lee Peng
January 1, 2024

Contents

Preface
Chapter 1  Euclidean Spaces
  1.1 The Euclidean Space Rn as a Vector Space
  1.2 Convergence of Sequences in Rn
  1.3 Open Sets and Closed Sets
  1.4 Interior, Exterior, Boundary and Closure
  1.5 Limit Points and Isolated Points
Chapter 2  Limits of Multivariable Functions and Continuity
  2.1 Multivariable Functions
    2.1.1 Polynomials and Rational Functions
    2.1.2 Component Functions of a Mapping
    2.1.3 Invertible Mappings
    2.1.4 Linear Transformations
    2.1.5 Quadratic Forms
  2.2 Limits of Functions
  2.3 Continuity
  2.4 Uniform Continuity
  2.5 Contraction Mapping Theorem
Chapter 3  Continuous Functions on Connected Sets and Compact Sets
  3.1 Path-Connectedness and Intermediate Value Theorem
  3.2 Connectedness and Intermediate Value Property
  3.3 Sequential Compactness and Compactness
  3.4 Applications of Compactness
    3.4.1 The Extreme Value Theorem
    3.4.2 Distance Between Sets
    3.4.3 Uniform Continuity
    3.4.4 Linear Transformations and Quadratic Forms
    3.4.5 Lebesgue Number Lemma
Chapter 4  Differentiating Functions of Several Variables
  4.1 Partial Derivatives
  4.2 Differentiability and First Order Approximation
    4.2.1 Differentiability
    4.2.2 First Order Approximations
    4.2.3 Tangent Planes
    4.2.4 Directional Derivatives
  4.3 The Chain Rule and the Mean Value Theorem
  4.4 Second Order Approximations
  4.5 Local Extrema
Chapter 5  The Inverse and Implicit Function Theorems
  5.1 The Inverse Function Theorem
  5.2 The Proof of the Inverse Function Theorem
  5.3 The Implicit Function Theorem
  5.4 Extrema Problems and the Method of Lagrange Multipliers
Chapter 6  Multiple Integrals
  6.1 Riemann Integrals
  6.2 Properties of Riemann Integrals
  6.3 Jordan Measurable Sets and Riemann Integrable Functions
  6.4 Iterated Integrals and Fubini's Theorem
  6.5 Change of Variables Theorem
    6.5.1 Translations and Linear Transformations
    6.5.2 Polar Coordinates
    6.5.3 Spherical Coordinates
    6.5.4 Other Examples
  6.6 Proof of the Change of Variables Theorem
  6.7 Some Important Integrals and Their Applications
Chapter 7  Fourier Series and Fourier Transforms
  7.1 Orthogonal Systems of Functions and Fourier Series
  7.2 The Pointwise Convergence of a Fourier Series
  7.3 The L2 Convergence of a Fourier Series
  7.4 The Uniform Convergence of a Trigonometric Series
  7.5 Fourier Transforms
Appendix A  Sylvester's Criterion
Appendix B  Volumes of Parallelepipeds
Appendix C  Riemann Integrability
References

Preface

Mathematical analysis is a standard course which introduces students to rigorous reasoning in mathematics, as well as the theories needed for advanced analysis courses. It is a compulsory course for all mathematics majors. It is also strongly recommended for students majoring in computer science, physics, data science, financial analysis, and other areas that require strong analytical skills. Standard textbooks in mathematical analysis include the classical ones by Apostol [Apo74] and Rudin [Rud76], and the modern ones by Bartle [BS92], Fitzpatrick [Fit09], Abbott [Abb15], Tao [Tao16, Tao14] and Zorich [Zor15, Zor16].

This book is the second volume of a set of textbooks intended for a one-year course in mathematical analysis. We introduce the fundamental concepts in a pedagogical way, with plenty of examples to illustrate the theory. We assume that students are familiar with the material of calculus, such as that in the book [SCW20], so we do not emphasize computational techniques. The emphasis is on building up analytical skills through rigorous reasoning. Besides calculus, it is also assumed that students have taken introductory courses in discrete mathematics and linear algebra, covering topics such as logic, sets, functions, vector spaces, inner products, and quadratic forms. Whenever needed, these concepts are briefly reviewed.

In this book, we have defined all the mathematical terms we use carefully.
While most of the terms have standard definitions, some may be defined differently from author to author. Readers are advised to check the definitions of the terms used in this book when they encounter them. This can easily be done using the search function provided by any PDF viewer. Readers are also encouraged to fully utilize the hyper-referencing provided.

Teo Lee Peng

Chapter 1  Euclidean Spaces

In this second volume of mathematical analysis, we study functions defined on subsets of Rn. For this, we first need to study the structure and topology of Rn. We start with a review of Rn as a vector space. In the sequel, n is a fixed positive integer reserved for Rn.

1.1 The Euclidean Space Rn as a Vector Space

If S_1, S_2, ..., S_n are sets, the cartesian product of these n sets is defined as the set

S = S_1 × · · · × S_n = {(a_1, ..., a_n) | a_i ∈ S_i, 1 ≤ i ≤ n}

that contains all n-tuples (a_1, ..., a_n), where a_i ∈ S_i for all 1 ≤ i ≤ n. The set Rn is the cartesian product of n copies of R. Namely,

Rn = {(x_1, x_2, ..., x_n) | x_1, x_2, ..., x_n ∈ R}.

The point (x_1, x_2, ..., x_n) is denoted by x, and x_1, x_2, ..., x_n are called the components of the point x.

We can define an addition and a scalar multiplication on Rn. If x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) are in Rn, the addition of x and y is defined as

x + y = (x_1 + y_1, x_2 + y_2, ..., x_n + y_n).

In other words, it is a componentwise addition. Given a real number α, the scalar multiplication of α with x is given by the componentwise multiplication

αx = (αx_1, αx_2, ..., αx_n).

The set Rn with the addition and scalar multiplication operations is a vector space. It satisfies the 10 axioms for a real vector space V.

The 10 Axioms for a Real Vector Space V

Let V be a set that is equipped with two operations: the addition and the scalar multiplication.
For any two vectors u and v in V, their addition is denoted by u + v. For a vector u in V and a scalar α ∈ R, the scalar multiplication of u by α is denoted by αu. We say that V with the addition and scalar multiplication is a real vector space provided that the following 10 axioms are satisfied for any u, v and w in V, and any α and β in R.

Axiom 1  If u and v are in V, then u + v is in V.
Axiom 2  u + v = v + u.
Axiom 3  (u + v) + w = u + (v + w).
Axiom 4  There is a zero vector 0 in V such that 0 + v = v = v + 0 for all v ∈ V.
Axiom 5  For any v in V, there is a vector w in V such that v + w = 0 = w + v. The vector w satisfying this equation is called the negative of v, and is denoted by −v.
Axiom 6  For any v in V and any α ∈ R, αv is in V.
Axiom 7  α(u + v) = αu + αv.
Axiom 8  (α + β)v = αv + βv.
Axiom 9  α(βv) = (αβ)v.
Axiom 10  1v = v.

Rn is a real vector space. The zero vector is the point 0 = (0, 0, ..., 0) with all components equal to 0. Sometimes we also call a point x = (x_1, ..., x_n) in Rn a vector, and identify it with the vector from the origin 0 to the point x.

Definition 1.1  Standard Unit Vectors
In Rn, there are n standard unit vectors e_1, ..., e_n given by
e_1 = (1, 0, ..., 0), e_2 = (0, 1, ..., 0), ..., e_n = (0, ..., 0, 1).

Let us review some concepts from linear algebra which will be useful later. Given that v_1, ..., v_k are vectors in a vector space V, a linear combination of v_1, ..., v_k is a vector v in V of the form

v = c_1 v_1 + · · · + c_k v_k

for some scalars c_1, ..., c_k, which are known as the coefficients of the linear combination.

A subspace of a vector space V is a subset of V that is itself a vector space. There is a simple way to construct subspaces.

Proposition 1.1
Let V be a vector space, and let v_1, ..., v_k be vectors in V. The subset
W = {c_1 v_1 + · · · + c_k v_k | c_1, ..., c_k ∈ R}
of V that contains all linear combinations of v_1, ..., v_k is itself a vector space.
It is called the subspace of V spanned by v_1, ..., v_k.

Example 1.1
In R3, the subspace spanned by the vectors e_1 = (1, 0, 0) and e_3 = (0, 0, 1) is the set W that contains all points of the form
x(1, 0, 0) + z(0, 0, 1) = (x, 0, z),
which is the xz-plane.

Next, we recall the concept of linear independence.

Definition 1.2  Linear Independence
Let V be a vector space, and let v_1, ..., v_k be vectors in V. We say that the set {v_1, ..., v_k} is a linearly independent set of vectors, or the vectors v_1, ..., v_k are linearly independent, if the only k-tuple of real numbers (c_1, ..., c_k) which satisfies
c_1 v_1 + · · · + c_k v_k = 0
is the trivial k-tuple (c_1, ..., c_k) = (0, ..., 0).

Example 1.2
In Rn, the standard unit vectors e_1, ..., e_n are linearly independent.

Example 1.3
If V is a vector space, a vector v in V is linearly independent if and only if v ≠ 0.

Example 1.4
Let V be a vector space. Two vectors u and v in V are linearly independent if and only if u ≠ 0, v ≠ 0, and there does not exist a constant α such that v = αu.

Let us recall the following definition for two vectors to be parallel.

Definition 1.3  Parallel Vectors
Let V be a vector space. Two vectors u and v in V are parallel if either u = 0 or there exists a constant α such that v = αu.

In other words, two vectors u and v in V are linearly independent if and only if they are not parallel.

Example 1.5
If S = {v_1, ..., v_k} is a linearly independent set of vectors, then for any S′ ⊂ S, S′ is also a linearly independent set of vectors.

Now we discuss the concept of dimension and basis.

Definition 1.4  Dimension and Basis
Let V be a vector space, and let W be a subspace of V. If W can be spanned by k linearly independent vectors v_1, ..., v_k in V, we say that W has dimension k. The set {v_1, ..., v_k} is called a basis of W.

Example 1.6
In Rn, the n standard unit vectors e_1, ..., e_n are linearly independent and they span Rn.
Hence, the dimension of Rn is n.

Example 1.7
In R3, the subspace spanned by the two linearly independent vectors e_1 = (1, 0, 0) and e_3 = (0, 0, 1) has dimension 2.

Next, we introduce the translate of a set.

Definition 1.5  Translate of a Set
If A is a subset of Rn and u is a point in Rn, the translate of the set A by the vector u is the set
A + u = {a + u | a ∈ A}.

Example 1.8
In R3, the translate of the set A = {(x, y, 0) | x, y ∈ R} by the vector u = (0, 0, −2) is the set B = A + u = {(x, y, −2) | x, y ∈ R}.

In Rn, the lines and the planes are of particular interest. They are closely related to the concept of subspaces.

Definition 1.6  Lines in Rn
A line L in Rn is a translate of a subspace of Rn that has dimension 1. As a set, it contains all the points x of the form
x = x_0 + tv,  t ∈ R,
where x_0 is a fixed point in Rn, and v is a nonzero vector in Rn. The equation x = x_0 + tv, t ∈ R, is known as the parametric equation of the line.

A line is determined by two points.

Example 1.9
Given two distinct points x_1 and x_2 in Rn, the line L that passes through these two points has parametric equation
x = x_1 + t(x_2 − x_1),  t ∈ R.
When 0 ≤ t ≤ 1, x = x_1 + t(x_2 − x_1) describes all the points on the line segment with x_1 and x_2 as endpoints.

Figure 1.1: A line between two points.

Definition 1.7  Planes in Rn
A plane W in Rn is a translate of a subspace of dimension 2. As a set, it contains all the points x of the form
x = x_0 + t_1 v_1 + t_2 v_2,  t_1, t_2 ∈ R,
where x_0 is a fixed point in Rn, and v_1 and v_2 are two linearly independent vectors in Rn.

Besides being a real vector space, Rn has an additional structure, whose definition is motivated as follows. Let P(x_1, x_2, x_3) and Q(y_1, y_2, y_3) be two points in R3. By the Pythagorean theorem, the distance between P and Q is given by
PQ = √((x_1 − y_1)^2 + (x_2 − y_2)^2 + (x_3 − y_3)^2).

Figure 1.2: Distance between two points in R2.
Consider the triangle OPQ with vertices O, P, Q, where O is the origin. Then
OP = √(x_1^2 + x_2^2 + x_3^2),  OQ = √(y_1^2 + y_2^2 + y_3^2).
Let θ be the minor angle between OP and OQ. By the cosine rule,
PQ^2 = OP^2 + OQ^2 − 2 × OP × OQ × cos θ.
A straightforward computation gives
OP^2 + OQ^2 − PQ^2 = 2(x_1 y_1 + x_2 y_2 + x_3 y_3).

Figure 1.3: Cosine rule.

Hence,
cos θ = (x_1 y_1 + x_2 y_2 + x_3 y_3) / ( √(x_1^2 + x_2^2 + x_3^2) √(y_1^2 + y_2^2 + y_3^2) ).   (1.1)
It is the quotient of x_1 y_1 + x_2 y_2 + x_3 y_3 by the product of the lengths of OP and OQ.

Generalizing the expression x_1 y_1 + x_2 y_2 + x_3 y_3 from R3 to Rn defines the dot product. For any two vectors x = (x_1, x_2, ..., x_n) and y = (y_1, y_2, ..., y_n) in Rn, the dot product of x and y is defined as
x · y = Σ_{i=1}^n x_i y_i = x_1 y_1 + x_2 y_2 + · · · + x_n y_n.
This is a special case of an inner product.

Definition 1.8  Inner Product Space
A real vector space V is an inner product space if for any two vectors u and v in V, an inner product ⟨u, v⟩ of u and v is defined, and the following conditions are satisfied for any u, v, w in V and α, β ∈ R.
1. ⟨u, v⟩ = ⟨v, u⟩.
2. ⟨αu + βv, w⟩ = α⟨u, w⟩ + β⟨v, w⟩.
3. ⟨v, v⟩ ≥ 0, and ⟨v, v⟩ = 0 if and only if v = 0.

Proposition 1.2  Euclidean Inner Product on Rn
On Rn,
⟨x, y⟩ = x · y = Σ_{i=1}^n x_i y_i = x_1 y_1 + x_2 y_2 + · · · + x_n y_n
defines an inner product, called the standard inner product or the Euclidean inner product.

Definition 1.9  Euclidean Space
The vector space Rn with the Euclidean inner product is called the Euclidean n-space. In the future, when we do not specify, Rn always means the Euclidean n-space.

One can deduce some useful identities from the three axioms of an inner product space.

Proposition 1.3
If V is an inner product space, then the following hold.
(a) For any v ∈ V, ⟨0, v⟩ = 0 = ⟨v, 0⟩.
(b) For any vectors v_1, ..., v_k, w_1, ..., w_l in V, and for any real numbers α_1, ..., α_k, β_1, ..., β_l,
⟨ Σ_{i=1}^k α_i v_i, Σ_{j=1}^l β_j w_j ⟩ = Σ_{i=1}^k Σ_{j=1}^l α_i β_j ⟨v_i, w_j⟩.
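The dot product and the angle formula (1.1) are easy to experiment with numerically. The following Python sketch is only an illustration; the helper names `dot`, `norm` and `angle` are ours, not notation from the text.

```python
import math

def dot(x, y):
    # Euclidean inner product: the sum of componentwise products
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    # ||x|| = sqrt(<x, x>)
    return math.sqrt(dot(x, x))

def angle(x, y):
    # Formula (1.1): cos(theta) = <x, y> / (||x|| ||y||), for nonzero x and y
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

x = (1.0, 0.0, 0.0)
y = (1.0, 1.0, 0.0)
print(dot(x, y))    # 1.0
print(angle(x, y))  # pi/4, about 0.7854
```

Here ⟨x, y⟩ = 1, ∥x∥ = 1 and ∥y∥ = √2, so cos θ = 1/√2 and θ = π/4, matching the geometric picture of the two vectors in the xy-plane.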
Given that V is an inner product space, ⟨v, v⟩ ≥ 0 for any v in V. For example, for any x = (x_1, x_2, ..., x_n) in Rn, under the Euclidean inner product,
⟨x, x⟩ = Σ_{i=1}^n x_i^2 = x_1^2 + x_2^2 + · · · + x_n^2 ≥ 0.
When n = 3, the length of the vector OP from the point O(0, 0, 0) to the point P(x_1, x_2, x_3) is
OP = √(x_1^2 + x_2^2 + x_3^2) = √⟨x, x⟩,
where x = (x_1, x_2, x_3). This motivates us to define the norm of a vector in an inner product space as follows.

Definition 1.10  Norm of a Vector
Given that V is an inner product space, the norm of a vector v is defined as
∥v∥ = √⟨v, v⟩.

The norm of a vector in an inner product space satisfies some properties, which follow from the axioms for an inner product space.

Proposition 1.4
Let V be an inner product space.
1. For any v in V, ∥v∥ ≥ 0, and ∥v∥ = 0 if and only if v = 0.
2. For any α ∈ R and v ∈ V, ∥αv∥ = |α| ∥v∥.

Motivated by the distance between two points in R3, we make the following definition.

Definition 1.11  Distance Between Two Points
Given that V is an inner product space, the distance between u and v in V is defined as
d(u, v) = ∥v − u∥ = √⟨v − u, v − u⟩.

For example, the distance between the points x = (x_1, ..., x_n) and y = (y_1, ..., y_n) in the Euclidean space Rn is
d(x, y) = √( Σ_{i=1}^n (x_i − y_i)^2 ) = √( (x_1 − y_1)^2 + · · · + (x_n − y_n)^2 ).

For analysis in R, an important inequality is the triangle inequality, which says that |x + y| ≤ |x| + |y| for any x and y in R. To generalize this inequality to Rn, we need the celebrated Cauchy-Schwarz inequality, which holds on any inner product space.

Proposition 1.5  Cauchy-Schwarz Inequality
Given that V is an inner product space, for any u and v in V,
|⟨u, v⟩| ≤ ∥u∥ ∥v∥.
The equality holds if and only if u and v are parallel.

Proof
It is obvious that if either u = 0 or v = 0, then |⟨u, v⟩| = 0 = ∥u∥ ∥v∥, and so the equality holds. Now assume that both u and v are nonzero vectors.
Consider the quadratic function f : R → R defined by
f(t) = ∥tu − v∥^2 = ⟨tu − v, tu − v⟩.
Notice that f(t) = at^2 + bt + c, where
a = ⟨u, u⟩ = ∥u∥^2,  b = −2⟨u, v⟩,  c = ⟨v, v⟩ = ∥v∥^2.
The third axiom of an inner product says that f(t) ≥ 0 for all t ∈ R. Hence, we must have b^2 − 4ac ≤ 0. This gives
⟨u, v⟩^2 ≤ ∥u∥^2 ∥v∥^2.
Thus, we obtain the Cauchy-Schwarz inequality
|⟨u, v⟩| ≤ ∥u∥ ∥v∥.
The equality holds if and only if b^2 − 4ac = 0. The latter means that f(t) = 0 for some t = α, which can happen if and only if αu − v = 0, or equivalently, v = αu.

Now we can prove the triangle inequality.

Proposition 1.6  Triangle Inequality
Let V be an inner product space. For any vectors v_1, v_2, ..., v_k in V,
∥v_1 + v_2 + · · · + v_k∥ ≤ ∥v_1∥ + ∥v_2∥ + · · · + ∥v_k∥.

Proof
It is sufficient to prove the statement when k = 2. The general case follows from induction. Given v_1 and v_2 in V,
∥v_1 + v_2∥^2 = ⟨v_1 + v_2, v_1 + v_2⟩ = ⟨v_1, v_1⟩ + 2⟨v_1, v_2⟩ + ⟨v_2, v_2⟩ ≤ ∥v_1∥^2 + 2∥v_1∥ ∥v_2∥ + ∥v_2∥^2 = (∥v_1∥ + ∥v_2∥)^2.
This proves that ∥v_1 + v_2∥ ≤ ∥v_1∥ + ∥v_2∥.

From the triangle inequality, we can deduce the following.

Corollary 1.7
Let V be an inner product space. For any vectors u and v in V,
| ∥u∥ − ∥v∥ | ≤ ∥u − v∥.

Expressed in terms of distances, the triangle inequality takes the following form.

Proposition 1.8  Triangle Inequality
Let V be an inner product space. For any three points v_1, v_2, v_3 in V,
d(v_1, v_2) ≤ d(v_1, v_3) + d(v_2, v_3).
More generally, if v_1, v_2, ..., v_k are k vectors in V, then
d(v_1, v_k) ≤ Σ_{i=2}^k d(v_{i−1}, v_i) = d(v_1, v_2) + · · · + d(v_{k−1}, v_k).

Since we can define the distance function on an inner product space, an inner product space is a special case of a metric space.

Definition 1.12  Metric Space
Let X be a set, and let d : X × X → R be a function defined on X × X. We say that d is a metric on X provided that the following conditions are satisfied.
1. For any x and y in X, d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y.
2. d(x, y) = d(y, x) for any x and y in X.
3.
For any x, y and z in X, d(x, y) ≤ d(x, z) + d(y, z).

If d is a metric on X, we say that (X, d) is a metric space.

Metric spaces play important roles in advanced analysis. If V is an inner product space, it is a metric space with metric d(u, v) = ∥v − u∥.

Using the Cauchy-Schwarz inequality, one can generalize the concept of angles to any two vectors in a real inner product space. If u and v are two nonzero vectors in a real inner product space V, the Cauchy-Schwarz inequality implies that
⟨u, v⟩ / (∥u∥ ∥v∥)
is a real number between −1 and 1. Generalizing the formula (1.1), we define the angle θ between u and v as
θ = cos^{−1} ( ⟨u, v⟩ / (∥u∥ ∥v∥) ).
This is an angle between 0° and 180°. A necessary and sufficient condition for two vectors u and v to make a 90° angle is ⟨u, v⟩ = 0.

Definition 1.13  Orthogonality
Let V be a real inner product space. We say that the two vectors u and v in V are orthogonal if ⟨u, v⟩ = 0.

Lemma 1.9  Generalized Pythagoras Theorem
Let V be an inner product space. If u and v are orthogonal vectors in V, then
∥u + v∥^2 = ∥u∥^2 + ∥v∥^2.

Now we discuss the projection theorem.

Theorem 1.10  Projection Theorem
Let V be an inner product space, and let w be a nonzero vector in V. If v is a vector in V, there is a unique way to write v as a sum of two vectors v_1 and v_2, such that v_1 is parallel to w and v_2 is orthogonal to w. Moreover, for any real number α,
∥v − αw∥ ≥ ∥v − v_1∥,
and the equality holds if and only if α is equal to the unique real number β such that v_1 = βw.

Figure 1.4: The projection theorem.

Proof
Assume that v can be written as a sum of two vectors v_1 and v_2, such that v_1 is parallel to w and v_2 is orthogonal to w. Since w is nonzero, there is a real number β such that v_1 = βw. Since v_2 = v − v_1 = v − βw is orthogonal to w, we have
0 = ⟨v − βw, w⟩ = ⟨v, w⟩ − β⟨w, w⟩.
This implies that we must have
β = ⟨v, w⟩ / ⟨w, w⟩,
and
v_1 = (⟨v, w⟩ / ⟨w, w⟩) w,  v_2 = v − (⟨v, w⟩ / ⟨w, w⟩) w.
It is easy to check that v_1 and v_2 given by these formulas indeed satisfy the requirements that v_1 is parallel to w and v_2 is orthogonal to w. This establishes the existence and uniqueness of v_1 and v_2.

Now for any real number α,
v − αw = v − v_1 + (β − α)w.
Since v − v_1 = v_2 is orthogonal to (β − α)w, the generalized Pythagoras theorem implies that
∥v − αw∥^2 = ∥v − v_1∥^2 + ∥(β − α)w∥^2 ≥ ∥v − v_1∥^2.
This proves that ∥v − αw∥ ≥ ∥v − v_1∥. The equality holds if and only if
∥(β − α)w∥ = |α − β| ∥w∥ = 0.
Since ∥w∥ ≠ 0, we must have α = β.

The vector v_1 in this theorem is called the projection of v onto the subspace spanned by w. There is a more general projection theorem where the subspace W spanned by w is replaced by a general subspace. We say that a vector v is orthogonal to the subspace W if it is orthogonal to each vector w in W.

Theorem 1.11  General Projection Theorem
Let V be an inner product space, and let W be a finite dimensional subspace of V. If v is a vector in V, there is a unique way to write v as a sum of two vectors v_1 and v_2, such that v_1 is in W and v_2 is orthogonal to W. The vector v_1 is denoted by proj_W v. For any w ∈ W,
∥v − w∥ ≥ ∥v − proj_W v∥,
and the equality holds if and only if w = proj_W v.

Sketch of Proof
If W is a k-dimensional vector space, it has a basis consisting of k linearly independent vectors w_1, ..., w_k. Since the vector v_1 is in W, there are constants c_1, ..., c_k such that
v_1 = c_1 w_1 + · · · + c_k w_k.
The condition that v_2 = v − v_1 is orthogonal to W gives rise to the k equations
c_1⟨w_1, w_1⟩ + · · · + c_k⟨w_k, w_1⟩ = ⟨v, w_1⟩,
...
c_1⟨w_1, w_k⟩ + · · · + c_k⟨w_k, w_k⟩ = ⟨v, w_k⟩.   (1.2)
Using the fact that w_1, ..., w_k are linearly independent, one can show that the k × k matrix
A = [ ⟨w_1, w_1⟩ · · · ⟨w_k, w_1⟩ ; ... ; ⟨w_1, w_k⟩ · · · ⟨w_k, w_k⟩ ]
is invertible. This shows that there is a unique c = (c_1, ..., c_k) satisfying the linear system (1.2).
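The formulas in the proof of Theorem 1.10 can be checked numerically. The sketch below is only an illustration under the Euclidean inner product; the helper names `dot` and `proj` are ours.

```python
def dot(x, y):
    # Euclidean inner product on R^n
    return sum(a * b for a, b in zip(x, y))

def proj(v, w):
    # v1 = beta * w with beta = <v, w> / <w, w>  (w must be nonzero)
    beta = dot(v, w) / dot(w, w)
    return tuple(beta * wi for wi in w)

v = (3.0, 4.0)
w = (1.0, 0.0)
v1 = proj(v, w)                           # (3.0, 0.0), parallel to w
v2 = tuple(a - b for a, b in zip(v, v1))  # (0.0, 4.0)
print(dot(v2, w))  # 0.0: the remainder v2 is orthogonal to w
```

With these values β = 3, so v_1 = 3w and v_2 = v − v_1 is orthogonal to w, exactly as the theorem asserts.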
If V is an inner product space, a basis that consists of mutually orthogonal vectors is of special interest.

Definition 1.14  Orthogonal Set and Orthonormal Set
Let V be an inner product space. A subset of vectors S = {u_1, ..., u_k} is called an orthogonal set if any two distinct vectors u_i and u_j in S are orthogonal. Namely,
⟨u_i, u_j⟩ = 0 if i ≠ j.
S is called an orthonormal set if it is an orthogonal set of unit vectors. Namely,
⟨u_i, u_j⟩ = 0 if i ≠ j, and ⟨u_i, u_i⟩ = 1 for each i.

If S = {u_1, ..., u_k} is an orthogonal set of nonzero vectors, it is a linearly independent set of vectors. One can construct an orthonormal set from it by normalizing each vector in the set.

There is a standard algorithm, known as the Gram-Schmidt process, which can turn any linearly independent set of vectors {v_1, ..., v_k} into an orthogonal set {u_1, ..., u_k} of nonzero vectors. We start with the following lemma.

Lemma 1.12
Let V be an inner product space, and let S = {u_1, ..., u_k} be an orthogonal set of nonzero vectors in V that spans the subspace W. Given any vector v in V,
proj_W v = Σ_{i=1}^k (⟨v, u_i⟩ / ⟨u_i, u_i⟩) u_i.

Proof
By the general projection theorem, v = v_1 + v_2, where v_1 = proj_W v is in W and v_2 is orthogonal to W. Since S is a basis for W, there exist scalars c_1, c_2, ..., c_k such that
v_1 = c_1 u_1 + · · · + c_k u_k.
Therefore,
v = c_1 u_1 + · · · + c_k u_k + v_2.
Since S is an orthogonal set of vectors and v_2 is orthogonal to each u_i, we find that for 1 ≤ i ≤ k,
⟨v, u_i⟩ = c_i ⟨u_i, u_i⟩.
This proves the lemma.

Theorem 1.13  Gram-Schmidt Process
Let V be an inner product space, and assume that S = {v_1, ..., v_k} is a linearly independent set of vectors in V. Define the vectors u_1, ..., u_k inductively by u_1 = v_1, and for 2 ≤ j ≤ k,
u_j = v_j − Σ_{i=1}^{j−1} (⟨v_j, u_i⟩ / ⟨u_i, u_i⟩) u_i.
Then S′ = {u_1, ..., u_k} is an orthogonal set of nonzero vectors. Moreover, for each 1 ≤ j ≤ k, the set {u_i | 1 ≤ i ≤ j} spans the same subspace as the set {v_i | 1 ≤ i ≤ j}.
Sketch of Proof
For 1 ≤ j ≤ k, let W_j be the subspace spanned by the set {v_i | 1 ≤ i ≤ j}. The vectors u_1, ..., u_k are constructed by letting u_1 = v_1, and for 2 ≤ j ≤ k,
u_j = v_j − proj_{W_{j−1}} v_j.
Since {v_1, ..., v_j} is a linearly independent set, u_j ≠ 0. Using induction, one can show that
span {u_1, ..., u_j} = span {v_1, ..., v_j}.
By the projection theorem, u_j is orthogonal to W_{j−1}. Hence, it is orthogonal to u_1, ..., u_{j−1}. This proves the theorem.

A mapping between two vector spaces that respects the linear structures is called a linear transformation.

Definition 1.15  Linear Transformation
Let V and W be real vector spaces. A mapping T : V → W is called a linear transformation provided that for any v_1, ..., v_k in V and any real numbers c_1, ..., c_k,
T(c_1 v_1 + · · · + c_k v_k) = c_1 T(v_1) + · · · + c_k T(v_k).

Linear transformations play important roles in multivariable analysis. In the following, we first define a special class of linear transformations associated to special projections. For 1 ≤ i ≤ n, let L_i be the subspace of Rn spanned by the unit vector e_i. For the point x = (x_1, ..., x_n),
proj_{L_i} x = x_i e_i.
The number x_i is the ith component of x. The mapping from x to x_i is a function from Rn to R, which will play an important role later.

Definition 1.16  Projection Functions
For 1 ≤ i ≤ n, the ith projection function on Rn is the function π_i : Rn → R defined by
π_i(x) = π_i(x_1, ..., x_n) = x_i.

Figure 1.5: The projection functions.

The following is obvious.

Proposition 1.14
For 1 ≤ i ≤ n, the ith projection function on Rn is a linear transformation. Namely, for any x_1, ..., x_k in Rn and any real numbers c_1, ..., c_k,
π_i(c_1 x_1 + · · · + c_k x_k) = c_1 π_i(x_1) + · · · + c_k π_i(x_k).

The following is a useful inequality.

Proposition 1.15
Let x be a vector in Rn. Then |π_i(x)| ≤ ∥x∥.

At the end of this section, let us introduce the concept of hyperplanes.
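Before moving on to hyperplanes, the Gram-Schmidt process of Theorem 1.13 can be sketched numerically. This is only an illustration under the Euclidean inner product; the helper names `dot` and `gram_schmidt` are ours.

```python
def dot(x, y):
    # Euclidean inner product on R^n
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vs):
    # Turn a linearly independent list vs into an orthogonal list us:
    # u_j = v_j - sum over i < j of (<v_j, u_i> / <u_i, u_i>) u_i
    us = []
    for v in vs:
        u = list(v)
        for w in us:
            c = dot(v, w) / dot(w, w)
            u = [ui - c * wi for ui, wi in zip(u, w)]
        us.append(u)
    return us

u1, u2 = gram_schmidt([(1.0, 1.0, 0.0), (1.0, 0.0, 1.0)])
print(u1)           # [1.0, 1.0, 0.0]
print(u2)           # [0.5, -0.5, 1.0]
print(dot(u1, u2))  # 0.0: the output vectors are orthogonal
```

Here u_1 = v_1 and u_2 = v_2 − (1/2)u_1, and the two output vectors span the same plane as the two inputs, as the theorem guarantees.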
Definition 1.17  Hyperplanes
In Rn, a hyperplane is a translate of a subspace of dimension n − 1. In other words, H is a hyperplane if there is a point x_0 in Rn and n − 1 linearly independent vectors v_1, v_2, ..., v_{n−1} such that H contains all points x of the form
x = x_0 + t_1 v_1 + · · · + t_{n−1} v_{n−1},  (t_1, ..., t_{n−1}) ∈ R^{n−1}.

A hyperplane in R1 is a point. A hyperplane in R2 is a line. A hyperplane in R3 is a plane.

Definition 1.18  Normal Vectors
Let v_1, v_2, ..., v_{n−1} be linearly independent vectors in Rn, and let H be the hyperplane
H = { x_0 + t_1 v_1 + · · · + t_{n−1} v_{n−1} | (t_1, ..., t_{n−1}) ∈ R^{n−1} }.
A nonzero vector n that is orthogonal to all the vectors v_1, ..., v_{n−1} is called a normal vector to the hyperplane. If x_1 and x_2 are two points on H, then n is orthogonal to the vector v = x_2 − x_1. Any two normal vectors of a hyperplane are scalar multiples of each other.

Proposition 1.16
If H is a hyperplane with normal vector n = (a_1, a_2, ..., a_n), and x_0 = (u_1, u_2, ..., u_n) is a point on H, then the equation of H is given by
a_1(x_1 − u_1) + a_2(x_2 − u_2) + · · · + a_n(x_n − u_n) = n · (x − x_0) = 0.
Conversely, any equation of the form
a_1 x_1 + a_2 x_2 + · · · + a_n x_n = b
is the equation of a hyperplane with normal vector n = (a_1, a_2, ..., a_n).

Example 1.10
Given 1 ≤ i ≤ n, the equation x_i = c is a hyperplane with normal vector e_i. It is a hyperplane parallel to the coordinate plane x_i = 0, and perpendicular to the x_i-axis.

Exercises 1.1

Question 1
Let V be an inner product space. If u and v are vectors in V, show that
| ∥u∥ − ∥v∥ | ≤ ∥u − v∥.

Question 2
Let V be an inner product space. If u and v are orthogonal vectors in V, show that
∥u + v∥^2 = ∥u∥^2 + ∥v∥^2.

Question 3
Let V be an inner product space, and let u and v be vectors in V. Show that
⟨u, v⟩ = ( ∥u + v∥^2 − ∥u − v∥^2 ) / 4.

Question 4
Let V be an inner product space, and let {u_1, ..., u_k} be an orthonormal set of vectors in V. For any real numbers α_1, . . .
, α_k, show that
∥α_1 u_1 + · · · + α_k u_k∥^2 = α_1^2 + · · · + α_k^2.

Question 5
Let x_1, x_2, ..., x_n be real numbers. Show that
(a) √(x_1^2 + x_2^2 + · · · + x_n^2) ≤ |x_1| + |x_2| + · · · + |x_n|;
(b) |x_1 + x_2 + · · · + x_n| ≤ √n √(x_1^2 + x_2^2 + · · · + x_n^2).

1.2 Convergence of Sequences in Rn

A point in the Euclidean space Rn is denoted by x = (x_1, x_2, ..., x_n). When n = 1, we just denote it by x. When n = 2 and n = 3, it is customary to denote a point in R2 and R3 by (x, y) and (x, y, z) respectively.

The Euclidean inner product between the vectors x = (x_1, ..., x_n) and y = (y_1, ..., y_n) is
⟨x, y⟩ = x · y = Σ_{i=1}^n x_i y_i.
The norm of x is
∥x∥ = √⟨x, x⟩ = √( Σ_{i=1}^n x_i^2 ),
while the distance between x and y is
d(x, y) = ∥x − y∥ = √( Σ_{i=1}^n (x_i − y_i)^2 ).

A sequence in Rn is a function f : Z+ → Rn. For k ∈ Z+, let a_k = f(k). Then we can also denote the sequence by {a_k}_{k=1}^∞, or simply by {a_k}.

Example 1.11
The sequence {( k/(k + 1), (2k + 3)/k )} is a sequence in R2 with
a_k = ( k/(k + 1), (2k + 3)/k ).

In Volume I, we have seen that a sequence of real numbers {a_k}_{k=1}^∞ is said to converge to a real number a provided that for any ε > 0, there is a positive integer K such that |a_k − a| < ε for all k ≥ K. Notice that |a_k − a| is the distance between a_k and a. To define the convergence of a sequence in Rn, we use the Euclidean distance.

Definition 1.19  Convergence of Sequences
A sequence {a_k} in Rn is said to converge to the point a in Rn provided that for any ε > 0, there is a positive integer K so that for all k ≥ K,
∥a_k − a∥ = d(a_k, a) < ε.
If {a_k} is a sequence that converges to a point a, we say that the sequence {a_k} is convergent. A sequence that does not converge to any point in Rn is said to be divergent.

Figure 1.6: The convergence of a sequence.

As in the n = 1 case, we have the following.

Proposition 1.17
A sequence in Rn cannot converge to two different points.
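Definition 1.19 can be explored numerically for the sequence of Example 1.11. The Python sketch below (the helper names are ours) tabulates ∥a_k − a∥ for a = (1, 2) and a few values of k; the distances shrink toward 0, consistent with convergence to (1, 2).

```python
import math

def a(k):
    # The sequence a_k = (k/(k+1), (2k+3)/k) from Example 1.11
    return (k / (k + 1), (2 * k + 3) / k)

def dist(x, y):
    # Euclidean distance d(x, y) = ||x - y||
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

limit = (1.0, 2.0)
for k in (10, 100, 1000):
    # ||a_k - a|| = sqrt(1/(k+1)^2 + 9/k^2), which tends to 0
    print(k, dist(a(k), limit))
```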
Definition 1.20  Limit of a Sequence
If {a_k} is a sequence in Rn that converges to the point a, we call a the limit of the sequence. This can be expressed as
lim_{k→∞} a_k = a.

The following is easy to establish.

Proposition 1.18
Let {a_k} be a sequence in Rn. Then {a_k} converges to a if and only if
lim_{k→∞} ∥a_k − a∥ = 0.

Proof
By definition, the sequence {a_k} converges to a if and only if for any ε > 0, there is a positive integer K so that for all k ≥ K, ∥a_k − a∥ < ε. This is precisely the definition of lim_{k→∞} ∥a_k − a∥ = 0.

As in the n = 1 case, {a_{k_j}}_{j=1}^∞ is a subsequence of {a_k} if k_1, k_2, k_3, ... is a strictly increasing sequence of positive integers.

Corollary 1.19
If {a_k} is a sequence in Rn that converges to the point a, then any subsequence of {a_k} also converges to a.

Example 1.12
Let us investigate the convergence of the sequence {a_k} in R2 with
a_k = ( k/(k + 1), (2k + 3)/k )
that is defined in Example 1.11. Notice that
lim_{k→∞} π_1(a_k) = lim_{k→∞} k/(k + 1) = 1,
lim_{k→∞} π_2(a_k) = lim_{k→∞} (2k + 3)/k = 2.
It is natural for us to speculate that the sequence {a_k} converges to the point a = (1, 2). For k ∈ Z+,
a_k − a = ( −1/(k + 1), 3/k ).
Thus,
∥a_k − a∥ = √( 1/(k + 1)^2 + 9/k^2 ).
By the squeeze theorem, lim_{k→∞} ∥a_k − a∥ = 0. This proves that the sequence {a_k} indeed converges to the point a = (1, 2).

In the example above, we guessed the limit of the sequence by looking at each component of the sequence. This in fact works for any sequence.

Theorem 1.20  Componentwise Convergence of Sequences
A sequence {a_k} in Rn converges to the point a if and only if for each 1 ≤ i ≤ n, the sequence {π_i(a_k)} converges to π_i(a).

Proof
Given 1 ≤ i ≤ n,
π_i(a_k) − π_i(a) = π_i(a_k − a).
Thus,
|π_i(a_k) − π_i(a)| = |π_i(a_k − a)| ≤ ∥a_k − a∥.
If the sequence {a_k} converges to the point a, then lim_{k→∞} ∥a_k − a∥ = 0. By the squeeze theorem,
lim_{k→∞} |π_i(a_k) − π_i(a)| = 0.
This proves that the sequence {π_i(a_k)} converges to π_i(a).
Euclidean Spaces 27 Conversely, assume that for each 1 ≤ i ≤ n, the sequence {πi(ak)} converges to πi(a). Then lim k→∞ |πi(ak)− πi(a)| = 0 for 1 ≤ i ≤ n. Since ∥ak − a∥ ≤ n∑ i=1 |πi(ak − a)| , the squeeze theorem implies that lim k→∞ ∥ak − a∥ = 0. This proves that the sequence {ak} converges to the point a. Theorem 1.20 reduces the investigation of convergence of sequences in Rn to sequences in R. Let us look at a few examples. Example 1.13 Find the following limit. lim k→∞ ( 2k + 1 3k , ( 1 + 1 k )k , k√ k2 + 1 ) . Solution We compute the limit componentwise. lim k→∞ 2k + 1 3k = lim k→∞ [( 2 3 )k + ( 1 3 )k ] = 0 + 0 = 0, lim k→∞ ( 1 + 1 k )k = e, lim k→∞ k√ k2 + 1 = lim k→∞ k k √ 1 + 1 k2 = 1. Chapter 1. Euclidean Spaces 28 Hence, lim k→∞ ( 2k + 1 3k , ( 1 + 1 k )k , k√ k2 + 1 ) = (0, e, 1). Example 1.14 Let {ak} be the sequence with ak = ( (−1)k, (−1)k k ) . Is the sequence convergent? Justify your answer. Solution The sequence {π1(ak)} is the sequence {(−1)k}, which is divergent. Hence, the sequence {ak} is divergent. Using the componentwise convergence theorem, it is easy to establish the following. Proposition 1.21 Linearity Let {ak} and {bk} be sequences in Rn that converge to a and b respectively. For any real numbers α and β, the sequence {αak + βbk} converges to αa+ βb. Namely, lim k→∞ (αak + βbk) = αa+ βb. Example 1.15 If {ak} is a sequence in Rn that converges to a, show that lim k→∞ ∥ak∥ = ∥a∥. Chapter 1. Euclidean Spaces 29 Solution Notice that ∥ak∥ = √ π1(ak)2 + · · ·+ πn(ak)2. For 1 ≤ i ≤ n, lim k→∞ πi(ak) = πi(a). Using limit laws for sequences in R, we have lim k→∞ ( π1(ak) 2 + · · ·+ πn(ak) 2 ) = π1(a) 2 + · · ·+ πn(a) 2. Using the fact that the square root function is continuous, we find that lim k→∞ ∥ak∥ = lim k→∞ √ π1(ak)2 + · · ·+ πn(ak)2 = √ π1(a)2 + · · ·+ πn(a)2 = ∥a∥. There is also a Cauchy criterion for convergence of sequences in Rn. 
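The componentwise computation in Example 1.13 can also be checked numerically. Below is a small sketch, assuming the three components are (2^k + 1)/3^k, (1 + 1/k)^k and k/√(k² + 1) as in the example (the function name `a` is ours): each component at a large index is compared against its claimed limit.

```python
import math

def a(k):
    # a_k = ((2^k + 1)/3^k, (1 + 1/k)^k, k/sqrt(k^2 + 1)), as in Example 1.13
    return ((2**k + 1) / 3**k, (1 + 1 / k) ** k, k / math.sqrt(k**2 + 1))

claimed_limit = (0.0, math.e, 1.0)

# by Theorem 1.20, it suffices to examine each component separately
for i, (component, limit_i) in enumerate(zip(a(10_000), claimed_limit), start=1):
    print(i, abs(component - limit_i))
```

The middle component converges rather slowly (the error behaves like e/(2k)), which is consistent with the familiar rate of the limit defining e.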
Definition 1.21 Cauchy Sequences A sequence {ak} in Rn is a Cauchy sequence if for every ε > 0, there is a positive integer K such that for all l ≥ k ≥ K, ∥al − ak∥ < ε. Theorem 1.22 Cauchy Criterion A sequence {ak} in Rn is convergent if and only if it is a Cauchy sequence. Similar to the n = 1 case, the Cauchy criterion allows us to determine whether a sequence in Rn is convergent without having to guess what is the limit first. Chapter 1. Euclidean Spaces 30 Proof Assume that the sequence {ak} converges to a. Given ε > 0, there is a positive integer K such that for all k ≥ K, ∥ak − a∥ < ε/2. Then for all l ≥ k ≥ K, ∥al − ak∥ ≤ ∥al − a∥+ ∥ak − a∥ < ε. This proves that {ak} is a Cauchy sequence. Conversely, assume that {ak} is a Cauchy sequence. Given ε > 0, there is a positive integer K such that for all l ≥ k ≥ K, ∥al − ak∥ < ε. For each 1 ≤ i ≤ n, |πi(al)− πi(ak)| = |πi (al − ak)| ≤ ∥al − ak∥. Hence, {πi(ak)} is a Cauchy sequence in R. Therefore, it is convergent. By componentwise convergence theorem, the sequence {ak} is convergent. Chapter 1. Euclidean Spaces 31 Exercises 1.2 Question 1 Show that a sequence in Rn cannot converge to two different points. Question 2 Find the limit of the sequence {ak}, where ak = ( 2k + 1 k + 3 , √ 2k2 + k k , ( 1 + 2 k )k ) . Question 3 Let {ak} be the sequence with ak = ( 1 + (−1)k−1k 1 + k , 1 2k ) . Determine whether the sequence is convergent. Question 4 Let {ak} be the sequence with ak = ( k 1 + k , k√ k + 1 ) . Determine whether the sequence is convergent. Question 5 Let {ak} and {bk} be sequences in Rn that converges to a and b respectively. Show that lim k→∞ ⟨ak,bk⟩ = ⟨a,b⟩. Here ⟨x,y⟩ = x · y is the standard inner product on Rn. Chapter 1. Euclidean Spaces 32 Question 6 Suppose that {ak} is a sequence in Rn that converges to a, and {ck} is a sequence of real numbers that converges to c, show that lim k→∞ ckak = ca. 
Question 7 Suppose that {ak} is a sequence of nonzero vectors in Rn that converges to a and a ̸= 0, show that lim k→∞ ak ∥ak∥ = a ∥a∥ . Question 8 Let {ak} and {bk} be sequences in Rn. If {ak} is convergent and {bk} is divergent, show that the sequence {ak + bk} is divergent. Question 9 Suppose that {ak} is a sequence in Rn that converges to a. If r = ∥a∥ ≠ 0, show that there is a positive integer K such that ∥ak∥ > r 2 for all k ≥ K. Question 10 Let {ak} be a sequence in Rn and let b be a point in Rn. Assume that the sequence {ak} does not converge to b. Show that there is an ε > 0 and a subsequence {akj} of {ak} such that ∥akj − b∥ ≥ ε for all j ∈ Z+. Chapter 1. Euclidean Spaces 33 1.3 Open Sets and Closed Sets In volume I, we call an interval of the form (a, b) an open interval. Given a point x in R, a neighbourhood of x is an open interval (a, b) that contains x. Given a subset S of R, we say that x is an interior point of S if there is a neighbourhood of x that is contained in S. We say that S is closed in R provided that if {ak} is a sequence of points in S that converges to a, then a is also in S. These notions describe the topology of R, which is relatively simple. For n ≥ 2, the topological features of Rn are much more complicated. An open interval (a, b) in R can be described as a set of the form B = {x ∈ R | |x− x0| < r} , where x0 = a+ b 2 and r = b− a 2 . Figure 1.7: An open interval. Generalizing this, we define open balls in Rn. Definition 1.22 Open Balls Given x0 in Rn and r > 0, an open ball B(x0, r) of radius r with center at x0 is a subset of Rn of the form B(x0, r) = {x ∈ Rn | ∥x− x0∥ < r} . It consists of all points of Rn whose distance to the center x0 is less than r. Obviously, if 0 < r1 ≤ r2, then B(x0, r1) ⊂ B(x0, r2). The following is a useful lemma for balls with different centers. Chapter 1. Euclidean Spaces 34 Figure 1.8: An open ball. Lemma 1.23 Let x1 be a point in the open ball B(x0, r). Then ∥x1 − x0∥ < r. 
If r1 is a positive number satisfying r1 ≤ r − ∥x1 − x0∥, then the open ball B(x1, r1) is contained in the open ball B(x0, r). Figure 1.9: An open ball containing another open ball with different center. Proof Let x be a point in B(x1, r1). Then ∥x− x1∥ < r1 ≤ r − ∥x1 − x0∥. Chapter 1. Euclidean Spaces 35 By the triangle inequality, ∥x− x0∥ ≤ ∥x− x1∥+ ∥x1 − x0∥ < r. Therefore, x is a point in B(x0, r). This proves the assertion. Now we define open sets in Rn. Definition 1.23 Open Sets Let S be a subset of Rn. We say that S is an open set if for each x ∈ S, there is a ball B(x, r) centered at x that is contained in S. The following example justifies that an open interval of the form (a, b) is an open set. Example 1.16 Let S be the open interval S = (a, b) in R. If x ∈ S, then a < x < b. Hence, x − a and b − x are positive. Let r = min{x − a, b − x}. Then r > 0, r ≤ x− a and r ≤ b− x. These imply that a ≤ x− r < x+ r ≤ b. Hence, B(x, r) = (x− r, x+ r) ⊂ (a, b) = S. This shows that the interval (a, b) is an open set. Figure 1.10: The interval (a, b) is an open set. The following example justifies that an open ball is indeed an open set. Example 1.17 Let S = B(x0, r) be the open ball with center at x0 and radius r > 0 in Rn. Show that S is an open set. Chapter 1. Euclidean Spaces 36 Solution Given x ∈ S, d = ∥x − x0∥ < r. Let r1 = r − d. Then r1 > 0. Lemma 1.23 implies that the ball B(x, r1) is inside S. Hence, S is an open set. Example 1.18 As subsets of Rn, ∅ and Rn are open sets. Example 1.19 A one-point set S = {a} in Rn cannot be open, for there is no r > 0 such that B(a, r) is contained in S. Let us look at some other examples of open sets. Definition 1.24 Open Rectangles A set of the form U = n∏ i=1 (ai, bi) = (a1, b1)× · · · × (an, bn) in Rn, which is a Cartesian product of open bounded intervals, is called an open rectangle. Figure 1.11: A rectangle in R2. Chapter 1. Euclidean Spaces 37 Example 1.20 Let U = n∏ i=1 (ai, bi) be an open rectangle in Rn. 
Show that U is an open set. Solution Let x = (x1, . . . , xn) be a point in U . Then for 1 ≤ i ≤ n, ri = min{xi − ai, bi − xi} > 0 and (xi − ri, xi + ri) ⊂ (ai, bi). Let r = min{r1, . . . , rn}. Then r > 0. We claim that B(x, r) is contained in U . If y ∈ B(x, r), then ∥y − x∥ < r. This implies that |yi − xi| ≤ ∥y − x∥ < r ≤ ri for all 1 ≤ i ≤ n. Hence, yi ∈ (xi − ri, xi + ri) ⊂ (ai, bi) for all 1 ≤ i ≤ n. This proves that y ∈ U , and thus, completes the proof that B(x, r) is contained in U . Therefore, U is an open set. Figure 1.12: An open rectangle is an open set. Chapter 1. Euclidean Spaces 38 Next, we define closed sets. The definition is a straightforward generalization of the n = 1 case. Definition 1.25 Closed Sets Let S be a subset of Rn. We say that S is closed in Rn provided that if {ak} is a sequence of points in S that converges to the point a, the point a is also in S. Example 1.21 As subsets of Rn, ∅ and Rn are closed sets. Since ∅ and Rn are also open, a subset S of Rn can be both open and closed. Example 1.22 Let S = {a} be a one-point set in Rn. A sequence {ak} in S is just the constant sequence where ak = a for all k ∈ Z+. Hence, it converges to a which is in S. Thus, a one-point set S is a closed set. In volume I, we have proved the following. Proposition 1.24 Let I be intervals of the form (−∞, a], [a,∞) or [a, b]. Then I is a closed subset of R. Definition 1.26 Closed Rectangles A set of the form R = n∏ i=1 [ai, bi] = [a1, b1]× · · · × [an, bn] in Rn, which is a cartesian product of closed and bounded intervals, is called a closed rectangle. The following justifies that a closed rectangle is indeed a closed set. Chapter 1. Euclidean Spaces 39 Example 1.23 Let R = n∏ i=1 [ai, bi] = [a1, b1]× · · · × [an, bn] be a closed rectangle in Rn. Show that R is a closed set. Solution Let {ak} be a sequence in R that converges to a point a. For each 1 ≤ i ≤ n, {πi(ak)} is a sequence in [ai, bi] that converges to πi(a). 
Since [ai, bi] is a closed set in R, πi(a) ∈ [ai, bi]. Hence, a is in R. This proves that R is a closed set. It is not true that a set that is not open is closed. Example 1.24 Show that an interval of the form I = (a, b] in R is neither open nor closed. Solution If I is open, since b is in I , there is an r > 0 such that (b − r, b + r) = B(b, r) ⊂ I . But then b+r/2 is a point in (b−r, b+r) but not in I = (a, b], which gives a contradiction. Hence, I is not open. For k ∈ Z+, let ak = a+ b− a k . Then {ak} is a sequence in I that converges to a, but a is not in I . Hence, I is not closed. Thus, we have seen that a subset S of Rn can be both open and closed, and it can also be neither open nor closed. Let us look at some other examples of closed sets. Chapter 1. Euclidean Spaces 40 Definition 1.27 Closed Balls Given x0 in Rn and r > 0, a closed ball of radius r with center at x0 is a subset of Rn of the form CB(x0, r) = {x ∈ Rn | ∥x− x0∥ ≤ r} . It consists of all points of Rn whose distance to the center x0 is less than or equal to r. The following justifies that a closed ball is indeed a closed set. Example 1.25 Given x0 ∈ Rn and r > 0, show that the closed ball CB(x0, r) = {x ∈ Rn | ∥x− x0∥ ≤ r} is a closed set. Solution Let {ak} be a sequence in CB(x0, r) that converges to the point a. Then lim k→∞ ∥ak − a∥ = 0. For each k ∈ Z+, ∥ak − x0∥ ≤ r. By triangle inequality, ∥a− x0∥ ≤ ∥ak − x0∥+ ∥ak − a∥ ≤ r + ∥ak − a∥. Taking the k → ∞ limit, we find that ∥a− x0∥ ≤ r. Hence, a is in CB(x0, r). This proves that CB(x0, r) is a closed set. The following theorem gives the relation between open and closed sets. Chapter 1. Euclidean Spaces 41 Theorem 1.25 Let S be a subset of Rn and let A = Rn \ S be its complement in Rn. Then S is open if and only if A is closed. Proof Assume that S is open. Let {ak} be a sequence in A that converges to the point a. We want to show that a is in A. Assume to the contrary that a is not in A. Then a is in S. 
Since S is open, there is an r > 0 such that B(a, r) is contained in S. Since the sequence {ak} converges to a, there is a positive integer K such that for all k ≥ K, ∥ak − a∥ < r. But then this implies that aK ∈ B(a, r) ⊂ S. This contradicts the fact that aK is in A = Rn \ S. Hence, a must be in A, which proves that A is closed. Conversely, assume that A is closed. We want to show that S is open. Assume to the contrary that S is not open. Then there is a point a in S such that for every r > 0, B(a, r) is not contained in S. For every k ∈ Z+, since B(a, 1/k) is not contained in S, there is a point ak in B(a, 1/k) such that ak is not in S. Thus, {ak} is a sequence in A and ∥ak − a∥ < 1 k . This shows that {ak} converges to a. Since A is closed, a is in A, which contradicts the fact that a is in S. Thus, S must be open. Figure 1.13: A sequence outside an open set cannot converge to a point in the open set. Chapter 1. Euclidean Spaces 42 Next, we consider unions and intersections of sets. Theorem 1.26 1. An arbitrary union of open sets is open. Namely, if {Uα |α ∈ J} is a collection of open sets in Rn, then their union U = ⋃ α∈J Uα is also an open set. 2. A finite intersection of open sets is open. Namely, if V1, . . . , Vk are open sets in Rn, then their intersection V = k⋂ i=1 Vi is also an open set. Proof To prove the first statement, let x be a point in U = ⋃ α∈J Uα. Then there is an α ∈ J such that x is in Uα. Since Uα is open, there is an r > 0 such that B(x, r) ⊂ Uα ⊂ U . Hence, U is open. For the second statement, let x be a point in V = k⋂ i=1 Vi. Then for each 1 ≤ i ≤ k, x is in the open set Vi. Hence, there is an ri > 0 such that B(x, ri) ⊂ Vi. Let r = min{r1, . . . , rk}. Then for 1 ≤ i ≤ k, r ≤ ri and so B(x, r) ⊂ B(x, ri) ⊂ Vi. Hence, B(x, r) ⊂ V . This proves that V is open. As an application of this theorem, let us show that any open interval in R is indeed an open set. Proposition 1.27 Let I be an interval of the form (−∞, a), (a,∞) or (a, b). 
Then I is an open subset of R. Chapter 1. Euclidean Spaces 43 Proof We have shown in Example 1.16 that if I is an interval of the form (a, b), then I is an open subset of R. Now (a,∞) = ∞⋃ k=1 (a, a+ k) is a union of open sets. Hence, (a,∞) is open. In the same way, one can show that an interval of the form (−∞, a) is open. The next example shows that an arbitrary intersection of open sets is not necessarily open. Example 1.26 For k ∈ Z+, let Uk be the open set in R given by Uk = ( −1 k , 1 k ) . Notice that the set U = ∞⋂ k=1 Uk = {0} is a one-point set. Hence, it is not open in R. De Morgan’s law in set theory says that if {Uα |α ∈ J} is a collection of sets in Rn, then Rn \ ⋃ α∈J Uα = ⋂ α∈J (Rn \ Uα) , Rn \ ⋂ α∈J Uα = ⋃ α∈J (Rn \ Uα) . Thus, we obtain the counterpart of Theorem 1.26 for closed sets. Chapter 1. Euclidean Spaces 44 Theorem 1.28 1. An arbitrary intersection of closed sets is closed. Namely, if {Aα |α ∈ J} is a collection of closed sets in Rn, then their intersection A = ⋂ α∈J Aα is also a closed set. 2. A finite union of closed sets is closed. Namely, if C1, . . . , Ck are closed sets in Rn, then their union C = k⋃ i=1 Ci is also a closed set. Proof We prove the first statement. The proof of the second statement is similar. Given that {Aα |α ∈ J} is a collection of closed sets in Rn, for each α ∈ J , let Uα = Rn \ Aα. Then {Uα |α ∈ J} is a collection of open sets in Rn. By Theorem 1.26, the set ⋃ α∈J Uα is open. By Theorem 1.25, Rn \ ⋃ α∈J Uα is closed. By De Morgan’s law, Rn \ ⋃ α∈J Uα = ⋂ α∈J (Rn \ Uα) = ⋂ α∈J Aα. This proves that ⋂ α∈J Aα is a closed set. The following example says that any finite point set is a closed set. Example 1.27 Let S = {x1, . . . ,xk} be a finite point set in Rn. Then S = k⋃ i=1 {xi} is a finite union of one-point sets. Since each one-point set is closed, S is closed. Chapter 1. Euclidean Spaces 45 Exercises 1.3 Question 1 Let A be the subset of R2 given by A = {(x, y) |x > 0, y > 0} . Show that A is an open set. 
Question 2 Let A be the subset of R2 given by A = {(x, y) |x ≥ 0, y ≥ 0} . Show that A is a closed set. Question 3 Let A be the subset of R2 given by A = {(x, y) |x > 0, y ≥ 0} . Is A open? Is A closed? Justify your answers. Question 4 Let C and U be subsets of Rn. Assume that C is closed and U is open, show that U \ C is open and C \ U is closed. Question 5 Let A be a subset of Rn, and let B = A + u be the translate of A by the vector u. (a) Show that A is open if and only if B is open. (b) Show that A is closed if and only if B is closed. Chapter 1. Euclidean Spaces 46 1.4 Interior, Exterior, Boundary and Closure First, we introduce the interior of a set. Definition 1.28 Interior Let S be a subset of Rn. We say that x ∈ Rn is an interior point of S if there exists r > 0 such that B(x, r) ⊂ S. The interior of S, denoted by intS, is defined to be the collection of all the interior points of S. Figure 1.14: The interior point of a set. The following gives a characterization of the interior of a set. Theorem 1.29 Let S be a subset of Rn. Then we have the followings. 1. intS is a subset of S. 2. intS is an open set. 3. S is an open set if and only if S = intS. 4. If U is an open set that is contained in S, then U ⊂ intS. These imply that intS is the largest open set that is contained in S. Chapter 1. Euclidean Spaces 47 Proof Let x be a point in intS. By definition, there exists r > 0 such that B(x, r) ⊂ S. Since x ∈ B(x, r) and B(x, r) ⊂ S, x is a point in S. Since we have shown that every point in intS is in S, intS is a subset of S. If y ∈ B(x, r), Lemma 1.23 says that there is an r′ > 0 such that B(y, r′) ⊂ B(x, r) ⊂ S. Hence, y is also in intS. This proves thatB(x, r) is contained in intS. Since we have shown that for any x ∈ intS, there is an r > 0 such that B(x, r) is contained in intS, this shows that intS is open. If S = intS, S is open. Conversely, if S is open, for every x in S, there is an r > 0 such that B(x, r) ⊂ S. Then x is in intS. Hence, S ⊂ intS. 
Since we have shown that intS ⊂ S is always true, we conclude that if S is open, S = intS. If U is a subset of S and U is open, for every x in U , there is an r > 0 such that B(x, r) ⊂ U . But then B(x, r) ⊂ S. This shows that x is in intS. Since every point of U is in intS, this proves that U ⊂ intS. Example 1.28 Find the interior of each of the following subsets of R. (a) A = (a, b) (b) B = (a, b] (c) C = [a, b] (d) Q Solution (a) Since A is an open set, intA = A = (a, b). (b) Since A is an open set that is contained in B, A = (a, b) is contained in intB. Since intB ⊂ B, it only remains to determine whether b is in intB. The same argument as given in Example 1.24 shows that b is not an interior point of B. Hence, intB = A = (a, b). Chapter 1. Euclidean Spaces 48 (c) Similar arguments as given in (b) show that A ⊂ intC, and both a and b are not interior points of C. Hence, intC = A = (a, b). (d) For any x ∈ R and any r > 0, B(x, r) = (x − r, x + r) contains an irrational number. Hence, B(x, r) is not contained in Q. This shows that Q does not have interior points. Hence, intQ = ∅. Definition 1.29 Neighbourhoods Let x be a point in Rn and let U be a subset of Rn. We say that U is a neighbourhood of x if U is an open set that contains x. Notice that this definition is slightly different from the one we use in volume I for the n = 1 case. Neighbourhoods By definition, if U is a neighbourhood of x, then x is an interior point of U , and there is an r > 0 such that B(x, r) ⊂ U . Example 1.29 Consider the point x = (1, 2) and the sets U = { (x1, x2) |x21 + x22 < 9 } , V = {(x1, x2) | 0 < x1 < 2,−1 < x2 < 3} in R2. The sets U and V are neighbourhoods of x. Next, we introduce the exterior and boundary of a set. Definition 1.30 Exterior Let S be a subset of Rn. We say that x ∈ Rn is an exterior point of S if there exists r > 0 such that B(x, r) ⊂ Rn \ S. The exterior of S, denoted by extS, is defined to be the collection of all the exterior points of S. Chapter 1. 
Euclidean Spaces 49 Figure 1.15: The sets U and V are neighbourhoods of the point x. Definition 1.31 Boundary Let S be a subset of Rn. We say that x ∈ Rn is a boundary point of S if for every r > 0, the ball B(x, r) intersects both S and Rn \S. The boundary of S, denoted by bdS or ∂S, is defined to be the collection of all the boundary points of S. Figure 1.16: P is an interior point, Q is an exterior point, E is a boundary point. Chapter 1. Euclidean Spaces 50 Theorem 1.30 Let S be a subset of Rn. We have the followings. (a) ext (S) = int (Rn \ S). (b) bd (S) = bd (Rn \ S). (c) intS, extS and bdS are mutually disjoint sets. (d) Rn = intS ∪ extS ∪ bdS. Proof (a) and (b) are obvious from definitions. For parts (c) and (d), we notice that for a point x ∈ Rn, exactly one of the following three statements holds. (i) There exists r > 0 such that B(x, r) ⊂ S. (ii) There exists r > 0 such that B(x, r) ⊂ Rn \ S. (iii) For every r > 0, B(x, r) intersects both S and Rn \ S. Thus, intS, extS and bdS are mutually disjoint sets, and their union is Rn. Example 1.30 Find the exterior and boundary of each of the following subsets of R. (a) A = (a, b) (b) B = (a, b] (c) C = [a, b] (d) Q Solution We have seen in Example 1.28 that intA = intB = intC = (a, b). Chapter 1. Euclidean Spaces 51 For any r > 0, the ball B(a, r) = (a − r, a + r) contains a point less than a, and a point larger than a. Hence, a is a boundary point of the sets A, B and C. Similarly, b is a boundary point of the sets A, B and C. For every point x which satisfies x < a, let r = a − x. Then r > 0. Since x+r = a, the ballB(x, r) = (x−r, x+r) is contained in (−∞, a). Hence, x is an exterior point of the sets A, B and C. Similarly every point x such that x > b is an exterior point of the sets A, B and C. Since the interior, exterior and boundary of a set in R are three mutually disjoint sets whose union is R, we conclude that bdA = bdB = bdC = {a, b}, extA = extB = extC = (−∞, a) ∪ (b,∞). 
For every x ∈ R and every r > 0, the ball B(x, r) = (x−r, x+r) contains a point in Q and a point not in Q. Therefore, x is a boundary point of Q. This shows that bdQ = R, and thus, extQ = ∅. Example 1.31 Let A = B(x0, r), where x0 is a point in Rn, and r is a positive number. Find the interior, exterior and boundary of A. Solution We have shown that A is open. Hence, intA = A. Let U = {x ∈ Rn | ∥x− x0∥ > r} , C = {x ∈ Rn | ∥x− x0∥ = r} . Notice that A, U and C are mutually disjoint sets whose union is Rn. If x is in U , d = ∥x−x0∥ > r. Let r′ = d−r. Then r′ > 0. If y ∈ B(x, r′), then ∥y − x∥ < r′. It follows that ∥y − x0∥ ≥ ∥x− x0∥ − ∥y − x∥ > d− r′ = r. This proves that y ∈ U . Hence, B(x, r′) ⊂ U ⊂ Rn \ A, which shows that x is an exterior point of A. Thus, U ⊂ extA. Chapter 1. Euclidean Spaces 52 Now if x ∈ C, ∥x − x0∥ = r. For every r′ > 0, let a = 1 2 min{r′/r, 1}. Then a ≤ 1 2 and a ≤ r′ 2r . Consider the point v = x− a(x− x0). Notice that ∥v − x∥ = ar ≤ r′ 2 < r′. Thus, v is in B(x, r′). On the other hand, ∥v − x0∥ = (1− a)r < r. Thus, v is in A. This shows that B(x, r′) intersects A. Since x is in B(x, r′) but not in A, we find that B(x, r′) intersects Rn \ A. Hence, x is a boundary point of A. This shows that C ⊂ bdA. Since intA, extA and bdA are mutually disjoint sets, we conclude that intA = A, extA = U and bdA = C. Now we introduce the closure of a set. Definition 1.32 Closure Let S be a subset of Rn. The closure of S, denoted by S, is defined as S = intS ∪ bdS. Example 1.32 Example 1.31 shows that the closure of the open ball B(x0, r) is the closed ball CB(x0, r). Example 1.33 Consider the sets A = (a, b), B = (a, b] and C = [a, b] in Example 1.28 and Example 1.30. We have shown that intA = intB = intC = (a, b), and bdA = bdB = bdC = {a, b}. Therefore, A = B = C = [a, b]. Chapter 1. Euclidean Spaces 53 Since Rn is a disjoint union of intS, bdS and extS, we obtain the following immediately from the definition. Theorem 1.31 Let S be a subset of Rn. 
Then S and extS are complements of each other in Rn. The following theorem gives a characterization of the closure of a set. Theorem 1.32 Let S be a subset of Rn, and let x be a point in Rn. The following statements are equivalent. (a) x ∈ S. (b) For every r > 0, B(x, r) intersects S. (c) There is a sequence {xk} in S that converges to x. Proof If x is in S, x is not in int (Rn \ S). Thus, for every r > 0, B(x, r) is not contained in Rn \ S. Then it must intersect S. This proves (a) implies (b). If (b) holds, for every k ∈ Z+, take r = 1/k. The ball B(x, 1/k) intersects S at some point xk. This gives a sequence {xk} satisfying ∥xk − x∥ < 1 k . Thus, {xk} is a sequence in S that converges to x. This proves (b) implies (c). If (c) holds, for every r > 0, there is a positive integer K such that for all k ≥ K, ∥xk − x∥ < r, and thus xk ∈ B(x, r). This shows that B(x, r) is not contained in Rn \ S. Hence, x /∈ extS, and thus we must have x ∈ S. This proves (c) implies (a). The following theorem gives further properties of the closure of a set. Chapter 1. Euclidean Spaces 54 Theorem 1.33 Let S be a subset of Rn. 1. S is a closed set that contains S. 2. S is closed if and only if S = S. 3. If C is a closed subset of Rn and S ⊂ C, then S ⊂ C. These imply that S is the smallest closed set that contains S. Proof These statements are counterparts of the statements in Theorem 1.29. Since extS = int (Rn \ S), and the interior of a set is open, extS is open. Since S = Rn \ extS, S is a closed set. Since extS ⊂ Rn \ S, we find that S = Rn \ extS ⊃ S. If S = S, then S must be closed since S is closed. Conversely, if S is closed, Rn \ S is open, and so extS = int (Rn \ S) = Rn \ S. It follows that S = Rn \ extS = S. If C is a closed set that contains S, then Rn \ C is an open set that is contained in Rn \S. Thus, Rn \C ⊂ int (Rn \S) = extS. This shows that C ⊃ Rn \ extS = S. Corollary 1.34 If S is a subset of Rn, S = S ∪ bdS. Proof Since intS ⊂ S, S = intS ∪ bdS ⊂ S ∪ bdS. 
Since S and bdS are both subsets of S, S ∪ bdS ⊂ S. This proves that S = S ∪ bdS. Chapter 1. Euclidean Spaces 55 Example 1.34 Let U be the open rectangle U = n∏ i=1 (ai, bi) in Rn. Show that the closure of U is the closed rectangle R = n∏ i=1 [ai, bi]. Solution Since R is a closed set that contains U , U ⊂ R. If x = (x1, . . . , xn) is a point in R, then xi ∈ [ai, bi] for each 1 ≤ i ≤ n. Since [ai, bi] is the closure of (ai, bi) in R, there is a sequence {xi,k}∞k=1 in (ai, bi) that converges to xi. For k ∈ Z+, let xk = (x1,k, . . . , xn,k). Then {xk} is a sequence in U that converges to x. This shows that x ∈ U , and thus completes the proof that U = R. The proof of the following theorem shows the usefulness of the characterization of intS as the largest open set that is contained in S, and S is the smallest closed set that contains S. Theorem 1.35 If A and B are subsets of Rn such that A ⊂ B, then (a) intA ⊂ intB; and (b) A ⊂ B. Proof Since intA is an open set that is contained in A, it is an open set that is contained in B. By the fourth statement in Theorem 1.29, intA ⊂ intB. Since B is a closed set that contains B, it is a closed set that contains A. By the third statement in Theorem 1.33, A ⊂ B. Notice that as subsets of R, (a, b) ⊂ (a, b] ⊂ [a, b]. We have shown that Chapter 1. Euclidean Spaces 56 (a, b) = (a, b] = [a, b]. In general, we have the following. Theorem 1.36 If A and B are subsets of Rn such that A ⊂ B ⊂ A, then A = B. Proof By Theorem 1.35, A ⊂ B implies that A ⊂ B, while B ⊂ A implies that B is contained in A = A. Thus, we have A ⊂ B ⊂ A, which proves that B = A. Example 1.35 In general, if S is a subset of Rn, it is not necessarily true that intS = intS, even when S is an open set. For example, take S = (−1, 0) ∪ (0, 1) in R. Then S is an open set and S = [−1, 1]. Notice that intS = S = (−1, 0) ∪ (0, 1), but intS = (−1, 1). Chapter 1. Euclidean Spaces 57 Exercises 1.4 Question 1 Let S be a subset of Rn. Show that bdS is a closed set. 
Question 2 Let A be the subset of R2 given by A = {(x, y) |x < 0, y ≥ 0} . Find the interior, exterior, boundary and closure of A. Question 3 Let x0 be a point in Rn, and let r be a positive number. Consider the subset of Rn given by A = {x ∈ Rn | 0 < ∥x− x0∥ ≤ r} . Find the interior, exterior, boundary and closure of A. Question 4 Let A be the subset of R2 given by A = {(x, y) | 1 ≤ x < 3,−2 < y ≤ 5} ∪ {(0, 0), (2,−3)}. Find the interior, exterior, boundary and closure of A. Question 5 Let S be a subset of Rn. Show that bdS = S ∩ Rn \ S. Chapter 1. Euclidean Spaces 58 Question 6 Let S be a subset of Rn. Show that bdS ⊂ bdS. Give an example where bdS ̸= bdS. Question 7 Let S be a subset of Rn. (a) Show that S is open if and only if S does not contain any of its boundary points. (b) Show that S is closed if and only if S contains all its boundary points. Question 8 Let S be a subset of Rn, and let x be a point in Rn. (a) Show that x is an interior point of S if and only if there is a neighbourhood of x that is contained in S. (b) Show that x ∈ S if and only if every neighbourhood of x intersects S. (c) Show that x is a boundary point of S if and only if every neighbourhood of x contains a point in S and a point not in S. Question 9 Let S be a subset of Rn, and let x = (x1, . . . , xn) be a point in the interior of S. (a) Show that there is an r1 > 0 such that CB(x, r1) ⊂ S. (b) Show that there is an r2 > 0 such that n∏ i=1 (xi − r2, xi + r2) ⊂ S. (c) Show that there is an r3 > 0 such that n∏ i=1 [xi − r3, xi + r3] ⊂ S. Chapter 1. Euclidean Spaces 59 1.5 Limit Points and Isolated Points In this section, we generalize the concepts of limit points and isolated points to subsets of Rn. Definition 1.33 Limit Points Let S be a subset of Rn. A point x in Rn is a limit point of S provided that there is a sequence {xk} in S \ {x} that converges to x. The set of limit points of S is denoted by S ′. By Theorem 1.32, we obtain the following immediately. 
Theorem 1.37 Let S be a subset of Rn, and let x be a point in Rn. The following are equivalent. (a) x is a limit point of S. (b) x is in S \ {x}. (c) For every r > 0, B(x, r) intersects S at a point other than x. Corollary 1.38 If S is a subset of Rn, then S ′ ⊂ S. Proof If x ∈ S ′, x∈ S \ {x}. Since S \ {x} ⊂ S, we have S \ {x} ⊂ S. Therefore, x ∈ S. The following theorem says that the closure of a set is the union of the set with all its limit points. Theorem 1.39 If S is a subset of Rn, then S = S ∪ S ′. Chapter 1. Euclidean Spaces 60 Proof By Corollary 1.38, S ′ ⊂ S. Since we also have S ⊂ S, we find that S ∪ S ′ ⊂ S. Conversely, if x ∈ S, then by Theorem 1.32, there is a sequence {xk} in S that converges to x. If x is not in S, then the sequence {xk} is in S \ {x}. In this case, x is a limit point of S. This shows that S \ S ⊂ S ′, and hence, S ⊂ S ∪ S ′. In the proof above, we have shown the following. Corollary 1.40 Let S be a subset of Rn. Every point in S that is not in S is a limit point of S. Namely, S \ S ⊂ S ′. Now we introduce the definition of isolated points. Definition 1.34 Isolated Points Let S be a subset of Rn. A point x in Rn is an isolated point of S if (a) x is in S; (b) x is not a limit point of S. Remark 1.1 By definition, a point x in S is either an isolated point of S or a limit point of S. Theorem 1.37 gives the following immediately. Theorem 1.41 Let S be a subset of Rn and let x be a point in S. Then x is an isolated point of S if and only if there is an r > 0 such that the ball B(x, r) does not contain other points of S except the point x. Chapter 1. Euclidean Spaces 61 Example 1.36 Find the set of limit points and isolated points of the set A = Z2 as a subset of R2. Solution If {xk} is a sequence inA that converges to a point x, then there is a positive integer K such that for all l ≥ k ≥ K, ∥xl − xk∥ < 1. This implies that xk = xK for all k ≥ K. Hence, x = xK ∈ A. This shows that A is closed. Hence, A = A. Therefore, A′ ⊂ A. 
For every x = (k, l) ∈ Z2, B(x, 1) intersects A only at the point x itself. Hence, x is an isolated point of A. This shows that every point of A is an isolated point. Since A′ ⊂ A, we must have A′ = ∅. Figure 1.17: The set Z2 does not have limit points. Let us prove the following useful fact. Theorem 1.42 If S is a subset of Rn, every interior point of S is a limit point of S. Chapter 1. Euclidean Spaces 62 Proof If x is an interior point of S, there exists r0 > 0 such that B(x, r0) ⊂ S. Given r > 0, let r′ = 1 2 min{r, r0}. Then r′ > 0. Since r′ < r and r′ < r0, the point x′ = x+ r′e1 is inB(x, r) and S. Obviously, x′ ̸= x. Therefore, for every r > 0, B(x, r) intersects S at a point other than x. This proves that x is a limit point of S. Since S ⊂ intS ∪ bdS, and intS and bdS are disjoint, we deduce the following. Corollary 1.43 Let S be a subset of Rn. An isolated point of S must be a boundary point. Since every point in an open set S is an interior point of S, we obtain the following. Corollary 1.44 If S is an open subset of Rn, every point of S is a limit point. Namely, S ⊂ S ′. Example 1.37 If I is an interval of the form (a, b), (a, b], [a, b) or [a, b] in R, then bd I = {a, b}. It is easy to check that a and b are not isolated points of I . Hence, I has no isolated points. Since I = I ∪ I ′ and I ⊂ I ′, we find that I ′ = I = [a, b]. In fact, we can prove a general theorem. Theorem 1.45 Let A and B be subsets of Rn such that A is open and A ⊂ B ⊂ A. Then B′ = A. In particular, the set of limit points of A is A. Chapter 1. Euclidean Spaces 63 Proof By Theorem 1.36, A = B. Since A is open, A ⊂ A′. Since A = A ∪ A′, we find that A = A′. In the exercises, one is asked to show that A ⊂ B implies A′ ⊂ B′. Therefore, A = A′ ⊂ B′ ⊂ B. Since A = B, we must have B′ = B = A. Example 1.38 Let A be the subset of R2 given by A = [−1, 1]× (−2, 2] = {(x, y) | − 1 ≤ x ≤ 1,−2 < y ≤ 2} . 
Since U = (−1, 1) × (−2, 2) is open, Ū = [−1, 1] × [−2, 2], and U ⊂ A ⊂ Ū, the set of limit points of A is Ū = [−1, 1] × [−2, 2].

Exercises 1.5

Question 1
Let A and B be subsets of Rn such that A ⊂ B. Show that A′ ⊂ B′.

Question 2
Let x0 be a point in Rn and let r be a positive number. Find the set of limit points of the open ball B(x0, r).

Question 3
Let A be the subset of R2 given by A = {(x, y) | x < 0, y ≥ 0}. Find the set of limit points of A.

Question 4
Let x0 be a point in Rn, and let r be a positive number. Consider the subset of Rn given by A = {x ∈ Rn | 0 < ∥x − x0∥ ≤ r}.
(a) Find the set of limit points of A.
(b) Find the set of isolated points of the set S = Rn \ A.

Question 5
Let A be the subset of R2 given by A = {(x, y) | 1 ≤ x < 3, −2 < y ≤ 5} ∪ {(0, 0), (2, −3)}. Determine the set of isolated points and the set of limit points of A.

Question 6
Let A = Q2 as a subset of R2.
(a) Find the interior, exterior, boundary and closure of A.
(b) Determine the set of isolated points and the set of limit points of A.

Question 7
Let S be a subset of Rn. Show that S is closed if and only if it contains all its limit points.

Question 8
Let S be a subset of Rn, and let x be a point in Rn. Show that x is a limit point of S if and only if every neighbourhood of x intersects S at a point other than x.

Question 9
Let x1, x2, . . ., xk be points in Rn and let A = Rn \ {x1, x2, . . ., xk}. Find the set of limit points of A.

Chapter 2
Limits of Multivariable Functions and Continuity

We are interested in functions F : D → Rm that are defined on subsets D of Rn, taking values in Rm. When n ≥ 2, these are called multivariable functions. When m ≥ 2, they are called vector-valued functions. When m = 1, we usually write the function as f : D → R.

2.1 Multivariable Functions

In this section, let us define some special classes of multivariable functions.
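Before turning to the special classes, here is a small numerical sketch of the notation (the particular formula for F below is our own illustrative choice, not an example from the text): a function F : R2 → R3 is multivariable, since it has n = 2 inputs, and vector-valued, since it has m = 3 output coordinates.

```python
# A sample multivariable, vector-valued function F : R^2 -> R^3.
# The formula for F is an illustrative choice, not taken from the text.
def F(x, y):
    # Each coordinate of the output is a real-valued function of (x, y).
    return (x + y, x - y, x * y)

value = F(2.0, 3.0)
print(value)       # (5.0, -1.0, 6.0), a point in R^3
print(len(value))  # 3, the dimension m of the codomain
```
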
2.1.1 Polynomials and Rational Functions A special class of functions is the set of polynomials in n variables. Definition 2.1 Polynomials Let k = (k1, . . . , kn) be an n-tuple of nonnegative integers. Associated to this n-tuple k, there is a monomial pk : Rn → R of degree |k| = k1 + · · ·+ kn of the form pk(x) = xk11 · · ·xknn . A polynomial in n variables is a function p : Rn → R that is a finite linear combination of monomials in n variables. It takes the form p(x) = m∑ j=1 ckj pkj (x), where k1,k2, . . . ,km are distinct n-tuples of nonnegative integers, and ck1 , ck2 , . . . , ckm are nonzero real numbers. The degree of the polynomial p(x) is max{|k1|, |k2|, . . . , |km|}. Chapter 2. Limits of Multivariable Functions and Continuity 67 Example 2.1 The following are examples of polynomials in three variables. (a) p(x1, x2, x3) = x21 + x22 + x23 (b) p(x1, x2, x3) = 4x21x2 − 3x1x3 + x1x2x3 Example 2.2 The function f : Rn → R, f(x) = ∥x∥ = √ x21 + · · ·+ x2n is not a polynomial. When the domain of a function is not specified, we always assume that the domain is the largest set on which the function can be defined. Definition 2.2 Rational Functions A rational function f : D → R is the quotient of two polynomials p : Rn → R and q : Rn → R. Namely, f(x) = p(x) q(x) . Its domain D is the set D = {x ∈ Rn | q(x) ̸= 0} . Example 2.3 The function f(x1, x2) = x1x2 + 3x21 x1 − x2 is a rational function defined on the set D = { (x1, x2) ∈ R2 |x1 ̸= x2 } . Chapter 2. Limits of Multivariable Functions and Continuity 68 2.1.2 Component Functions of a Mapping If the codomain Rm of the function F : D → Rm has dimension m ≥ 2, we usually call the function a mapping. In this case, it would be good to consider the component functions. For 1 ≤ j ≤ m, the projection function πj : Rm → R is the function πj(x1, . . . , xm) = xj. Definition 2.3 Component Functions Let F : D → Rm be a function defined on D ⊂ Rn. 
For 1 ≤ j ≤ m, the j th component function of F is the function Fj : D → R defined as Fj = (πj ◦ F) : D → R. For each x ∈ D, F(x) = (F1(x), . . . , Fm(x)). Example 2.4 For the function F : R3 → R3, F(x) = −3x, the component functions are F1(x1, x2, x3) = −3x1, F2(x1, x2, x3) = −3x2, F3(x1, x2, x3) = −3x3. For convenience, we also define the notion of polynomialmappings. Definition 2.4 Polynomial Mappings We call a function F : Rn → Rm a polynomial mapping if each of its components Fj : Rn → R, 1 ≤ j ≤ m, is a polynomial function. The degree of the polynomial mapping F is the maximum of the degrees of the polynomials F1, F2, . . . , Fm. Example 2.5 The mapping F : R3 → R2, F(x, y, z) = (x2y + 3xz, 8yz3 − 7x) is a polynomial mapping of degree 4. Chapter 2. Limits of Multivariable Functions and Continuity 69 2.1.3 Invertible Mappings The invertibility of a function F : D → Rm is defined in the following way. Definition 2.5 Inverse Functions Let D be a subset of Rn, and let F : D → Rm be a function defined on D. We say that F is invertible if F is one-to-one. In this case, the inverse function F−1 : F(D) → D is defined so that for each y ∈ F(D), F−1(y) = x if and only if F(x) = y. Example 2.6 Let D = {(x, y) |x > 0, y > 0} and let F : D → R2 be the function defined as F(x, y) = (x− y, x+ y). Show that F is invertible and find its inverse. Solution Let u = x− y and v = x+ y. Then x = u+ v 2 , y = v − u 2 . This shows that for any (u, v) ∈ R2, there is at most one pair of (x, y) such that F(x, y) = (u, v). Thus, F is one-to-one, and hence, it is invertible. Observe that F(D) = {(u, v) | v > 0,−v < u < v.} . The inverse mapping is given by F−1 : F(D) → R2, F−1(u, v) = ( u+ v 2 , v − u 2 ) . Chapter 2. Limits of Multivariable Functions and Continuity 70 2.1.4 Linear Transformations Another special class of functions consists of linear transformations. A function T : Rn → Rm is a linear transformation if for any x1, . . . ,xk in Rn, and for any c1, . . . 
, ck in R, T(c1x1 + · · ·+ ckxk) = c1T(x1) + · · ·+ ckT(xk). Linear transformations are closely related to matrices. An m × n matrix A is an array with m rows and n columns of real numbers. It has the form A = [aij] = a11 a12 · · · a1n a21 a22 · · · a2n ... ... . . . ... am1 am2 · · · amn . IfA = [aij] andB = [bij] arem×nmatrices, α and β are real numbers, αA+βB is defined to be the m× n matrix C = αA+ βB = [cij] with cij = αaij + βbij. If A = [ail] is a m × k matrix, B = [blj] is a k × n matrix, the product AB is defined to be the m× n matrix C = AB = [cij], where cij = k∑ l=1 ailblj. It is easy to verify that matrix multiplications are associative. Given x = (x1, . . . , xn) in Rn, we identify it with the column vector x = x1 x2 ... xn , which is an n × 1 matrix. If A is an m × n matrix, and x is a vector in Rn, then y = Ax is the vector in Rm given by y = Ax = a11 a12 · · · a1n a21 a22 · · · a2n ... ... . . . ... am1 am2 · · · amn x1 x2 ... xn = a11x1 + a12x2 + · · ·+ a1nxn a21x1 + a22x2 + · · ·+ a2nxn ... am1x1 + am2x2 + · · ·+ amnxn . Chapter 2. Limits of Multivariable Functions and Continuity 71 The following is a standard result in linear algebra. Theorem 2.1 A function T : Rn → Rm is a linear transformation if and only if there exists an m× n matrix A = [aij] such that T(x) = Ax. In this case, A is called the matrix associated to the linear transformation T : Rn → Rm. Sketch of Proof It is easy to verify that the mapping T : Rn → Rm, T(x) = Ax is a linear transformation if A is an m× n matrix. Conversely, if T : Rn → Rm is a linear transformation, then for any x ∈ Rn, T(x) = T(x1e1+x2e2+· · ·+xnen) = x1T(e1)+x2T(e2)+· · ·+xnT(en). Define the vectors a1, a2, . . ., an in Rm by a1 = T(e1), a2 = T(e2), . . . , an = T(en). Let A be the m× n matrix with column vectors a1, a2, . . ., an. Namely, A = [ a1 a2 · · · an ] . Then we have T(x) = Ax. Example 2.7 Let F : R2 → R2 be the function defined as F(x, y) = (x− y, x+ y). 
Then F is a linear transformation with matrix A = [ 1 −1 1 1 ] . Chapter 2. Limits of Multivariable Functions and Continuity 72 For the linear transformation T : Rn → Rm, T(x) = Ax, the component functions are T1(x) = a11x1 + a12x2 + · · ·+ a1nxn, T2(x) = a21x1 + a22x2 + · · ·+ a2nxn, ... Tm(x) = am1x1 + am2x2 + · · ·+ amnxn. Each of them is a polynomial of degree at most one. Thus, a linear transformation is a polynomial mapping of degree at most one. It is easy to deduce the following. Corollary 2.2 A mapping T : Rn → Rm is a linear transformation if and only if each component function is a linear transformation. The followings are some standard results about linear transformations. Theorem 2.3 If S : Rn → Rm and T : Rn → Rm are linear transformations with matrices A and B respectively, then for any real numbers α and β, αS + βT : Rn → Rm is a linear transformation with matrix αA+ βB. Theorem 2.4 If S : Rn → Rm and T : Rm → Rk are linear transformations with matrices A and B, then T ◦S : Rn → Rk is a linear transformation with matrix BA. Sketch of Proof This follows from (T ◦ S)(x) = T(S(x)) = B(Ax) = (BA)x. In the particular case when m = n, we have the following. Chapter 2. Limits of Multivariable Functions and Continuity 73 Theorem 2.5 Let T : Rn → Rn be a linear transformation represented by the matrix A. The following are equivalent. (a) The mapping T : Rn → Rn is one-to-one. (b) The mapping T : Rn → Rn is onto. (c) The matrix A is invertible. (d) detA ̸= 0. In other words, if the linear transformation T : Rn → Rn is one-to-one or onto, then it is bijective. In this case, the linear transformation is invertible, and we can define the inverse function T−1 : Rn → Rn. Theorem 2.6 Let T : Rn → Rn be an invertible linear transformation represented by the matrix A. Then the inverse mapping T−1 : Rn → Rn is also a linear transformation and T−1(x) = A−1x. Example 2.8 Let T : R2 → R2 be the linear transformation T(x, y) = (x− y, x+ y). 
The matrix associated with T is
A = [ 1 −1
      1  1 ].
Since det A = 2 ≠ 0, T is invertible. Since
A⁻¹ = (1/2) [  1  1
              −1  1 ],
we have T⁻¹(x, y) = ((x + y)/2, (−x + y)/2).

2.1.5 Quadratic Forms
Given an m × n matrix A = [aij], its transpose is the n × m matrix Aᵀ = [bij], where bij = aji for all 1 ≤ i ≤ n, 1 ≤ j ≤ m. An n × n matrix A is symmetric if A = Aᵀ. An n × n matrix P is orthogonal if PᵀP = PPᵀ = I. If the column vectors of P are v1, v2, . . ., vn, so that
P = [ v1 v2 · · · vn ],  (2.1)
then P is orthogonal if and only if {v1, . . ., vn} is an orthonormal set of vectors in Rn.

If A is an n × n symmetric matrix, its characteristic polynomial p(λ) = det(λIn − A) is a monic polynomial of degree n with n real roots λ1, λ2, . . ., λn, counted with multiplicities. These roots are called the eigenvalues of A. There is an orthonormal set of vectors {v1, . . ., vn} in Rn such that
Avi = λivi for all 1 ≤ i ≤ n.  (2.2)
Let D be the diagonal matrix
D = diag(λ1, λ2, . . ., λn),  (2.3)
and let P be the orthogonal matrix (2.1). Then (2.2) is equivalent to AP = PD, or equivalently,
A = PDPᵀ = PDP⁻¹.
This is known as the orthogonal diagonalization of the real symmetric matrix A.

A quadratic form in Rn is a polynomial function Q : Rn → R of the form
Q(x) = Σ_{1 ≤ i ≤ j ≤ n} cij xi xj.
An n × n symmetric matrix A = [aij] defines a quadratic form QA : Rn → R by
QA(x) = xᵀAx = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj.

Example 2.9
The symmetric matrix
A = [  1 −2
      −2  5 ]
defines the quadratic form QA(x, y) = x² − 4xy + 5y².

Conversely, given a quadratic form Q(x) = Σ_{1 ≤ i ≤ j ≤ n} cij xi xj, then Q = QA, where the entries of A = [aij] are
aij = cii if i = j, cij/2 if i < j, and cji/2 if i > j.
Thus, there is a one-to-one correspondence between quadratic forms and symmetric matrices.
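This correspondence is easy to verify numerically. The sketch below (Python, using the data of Example 2.9; the helper names are our own) evaluates QA(x) = xᵀAx directly from the matrix entries and checks that it reproduces the polynomial x² − 4xy + 5y²:

```python
# Quadratic form Q(x, y) = x^2 - 4xy + 5y^2 from Example 2.9 and its
# associated symmetric matrix: a_ii = c_ii, a_ij = a_ji = c_ij / 2 for i < j.
A = [[1.0, -2.0],
     [-2.0, 5.0]]

def Q_A(x):
    """Evaluate x^T A x for a vector x = (x_1, ..., x_n)."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def Q(x, y):
    return x * x - 4 * x * y + 5 * y * y

# The two expressions agree at sample points.
for (x, y) in [(1.0, 0.0), (0.0, 1.0), (2.0, -3.0), (0.5, 0.25)]:
    assert abs(Q_A([x, y]) - Q(x, y)) < 1e-12
print("Q_A(x) = x^T A x matches the polynomial at all sample points")
```
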
If A = PDPᵀ is an orthogonal diagonalization of A, then under the change of variables y = Pᵀx, or equivalently, x = Py, we find that
QA(x) = yᵀDy = λ1y1² + · · · + λnyn².  (2.4)
A consequence of (2.4) is the following.

Theorem 2.7
Let A be an n × n symmetric matrix, and let QA(x) = xᵀAx be the associated quadratic form. Let λ1, λ2, . . ., λn be the eigenvalues of A. Assume that λn ≤ · · · ≤ λ2 ≤ λ1. Then for any x ∈ Rn,
λn∥x∥² ≤ QA(x) ≤ λ1∥x∥².

Sketch of Proof
Given x ∈ Rn, let y = Pᵀx. Then
∥y∥² = yᵀy = xᵀPPᵀx = xᵀx = ∥x∥².
By (2.4), QA(x) = λ1y1² + · · · + λnyn². Since λn ≤ · · · ≤ λ2 ≤ λ1, we find that
λn(y1² + · · · + yn²) ≤ QA(x) ≤ λ1(y1² + · · · + yn²).
The assertion follows.

At the end of this section, let us recall the classification of quadratic forms.

Definiteness of Symmetric Matrices
Given an n × n symmetric matrix A = [aij], let QA : Rn → R,
QA(x) = xᵀAx = Σ_{i=1}^{n} Σ_{j=1}^{n} aij xi xj,
be the associated quadratic form.
1. We say that the matrix A is positive definite, or the quadratic form QA is positive definite, if QA(x) > 0 for all x ≠ 0 in Rn.
2. We say that the matrix A is negative definite, or the quadratic form QA is negative definite, if QA(x) < 0 for all x ≠ 0 in Rn.
3. We say that the matrix A is indefinite, or the quadratic form QA is indefinite, if there exist u and v in Rn such that QA(u) > 0 and QA(v) < 0.
4. We say that the matrix A is positive semi-definite, or the quadratic form QA is positive semi-definite, if QA(x) ≥ 0 for all x in Rn.
5. We say that the matrix A is negative semi-definite, or the quadratic form QA is negative semi-definite, if QA(x) ≤ 0 for all x in Rn.

Obviously, a symmetric matrix A is negative definite if and only if −A is positive definite. The following is a standard result in linear algebra, which can be deduced from (2.4).
Theorem 2.8 Let A be an n × n symmetric matrix, and let QA(x) = xTAx be the associated quadratic form. Let {λ1, . . . , λn} be the set of eigenvalues of A, repeated with multiplicities. (a) QA is positive definite if and only if λi > 0 for all 1 ≤ i ≤ n. (b) QA is negative definite if and only if λi < 0 for all 1 ≤ i ≤ n. (c) QA is indefinite if there exist i and j so that λi > 0 and λj < 0. (d) QA is positive semi-definite if and only if λi ≥ 0 for all 1 ≤ i ≤ n. (e) QA is negative semi-definite if and only if λi ≤ 0 for all 1 ≤ i ≤ n. From Theorem 2.7 and Theorem 2.8, we obtain the following. Chapter 2. Limits of Multivariable Functions and Continuity 78 Corollary 2.9 Let Q : Rn → R be a quadratic form. If Q is positive definite, then there exists a positive constant c such that Q(x) ≥ c∥x∥2 for all x ∈ Rn. In fact, c can be any positive number that is less than or equal to the smallest eigenvalue of the symmetric matrix A associated to the quadratic form Q. Chapter 2. Limits of Multivariable Functions and Continuity 79 2.2 Limits of Functions In this section, we study limits of multivariable functions. Definition 2.6 Limits of Functions Let D be a subset of Rn and let x0 be a limit point of D. Given a function F : D → Rm, we say that the limit of F(x) as x approaches x0 is v, provided that whenever {xk} is a sequence of points in D \ {x0} that converges to x0, the sequence {F(xk)} of points in Rm converges to the point v. If the limit of F : D → Rm as x approaches x0 is v, we write lim x→x0 F(x) = v. Example 2.10 For 1 ≤ i ≤ n, let πi : Rn → R be the projection function πi(x1, . . . , xn) = xi. By the theorem on componentwise convergence of sequences, if {xk} is a sequence in Rn \ {x0} that converges to the point x0, then lim k→∞ πi(xk) = πi(x0). This means that lim x→x0 πi(x) = πi(x0). From the theorem on componentwise convergence of sequences, we also obtain the following immediately. Chapter 2. 
Proposition 2.10
Let D be a subset of Rn and let x0 be a limit point of D. Given a function F : D → Rm,
lim x→x0 F(x) = v
if and only if for each 1 ≤ j ≤ m,
lim x→x0 Fj(x) = πj(v).

Example 2.11
Let f : Rn → R be the function defined as f(x) = ∥x∥. If x0 is a point in Rn, find lim x→x0 f(x).

Solution
We have shown in Example 1.15 that if {xk} is a sequence in Rn \ {x0} that converges to x0, then lim k→∞ ∥xk∥ = ∥x0∥. Therefore, lim x→x0 f(x) = ∥x0∥.

By the limit laws for sequences, we also have the following.

Proposition 2.11
Let F : D → Rm and G : D → Rm be functions defined on D ⊂ Rn. If x0 is a limit point of D and
lim x→x0 F(x) = u,  lim x→x0 G(x) = v,
then for any real numbers α and β,
lim x→x0 (αF + βG)(x) = αu + βv.

Proposition 2.12
Let f : D → R and g : D → R be functions defined on D ⊂ Rn. If x0 is a limit point of D and
lim x→x0 f(x) = u,  lim x→x0 g(x) = v,
then
lim x→x0 (fg)(x) = uv.
If g(x) ≠ 0 for all x ∈ D, and v ≠ 0, then
lim x→x0 (f/g)(x) = u/v.

Example 2.12
If k = (k1, . . ., kn) is an n-tuple of nonnegative integers, the monomial pk : Rn → R, pk(x) = x1^k1 · · · xn^kn, can be written as a product of the projection functions πi : Rn → R, πi(x) = xi, 1 ≤ i ≤ n. By Proposition 2.12,
lim x→x0 pk(x) = pk(x0)
for any x0 in Rn. If p : Rn → R is a polynomial, it is a finite linear combination of monomials. Proposition 2.11 then implies that for any x0 in Rn,
lim x→x0 p(x) = p(x0).
If f : D → R, f(x) = p(x)/q(x), is a rational function which is equal to the quotient of the polynomial p(x) by the polynomial q(x), then Proposition 2.12 implies that
lim x→x0 f(x) = f(x0)
for any x0 ∈ D = {x ∈ Rn | q(x) ≠ 0}.

Example 2.13
Find
lim (x,y)→(1,−1) (x² + 3xy + 2y²)/(x² + y²).
Solution Since lim (x,y)→(1,−1) (x2 + 3xy + 2y2) = 1− 3 + 2 = 0, lim (x,y)→(1,−1) (x2 + y2) = 1 + 1 = 2, we find that lim (x,y)→(1,−1) x2 + 3xy + 2y2 x2 + y2 = 0 2 = 0. It is easy to deduce the limit law for composite functions. Proposition 2.13 Let D be a subset of Rn, and let U be a subset of Rk. Given the two functions F : D → Rk and G : U → Rm, if F(D) ⊂ U , we can define the composite function H = G ◦ F : D → Rm by H(x) = G(F(x)). If x0 is a limit point of D, y0 is a limit point of U , F(D \ {x0}) ⊂ U \ {y0}, lim x→x0 F(x) = y0, lim y→y0 G(y) = v, then lim x→x0 H(x) = lim x→x0 (G ◦ F)(x) = v. The proof repeats verbatim the proof of the corresponding theorem for single variable functions. Example 2.14 Find the limit lim (x,y)→(0,0) sin(2x2 + 3y2) 2x2 + 3y2 . Chapter 2. Limits of Multivariable Functions and Continuity 83 Figure 2.1: The function f(x, y) = x2 + 3xy + 2y2 x2 + y2 in Example 2.13. Solution Since lim (x,y)→(0,0) (2x2 + 3y2) = 2× 0 + 3× 0 = 0, lim u→0 sinu u = 1, the limit law for composite functions implies that lim (x,y)→(0,0) sin(2x2 + 3y2) 2x2 + 3y2 = 1. Figure 2.2: The function f(x, y) = sin(2x2 + 3y2) 2x2 + 3y2 in Example 2.14. Let us look at some examples where the rules we have studied cannot be applied. Chapter 2. Limits of Multivariable Functions and Continuity 84 Example 2.15 Determine whether the limit lim (x,y)→(0,0) x2 − 2y2 x2 + y2 exists. Solution Let f(x, y) = x2 − 2y2 x2 + y2 = p(x, y) q(x, y) . When (x, y) → (0, 0), q(x, y) = x2 + y2 → 0. Hence, we cannot apply limit law for quotients of functions. Consider the sequences of points {uk} and {vk} in R2 \ {0, 0} given by uk = ( 1 k , 0 ) , vk = ( 0, 1 k ) . Notice that both the sequences {uk} and {vk} converge to (0, 0). If lim (x,y)→(0,0) f(x, y) = a, then both the sequences {f(uk)} and {f(vk)} should converge to a. Since f(uk) = 1, f(vk) = −2 for all k ∈ Z+, the sequence {f(uk)} converges to 1, while the sequence {f(vk)} converges to −2. 
These imply that a = 1 and a = −2, which is a contradiction. Hence, the limit lim (x,y)→(0,0) x2 − 2y2 x2 + y2 does not exist. Example 2.16 Determine whether the limit lim (x,y)→(0,0) xy x2 + 2y2 exists. Chapter 2. Limits of Multivariable Functions and Continuity 85 Figure 2.3: The function f(x, y) = x2 − 2y2 x2 + y2 in Example 2.15. Solution Let f(x, y) = xy x2 + 2y2 . Consider the sequences of points {uk} and {vk} in R2 \ {0, 0}given by uk = ( 1 k , 0 ) , vk = ( 1 k , 1 k ) , Notice that both the sequences {uk} and {vk} converge to (0, 0). If lim (x,y)→(0,0) f(x, y) = a, then both the sequences {f(uk)} and {f(vk)} should converge to a. Since f(uk) = 0, f(vk) = 1 3 for all k ∈ Z+, the sequence {f(uk)} converges to 0, while the sequence {f(vk)} converges to 1/3. These imply that a = 0 and a = 1/3, which is a contradiction. Hence, the limit lim (x,y)→(0,0) xy x2 + 2y2 does not exist. Chapter 2. Limits of Multivariable Functions and Continuity 86 Figure 2.4: The function f(x, y) = xy x2 + 2y2 in Example 2.16. Example 2.17 Determine whether the limit lim (x,y)→(0,0) xy2 x2 + 2y4 exists. Solution Let f(x, y) = xy2 x2 + 2y4 . Consider the sequences of points {uk} and {vk} in R2 \ {0, 0} given by uk = ( 1 k , 0 ) , vk = ( 1 k2 , 1 k ) , Notice that both the sequences {uk} and {vk} converge to (0, 0). If lim (x,y)→(0,0) f(x, y) = a, then both the sequences {f(uk)} and {f(vk)} should converge to a. Since f(uk) = 0, f(vk) = 1 3 for all k ∈ Z+, the sequence {f(uk)} converges to 0, while the sequence {f(vk)} converges to 1/3. These imply that a = 0 and a = 1/3, which is a contradiction. Hence, the limit lim (x,y)→(0,0) xy2 x2 + 2y4 does not exist. Chapter 2. Limits of Multivariable Functions and Continuity 87 Figure 2.5: The function f(x, y) = xy2 x2 + 2y4 in Example 2.17. Example 2.18 Determine whether the limit lim (x,y)→(0,0) xy2 x2 + 2y2 exists. Solution Let f(x, y) = xy2 x2 + 2y2 . 
If {(xk, yk)} is a sequence of points in R2 \ {(0, 0)} that converges to (0, 0), then
|f(xk, yk)| = |xk| · yk²/(xk² + 2yk²) ≤ |xk|.
The sequence {xk} converges to 0. By the squeeze theorem, the sequence {f(xk, yk)} also converges to 0. This proves that
lim (x,y)→(0,0) xy²/(x² + 2y²) = 0.

Similar to the single variable case, there is an equivalent definition of limits in terms of ε and δ.

Figure 2.6: The function f(x, y) = xy²/(x² + 2y²) in Example 2.18.

Theorem 2.14 Equivalent Definitions for Limits
Let D be a subset of Rn, and let x0 be a limit point of D. Given a function F : D → Rm, the following two definitions for lim x→x0 F(x) = v are equivalent.
(i) Whenever {xk} is a sequence of points in D \ {x0} that converges to x0, the sequence {F(xk)} converges to v.
(ii) For any ε > 0, there is a δ > 0 such that if the point x is in D and 0 < ∥x − x0∥ < δ, then ∥F(x) − v∥ < ε.

Proof
We will prove that if (ii) holds, then (i) holds; and if (ii) does not hold, then (i) does not hold either.
First assume that (ii) holds. If {xk} is a sequence in D \ {x0} that converges to the point x0, we need to show that the sequence {F(xk)} converges to v. Given ε > 0, (ii) implies that there is a δ > 0 such that for every x in D \ {x0} with ∥x − x0∥ < δ, we have ∥F(x) − v∥ < ε. Since {xk} converges to x0, there is a positive integer K such that for all k ≥ K, ∥xk − x0∥ < δ. Therefore, for all k ≥ K, ∥F(xk) − v∥ < ε. This shows that the sequence {F(xk)} indeed converges to v.
Now assume that (ii) does not hold. Then there is an ε > 0 such that for every δ > 0, there is a point x in D \ {x0} with ∥x − x0∥ < δ but ∥F(x) − v∥ ≥ ε. For this ε > 0, we construct a sequence {xk} in D \ {x0} in the following way. For each positive integer k, there is a point xk in D \ {x0} such that ∥xk − x0∥ < 1/k but ∥F(xk) − v∥ ≥ ε. Then {xk} is a sequence in D \ {x0} that satisfies ∥xk − x0∥ < 1/k for all k ∈ Z+.
Hence, it converges to x0. Since ∥F(xk) − v∥ ≥ ε for all k ∈ Z+, the sequence {F(xk)} cannot converge to v. This proves that (i) does not hold. We can give an alternative solution to Example 2.18 as follows. Alternative Solution to Example 2.18 Let f(x, y) = xy2 x2 + 2y2 . Given ε > 0, let δ = ε. If (x, y) is a point in R2 \ {(0, 0)} such that√ x2 + y2 = ∥(x, y)− (0, 0)∥ < δ = ε, then |x| < ε. This implies that |f(x, y)− 0| = |x| y2 x2 + 2y2 ≤ |x| < ε. Hence, lim (x,y)→(0,0) xy2 x2 + 2y2 = 0. Chapter 2. Limits of Multivariable Functions and Continuity 90 Exercises 2.2 Question 1 Determine whether the limit exists. If it exists, find the limit. (a) lim (x,y)→(1,2) 4x2 − y2 x2 + y2 (b) lim (x,y)→(1,2) √ 4x2 − y2 x2 + y2 (c) lim (x,y)→(1,2) √ 4x2 + y2 x2 + y2 Question 2 Determine whether the limit exists. If it exists, find the limit. (a) lim (x,y)→(0,0) x3 + y3 x2 + y2 (b) lim (x,y)→(0,0) x2 + y3 x2 + y2 (c) lim (x,y)→(0,0) e4x 2+y2 − 1 4x2 + y2 (d) lim (x,y)→(0,0) ex 2+y2 − 1 4x2 + y2 Question 3 Determine whether the limit lim (x,y)→(0,0) x2 + 4y4 4x2 + y4 exists. If it exists, find the limit. Chapter 2. Limits of Multivariable Functions and Continuity 91 Question 4 Determine whether the limit lim (x,y)→(1,1) cos(x2 + y2 − 2)− 1 (x2 + y2 − 2)2 exists. If it exists, find the limit. Question 5 Let x0 be a point in Rn. Find the limit lim x→x0 x ∥x∥ . Question 6 Let D be a subset of Rn, and let f : D → R and G : D → Rm be functions defined on D. We can define the function H : D → Rm by H(x) = f(x)G(x) for all x ∈ D. If x0 is a point in D and lim x→x0 f(x) = a, lim x→x0 G(x) = v, show that lim x→x0 H(x) = av. Chapter 2. Limits of Multivariable Functions and Continuity 92 2.3 Continuity The definition of continuity is a direct generalization of the single variable case. Definition 2.7 Continuity Let D be a subset of Rn that contains the point x0, and let F : D → Rm be a function defined on D. 
We say that the function F is continuous at x0 provided that whenever {xk} is a sequence of points in D that converges to x0, the sequence {F(xk)} converges to F(x0). We say that F : D → Rm is a continuous function if it is continuous at every point of its domain D. From the definition, we obtain the following immediately. Proposition 2.15 Limits and Continuity Let D be a subset of Rn that contains the point x0, and let F : D → Rm be a function defined on D. 1. If x0 is an isolated point of D, then F is continuous at x0. 2. If x0 is a limit point of D, then F is continuous at x0 if and only if lim x→x0 F(x) = F(x0). Example 2.19 Example 2.10 says that for each 1 ≤ i ≤ n, the projection function πi : Rn → R, πi(x) = xi, is a continuous function. Example 2.20 Example 2.11 says that the norm function f : Rn → R, f(x) = ∥x∥, is a continuous function. From Proposition 2.10, we have the following. Chapter 2. Limits of Multivariable Functions and Continuity 93 Proposition 2.16 Let D be a subset of Rn that contains the point x0, and let F : D → Rm be a function defined on D. The function F : D → Rm is continuous at x0 if and only if each of the component functions Fj = (πj ◦ F) : D → R, 1 ≤ j ≤ m, is continuous at x0. Example 2.21 The function F : R3 → R2, F(x, y, z) = (x, z), is a continuous function since each component function is continuous. Proposition 2.11 gives the following. Proposition 2.17 Let F : D → Rm and G : D → Rm be functions defined on D ⊂ Rn, and let x0 be a point in D. If F : D → Rm and G : D → Rm are continuous at x0, then for any real numbers α and β, the function (αF+ βG) : D → Rm is continuous at x0. Proposition 2.12 gives the following. Proposition 2.18 Let f : D → R and g : D → R be functions defined on D ⊂ Rn, and let x0 be a point in D. Assume that the functions f : D → R and g : D → R are continuous at x0. 1. The function (fg) : D → R is continuous at x0. 2. If g(x) ̸= 0 for all x ∈ D, then the function (f/g) : D → R is continuous at x0. 
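The sequential definition of continuity is easy to probe numerically. In the sketch below (Python; the functions f, g and the base point are our own illustrative choices, with g chosen to be nonvanishing so that the quotient of Proposition 2.18 is defined everywhere), the values of f/g along a sequence converging to x0 approach (f/g)(x0):

```python
# Sequential continuity of a quotient f/g at x0 = (1, 2).
# f, g and the sample sequence are illustrative choices; g > 0 everywhere.
def f(x, y): return x * y
def g(x, y): return x * x + 2 * y * y + 1.0

def quotient(x, y): return f(x, y) / g(x, y)

x0, y0 = 1.0, 2.0
limit = quotient(x0, y0)  # (f/g)(x0) = 2/10 = 0.2

# The points x_k = (x0 + 1/k, y0 - 1/k) converge to x0 as k grows.
errors = [abs(quotient(x0 + 1.0 / k, y0 - 1.0 / k) - limit)
          for k in (1, 10, 100, 1000)]

# The error shrinks as k grows, consistent with continuity at x0.
assert all(e2 < e1 for e1, e2 in zip(errors, errors[1:]))
print(errors)
```
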
Example 2.12 gives the following. Chapter 2. Limits of Multivariable Functions and Continuity 94 Proposition 2.19 Polynomials and rational functions are continuous functions. Since each component of a linear transformation T : Rn → Rm is a polynomial, we have the following. Proposition 2.20 A linear transformation T : Rn → Rm is a continuous function. Since a quadratic form Q : Rn → R is a polynomial, we have the following. Proposition 2.21 A quadraticform Q : Rn → R given by Q(x) = n∑ i=1 n∑ j=1 aijxixj is a continuous function. The following is obvious from the definition of continuity. Proposition 2.22 Let D be a subset of Rn, and let F : D → Rm be a function that is continuous at the point x0 ∈ D. If D1 is a subset of D that contains x0, then the function F : D1 → Rm is also continuous at x0. Example 2.22 Let D be the set D = { (x, y) |x2 + y2 < 1 } , and let f : D → R be the function defined as f(x, y) = xy 1− x2 − y2 . Chapter 2. Limits of Multivariable Functions and Continuity 95 Since f1(x, y) = xy and f2(x, y) = 1 − x2 − y2 are polynomials, they are continuous. Since f2(x, y) ̸= 0 for all (x, y) ∈ D, f : D → R is a continuous function. Figure 2.7: The function f(x, y) = xy 1− x2 − y2 in Example 2.22. Proposition 2.13 implies the following. Proposition 2.23 Let D be a subset of Rn, and let U be a subset of Rk. If F : D → Rk and G : U → Rm are functions such that F(D) ⊂ U , F : D → Rk is continuous at x0, G : U → Rm is continuous at y0, then the composite function H = (G ◦ F) : D → Rm is continuous at x0. A direct proof of this theorem using the definition of continuity is actually much simpler. Proof If {xk} is a sequence of points in D that converges to x0, then since F : D → Rk is continuous at x0, {F(xk)} is a sequence of points in U that converges to y0. Since G : U → Rm is continuous at y0, {G(F(xk))} is a sequence of points in Rm that converges to G(y0) = G(F(x0)). Chapter 2. 
Limits of Multivariable Functions and Continuity 96 In other words, the sequence {H(xk)} converges to H(x0). This shows that the function H = (G ◦ F) : D → Rm is continuous at x0. Figure 2.8: Composition of functions. Corollary 2.24 Let D be a subset of Rn, and let x0 be a point in D. If the function F : D → Rm is continuous at x0 ∈ D, then the function ∥F∥ : D → R is also continuous at x0. Figure 2.9: The function f(x, y) = |x2 − y2|. Chapter 2. Limits of Multivariable Functions and Continuity 97 Example 2.23 The function f : R2 → R, f(x, y) = |x2 − y2| is a continuous function since f(x, y) = |p(x, y)|, where p(x, y) = x2−y2 is a polynomial function, which is continuous. Example 2.24 Consider the function f : R2 → R, f(x, y) = √ e2xy + x2 + y2. Notice that f(x, y) = ∥F(x, y)∥, where F : R2 → R3 is the function given by F(x, y) = (exy, x, y) . Since g(x, y) = xy is a polynomial function, it is continuous. Being a composition of the continuous function h(x) = ex with the continuous function g(x, y) = xy, F1(x, y) = (h ◦ g)(x, y) = exy is a continuous function. The functions F2(x, y) = x and F3(x, y) = y are continuous functions. Hence, F : R2 → R3 is a continuous function. This implies that f : R2 → R is also a continuous function. Figure 2.10: The function f(x, y) = √ e2xy + x2 + y2. Chapter 2. Limits of Multivariable Functions and Continuity 98 Example 2.25 We have shown in volume I that the function f : R → R, f(x) = sinx x , if x ̸= 0, 1, if x = 0, is a continuous function. Define the function h : R3 → R by h(x, y, z) = sin(x2 + y2 + z2) x2 + y2 + z2 , if (x, y, z) ̸= (0, 0, 0), 1, if (x, y, z) = (0, 0, 0). Since h = f ◦ g, where g : R3 → R is the polynomial function g(x, y, z) = x2 + y2 + z2, which is continuous, the function h : R3 → R is continuous. The following gives an equivalent definition of continuity in terms of ε and δ. Theorem 2.25 Equivalent Definitions of Continuity Let D be a subset of Rn, and let x0 be a limit point of D. 
Given a function F : D → Rm, the following two definitions for the continuity of F at x0 are equivalent.
(i) Whenever {xk} is a sequence of points in D that converges to x0, the sequence {F(xk)} converges to F(x0).
(ii) For any ε > 0, there is a δ > 0 such that if the point x is in D and ∥x − x0∥ < δ, then ∥F(x) − F(x0)∥ < ε.

The proof is left as an exercise. Notice that statement (ii) can be reformulated as follows. For any ε > 0, there is a δ > 0 such that if the point x is in D and x ∈ B(x0, δ), then F(x) ∈ B(F(x0), ε).

Now we want to explore another important property of continuity.

Figure 2.11: The definition of continuity in terms of ε and δ.

Theorem 2.26
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O. The following are equivalent.
(a) F : O → Rm is continuous.
(b) For every open subset V of Rm, F⁻¹(V) is an open subset of Rn.

Note that for this theorem to hold, it is important that the domain of the function F is an open set.

Proof
Assume that (a) holds. Let V be an open subset of Rm, and let U = F⁻¹(V) = {x ∈ O | F(x) ∈ V}. We need to show that U is an open subset of Rn. If x0 is in U, then it is in O. Since O is open, there exists r0 > 0 such that B(x0, r0) ⊂ O. Since y0 = F(x0) is in V and V is open, there exists ε > 0 such that B(y0, ε) ⊂ V. By (a), there exists δ > 0 such that for any x ∈ O, if ∥x − x0∥ < δ, then ∥F(x) − F(x0)∥ < ε. Take r = min{δ, r0}. Then r > 0, r ≤ r0 and r ≤ δ. If x is in B(x0, r), then x ∈ O and ∥x − x0∥ < r ≤ δ. It follows that ∥F(x) − F(x0)∥ < ε. This implies that F(x) ∈ B(y0, ε) ⊂ V. Thus, x ∈ U. In other words, we have shown that B(x0, r) is contained in U. This proves that U is open, which is the assertion of (b).
Conversely, assume that (b) holds. Let x0 be a point in O, and let y0 = F(x0). Given ε > 0, the ball V = B(y0, ε) is an open subset of Rm.
By (b), U = F⁻¹(V) is open in Rⁿ. By definition, U is a subset of O. Since F(x0) is in V, x0 is in U. Since U is open and it contains x0, there is an r > 0 such that B(x0, r) ⊂ U.

Take δ = r. Then if x is a point in O and ∥x − x0∥ < r, x ∈ B(x0, r) ⊂ U. This implies that F(x) ∈ V = B(y0, ε). Namely, ∥F(x) − F(x0)∥ < ε. This proves that F : O → Rᵐ is continuous at x0. Since x0 is an arbitrary point in O, F : O → Rᵐ is continuous.

Using the fact that a set is open if and only if its complement is closed, it is natural to expect the following.

Theorem 2.27
Let A be a closed subset of Rⁿ, and let F : A → Rᵐ be a function defined on A. The following are equivalent.

(a) F : A → Rᵐ is continuous.
(b) For every closed subset C of Rᵐ, F⁻¹(C) is a closed subset of Rⁿ.

Proof
Assume that (a) holds. Let C be a closed subset of Rᵐ, and let

    D = F⁻¹(C) = {x ∈ A | F(x) ∈ C}.

We need to show that D is a closed subset of Rⁿ. If {xk} is a sequence in D that converges to the point x0 in Rⁿ, since D ⊂ A and A is closed, x0 is in A. Since F is continuous at x0, the sequence {F(xk)} is a sequence in C that converges to the point F(x0) in Rᵐ. Since C is closed, F(x0) is in C. Therefore, x0 is in D. This proves that D is closed.

Conversely, assume that (a) does not hold. Then F : A → Rᵐ is not continuous at some x0 ∈ A. Thus, there exists ε > 0 such that for any δ > 0, there exists a point x in A ∩ B(x0, δ) such that ∥F(x) − F(x0)∥ ≥ ε. For k ∈ Z⁺, let xk be a point in A ∩ B(x0, 1/k) such that ∥F(xk) − F(x0)∥ ≥ ε. Since ∥xk − x0∥ < 1/k for all k ∈ Z⁺, the sequence {xk} is a sequence in A that converges to x0. Let

    C = {y ∈ Rᵐ | ∥y − F(x0)∥ ≥ ε}.

Then C is the complement of the open set B(F(x0), ε). Hence, C is closed. It contains F(xk) for all k ∈ Z⁺, but it does not contain F(x0). Thus, the set D = F⁻¹(C) contains the sequence {xk}, but does not contain its limit x0. This means D is not closed. Therefore, (b) does not hold.
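The sequential criterion used in this proof can be checked numerically. The sketch below (a hypothetical check, not part of the text) feeds points xk approaching x0 into the continuous map F(x, y) = (e^(xy), x, y) of Example 2.24 and watches ∥F(xk) − F(x0)∥ shrink:

```python
import math

# F(x, y) = (e^(xy), x, y), the continuous map from Example 2.24.
def F(x, y):
    return (math.exp(x * y), x, y)

# Euclidean distance in R^3.
def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

x0 = (1.0, 2.0)
# x_k = x_0 + (1/k, -1/k) converges to x_0 as k grows.
for k in [10, 100, 1000, 10000]:
    xk = (x0[0] + 1 / k, x0[1] - 1 / k)
    # ||F(x_k) - F(x_0)|| shrinks as x_k approaches x_0.
    print(k, dist(F(*xk), F(*x0)))
```

The printed distances decrease roughly like 1/k, consistent with F(xk) → F(x0).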
There is a much easier proof of Theorem 2.27 if A = Rⁿ, using Theorem 2.26 and the fact that a set is closed if and only if its complement is open.

Theorem 2.26 and Theorem 2.27 provide useful tools to justify that a set is open or closed in Rⁿ, using our known library of continuous functions.

Example 2.26
Let A be the subset of R² given by

    A = {(x, y) | x² + y² < 20, y > x²}.

Show that A is open.

Solution
Let O = {(x, y) | x² + y² < 20}. This is a ball of radius √20 centered at the origin. Hence, O is open. Define the function f : O → R by f(x, y) = y − x². Since f is a polynomial, it is continuous. Notice that y > x² if and only if f(x, y) > 0, if and only if f(x, y) ∈ (0, ∞). This shows that A = f⁻¹((0, ∞)). Since (0, ∞) is open in R, Theorem 2.26 implies that A is an open set.

Figure 2.12: The set A in Example 2.26.

Example 2.27
Let C be the subset of R³ given by

    C = {(x, y, z) | x ≥ 0, y ≥ 0, y² + z² ≤ 20}.

Show that C is closed.

Solution
Let πx : R³ → R and πy : R³ → R be the projection functions πx(x, y, z) = x and πy(x, y, z) = y, and consider the function g : R³ → R defined as g(x, y, z) = 20 − (y² + z²). Notice that y² + z² ≤ 20 if and only if g(x, y, z) ≥ 0, if and only if g(x, y, z) ∈ I = [0, ∞). The projection functions πx and πy are continuous. Since g is a polynomial, it is also continuous. The set I = [0, ∞) is closed in R. Therefore, the sets πx⁻¹(I), πy⁻¹(I) and g⁻¹(I) are closed in R³. Since

    C = πx⁻¹(I) ∩ πy⁻¹(I) ∩ g⁻¹(I),

being an intersection of three closed sets, C is closed in R³.

Using the same reasoning, we obtain the following.

Theorem 2.28
Let I1, . . ., In be intervals in R.

1. If each of I1, . . ., In is an open interval of the form (a, b), (a, ∞), (−∞, a) or R, then I1 × · · · × In is an open subset of Rⁿ.

2. If each of I1, . . .
, In is a closed interval of the form [a, b], [a, ∞), (−∞, a] or R, then I1 × · · · × In is a closed subset of Rⁿ.

Sketch of Proof
Use the fact that

    I1 × · · · × In = ⋂_{i=1}^{n} πi⁻¹(Ii),

where πi : Rⁿ → R is the projection function πi(x1, . . ., xn) = xi.

Example 2.28
The set A = {(x, y, z) | x < 0, y > 2, −10 < z < −3} is open in R³, since A = (−∞, 0) × (2, ∞) × (−10, −3). The set C = {(x, y, z) | x ≤ 0, y ≥ 2, −10 ≤ z ≤ −3} is closed in R³, since C = (−∞, 0] × [2, ∞) × [−10, −3].

We also have the following.

Theorem 2.29
Let a and b be real numbers, and assume that f : Rⁿ → R is a continuous function. Define the sets A, B, C, D, E and F as follows.

(a) A = {x ∈ Rⁿ | f(x) > a}
(b) B = {x ∈ Rⁿ | f(x) ≥ a}
(c) C = {x ∈ Rⁿ | f(x) < a}
(d) D = {x ∈ Rⁿ | f(x) ≤ a}
(e) E = {x ∈ Rⁿ | a < f(x) < b}
(f) F = {x ∈ Rⁿ | a ≤ f(x) ≤ b}

Then A, C and E are open sets, while B, D and F are closed sets.

The proof is left as an exercise.

Example 2.29
Find the interior, exterior and boundary of each of the following sets.

(a) A = {(x, y) | 0 < x² + 4y² < 4}
(b) B = {(x, y) | 0 < x² + 4y² ≤ 4}
(c) C = {(x, y) | x² + 4y² ≤ 4}

Figure 2.13: The sets A, B and C defined in Example 2.29.

Solution
Let D = {(x, y) | x² + 4y² < 4}, E = {(x, y) | x² + 4y² > 4}, and let f : R² → R be the function defined as f(x, y) = x² + 4y². Since f is a polynomial, it is continuous. By Theorem 2.29, A, D and E are open sets and C is a closed set.

Since A ⊂ B and D ⊂ C, we have

    A = int A ⊂ int B ⊂ B,    D ⊂ int C.

Since E = R² \ C ⊂ R² \ B ⊂ R² \ A, we have

    E = ext C ⊂ ext B ⊂ ext A.

Let F = {(x, y) | x² + 4y² = 4}. Then R² is a disjoint union of D, E and F. If u0 = (x0, y0) ∈ F, then at least one of x0 and y0 is nonzero. If x0 ≠ 0, define the sequences {uk} and {vk} by

    uk = ( (k/(k+1)) x0, y0 ),    vk = ( ((k+1)/k) x0, y0 ).

If x0 = 0, then y0 ≠ 0.
Define the sequences {uk} and {vk} by

    uk = ( x0, (k/(k+1)) y0 ),    vk = ( x0, ((k+1)/k) y0 ).

In either case, {uk} is a sequence of points in A that converges to u0, while {vk} is a sequence of points in E that converges to u0. This proves that u0 is a boundary point of A, B and C.

For the point 0, since it is not in A and B, it is not an interior point of A and B, but it is the limit of the sequence {(1/k, 0)}, which lies in both A and B. Hence, 0 is in the closure of A and B, and hence is a boundary point of A and B. We conclude that

    int A = int B = {(x, y) | 0 < x² + 4y² < 4},
    int C = {(x, y) | x² + 4y² < 4},
    ext A = ext B = ext C = {(x, y) | x² + 4y² > 4},
    bd A = bd B = {(x, y) | x² + 4y² = 4} ∪ {0},
    bd C = {(x, y) | x² + 4y² = 4}.

Remark 2.1
Let f : Rⁿ → R be a continuous function and let

    C = {x ∈ Rⁿ | a ≤ f(x) ≤ b}.

One is tempted to say that

    bd C = {x ∈ Rⁿ | f(x) = a or f(x) = b}.

This is not necessarily true. For example, consider the set C in Example 2.29. It can be written as

    C = {(x, y) | 0 ≤ x² + 4y² ≤ 4}.

However, the point where f(x, y) = x² + 4y² = 0 is not a boundary point of C.

Now we return to continuous functions.

Theorem 2.30 Pasting of Continuous Functions
Let A and B be closed subsets of Rⁿ, and let S = A ∪ B. If F : S → Rᵐ is a function such that F_A = F|_A : A → Rᵐ and F_B = F|_B : B → Rᵐ are both continuous, then F : S → Rᵐ is continuous.

Proof
Since S is a union of two closed sets, it is closed. Applying Theorem 2.27, it suffices to show that if C is a closed subset of Rᵐ, then F⁻¹(C) is closed in Rⁿ. Notice that

    F⁻¹(C) = {x ∈ S | F(x) ∈ C}
           = {x ∈ A | F(x) ∈ C} ∪ {x ∈ B | F(x) ∈ C}
           = F_A⁻¹(C) ∪ F_B⁻¹(C).

Since F_A : A → Rᵐ and F_B : B → Rᵐ are both continuous functions, F_A⁻¹(C) and F_B⁻¹(C) are closed subsets of Rⁿ. Being a union of two closed subsets, F⁻¹(C) is closed. This completes the proof.
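The boundary argument of Example 2.29 can also be checked numerically. The sketch below (a hypothetical check) takes the boundary point u0 = (√2, √(1/2)) on the ellipse x² + 4y² = 4, where both coordinates are nonzero, and verifies that the sequences uk and vk defined in that example land in A and E respectively:

```python
import math

# f(x, y) = x^2 + 4y^2, the continuous function from Example 2.29.
def g(x, y):
    return x * x + 4 * y * y

# A hypothetical boundary point u0 on the ellipse: g(x0, y0) = 2 + 2 = 4.
x0, y0 = math.sqrt(2), math.sqrt(0.5)
assert abs(g(x0, y0) - 4) < 1e-12

for k in [1, 10, 100, 1000]:
    uk = (k / (k + 1) * x0, y0)   # shrink the x-coordinate: lands in A
    vk = ((k + 1) / k * x0, y0)   # stretch the x-coordinate: lands in E
    assert 0 < g(*uk) < 4         # uk is in A = {0 < g < 4}
    assert g(*vk) > 4             # vk is in E = {g > 4}

# Both sequences converge to u0, so u0 is a boundary point of A.
print(g(1000 / 1001 * x0, y0))    # approaches 4 as k grows
```

Since both sequences converge to u0, this illustrates why every point of the ellipse is a boundary point of A, B and C.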
Example 2.30
Let f : R² → R be the function defined as

    f(x, y) = { x² + y², if x² + y² < 1,
              { 1,       if x² + y² ≥ 1.

Show that f is a continuous function.

Solution
Let A = {(x, y) | x² + y² ≤ 1} and B = {(x, y) | x² + y² ≥ 1}. Then A and B are closed subsets of R² and R² = A ∪ B. Notice that f|_A : A → R is the function f(x, y) = x² + y², which is continuous since it is a polynomial. By definition, f|_B : B → R is the constant function f_B(x, y) = 1, which is also continuous. By Theorem 2.30, the function f : R² → R is continuous.

Given positive integers n and m, there is a natural bijective correspondence between Rⁿ × Rᵐ and R^(n+m) given by

    T : Rⁿ × Rᵐ → R^(n+m),    (x, y) ↦ (x1, . . ., xn, y1, . . ., ym),

where x = (x1, . . ., xn) and y = (y1, . . ., ym). Hence, sometimes we will denote a point in R^(n+m) as (x, y), where x ∈ Rⁿ and y ∈ Rᵐ. By the generalized Pythagoras theorem,

    ∥(x, y)∥² = ∥x∥² + ∥y∥².

If A is a subset of Rⁿ and B is a subset of Rᵐ, A × B can be considered as a subset of R^(n+m) given by

    A × B = {(x, y) | x ∈ A, y ∈ B}.

The following is more general than Proposition 2.16.

Proposition 2.31
Let D be a subset of Rⁿ, and let F : D → Rᵏ and G : D → Rˡ be functions defined on D. Define the function H : D → R^(k+l) by H(x) = (F(x), G(x)). Then the function H : D → R^(k+l) is continuous if and only if the functions F : D → Rᵏ and G : D → Rˡ are continuous.

Sketch of Proof
This proposition follows immediately from Proposition 2.16, since

    H(x) = (F1(x), . . ., Fk(x), G1(x), . . ., Gl(x)).

For a function defined on a subset of Rⁿ, we can define its graph in the following way.

Definition 2.8 The Graph of a Function
Let F : D → Rᵐ be a function defined on D ⊂ Rⁿ. The graph of F, denoted by G_F, is the subset of R^(n+m) defined as

    G_F = {(x, y) | x ∈ D, y = F(x)}.

Example 2.31
Let D = {(x, y) | x² + y² ≤ 1}, and let f : D → R be the function defined as f(x, y) = √(1 − x² − y²).
The graph of f is

    G_f = {(x, y, z) | x² + y² ≤ 1, z = √(1 − x² − y²)},

which is the upper hemisphere.

Figure 2.14: The upper hemisphere is the graph of a function.

Notice that if D is a subset of Rⁿ, then the graph of the function F : D → Rᵐ is the image of the function H : D → R^(n+m) defined as H(x) = (x, F(x)). From Proposition 2.31, we obtain the following.

Corollary 2.32
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. The image of the function H : D → R^(n+m), H(x) = (x, F(x)), is the graph of F. If the function F : D → Rᵐ is continuous, then the function H : D → R^(n+m) is continuous.

Now we consider a special class of functions called Lipschitz functions.

Definition 2.9
Let D be a subset of Rⁿ. A function F : D → Rᵐ is Lipschitz provided that there exists a positive constant c such that

    ∥F(u) − F(v)∥ ≤ c∥u − v∥ for all u, v ∈ D.

The constant c is called a Lipschitz constant of the function. If c < 1, then F : D → Rᵐ is called a contraction.

The following is easy to establish.

Proposition 2.33
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a Lipschitz function. Then F : D → Rᵐ is continuous.

Example 2.32
A linear transformation of the form T : Rⁿ → Rⁿ, T(x) = ax, is a Lipschitz function with Lipschitz constant |a|.

In fact, we have the following.

Theorem 2.34
A linear transformation T : Rⁿ → Rᵐ is a Lipschitz function.

Proof
Let A be the m × n matrix such that T(x) = Ax. When x is in Rⁿ,

    ∥T(x)∥² = (Ax)ᵀ(Ax) = xᵀ(AᵀA)x.

The matrix B = AᵀA is a positive semi-definite n × n symmetric matrix. By Theorem 2.7,

    xᵀ(AᵀA)x ≤ λmax∥x∥²,

where λmax is the largest eigenvalue of AᵀA. Therefore, for any x ∈ Rⁿ,

    ∥T(x)∥ ≤ √λmax ∥x∥.

It follows that for any u and v in Rⁿ,

    ∥T(u) − T(v)∥ = ∥T(u − v)∥ ≤ √λmax ∥u − v∥.

Hence, T : Rⁿ → Rᵐ is a Lipschitz mapping with Lipschitz constant √λmax.
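Theorem 2.34 identifies √λmax, where λmax is the largest eigenvalue of AᵀA, as a Lipschitz constant of T(x) = Ax. The following numeric sketch estimates this constant for a hypothetical matrix using a plain power iteration on AᵀA (the iteration itself is an illustration, not a method from the text):

```python
import math

# A hypothetical 2x2 matrix; A^T A = [[25, 20], [20, 25]] has eigenvalues 45 and 5,
# so the optimal Lipschitz constant of T(x) = Ax is sqrt(45).
A = [[3.0, 0.0], [4.0, 5.0]]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def norm(x):
    return math.sqrt(sum(t * t for t in x))

# B = A^T A is symmetric positive semi-definite.
B = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

x = [1.0, 0.0]
for _ in range(200):              # power iteration converges to the top eigenvector
    x = matvec(B, x)
    n = norm(x)
    x = [t / n for t in x]

lam_max = norm(matvec(B, x))      # for a unit eigenvector x, ||Bx|| = lambda_max
c = math.sqrt(lam_max)
print(round(c, 4))                # prints 6.7082, i.e. sqrt(45)
```

The iteration converges quickly here because the two eigenvalues of AᵀA are well separated.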
Example 2.33
Let T : R² → R² be the mapping defined as T(x, y) = (x − 3y, 7x + 4y). Find the smallest constant c such that ∥T(u) − T(v)∥ ≤ c∥u − v∥ for all u and v in R².

Solution
Notice that T(u) = Au, where A is the 2 × 2 matrix A = [1 −3; 7 4]. Hence, ∥T(u)∥² = uᵀAᵀAu = uᵀCu, where

    C = [1 7; −3 4][1 −3; 7 4] = [50 25; 25 25] = 25 [2 1; 1 1].

For the matrix G = [2 1; 1 1], the eigenvalues are the solutions of λ² − 3λ + 1 = 0, which are λ1 = (3 + √5)/2 and λ2 = (3 − √5)/2. Hence,

    ∥T(u)∥² ≤ (25(3 + √5)/2) ∥u∥².

The smallest c such that ∥T(u) − T(v)∥ ≤ c∥u − v∥ for all u and v in R² is

    c = √(25(3 + √5)/2) ≈ 8.0902.

Remark 2.2
If A is an m × n matrix, the matrix B = AᵀA is a positive semi-definite n × n symmetric matrix. Thus, all its eigenvalues are nonnegative. Let λ1, . . ., λn be its eigenvalues with

    0 = λn = · · · = λ_{r+1} < λr ≤ λ_{r−1} ≤ · · · ≤ λ1.

Then λ1, . . ., λr are the nonzero eigenvalues of AᵀA. The singular values of A are the numbers σ1, . . ., σr, where σi = √λi, 1 ≤ i ≤ r. Theorem 2.34 says that σ1 is a Lipschitz constant of the linear transformation T(x) = Ax.

At the end of this section, we want to discuss the vector space of m × n matrices M_{m,n}. There is a natural vector space isomorphism between M_{m,n} and R^(mn), mapping the matrix A = [aij] to x = (xk), where x_{(i−1)n+j} = aij for 1 ≤ i ≤ m, 1 ≤ j ≤ n. In other words, if

    a1 = (a11, a12, . . ., a1n),
    a2 = (a21, a22, . . ., a2n),
    ...
    am = (am1, am2, . . ., amn)

are the row vectors of A, then A is mapped to the vector (a1, a2, . . ., am) in R^(mn). Under this isomorphism, the norm of a matrix A = [aij] is

    ∥A∥ = √( Σ_{i=1}^{m} Σ_{j=1}^{n} aij² ) = √( Σ_{i=1}^{m} ∥ai∥² ),

and the distance between two matrices A = [aij] and B = [bij] is

    d(A, B) = ∥A − B∥ = √( Σ_{i=1}^{m} Σ_{j=1}^{n} (aij − bij)² ).

The following proposition can be used to give an alternative proof of Theorem 2.34.
Proposition 2.35
Let A be an m × n matrix. If x is in Rⁿ, then ∥Ax∥ ≤ ∥A∥∥x∥.

Proof
Let a1, . . ., am be the row vectors of A, and let w = Ax. Then wi = ⟨ai, x⟩ for 1 ≤ i ≤ m. By the Cauchy-Schwarz inequality, |wi| ≤ ∥ai∥∥x∥ for 1 ≤ i ≤ m. Thus,

    ∥w∥ = √(w1² + w2² + · · · + wm²) ≤ ∥x∥ √(∥a1∥² + ∥a2∥² + · · · + ∥am∥²) = ∥A∥∥x∥.

The difference between the proofs of Theorem 2.34 and Proposition 2.35 is that, in the proof of Theorem 2.34, we find that the smallest possible c such that ∥Ax∥ ≤ c∥x∥ for all x in Rⁿ is the largest singular value of the matrix A. In Proposition 2.35, we find a candidate for c, which is the norm of the matrix A, but this is usually not the optimal one.

When m = n, we denote the space of n × n matrices M_{n,n} simply as M_n. The determinant of the matrix A = [aij] ∈ M_n is given by

    det A = Σ_σ sgn(σ) a1σ(1) a2σ(2) · · · anσ(n).

Here the summation is over all the n! permutations σ of the set Sn = {1, 2, . . ., n}, and sgn(σ) is the sign of the permutation σ, which is equal to 1 or −1, depending on whether σ can be written as the product of an even number or an odd number of transpositions. For example, when n = 1, det[a] = a. When n = 2,

    det [a11 a12; a21 a22] = a11 a22 − a12 a21.

When n = 3,

    det [a11 a12 a13; a21 a22 a23; a31 a32 a33]
      = a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32 − a13 a22 a31 − a12 a21 a33.

The determinant function det : M_n → R is a polynomial function in the variables (aij). Hence, it is a continuous function. Recall that a matrix A ∈ M_n is invertible if and only if det A ≠ 0. Let

    GL(n, R) = {A ∈ M_n | det A ≠ 0}

be the subset of M_n that consists of invertible n × n matrices. It is a group under matrix multiplication, called the general linear group. By definition, GL(n, R) = det⁻¹(R \ {0}). Since R \ {0} is an open subset of R, GL(n, R) is an open subset of M_n. This gives the following.
Proposition 2.36
Given that A is an invertible n × n matrix, there exists r > 0 such that if B is an n × n matrix with ∥B − A∥ < r, then B is also invertible.

Sketch of Proof
This is simply a rephrasing of the statement that if A is a point in the open set GL(n, R), then there is a ball B(A, r) with center at A that is contained in GL(n, R).

Let A be an n × n matrix. For 1 ≤ i, j ≤ n, the (i, j)-minor of A, denoted by M_{i,j}, is the determinant of the (n − 1) × (n − 1) matrix obtained by deleting the i-th row and j-th column of A. Using the same reasoning as above, we find that the function M_{i,j} : M_n → R is a continuous function. The (i, j) cofactor C_{i,j} of A is given by C_{i,j} = (−1)^(i+j) M_{i,j}. The cofactor matrix of A is C_A = [C_{ij}]. Since each of its components is continuous, the function C : M_n → M_n taking A to C_A is a continuous function. If A is invertible,

    A⁻¹ = (1/det A) (C_A)ᵀ.

Since both C : M_n → M_n and det : M_n → R are continuous functions, and det : GL(n, R) → R is a function that is never equal to 0, we obtain the following.

Theorem 2.37
The map I : GL(n, R) → GL(n, R) that takes A to A⁻¹ is continuous.

Exercises 2.3

Question 1
Let x0 be a point in Rⁿ. Define the function f : Rⁿ → R by f(x) = ∥x − x0∥. Show that f is a continuous function.

Question 2
Let O = R³ \ {(0, 0, 0)} and define the function F : O → R² by

    F(x, y, z) = ( y/(x² + y² + z²), z/(x² + y² + z²) ).

Show that F is a continuous function.

Question 3
Let f : Rⁿ → R be the function defined as

    f(x) = { 1, if at least one of the xi is rational,
           { 0, otherwise.

At which point of Rⁿ is the function f continuous?

Question 4
Let f : Rⁿ → R be the function defined as

    f(x) = { x1² + · · · + xn², if at least one of the xi is rational,
           { 0,                otherwise.

At which point of Rⁿ is the function f continuous?
Question 5
Let f : R³ → R be the function defined by

    f(x, y, z) = { sin(x² + 4y² + z²) / (x² + 4y² + z²), if (x, y, z) ≠ (0, 0, 0),
                 { a,                                    if (x, y, z) = (0, 0, 0).

Show that there exists a value a such that f is a continuous function, and find this value of a.

Question 6
Let a and b be positive numbers, and let O be the subset of Rⁿ defined as O = {x ∈ Rⁿ | a < ∥x∥ < b}. Show that O is open.

Question 7
Let A be the subset of R² given by A = {(x, y) | sin(x + y) + xy > 1}. Show that A is an open set.

Question 8
Let A be the subset of R³ given by A = {(x, y, z) | x ≥ 0, y ≤ 1, e^(xy) ≤ z}. Show that A is a closed set.

Question 9
A plane in R³ is the set of all points (x, y, z) satisfying an equation of the form ax + by + cz = d, where (a, b, c) ≠ (0, 0, 0). Show that a plane is a closed subset of R³.

Question 10
Define the sets A, B, C and D as follows.
(a) A = {(x, y, z) | x² + 4y² + 9z² < 36}
(b) B = {(x, y, z) | x² + 4y² + 9z² ≤ 36}
(c) C = {(x, y, z) | 0 < x² + 4y² + 9z² < 36}
(d) D = {(x, y, z) | 0 < x² + 4y² + 9z² ≤ 36}
For each of these sets, find its interior, exterior and boundary.

Question 11
Let a and b be real numbers, and assume that f : Rⁿ → R is a continuous function. Consider the following subsets of Rⁿ.
(a) A = {x ∈ Rⁿ | f(x) > a}
(b) B = {x ∈ Rⁿ | f(x) ≥ a}
(c) C = {x ∈ Rⁿ | f(x) < a}
(d) D = {x ∈ Rⁿ | f(x) ≤ a}
(e) E = {x ∈ Rⁿ | a < f(x) < b}
(f) F = {x ∈ Rⁿ | a ≤ f(x) ≤ b}
Show that A, C and E are open sets, while B, D and F are closed sets.

Question 12
Let f : R² → R be the function defined as

    f(x, y) = { x² + y²,     if x² + y² < 4,
              { 8 − x² − y², if x² + y² ≥ 4.

Show that f is a continuous function.

Question 13
Show that the distance function on Rⁿ, d : Rⁿ × Rⁿ → R, d(u, v) = ∥u − v∥, is continuous in the following sense.
If {uk} and {vk} are sequences in Rⁿ that converge to u and v respectively, then the sequence {d(uk, vk)} converges to d(u, v).

Question 14
Let T : R² → R³ be the mapping T(x, y) = (x + y, 3x − y, 6x + 5y). Show that T : R² → R³ is a Lipschitz mapping, and find the smallest Lipschitz constant for this mapping.

Question 15
Given that A is a subset of Rᵐ and B is a subset of Rⁿ, let C = A × B. Then C is a subset of R^(m+n).
(a) If A is open in Rᵐ and B is open in Rⁿ, show that A × B is open in R^(m+n).
(b) If A is closed in Rᵐ and B is closed in Rⁿ, show that A × B is closed in R^(m+n).

Question 16
Let D be a subset of Rⁿ, and let f : D → R be a continuous function defined on D. Let A = D × R and define the function g : A → R by g(x, y) = y − f(x). Show that g : A → R is continuous.

Question 17
Let U be an open subset of Rⁿ, and let f : U → R be a continuous function defined on U. Show that the sets

    O1 = {(x, y) | x ∈ U, y < f(x)},    O2 = {(x, y) | x ∈ U, y > f(x)}

are open subsets of R^(n+1).

Question 18
Let C be a closed subset of Rⁿ, and let f : C → R be a continuous function defined on C. Show that the sets

    A1 = {(x, y) | x ∈ C, y ≤ f(x)},    A2 = {(x, y) | x ∈ C, y ≥ f(x)}

are closed subsets of R^(n+1).

2.4 Uniform Continuity

In Volume I, we have seen that uniform continuity plays an important role in single variable analysis. In this section, we extend this concept to multivariable functions.

Definition 2.10 Uniform Continuity
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. We say that the function F is uniformly continuous provided that for any ε > 0, there exists δ > 0 such that if u and v are points in D and ∥u − v∥ < δ, then ∥F(u) − F(v)∥ < ε.

The following two propositions are obvious.

Proposition 2.38
A uniformly continuous function is continuous.
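The definition can be illustrated with the scalar map T(x) = 2x of Example 2.32: the single choice δ = ε/2 serves every pair of points, no matter where they sit, which is exactly uniform continuity. The sample points below are hypothetical:

```python
import math

# T(x) = 2x on R^2, a Lipschitz map with Lipschitz constant 2 (Example 2.32).
def F(x):
    return [2.0 * t for t in x]

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

eps = 1e-3
delta = eps / 2
# hypothetical sample pairs closer than delta, near and far from the origin
pairs = [([0.0, 0.0], [delta / 2, 0.0]),
         ([1e6, -1e6], [1e6 + delta / 3, -1e6 + delta / 3])]
for u, v in pairs:
    assert dist(u, v) < delta
    assert dist(F(u), F(v)) < eps   # the same delta works everywhere in R^2
print("delta = eps/2 works for all sampled pairs")
```

The point of the sketch is that δ depends only on ε, not on the location of the pair, in contrast with ordinary continuity.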
Proposition 2.39
Given that D is a subset of Rⁿ, and D′ is a subset of D, if the function F : D → Rᵐ is uniformly continuous, then the function F : D′ → Rᵐ is also uniformly continuous.

A special class of uniformly continuous functions is the class of Lipschitz functions.

Theorem 2.40
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. If F : D → Rᵐ is Lipschitz, then it is uniformly continuous.

The proof is straightforward.

Remark 2.3
Theorem 2.34 and Theorem 2.40 imply that a linear transformation is uniformly continuous.

There is an equivalent definition for uniform continuity in terms of sequences.

Theorem 2.41
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. Then the following are equivalent.

(i) F : D → Rᵐ is uniformly continuous. Namely, given ε > 0, there exists δ > 0 such that if u and v are points in D and ∥u − v∥ < δ, then ∥F(u) − F(v)∥ < ε.

(ii) If {uk} and {vk} are two sequences in D such that lim_{k→∞} (uk − vk) = 0, then lim_{k→∞} (F(uk) − F(vk)) = 0.

Let us give a proof of this theorem here.

Proof
Assume that (i) holds, and {uk} and {vk} are two sequences in D such that lim_{k→∞} (uk − vk) = 0. Given ε > 0, (i) implies that there exists δ > 0 such that if u and v are points in D and ∥u − v∥ < δ, then ∥F(u) − F(v)∥ < ε. Since lim_{k→∞} (uk − vk) = 0, there is a positive integer K such that for all k ≥ K, ∥uk − vk∥ < δ. It follows that ∥F(uk) − F(vk)∥ < ε for all k ≥ K.

This shows that lim_{k→∞} (F(uk) − F(vk)) = 0, and thus completes the proof that (i) implies (ii).

Conversely, assume that (i) does not hold. This means there exists an ε > 0 such that for every δ > 0, there exist points u and v in D with ∥u − v∥ < δ and ∥F(u) − F(v)∥ ≥ ε. Thus, for every k ∈ Z⁺, there exist uk and vk in D such that

    ∥uk − vk∥ < 1/k,    (2.5)

and ∥F(uk) − F(vk)∥ ≥ ε. Notice that {uk} and {vk} are sequences in D. Eq.
(2.5) implies that lim_{k→∞} (uk − vk) = 0. Since ∥F(uk) − F(vk)∥ ≥ ε,

    lim_{k→∞} (F(uk) − F(vk)) ≠ 0.

This shows that if (i) does not hold, then (ii) does not hold.

From Theorem 2.41, we can deduce the following.

Proposition 2.42
Let D be a subset of Rⁿ, and let F : D → Rᵐ be a function defined on D. Then F : D → Rᵐ is uniformly continuous if and only if each of the component functions Fj = (πj ◦ F) : D → R, 1 ≤ j ≤ m, is uniformly continuous.

Let us look at some more examples.

Example 2.34
Let D be the open rectangle D = (0, 5) × (0, 7), and consider the function f : D → R defined by f(x, y) = xy. Determine whether f : D → R is uniformly continuous.

Solution
For any two points u1 = (x1, y1) and u2 = (x2, y2) in D, 0 < x1, x2 < 5 and 0 < y1, y2 < 7. Since

    f(u1) − f(u2) = x1y1 − x2y2 = x1(y1 − y2) + y2(x1 − x2),

we find that

    |f(u1) − f(u2)| ≤ |x1||y1 − y2| + |y2||x1 − x2| ≤ 5∥u1 − u2∥ + 7∥u1 − u2∥ = 12∥u1 − u2∥.

This shows that f : D → R is a Lipschitz function. Hence, it is uniformly continuous.

Example 2.35
Consider the function f : R² → R defined by f(x, y) = xy. Determine whether f : R² → R is uniformly continuous.

Solution
For k ∈ Z⁺, let

    uk = (k + 1/k, k),    vk = (k, k).

Then {uk} and {vk} are sequences of points in R² and

    lim_{k→∞} (uk − vk) = lim_{k→∞} (1/k, 0) = (0, 0).

However,

    f(uk) − f(vk) = k(k + 1/k) − k² = 1.
Determine whether f is uniformly continuous. Question 3 Let D = (1,∞)× (1,∞). Consider the function f : D → R defined as f(x, y) = √ x+ y. Determine whether f is uniformly continuous. Question 4 Let D = (0, 1)× (0, 2). Consider the function f : D → R defined as f(x, y) = 1√ x+ y . Determine whether f is uniformly continuous. Chapter 2. Limits of Multivariable Functions and Continuity 127 2.5 Contraction Mapping Theorem Among the Lipschitz functions, there is a subset called contractions. Definition 2.11 Contractions Let D be a subset of Rn. A function F : D → Rm is called a contraction if there exists a constant 0 ≤ c < 1 such that ∥F(u)− F(v)∥ ≤ c∥u− v∥ for all u,v ∈ D.In other words, a contraction is a Lipschitz function which has a Lipschitz constant that is less than 1. Example 2.36 Let b be a point in Rn, and let F : Rn → Rn be the function defined as F(x) = cx+ b. The mapping F is a contraction if and only if |c| < 1. The contraction mapping theorem is an important result in analysis. Extended to metric spaces, it is an important tool to prove the existence and uniqueness of solutions of ordinary differential equations. Theorem 2.43 Contraction Mapping Theorem Let D be a closed subset of Rn, and let F : D → D be a contraction. Then F has a unique fixed point. Namely, there is a unique u in D such that F(u) = u. Proof By definition, there is a constant c ∈ [0, 1) such that ∥F(u)− F(v)∥ ≤ c∥u− v∥ for all u,v ∈ D. Chapter 2. Limits of Multivariable Functions and Continuity 128 We start with any point x0 in D and construct the sequence {xk} inductively by xk+1 = F(xk) for all k ≥ 0. Notice that for all k ∈ Z+, ∥xk+1 − xk∥ = ∥F(xk)− F(xk−1)∥ ≤ c∥xk − xk−1∥. By iterating, we find that ∥xk+1 − xk∥ ≤ ck∥x1 − x0∥. Therefore, if l > k ≥ 0, triangle inequality implies that ∥xl − xk∥ ≤ ∥xl − xl−1∥+ · · ·+ ∥xk+1 − xk∥ ≤ (cl−1 + . . .+ ck)∥x1 − x0∥. Since c ∈ [0, 1), cl−1 + . . .+ ck = ck(1 + c+ · · ·+ cl−k−1) < ck 1− c . 
Therefore, for all l > k ≥ 0,

    ∥xl − xk∥ < (cᵏ/(1 − c))∥x1 − x0∥.

Given ε > 0, there exists a positive integer K such that for all k ≥ K,

    (cᵏ/(1 − c))∥x1 − x0∥ < ε.

This implies that for all l > k ≥ K, ∥xl − xk∥ < ε. In other words, we have shown that {xk} is a Cauchy sequence. Therefore, it converges to a point u in Rⁿ. Since D is closed, u is in D. Since F is continuous, the sequence {F(xk)} converges to F(u). But F(xk) = x_{k+1}. Being a subsequence of {xk}, the sequence {x_{k+1}} converges to u as well. This shows that F(u) = u, which says that u is a fixed point of F.

Now if v is another point in D such that F(v) = v, then

    ∥u − v∥ = ∥F(u) − F(v)∥ ≤ c∥u − v∥.

Since c ∈ [0, 1), this can only be true if ∥u − v∥ = 0, which implies that v = u. Hence, the fixed point of F is unique.

As an application of the contraction mapping theorem, we prove the following.

Theorem 2.44
Let r be a positive number and let G : B(0, r) → Rⁿ be a mapping such that G(0) = 0, and

    ∥G(u) − G(v)∥ ≤ (1/2)∥u − v∥ for all u, v ∈ B(0, r).

If F : B(0, r) → Rⁿ is the function defined as F(x) = x + G(x), then F is a one-to-one continuous mapping whose image contains the open ball B(0, r/2).

Proof
By definition, G is a contraction. Hence, it is continuous. Therefore, F : B(0, r) → Rⁿ is also continuous. If F(u) = F(v), then u − v = G(v) − G(u). Therefore,

    ∥u − v∥ = ∥G(v) − G(u)∥ ≤ (1/2)∥u − v∥.

This implies that ∥u − v∥ = 0, and thus u = v. Hence, F is one-to-one.

Given y ∈ B(0, r/2), let r1 = 2∥y∥. Then r1 < r. Consider the map H : CB(0, r1) → Rⁿ defined as H(x) = y − G(x). For any u and v in CB(0, r1),

    ∥H(u) − H(v)∥ = ∥G(u) − G(v)∥ ≤ (1/2)∥u − v∥.

Therefore, H is also a contraction. Notice that if x ∈ CB(0, r1),

    ∥H(x)∥ ≤ ∥y∥ + ∥G(x) − G(0)∥ ≤ r1/2 + (1/2)∥x∥ ≤ r1/2 + r1/2 = r1.

Therefore, H is a contraction that maps the closed set CB(0, r1) into itself.
By the contraction mapping theorem, there exists u in CB(0, r1) such that H(u) = u. This gives y − G(u) = u, or equivalently,

    y = u + G(u) = F(u).

In other words, we have shown that there exists u ∈ CB(0, r1) ⊂ B(0, r) such that F(u) = y. This proves that the image of the map F : B(0, r) → Rⁿ contains the open ball B(0, r/2).

Exercises 2.5

Question 1
Let Sⁿ = {(x1, . . ., xn, x_{n+1}) ∈ R^(n+1) | x1² + · · · + xn² + x_{n+1}² = 1} be the n-sphere, and let F : Sⁿ → Sⁿ be a mapping such that

    ∥F(u) − F(v)∥ ≤ (2/3)∥u − v∥ for all u, v ∈ Sⁿ.

Show that there is a unique w ∈ Sⁿ such that F(w) = w.

Question 2
Let r be a positive number, and let c be a positive number less than 1. Assume that G : B(0, r) → Rⁿ is a mapping such that G(0) = 0, and

    ∥G(u) − G(v)∥ ≤ c∥u − v∥ for all u, v ∈ B(0, r).

If F : B(0, r) → Rⁿ is the function defined as F(x) = x + G(x), show that F is a one-to-one continuous mapping whose image contains the open ball B(0, ar), where a = 1 − c.

Chapter 3
Continuous Functions on Connected Sets and Compact Sets

In Volume I, we have seen that the intermediate value theorem and the extreme value theorem play important roles in analysis. In order to extend these two theorems to multivariable functions, we need to consider two topological properties of sets – connectedness and compactness.

3.1 Path-Connectedness and Intermediate Value Theorem

We want to extend the intermediate value theorem to multivariable functions. For this, we need to consider a topological property called connectedness. In this section, we first discuss the topological property called path-connectedness, which is a more natural concept.

Definition 3.1 Path
Let S be a subset of Rⁿ, and let u and v be two points in S. A path in S joining u to v is a continuous function γ : [a, b] → S such that γ(a) = u and γ(b) = v.
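A minimal concrete instance of Definition 3.1 is the straight-line path γ(t) = (1 − t)u + tv on [0, 1], continuous because each component is an affine function of t. The endpoints u and v below are hypothetical:

```python
# gamma(t) = (1 - t)u + tv on [0, 1] joins u to v; each component is affine
# in t, hence continuous.
def gamma(t, u, v):
    return tuple((1 - t) * a + t * b for a, b in zip(u, v))

u, v = (1.0, 0.0), (-1.0, 2.0)   # hypothetical endpoints in R^2
assert gamma(0.0, u, v) == u     # gamma(a) = u
assert gamma(1.0, u, v) == v     # gamma(b) = v
print(gamma(0.5, u, v))          # prints (0.0, 1.0), the midpoint
```

Any set containing the segment between u and v thus contains a path joining them.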
For any real numbers a and b with a < b, the map u : [0, 1] → [a, b] defined by u(t) = a + t(b − a) is a continuous bijection. Its inverse u⁻¹ : [a, b] → [0, 1] is u⁻¹(t) = (t − a)/(b − a), which is also continuous. Hence, in the definition of a path, we can let the domain be any [a, b] with a < b.

Figure 3.1: A path in S joining u to v.

Example 3.1
Given a set S and a point x0 in S, the constant function γ : [a, b] → S, γ(t) = x0, is a path in S. If γ : [a, b] → S is a path in S ⊂ Rⁿ, and S′ is any other subset of Rⁿ that contains the image of γ, then γ is also a path in S′.

Example 3.2
Let R be the rectangle R = [−2, 2] × [−2, 2]. The function γ : [0, 1] → R², γ(t) = (cos(πt), sin(πt)), is a path in R joining u = (1, 0) to v = (−1, 0).

Figure 3.2: The path in Example 3.2.

Example 3.3
Let S be a subset of Rⁿ. If γ : [a, b] → S is a path in S joining u to v, then γ̃ : [−b, −a] → S, γ̃(t) = γ(−t), is a path in S joining v to u.

Now we define path-connectedness.

Definition 3.2 Path-Connected
Let S be a subset of Rⁿ. We say that S is path-connected if any two points u and v in S can be joined by a path in S.

It is easy to characterize the path-connected subsets of R. In Volume I, we have defined the concept of convex sets. A subset S of R is a convex set provided that for any u and v in S and any t ∈ [0, 1], (1 − t)u + tv is also in S. Equivalently, if u and v are points in S with u < v, then all the points w satisfying u < w < v are also in S. We have shown that a subset S of R is a convex set if and only if it is an interval. The following theorem characterizes the path-connected subsets of R.

Theorem 3.1
Let S be a subset of R. Then S is path-connected if and only if S is an interval.

Proof
If S is an interval, then for any u and v in S, and for any t ∈ [0, 1], (1 − t)u + tv is in S.
Hence, the function γ : [0, 1] → S, γ(t) = (1 − t)u + tv, is a path in S that joins u to v. Conversely, assume that S is a path-connected subset of R. To show that S is an interval, we need to show that for any u and v in S with u < v, any w that is in the interval [u, v] is also in S. Since S is path-connected, there is a path γ : [0, 1] → S such that γ(0) = u and γ(1) = v. Since γ is continuous, and w is in between γ(0) and γ(1), the intermediate value theorem implies that there is a c ∈ [0, 1] so that γ(c) = w. Thus, w is in S. To explore path-connected subsets of Rn with n ≥ 2, we first extend the concept of convex sets to Rn. Given two points u and v in Rn, when t runs through all the points in the interval [0, 1], (1 − t)u + tv describes all the points on the line segment between u and v. Definition 3.3 Convex Sets Let S be a subset of Rn. We say that S is convex if for any two points u and v in S, the line segment between u and v lies entirely in S. Equivalently, S is convex provided that for any two points u and v in S, the point (1 − t)u + tv is in S for any t ∈ [0, 1]. Figure 3.3: A is a convex set, B is not. If u = (u1, . . . , un) and v = (v1, . . . , vn) are two points in Rn, the map γ : [0, 1] → Rn, γ(t) = (1 − t)u + tv = ((1 − t)u1 + tv1, . . . , (1 − t)un + tvn), is a continuous function, since each of its components is continuous. Thus, we have the following. Theorem 3.2 Let S be a subset of Rn. If S is convex, then it is path-connected. Let us look at some examples of convex sets. Example 3.4 Let I1, . . ., In be intervals in R. Show that the set S = I1 × · · · × In is path-connected. Solution We claim that S is convex. Then Theorem 3.2 implies that S is path-connected. Given that u = (u1, . . . , un) and v = (v1, . . . , vn) are two points in S, for each 1 ≤ i ≤ n, ui and vi are in Ii.
Since Ii is an interval, for any t ∈ [0, 1], (1 − t)ui + tvi is in Ii. Hence, (1 − t)u + tv = ((1 − t)u1 + tv1, . . . , (1 − t)un + tvn) is in S. This shows that S is convex. Special cases of sets of the form S = I1 × · · · × In are open and closed rectangles. Example 3.5 An open rectangle U = (a1, b1) × · · · × (an, bn) and its closure R = [a1, b1] × · · · × [an, bn] are convex sets. Hence, they are path-connected. Example 3.6 Let x0 be a point in Rn, and let r be a positive number. Show that the open ball B(x0, r) and the closed ball CB(x0, r) are path-connected sets. Solution Let u and v be two points in B(x0, r). Then ∥u − x0∥ < r and ∥v − x0∥ < r. For any t ∈ [0, 1], t ≥ 0 and 1 − t ≥ 0. By the triangle inequality, ∥(1 − t)u + tv − x0∥ ≤ ∥(1 − t)(u − x0)∥ + ∥t(v − x0)∥ = (1 − t)∥u − x0∥ + t∥v − x0∥ < (1 − t)r + tr = r. This shows that (1 − t)u + tv is in B(x0, r). Hence, B(x0, r) is convex. Replacing < by ≤, one can show that CB(x0, r) is convex. By Theorem 3.2, the open ball B(x0, r) and the closed ball CB(x0, r) are path-connected sets. Not all path-connected sets are convex. Before we give an example, let us first prove the following useful lemma. Lemma 3.3 Let A and B be path-connected subsets of Rn. If A ∩ B is nonempty, then S = A ∪ B is path-connected. Proof Let u and v be two points in S. If both u and v are in the set A, then they can be joined by a path in A, which is also a path in S. Similarly, if both u and v are in the set B, then they can be joined by a path in S. If u is in A and v is in B, let x0 be any point in A ∩ B. Then u and x0 are both in the path-connected set A, and v and x0 are both in the path-connected set B. Therefore, there exist continuous functions γ1 : [0, 1] → A and γ2 : [1, 2] → B such that γ1(0) = u, γ1(1) = x0, γ2(1) = x0 and γ2(2) = v. Define the function γ : [0, 2] → A ∪ B by γ(t) = γ1(t) if 0 ≤ t ≤ 1, and γ(t) = γ2(t) if 1 ≤ t ≤ 2.
Since [0, 1] and [1, 2] are closed subsets of R and γ1(1) = x0 = γ2(1), the function γ : [0, 2] → S obtained by gluing γ1 and γ2 is continuous. Thus, γ is a path in S from u to v. This proves that S is path-connected. Now we can give an example of a path-connected set that is not convex. Figure 3.4: If two sets A and B are path-connected and A ∩ B is nonempty, then A ∪ B is also path-connected. Example 3.7 Show that the set S = {(x, y) | 0 ≤ x ≤ 1, −2 ≤ y ≤ 2} ∪ { (x, y) | (x − 2)^2 + y^2 ≤ 1 } is path-connected, but not convex. Solution The set A = {(x, y) | 0 ≤ x ≤ 1, −2 ≤ y ≤ 2} = [0, 1] × [−2, 2] is a closed rectangle. Therefore, it is path-connected. The set B = { (x, y) | (x − 2)^2 + y^2 ≤ 1 } is a closed ball with center at (2, 0) and radius 1. Hence, it is also path-connected. Since the point x0 = (1, 0) is in both A and B, S = A ∪ B is path-connected. The points u = (1, 2) and v = (2, 1) are in S. Consider the point w = (1/2)u + (1/2)v = (3/2, 3/2). It is not in S. This shows that S is not convex. Figure 3.5: The set A ∪ B is path-connected but not convex. Let us now prove the following important theorem, which says that continuous functions preserve path-connectedness. Theorem 3.4 Let D be a path-connected subset of Rn. If F : D → Rm is a continuous function, then F(D) is path-connected. Proof Let v1 and v2 be two points in F(D). Then there exist u1 and u2 in D such that F(u1) = v1 and F(u2) = v2. Since D is path-connected, there is a continuous function γ : [0, 1] → D such that γ(0) = u1 and γ(1) = u2. The map α = (F ◦ γ) : [0, 1] → F(D) is then a continuous map with α(0) = v1 and α(1) = v2. This shows that F(D) is path-connected. From Theorem 3.4, we obtain the following. Theorem 3.5 Intermediate Value Theorem for Path-Connected Sets Let D be a path-connected subset of Rn, and let f : D → R be a function defined on D. If f is continuous, then f(D) is an interval.
Proof By Theorem 3.4, f(D) is a path-connected subset of R. By Theorem 3.1, f(D) is an interval. We can also use Theorem 3.4 to establish more examples of path-connected sets. Let us first look at an example. Example 3.8 Show that the circle S1 = { (x, y) | x^2 + y^2 = 1 } is path-connected. Solution Define the function f : [0, 2π] → R2 by f(t) = (cos t, sin t). Notice that S1 = f([0, 2π]). Since each component of f is a continuous function, f is a continuous function. Since [0, 2π] is an interval, it is path-connected. By Theorem 3.4, S1 = f([0, 2π]) is path-connected. A more general theorem is as follows. Theorem 3.6 Let D be a path-connected subset of Rn, and let F : D → Rm be a function defined on D. If F : D → Rm is continuous, then the graph of F, GF = {(x, y) | x ∈ D, y = F(x)}, is a path-connected subset of Rn+m. Proof By Corollary 2.32, the function H : D → Rn+m, H(x) = (x, F(x)), is continuous. Since H(D) = GF, Theorem 3.4 implies that GF is a path-connected subset of Rn+m. Now let us consider spheres, which are boundaries of balls. Definition 3.4 The Standard Unit n-Sphere Sn The standard unit n-sphere Sn is the subset of Rn+1 consisting of all points x = (x1, . . . , xn, xn+1) in Rn+1 satisfying the equation ∥x∥ = 1, namely, x1^2 + · · · + xn^2 + x_{n+1}^2 = 1. The n-sphere Sn is the boundary of the (n + 1)-dimensional open ball B^{n+1} = B(0, 1) with center at the origin and radius 1. Figure 3.6: A sphere. Example 3.9 Show that the standard unit n-sphere Sn is path-connected. Solution Notice that Sn = Sn+ ∪ Sn−, where Sn+ and Sn− are the upper and lower hemispheres, with xn+1 ≥ 0 and xn+1 ≤ 0 respectively. If x ∈ Sn+, then xn+1 = √(1 − x1^2 − · · · − xn^2); whereas if x ∈ Sn−, then xn+1 = −√(1 − x1^2 − · · · − xn^2). Let CBn = { (x1, . . .
, xn) | x1^2 + · · · + xn^2 ≤ 1 } be the closed ball in Rn with center at the origin and radius 1. Define the functions f± : CBn → R by f±(x1, . . . , xn) = ±√(1 − x1^2 − · · · − xn^2). Notice that Sn+ and Sn− are respectively the graphs of f+ and f−. Since they are compositions of the square root function and a polynomial function, which are both continuous, f+ and f− are continuous functions. The closed ball CBn is path-connected. Theorem 3.6 then implies that Sn+ and Sn− are path-connected. Since both Sn+ and Sn− contain the unit vector e1 in Rn+1, the set Sn+ ∩ Sn− is nonempty. By Lemma 3.3, Sn = Sn+ ∪ Sn− is path-connected. Remark 3.1 There is an alternative way to prove that the n-sphere Sn is path-connected. Given two distinct points u and v in Sn, they are unit vectors in Rn+1. We want to show that there is a path in Sn joining u to v. Notice that the line segment L = {(1 − t)u + tv | 0 ≤ t ≤ 1} in Rn+1 contains the origin if and only if u and v are parallel, if and only if v = −u. Thus, we discuss two cases. Case 1: v ̸= −u. In this case, let γ : [0, 1] → Rn+1 be the function defined as γ(t) = ((1 − t)u + tv)/∥(1 − t)u + tv∥. Since (1 − t)u + tv ̸= 0 for all 0 ≤ t ≤ 1, γ is a continuous function. It is easy to check that its image lies in Sn. Hence, γ is a path in Sn joining u to v. Case 2: v = −u. In this case, let w be a unit vector orthogonal to u, and let γ : [0, π] → Rn+1 be the function defined as γ(t) = (cos t)u + (sin t)w. Since sin t and cos t are continuous functions, γ is a continuous function. Since u and w are orthogonal, the generalized Pythagoras theorem implies that ∥γ(t)∥^2 = cos^2 t ∥u∥^2 + sin^2 t ∥w∥^2 = cos^2 t + sin^2 t = 1. Therefore, the image of γ lies in Sn. It is easy to see that γ(0) = u and γ(π) = −u = v. Hence, γ is a path in Sn joining u to v. Example 3.10 Let f : Sn → R be a continuous function. Show that there is a point u0 on Sn such that f(u0) = f(−u0).
Solution The function g : Rn+1 → Rn+1, g(u) = −u, is a linear transformation. Hence, it is continuous. Restricted to Sn, g(Sn) = Sn. Thus, the function f1 : Sn → R, f1(u) = f(−u), is also continuous. It follows that the function h : Sn → R defined by h(u) = f(u) − f(−u) is continuous. Notice that h(−u) = f(−u) − f(u) = −h(u). This implies that if the number a is in the range of h, so is the number −a. Since the number 0 is in between a and −a for any a, and Sn is path-connected, the intermediate value theorem implies that the number 0 is also in the range of h. This means that there is a point u0 on Sn such that h(u0) = 0. Equivalently, f(u0) = f(−u0). Theorem 3.5 says that a continuous function defined on a path-connected set satisfies the intermediate value theorem. We make the following definition. Definition 3.5 Intermediate Value Property Let S be a subset of Rn. We say that S has the intermediate value property provided that whenever f : S → R is a continuous function, f(S) is an interval. Theorem 3.5 says that if S is a path-connected set, then it has the intermediate value property. It is natural to ask whether any set S that has the intermediate value property must be path-connected. It turns out that the answer is yes only when S is a subset of R. If S is a subset of Rn with n ≥ 2, this is not true. This leads us to define a new property of sets called connectedness in the next section. Exercises 3.1 Question 1 Is the set A = (−1, 2) ∪ (2, 5] path-connected? Justify your answer. Question 2 Let a and b be positive numbers, and let A be the subset of R2 given by A = { (x, y) | x^2/a^2 + y^2/b^2 ≤ 1 }. Show that A is convex, and deduce that it is path-connected.
Question 3 Let (a, b, c) be a nonzero vector, and let P be the plane in R3 given by P = {(x, y, z) | ax + by + cz = d}, where d is a constant. Show that P is convex, and deduce that it is path-connected. Question 4 Let S be the subset of R3 given by S = {(x, y, z) | x > 0, y ≤ 1, 2 ≤ z < 7}. Show that S is path-connected. Question 5 Let a, b and c be positive numbers, and let S be the subset of R3 given by S = { (x, y, z) | x^2/a^2 + y^2/b^2 + z^2/c^2 = 1 }. Show that S is path-connected. Question 6 Let u = (3, 0) and let A be the subset of R2 given by A = { (x, y) | x^2 + y^2 ≤ 1 }. Define the function f : A → R by f(x) = d(x, u). (a) Find f(x1) and f(x2), where x1 = (1, 0) and x2 = (−1, 0). (b) Use the intermediate value theorem to justify that there is a point x0 in A such that d(x0, u) = π. Question 7 Let A and B be subsets of Rn. If A and B are convex, show that A ∩ B is also convex. 3.2 Connectedness and Intermediate Value Property In this section, we study a property of sets which is known as connectedness. Let us first look at the path-connected subsets of R from a different perspective. We have shown in the previous section that a subset of R is path-connected if and only if it is an interval. A set of the form A = (−2, 2] \ {0} = (−2, 0) ∪ (0, 2] is not path-connected, since it contains the points −1 and 1, but it does not contain the point 0 that is in between. Intuitively, there is no way to go from the point −1 to 1 continuously without leaving the set A. Let U = (−∞, 0) and V = (0, ∞). Notice that U and V are open subsets of R which both intersect the set A. Moreover, A = (A ∩ U) ∪ (A ∩ V ), or equivalently, A ⊂ U ∪ V. We say that A is separated by the open sets U and V. Definition 3.6 Separation of a Set Let A be a subset of Rn. A separation of A is a pair (U, V ) of subsets of Rn which satisfies the following conditions.
(a) U and V are open sets. (b) A ∩ U ̸= ∅ and A ∩ V ̸= ∅. (c) A ⊂ U ∪ V, or equivalently, A is the union of A ∩ U and A ∩ V. (d) A is disjoint from U ∩ V, or equivalently, A ∩ U and A ∩ V are disjoint. If (U, V ) is a separation of A, we say that A is separated by the open sets U and V, or that the open sets U and V separate A. Example 3.11 Let A = (−2, 0) ∪ (0, 2], and let U = (−∞, 0) and V = (0, ∞). Then the open sets U and V separate A. Let U1 = (−3, 0) and V1 = (0, 3). The open sets U1 and V1 also separate A. Now we define connectedness. Definition 3.7 Connected Sets Let A be a subset of Rn. We say that A is connected if there does not exist a pair of open sets U and V that separate A. Example 3.12 Determine whether the set A = {(x, y) | y = 0} ∪ { (x, y) | y = 2/(1 + x^2) } is connected. Solution Let f : R2 → R be the function defined as f(x, y) = y(x^2 + 1). Since f is a polynomial function, it is continuous. The intervals V1 = (−1, 1) and V2 = (1, 3) are open sets in R. Hence, the sets U1 = f−1(V1) and U2 = f−1(V2) are disjoint and they are open in R2. Notice that A ∩ U1 = {(x, y) | y = 0} and A ∩ U2 = { (x, y) | y = 2/(1 + x^2) }. Thus, A ∩ U1 and A ∩ U2 are nonempty, A ∩ U1 and A ∩ U2 are disjoint, and A is the union of A ∩ U1 and A ∩ U2. This shows that the open sets U1 and U2 separate A. Hence, A is not connected. Figure 3.7: The set A defined in Example 3.12 is not connected. Now let us explore the relation between path-connectedness and connectedness. We first prove the following. Theorem 3.7 Let A be a subset of Rn, and assume that the open sets U and V separate A. Define the function f : A → R by f(x) = 0 if x ∈ A ∩ U, and f(x) = 1 if x ∈ A ∩ V. Then f is continuous. Notice that the function f is well defined since A ∩ U and A ∩ V are disjoint. Proof Let x0 be a point in A. We want to prove that f is continuous at x0.
Since A is contained in U ∪ V, x0 is in U or in V. It suffices to consider the case where x0 is in U. The case where x0 is in V is similar. If x0 is in U, since U is open, there is an r > 0 such that B(x0, r) ⊂ U. If {xk} is a sequence in A that converges to x0, there exists a positive integer K such that for all k ≥ K, ∥xk − x0∥ < r. Thus, for all k ≥ K, xk ∈ B(x0, r) ⊂ U, and hence, f(xk) = 0. This proves that the sequence {f(xk)} converges to 0, which is f(x0). Therefore, f is continuous at x0. Now we can prove the theorem which says that a path-connected set is connected. Theorem 3.8 Let A be a subset of Rn. If A is path-connected, then it is connected. Proof We prove the contrapositive, which says that if A is not connected, then it is not path-connected. If A is not connected, there is a pair of open sets U and V that separate A. By Theorem 3.7, the function f : A → R defined by f(x) = 0 if x ∈ A ∩ U, and f(x) = 1 if x ∈ A ∩ V, is continuous. Since f(A) = {0, 1} is not an interval, by the contrapositive of the intermediate value theorem for path-connected sets, A is not path-connected. Theorem 3.8 provides us with a large library of connected sets. Example 3.13 The following sets are path-connected. Hence, they are also connected. 1. A set S in Rn of the form S = I1 × · · · × In, where I1, . . . , In are intervals in R. 2. Open rectangles and closed rectangles. 3. Open balls and closed balls. 4. The n-sphere Sn. The following theorem says that path-connectedness and connectedness are equivalent in R. Theorem 3.9 Let S be a subset of R. Then the following are equivalent. (a) S is an interval. (b) S is path-connected. (c) S is connected. Proof We have proved (a) ⇐⇒ (b) in the previous section. In particular, (a) implies (b). Theorem 3.8 says that (b) implies (c). Now we only need to prove that (c) implies (a).
Assume that (a) is not true. Namely, S is not an interval. Then there are points u and v in S with u < v, such that there is a w ∈ (u, v) that is not in S. Let U = (−∞, w) and V = (w, ∞). Then U and V are disjoint open subsets of R. Since w /∈ S, S ⊂ U ∪ V. Since u ∈ S ∩ U and v ∈ S ∩ V, S ∩ U and S ∩ V are nonempty. Hence, U and V are open sets that separate S. This shows that S is not connected. Thus, we have proved that if (a) is not true, then (c) is not true. This is equivalent to (c) implies (a). Connectedness is also preserved by continuous functions. Theorem 3.10 Let D be a connected subset of Rn. If F : D → Rm is a continuous function, then F(D) is connected. Proof We prove the contrapositive. Assume that F(D) is not connected. Then there are open sets V1 and V2 in Rm that separate F(D). Let D1 = {x ∈ D | F(x) ∈ V1} and D2 = {x ∈ D | F(x) ∈ V2}. Since F(D) ∩ V1 and F(D) ∩ V2 are nonempty, D1 and D2 are nonempty. Since F(D) ⊂ V1 ∪ V2, D = D1 ∪ D2. Since V1 ∩ V2 is disjoint from F(D), D1 and D2 are disjoint. However, D1 and D2 are not necessarily open sets. We will define two open sets U1 and U2 in Rn such that D1 = D ∩ U1 and D2 = D ∩ U2. Then U1 and U2 are open sets that separate D. For each x0 in D1, F(x0) ∈ V1. Since V1 is open, there exists εx0 > 0 such that the ball B(F(x0), εx0) is contained in V1. By the continuity of F at x0, there exists δx0 > 0 such that for all x in D, if x ∈ B(x0, δx0), then F(x) ∈ B(F(x0), εx0) ⊂ V1. In other words, D ∩ B(x0, δx0) ⊂ F−1(V1) = D1. Notice that B(x0, δx0) is an open set. Define U1 = ⋃x0∈D1 B(x0, δx0). Being a union of open sets, U1 is open. Since D ∩ U1 = ⋃x0∈D1 (D ∩ B(x0, δx0)) ⊂ D1, and D1 = ⋃x0∈D1 {x0} ⊂ ⋃x0∈D1 (D ∩ B(x0, δx0)) = D ∩ U1, we find that D ∩ U1 = D1. Similarly, define U2 = ⋃x0∈D2 B(x0, δx0). Then U2 is an open set and D ∩ U2 = D2. This completes the construction of the open sets U1 and U2 that separate D. Thus, D is not connected.
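The intermediate value results in this section can be made concrete numerically: given a continuous f on a path-connected set and a path γ joining u to v, bisection on t ↦ f(γ(t)) locates a point where f attains any value between f(u) and f(v). The sketch below uses a hypothetical function f(x, y) = x^2 + y^2 and the straight-line path from (0, 0) to (3, 4); both choices are illustrative assumptions.

```python
def find_level(f, gamma, target, tol=1e-10):
    """Bisection on t -> f(gamma(t)); assumes target lies between
    f(gamma(0)) and f(gamma(1)), as the intermediate value theorem requires."""
    lo, hi = 0.0, 1.0
    if f(gamma(lo)) > f(gamma(hi)):
        lo, hi = hi, lo  # orient so f(gamma(lo)) <= target <= f(gamma(hi))
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2
        if f(gamma(mid)) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# f(gamma(t)) = 25 t^2 here, so the level f = 4 is attained at t = 0.4
f = lambda p: p[0] ** 2 + p[1] ** 2
gamma = lambda t: (3 * t, 4 * t)  # segment path in the (convex) plane
t = find_level(f, gamma, target=4.0)
print(t, f(gamma(t)))
```

Bisection is exactly the constructive core of the one-dimensional intermediate value theorem that the proofs above reduce to.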
From Theorem 3.9 and Theorem 3.10, we also have an intermediate value theorem for connected sets. Theorem 3.11 Intermediate Value Theorem for Connected Sets Let D be a connected subset of Rn, and let f : D → R be a function defined on D. If f is continuous, then f(D) is an interval. Proof By Theorem 3.10, f(D) is a connected subset of R. By Theorem 3.9, f(D) is an interval. Now we can prove the following. Theorem 3.12 Let S be a subset of Rn. Then S is connected if and only if it has the intermediate value property. Proof If S is connected and f : S → R is continuous, Theorem 3.11 implies that f(S) is an interval. Hence, S has the intermediate value property. If S is not connected, Theorem 3.7 gives a continuous function f : S → R such that f(S) = {0, 1} is not an interval. Thus, S does not have the intermediate value property. To give an example of a connected set that is not path-connected, we need a lemma. Lemma 3.13 Let A be a subset of Rn that is separated by the open sets U and V. If C is a connected subset of A, then C ∩ U = ∅ or C ∩ V = ∅. Proof Since C ⊂ A, C ⊂ U ∪ V, and C is disjoint from U ∩ V. If C ∩ U ̸= ∅ and C ∩ V ̸= ∅, then the open sets U and V also separate C. This contradicts the assumption that C is connected. Thus, we must have C ∩ U = ∅ or C ∩ V = ∅. Theorem 3.14 Let A be a connected subset of Rn. If B is a subset of Rn such that A ⊂ B ⊂ A̅, then B is also connected. Proof If B is not connected, there exist open sets U and V in Rn that separate B. Since A is connected, Lemma 3.13 says that A ∩ U = ∅ or A ∩ V = ∅. Without loss of generality, assume that A ∩ V = ∅. Then A ⊂ Rn \ V. Thus, Rn \ V is a closed set that contains A. This implies that A̅ ⊂ Rn \ V. Hence, we also have B ⊂ Rn \ V, which contradicts the fact that the set B ∩ V is nonempty.
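The closure computation in Example 3.14 below can be checked numerically: for any u ∈ [−1, 1], the points (x_k, sin(1/x_k)) with x_k = 1/(sin^{−1} u + 2πk) lie on the sine curve and approach (0, u). A minimal sketch (the choice u = 0.7 is an arbitrary illustration):

```python
import math

def approach_point(u, k):
    # k-th point of A = {(x, sin(1/x)) : 0 < x <= 1} converging to (0, u)
    a = math.asin(u)                  # a in [-pi/2, pi/2], with sin(a) = u
    x = 1.0 / (a + 2 * math.pi * k)   # x decreases to 0 as k grows
    return x, math.sin(1.0 / x)       # second coordinate equals sin(a) = u

for k in (1, 10, 1000):
    x, y = approach_point(0.7, k)
    print(k, x, y)  # x shrinks toward 0 while y stays at 0.7
```

This is precisely the sequence used in part (a) of the example: the first coordinate tends to 0 while the second is constant, so the limit (0, u) lies in the closure of A.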
Example 3.14 The Topologist’s Sine Curve Let S be the subset of R2 given by S = A ∪ L, where A = { (x, y) | 0 < x ≤ 1, y = sin(1/x) } and L = {(x, y) | x = 0, −1 ≤ y ≤ 1}. (a) Show that S ⊂ A̅. (b) Show that S is connected. (c) Show that S is not path-connected. Solution (a) Since A ⊂ A̅, it suffices to show that L ⊂ A̅. Given (0, u) ∈ L, −1 ≤ u ≤ 1. Thus, a = sin^{−1} u ∈ [−π/2, π/2]. Let xk = 1/(a + 2πk) for k ∈ Z+. Notice that xk ∈ (0, 1] and sin(1/xk) = sin a = u. Thus, {(xk, sin(1/xk))} is a sequence of points in A that converges to (0, u). This proves that (0, u) ∈ A̅. Hence, L ⊂ A̅. (b) The interval (0, 1] is path-connected and the function f : (0, 1] → R, f(x) = sin(1/x), is continuous. Thus, A = Gf is path-connected, and hence it is connected. Since A ⊂ S ⊂ A̅, Theorem 3.14 implies that S is connected. (c) If S is path-connected, there is a path γ : [0, 1] → S such that γ(0) = (0, 0) and γ(1) = (1, sin 1). Let γ(t) = (γ1(t), γ2(t)). Then γ1 : [0, 1] → R and γ2 : [0, 1] → R are continuous functions. Consider the sequence {xk} with xk = 1/(π/2 + πk), k ∈ Z+. Notice that {xk} is a decreasing sequence of points in [0, 1] that converges to 0. For each k ∈ Z+, (xk, yk) ∈ S if and only if yk = sin(1/xk). Since γ1 : [0, 1] → R is continuous, γ1(0) = 0 and γ1(1) = 1, the intermediate value theorem implies that there exists t1 ∈ [0, 1] such that γ1(t1) = x1. Similarly, there exists t2 ∈ [0, t1] such that γ1(t2) = x2. Continuing this argument gives a decreasing sequence {tk} in [0, 1] such that γ1(tk) = xk for all k ∈ Z+. Since the sequence {tk} is decreasing and bounded below, it converges to some t0 in [0, 1]. Since γ2 : [0, 1] → R is also continuous, the sequence {γ2(tk)} should converge to γ2(t0). Since γ(tk) ∈ S and γ1(tk) = xk, we must have γ2(tk) = yk = (−1)^k. But then the sequence {γ2(tk)} is not convergent. This gives a contradiction.
Hence, there does not exist a path in S that joins the point (0, 0) to the point (1, sin 1). This proves that S is not path-connected. Figure 3.8: The topologist’s sine curve. Remark 3.2 Example 3.14 gives a set that is connected but not path-connected. 1. One can in fact show that S = A̅. 2. To show that A is connected, we can also use the fact that if D is a connected subset of Rn, and F : D → Rm is a continuous function, then the graph of F is connected. The proof of this fact is left as an exercise. At the end of this section, we want to give a sufficient condition for a connected subset of Rn to be path-connected. First we define the meaning of a polygonal path. Definition 3.8 Polygonal Path Let S be a subset of Rn, and let u and v be two points in S. A path γ : [a, b] → S in S that joins u to v is a polygonal path provided that there is a partition P = {t0, t1, . . . , tk} of [a, b] and points x0, x1, . . . , xk in S with xi = γ(ti), such that for 1 ≤ i ≤ k, γ(t) = x_{i−1} + ((t − t_{i−1})/(t_i − t_{i−1}))(x_i − x_{i−1}) when t_{i−1} ≤ t ≤ t_i. Obviously, we have the following. Proposition 3.15 If S is a convex subset of Rn, then any two points in S can be joined by a polygonal path in S. Figure 3.9: A polygonal path. If γ1 : [a, c] → A is a polygonal path in A that joins u to w, and γ2 : [c, b] → B is a polygonal path in B that joins w to v, then the path γ : [a, b] → A ∪ B defined by γ(t) = γ1(t) if a ≤ t ≤ c, and γ(t) = γ2(t) if c ≤ t ≤ b, is a polygonal path in A ∪ B that joins u to v. Using this, we can prove the following useful theorem. Theorem 3.16 Let S be a connected subset of Rn. If S is an open set, then any two points in S can be joined by a polygonal path in S. In particular, S is path-connected. Proof We use proof by contradiction. Suppose that S is open but there are two points u and v in S that cannot be joined by a polygonal path in S.
Consider the sets U = {x ∈ S | there is a polygonal path in S that joins u to x} and V = {x ∈ S | there is no polygonal path in S that joins u to x}. Obviously u is in U and v is in V, and S = U ∪ V. We claim that both U and V are open sets. If x is in the open set S, there is an r > 0 such that B(x, r) ⊂ S. Since B(x, r) is convex, any point w in B(x, r) can be joined to x by a polygonal path in B(x, r). Hence, if x is in U, then w is in U; if x is in V, then w is in V. This shows that if x is in U, then B(x, r) ⊂ U, and if x is in V, then B(x, r) ⊂ V. Hence, U and V are open sets. Since U and V are nonempty disjoint open sets and S = U ∪ V, they form a separation of S. This contradicts the assumption that S is connected. Hence, any two points in S can be joined by a polygonal path in S. Exercises 3.2 Question 1 Determine whether the set A = {(x, y) | y = 0} ∪ { (x, y) | x > 0, y = 2/x } is connected. Question 2 Let D be a connected subset of Rn, and let F : D → Rm be a function defined on D. If F : D → Rm is continuous, show that the graph of F, GF = {(x, y) | x ∈ D, y = F(x)}, is also connected. Question 3 Determine whether the set A = {(x, y) | 0 ≤ x < 1, −1 < y ≤ 1} ∪ {(1, 0), (1, 1)} is connected. Question 4 Assume that A is a connected subset of R3 that contains the points u = (0, 2, 0) and v = (2, −6, 3). (a) Show that there is a point x = (x, y, z) in A that lies in the plane y = 0. (b) Show that there exists a point x = (x, y, z) in A that lies on the sphere x^2 + y^2 + z^2 = 25. Question 5 Let A and B be connected subsets of Rn. If A ∩ B is nonempty, show that S = A ∪ B is connected.
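Definition 3.8 translates directly into code. The sketch below builds the piecewise-affine path through a given list of vertices, using the uniform partition t_i = i/k of [0, 1]; the uniform partition is an illustrative choice, since any partition satisfies the definition.

```python
def polygonal_path(vertices):
    """gamma : [0, 1] -> R^n through the given vertices, affine on each
    subinterval [i/k, (i+1)/k], in the spirit of Definition 3.8."""
    verts = [tuple(map(float, v)) for v in vertices]
    k = len(verts) - 1
    def gamma(t):
        if t >= 1.0:
            return verts[-1]
        i = int(t * k)          # index of the segment containing t
        s = t * k - i           # local parameter in [0, 1)
        a, b = verts[i], verts[i + 1]
        return tuple((1 - s) * ai + s * bi for ai, bi in zip(a, b))
    return gamma

g = polygonal_path([(0, 0), (1, 0), (1, 1)])
print(g(0.0), g(0.5), g(1.0))  # (0.0, 0.0) (1.0, 0.0) (1.0, 1.0)
```

Each piece is the convex-combination formula (1 − s)x_{i−1} + s x_i, so the path stays inside any convex set containing the vertices, which is how Proposition 3.15 and Theorem 3.16 use polygonal paths.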
3.3 Sequential Compactness and Compactness In volume I, we have seen that sequential compactness plays an important role in the extreme value theorem. In this section, we extend the definition of sequential compactness to subsets of Rn. We will also consider another concept called compactness. Let us start with the definition of bounded sets. Definition 3.9 Bounded Sets Let S be a subset of Rn. We say that S is bounded if there exists a positive number M such that ∥x∥ ≤ M for all x ∈ S. Remark 3.3 Let S be a subset of Rn. If S is bounded and S′ is a subset of S, then it is obvious that S′ is also bounded. Example 3.15 Show that a ball B(x0, r) in Rn is bounded. Solution Given x ∈ B(x0, r), ∥x − x0∥ < r. Thus, ∥x∥ ≤ ∥x0∥ + ∥x − x0∥ < ∥x0∥ + r. Since M = ∥x0∥ + r is a constant independent of the points in the ball B(x0, r), the ball B(x0, r) is bounded. Notice that if x1 and x2 are points in Rn, and S is a set in Rn such that ∥x − x1∥ < r1 for all x ∈ S, then ∥x − x2∥ < r1 + ∥x2 − x1∥ for all x ∈ S. Thus, we have the following. Proposition 3.17 Let S be a subset of Rn. The following are equivalent. (a) S is bounded. (b) There is a point x0 in Rn and a positive constant M such that ∥x − x0∥ ≤ M for all x ∈ S. (c) For any x0 in Rn, there is a positive constant M such that ∥x − x0∥ ≤ M for all x ∈ S. Figure 3.10: The set S is bounded. We say that a sequence {xk} is bounded if the set {xk | k ∈ Z+} is bounded. The following is a standard theorem about convergent sequences. Proposition 3.18 If {xk} is a sequence in Rn that is convergent, then it is bounded. Proof Assume that the sequence {xk} converges to the point x0. Then there is a positive integer K such that ∥xk − x0∥ < 1 for all k ≥ K. Let M = max{∥xk − x0∥ | 1 ≤ k ≤ K − 1} + 1. Then M is finite and ∥xk − x0∥ ≤ M for all k ∈ Z+.
Hence, the sequence {xk} is bounded. Figure 3.11: A convergent sequence is bounded. Let us now define the diameter of a bounded set. If S is a subset of Rn that is bounded, there is a positive number M such that ∥x∥ ≤ M for all x ∈ S. It follows from the triangle inequality that for any u and v in S, ∥u − v∥ ≤ ∥u∥ + ∥v∥ ≤ 2M. Thus, the set DS = {d(u, v) | u, v ∈ S} = {∥u − v∥ | u, v ∈ S} (3.1) is a set of nonnegative real numbers that is bounded above. In fact, for any subset S of Rn, one can define the set of real numbers DS by (3.1). Then S is a bounded set if and only if the set DS is bounded above. Definition 3.10 Diameter of a Bounded Set Let S be a bounded subset of Rn. The diameter of S, denoted by diam S, is defined as diam S = sup {d(u, v) | u, v ∈ S} = sup {∥u − v∥ | u, v ∈ S}. Example 3.16 Consider the rectangle R = [a1, b1] × · · · × [an, bn]. If u and v are two points in R, then for each 1 ≤ i ≤ n, ui, vi ∈ [ai, bi]. Thus, |ui − vi| ≤ bi − ai. It follows that ∥u − v∥ ≤ √((b1 − a1)^2 + · · · + (bn − an)^2). If u0 = a = (a1, . . . , an) and v0 = b = (b1, . . . , bn), then u0 and v0 are in R, and ∥u0 − v0∥ = √((b1 − a1)^2 + · · · + (bn − an)^2). This shows that the diameter of R is diam R = ∥b − a∥ = √((b1 − a1)^2 + · · · + (bn − an)^2). Figure 3.12: The diameter of a rectangle. Intuitively, the diameter of the open rectangle U = (a1, b1) × · · · × (an, bn) is also equal to d = √((b1 − a1)^2 + · · · + (bn − an)^2). However, the points a = (a1, . . . , an) and b = (b1, . . . , bn) are not in U. There do not exist two points in U whose distance is d, but there are sequences of points {uk} and {vk} in U such that the sequence of distances {∥uk − vk∥} approaches d as k → ∞. We will formulate this as a more general theorem. Theorem 3.19 Let S be a subset of Rn. If S is bounded, then its closure S̅ is also bounded. Moreover, diam S̅ = diam S.
Proof If u and v are two points in S̅, there exist sequences {uk} and {vk} in S that converge to u and v respectively. Then d(u, v) = lim_{k→∞} d(uk, vk). (3.2) For each k ∈ Z+, since uk and vk are in S, d(uk, vk) ≤ diam S. Eq. (3.2) implies that d(u, v) ≤ diam S. Since this is true for any u and v in S̅, S̅ is bounded and diam S̅ ≤ diam S. Since S ⊂ S̅, we also have diam S ≤ diam S̅. We conclude that diam S̅ = diam S. The following example justifies that the diameter of a ball of radius r is indeed 2r. Example 3.17 Find the diameter of the open ball B(x0, r) in Rn. Solution By Theorem 3.19, the diameter of the open ball B(x0, r) is the same as the diameter of its closure, the closed ball CB(x0, r). Given u and v in CB(x0, r), ∥u − x0∥ ≤ r and ∥v − x0∥ ≤ r. Therefore, ∥u − v∥ ≤ ∥u − x0∥ + ∥v − x0∥ ≤ 2r. This shows that diam CB(x0, r) ≤ 2r. The points u0 = x0 + re1 and v0 = x0 − re1 are in the closed ball CB(x0, r). Since ∥u0 − v0∥ = ∥2re1∥ = 2r, diam CB(x0, r) ≥ 2r. Therefore, the diameter of the closed ball CB(x0, r) is exactly 2r. By Theorem 3.19, the diameter of the open ball B(x0, r) is also 2r. Figure 3.13: The diameter of a ball. In volume I, we have shown that a bounded sequence in R has a convergent subsequence. This is achieved by using the monotone convergence theorem, which says that a bounded monotone sequence in R is convergent. For points in Rn with n ≥ 2, we cannot apply the monotone convergence theorem, as we cannot define a simple order on the points in Rn when n ≥ 2. Nevertheless, we can use the result for n = 1 and the componentwise convergence theorem to show that a bounded sequence in Rn has a convergent subsequence. Theorem 3.20 Let {uk} be a sequence in Rn. If {uk} is bounded, then there is a subsequence that is convergent. Sketch of Proof The n = 1 case is already established in volume I. Here we prove the n = 2 case.
The n ≥ 3 case can be proved by induction using the same reasoning. For k ∈ Z+, let uk = (xk, yk). Since |xk| ≤ ∥uk∥ and |yk| ≤ ∥uk∥, the sequences {xk} and {yk} are bounded sequences. Thus, there is a subsequence {xkj}∞j=1 of {xk}∞k=1 that converges to a point x0 in R. Consider the subsequence {ykj}∞j=1 of the sequence {yk}∞k=1. It is also bounded. Hence, there is a subsequence {ykjl} ∞ l=1 that converges to a point y0 in R. Notice that the subsequence {xkjl} ∞ l=1 of {xk}∞k=1 is also a subsequence of {xkj}∞j=1. Hence, it also converges to x0. By componentwise convergence theorem, {ukjl }∞l=1 is a subsequence of {uk}∞k=1 that converges to (x0, y0). This proves the theorem when n = 2. Now we study the concept of sequential compactness. It is the same as the n = 1 case. Definition 3.11 Sequentially Compact Let S be a subset of Rn. We say that S is sequentially compact provided that every sequence in S has a subsequence that converges to a point in S. In volume I, we proved the Bolzano-Weierstrass theorem, which says that a subset of R is sequentially compact if and only if it is closed and bounded. In fact, the same is true for the n ≥ 2 case. Let us first look at some examples. Example 3.18 Show that the set A = {(x, y) |x2 + y2 < 1} is not sequentially compact. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 168 Solution For k ∈ Z+, let uk = ( k k + 1 , 0 ) . Then {uk} is a sequence in A that converges to the point u0 = (1, 0) that is not in A. Thus, every subsequence of {uk} converges to the point u0, which is not in A. This means the sequence {uk} in A does not have a subsequence that converges to a point in A. Hence, A is not sequentially compact. Note that the set A in Example 3.18 is not closed. Example 3.19 Show that the set C = {(x, y) | 1 ≤ x ≤ 3, y ≥ 0} is not sequentially compact. Solution For k ∈ Z+, let uk = (2, k). Then {uk} is a sequence in C. If {ukj}∞j=1 is a subsequence of {uk}, then k1, k2, k3, . . . 
is a strictly increasing sequence of positive integers. Therefore kj ≥ j for all j ∈ Z+. It follows that ∥ukj∥ = ∥(2, kj)∥ ≥ kj ≥ j for all j ∈ Z+. Hence, the subsequence {ukj} is not bounded. Therefore, it is not convergent. This means that the sequence {uk} in C does not have a convergent subsequence. Therefore, C is not sequentially compact. Note that the set C in Example 3.19 is not bounded. Now we prove the main theorem. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 169 Theorem 3.21 Bolzano-Weierstrass Theorem Let S be a subset of Rn. The following are equivalent. (a) S is closed and bounded. (b) S is sequentially compact. Proof First assume that S is closed and bounded. Let {xk} be a sequence in S. Then {xk} is also bounded. By Theorem 3.20, there is subsequence {xkj} that converges to some x0. Since S is closed, we must have x0 is in S. This proves that every sequence in S has a subsequence that converges to a point in S. Hence, S is sequentially compact. This completes the proof of (a) implies (b). To prove that (b) implies (a), it suffices to show that if S is not closed or S is not bounded, then S is not sequentially compact. If S is not closed, there is a sequence {xk} in S that converges to a point x0, but x0 is not in S. Then every subsequence of {xk} converges to the point x0, which is not in S. Thus, {xk} is a sequence in S that does not have any subsequence that converges to a point in S. This shows that S is not sequentially compact. If S is not bounded, for each positive integer k, there is a point xk in S such that ∥xk∥ ≥ k. If {xkj}∞j=1 is a subsequence of {xk}, then k1, k2, k3, . . . is a strictly increasing sequence of positive integers. Therefore kj ≥ j for all j ∈ Z+. It follows that ∥xkj∥ ≥ kj ≥ j for al j ∈ Z+. Hence, the subsequence {xkj} is not bounded. Therefore, it is not convergent. This means that the sequence {xk} in S does not have a convergent subsequence. Therefore, S is not sequentially compact. 
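The failure mode in Example 3.18 can be checked numerically. The sketch below (the helper name in_open_disk and the cutoff 2000 are our own illustration, not from the text) verifies that every term of the sequence uk = (k/(k+1), 0) lies in the open disk A, while the terms approach the limit (1, 0), which lies outside A.

```python
import math

def in_open_disk(p):
    # membership test for A = {(x, y) : x^2 + y^2 < 1}
    x, y = p
    return x * x + y * y < 1.0

# the sequence u_k = (k/(k+1), 0) from Example 3.18
u = [(k / (k + 1), 0.0) for k in range(1, 2001)]
limit = (1.0, 0.0)

# every term lies in A ...
all_in_A = all(in_open_disk(p) for p in u)
# ... yet the terms approach a point that is NOT in A,
# so no subsequence can converge to a point of A
dist_to_limit = math.dist(u[-1], limit)
```

Since every subsequence inherits the same limit, this is exactly why A fails to be sequentially compact.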
Corollary 3.22 A closed rectangle R = [a1, b1] × · · · × [an, bn] in Rn is sequentially compact. Proof We have shown in Chapter 1 that R is closed. Example 3.16 shows that R is bounded. Thus, R is sequentially compact. An interesting consequence of Theorem 3.19 is the following. Corollary 3.23 If S is a bounded subset of Rn, then its closure S̄ is sequentially compact. Example 3.20 Determine whether the following subsets of R3 are sequentially compact. (a) A = {(x, y, z) |xyz = 1}. (b) B = {(x, y, z) |x2 + 4y2 + 9z2 ≤ 36}. (c) C = {(x, y, z) | 1 ≤ x ≤ 2, 1 ≤ y ≤ 3, 0 < xyz ≤ 4}. Solution (a) For any k ∈ Z+, let uk = (k, 1/k, 1). Then {uk} is a sequence in A, and ∥uk∥ ≥ k. Therefore, A is not bounded. Hence, A is not sequentially compact. (b) For any u = (x, y, z) ∈ B, ∥u∥2 = x2 + y2 + z2 ≤ x2 + 4y2 + 9z2 ≤ 36. Hence, B is bounded. The function f : R3 → R, f(x, y, z) = x2 + 4y2 + 9z2 is a polynomial. Hence, it is continuous. Since the set I = (−∞, 36] is closed in R, and B = f−1(I), B is closed in R3. Since B is closed and bounded, it is sequentially compact. (c) For any k ∈ Z+, let uk = (1, 1, 1/k). Then {uk} is a sequence of points in C that converges to the point u0 = (1, 1, 0), which is not in C. Thus, C is not closed, and so C is not sequentially compact. The following theorem asserts that continuous functions preserve sequential compactness. Theorem 3.24 Let D be a sequentially compact subset of Rn. If the function F : D → Rm is continuous, then F(D) is a sequentially compact subset of Rm. The proof of this theorem is identical to the n = 1 case. Proof Let {yk} be a sequence in F(D). For each k ∈ Z+, there exists xk ∈ D such that F(xk) = yk. Since D is sequentially compact, there is a subsequence {xkj} of {xk} that converges to a point x0 in D. Since F is continuous, the sequence {F(xkj)} converges to F(x0).
Note that F(x0) is in F(D). In other words, {ykj} is a subsequence of the sequence {yk} that converges to F(x0) in F(D). This shows that every sequence in F(D) has a subsequence that converges to a point in F(D). Thus, F(D) is a sequentially compact subset of Rm. We are going to discuss important consequences of Theorem 3.24 in the coming section. For the rest of this section, we introduce the concept of compactness, which plays a central role in modern analysis. We start with the definition of an open covering. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 172 Definition 3.12 Open Covering Let S be a subset of Rn, and let A = {Uα |α ∈ J} be a collection of open sets in Rn indexed by the set J . We say that A is an open covering of S provided that S ⊂ ⋃ α∈J Uα. Example 3.21 For each k ∈ Z+, let Uk = (1/k, 1). Then Uk is an open set in R and ∞⋃ k=1 Uk = (0, 1). Hence, A = {Uk | k ∈ Z+} is an open covering of the set S = (0, 1). Remark 3.4 If A = {Uα |α ∈ J} is an open covering of S and S ′ is a subset of S, then A = {Uα |α ∈ J} is also an open covering of S ′. Example 3.22 For each k ∈ Z+, let Uk = B(0, k) be the ball in Rn centered at the origin and having radius k. Then ∞⋃ k=1 Uk = Rn. Thus, A = {Uk | k ∈ Z+} is an open covering of any subset S ofRn. Definition 3.13 Subcover Let S be a subset of Rn, and let A = {Uα |α ∈ J} be an open covering of S. A subcover is a subcollection of A which is also a covering of S. A finite subcover is a subcover that contains only finitely many elements. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 173 Example 3.23 For each k ∈ Z, let Uk = (k, k + 2). Then ∞⋃ k=−∞ Uk = R. Thus, A = {Uk | k ∈ Z} is an open covering of the set S = [−3, 4). There is a finite subcover of S given by B = {U−4, U−3, U−2, U−1, U0, U1, U2}. Definition 3.14 Compact Sets Let S be a subset of Rn. We say that S is compact provided that every open covering of S has a finite subcover. 
Namely, if A = {Uα |α ∈ J} is an open covering of S, then there exist α1, . . . , αk ∈ J such that S ⊂ k⋃ j=1 Uαj . Example 3.24 The subset S = (0, 1) of R is not compact. For k ∈ Z+, let Uk = (1/k, 1). Example 3.21 says that A = {Uk | k ∈ Z+} is an open covering of the set S. We claim that there is no finite subcollection of A that covers S. Assume to the contrary that there exists a finite subcollection of A that covers S. Then there are positive integers k1, . . . , km such that (0, 1) ⊂ m⋃ j=1 Ukj = m⋃ j=1 (1/kj , 1) . Notice that if ki ≤ kj , then Uki ⊂ Ukj . Thus, if K = max{k1, . . . , km}, then m⋃ j=1 Ukj = UK = (1/K, 1) , and so S = (0, 1) is not contained in UK . This gives a contradiction. Hence, S is not compact. Example 3.25 As a subset of itself, Rn is not compact. For k ∈ Z+, let Uk = B(0, k) be the ball in Rn centered at the origin and having radius k. Example 3.22 says that A = {Uk | k ∈ Z+} is an open covering of Rn. We claim that there is no finite subcover. Assume to the contrary that there is a finite subcover. Then there exist positive integers k1, . . . , km such that Rn = m⋃ j=1 Ukj . Notice that if ki ≤ kj , then Uki ⊂ Ukj . Thus, if K = max{k1, . . . , km}, then m⋃ j=1 Ukj = UK = B(0, K). Obviously, B(0, K) is not equal to Rn. This gives a contradiction. Hence, Rn is not compact. Our goal is to prove the Heine-Borel theorem, which says that a subset of Rn is compact if and only if it is closed and bounded. We first prove the easier direction. Theorem 3.25 Let S be a subset of Rn. If S is compact, then it is closed and bounded. Proof We show that if S is compact, then it is bounded; and if S is compact, then it is closed. First we prove that if S is compact, then it is bounded. For k ∈ Z+, let Uk = B(0, k) be the ball in Rn centered at the origin and having radius k. Example 3.22 says that A = {Uk | k ∈ Z+} is an open covering of S.
Since S is compact, there exist positive integers k1, . . . , km such that S ⊂ m⋃ j=1 Ukj = UK = B(0, K), where K = max{k1, . . . , km}. This shows that ∥x∥ ≤ K for all x ∈ S. Hence, S is bounded. Now we prove that if S is compact, then it is closed. For this, it suffices to show that S̄ ⊂ S, or equivalently, that any point that is not in S is not in S̄. Assume that x0 is not in S. For each k ∈ Z+, let Vk = extB(x0, 1/k) = { x ∈ Rn ∣∣ ∥x− x0∥ > 1/k } . Then Vk is open in Rn. If x is a point in Rn and x ̸= x0, then r = ∥x − x0∥ > 0. There is a k ∈ Z+ such that 1/k < r. Then x is in Vk. This shows that ∞⋃ k=1 Vk = Rn \ {x0}. Therefore, A = {Vk | k ∈ Z+} is an open covering of S. Since S is compact, there is a finite subcover. Namely, there exist positive integers k1, . . . , km such that S ⊂ m⋃ j=1 Vkj = VK , where K = max{k1, . . . , km}. Since B(x0, 1/K) is disjoint from VK , it does not contain any point of S. This shows that x0 is not in S̄, and thus the proof is completed. Example 3.26 The set A = {(x, y, z) |xyz = 1} in Example 3.20 is not compact because it is not bounded. The set C = {(x, y, z) | 1 ≤ x ≤ 2, 1 ≤ y ≤ 3, 0 < xyz ≤ 4} is not compact because it is not closed. We are now left to show that a closed and bounded subset of Rn is compact. We start by proving a special case. Theorem 3.26 A closed rectangle R = [a1, b1]× · · · × [an, bn] in Rn is compact. Proof We prove by contradiction. Assume that R is not compact, and we show that this will lead to a contradiction. The idea is to use the bisection method. If R is not compact, there is an open covering A = {Uα |α ∈ J} of R which does not have a finite subcover. Let R1 = R, and let d1 = diamR1. For 1 ≤ i ≤ n, let ai,1 = ai and bi,1 = bi, and let mi,1 be the midpoint of the interval [ai,1, bi,1].
The hyperplanes xi = mi,1, 1 ≤ i ≤ n, divides the rectangle R1 into 2n subrectangles. Notice that A is also an open covering of each of these subrectangles. If each of these subrectangles can be covered by a finite subcollection of open sets in A , then R also can be covered by a finite subcollection of open sets in A . Since we assume R cannot be covered by any finite subcollection of open sets in A , there is at least one of the 2n subrectangles which cannot be covered by any finite subcollection of open sets in A . Choose one of these, and denote it by R2. Define ai,2, bi,2 for 1 ≤ i ≤ n so that R2 = [a1,2, b1,2]× · · · × [an,2, bn,2]. Note that bi,2 − ai,2 = bi,1 − ai,1 2 for 1 ≤ i ≤ n. Therefore, d2 = diamR2 = d1/2. We continue this bisection process to obtain the rectangles R1, R2, · · · , so that Rk+1 ⊂ Rk for all k ∈ Z+, and Rk cannot be covered by any finite subcollections of A . Chapter 3. Continuous Functions on Connected Sets and Compact Sets 177 Figure 3.14: Bisection method. Define ai,k, bi,k for 1 ≤ i ≤ n so that Rk = [a1,k, b1,k]× · · · × [an,k, bn,k]. Then for all k ∈ Z+, bi,k+1 − ai,k+1 = bi,k − ai,k 2 for 1 ≤ i ≤ n. It follows that dk+1 = diamRk+1 = dk/2. For any 1 ≤ i ≤ n, {ai,k}∞k=1 is an increasing sequence that is bounded above by bi, and {bi,k}∞k=1 is a decreasing sequence that is bounded below by ai. By monotone convergence theorem, the sequence {ai,k}∞k=1 converges to ai,0 = sup k∈Z+ ai,k; while the sequence {bi,k}∞k=1 converges to bi,0 = inf k∈Z+ bi,k. Since bi,k − ai,k = bi − ai 2k−1 for all k ∈ Z+, we find that ai,0 = bi,0. Let ci = ai,0 = bi,0. Then ai,k ≤ ci ≤ bi,k for all 1 ≤ i ≤ n and all k ∈ Z+. Thus, c = (c1, . . . , cn) is a point in Rk for all k ∈ Z+. By assumption that A is an open covering of R = R1, there exists β ∈ J such that c ∈ Uβ . Since Uβ is an open set, there is an r > 0 such that B(c, r) ⊂ Uβ . Since dk = diamRk = d1 2k−1 for all k ∈ Z+, Chapter 3. 
we find that lim k→∞ dk = 0. Hence, there is a positive integer K such that dK < r. If x ∈ RK , then ∥x− c∥ ≤ diamRK = dK < r. This implies that x is in B(c, r). Thus, we have shown that RK ⊂ B(c, r). Therefore, RK is contained in the single element Uβ of A , which contradicts the fact that RK cannot be covered by any finite subcollection of open sets in A . We conclude that R must be compact. Now we can prove the Heine-Borel theorem. Theorem 3.27 Heine-Borel Theorem Let S be a subset of Rn. Then S is compact if and only if it is closed and bounded. Proof We have shown in Theorem 3.25 that if S is compact, then it must be closed and bounded. Now assume that S is closed and bounded, and let A = {Uα |α ∈ J} be an open covering of S. Since S is bounded, there exists a positive number M such that ∥x∥ ≤ M for all x ∈ S. Thus, if x = (x1, . . . , xn) is in S, then for all 1 ≤ i ≤ n, |xi| ≤ ∥x∥ ≤ M . This implies that S is contained in the closed rectangle R = [−M,M ]× · · · × [−M,M ]. Let V = Rn \S. Since S is closed, V is an open set. Then Ã = A ∪{V } is an open covering of Rn, and hence it is an open covering of R. By Theorem 3.26, R is compact. Thus, there exists B̃ ⊂ Ã which is a finite subcover of R. Then B = B̃ \{V } is a finite subcollection of A that covers S. This proves that S is compact. Example 3.27 We have shown in Example 3.20 that the set B = { (x, y, z) |x2 + 4y2 + 9z2 ≤ 36 } is closed and bounded. Hence, it is compact. Now we can conclude our main theorem from the Bolzano-Weierstrass theorem and the Heine-Borel theorem. Theorem 3.28 Let S be a subset of Rn. Then the following are equivalent. (a) S is sequentially compact. (b) S is closed and bounded. (c) S is compact. Remark 3.5 Henceforth, when we say a subset S of Rn is compact, we mean it is a closed and bounded set, and it is sequentially compact.
By Theorem 3.19, a subset S of Rn has compact closure if and only if it is a bounded set. Finally, we can conclude the following, which says that continuous functions preserve compactness. Theorem 3.29 Let D be a compact subset of Rn. If the function F : D → Rm is continuous, then F(D) is a compact subset of Rm. Proof Since D is compact, it is sequentially compact. By Theorem 3.24, F(D) is a sequentially compact subset of Rm. Hence, F(D) is a compact subset of Rm. Exercises 3.3 Question 1 Determine whether the following subsets of R2 are sequentially compact. (a) A = {(x, y) |x2 + y2 = 9}. (b) B = {(x, y) | 0 < x2 + 4y2 ≤ 36}. (c) C = {(x, y) |x ≥ 0, 0 ≤ y ≤ x2}. Question 2 Determine whether the following subsets of R3 are compact. (a) A = {(x, y, z) | 1 ≤ x ≤ 2}. (b) B = {(x, y, z) | |x|+ |y|+ |z| ≤ 10}. (c) C = {(x, y, z) | 4 ≤ x2 + y2 + z2 ≤ 9}. Question 3 Given that A is a compact subset of Rn and B is a subset of A, show that B is compact if and only if it is closed. Question 4 If S1, . . . , Sk are compact subsets of Rn, show that S = S1 ∪ · · · ∪ Sk is also compact. Question 5 If A is a compact subset of Rm, B is a compact subset of Rn, show that A×B is a compact subset of Rm+n. 3.4 Applications of Compactness In this section, we consider the applications of compactness. We are going to use repeatedly the fact that a subset S of Rn is compact if and only if it is closed and bounded, if and only if it is sequentially compact. 3.4.1 The Extreme Value Theorem First we define bounded functions. Definition 3.15 Bounded Functions Let D be a subset of Rn, and let F : D → Rm be a function defined on D. We say that the function F is bounded if the set F(D) is a bounded subset of Rm. In other words, the function F : D → Rm is bounded if there is a positive number M such that ∥F(x)∥ ≤ M for all x ∈ D.
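As a quick sanity check of this definition, the sketch below samples a map F : R2 → R2 that is bounded with M = √2 (the map, the sample region, and the helper names are our own illustration, not from the text).

```python
import math
import random

def F(x, y):
    # a sample bounded map F : R^2 -> R^2; each component lies in [-1, 1],
    # so ||F(x, y)|| <= sqrt(1 + 1) = sqrt(2) for every (x, y)
    return (math.sin(x + y), math.cos(x * y))

M = math.sqrt(2.0)

random.seed(0)
samples = [(random.uniform(-100, 100), random.uniform(-100, 100))
           for _ in range(10_000)]
# empirically confirm the bound ||F(x)|| <= M on the sampled points
max_norm = max(math.hypot(*F(x, y)) for x, y in samples)
```

Here F is bounded even though its domain R2 is not; boundedness of F concerns the image F(D), not D itself.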
Example 3.28 Let D = {(x, y, z) | 0 < x2 + y2 + z2 < 4}, and let F : D → R2 be the function defined as F(x, y, z) = ( 1 x2 + y2 + z2 , x+ y + z ) . For k ∈ Z+, the point uk = (1/k, 0, 0) is in D and F(uk) = ( k2, 1 k ) . Thus, ∥F(uk)∥ ≥ k2. This shows that F is not bounded, even though D is a bounded set. Theorem 3.24 gives the following. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 182 Theorem 3.30 Let D be a compact subset of Rn. If the function F : D → Rm is continuous, then it is bounded. Proof By Theorem 3.29, F(D) is compact. Hence, it is bounded. Example 3.29 Let D = {(x, y, z) | 1 < x2 + y2 + z2 < 4}, and let F : D → R2 be the function defined as F(x, y, z) = ( 1 x2 + y2 + z2 , x+ y + z ) . Show that F : D → R2 is a bounded function. Solution Notice that the set D is not closed. Therefore, we cannot apply Theorem 3.30 directly. Consider the set U = {(x, y, z) | 1 ≤ x2 + y2 + z2 ≤ 4}. For any u = (x, y, z) in U , ∥u∥ ≤ 2. Hence, U is bounded. The function f : R3 → R defined as f(x, y, z) = x2 + y2 + z2 is continuous, and U = f−1([1, 4]). Since [1, 4] is closed in R, U is closed in R3. Since f(x, y, z) ̸= 0 on U , F1(x, y, z) = 1 x2 + y2 + z2 is continuous on U . Being a polynomial function, F2(x, y, z) = x + y + z is continuous. Thus, F : U → R2 is continuous. Since U is closed and bounded, Theorem 3.30 implies that F : U → R2 is bounded. Since D ⊂ U , F : D → R2 is also a bounded function. Recall that if S is a subset of R, S has maximum value if and only if S is bounded above and supS is in S; while S has minimum value if and only if S is bounded below and inf S is in S. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 183 Definition 3.16 Extremizer and Extreme Values Let D be a subset of Rn, and let f : D → R be a function defined on D. 1. The function f has maximum value if there is a point x0 in D such that f(x0) ≥ f(x) for all x ∈ D. The point x0 is called a maximizer of f ; and f(x0) is the maximum value of f . 
2. The function f has minimum value if there is a point x0 in D such that f(x0) ≤ f(x) for all x ∈ D. The point x0 is called a minimizer of f ; and f(x0) is the minimum value of f . We have proved in volume I that a sequentially compact subset of R has a maximum value and a minimum value. This gives us the extreme value theorem. Theorem 3.31 Extreme Value Theorem Let D be a compact subset of Rn. If the function f : D → R is continuous, then it has a maximum value and a minimum value. Proof By Theorem 3.24, f(D) is a sequentially compact subset of R. Therefore, f has a maximum value and a minimum value. Example 3.30 Let D = {(x, y) |x2 + 2x+ y2 ≤ 3}, and let f : D → R be the function defined by f(x, y) = x2 + xy3 + ex−y. Show that f has a maximum value and a minimum value. Chapter 3. Continuous Functions on Connected Sets and Compact Sets 184 Solution Notice that D = { (x, y) |x2 + 2x+ y2 ≤ 3 } = { (x, y) | (x+ 1)2 + y2 ≤ 4 } is a closed ball. Thus, it is closed and bounded. The function f1(x, y) = x2+xy3 and the function g(x, y) = x−y are polynomial functions. Hence, they are continuous. The exponential function h(x) = ex is continuous. Hence, the function f2(x, y) = (h ◦ g)(x, y) = ex−y is continuous. Since f = f1 + f2, the function f : D → R is continuous. Since D is compact, the function f : D → R has a maximum value and a minimum value. Remark 3.6 Extreme Value Property Let S be a subset of Rn. We say that S has extreme value property provided that whenever f : S → R is a continuous function, then f has maximum and minimum values. The extreme value theorem says that if S is compact, then it has extreme value property. Now let us show the converse. Namely, if S has extreme value property, then it is compact, or equivalently, it is closed and bounded. If S is not bounded, the function f : S → R, f(x) = ∥x∥ is continuous, but it does not have maximum value. If S is not closed, there is a sequence {xk} in S that converges to a point x0 that is not in S. 
The function g : S → R, g(x) = ∥x− x0∥ is continuous and g(x) ≥ 0 for all x ∈ S. Since lim k→∞ g(xk) = 0, we find that inf g(S) = 0. Since x0 is not in S, there is no point x in S such that g(x) = 0. Hence, g does not have a minimum value. This shows that for S to have extreme value property, it is necessary that S is closed and bounded. Therefore, a subset S of Rn has extreme value property if and only if it is compact. 3.4.2 Distance Between Sets The distance between two sets is defined in the following way. Definition 3.17 Distance Between Two Sets Let A and B be two subsets of Rn. The distance between A and B is defined as d(A,B) = inf {d(a,b) | a ∈ A,b ∈ B} . The distance between two sets is always well-defined and nonnegative. If A and B are not disjoint, then their distance is 0. Example 3.31 Let A = {(x, y) |x2 + y2 < 1} and let B = [1, 3] × [−1, 1]. Find the distance between the two sets A and B. Solution For k ∈ Z+, let ak be the point in A given by ak = (1− 1/k, 0) . Let b = (1, 0). Then b is in B. Notice that d(ak,b) = ∥ak − b∥ = 1/k. Hence, d(A,B) ≤ 1/k for all k ∈ Z+. This shows that the distance between A and B is 0. Figure 3.15: The sets A and B in Example 3.31. In Example 3.31, we find that the distance between two disjoint sets can be 0, even though they are both bounded. Example 3.32 Let A = {(x, y) | y = 0} and let B = {(x, y) |xy = 1}. Find the distance between the two sets A and B. Solution For k ∈ Z+, let ak = (k, 0) and bk = (k, 1/k). Then ak is in A and bk is in B. Notice that d(ak,bk) = ∥ak − bk∥ = 1/k. Hence, d(A,B) ≤ 1/k for all k ∈ Z+. This shows that the distance between A and B is 0. Figure 3.16: The sets A and B in Example 3.32. In Example 3.32, we find that the distance between two disjoint sets can be 0, even though both of them are closed.
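Example 3.32 can be illustrated numerically by sampling the witness pairs ak = (k, 0) on the axis and bk = (k, 1/k) on the hyperbola (the cutoff 10 000 and variable names are our own choices for the sketch).

```python
import math

# witnesses from Example 3.32: a_k = (k, 0) in A, b_k = (k, 1/k) in B
gaps = [math.dist((k, 0.0), (k, 1.0 / k)) for k in range(1, 10_001)]

# d(a_k, b_k) = 1/k, so the sampled infimum shrinks toward 0
inf_estimate = min(gaps)
```

The sampled gaps never reach 0 (the sets are disjoint), but their infimum is 0, matching d(A, B) = 0.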
When B is the one-point set {x0}, the distance between A and B is the distance from the point x0 to the set A. We denote this distance as d(x0, A). In other words, d(x0, A) = inf {d(a,x0) | a ∈ A} . If x0 is a point in A, then d(x0, A) = 0. However, the distance from a point x0 to a set A can be 0 even though x0 is not in A. For example, the distance between the point x0 = (1, 0) and the set A = {(x, y) |x2 + y2 < 1} is 0, even though x0 is not in A. The following proposition says that this cannot happen if A is closed. Proposition 3.32 Let A be a closed subset of Rn and let x0 be a point in Rn. Then d(x0, A) = 0 if and only if x0 is in A. Proof If x0 is in A, it is obvious that d(x0, A) = 0. Conversely, if x0 is not in A, x0 is in the open set Rn \ A. Therefore, there is an r > 0 such that B(x0, r) ⊂ Rn \ A. For any a ∈ A, a /∈ B(x0, r). Therefore, ∥x0 − a∥ ≥ r. Taking infimum over a ∈ A, we find that d(x0, A) ≥ r. Hence, d(x0, A) ̸= 0. Figure 3.17: A point outside a closed set has positive distance from the set. Proposition 3.33 Given a subset A of Rn, define the function f : Rn → R by f(x) = d(x, A). Then f is a continuous function. Proof We prove something stronger. For any u and v in Rn, we claim that |f(u)− f(v)| ≤ ∥u− v∥. This means that f is a Lipschitz function with Lipschitz constant 1, which implies that it is continuous. Given u and v in Rn, if a is in A, then d(u, A) ≤ ∥u− a∥ ≤ ∥v − a∥+ ∥u− v∥. This shows that ∥v − a∥ ≥ d(u, A)− ∥u− v∥. Taking infimum over a ∈ A, we find that d(v, A) ≥ d(u, A)− ∥u− v∥. Therefore, f(u)− f(v) ≤ ∥u− v∥. Interchanging u and v, we obtain f(v)− f(u) ≤ ∥u− v∥. This proves that |f(u)− f(v)| ≤ ∥u− v∥. Now we can prove the following. Theorem 3.34 Let A and C be disjoint subsets of Rn. If A is compact and C is closed, then the distance between A and C is positive.
Proof By Proposition 3.33, the function f : A → R, f(a) = d(a, C) is continuous. Since A is compact, f has a minimum value. Namely, there is a point a0 in A such that d(a0, C) ≤ d(a, C) for all a ∈ A. For any a in A and c ∈ C, d(a, c) ≥ d(a, C) ≥ d(a0, C). Taking infimum over all a ∈ A and c ∈ C, we find that d(A,C) ≥ d(a0, C). By definition, we also have d(A,C) ≤ d(a0, C). Thus, d(A,C) = d(a0, C). Since A and C are disjoint and C is closed, Proposition 3.32 implies that d(A,C) = d(a0, C) > 0. An equivalent form of Theorem 3.34 is the following important theorem. Theorem 3.35 Let A be a compact subset of Rn, and let U be an open subset of Rn that contains A. Then there is a positive number δ such that if x is a point in Rn that has a distance less than δ from the set A, then x is in U . Figure 3.18: A compact set has a positive distance from the boundary of the open set that contains it. Proof Let C = Rn \ U . Then C is a closed subset of Rn that is disjoint from A. By Theorem 3.34, δ = d(A,C) > 0. If x is in Rn and d(x, A) < δ, then x cannot be in C. Therefore, x is in U . As a corollary, we have the following. Corollary 3.36 Let A be a compact subset of Rn, and let U be an open subset of Rn that contains A. Then there is a positive number r and a compact set K such that A ⊂ K ⊂ U , and if x is a point in Rn that has a distance less than r from the set A, then x is in K. Proof By Theorem 3.35, there is a positive number δ such that if x is a point in Rn that has a distance less than δ from the set A, then x is in U . Take r = δ/2, and let K = V̄ , where V = ⋃ u∈A B(u, r). Since A is compact, it is bounded. There is a positive number M such that ∥u∥ ≤ M for all u ∈ A. If x ∈ V , then there is a u ∈ A such that ∥x − u∥ < r. This implies that ∥x∥ ≤ M + r. Hence, the set V is also bounded.
Since K is the closure of a bounded set, K is compact. Since A ⊂ V , A ⊂ K. If w ∈ K, since K is the closure of V , there is a point v in V that lies in B(w, r). By the definition of V , there is a point u in A such that v ∈ B(u, r). Thus, ∥w − u∥ ≤ ∥w − v∥+ ∥v − u∥ < r + r = δ. This implies that w has a distance less than δ from A. Hence, w is in U . This shows that K ⊂ U . Now if x is a point that has distance less than r from the set A, there is a point u in A such that ∥x− u∥ < r. This implies that x ∈ B(u, r) ⊂ V ⊂ K. 3.4.3 Uniform Continuity In Section 2.4, we have discussed uniform continuity. Let D be a subset of Rn and let F : D → Rm be a function defined on D. We say that F : D → Rm is uniformly continuous provided that for any ε > 0, there exists δ > 0 such that for any points u and v in D, if ∥u− v∥ < δ, then ∥F(u)− F(v)∥ < ε. If a function is uniformly continuous, it is continuous. The converse is not true. However, a continuous function that is defined on a compact subset of Rn is uniformly continuous. This is an important theorem in analysis. Theorem 3.37 Let D be a subset of Rn, and let F : D → Rm be a continuous function defined on D. If D is compact, then F : D → Rm is uniformly continuous. Proof Assume to the contrary that F : D → Rm is not uniformly continuous. Then there exists an ε > 0 such that for any δ > 0, there exist points u and v in D such that ∥u − v∥ < δ and ∥F(u) − F(v)∥ ≥ ε. This implies that for any k ∈ Z+, there exist uk and vk in D such that ∥uk − vk∥ < 1/k and ∥F(uk) − F(vk)∥ ≥ ε. Since D is sequentially compact, there is a subsequence {ukj} of {uk} that converges to a point u0 in D. Consider the sequence {vkj} in D. It has a subsequence {vkjl } that converges to a point v0 in D. Being a subsequence of {ukj}, the sequence {ukjl } also converges to u0. Since F : D → Rm is continuous, the sequences {F(ukjl )} and {F(vkjl )} converge to F(u0) and F(v0) respectively.
Notice that by construction, ∥F(ukjl )− F(vkjl )∥ ≥ ε for all l ∈ Z+. Thus, ∥F(u0) − F(v0)∥ ≥ ε. This implies that F(u0) ̸= F(v0), and so u0 ̸= v0. Since kj1 , kj2 , . . . is a strictly increasing sequence of positive integers, kjl ≥ l. Thus, ∥ukjl − vkjl ∥ < 1/kjl ≤ 1/l. Taking l → ∞ implies that u0 = v0. This gives a contradiction. Thus, F : D → Rm must be uniformly continuous. Example 3.33 Let D = (−1, 4)× (−7, 5] and let F : D → R3 be the function defined as F(x, y) = ( sin(x+ y), √ x+ y + 8, exy ) . Show that F is uniformly continuous. Solution Let U = [−1, 4] × [−7, 5]. Then U is a closed and bounded subset of R2 that contains D. The functions f1(x, y) = x+ y, f2(x, y) = x+ y + 8 and f3(x, y) = xy are polynomial functions. Hence, they are continuous. If (x, y) ∈ U , then x ≥ −1, y ≥ −7, and so f2(x, y) = x+ y + 8 ≥ 0. Thus, f2(U) is contained in the domain of the square root function. Since the square root function, the sine function and the exponential function are continuous on their domains, we find that the functions F1(x, y) = sin(x+ y), F2(x, y) = √ x+ y + 8, F3(x, y) = exy are continuous on U . Since U is closed and bounded, F : U → R3 is uniformly continuous. Since D ⊂ U , F : D → R3 is uniformly continuous. 3.4.4 Linear Transformations and Quadratic Forms In Chapter 2, we have seen that a linear transformation T : Rn → Rm is a matrix transformation. Namely, there exists an m × n matrix A such that T(x) = Ax for all x ∈ Rn. A linear transformation is continuous. Theorem 2.34 says that a linear transformation is Lipschitz. More precisely, there exists a positive constant c > 0 such that ∥T(x)∥ ≤ c∥x∥ for all x ∈ Rn. Theorem 2.5 says that when m = n, a linear transformation T : Rn → Rn is invertible if and only if it is one-to-one, if and only if the matrix A is invertible, if and only if detA ̸= 0.
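The Lipschitz bound ∥T(x)∥ ≤ c∥x∥ can be checked numerically. The sketch below takes c to be the Frobenius norm of A, which is a valid (though generally not optimal) Lipschitz constant by the Cauchy–Schwarz inequality applied row by row; the 2 × 2 matrix is our own sample, not one from the text.

```python
import math
import random

A = [[2.0, -1.0], [0.5, 3.0]]  # a sample 2x2 matrix (our choice)

def T(x):
    # the matrix transformation T(x) = A x
    return tuple(sum(A[i][j] * x[j] for j in range(2)) for i in range(2))

# c = Frobenius norm of A; Cauchy-Schwarz gives ||T(x)|| <= c ||x||
c = math.sqrt(sum(A[i][j] ** 2 for i in range(2) for j in range(2)))

random.seed(1)
ok = all(
    math.hypot(*T(x)) <= c * math.hypot(*x) + 1e-9
    for x in [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
)
```

The optimal constant is the largest singular value of A, which is never larger than the Frobenius norm used here.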
Here we want to give a stronger characterization of a linear transformation T : Rn → Rn that is invertible. Recall that to show that a linear transformation T : Rn → Rm is one-to-one, it is sufficient to show that T(x) = 0 implies that x = 0. Theorem 3.38 Let T : Rn → Rn be a linear transformation. The following are equivalent. (a) T is invertible. (b) There is a positive constant a such that ∥T(x)∥ ≥ a∥x∥ for all x ∈ Rn. Proof (b) implies (a) is easy. Notice that (b) says that ∥x∥ ≤ 1 a ∥T(x)∥ for all x ∈ Rn. (3.3) If T(x) = 0, then ∥T(x)∥ = 0. Eq. (3.3) implies that ∥x∥ = 0. Thus, x = 0. This proves that T is one-to-one. Hence, it is invertible. Conversely, assume that T : Rn → Rn is invertible. Let Sn−1 = { (x1, . . . , xn) |x21 + · · ·+ x2n = 1 } be the standard unit (n − 1)-sphere in Rn. We have seen that Sn−1 is compact. For any u ∈ Sn−1, u ̸= 0. Therefore, T(u) ̸= 0 and so ∥T(u)∥ > 0. The function f : Sn−1 → R, f(u) = ∥T(u)∥ is continuous. Hence, by the extreme value theorem, it has a minimum value at some u0 on Sn−1. Let a = ∥T(u0)∥. Then a > 0. Since a is the minimum value of f , ∥T(u)∥ ≥ a for all u ∈ Sn−1. Notice that if x = 0, ∥T(x)∥ ≥ a∥x∥ holds trivially. If x is in Rn and x ̸= 0, let u = αx, where α = 1/∥x∥. Then u is in Sn−1. Therefore, ∥T(u)∥ ≥ a. Since T(u) = αT(x), and α > 0, we find that ∥T(u)∥ = α∥T(x)∥. Hence, α∥T(x)∥ ≥ a. This gives ∥T(x)∥ ≥ a/α = a∥x∥. In Section 2.1.5, we have reviewed some theories of quadratic forms from linear algebra. In Theorem 2.7, we state that for a quadratic form QA : Rn → R, QA(x) = xTAx defined by the symmetric matrix A, we have λn∥x∥2 ≤ QA(x) ≤ λ1∥x∥2 for all x ∈ Rn. Here λn is the smallest eigenvalue of A, and λ1 is the largest eigenvalue of A. We have used Theorem 2.7 to prove that a linear transformation is Lipschitz in Theorem 2.34. It boils down to the fact that if T(x) = Ax, then ∥T(x)∥2 = xT (ATA)x, and ATA is a positive semi-definite quadratic form.
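The constant a in Theorem 3.38, the minimum of ∥T(u)∥ over the unit sphere, equals the smallest singular value of A, that is, the square root of the smallest eigenvalue of ATA. The sketch below computes it in closed form for a sample invertible 2 × 2 matrix (our own choice, not from the text) and checks the lower bound ∥T(x)∥ ≥ a∥x∥ on random points.

```python
import math
import random

A = [[3.0, 1.0], [1.0, 2.0]]  # a sample invertible matrix (det A = 5)

def T(x):
    return (A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1])

# a = min ||T(u)|| over the unit circle = smallest singular value of A;
# for a 2x2 matrix, compute it from the eigenvalues of A^T A in closed form
m00 = A[0][0] ** 2 + A[1][0] ** 2
m01 = A[0][0] * A[0][1] + A[1][0] * A[1][1]
m11 = A[0][1] ** 2 + A[1][1] ** 2
tr, det = m00 + m11, m00 * m11 - m01 * m01
lam_min = (tr - math.sqrt(tr * tr - 4.0 * det)) / 2.0
a = math.sqrt(lam_min)

random.seed(2)
ok = all(
    math.hypot(*T(x)) >= a * math.hypot(*x) - 1e-9
    for x in [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(1000)]
)
```

For a non-invertible A the same computation would give a = 0, in which case no positive constant a in part (b) of the theorem exists.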
In fact, we can also use Theorem 2.7 to prove Theorem 3.38, using the fact that if A is invertible, then ATA is positive definite. Let us prove a weaker version of Theorem 2.7 here, which is sufficient to establish Theorem 3.38 and the theorem which says that a linear transformation is Lipschitz.

Theorem 3.39
Let A be an n× n symmetric matrix, and let QA : Rn → R be the quadratic form QA(x) = xTAx defined by A. There exist constants a and b such that a∥x∥2 ≤ QA(x) ≤ b∥x∥2 for all x ∈ Rn, and QA(u) = a∥u∥2, QA(v) = b∥v∥2 for some u and v in Rn. Therefore,
(i) if A is positive semi-definite, b ≥ a ≥ 0;
(ii) if A is positive definite, b ≥ a > 0.

Proof
As in the proof of Theorem 3.38, consider the continuous function QA : Sn−1 → R. Since Sn−1 is compact, there exist u and v in Sn−1 such that QA(u) ≤ QA(w) ≤ QA(v) for all w ∈ Sn−1. Let a = QA(u) and b = QA(v). If x = 0, a∥x∥2 ≤ QA(x) ≤ b∥x∥2 holds trivially. Now if x is in Rn and x ̸= 0, let w = αx, where α = 1/∥x∥. Then w is on Sn−1. Notice that QA(w) = α2QA(x). Hence, a ≤ QA(x)/∥x∥2 ≤ b. This proves that a∥x∥2 ≤ QA(x) ≤ b∥x∥2.

3.4.5 Lebesgue Number Lemma

Now let us prove the following important theorem.

Theorem 3.40 Lebesgue Number Lemma
Let A be a subset of Rn, and let A = {Uα | α ∈ J} be an open covering of A. If A is compact, there exists a positive number δ such that if S is a subset of A and diam S < δ, then S is contained in one of the elements of A . Such a positive number δ is called a Lebesgue number of the covering A .

We give two proofs of this theorem.

First Proof of the Lebesgue Number Lemma
We use proof by contradiction. Assume that there does not exist a positive number δ such that any subset S of A that has diameter less than δ lies inside an open set in A .
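Theorem 3.39 can be illustrated numerically for n = 2, where the extreme values a and b of QA on the unit circle are exactly the eigenvalues of A, computable in closed form from the trace and determinant. The symmetric matrix below is an arbitrary positive definite choice (not from the text), matching case (ii).

```python
import math
import random

# A hypothetical symmetric 2x2 matrix defining Q(x) = x^T A x.
A = [[3.0, 1.0],
     [1.0, 2.0]]

def Q(x):
    # Expanded quadratic form for a symmetric 2x2 matrix.
    return A[0][0]*x[0]*x[0] + 2*A[0][1]*x[0]*x[1] + A[1][1]*x[1]*x[1]

# For a symmetric 2x2 matrix, the eigenvalues (= min and max of Q on the
# unit circle) follow from the characteristic polynomial.
tr = A[0][0] + A[1][1]
det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
disc = math.sqrt(tr*tr - 4*det)
a, b = (tr - disc)/2, (tr + disc)/2   # smallest and largest eigenvalue

random.seed(2)
for _ in range(1000):
    x = [random.uniform(-4, 4), random.uniform(-4, 4)]
    n2 = x[0]*x[0] + x[1]*x[1]
    # The two-sided bound of Theorem 3.39, up to floating-point tolerance.
    assert a*n2 - 1e-9 <= Q(x) <= b*n2 + 1e-9

assert a > 0  # this A is positive definite, so a > 0 as in case (ii)
```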
Then for any k ∈ Z+, there is a subset Sk of A whose diameter is less than 1/k, but Sk is not contained in any element of A . For each k ∈ Z+, the set Sk cannot be empty. Let xk be any point in Sk. Then {xk} is a sequence of points in A. Since A is sequentially compact, there is a subsequence {xkm} that converges to a point x0 in A. Since A is an open covering of A, there exists β ∈ J such that x0 ∈ Uβ. Since Uβ is open, there exists r > 0 such that B(x0, r) ⊂ Uβ. Since the sequence {xkm} converges to x0, there is a positive integer M such that for all m ≥ M, xkm ∈ B(x0, r/2). There exists an integer j ≥ M such that 1/kj < r/2. If x ∈ Skj, then ∥x− xkj∥ ≤ diam Skj < 1/kj < r/2. Since xkj ∈ B(x0, r/2), ∥xkj − x0∥ < r/2. Therefore, ∥x− x0∥ < r. This proves that x ∈ B(x0, r) ⊂ Uβ. Thus, we have shown that Skj ⊂ Uβ. This contradicts the fact that Skj is not contained in any element of A .

Second Proof of the Lebesgue Number Lemma
Since A is compact, there are finitely many indices α1, . . . , αm in J such that A ⊂ Uα1 ∪ · · · ∪ Uαm. For 1 ≤ j ≤ m, let Cj = Rn \ Uαj. Then Cj is a closed set, and C1 ∩ · · · ∩ Cm is disjoint from A. By Theorem 3.33, the function fj : A → R, fj(x) = d(x, Cj) is continuous. Define f : A → R by f(x) = (f1(x) + · · ·+ fm(x))/m. Then f is also a continuous function. Since A is compact, there is a point a0 in A such that f(a0) ≤ f(a) for all a ∈ A. Notice that fj(a0) ≥ 0 for all 1 ≤ j ≤ m. Since C1 ∩ · · · ∩ Cm is disjoint from A, there is a k, 1 ≤ k ≤ m, such that a0 /∈ Ck. Proposition 3.32 says that fk(a0) = d(a0, Ck) > 0. Hence, f(a0) > 0. Let δ = f(a0). It is the minimum value of the function f : A → R. Now let S be a nonempty subset of A such that diam S < δ. Take a point x0 in S. Let 1 ≤ l ≤ m be an integer such that fl(x0) ≥ fj(x0) for all 1 ≤ j ≤ m. Then δ ≤ f(x0) ≤ fl(x0) = d(x0, Cl). For any u ∈ Cl, d(x0,u) ≥ d(x0, Cl) ≥ δ. If x ∈ S, then d(x,x0) ≤ diam S < δ. This implies that x is not in Cl.
Hence, it must be in Uαl. This shows that S is contained in Uαl, which is an element of A . This completes the proof of the theorem.

The Lebesgue number lemma can be used to give an alternative proof of Theorem 3.37, which says that a continuous function defined on a compact subset of Rn is uniformly continuous.

Alternative Proof of Theorem 3.37
Fix ε > 0. We want to show that there exists δ > 0 such that if u and v are in D and ∥u− v∥ < δ, then ∥F(u)− F(v)∥ < ε. We will construct an open covering of D indexed by J = D. Since F : D → Rm is continuous, for each x ∈ D, there is a positive number δx (depending on x) such that if u is in D and ∥u− x∥ < δx, then ∥F(u)− F(x)∥ < ε/2. Let Ux = B(x, δx). Then Ux is an open set. If u and v are points of D in Ux, then ∥F(u)− F(x)∥ < ε/2 and ∥F(v)− F(x)∥ < ε/2. Thus, ∥F(u)− F(v)∥ < ε. Now A = {Ux | x ∈ D} is an open covering of D. Since D is compact, the Lebesgue number lemma implies that there exists a number δ > 0 such that if S is a subset of D that has diameter less than δ, then S is contained in Ux for some x ∈ D. We claim that this is the δ that we need. If u and v are two points in D and ∥u− v∥ < δ, then S = {u,v} is a set with diameter less than δ. Hence, there is an x ∈ D such that S ⊂ Ux. This implies that u and v are in Ux. Hence, ∥F(u)− F(v)∥ < ε. This completes the proof.

Exercises 3.4

Question 1
Let D = {(x, y) | 2 < x2 + 4y2 < 10}, and let F : D → R3 be the function defined as F(x, y) = ( x/(x2 + y2), y/(x2 + y2), (x2 − y2)/(x2 + y2) ). Show that the function F : D → R3 is bounded.

Question 2
Let D = {(x, y, z) | 1 ≤ x2 + 4y2 ≤ 10, 0 ≤ z ≤ 5}, and let f : D → R be the function defined as f(x, y, z) = (x2 − y2)/(x2 + y2 + z2). Show that the function f : D → R has a maximum value and a minimum value.

Question 3
Let A = {(x, y) | x2 + 4y2 ≤ 16} and B = {(x, y) | x+ y ≥ 10}.
Show that the distance between the sets A and B is positive.

Question 4
Let D = {(x, y, z) | x2 + y2 + z2 ≤ 20} and let f : D → R be the function defined as f(x, y, z) = ex2+4z2. Show that f : D → R is uniformly continuous.

Question 5
Let D = (−1, 2)× (−6, 0) and let f : D → R be the function defined as f(x, y) = √(x+ y + 7) + ln(x2 + y2 + 1). Show that f : D → R is uniformly continuous.

Chapter 4 Differentiating Functions of Several Variables

In this chapter, we study differential calculus of functions of several variables.

4.1 Partial Derivatives

When f : (a, b) → R is a function defined on an open interval (a, b), the derivative of the function at a point x0 in (a, b) is defined as f′(x0) = lim h→0 (f(x0 + h)− f(x0))/h, provided that the limit exists. The derivative gives the instantaneous rate of change of the function at the point x0. Geometrically, it is the slope of the tangent line to the graph of the function f : (a, b) → R at the point (x0, f(x0)).

Figure 4.1: Derivative as slope of tangent line.

Now consider a function f : O → R that is defined on an open subset O of Rn, where n ≥ 2. What is the natural way to extend the concept of derivatives to this function? From the perspective of rate of change, we need to consider the change of f in various different directions. This leads us to consider directional derivatives. Another perspective is to regard the existence of the derivative as differentiability, that is, as the existence of a good first order approximation. Later we will see that all these are closely related. First let us consider the rates of change of the function f : O → R at a point x0 in O along the directions of the coordinate axes. These are called partial derivatives.
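The limit defining f′(x0) can be watched numerically: as h shrinks, the difference quotient approaches the derivative. A small supplementary sketch (the choices f = sin and x0 = 0.3 are arbitrary, not from the text):

```python
import math

# Difference quotients (f(x0+h) - f(x0))/h approach f'(x0) as h -> 0.
# Here f = sin, whose derivative at x0 is cos(x0).
f, x0 = math.sin, 0.3
exact = math.cos(x0)
for h in [1e-2, 1e-4, 1e-6]:
    quotient = (f(x0 + h) - f(x0)) / h
    # For a twice-differentiable f, the error is of order h.
    assert abs(quotient - exact) < 10 * h
```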
Definition 4.1 Partial Derivatives Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. For 1 ≤ i ≤ n, we say that the function f : O → R has a partial derivative with respect to its ith component at the point x0 if the limit lim h→0 f(x0 + hei)− f(x0) h exists. In this case, we denote the limit by ∂f ∂xi (x0), and call it the partial derivative of f : O → R with respect to xi at x0. We say that the function f : O → R has partial derivatives at x0 if ∂f ∂xi (x0) exists for all 1 ≤ i ≤ n. Remark 4.1 When we consider partial derivatives of a function, we always assume that the domain of the function is an open set O, so that each point x0 in the domain is an interior point of O, and a limit point of O\{x0}. By definition of open sets, there exists r > 0 such that B(x0, r) is contained in O. This allows us to compare the function values of f in a neighbourhood of x0 from various different directions. By definition, ∂f ∂xi (x0) measures the rate of change of f at x0 in the direction of ei. It can also be interpreted as the slope of a curve at the point (x0, f(x0)) on the surface xn+1 = f(x), as shown in Figure 4.2 Chapter 4. Differentiating Functions of Several Variables 203 Notations for Partial Derivatives An alternative notation for ∂f ∂xi (x0) is fxi (x0). Figure 4.2: Partial derivative. Remark 4.2 Partial Derivatives Let x0 = (a1, a2, . . . , an) and define the function g : (−r, r) → R by g(h) = f(x0 + hei) = f(a1, . . . , ai−1, ai + h, ai+1, . . . , an). Then lim h→0 f(x0 + hei)− f(x0) h = lim h→0 g(h)− g(0) h = g′(0). Thus, fxi (x0) exists if and only if g(h) is differentiable at h = 0. Moreover, to find fxi (x0), we regard the variables x1, . . . , xi−1, xi+1, . . . , xn as constants, and differentiate with respect to xi. 
Hence, the derivative rules such as sum rule, product rule and quotient rule still work for partial derivatives, as long as one is clear which variable to take derivative, which variable to be regarded as constant. Chapter 4. Differentiating Functions of Several Variables 204 Example 4.1 Let f : R2 → R be the function defined as f(x, y) = x2y. Find fx(1, 2) and fy(1, 2). Solution ∂f ∂x = 2xy, ∂f ∂y = x2. Therefore, fx(1, 2) = 4, fy(1, 2) = 1. Example 4.2 Let f : R2 → R be the function defined as f(x, y) = |x + y|. Determine whether fx(0, 0) exists. Solution By definition, fx(0, 0) is given by the limit lim h→0 f(h, 0)− f(0, 0) h if it exists. Since lim h→0 f(h, 0)− f(0, 0) h = lim h→0 |h| h , and lim h→0− |h| h = −1 and lim h→0+ |h| h = 1, the limit lim h→0 f(h, 0)− f(0, 0) h does not exist. Hence, fx(0, 0) does not exist. Chapter 4. Differentiating Functions of Several Variables 205 Definition 4.2 Let O be an open subset of Rn, and let f : O → R be a function defined on O. If the function f : O → R has partial derivative with respect to xi at every point of O, this defines the function fxi : O → R. In this case, we say that the partial derivative of f with respect to xi exists. If fxi : O → R exists for all 1 ≤ i ≤ n, we say that the function f : O → R has partial derivatives. Example 4.3 Find the partial derivatives of the function f : R3 → R defined as f(x, y, z) = sin(xy + z) + 3x y2 + z2 + 1 . Solution ∂f ∂x (x, y, z) = y cos(xy + z) + 3 y2 + z2 + 1 , ∂f ∂y (x, y, z) = x cos(xy + z)− 6xy (y2 + z2 + 1)2 , ∂f ∂z (x, y, z) = cos(xy + z)− 6xz (y2 + z2 + 1)2 . For a function defined on an open subset of Rn, there are n partial derivatives with respect to the n directions defined by the coordinate axes. These define a vector in Rn. Definition 4.3 Gradient Let O be an open subset of Rn, and let x0 be a point in O. 
If the function f : O → R has partial derivatives at x0, we define the gradient of the function f at x0 as the vector in Rn given by ∇f(x0) = ( ∂f/∂x1(x0), ∂f/∂x2(x0), . . . , ∂f/∂xn(x0) ).

Let us revisit Example 4.3.

Example 4.4
The gradient of the function f : R3 → R defined as f(x, y, z) = sin(xy + z) + 3x/(y2 + z2 + 1) in Example 4.3 is the function ∇f : R3 → R3 given by ∇f(x, y, z) = ( y cos(xy + z) + 3/(y2 + z2 + 1), x cos(xy + z)− 6xy/(y2 + z2 + 1)2, cos(xy + z)− 6xz/(y2 + z2 + 1)2 ). In particular, ∇f(1,−1, 1) = ( 0, 5/3, 1/3 ).

It is straightforward to extend the definition of partial derivative to a function F : O → Rm whose codomain is Rm with m ≥ 2.

Definition 4.4
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O. Given x0 in O and 1 ≤ i ≤ n, we say that F : O → Rm has partial derivative with respect to xi at the point x0 if the limit ∂F/∂xi(x0) = lim h→0 (F(x0 + hei)− F(x0))/h exists. We say that F : O → Rm has partial derivatives at the point x0 if ∂F/∂xi(x0) exists for each 1 ≤ i ≤ n. We say that F : O → Rm has partial derivatives if it has partial derivatives at each point of O.

Since the limit of a function G : (−r, r) → Rm as h → 0 exists if and only if the limit of each component function Gj : (−r, r) → R, 1 ≤ j ≤ m, as h → 0 exists, we have the following.

Proposition 4.1
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O. Given x0 in O and 1 ≤ i ≤ n, F : O → Rm has partial derivative with respect to xi at the point x0 if and only if each component function Fj : O → R, 1 ≤ j ≤ m, has partial derivative with respect to xi at the point x0. In this case, we have ∂F/∂xi(x0) = ( ∂F1/∂xi(x0), . . . , ∂Fm/∂xi(x0) ).
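The value ∇f(1,−1, 1) = (0, 5/3, 1/3) computed in Example 4.4 can be double-checked with central difference quotients. A supplementary sketch (the step size h is an arbitrary choice):

```python
import math

# The function of Examples 4.3-4.4.
def f(x, y, z):
    return math.sin(x*y + z) + 3*x / (y*y + z*z + 1)

def grad(f, p, h=1e-6):
    # Approximate each partial derivative by a central difference quotient.
    g = []
    for i in range(3):
        q1 = list(p); q1[i] += h
        q2 = list(p); q2[i] -= h
        g.append((f(*q1) - f(*q2)) / (2*h))
    return g

g = grad(f, [1.0, -1.0, 1.0])
expected = [0.0, 5/3, 1/3]
assert all(abs(gi - ei) < 1e-5 for gi, ei in zip(g, expected))
```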
To capture all the partial derivatives, we define a derivative matrix.

Definition 4.5 The Derivative Matrix
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If F : O → Rm has partial derivatives at the point x0, the derivative matrix of F : O → Rm at x0 is the m× n matrix DF(x0) whose rows are ∇F1(x0), ∇F2(x0), . . . , ∇Fm(x0); that is, its (j, i)-entry is ∂Fj/∂xi(x0). When m = 1, the derivative matrix is just the gradient of the function, written as a row matrix.

Example 4.5
Let F : R3 → R2 be the function defined as F(x, y, z) = ( xy2z3, x+ 3y − 7z ). Find the derivative matrix of F at the point (1,−1, 2).

Solution
DF(x, y, z) = [ y2z3  2xyz3  3xy2z2 ; 1  3  −7 ]. Thus, the derivative matrix of F at the point (1,−1, 2) is DF(1,−1, 2) = [ 8  −16  12 ; 1  3  −7 ].

Since the partial derivatives of a function are defined componentwise, we can focus on functions f : O → R whose codomain is R. One might wonder why we have not mentioned the word "differentiable" so far. For single variable functions, we have seen in Volume I that if a function is differentiable at a point, then it is continuous at that point. For multivariable functions, the existence of partial derivatives is not enough to guarantee continuity, as is shown in the next example.

Example 4.6
Let f : R2 → R be the function defined as f(x, y) = xy/(x2 + y2) if (x, y) ̸= (0, 0), and f(x, y) = 0 if (x, y) = (0, 0). Show that f is not continuous at (0, 0), but it has partial derivatives at (0, 0).

Figure 4.3: The function f(x, y) defined in Example 4.6.

Solution
Consider the sequence {uk} with uk = ( 1/k, 1/k ). It is a sequence in R2 that converges to (0, 0). Since f(uk) = 1/2 for all k ∈ Z+, the sequence {f(uk)} converges to 1/2. But f(0, 0) = 0 ̸= 1/2.
Since there is a sequence {uk} that converges to (0, 0), but the sequence {f(uk)} does not converge to f(0, 0), f is not continuous at (0, 0). To find the partial derivatives at (0, 0), we use the definition. fx(0, 0) = lim h→0 (f(h, 0)− f(0, 0))/h = lim h→0 (0− 0)/h = 0, and fy(0, 0) = lim h→0 (f(0, h)− f(0, 0))/h = lim h→0 (0− 0)/h = 0. These show that f has partial derivatives at (0, 0), and fx(0, 0) = fy(0, 0) = 0.

The function defined in Example 4.6 in fact has partial derivatives at all points. When (x, y) ̸= (0, 0), we can apply the derivative rules directly and find that ∂f/∂x(x, y) = ((x2 + y2)y − 2x2y)/(x2 + y2)2 = y(y2 − x2)/(x2 + y2)2. Similarly, ∂f/∂y(x, y) = x(x2 − y2)/(x2 + y2)2.

Let us highlight again our conclusion.

Partial Derivative vs Continuity
The existence of partial derivatives does not imply continuity.

This prompts us to find a better definition of differentiability, one which does imply continuity. This will be considered in a later section.

When the function f : O → R has partial derivative with respect to xi, we obtain the function fxi : O → R. Then we can discuss whether the function fxi has partial derivative at a point in O.

Definition 4.6 Second Order Partial Derivatives
Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. Given that 1 ≤ i ≤ n, 1 ≤ j ≤ n, we say that the second order partial derivative ∂2f/∂xj∂xi exists at x0 provided that there exists an open ball B(x0, r) that is contained in O such that ∂f/∂xi : B(x0, r) → R exists, and it has partial derivative with respect to xj at the point x0. In this case, we define the second order partial derivative ∂2f/∂xj∂xi(x0) of f at x0 as ∂2f/∂xj∂xi(x0) = ∂fxi/∂xj(x0) = lim h→0 (fxi(x0 + hej)− fxi(x0))/h. We say that the function f : O → R has second order partial derivatives at x0 provided that ∂2f/∂xj∂xi(x0) exists for all 1 ≤ i ≤ n, 1 ≤ j ≤ n.
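Both halves of Example 4.6 (discontinuity at the origin, yet vanishing partial derivatives there) can be observed directly with a short supplementary computation:

```python
# The function of Example 4.6: partial derivatives exist at (0,0)
# even though f is not continuous there.
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x*y / (x*x + y*y)

# Along the diagonal sequence u_k = (1/k, 1/k), f(u_k) = 1/2 for every k,
# which does not approach f(0,0) = 0.
for k in range(1, 100):
    assert abs(f(1.0/k, 1.0/k) - 0.5) < 1e-12

# Yet both difference quotients at the origin vanish identically, because
# f is zero on both coordinate axes.
for h in [0.1, 0.01, 0.001]:
    assert (f(h, 0.0) - f(0.0, 0.0)) / h == 0.0   # consistent with f_x(0,0) = 0
    assert (f(0.0, h) - f(0.0, 0.0)) / h == 0.0   # consistent with f_y(0,0) = 0
```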
In the same way, one can also define second order partial derivatives for a function F : O → Rm with codomain Rm when m ≥ 2. Chapter 4. Differentiating Functions of Several Variables 211 Remark 4.3 In the definition of the second order partial derivative ∂2f ∂xj∂xi (x0), instead of assuming fxi (x) exists for all x in a ball of radius r centered at x0, it is sufficient to assume that there exists r > 0 such that fxi (x0 + hej) exists for all |h| < r. Definition 4.7 Given 1 ≤ i ≤ n, 1 ≤ j ≤ n, we say that the function f : O → R has the second order partial derivative ∂2f ∂xj∂xi provided that ∂2f ∂xj∂xi (x0) exists for all x0 in O. We say that the function f : O → R has second order partial derivatives provided that ∂2f ∂xj∂xi exists for all 1 ≤ i ≤ n, 1 ≤ j ≤ n. Notations for Second Order Partial Derivatives Alternative notations for second order partial derivatives are ∂2f ∂xj∂xi = (fxi )xj = fxixj . Notice that the orders of xi and xj are different in different notations. Remark 4.4 Given 1 ≤ i ≤ n, 1 ≤ j ≤ n, the function f : O → R has the second order partial derivative ∂2f ∂xj∂xi provided that fxi : O → R exists, and fxi has partial derivative with respect to xj . Example 4.7 Find the second order partial derivatives of the function f : R2 → R defined as f(x, y) = xe2x+3y. Chapter 4. Differentiating Functions of Several Variables 212 Solution We find the first order partial derivatives first. ∂f ∂x (x, y) = e2x+3y + 2xe2x+3y = (1 + 2x)e2x+3y, ∂f ∂y (x, y) = 3xe2x+3y. Then we compute the second order partial derivatives. ∂2f ∂x2 (x, y) = 2e2x+3y + 2(1 + 2x)e2x+3y = (4 + 4x)e2x+3y, ∂2f ∂y∂x (x, y) = 3(1 + 2x)e2x+3y = (3 + 6x)e2x+3y, ∂2f ∂x∂y (x, y) = 3e2x+3y + 6xe2x+3y = (3 + 6x)e2x+3y, ∂2f ∂y2 (x, y) = 9xe2x+3y. Definition 4.8 The Hessian Matrix Let O be an open subset of Rn that contains the point x0. 
If f : O → R is a function that has second order partial derivatives at x0, the Hessian matrix of f at x0 is the n× n matrix defined as Hf (x0) = [ ∂2f ∂xi∂xj (x0) ] = ∂2f ∂x21 (x0) ∂2f ∂x1∂x2 (x0) · · · ∂2f ∂x1∂xn (x0) ∂2f ∂x2∂x1 (x0) ∂2f ∂x22 (x0) · · · ∂2f ∂x2∂xn (x0) ... ... . . . ... ∂2f ∂xn∂x1 (x0) ∂2f ∂xn∂x2 (x0) · · · ∂2f ∂x2n (x0) . We do not define Hessian matrix for a function F : O → Rm with codomain Rm when m ≥ 2. Chapter 4. Differentiating Functions of Several Variables 213 Example 4.8 For the function f : R2 → R defined as f(x, y) = xe2x+3y in Example 4.7, Hf (x, y) = [ (4 + 4x)e2x+3y (3 + 6x)e2x+3y (3 + 6x)e2x+3y 9xe2x+3y ] . In Example 4.7, we notice that ∂2f ∂y∂x (x, y) = ∂2f ∂x∂y (x, y) for all (x, y) ∈ R2. The following example shows that this is not always true. Example 4.9 Consider the function f : R2 → R defined as f(x, y) = xy(x2 − y2) x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). Find fxy(0, 0) and fyx(0, 0). Figure 4.4: The function f(x, y) defined in Example 4.9. Chapter 4. Differentiating Functions of Several Variables 214 Solution To compute fxy(0, 0), we need to compute fx(0, h) for all h in a neighbourhood of 0. To compute fyx(0, 0), we need to compute fy(h, 0) for all h in a neighbourhood of 0. Notice that for any h ∈ R, f(0, h) = f(h, 0) = 0. By considering h = 0 and h ̸= 0 separately, we find that fx(0, h) = lim t→0 f(t, h)− f(0, h) t = lim t→0 h(t2 − h2) t2 + h2 = −h, fy(h, 0) = lim t→0 f(h, t)− f(h, 0) t = lim t→0 h(h2 − t2) h2 + t2 = h. It follows that fxy(0, 0) = lim h→0 fx(0, h)− fx(0, 0) h = lim h→0 −h h = −1, fyx(0, 0) = lim h→0 fy(h, 0)− fy(0, 0) h = lim h→0 h h = 1. Example 4.9 shows that there exists a function f : R2 → R which has second order partial derivatives at (0, 0) but ∂2f ∂x∂y (0, 0) ̸= ∂2f ∂y∂x (0, 0). Remark 4.5 If O is an open subset of Rn that contains the point x0, there exists r > 0 such that B(x0, r) ⊂ O. 
Given that f : O → R is a function defined on O, and 1 ≤ i < j ≤ n, let D be the ball with center at (0, 0) and radius r in R2. Define the function g : D → R by g(u, v) = f(x0 + uei + vej). Then ∂2f/∂xj∂xi(x0) exists if and only if ∂2g/∂v∂u(0, 0) exists. In that case, we have ∂2f/∂xj∂xi(x0) = ∂2g/∂v∂u(0, 0).

The following gives a sufficient condition to interchange the order of taking partial derivatives.

Theorem 4.2 Clairaut's Theorem or Schwarz's Theorem
Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. Assume that 1 ≤ i < j ≤ n, and the second order partial derivatives ∂2f/∂xj∂xi : O → R and ∂2f/∂xi∂xj : O → R exist. If the functions ∂2f/∂xj∂xi and ∂2f/∂xi∂xj are continuous at x0, then ∂2f/∂xj∂xi(x0) = ∂2f/∂xi∂xj(x0).

Proof
Since O is an open set that contains the point x0, there exists r > 0 such that B(x0, r) ⊂ O. Let D = { (u, v) | u2 + v2 < r2 }, and define the function g : D → R by g(u, v) = f(x0 + uei + vej). By Remark 4.5, g has second order partial derivatives, and ∂2g/∂v∂u and ∂2g/∂u∂v are continuous at (0, 0). We need to show that ∂2g/∂v∂u(0, 0) = ∂2g/∂u∂v(0, 0). Consider the function G(u, v) = g(u, v)− g(u, 0)− g(0, v) + g(0, 0). Notice that G(u, v) = Hv(u)−Hv(0) = Su(v)− Su(0), where Hv(u) = g(u, v)− g(u, 0) and Su(v) = g(u, v)− g(0, v). For fixed v with |v| < r, the function Hv(u) is defined for those u with |u| < √(r2 − v2), so that (u, v) is in D. It is differentiable with H′v(u) = ∂g/∂u(u, v)− ∂g/∂u(u, 0). Hence, if (u, v) is in D, the mean value theorem for single variable functions implies that there exists cu,v ∈ (0, 1) such that G(u, v) = Hv(u)−Hv(0) = uH′v(cu,vu) = u( ∂g/∂u(cu,vu, v)− ∂g/∂u(cu,vu, 0) ).
Regard this now as a function of v, the mean value theorem for single variable functions implies that there exists du,v ∈ (0, 1) such that G(u, v) = uv ∂2g ∂v∂u (cu,vu, du,vv). (4.1) Using the same reasoning, we find that for (u, v) ∈ D, there exists d̃u,v ∈ (0, 1) such that G(u, v) = vS ′ u(d̃u,vv) = v ( ∂g ∂v (u, d̃u,vv)− ∂g ∂v (0, d̃u,vv) ) . Regard this as a function of u, mean value theorem implies that there exists c̃u,v ∈ (0, 1) such that G(u, v) = uv ∂2g ∂u∂v (c̃u,vu, d̃u,vv). (4.2) Comparing (4.1) and (4.2), we find that ∂2g ∂v∂u (cu,vu, du,vv) = ∂2g ∂u∂v (c̃u,vu, d̃u,vv). Chapter 4. Differentiating Functions of Several Variables 217 When (u, v) → (0, 0), (cu,vu, du,vv) → (0, 0) and (c̃u,vu, d̃u,vv) → (0, 0). The continuities of guv and gvu at (0, 0) then imply that ∂2g ∂v∂u (0, 0) = ∂2g ∂u∂v (0, 0). This completes the proof. Example 4.10 Consider the function f : R2 → R in Example 4.9 defined as f(x, y) = xy(x2 − y2) x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). When (x, y) ̸= (0, 0), we find that ∂f ∂x (x, y) = y(x4 + 4x2y2 − y4) (x2 + y2)2 , ∂f ∂y (x, y) = x(x4 − 4x2y2 − y4) (x2 + y2)2 . It follows that ∂2f ∂y∂x (x, y) = x6 + 9x4y2 − 9x2y4 − y6 (x2 + y2)3 = ∂2f ∂x∂y (x, y). Indeed, both fxy and fyx are continuous on R2 \ {(0, 0)}. Corollary 4.3 Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. If all the second order partial derivatives of the function f : O → R at x0 are continuous, then the Hessian matrix Hf (x0) of f at x0 is a symmetric matrix. Chapter 4. Differentiating Functions of Several Variables 218 Remark 4.6 One can define partial derivatives of higher orders following the same rationale as we define the second order partial derivatives. Extension of Clairaut’s theorem to higher order partial derivatives is straightforward. The key point is the continuity of the partial derivatives involved. Chapter 4. 
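As a supplementary numerical check (not part of the text's argument), the contrast between Example 4.7, where the mixed partials are continuous and hence equal by Clairaut's theorem, and Example 4.9, where they disagree at the origin, can be observed with difference quotients. The step sizes below are arbitrary choices.

```python
import math

# Example 4.7: g(x,y) = x e^{2x+3y} has continuous mixed partials, so
# Clairaut's theorem applies; from the worked solution, g_xy(0,0) = 3.
def g(x, y):
    return x * math.exp(2*x + 3*y)

def mixed(f, x, y, h=1e-4):
    # Central second difference approximating f_xy(x, y).
    return (f(x+h, y+h) - f(x+h, y-h) - f(x-h, y+h) + f(x-h, y-h)) / (4*h*h)

assert abs(mixed(g, 0.0, 0.0) - 3.0) < 1e-4

# Example 4.9: the mixed partials at the origin disagree (-1 vs 1).
def f(x, y):
    return 0.0 if (x, y) == (0.0, 0.0) else x*y*(x*x - y*y) / (x*x + y*y)

def fx(x, y, d=1e-7):
    return (f(x+d, y) - f(x-d, y)) / (2*d)

def fy(x, y, d=1e-7):
    return (f(x, y+d) - f(x, y-d)) / (2*d)

h = 1e-4
fxy = (fx(0.0, h) - fx(0.0, -h)) / (2*h)  # derivative of f_x in y at (0,0)
fyx = (fy(h, 0.0) - fy(-h, 0.0)) / (2*h)  # derivative of f_y in x at (0,0)
assert abs(fxy - (-1.0)) < 1e-3 and abs(fyx - 1.0) < 1e-3
```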
Differentiating Functions of Several Variables 219 Exercises 4.1 Question 1 Let f : R3 → R be the function defined as f(x, y, z) = xz ey + 1 . Find ∇f(1, 0,−1), the gradient of f at the point (1, 0,−1). Question 2 Let F : R2 → R3 be the function defined as F(x, y) = ( x2y, xy2, 3x2 + 4y2 ) . Find DF(2,−1), the derivative matrix of F at the point (2,−1). Question 3 Let f : R3 → R be the function defined as f(x, y, z) = x2 + 3xyz + 2y2z3. Find Hf (1,−1, 2), the Hessian matrix of f at the point (1,−1, 2). Question 4 Let f : R2 → R be the function defined as f(x, y) = 3xy x2 + 4y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). Show that f is not continuous at (0, 0), but it has partial derivatives at (0, 0). Question 5 Let f : R2 → R be the function defined as f(x, y) = |x2 + y|. Determine whether fy(1,−1) exists. Chapter 4. Differentiating Functions of Several Variables 220 Question 6 Let f : R2 → R be the function defined as f(x, y) = x2y x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). Show that f is continuous, it has partial derivatives, but the partial derivatives are not continuous. Question 7 Consider the function f : R2 → R defined as f(x, y) = xy(x2 + 9y2) 4x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). Find the Hessian matrix Hf (0, 0) of f at (0, 0). Chapter 4. Differentiating Functions of Several Variables 221 4.2 Differentiability and First Order Approximation Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. As we have seen in the previous section, even if F has partial derivatives at x0, it does not imply that F is continuous at x0. Heuristically, this is because the partial derivatives only consider the change of the function along the n directions defined by the coordinate axes, while continuity of F requires us to consider the change of F along all directions. 
4.2.1 Differentiability

In this section, we will give a suitable definition of differentiability to ensure that we can capture the change of F in all directions. Let us first revisit an alternative perspective of differentiability for a single variable function f : (a, b) → R, which we have discussed in Volume I. If x0 is a point in (a, b), then the function f : (a, b) → R is differentiable at x0 if and only if there is a number c such that lim h→0 (f(x0 + h)− f(x0)− ch)/h = 0. (4.3) In fact, if f is differentiable at x0, then this number c has to equal f′(x0). Now for a function F : O → Rm defined on an open subset O of Rn, to consider the differentiability of F at x0 ∈ O, we should compare F(x0) to F(x0 + h) for all h in a neighbourhood of 0. But then a reasonable substitute for the number c is a linear transformation T : Rn → Rm, so that for each h in a neighbourhood of 0, it gives a vector T(h) in Rm. As h is now a vector in Rn, we cannot divide by h in (4.3). It should be replaced by ∥h∥, the norm of h.

Definition 4.9 Differentiability
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. The function F : O → Rm is differentiable at x0 provided that there exists a linear transformation T : Rn → Rm so that lim h→0 (F(x0 + h)− F(x0)−T(h))/∥h∥ = 0. F : O → Rm is differentiable if it is differentiable at each point of O.
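Definition 4.9 can be tested numerically on a concrete smooth function. For f(x, y) = x2y at x0 = (1, 2), taking T(h) = ⟨∇f(x0),h⟩ with ∇f(x0) = (4, 1), the error quotient shrinks with ∥h∥. The function and point below are arbitrary illustrative choices, not from the text.

```python
import math
import random

# A smooth function: for f(x,y) = x^2 y, grad f = (2xy, x^2),
# so grad f(1,2) = (4, 1).
def f(x, y):
    return x*x*y

x0, grad0 = (1.0, 2.0), (4.0, 1.0)

random.seed(3)
for k in [1e-1, 1e-2, 1e-3, 1e-4]:
    # Random direction, shrinking length k = ||h||; the error quotient
    # |f(x0+h) - f(x0) - <grad f(x0), h>| / ||h|| must shrink as well.
    t = random.uniform(0, 2*math.pi)
    h = (k*math.cos(t), k*math.sin(t))
    err = f(x0[0]+h[0], x0[1]+h[1]) - f(*x0) - (grad0[0]*h[0] + grad0[1]*h[1])
    # For this polynomial the remainder is 2ab + 2a^2 + a^2 b with |a|,|b| <= k,
    # so the quotient is at most 5k.
    assert abs(err) / k <= 5*k + 1e-12
```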
Proof Let the components of the function ε(h) = F(x0 + h)− F(x0)−T(h) ∥h∥ be ε1(h), ε2(h), . . . , εm(h). Then for 1 ≤ j ≤ m, εj(h) = Fj(x0 + h)− Fj(x0)− Tj(h) ∥h∥ . The assertion of the proposition follows from the fact that lim h→0 ε(h) = 0 if and only if lim h→0 εj(h) = 0 for all 1 ≤ j ≤ m, while lim h→0 εj(h)= 0 if and only if Fj : O → R is differentiable at x0. Let us look at a simple example of differentiable functions. Chapter 4. Differentiating Functions of Several Variables 223 Example 4.11 Let A be an m× n matrix, and let b be a point in Rm. Define the function F : Rn → Rm by F(x) = Ax+ b. Show that F : Rn → Rm is differentiable. Solution Given x0 and h in Rn, notice that F(x0 + h)− F(x0) = A(x0 + h) + b− Ax0 − b = Ah. (4.4) The map T : Rn → Rm defined as T(h) = Ah is a linear transformation. Eq. (4.4) says that F(x0 + h)− F(x0)−T(h) = 0. Thus, lim h→0 F(x0 + h)− F(x0)−T(h) ∥h∥ = 0. Therefore, F is differentiable at x0. Since the point x0 is arbitrary, the function F : Rn → Rm is differentiable. The next theorem says that differentiability implies continuity. Theorem 4.5 Differentiability Implies Continuity Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If the function F : O → Rm is differentiable at x0, then it is continuous at x0. Proof Since F : O → Rm is differentiable at x0, there exists a linear transformation T : Rn → Rm such that ε(h) = F(x0 + h)− F(x0)−T(h) ∥h∥ h→0−−−−→ 0. Chapter 4. Differentiating Functions of Several Variables 224 By Theorem 2.34, there is a positive constant c such that ∥T(h)∥ ≤ c∥h∥ for all h ∈ Rn. Therefore, ∥F(x0 + h)− F(x0)∥ ≤ ∥T(h)∥+ ∥h∥∥ε(h)∥ ≤ ∥h∥ (c+ ∥ε(h)∥) . This implies that lim h→0 F(x0 + h) = F(x0). Thus, F : O → Rm is continuous at x0. Example 4.12 The function f : R2 → R defined as f(x, y) = xy x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0) in Example 4.6 is not differentiable at (0, 0) since it is not continuous at (0, 0). 
However, we have shown that it has partial derivatives at (0, 0). Let us study the function F : Rn → Rm, F(x) = Ax + b that is defined in Example 4.11. The component functions of F are F1(x1, x2, . . . , xn) = a11x1 + a12x2 + · · ·+ a1nxn + b1, F2(x1, x2, . . . , xn) = a21x1 + a22x2 + · · ·+ a2nxn + b2, ... Fm(x1, x2, . . . , xn) = am1x1 + am2x2 + · · ·+ amnxn + bm. Notice that ∇F1(x) = a1 = (a11, a12, . . . , a1n) , ∇F2(x) = a2 = (a21, a22, . . . , a2n) , ... ∇Fm(x) = am = (am1, am2, . . . , amn) Chapter 4. Differentiating Functions of Several Variables 225 are the row vectors of A. Hence, the derivative matrix of F is a given by DF(x) = ∇F1(x) ∇F2(x) ... ∇Fm(x) = a11 a12 · · · a1n a21 a22 · · · a2n ... ... . . . ... am1 am2 · · · amn , which is the matrix A itself. Observe that DF(x)h = a11h1 + a12h2 + · · ·+ a1nhn a21h1 + a22h2 + · · ·+ a2nhn ... am1h1 + am2h2 + · · ·+ amnhn = ⟨∇F1(x),h⟩ ⟨∇F2(x),h⟩ ... ⟨∇Fm(x),h⟩ . From Example 4.11, we suspect that the linear transformation T : Rn → Rm that appears in the definition of differentiability of a function should be the linear transformation defined by the derivative matrix. In fact, this is the case. Theorem 4.6 Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. The following are equivalent. (a) The function F : O → Rm is differentiable at x0. (b) The function F : O → Rm has partial derivatives at x0, and lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ = 0. (4.5) (c) For each 1 ≤ j ≤ m, the component function Fj : O → R has partial derivatives at x0, and lim h→0 Fj(x0 + h)− Fj(x0)− ⟨∇Fj(x0),h⟩ ∥h∥ = 0. Chapter 4. Differentiating Functions of Several Variables 226 Proof The equivalence of (b) and (c) is Proposition 4.4, the componentwise differentiability. Thus, we are left to prove the equivalence of (a) and (b). First, we prove (b) implies (a). If (b) holds, let T : Rn → Rm be the linear transformation defined by the derivative matrix DF(x0). 
Then (4.5) says that F : O → Rm is differentiable at x0. Conversely, assume that F : O → Rm is differentiable at x0. Then there exists a linear transformation T : Rn → Rm such that lim h→0 F(x0 + h)− F(x0)−T(h) ∥h∥ = 0. (4.6) Let A be a m × n matrix so that T(h) = Ah. For 1 ≤ i ≤ n, eq. (4.6) implies that lim h→0 F(x0 + hei)− F(x0)− A(hei) h = 0. This gives Aei = lim h→0 F(x0 + hei)− F(x0) h . This shows that ∂F ∂xi (x0) exists and ∂F ∂xi (x0) = Aei. Therefore, F : O → Rm has partial derivatives at x0. Since A = [ Ae1 Ae2 · · · Aen ] = [ ∂F ∂x1 (x0) ∂F ∂x2 (x0) · · · ∂F ∂xn (x0) ] = DF(x0), eq. (4.6) says that lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ = 0. This proves (a) implies (b). Chapter 4. Differentiating Functions of Several Variables 227 Corollary 4.7 Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If the partial derivatives of F : O → Rm exist at x0, but lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ ̸= 0, then F is not differentiable at x0. Proof If F is differentiable at x0, Theorem 4.6 says that we must have lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ = 0. By contrapositive, since lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ ̸= 0, we find that F is not differentiable at x0. Example 4.13 Let f : R2 → R be the function defined as f(x, y) = x3 x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). Determine whether f is differentiable at (0, 0). Solution One can show that f is continuous at 0 = (0, 0). Hence, we cannot use continuity to determine whether f is differentiable at x0. Notice that fx(0, 0) = lim h→0 f(h, 0)− f(0, 0) h = lim h→0 h− 0 h = 1, Chapter 4. Differentiating Functions of Several Variables 228 Figure 4.5: The function f(x, y) defined in Example 4.13. fy(0, 0) = lim h→0 f(0, h)− f(0, 0) h = lim h→0 0− 0 h = 0. Therefore, f has partial derivatives at 0, and ∇f(0) = (1, 0). Now we consider the function ε(h) = f(h)− f(0)− ⟨∇f(0),h⟩ ∥h∥ = − h1h 2 2 (h21 + h22) 3/2 . Let {hk} be the sequence with hk = ( 1 k , 1 k ) . 
It converges to 0. Since

ε(hk) = −1/(2√2) for all k ∈ Z+,

the sequence {ε(hk)} does not converge to 0. Hence,

lim_{h→0} (f(h) − f(0) − ⟨∇f(0), h⟩)/∥h∥ ≠ 0.

Therefore, f is not differentiable at (0, 0).

Example 4.13 gives a function which is continuous and has partial derivatives at a point, yet it fails to be differentiable at that point. In the following, we are going to give a sufficient condition for differentiability. We begin with a lemma.

Lemma 4.8
Let x0 be a point in Rn and let f : B(x0, r) → R be a function defined on an open ball centered at x0. Assume that f : B(x0, r) → R has first order partial derivatives. For each h in Rn with ∥h∥ < r, there exist z1, . . . , zn in B(x0, r) such that

f(x0 + h) − f(x0) = Σ_{i=1}^{n} hi (∂f/∂xi)(zi),

and ∥zi − x0∥ < ∥h∥ for all 1 ≤ i ≤ n.

Proof
We will take a zigzag path from x0 to x0 + h, which is a union of paths parallel to the coordinate axes. For 1 ≤ i ≤ n, let

xi = x0 + Σ_{k=1}^{i} hk ek = x0 + h1e1 + · · · + hiei.

Then xi is in B(x0, r). Notice that B(x0, r) is a convex set. Therefore, for any 1 ≤ i ≤ n, the line segment between xi−1 and xi = xi−1 + hiei lies entirely inside B(x0, r). Since f : B(x0, r) → R has a first order partial derivative with respect to xi, the function gi : [0, 1] → R, gi(t) = f(xi−1 + thiei), is differentiable, and

g′i(t) = hi (∂f/∂xi)(xi−1 + thiei).

By the mean value theorem, there exists ci ∈ (0, 1) such that

f(xi) − f(xi−1) = gi(1) − gi(0) = g′i(ci) = hi (∂f/∂xi)(xi−1 + cihiei).

Let

zi = xi−1 + cihiei = x0 + Σ_{k=1}^{i−1} hk ek + cihiei.

Then zi is a point in B(x0, r). Moreover,

f(x0 + h) − f(x0) = Σ_{i=1}^{n} (f(xi) − f(xi−1)) = Σ_{i=1}^{n} hi (∂f/∂xi)(zi).

For 1 ≤ i ≤ n, since ci ∈ (0, 1), we have

∥zi − x0∥ = √(h1² + · · · + hi−1² + ci²hi²) < √(h1² + · · · + hi−1² + hi²) ≤ ∥h∥.

This completes the proof.

Figure 4.6: A zigzag path from x0 to x0 + h.
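The decomposition in Lemma 4.8 can be made concrete numerically. The sketch below is a hypothetical illustration, not part of the text: it takes f(x, y) = x² + y², for which the mean value point on each axis-parallel segment happens to be the segment's midpoint, so z1 and z2 can be written down explicitly; the base point x0 and the increment h are arbitrary choices.

```python
import math

# Zigzag decomposition of Lemma 4.8 for f(x, y) = x^2 + y^2.
# Because f is quadratic in each variable separately, the mean value
# point on each axis-parallel segment is its midpoint, so the z_i can
# be written down explicitly (this choice is specific to this f).

def f(x, y):
    return x * x + y * y

def fx(x, y):
    return 2 * x

def fy(x, y):
    return 2 * y

x0, y0 = 0.3, -0.2        # centre of the ball (arbitrary)
h1, h2 = 0.05, 0.04       # increment h with small norm (arbitrary)

z1 = (x0 + h1 / 2, y0)          # mean value point on the first segment
z2 = (x0 + h1, y0 + h2 / 2)     # mean value point on the second segment

lhs = f(x0 + h1, y0 + h2) - f(x0, y0)
rhs = h1 * fx(*z1) + h2 * fy(*z2)
norm_h = math.hypot(h1, h2)

assert abs(lhs - rhs) < 1e-12            # f(x0+h) - f(x0) = sum h_i f_{x_i}(z_i)
assert math.hypot(z1[0] - x0, z1[1] - y0) < norm_h   # ||z_i - x0|| < ||h||
assert math.hypot(z2[0] - x0, z2[1] - y0) < norm_h
```

For a general f with first order partial derivatives, the zi are only guaranteed to exist by the mean value theorem; they are not usually computable in closed form.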
Theorem 4.9
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If the partial derivatives of F : O → Rm exist and are continuous at x0, then F is differentiable at x0.

Proof
By Proposition 4.4, it suffices to prove the theorem for a function f : O → R with codomain R. Since O is an open set that contains the point x0, there exists r > 0 such that B(x0, r) ⊂ O. By Lemma 4.8, for each h that satisfies 0 < ∥h∥ < r, there exist z1, z2, . . . , zn such that

f(x0 + h) − f(x0) = Σ_{i=1}^{n} hi (∂f/∂xi)(zi),

and ∥zi − x0∥ < ∥h∥ for all 1 ≤ i ≤ n. Therefore,

(f(x0 + h) − f(x0) − ⟨∇f(x0), h⟩)/∥h∥ = Σ_{i=1}^{n} (hi/∥h∥) ((∂f/∂xi)(zi) − (∂f/∂xi)(x0)).

Fix ε > 0. For 1 ≤ i ≤ n, since fxi : B(x0, r) → R is continuous at x0, there exists 0 < δi ≤ r such that if 0 < ∥z − x0∥ < δi, then

|fxi(z) − fxi(x0)| < ε/n.

Take δ = min{δ1, . . . , δn}. Then δ > 0. If ∥h∥ < δ, then for 1 ≤ i ≤ n, ∥zi − x0∥ < ∥h∥ < δ ≤ δi. Thus,

|fxi(zi) − fxi(x0)| < ε/n.

Since |hi| ≤ ∥h∥ for each i, this implies that

|(f(x0 + h) − f(x0) − ⟨∇f(x0), h⟩)/∥h∥| ≤ Σ_{i=1}^{n} (|hi|/∥h∥) |(∂f/∂xi)(zi) − (∂f/∂xi)(x0)| < ε.

Hence,

lim_{h→0} (f(x0 + h) − f(x0) − ⟨∇f(x0), h⟩)/∥h∥ = 0.

This proves that f is differentiable at x0.

Theorem 4.9 says that a function which has continuous partial derivatives is differentiable. This prompts us to make the following definition.

Definition 4.10 Continuously Differentiable
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O. We say that F : O → Rm is continuously differentiable, or C1, provided that it has partial derivatives that are continuous.

Theorem 4.9 says that a continuously differentiable function is differentiable. Analogously, we define Ck for any k ≥ 1.

Definition 4.11 Ck Functions
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O.
We say that F : O → Rm is k-times continuously differentiable, or Ck, provided that it has all partial derivatives of order k, and each of them is continuous.

Definition 4.12 C∞ Functions
Let O be an open subset of Rn, and let F : O → Rm be a function defined on O. We say that F : O → Rm is infinitely differentiable, or C∞, provided that it is Ck for all positive integers k.

Proposition 4.10
Polynomials and rational functions are infinitely differentiable functions.

Sketch of Proof
A partial derivative of a rational function is still a rational function, which is continuous. Iterating, all higher order partial derivatives exist, are rational functions, and hence are continuous.

Obviously, for any k ∈ Z+, a Ck+1 function is Ck.

Remark 4.8 Higher Order Differentiability
We can define second order differentiability in the following way. We say that a function F : O → R is twice differentiable at a point x0 in O if there is a neighbourhood of x0 on which F has first order partial derivatives, and each of them is differentiable at the point x0. Theorem 4.9 says that a C2 function is twice differentiable. Similarly, we can define higher order differentiability.

4.2.2 First Order Approximations

First we extend the concept of order of approximation to multivariable functions.

Definition 4.13 Order of Approximation
Let O be an open subset of Rn that contains the point x0, and let k be a positive integer. We say that the two functions F : O → Rm and G : O → Rm are kth-order approximations of each other at x0 provided that

lim_{h→0} (F(x0 + h) − G(x0 + h))/∥h∥^k = 0.

Recall that a mapping G : O → Rm is a polynomial mapping of degree at most one if it has the form

G(x) = (a11x1 + a12x2 + · · · + a1nxn + b1, a21x1 + a22x2 + · · · + a2nxn + b2, . . . , am1x1 + am2x2 + · · · + amnxn + bm) = Ax + b,

where A = [aij] and b = (b1, . . . , bm). The mapping G is a linear transformation if and only if b = 0. The following theorem shows that first order approximation is closely related to differentiability. It is a consequence of Theorem 4.6.
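Definition 4.13 with k = 1 can be illustrated numerically. The sketch below is a hypothetical example, not part of the text: for the differentiable function f(x, y) = sin(x)·e^y (an arbitrary choice, as are the base point and step sizes), the degree-one polynomial G(x) = f(x0) + ⟨∇f(x0), x − x0⟩ is a first order approximation of f at x0, so the quotient in the definition shrinks together with ∥h∥.

```python
import math

# Numerical illustration of Definition 4.13 with k = 1: the degree-one
# polynomial G(x) = f(x0) + <grad f(x0), x - x0> is a first order
# approximation of f(x, y) = sin(x) * exp(y) at an arbitrary point x0.

def f(x, y):
    return math.sin(x) * math.exp(y)

x0, y0 = 0.5, 0.2
grad = (math.cos(x0) * math.exp(y0), math.sin(x0) * math.exp(y0))

def ratio(t):
    # |f(x0 + h) - G(x0 + h)| / ||h|| along h = (t, t)
    hx = hy = t
    err = f(x0 + hx, y0 + hy) - (f(x0, y0) + grad[0] * hx + grad[1] * hy)
    return abs(err) / math.hypot(hx, hy)

ratios = [ratio(10.0 ** -k) for k in range(1, 6)]

# The quotient tends to 0 as h -> 0, consistent with f being differentiable.
assert all(r2 < r1 for r1, r2 in zip(ratios, ratios[1:]))
assert ratios[-1] < 1e-4
```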
Chapter 4. Differentiating Functions of Several Variables 234 Theorem 4.11 First Order Approximation Theorem Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. (a) If F : O → Rm is continuous at x0, and there is a polynomial mapping G : O → Rm of degree at most one which is a first order approximation of F : O → Rm at the point x0, then F : O → Rm is differentiable at x0. (b) If F : O → Rm is differentiable at x0, then there is a unique polynomial mapping G : O → Rm of degree at most one which is a first order approximation of F at x0. It is given by G(x) = F(x0) +DF(x0)(x− x0). Proof First we prove (a). Assume that G : O → Rm is a polynomial mapping of degree at most one which is a first order approximation of F : O → Rm at the point x0. There exists an m × n matrix A and a vector b in Rm such that G(x) = Ax+ b. By assumption, lim h→0 F(x0 + h)− A(x0 + h)− b ∥h∥ = 0. (4.7) This implies that lim h→0 (F(x0 + h)− A(x0 + h)− b) = 0, which gives Ax0 + b = lim h→0 F(x0 + h) = F(x0). Substitute back into (4.7), we find that lim h→0 F(x0 + h)− F(x0)− Ah ∥h∥ = 0. Chapter 4. Differentiating Functions of Several Variables 235 Since T(h) = Ah is a linear transformation, this shows that F : O → Rm is differentiable at x0. Next, we prove (b). If F : O → Rm is differentiable at x0, Theorem 4.6 says that lim h→0 F(x0 + h)− F(x0)−DF(x0)h ∥h∥ = 0. This precisely means that the polynomial mapping G : O → Rm, G(x) = F(x0) +DF(x0)(x− x0), is a first order approximation of F : O → Rm at x0. By definition, the polynomial mapping G has degree at most one. The uniqueness of G is also asserted in Theorem 4.6. Remark 4.9 The first order approximation theorem says that if the function F : O → Rm is differentiable at the point u, then there is a unique polynomial mapping G : O → Rm of degree at most one which is a first order approximation of F : O → Rm at the point u. The components of the mapping G : O → Rm are given by Gj(x1, . . . 
, xn) = Fj(u1, . . . , un) + n∑ i=1 ∂Fj ∂xi (u1, . . . , un)(xi − ui). Notice that this is a (generalization) of Taylor polynomial of order 1. Example 4.14 Let F : R3 → R2 be the function defined as F(x, y, z) = (xyz2, x+ 2y + 3z), and let x0 = (1,−1, 1). Find a vector b in R2 and a 2 × 3 matrix A such that lim h→0 F(x0 + h)− Ah− b ∥h∥ = 0. Chapter 4. Differentiating Functions of Several Variables 236 Solution The function F : R3 → R2 is itself a polynomial mapping. Hence, it is differentiable. The derivative matrix is given by DF(x) = [ yz2 xz2 2xyz 1 2 3 ] . By the first order approximation theorem, b = F(x0) = (−1, 2) and A = DF(1,−1, 1) = [ −1 1 −2 1 2 3 ] . Example 4.15 Determine whether the limit lim (x,y)→(0,0) ex+2y − 1− x− 2y√ x2 + y2 exists. Solution Let f(x, y) = ex+2y. Then ∂f ∂x (x, y) = ex+2y, ∂f ∂y (x, y) = 2ex+2y. It follows that f(0, 0) = 1, ∂f ∂x (0, 0) = 1, ∂f ∂y (0, 0) = 2. Since the function g(x, y) = x + 2y is continuous and the exponential function is also continuous, f has continuous first order partial derivatives. Hence, f is differentiable. By first order approximation theorem, lim (x,y)→(0,0) f(x, y)− f(0, 0)− x ∂f ∂x (0, 0)− y ∂f ∂y (0, 0)√ x2 + y2 = 0. Chapter 4. Differentiating Functions of Several Variables 237 Since f(x, y)− f(0, 0)− x ∂f ∂x (0, 0)− y ∂f ∂y (0, 0) = ex+2y − 1− x− 2y, we find that lim (x,y)→(0,0) ex+2y − 1− x− 2y√ x2 + y2 = 0. 4.2.3 Tangent Planes The tangent plane to a graph is closely related to the concept of differentiability and first order approximations. Recall that the graph of a function f : O → R defined on a subset of Rn is the subset of Rn+1 consists of all the points of the form (x, f(x)) where x ∈ O. Definition 4.14 Tangent Planes Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. The graph of f has a tangent plane at x0 if it is differentiable at x0. 
In this case, the tangent plane is the hyperplane of Rn+1 that satisfies the equation

xn+1 = f(x0) + ⟨∇f(x0), x − x0⟩,

where x = (x1, . . . , xn).

The tangent plane is the graph of the polynomial function of degree at most one which is the first order approximation of the function f at the point x0.

Example 4.16
Find the equation of the tangent plane to the graph of the function f : R2 → R, f(x, y) = x² + 4xy + 5y², at the point where (x, y) = (1, −1).

Solution
The function f is a polynomial. Hence, it is a differentiable function with

∇f(x, y) = (2x + 4y, 4x + 10y).

Figure 4.7: The tangent plane to the graph of a function.

From this, we find that ∇f(1, −1) = (−2, −6). Together with f(1, −1) = 2, we find that the equation of the tangent plane to the graph of f at the point where (x, y) = (1, −1) is

z = 2 − 2(x − 1) − 6(y + 1) = −2x − 6y − 2.

4.2.4 Directional Derivatives

As we mentioned before, the partial derivatives measure the rate of change of the function when it varies along the directions of the coordinate axes. To capture the rate of change of a function along other directions, we define the concept of directional derivatives. Notice that a direction in Rn is specified by a unit vector.

Definition 4.15 Directional Derivatives
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. Given a unit vector u in Rn, we say that F has a directional derivative in the direction of u at the point x0 provided that the limit

lim_{h→0} (F(x0 + hu) − F(x0))/h

exists. This limit, denoted as DuF(x0), is called the directional derivative of F in the direction of u at the point x0.

When m = 1, it is customary to denote the directional derivative of f : O → R in the direction of u at the point x0 as Duf(x0).

Remark 4.10
For any nonzero vector v in Rn, we can also define DvF(x0) as

DvF(x0) = lim_{h→0} (F(x0 + hv) − F(x0))/h.
However, we will not call it a directional derivative unless v is a unit vector.

Remark 4.11
From the definition, it is obvious that when u is one of the standard unit vectors e1, . . ., en, the directional derivative in the direction of u is a partial derivative. More precisely,

DeiF(x0) = (∂F/∂xi)(x0), 1 ≤ i ≤ n.

The following is obvious.

Proposition 4.12
Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. Given a nonzero vector v in Rn, DvF(x0) exists if and only if DvFj(x0) exists for all 1 ≤ j ≤ m. Moreover,

DvF(x0) = (DvF1(x0), DvF2(x0), . . . , DvFm(x0)).

Example 4.17
Let f : R2 → R be the function defined as f(x, y) = x²y. Given that v = (v1, v2) is a nonzero vector in R2, find Dvf(3, 2).

Solution
By definition,

Dvf(3, 2) = lim_{h→0} (f(3 + hv1, 2 + hv2) − f(3, 2))/h = g′(0),

where g(h) = f(3 + hv1, 2 + hv2) = (3 + hv1)²(2 + hv2). Since

g′(h) = 2v1(3 + hv1)(2 + hv2) + v2(3 + hv1)²,

we find that

Dvf(3, 2) = g′(0) = 12v1 + 9v2.

Taking v = e1 = (1, 0) and v = e2 = (0, 1) respectively, we find that fx(3, 2) = 12 and fy(3, 2) = 9. For general v = (v1, v2), we notice that Dvf(3, 2) = ⟨∇f(3, 2), v⟩.

Example 4.18
Consider the function f : R2 → R defined as

f(x, y) = xy/(x² + y²), if (x, y) ≠ (0, 0); 0, if (x, y) = (0, 0),

in Example 4.6. Find all the nonzero vectors v for which Dvf(0, 0) exists.

Solution
Given a nonzero vector v = (v1, v2), v1² + v2² ≠ 0. By definition,

Dvf(0, 0) = lim_{h→0} (f(hv1, hv2) − f(0, 0))/h = lim_{h→0} (1/h) · v1v2/(v1² + v2²).

This limit exists if and only if v1v2 = 0, which is the case if v1 = 0 or v2 = 0.

Figure 4.8: The function f(x, y) in Example 4.18.

Example 4.19
Let f : R2 → R be the function defined as

f(x, y) = y√(x² + y²)/|x|, if x ≠ 0; 0, if x = 0.

Find all the nonzero vectors v for which Dvf(0, 0) exists.

Figure 4.9: The function f(x, y) in Example 4.19.

Chapter 4.
Differentiating Functions of Several Variables 242 Solution Given a nonzero vector v = (v1, v2), we consider two cases. Case I: v1 = 0. Then v = (0, v2). In this case, Dvf(0, 0) = lim h→0 f(0, hv2)− f(0, 0) h = lim h→0 0− 0 h = 0. Case 2: v1 ̸= 0. Dvf(0, 0) = lim h→0 f(hv1, hv2)− f(0, 0) h = lim h→0 1 h hv2 |hv1| √ h2(v21 + v22) = v2 √ v21 + v22 |v1| . We conclude that Dvf(0, 0) exists for all nonzero vectors v. Remark 4.12 For the function considered in Example 4.19, by taking v to be (1, 0) and (0, 1) respectively, we find that fx(0, 0) = 0 and fy(0, 0) = 0. Notice that lim h→0 f(h)− f(0)− ⟨∇f(0),h⟩ ∥h∥ = lim h→0 h2 |h1| . This limit does not exist. By Corollary 4.7, f is not differentiable at (0, 0). This gives an example of a function which is not differentiable at (0, 0) but has directional derivatives at (0, 0) in all directions. In fact, one can show that f is not continuous at (0, 0). The following theorem says that differentiability of a function implies existence of directional derivatives. Chapter 4. Differentiating Functions of Several Variables 243 Theorem 4.13 Let O be an open subset of Rn that contains the point x0, and let F : O → Rm be a function defined on O. If F is differentiable at x0, then for any nonzero vector v, DvF(x0) exists and DvF(x0) = DF(x0)v = ⟨∇F1(x0),v⟩ ⟨∇F2(x0),v⟩ ... ⟨∇Fm(x0),v⟩ . Proof Again, it is sufficient to consider a function f : O → R with codomain R. By definition, Dvf(x0) is given by the limit lim h→0 f(x0 + hv)− f(x0) h if it exists. Since f is differentiable at x0, it has partial derivatives at x0 and lim h→0 f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ ∥h∥ = 0. As h→ 0, hv → 0. By limit law for composite functions, we find that lim h→0 f(x0 + hv)− f(x0)− ⟨∇f(x0), hv⟩ |h|∥v∥ = 0. This implies that lim h→0 f(x0 + hv)− f(x0)− h⟨∇f(x0),v⟩ h = 0. Thus, Dvf(x0) = lim h→0 f(x0 + hv)− f(x0) h = ⟨∇f(x0),v⟩. Chapter 4. 
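The formula DvF(x0) = DF(x0)v of Theorem 4.13 can be checked with a finite-difference quotient. The sketch below is an illustration, not part of the text; it reuses f(x, y) = x²y at the point (3, 2) from Example 4.17, where Dvf(3, 2) = 12v1 + 9v2 was computed by hand, and the step size h is an arbitrary choice.

```python
# Finite-difference check of Theorem 4.13 for f(x, y) = x^2 * y at
# x0 = (3, 2), using the (non-unit) vector v = (-1, 2); Example 4.17
# computed D_v f(3, 2) = 12*v1 + 9*v2 directly from the definition.

def f(x, y):
    return x * x * y

x0, y0 = 3.0, 2.0
v1, v2 = -1.0, 2.0

grad = (2 * x0 * y0, x0 * x0)            # grad f = (2xy, x^2) = (12, 9)
predicted = grad[0] * v1 + grad[1] * v2  # <grad f(x0), v>

h = 1e-6
difference_quotient = (f(x0 + h * v1, y0 + h * v2) - f(x0, y0)) / h

assert predicted == 12 * v1 + 9 * v2     # agrees with Example 4.17
assert abs(difference_quotient - predicted) < 1e-4
```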
Differentiating Functions of Several Variables 244 Example 4.20 Consider the function F : R2 → R2 defined as F(x, y) = (x2y, xy2). Find DvF(2, 3) when v = (−1, 2). Solution Since F is a polynomial mapping, it is differentiable. The derivative matrix is DF(x, y) = [ 2xy x2 y2 2xy ] . Therefore, DvF(2, 3) = DF(2, 3) [ −1 2 ] = [ 12 4 9 12 ][ −1 2 ] = [ −4 15 ] . Theorem 4.13 can be used to determine the direction which a differentiable function increase fastest at a point. Corollary 4.14 Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. If f is differentiable at x0 and ∇f(x0) ̸= 0, then at the point x0, the function f increases fastest in the direction of ∇f(x0). Proof Let u be a unit vector. Then the rate of change of the function f at the point x0 in the direction of u is given by Duf(x0) = ⟨∇f(x0),u⟩. By Cauchy-Schwarz inequality, ⟨∇f(x0),u⟩ ≤ ∥∇f(x0)∥∥u∥ = ∥∇f(x0)∥, and the equality holds if and only if u has the same direction as ∇f(x0). Chapter 4. Differentiating Functions of Several Variables 245 Exercises 4.2 Question 1 Let f : R3 → R be the function defined as f(x, y, z) = xey 2+4z. Find a vector c in R3 and a constant b such that lim h→0 f(x0 + h)− ⟨c,h⟩ − b ∥h∥ = 0, where x0 = (3, 2,−1). Question 2 Let F : R2 → R3 be the function defined as F(x, y) = (x2 + 4y2, 7xy, 2x+ y). Find a polynomial mapping G : R2 → R3 of degree at most one which is a first order approximation of F : R2 → R3 at the point (1,−1). Question 3 Let x0 = (1, 2, 0,−1), and let F : R4 → R3 be the function defined as F(x1, x2, x3, x4) = ( x2x 2 3, x3x 3 4 + x2, x4 + 2x1 + 1 ) . Find a 3× 4 matrix A and a vector b in R3 such that lim x→x0 F(x)− Ax− b ∥x− x0∥ = 0. Chapter 4. Differentiating Functions of Several Variables 246 Question 4 Let f : R2 → R be the function defined as f(x, y) = sin(x2 + y) + 5xy2. Find Dvf(1,−1) for any nonzero vector v = (v1, v2). 
Question 5 Let f : R2 → R be the function defined as f(x, y) = x2y2 x2 + y2 , if (x, y) ̸= (0, 0) 0, if (x, y) = (0, 0). Show that f : R2 → R is continuously differentiable. Question 6 Find the equation of the tangent plane to the graph of the functionf : R2 → R, f(x, y) = 4x2 + 3xy − y2 at the point where (x, y) = (2,−1). Question 7 Let f : R2 → R be the function defined as f(x, y) = x2y x2 + y2 , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). (a) Show that f : R2 → R is continuous. (b) Show that f : R2 → R has partial derivatives. (c) Show that f : R2 \ {(0, 0)} → R is differentiable. (d) Show that f : R2 → R is not differentiable at (0, 0). (e) Find all the nonzero vectors v = (v1, v2) for which Dvf(0, 0) exists. Chapter 4. Differentiating Functions of Several Variables 247 Question 8 Let f : R2 → R be the function defined as f(x, y) = |x| √ x2 + y2 y , if y ̸= 0, 0, if y = 0. (a) Show that f : R2 → R is not continuous at (0, 0). (b) Show that Dvf(0, 0) exists for all nonzero vectors v. Question 9 Let f : R2 → R be the function defined as f(x, y) = (x2 + y2) sin ( 1√ x2 + y2 ) , if (x, y) ̸= (0, 0), 0, if (x, y) = (0, 0). (a) Show that f : R2 → R is differentiable at (0, 0). (b) Show that f : R2 → R is not continuously differentiable at (0, 0). Chapter 4. Differentiating Functions of Several Variables 248 4.3 The Chain Rule and the Mean Value Theorem In volume I, we have seen that the chain rule plays an important role in calculating the derivative of a composite function. Given that f : (a, b) → R and g : (c, d) → R are functions such that f((a, b)) ⊂ (c, d), the chain rule says that if f is differentiable at x0, g is differentiable at y0 = f(x0), then the composite function (g ◦ f) : (a, b) → R is differentiable at x0, and (g ◦ f)′(x0) = g′(f(x0))f ′(x0). For multivariable functions, the chain rule takes the following form. Theorem 4.15 The Chain Rule Let O be an open subset of Rn, and let U be an open subset of Rk. 
Assume that F : O → Rk and G : U → Rm are functions such that F(O) ⊂ U . If F is differentiable at x0, G is differentiable at y0 = F(x0), then the composite function H = (G ◦ F) : O → Rm is differentiable at x0 and DH(x0) = D(G ◦ F)(x0) = DG(F(x0))DF(x0). Notice that on the right hand side, DG(F(x0)) is an m× k matrix, DF(x0) is an k × n matrix. Hence, the product DG(F(x0))DF(x0) makes sense, and it is an m× n matrix, which is the correct size for the derivative matrix DH(x0). Let us spell out more explicitly. Assume that F(x1, x2, . . . , xn) = (F1(x1, x2, . . . , xn), F2(x1, x2, . . . , xn), . . . , Fk(x1, x2, . . . , xn)), G(y1, y2, . . . , yk) = (G1(y1, y2, . . . , yk), G2(y1, y2, . . . , yk), . . . , Gm(y1, y2, . . . , yk)), H(x1, x2, . . . , xn) = (H1(x1, x2, . . . , xn), H2(x1, x2, . . . , xn), . . . , Hm(x1, x2, . . . , xn)). Then for 1 ≤ j ≤ m, Hj(x1, x2, . . . , xn) = Gj (F1(x1, x2, . . . , xn), F2(x1, x2, . . . , xn), . . . , Fk(x1, x2, . . . , xn)) . Chapter 4. Differentiating Functions of Several Variables 249 For 1 ≤ l ≤ k, let yl = Fl (x1, x2, . . . , xn) . The chain rule says that if 1 ≤ q ≤ n, ∂Hj ∂xq (x1, x2, . . . , xn) = k∑ l=1 ∂Gj ∂yl (y1, y2, . . . , yk) ∂Fl ∂xq (x1, x2, . . . , xn) = ∂Gj ∂y1 (y1, y2, . . . , yk) ∂F1 ∂xq (x1, x2, . . . , xn) + ∂Gj ∂y2 (y1, y2, . . . , yk) ∂F2 ∂xq (x1, x2, . . . , xn) ... + ∂Gj ∂yk (y1, y2, . . . , yk) ∂Fk ∂xq (x1, x2, . . . , xn). Namely, to differentiate Hj = Gj ◦ F with respect to xq, we differentiate Gj with respect to each of the variables y1, . . . , yk, multiply each by the partial derivatives of F1, . . . , Fk with respect to xq, then take the sum. Let us illustrate this with a simple example. Example 4.21 Consider the function h : R2 → R defined as h(x, y) = sin(2x+ 3y) + exy. It is straightforward to find that ∂h ∂x = 2 cos(2x+ 3y) + yexy, ∂h ∂y = 3 cos(2x+ 3y) + xexy. 
Notice that we can write h = g ◦ F, where F : R2 → R2 is the function F(x, y) = (2x+ 3y, xy), and g : R2 → R is the function g(u, v) = sinu+ ev. Chapter 4. Differentiating Functions of Several Variables 250 Obviously, F and g are continuously differentiable functions. DF(x, y) = [ 2 3 y x ] , Dg(u, v) = [ cosu ev ] . Taking u = 2x+ 3y and v = xy, we find that Dg(u, v)DF(x, y) = [ cos(2x+ 3y) exy ] [2 3 y x ] = [ 2 cos(2x+ 3y) + yexy 3 cos(2x+ 3y) + xexy ] = Dh(x, y). Now let us prove the chain rule. Proof of the Chain Rule Since F is differentiable at x0 and G is differentiable at y0 = F(x0), DF(x0) and DG(y0) exist. There exists positive numbers r1 and r2 such that B(x0, r1) ⊂ O and B(y0, r2) ⊂ U . Let ε1(h) = F(x0 + h)− F(x0)−DF(x0)h ∥h∥ , h ∈ B(0, r1), ε2(v) = G(y0 + v)−G(y0)−DG(y0)v ∥v∥ , v ∈ B(0, r2). Since F is differentiable at x0 and G is differentiable at y0, lim h→0 ε1(h) = 0, lim v→0 ε2(v) = 0. There exist positive constants c1 and c2 such that ∥DF(x0)h∥ ≤ c1∥h∥ for all h ∈ Rn, ∥DG(y0)v∥ ≤ c2∥v∥ for all v ∈ Rk. Now since F is differentiable at x0, it is continuous at x0. Hence, there exists a positive number r such that r ≤ r1 and F(B(x0, r)) ⊂ B(y0, r2). Chapter 4. Differentiating Functions of Several Variables 251 For h ∈ B(0, r), let v = F(x0 + h)− F(x0). Then v ∈ B(0, r2) and v = DF(x0)h+ ∥h∥ε1(h). It follows that ∥v∥ ≤ ∥DF(x0)h∥+ ∥h∥∥ε1(h)∥ ≤ ∥h∥ (c1 + ∥ε1(h)∥) . In particular, we find that when h → 0, v → 0. Now, H(x0 + h)−H(x0) = G(F(x0 + h))−G(F(x0)) = G(y0 + v)−G(y0) = DG(y0)v + ∥v∥ε2(v) = DG(y0)DF(x0)h+ ∥h∥DG(y0)ε1(h) + ∥v∥ε2(v). Therefore, for h ∈ B(0, r) \ {0}, H(x0 + h)−H(x0)−DG(y0)DF(x0)h ∥h∥ = DG(y0)ε1(h) + ∥v∥ ∥h∥ ε2(v). This implies that∥∥∥∥H(x0 + h)−H(x0)−DG(y0)DF(x0)h ∥h∥ ∥∥∥∥ ≤ ∥DG(y0)ε1(h)∥+ ∥v∥ ∥h∥ ∥ε2(v)∥ ≤ c2∥ε1(h)∥+ (c1 + ∥ε1(h)∥) ∥ε2(v)∥. Since v → 0 when h → 0, we find that ε2(v) → 0 when h → 0. Thus, we find that lim h→0 H(x0 + h)−H(x0)−DG(y0)DF(x0)h ∥h∥ = 0. Chapter 4. 
Differentiating Functions of Several Variables 252 This concludes that H is differentiable at x0 and DH(x0) = DG(y0)DF(x0). Example 4.22 Let F : R3 → R2 be the function defined as F(x, y, z) = (x2 + 4y2 + 9z2, xyz). Find a vector b in R2 and a 2× 3 matrix A such that lim (u,v,w)→(1,−1,0) F(2u+ v, v + w, u+ w)− b− Ap√ (u− 1)2 + (v + 1)2 + w2 = 0, where p = uv w . Solution Let p0 = (1,−1, 0), and let G : R3 → R3 be the mapping G(u, v, w) = (2u+ v, v + w, u+ w). Then H(p) = H(u, v, w) = F(2u+ v, v + w, u+ w) = (F ◦G)(u, v, w). Notice that F and G are polynomial mappings. Hence, they are infinitely differentiable. To have lim p→p0 H(v)− b− Ap ∥p− p0∥ = lim (u,v,w)→(1,−1,0) F(2u+ v, v + w, u+ w)− b− Ap√ (u− 1)2 + (v + 1)2 + w2 = 0, the first order approximation theorem says that b+ Ap = H(p0) +DH(p0) (p− p0) . Therefore, A = DH(p0) and b = H(p0)− Ap0. Chapter 4. Differentiating Functions of Several Variables 253 Notice that G(p0) = G(1,−1, 0) = (1,−1, 1), H(p0) = H(1,−1, 0) = F(1,−1, 1) = (14,−1), DG(u, v, w) = 2 1 0 0 1 1 1 0 1 , DF(x, y, z) = [ 2x 8y 18z yz xz xy ] . By chain rule, A = DF(1,−1, 1)DG(1,−1, 0) = [ 2 −8 18 −1 1 −1 ]2 1 0 0 1 1 1 0 1 = [ 22 −6 10 −3 0 0 ] . It follows that b = [ 14 −1 ] − [ 22 −6 10 −3 0 0 ] 1 −1 0 = [ −14 2 ] . Example 4.23 Let α be a positive number, and let f : Rn → R be the function defined as f(x) = ∥x∥α. Find the values of α so that f is differentiable. Solution Let g : Rn → R be the function g(x) = ∥x∥2 = x21 + x22 + · · ·+ x2n. Then g(Rn) = [0,∞), and g(x) = 0 if and only if x = 0. Chapter 4. Differentiating Functions of Several Variables 254 Since g is a polynomial, it is infinitely differentiable. Let h : [0,∞) → R be the function h(u) = uα/2. Then h is differentiable on (0,∞). Since f(x) = (h ◦ g)(x), chain rule implies that for all x0 ∈ Rn \ {0}, f is differentiable at x0. Now consider the point x = 0. Notice that for 1 ≤ i ≤ n, fxi (0) exists provided that the limit lim h→0 f(hei)− f(0) h = lim h→0 |h|α h exists. 
This is the case if and only if α > 1. Therefore, f is not differentiable at x = 0 if α ≤ 1. If α > 1, we find that fxi(0) = 0 for all 1 ≤ i ≤ n. Hence, ∇f(0) = 0. Since

lim_{h→0} (f(h) − f(0) − ⟨∇f(0), h⟩)/∥h∥ = lim_{h→0} ∥h∥^{α−1} = 0,

we conclude that when α > 1, f is differentiable at x = 0. Therefore, f is differentiable if and only if α > 1.

Example 4.24
Let f : R2 → R be a twice continuously differentiable function, and let g : R2 → R be the function defined as g(r, θ) = f(r cos θ, r sin θ). Show that

∂²g/∂r² + (1/r) ∂g/∂r + (1/r²) ∂²g/∂θ² = ∂²f/∂x² + ∂²f/∂y².

Solution
Let H : R2 → R2 be the mapping defined by H(r, θ) = (r cos θ, r sin θ). Then H is infinitely differentiable, and g = f ◦ H. Let x = H1(r, θ) = r cos θ and y = H2(r, θ) = r sin θ. By the chain rule,

∂g/∂r = (∂f/∂x)(∂x/∂r) + (∂f/∂y)(∂y/∂r) = cos θ ∂f/∂x + sin θ ∂f/∂y,

∂g/∂θ = (∂f/∂x)(∂x/∂θ) + (∂f/∂y)(∂y/∂θ) = −r sin θ ∂f/∂x + r cos θ ∂f/∂y.

Using the product rule and the chain rule, we then have

∂²g/∂r² = cos θ ((∂²f/∂x²)(∂x/∂r) + (∂²f/∂y∂x)(∂y/∂r)) + sin θ ((∂²f/∂x∂y)(∂x/∂r) + (∂²f/∂y²)(∂y/∂r)).

Since f has continuous second order partial derivatives, fxy = fyx. Therefore,

∂²g/∂r² = cos²θ ∂²f/∂x² + 2 sin θ cos θ ∂²f/∂x∂y + sin²θ ∂²f/∂y².

Similarly, we have

∂²g/∂θ² = −r sin θ ((∂²f/∂x²)(∂x/∂θ) + (∂²f/∂y∂x)(∂y/∂θ)) + r cos θ ((∂²f/∂x∂y)(∂x/∂θ) + (∂²f/∂y²)(∂y/∂θ)) − r cos θ ∂f/∂x − r sin θ ∂f/∂y
= r² sin²θ ∂²f/∂x² − 2r² sin θ cos θ ∂²f/∂x∂y + r² cos²θ ∂²f/∂y² − r ∂g/∂r.

From these, we obtain

∂²g/∂r² + (1/r) ∂g/∂r + (1/r²) ∂²g/∂θ² = ∂²f/∂x² + ∂²f/∂y².

Example 4.24 gives the Laplacian ∆f = ∂²f/∂x² + ∂²f/∂y² of f in polar coordinates. It is customary to abuse notation and write g = f, so that the formula takes the form

∂²f/∂x² + ∂²f/∂y² = ∂²f/∂r² + (1/r) ∂f/∂r + (1/r²) ∂²f/∂θ².

Remark 4.13
We can use the chain rule to prove Theorem 4.13.
Given that O is an open subset of Rn that contains the point x0, and F : O → Rm is a function that is differentiable at x0, we want to show that DvF(x0) exists for any nonzero vector v, and DvF(x0) = DF(x0)v. Since O is an open set that contains the point x0, there is an r > 0 such that B(x0, r) ⊂ O. By definition, DvF(x0) = lim h→0 F(x0 + hv)− F(x0) h = g′(0), where g : (−r, r) → Rm is the function g(h) = F(x0 + hv). Let γ : (−r, r) → Rn be the function defined as γ(h) = x0 + hv. Then γ is a differentiable function with γ ′(h) = v. Since g = F ◦ γ, and γ(0) = x0, the chain rule implies that g is differentiable at h = 0 and g′(0) = DF(x0)γ ′(0) = DF(x0)v. This completes the proof. Definition 4.16 Tangent Line to a Curve A curve in Rn is a continuous function γ : [a, b] → Rn. Let c0 be a point in (a, b). If the curve γ is differentiable at c0, the tangent vector to the curve γ at the point γ(c0) is the vector γ ′(c0) in Rn, while the tangent line to the curve γ at the point γ(c0) is the line in Rn given by x : R → Rn, x(t) = γ(c0) + tγ ′(c0). Remark 4.14 Tangent Lines and Tangent Planes Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function that is differentiable at x0. We have seen that the tangent plane to the graph of f at the point (x0, f(x0)) has equation Chapter 4. Differentiating Functions of Several Variables 257 xn+1 = f(x0) + ⟨∇f(x0),x− x0⟩. Now assume that r > 0 and γ : (−r, r) → Rn+1 is a differentiable curve in Rn+1 that lies on the graph of f , and γ(0) = (x0, f(x0)). For all t ∈ (−r, r), γn+1(t) = f(γ1(t), . . . , γn(t)). By chain rule, we find that γ′n+1(0) = ⟨∇f(x0),v⟩, where v = (γ′1(0), . . . , γ ′ n(0)). The vector w = (v, γ′n+1(0)) is the tangent vector to the curve γ at the point (x0, f(x0)). The equation of the tangent line is (x1(t), . . . , xn(t), xn+1(t)) = (x0, f(x0)) + t(γ′1(0), . . . , γ ′ n(0), γ ′ n+1(0)). Thus, we find that (x1(t), . . . , xn(t)) = x(t) = x0 + tv, and xn+1(t) = f(x0) + tγ′n+1(0). 
These imply that

xn+1(t) = f(x0) + t⟨∇f(x0), v⟩ = f(x0) + ⟨∇f(x0), x(t) − x0⟩.

Thus, the tangent line to the curve γ lies in the tangent plane. In fact, the tangent plane to the graph of a function f at a point can be characterized as the unique plane that contains all the tangent lines to the differentiable curves that lie on the graph and pass through that point.

Now we turn to the mean value theorem. For a single variable function, the mean value theorem says that given that f : I → R is a differentiable function defined on the open interval I, if x0 and x0 + h are two points in I, there exists c ∈ (0, 1) such that

f(x0 + h) − f(x0) = hf′(x0 + ch).

Notice that the point x0 + ch is a point strictly in between x0 and x0 + h. To generalize this theorem to multivariable functions, one natural question to ask is the following. If F : O → Rm is a differentiable function defined on the open subset O of Rn, and x0 and x0 + h are points in O such that the line segment between them lies entirely in O, does there exist a constant c ∈ (0, 1) such that

F(x0 + h) − F(x0) = DF(x0 + ch)h?

When m ≥ 2, the answer is no in general. Let us look at the following example.

Example 4.25
Consider the function F : R2 → R2 defined as F(x, y) = (x²y, xy). Show that there does not exist a constant c ∈ (0, 1) such that

F(x0 + h) − F(x0) = DF(x0 + ch)h,

when x0 = (0, 0) and h = (1, 1).

Solution
Notice that

DF(x, y) = [ 2xy  x²
             y    x ].

When x0 = (0, 0) and h = (1, 1), x0 + ch = (c, c). If there exists a constant c ∈ (0, 1) such that

F(x0 + h) − F(x0) = DF(x0 + ch)h,

then

(1, 1) = (2c² + c², c + c) = (3c², 2c).

This gives 3c² = 1 and 2c = 1. But 2c = 1 gives c = 1/2, and when c = 1/2, 3c² = 3/4 ≠ 1. Hence, no such c can exist.

However, when m = 1, we indeed have a mean value theorem.
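The inconsistency in Example 4.25 can also be confirmed numerically. The sketch below is an illustration, not part of the text (the grid resolution is an arbitrary choice): the candidate identity reads (1, 1) = (3c², 2c), and no single c ∈ (0, 1) satisfies both components.

```python
# Numerical companion to Example 4.25: F(x, y) = (x^2 * y, x * y),
# x0 = (0, 0), h = (1, 1).  The candidate mean value identity
# F(x0 + h) - F(x0) = DF(ch) h reads (1, 1) = (3c^2, 2c), and the
# two components pin down incompatible values of c.

c_first = (1 / 3) ** 0.5   # 3c^2 = 1  =>  c = 1/sqrt(3) ~ 0.577
c_second = 0.5             # 2c = 1    =>  c = 1/2
assert abs(c_first - c_second) > 0.07   # no single c satisfies both

# Brute-force confirmation on a fine grid of c in (0, 1): the larger of
# the two component errors |3c^2 - 1| and |2c - 1| never gets close to 0.
worst = min(
    max(abs(3 * c * c - 1), abs(2 * c - 1))
    for c in (k / 10000 for k in range(1, 10000))
)
assert worst > 0.01
```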
Theorem 4.16 The Mean Value Theorem Let O be an open subset of Rn, and let x0 and x0 + h be two points in O such that the line segment between them lies entirely in O. If f : O → R is a differentiable function, there exists a constant c ∈ (0, 1) such that f(x0 + h)− f(x0) = ⟨∇f(x0 + ch),h⟩ = n∑ i=1 hi ∂f ∂xi (x0 + ch). Proof Define the function γ : [0, 1] → Rn by γ(t) = x0 + th. Then γ is a differentiable function with γ′(t) = h. Let g = (f ◦ γ) : [0, 1] → R. Then g(t) = (f ◦ γ)(t) = f(x0 + th). Since f and γ are differentiable, the chain rule implies that g is also differentiable and g′(t) = ⟨∇f(x0 + th), γ′(t)⟩ = ⟨∇f(x0 + th),h⟩. By the mean value theorem for single variable functions, there exists c ∈ (0, 1) such that g(1)− g(0) = g′(c). In other words, there exists c ∈ (0, 1) such that f(x0 + h)− f(x0) = ⟨∇f(x0 + ch),h⟩. This completes the proof. As in the single variable case, the mean value theorem has the following application. Corollary 4.17 Let O be an open connected subset of Rn, and let f : O → R be a function defined on O. If f is differentiable and ∇f(x) = 0 for all x ∈ O, then f is a constant function. Proof If u and v are two points in O such that the line segment between them lies entirely in O, then the mean value theorem implies that f(u) = f(v). Since O is an open connected subset of Rn, Theorem 3.16 says that any two points u and v in O can be joined by a polygonal path in O. In other words, there are points x0,x1, . . . ,xk in O such that x0 = u, xk = v, and for 1 ≤ i ≤ k, the line segment between xi−1 and xi lies entirely in O. Therefore, f(xi−1) = f(xi) for all 1 ≤ i ≤ k. This proves that f(u) = f(v). Hence, f is a constant function. Exercises 4.3 Question 1 Let F : R2 → R3 be the function defined as F(x, y) = (x2 + y2, xy, x+ y).
Find a vector b in R3 and a 3× 2 matrix A such that lim (u,v)→(1,−1) F(5u+ 3v, u− 2v)− b− Aw√ (u− 1)2 + (v + 1)2 = 0, where w = [ u v ] . Question 2 Let ϕ : R → R and ψ : R → R be functions that have continuous second order derivatives, and let c be a constant. Define the function f : R2 → R by f(t, x) = ϕ(x+ ct) + ψ(x− ct). Show that ∂2f ∂t2 − c2 ∂2f ∂x2= 0. Question 3 Let α be a constant, and let f : Rn \ {0} → R be the function defined by f(x) = ∥x∥α. Find the value(s) of α such that ∆f(x) = n∑ i=1 ∂2f ∂x2i (x) = ∂2f ∂x21 (x) + ∂2f ∂x22 (x) + · · ·+ ∂2f ∂x2n (x) = 0. Chapter 4. Differentiating Functions of Several Variables 262 Question 4 Let f : R2 → R be a function such that f(0, 0) = 2 and ∂f ∂x (x, y) = 11 and ∂f ∂y = −7 for all (x, y) ∈ R2. Show that f(x, y) = 2 + 11x− 7y for all (x, y) ∈ R2. Question 5 Let O be an open subset of R2, and let u : O → R and v : O → R be twice continuously differentiable functions. Define the function F : O → R2 by F(x, y) = (u(x, y), v(x, y)). Let U be an open subset of R2 that contains F(O), and let f : U → R be a twice continuously differentiable function. Define the function g : O → R by g(x, y) = (f ◦ F)(x, y) = f(u(x, y), v(x, y)). Find gxx, gxy and gyy in terms of the first and second order partial derivatives of u, v and f . Chapter 4. Differentiating Functions of Several Variables 263 4.4 Second Order Approximations In this section, we turn to consider second order approximations. We only consider a function f : O → R defined on an open subset O of Rn and whose codomain is R. The function is said to be twice differentiable if it has first order partial derivatives, and each fxi : O → R, 1 ≤ i ≤ n, is a differentiable function. Notice that a twice differentiable function has continuous first order partial derivatives. Hence, it is differentiable. The differentiability of each fxi , 1 ≤ i ≤ n also implies that f has second order partial derivatives. 
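Before stating the second order Taylor estimate, it is worth noting that second order partial derivatives can be probed numerically by central differences, and for a twice continuously differentiable function the mixed partials agree, as Clairaut's theorem asserts. The following Python sketch is an illustration added here, not part of the text; the function f and the evaluation point are hypothetical choices.

```python
import math

# Central-difference approximation of second order partial derivatives for
# the sample function f(x, y) = x^2 * y + sin(x*y) (a hypothetical choice).
def f(x, y):
    return x**2 * y + math.sin(x * y)

x0, y0, h = 1.0, 0.5, 1e-4

f_xx = (f(x0 + h, y0) - 2 * f(x0, y0) + f(x0 - h, y0)) / h**2
f_yy = (f(x0, y0 + h) - 2 * f(x0, y0) + f(x0, y0 - h)) / h**2
# four-point cross stencil for the mixed partial f_xy
f_xy = (f(x0 + h, y0 + h) - f(x0 + h, y0 - h)
        - f(x0 - h, y0 + h) + f(x0 - h, y0 - h)) / (4 * h**2)

# Exact values: f_xx = 2y - y^2 sin(xy), f_yy = -x^2 sin(xy),
# f_xy = 2x + cos(xy) - xy sin(xy)
s, c0 = math.sin(x0 * y0), math.cos(x0 * y0)
assert abs(f_xx - (2 * y0 - y0**2 * s)) < 1e-5
assert abs(f_yy - (-(x0**2) * s)) < 1e-5
assert abs(f_xy - (2 * x0 + c0 - x0 * y0 * s)) < 1e-5
```

The same cross stencil evaluated with the roles of x and y exchanged returns the same value, consistent with the symmetry of the Hessian.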
Lemma 4.18 Let O be an open subset of Rn, and let f : O → R be a twice differentiable function defined on O. If x0 and x0 + h are two points in O such that the line segment between them lies entirely in O, then there is a c ∈ (0, 1) such that f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ = 1 2 hTHf (x0 + ch)h = 1 2 n∑ i=1 n∑ j=1 hihj ∂2f ∂xj∂xi (x0 + ch). Proof Since the line segment between x0 and x0 + h is a compact subset of the open set O, there is an ε > 0 such that x0 + th ∈ O for all t in the open interval I = (−ε, 1 + ε). Define the function g : I → R by g(t) = f(x0 + th). Since f : O → R is differentiable, the chain rule implies that g : I → R is differentiable and g′(t) = n∑ i=1 hi ∂f ∂xi (x0 + th) = ⟨∇f(x0 + th),h⟩. Since each fxi : O → R, 1 ≤ i ≤ n, is differentiable, the chain rule again implies that g′ is differentiable and g′′(t) = n∑ i=1 n∑ j=1 hihj ∂2f ∂xj∂xi (x0 + th) = hTHf (x0 + th)h. By Lagrange’s remainder theorem, there is a c ∈ (0, 1) such that g(1)− g(0)− g′(0)(1− 0) = g′′(c) 2 (1− 0)2. This gives f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ = 1 2 n∑ i=1 n∑ j=1 hihj ∂2f ∂xj∂xi (x0 + ch). If a function has continuous second order partial derivatives, then it is twice differentiable, and Clairaut’s theorem implies that its Hessian matrix is symmetric. For such a function, we can prove the second order approximation theorem. Theorem 4.19 Second Order Approximation Theorem Let O be an open subset of Rn that contains the point x0, and let f : O → R be a twice continuously differentiable function defined on O. We have the following. (a) lim h→0 f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ − 1 2 hTHf (x0)h ∥h∥2 = 0. (b) If Q(x) is a polynomial of degree at most two such that lim h→0 f(x0 + h)−Q(x0 + h) ∥h∥2 = 0, then Q(x) = f(x0)+⟨∇f(x0),x−x0⟩+ 1 2 (x−x0) THf (x0)(x−x0). (4.8) Combining (a) and (b), the second order approximation theorem says that for a twice continuously differentiable function, there exists a unique polynomial of degree at most 2 which is a second order approximation of the function. Chapter 4.
Differentiating Functions of Several Variables 265 Proof Let us prove part (a) first. Since O is open, there is an r > 0 such that B(x0, r) ⊂ O. For each h in Rn with ∥h∥ < r, Lemma 4.18 says that there is a ch ∈ (0, 1) such that f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ = 1 2 hTHf (x0 + ch)h. Therefore, if 0 < ∥h∥ < r,∣∣∣∣∣∣∣ f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ − 1 2 hTHf (x0)h ∥h∥2 ∣∣∣∣∣∣∣ = 1 2 ∣∣∣∣∣ n∑ i=1 n∑ j=1 hihj ∥h∥2 ( ∂2f ∂xj∂xi (x0 + chh)− ∂2f ∂xj∂xi (x0) )∣∣∣∣∣ ≤ 1 2 n∑ i=1 n∑ j=1 |hi||hj| ∥h∥2 ∣∣∣∣ ∂2f ∂xj∂xi (x0 + chh)− ∂2f ∂xj∂xi (x0) ∣∣∣∣ ≤ 1 2 n∑ i=1 n∑ j=1 ∣∣∣∣ ∂2f ∂xj∂xi (x0 + chh)− ∂2f ∂xj∂xi (x0) ∣∣∣∣ . Since ch ∈ (0, 1), lim h→0 (x0 + chh) = x0. For all 1 ≤ i ≤ n, 1 ≤ j ≤ n, fxjxi is continuous. Hence, lim h→0 ∂2f ∂xj∂xi (x0 + chh) = ∂2f ∂xj∂xi (x0). This proves that lim h→0 f(x0 + h)− f(x0)− ⟨∇f(x0),h⟩ − 1 2 hTHf (x0)h ∥h∥2 = 0. To prove part (b), let P (x) = f(x0) + ⟨∇f(x0),x− x0⟩+ 1 2 (x− x0) THf (x0)(x− x0). Part (a) says that lim h→0 f(x0 + h)− P (x0 + h) ∥h∥2 = 0. (4.9) Chapter 4. Differentiating Functions of Several Variables 266 Since Q(x) is a polynomial of degree at most two in x, Q(x0 + h) is a polynomial of degree at most two in h. Therefore, we can write Q(x0 +h) as Q(x0 + h) = c+ n∑ i=1 bihi + 1 2 n∑ i=1 aiih 2 i + ∑ 1≤i<j≤n aijhihj. Since lim h→0 f(x0 + h)−Q(x0 + h) ∥h∥2 = 0, subtracting (4.9) gives lim h→0 P (x0 + h)−Q(x0 + h) ∥h∥2 = 0. (4.10) It follows that lim h→0 (P (x0 + h)−Q(x0 + h)) = 0, (4.11) and lim h→0 P (x0 + h)−Q(x0 + h) ∥h∥ = 0. (4.12) Since f has continuous second order partial derivatives, fxjxi (x0) = fxixj (x0). Thus, P (x0 + h)−Q(x0 + h) = (f(x0)− c) + n∑ i=1 hi ( ∂f ∂xi (x0)− bi ) + 1 2 n∑ i=1 h2i ( ∂2f ∂x2i (x0)− aii ) + ∑ 1≤i<j≤n hihj ( ∂2f ∂xj∂xi (x0)− aij ) . Eq. (4.11) implies that c = f(x0). Then eq. (4.12) implies that bi = ∂f ∂xi (x0) for all 1 ≤ i ≤ n. Finally, (4.10) implies that for any 1 ≤ i ≤ j ≤ n, aij = ∂2f ∂xi∂xj (x0). This completes the proof that Q(x) = P (x). Chapter 4. 
Differentiating Functions of Several Variables 267 Example 4.26 Find a polynomial Q(x, y) of degree at most 2 such that lim (x,y)→(1,2) sin(4x2 − y2)−Q(x, y) (x− 1)2 + (y − 2)2 = 0. Solution Since g(x, y) = 4x2 − y2 is a polynomial function, it is infinitely differentiable. Since the sine function is also infinitely differentiable, the function f(x, y) = sin(4x2 − y2) is infinitely differentiable. fx(x, y) = 8x cos(4x2 − y2), fy(x, y) = −2y cos(4x2 − y2), fxx(x, y) = 8 cos(4x2 − y2)− 64x2 sin(4x2 − y2), fxy(x, y) = fyx(x, y) = 16xy sin(4x2 − y2), fyy(x, y) = −2 cos(4x2 − y2)− 4y2 sin(4x2 − y2). Hence, f(1, 2) = 0, fx(1, 2) = 8, fy(1, 2) = −4, fxx(1, 2) = 8, fxy(1, 2) = 0, fyy(1, 2) = −2. By the second order approximation theorem, Q(x, y) = f(1, 2) + fx(1, 2)(x− 1) + fy(1, 2)(y − 2) + 1 2 fxx(1, 2)(x− 1)2 + fxy(1, 2)(x− 1)(y − 2) + 1 2 fyy(1, 2)(y − 2)2 = 8(x− 1)− 4(y − 2) + 4(x− 1)2 − (y − 2)2 = 4x2 − y2. Example 4.27 Determine whether the limit lim (x,y)→(0,0) ex+y − 1− x− y x2 + y2 exists. If yes, find the limit. Solution Since the exponential function and the function g(x, y) = x+ y are infinitely differentiable, the function f(x, y) = ex+y is infinitely differentiable. By the second order approximation theorem, lim (x,y)→(0,0) f(x, y)−Q(x, y) x2 + y2 = 0, where Q(x, y) = f(0, 0) + x ∂f ∂x (0, 0) + y ∂f ∂y (0, 0) + 1 2 x2 ∂2f ∂x2 (0, 0) + xy ∂2f ∂x∂y (0, 0) + 1 2 y2 ∂2f ∂y2 (0, 0). Now ∂f ∂x (x, y) = ∂f ∂y (x, y) = ∂2f ∂x2 (x, y) = ∂2f ∂x∂y (x, y) = ∂2f ∂y2 (x, y) = ex+y. Thus, f(0, 0) = ∂f ∂x (0, 0) = ∂f ∂y (0, 0) = ∂2f ∂x2 (0, 0) = ∂2f ∂x∂y (0, 0) = ∂2f ∂y2 (0, 0) = 1. It follows that Q(x, y) = 1 + x+ y + 1 2 x2 + xy + 1 2 y2. Hence, lim (x,y)→(0,0) ex+y − 1− x− y − 1 2 x2 − xy − 1 2 y2 x2 + y2 = 0. (4.13) If lim (x,y)→(0,0) ex+y − 1− x− y x2 + y2 = a exists, subtracting (4.13) shows that a = lim (x,y)→(0,0) h(x, y), where h(x, y) = 1 2 x2 + xy + 1 2 y2 x2 + y2 . Chapter 4.
Differentiating Functions of Several Variables 269 This implies that if {wk} is a sequence in R2 \ {0} that converges to (0, 0), then the sequence {h(wk)} converges to a. For k ∈ Z+, let uk = ( 1 k , 0 ) , vk = ( 1 k , 1 k ) . Then {uk} and {vk} are sequences in R2 \ {0} that converge to (0, 0). Hence, the sequences {h(uk)} and {h(vk)} both converge to a. Since h(uk) = 1 2 , h(vk) = 1 for all k ∈ Z+, the sequence {h(uk)} converges to 1 2 , while the sequence {h(vk)} converges to 1. This gives a contradiction. Hence, the limit lim (x,y)→(0,0) ex+y − 1− x− y x2 + y2 does not exist. Exercises 4.4 Question 1 Let f : R2 → R be the function f(x, y) = x2y + 4xy2. Find a polynomial Q(x, y) of degree at most 2 such that lim (x,y)→(1,−1) f(x, y)−Q(x, y) (x− 1)2 + (y + 1)2 = 0. Question 2 Determine whether the limit lim (x,y)→(0,0) sin(x+ y)− x− y x2 + y2 exists. If yes, find the limit. Question 3 Determine whether the limit lim (x,y)→(0,0) cos(x+ y)− 1 x2 + y2 exists. If yes, find the limit. 4.5 Local Extrema In this section, we use differential calculus to study local extrema of a function f : O → R that is defined on an open subset O of Rn. The definition of local extrema that we give here is restricted to such functions. Definition 4.17 Local Maximum and Local Minimum Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. 1. The point x0 is called a local maximizer of f provided that there is a δ > 0 such that B(x0, δ) ⊂ O and for all x ∈ B(x0, δ), f(x) ≤ f(x0). The value f(x0) is called a local maximum value of f . 2. The point x0 is called a local minimizer of f provided that there is a δ > 0 such that B(x0, δ) ⊂ O and for all x ∈ B(x0, δ), f(x) ≥ f(x0). The value f(x0) is called a local minimum value of f . 3.
The point x0 is called a local extremizer if it is either a local maximizer or a local minimizer. The value f(x0) is called a local extreme value if it is either a local maximum value or a local minimum value. From the definition, it is obvious that x0 is a local minimizer of the function f : O → R if and only if it is a local maximizer of the function −f : O → R. Example 4.28 (a) For the function f : R2 → R, f(x, y) = x2 + y2, (0, 0) is a local minimizer. (b) For the function g : R2 → R, g(x, y) = −x2 − y2, (0, 0) is a local maximizer. Chapter 4. Differentiating Functions of Several Variables 272 (c) For the function h : R2 → R, h(x, y) = x2 − y2, 0 = (0, 0) is neither a local maximizer nor a local minimizer. For any δ > 0, let r = δ/2. The points u = (r, 0) and v = (0, r) are in B(0, δ), but h(u) = r2 > 0 = h(0), h(v) = −r2 < 0 = h(0). Figure 4.10: The functions f(x, y), g(x, y) and h(x, y) defined in Example 4.28. The following theorem gives a necessary condition for a point to be a local extremum if the function has partial derivatives at that point. Theorem 4.20 Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. If x0 is a local extremizer and f has partial derivatives at x0, then the gradient of f at x0 is the zero vector, namely, ∇f(x0) = 0. Proof Without loss of generality, assume that x0 is a local minimizer. Then there is a δ > 0 such that B(x0, δ) ⊂ O and f(x) ≥ f(x0) for all x ∈ B(x0, δ). (4.14) For 1 ≤ i ≤ n, consider the function gi : (−δ, δ) → R defined by gi(t) = f(x0 + tei). By the definition of partial derivatives, gi is differentiable at t = 0 and Chapter 4. Differentiating Functions of Several Variables 273 g′i(0) = ∂f ∂xi (x0). Eq. (4.14) implies that gi(t) ≥ gi(0) for all t ∈ (−δ, δ). In other words, t = 0 is a local minimizer of the function gi : (−δ, δ) → R. From the theory of single variable analysis, we must have g′i(0) = 0. Hence, fxi (x0) = 0 for all 1 ≤ i ≤ n. 
This proves that ∇f(x0) = 0. Theorem 4.20 prompts us to make the following definition. Definition 4.18 Stationary Points Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. If f has partial derivatives at x0 and ∇f(x0) = 0, we call x0 a stationary point of f . Theorem 4.20 says that if f : O → R has partial derivatives at x0, a necessary condition for x0 to be a local extremizer is that it is a stationary point. Example 4.29 For all three functions f , g and h defined in Example 4.28, the point 0 = (0, 0) is a stationary point. However, 0 is a local minimizer of f , a local maximizer of g, but neither a local maximizer nor a local minimizer of h. The behavior of the function h(x, y) = x2 − y2 in Example 4.28 prompts us to make the following definition. Definition 4.19 Saddle Points Let O be an open subset of Rn that contains the point x0, and let f : O → R be a function defined on O. The point x0 is a saddle point of the function f if it is a stationary point of f , but it is not a local extremizer. In other words, ∇f(x0) = 0, but for any δ > 0, there exist x1 and x2 in B(x0, δ) ∩ O such that f(x1) > f(x0) and f(x2) < f(x0). Example 4.30 (0, 0) is a saddle point of the function h : R2 → R, h(x, y) = x2 − y2. By definition, if x0 is a stationary point of the function f : O → R, then it is either a local maximizer, a local minimizer, or a saddle point. If f : O → R has continuous second order partial derivatives at x0, we can use the second derivative test to partially determine whether x0 is a local maximizer, a local minimizer, or a saddle point. When n = 1, we have seen that a stationary point x0 of a function f is a local minimizer if f ′′(x0) > 0, and a local maximizer if f ′′(x0) < 0. For multivariable functions, it is natural to expect that whether x0 is a local extremizer depends on the definiteness of the Hessian matrix Hf (x0).
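One concrete way to see how the definiteness of Hf (x0) governs the local behavior is to sample the quadratic form hTHf (x0)h over unit directions h. The Python sketch below does this for h(x, y) = x2 − y2 at (0, 0), whose Hessian is the diagonal matrix with entries 2 and −2; it is an illustration added here, not part of the text.

```python
import math

# Probing definiteness by sampling v^T H v over unit directions v, for the
# Hessian of h(x, y) = x^2 - y^2 at (0, 0), which is diag(2, -2).
H = [[2.0, 0.0], [0.0, -2.0]]

def quad_form(H, v):
    return sum(v[i] * H[i][j] * v[j] for i in range(2) for j in range(2))

values = []
for k in range(360):
    theta = math.pi * k / 180
    values.append(quad_form(H, (math.cos(theta), math.sin(theta))))

# Both signs occur, so the matrix is indefinite and (0, 0) is a saddle point.
assert min(values) < 0 < max(values)
```

For the Hessian diag(2, 2) of f(x, y) = x2 + y2 the same sampling produces only positive values, matching positive definiteness.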
In Section 2.1, we have discussed the classification of a symmetric matrix. It is either positive semi-definite, negative semi-definite or indefinite. Among the positive semi-definite ones, there are those that are positive definite. Among the negative semi-definite matrices, there are those which are negative definite. Theorem 4.21 Second Derivative Test Let O be an open subset of Rn, and let f : O → R be a twice continuously differentiable function defined on O. Assume that x0 is a stationary point of f : O → R. (i) If Hf (x0) is positive definite, then x0 is a local minimizer of f . (ii) If Hf (x0) is negative definite, then x0 is a local maximizer of f . (iii) If Hf (x0) is indefinite, then x0 is a saddle point. The cases that are not covered in the second derivative test are the cases where Hf (x0) is positive semi-definite but not positive definite, or Hf (x0) is negative semi-definite but not negative definite. These are the inconclusive cases. Proof of the Second Derivative Test Notice that (i) and (ii) are equivalent since x0 is a local minimizer of f if and only if it is a local maximizer of −f , and H−f = −Hf . A symmetric matrix A is positive definite if and only if −A is negative definite. Thus, we only need to prove (i) and (iii). Since x0 is a stationary point, ∇f(x0) = 0. It follows from the second order approximation theorem that lim h→0 f(x0 + h)− f(x0)− 1 2 hTHf (x0)h ∥h∥2 = 0. (4.15) To prove (i), assume that Hf (x0) is positive definite. By Theorem 2.9, there is a positive number c such that hTHf (x0)h ≥ c∥h∥2 for all h ∈ Rn. Eq. (4.15) implies that there is a δ > 0 such that B(x0, δ) ⊂ O and for all h with 0 < ∥h∥ < δ, ∣∣f(x0 + h)− f(x0)− 1 2 hTHf (x0)h ∣∣ / ∥h∥2 < c 3 . Therefore, ∣∣f(x0 + h)− f(x0)− 1 2 hTHf (x0)h ∣∣ ≤ c 3 ∥h∥2 for all ∥h∥ < δ. This implies that for all h with ∥h∥ < δ, f(x0 + h)− f(x0) ≥ 1 2 hTHf (x0)h− c 3 ∥h∥2 ≥ c 6 ∥h∥2 ≥ 0.
Thus, f(x) ≥ f(x0) for all x ∈ B(x0, δ). This shows that x0 is a local minimizer of f . Now to prove (iii), assume that Hf (x0) is indefinite. Then there exist unit vectors u1 and u2 so that ε1 = uT 1Hf (x0)u1 < 0, ε2 = uT 2Hf (x0)u2 > 0. Let ε = 1 2 min{|ε1|, ε2}. Eq. (4.15) implies that there is a δ0 > 0 such that B(x0, δ0) ⊂ O and for all h with 0 < ∥h∥ < δ0, ∣∣f(x0 + h)− f(x0)− 1 2 hTHf (x0)h ∣∣ < ε∥h∥2. (4.16) For any δ > 0, let r = 1 2 min{δ, δ0}. Then the points x1 = x0 + ru1 and x2 = x0 + ru2 are in the ball B(x0, δ) and the ball B(x0, δ0). Eq. (4.16) implies that for i = 1, 2, −r2ε ≤ f(x0 + rui)− f(x0)− r2 2 uT i Hf (x0)ui < r2ε. Therefore, f(x0 + ru1)− f(x0) < r2 ( 1 2 uT 1Hf (x0)u1 + ε ) = r2 ( 1 2 ε1 + ε ) ≤ 0 since ε ≤ −1 2 ε1; while f(x0 + ru2)− f(x0) > r2 ( 1 2 uT 2Hf (x0)u2 − ε ) = r2 ( 1 2 ε2 − ε ) ≥ 0 since ε ≤ 1 2 ε2. Thus, x1 and x2 are points in B(x0, δ), but f(x1) < f(x0) while f(x2) > f(x0). These show that x0 is a saddle point. A symmetric matrix is positive definite if and only if all its eigenvalues are positive. It is negative definite if and only if all its eigenvalues are negative. It is indefinite if it has at least one positive eigenvalue, and at least one negative eigenvalue. For a diagonal matrix, its eigenvalues are the entries on the diagonal. Let us revisit Example 4.28. Example 4.31 For the functions considered in Example 4.28, we have seen that (0, 0) is a stationary point of each of them. Notice that Hf (0, 0) = [ 2 0 0 2 ] is positive definite, Hg(0, 0) = [ −2 0 0 −2 ] is negative definite, Hh(0, 0) = [ 2 0 0 −2 ] is indefinite. Therefore, (0, 0) is a local minimizer of f , a local maximizer of g, and a saddle point of h.
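For a 2 × 2 symmetric matrix the eigenvalue criterion is easy to carry out explicitly, since the eigenvalues are the roots of a quadratic. The following Python sketch, an illustration added here and not part of the text, classifies the three Hessians of Example 4.31 this way.

```python
import math

# Classifying a 2x2 symmetric matrix [[a, b], [b, d]] by its eigenvalues,
# computed from the closed-form roots of the characteristic polynomial.
def classify(a, b, d):
    tr, det = a + d, a * d - b * b
    s = math.sqrt(tr * tr / 4 - det)   # half the gap between the eigenvalues
    lam1, lam2 = tr / 2 - s, tr / 2 + s
    if lam1 > 0:
        return "positive definite"
    if lam2 < 0:
        return "negative definite"
    if lam1 < 0 < lam2:
        return "indefinite"
    return "semi-definite"

# The three Hessians of Example 4.31:
assert classify(2, 0, 2) == "positive definite"    # Hf(0, 0)
assert classify(-2, 0, -2) == "negative definite"  # Hg(0, 0)
assert classify(2, 0, -2) == "indefinite"          # Hh(0, 0)
```

The quantity under the square root is ((a − d)/2)2 + b2, which is nonnegative for every symmetric input, so the square root is always defined.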
Now let us look at an example which shows that when the Hessian matrix is positive semi-definite but not positive definite, we cannot make any conclusion about the nature of a stationary point. Example 4.32 Consider the functions f : R2 → R and g : R2 → R given respectively by f(x, y) = x2 + y4, g(x, y) = x2 − y4. These are infinitely differentiable functions. It is easy to check that (0, 0) is a stationary point of both of them. Now, Hf (0, 0) = Hg(0, 0) = [ 2 0 0 0 ] is a positive semi-definite matrix. However, (0, 0) is a local minimizer of f , but a saddle point of g. To determine the definiteness of an n × n symmetric matrix by looking at the sign of its eigenvalues is ineffective when n ≥ 3. There is an easier way to determine whether a symmetric matrix is positive definite. Let us first introduce the definition of principal submatrices. Chapter 4. Differentiating Functions of Several Variables 278 Definition 4.20 Principal Submatrices Let A be an n × n matrix. For 1 ≤ k ≤ n, the kth-principal submatrix Mk of A is the k × k matrix consists of the first k rows and first k columns of A. Example 4.33 For the matrix A = 1 2 3 4 5 6 7 8 9 , the first, second and third principal submatrices are M1 = [ 1 ] , M2 = [ 1 2 4 5 ] , M3 = 1 2 3 4 5 6 7 8 9 respectively. Theorem 4.22 Sylvester’s Criterion for Positive Definiteness An n× n symmetric matrix A is positive definite if and only if detMk > 0 for all 1 ≤ k ≤ n, where Mk is its kth principal submatrix. The proof of this theorem is given in Appendix A. Using the fact that a symmetric matrix A is negative definite if and only if −A is positive definite, it is easy to obtain a criterion for a symmetric matrix to be negative definite in terms of the determinants of its principal submatrices. Theorem 4.23 Sylvester’s Criterion for Negative Definiteness An n × n symmetric matrix A is negative definite if and only if (−1)k detMk > 0 for all 1 ≤ k ≤ n, where Mk is its kth principal submatrix. Chapter 4. 
Differentiating Functions of Several Variables 279 Example 4.34 Consider the symmetric matrix A = 2 −1 0 −1 2 −1 0 −1 2 . Since detM1 = 2, detM2 = 3, detM3 = detA = 4 are all positive, A is positive definite. For a function f : O → R defined on an open subset O of R2, we have the following. Theorem 4.24 Let O be an open subset of R2. Suppose that (x0, y0) is a stationary point of the twice continuously differentiable function f : O → R. Let D(x0, y0) = ∂2f ∂x2 (x0, y0) ∂2f ∂y2 (x0, y0)− [ ∂2f ∂x∂y (x0, y0) ]2 . (i) If ∂2f ∂x2 (x0, y0) > 0 and D(x0, y0) > 0, then the point (x0, y0) is a local minimizer of f . (ii) If ∂2f ∂x2 (x0, y0) < 0 and D(x0, y0) > 0, then the point (x0, y0) is a local maximizer of f . (iii) If D(x0, y0) < 0, the point (x0, y0) is a saddle point of f . Proof We notice that Hf (x0, y0) = ∂2f ∂x2 (x0, y0) ∂2f ∂x∂y (x0, y0) ∂2f ∂x∂y (x0, y0) ∂2f ∂y2 (x0, y0) . Hence, ∂2f ∂x2 (x0, y0) is the determinant of the first principal submatrix of Hf (x0, y0), while D(x0, y0) is the determinant of Hf (x0, y0), the second principal submatrix of Hf (x0, y0). Thus, (i) and (ii) follow from the Sylvester criteria as well as the second derivative test. For (iii), we notice that the 2× 2 matrix Hf (x0, y0) is indefinite if and only if it has one positive eigenvalue and one negative eigenvalue, if and only if D(x0, y0) = detHf (x0, y0) < 0. Now we look at some examples of the applications of the second derivative test. Example 4.35 Let f : R2 → R be the function defined as f(x, y) = x4 + y4 + 4xy. Find the stationary points of f and classify them. Solution Since f is a polynomial function, it is infinitely differentiable. ∇f(x, y) = (4x3 + 4y, 4y3 + 4x). To find the stationary points, we need to solve the system of equations x3 + y = 0, y3 + x = 0. From the first equation, we have y = −x3. Substituting into the second equation gives −x9 + x = 0, or equivalently, x(x8 − 1) = 0. Chapter 4.
Differentiating Functions of Several Variables 281 Thus, x = 0 or x = ±1. When x = 0, y = 0. When x = ±1, y = ∓1. Therefore, the stationary points of f are u1 = (0, 0), u2 = (1,−1) and u3 = (−1, 1). Now, Hf (x, y) = [ 12x2 4 4 12y2 ] . Therefore, Hf (u1) = [ 0 4 4 0 ] , Hf (u2) = Hf (u3) = [ 12 4 4 12 ] . It follows that D(u1) = −16 < 0, D(u2) = D(u3) = 128 > 0. Since fxx(u2) = fxx(u3) = 12 > 0, we conclude that u1 is a saddle point, u2 and u3 are local minimizers. Figure 4.11: The function f(x, y) = x4 + y4 + 4xy. Chapter 4. Differentiating Functions of Several Variables 282 Example 4.36 Consider the function f : R3 → R defined as f(x, y, z) = x3 − xy2 + 5x2 − 4xy − 2xz + y2 + 6yz + 37z2. Show that (0, 0, 0) is a local minimizer of f . Solution Since f is a polynomial function, it is infinitely differentiable. Since ∇f(x, y, z) = (3x2−y2+10x−4y−2z,−2xy−4x+2y+6z,−2x+6y+74z), we find that ∇f(0, 0, 0) = (0, 0, 0). Hence, (0, 0, 0) is a stationary point. Now, Hf (x, y, z) = 6x+ 10 −2y − 4 −2 −2y − 4 −2x+ 2 6 −2 6 74 . Therefore, Hf (0, 0, 0) = 10 −4 −2 −4 2 6 −2 6 74 . The determinants of the three principal submatrices of Hf (0, 0, 0) are detM1 = 10, detM2 = ∣∣∣∣∣10 −4 −4 2 ∣∣∣∣∣ = 4, detM3 = ∣∣∣∣∣∣∣ 10 −4 −2 −4 2 6 −2 6 74 ∣∣∣∣∣∣∣ = 24. This shows that Hf (0, 0, 0) is positive definite. Hence, (0, 0, 0) is a local minimizer of f . Chapter 4. Differentiating Functions of Several Variables 283 Exercises 4.5 Question 1 Let f : R2 → R be the function defined as f(x, y) = x2 + 4y2 + 5xy − 8x− 11y + 7. Find the stationary points of f and classify them. Question 2 Let f : R2 → R be the function defined as f(x, y) = x2 + 4y2 + 3xy − 5x− 18y + 1. Find the stationary points of f and classify them. Question 3 Let f : R2 → R be the function defined as f(x, y) = x3 + y3 + 12xy. Find the stationary points of f and classify them. Question 4 Consider the function f : R3 → R defined as f(x, y, z) = z3 − 2z2 − x2 − y2 − xy + x− y. 
Show that (1,−1, 0) is a stationary point of f and determine the nature of this stationary point. Chapter 4. Differentiating Functions of Several Variables 284 Question 5 Consider the function f : R3 → R defined as f(x, y, z) = z3 + 2z2 − x2 − y2 − xy + x− y. Show that (1,−1, 0) is a stationary point of f and determine the nature of this stationary point. Chapter 5. The Inverse and Implicit Function Theorems 285 Chapter5 The Inverse and Implicit Function Theorems In this chapter, we discuss the inverse function theorem and implicit function theorem, which are two important theorems in multivariable analysis. Given a function that maps a subset of Rn to Rn, the inverse function theorem gives sufficient conditions for the existence of a local inverse and its differentiability. Given a system ofm equations with n+m variables, the implicit function theorem gives sufficient conditions to solve m of the variables in terms of the other n variables locally such that the solutions are differentiable functions. We want to emphasize that these theorems are local, in the sense that each of them asserts the existence of a function defined in a neighbourhood of a point. In some sense, the two theorems are equivalent, which means one can deduce one from the other. In this book, we will prove the inverse function theorem first, and use it to deduce the implicit function theorem. 5.1 The Inverse Function Theorem Let D be a subset of Rn. If the function F : D → Rn is one-to-one, we can define the inverse function F−1 : F(D) → Rn. The question we want to study here is the following. If D is an open set and F is differentiable at the point x0 in D, is the inverse function F−1 differentiable at y0 = F(x0)? For this, we also want the point y0 to be an interior point of F(D). More precisely, is there a neighbourhood U of x0 that is mapped bijectively by F to a neighbourhood V of y0? 
If the answer is yes, and F−1 is differentiable at y0, then the chain rule would imply that DF−1(y0)DF(x0) = In. Hence, a necessary condition for F−1 to be differentiable at y0 is that the derivative matrix DF(x0) has to be invertible. Chapter 5. The Inverse and Implicit Function Theorems 286 Let us study the map f : R → R given by f(x) = x2. The range of the function is [0,∞). Notice that if x0 > 0, then I = (0,∞) is a neighbourhood of x0 that is mapped bijectively by f to the neighbourhood J = (0,∞) of f(x0). If x0 < 0, then I = (−∞, 0) is a neighbourhood of x0 that is mapped bijectively by f to the neighbourhood J = (0,∞) of f(x0). However, if x0 = 0, the point f(x0) = 0 is not an interior point of f(R) = [0,∞). Notice that f ′(x) = 2x. Therefore, x = 0 is the point which f ′(x) = 0. If x0 > 0, take I = (0,∞) and J = (0,∞). Then f : I → J has an inverse given by f−1 : J → I , f−1(x) = √ x. It is a differentiable function with (f−1)′(x) = 1 2 √ x . In particular, at y0 = f(x0) = x20, (f−1)′(y0) = 1 2 √ y0 = 1 2x0 = 1 f ′(x0) . Similarly, if x0 < 0, take I = (−∞, 0) and J = (0,∞). Then f : I → J has an inverse given by f−1 : J → I , f−1(x) = − √ x. It is a differentiable function with (f−1)′(x) = − 1 2 √ x . In particular, at y0 = f(x0) = x20, (f−1)′(y0) = − 1 2 √ y0 = 1 2x0 = 1 f ′(x0) . For a single variable function, the inverse function theorem takes the following form. Theorem 5.1 (Single Variable) Inverse Function Theorem Let O be an open subset of R that contains the point x0, and let f : O → R be a continuously differentiable function defined on O. Suppose that f ′(x0) ̸= 0. Then there exists an open interval I containing x0 such that f maps I bijectively onto the open interval J = f(I). The inverse function f−1 : J → I is continuously differentiable. For any y ∈ J , if x is the point in I such that f(x) = y, then (f−1)′(y) = 1 f ′(x) . Chapter 5. The Inverse and Implicit Function Theorems 287 Figure 5.1: The function f : R → R, f(x) = x2. 
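Before turning to the proof, the derivative formula in Theorem 5.1 can be checked numerically on the example f(x) = x2 discussed above, whose local inverse near a point x0 > 0 is the square root. The Python sketch below is an illustration added here, not part of the text; the point x0 = 3 is a hypothetical choice.

```python
import math

# Numerical check of (f^{-1})'(y0) = 1 / f'(x0) for f(x) = x^2 at x0 = 3,
# where the local inverse on (0, infinity) is f^{-1}(y) = sqrt(y).
x0 = 3.0
y0 = x0 * x0
h = 1e-6

# difference quotient of the inverse at y0
inv_deriv = (math.sqrt(y0 + h) - math.sqrt(y0)) / h

# compare with 1 / f'(x0) = 1 / (2 x0)
assert abs(inv_deriv - 1 / (2 * x0)) < 1e-6
```

The same check with x0 < 0 and the branch f−1(y) = −sqrt(y) gives agreement with 1/(2x0), which is now negative.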
Proof Without loss of generality, assume that f ′(x0) > 0. Since O is an open set and f ′ is continuous at x0, there is an r1 > 0 such that (x0−r1, x0+r1) ⊂ O and for all x ∈ (x0 − r1, x0 + r1), |f ′(x)− f ′(x0)| < f ′(x0) 2 . This implies that f ′(x) > f ′(x0) 2 > 0 for all x ∈ (x0 − r1, x0 + r1). Therefore, f is strictly increasing on (x0− r1, x0+ r1). Take any r > 0 that is less that r1. Then [x − r, x + r] ⊂ (x0 − r1, x0 + r1). By intermediate value theorem, the function f maps [x − r, x + r] bijectively onto [f(x − r), f(x + r)]. Let I = (x− r, x + r) and J = (f(x− r), f(x + r)). Then f : I → J is a bijection and f−1 : J → I exists. In volume I, we have proved that f−1 is differentiable, and (f−1)′(y) = 1 f ′(f−1(y)) for all y ∈ J. This formula shows that (f−1)′ : J → R is continuous. Chapter 5. The Inverse and Implicit Function Theorems 288 Remark 5.1 In the inverse function theorem, we determine the invertibility of the function in a neighbourhood of a point x0. The theorem says that if f is continuously differentiable and f ′(x0) ̸= 0, then f is locally invertible at x0. Here the assumption that f ′ is continuous is essential. In volume I, we have seen that for a continuous function f : I → R defined on an open interval I to be one-to-one, it is necessary that it is strictly monotonic. The function f : R → R, f(x) = x+ x2 sin ( 1 x ) , if x ̸= 0, 0, if x = 0, is an example of a differentiable function where f ′(0) = 1 ̸= 0, but f fails to be strictly monotonic in any neighbourhood of the point x = 0. This annoying behavior can be removed if we assume that f ′ is continuous. If f ′(x0) ̸= 0 and f ′ is continuous, there is a neighbourhood I of x0 such that f ′(x) has the same sign as f ′(x0) for all x ∈ I . This implies that f is strictly monotonic on I . Example 5.1 Let f : R → R be the function defined as f(x) = 2x+ 4 cosx. 
Show that there is an open interval I containing 0 such that f : I → R is one-to-one, and f−1 : f(I) → R is continuously differentiable. Determine (f−1)′(f(0)). Chapter 5. The Inverse and Implicit Function Theorems 289 Solution The function f is infinitely differentiable and f ′(x) = 2 − 4 sinx. Since f ′(0) = 2 ̸= 0, the inverse function theorem says that there is an open interval I containing 0 such that f : I → R is one-to-one, and f−1 : f(I) → R is continuously differentiable. Moreover, (f−1)′(f(0)) = 1 f ′(0) = 1 2 . Now let us consider functions defined on open subsets of Rn, where n ≥ 2. We first consider a linear transformation T : Rn → Rn. There is an n× n matrix A such that T(x) = Ax. The mapping T : Rn → Rn is one-to-one if and only if A is invertible, if and only if detA ̸= 0. In this case, T is a bijection and T−1 : Rn → Rn is the linear transformation given by T−1(x) = A−1x. Notice that for any x and y in Rn, DT(x) = A, DT−1(y) = A−1. The content of the inverse function theorem is to extend this to nonlinear mappings. Theorem 5.2 Inverse Function Theorem Let O be an open subset of Rn that contains the point x0, and let F : O → Rn be a continuously differentiable function defined on O. If detDF(x0) ̸= 0, then we have the followings. (i) There exists a neighbourhood U of x0 such that F maps U bijectively onto the open set V = F(U). (ii) The inverse function F−1 : V → U is continuously differentiable. (iii) For any y ∈ V , if x is the point in U such that F(x) = y, then DF−1(y) = DF(F−1(y))−1 = DF(x)−1. Chapter 5. The Inverse and Implicit Function Theorems 290 Figure 5.2: The inverse function theorem. For a linear transformation which is a degree one polynomial mapping, the inverse function theorem holds globally. For a general continuously differentiable mapping, the inverse function theorem says that the first order approximation of the function at a point can determine the local invertibility of the function at that point. 
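Returning to the single variable Example 5.1, its conclusion (f−1)′(f(0)) = 1/2 can be confirmed numerically by inverting f by bisection on a small interval around 0 where f is strictly increasing. The Python sketch below is an illustration added here, not part of the text; the bracketing interval [−0.5, 0.5] is a hypothetical choice on which f ′(x) = 2 − 4 sinx stays positive.

```python
import math

# Numerical check of Example 5.1: for f(x) = 2x + 4 cos x, the local inverse
# near x = 0 satisfies (f^{-1})'(f(0)) = 1 / f'(0) = 1/2.
def f(x):
    return 2 * x + 4 * math.cos(x)

y0 = f(0.0)   # = 4
h = 1e-6

# Invert f by bisection on [-0.5, 0.5], where f is strictly increasing.
def f_inv(y, lo=-0.5, hi=0.5):
    for _ in range(80):
        mid = (lo + hi) / 2
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# central difference quotient of the inverse at y0
inv_deriv = (f_inv(y0 + h) - f_inv(y0 - h)) / (2 * h)
assert abs(inv_deriv - 0.5) < 1e-3
```

Bisection is used only as a convenient numerical inverter; the inverse function theorem is what guarantees that a differentiable inverse exists near f(0).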
When n ≥ 2, the proof of the inverse function theorem is substantially more complicated than the n = 1 case, as we do not have the monotonicity argument used in the n = 1 case. The proof will be presented in Section 5.2. We will discuss the examples and applications in this section.

Example 5.2
Let F : R2 → R2 be the mapping defined by
F(x, y) = (3x − 2y + 7, 4x + 5y − 2).
Show that F is a bijection, and find F−1(x, y) and DF−1(x, y).

Solution
The mapping F : R2 → R2 can be written as F(x) = T(x) + b, where T : R2 → R2 is the linear transformation
T(x, y) = (3x − 2y, 4x + 5y),
and b = (7, −2). For u = (x, y), T(u) = Au, where
A = [ 3 −2
      4  5 ].
Since detA = 23 ̸= 0, the linear transformation T : R2 → R2 is one-to-one. Hence, F : R2 → R2 is also one-to-one. Given v ∈ R2, let u = A−1(v − b). Then F(u) = v. Hence, F is also onto. The inverse F−1 : R2 → R2 is given by F−1(v) = A−1(v − b). Since
A−1 = 1/23 [  5 2
             −4 3 ],
we find that
F−1(x, y) = ( (5(x − 7) + 2(y + 2))/23, (−4(x − 7) + 3(y + 2))/23 )
          = ( (5x + 2y − 31)/23, (−4x + 3y + 34)/23 ),
and
DF−1(x, y) = 1/23 [  5 2
                    −4 3 ].

Example 5.3
Determine the values of a such that the mapping F : R3 → R3 defined by
F(x, y, z) = (2x + y + az, x − y + 3z, 3x + 2y + z + 7)
is invertible.

Solution
The mapping F : R3 → R3 can be written as F(x) = T(x) + b, where T : R3 → R3 is the linear transformation
T(x, y, z) = (2x + y + az, x − y + 3z, 3x + 2y + z),
and b = (0, 0, 7). Thus, F is a degree one polynomial mapping with
DF(x) = [ 2  1  a
          1 −1  3
          3  2  1 ].
The mapping F is invertible if and only if it is one-to-one, if and only if T is one-to-one, if and only if detDF(x) ̸= 0. Since detDF(x) = 5a − 6, the mapping F is invertible if and only if a ̸= 6/5.

Example 5.4
Let Φ : R2 → R2 be the mapping defined as
Φ(r, θ) = (r cos θ, r sin θ).
Determine the points (r, θ) ∈ R2 where the inverse function theorem can be applied to this mapping.
Explain the significance of this result. Solution Since sin θ and cos θ are infinitely differentiable functions, the mapping Φ is infinitely differentiable with DΦ(r, θ) = [ cos θ −r sin θ sin θ r cos θ ] . Since detDΦ(r, θ) = r cos2 θ + r sin2 θ = r, the inverse function theorem is not applicable at the point (r, θ) if r = 0. The mapping Φ is a change from polar coordinates to rectangular coordinates. The result above shows that the change of coordinates is locally one-to-one away from the origin of the xy-plane. Chapter 5. The Inverse and Implicit Function Theorems 293 Example 5.5 Consider the mapping F : R2 → R2 given by F(x, y) = (x2 − y2, xy). Show that there is a neighbourhood U of the point u0 = (1, 1) such that F : U → R2 is one-to-one, V = F(U) is an open set, and G = F−1 : V → U is continuously differentiable. Then find ∂G1 ∂y (0, 1). Solution The mapping F is a polynomial mapping. Thus, it is continuously differentiable. Notice that F(u0) = (0, 1) and DF(x, y) = [ 2x −2y y x ] , DF(u0) = [ 2 −2 1 1 ] . Since detDF(u0) = 4 ̸= 0, the inverse function theorem implies that there is a neighbourhood U of the point u0 such that F : U → R2 is one-to- one, V = F(U) is an open set, and G = F−1 : V → U is continuously differentiable. Moreover, DG(0, 1) = DF(1, 1)−1 = 1 4 [ 1 2 −1 2 ] . From here, we find that ∂G1 ∂y (0, 1) = 2 4 = 1 2 . Example 5.6 Consider the system of equations sin(x+ y) + x2y + 3xy2 = 2, 2xy + 5x2 − 2y2 = 1. Chapter 5. The Inverse and Implicit Function Theorems 294 Observe that (x, y) = (1,−1) is a solution of this system. Show that there is a neighbourhood U of u0 = (1,−1) and an r > 0 such that for all (a, b) satisfying (a− 2)2 + (b− 1)2 < r2, the system sin(x+ y) + x2y + 3xy2 = a, 2xy + 5x2 − 2y2 = b has a unique solution (x, y) that lies in U . Solution Let F : R2 → R2 be the function defined by F(x, y) = ( sin(x+ y) + x2y + 3xy2, 2xy + 5x2 − 2y2 ) . 
Since the sine function is infinitely differentiable, sin(x + y) is infinitely differentiable. The functions g(x, y) = x2y + 3xy2 and F2(x, y) = 2xy + 5x2 − 2y2 are polynomial functions. Hence, they are also infinitely differentiable. This shows that F is infinitely differentiable. Since DF(x, y) = [ cos(x+ y) + 2xy + 3y2 cos(x+ y) + x2 + 6xy 2y + 10x 2x− 4y ] , we find that DF(1,−1) = [ 2 −4 8 6 ] . It follows that detDF(1,−1) = 44 ̸= 0. By the inverse function theorem, there exists a neighbourhood U1 of u0 such that F : U1 → R2 is one-to-one and V = F(U1) is an open set. Since F(u0) = (2, 1), the point v0 = (2, 1) is a point in the open set V . Hence, there exists r > 0 such that B(v0, r) ⊂ V . Since B(v0, r) is open and F is continuous, U = F−1 (B(v0, r)) is an open subset of R2. The map F : U → B(v0, r) is a bijection. For all (a, b) satisfying (a − 2)2 + (b − 1)2 < r2, (a, b) is in B(v0, r). Hence, there is a unique (x, y) in U such that F(x, y) = (a, b). This means that the system Chapter 5. The Inverse and Implicit Function Theorems 295 sin(x+ y) + x2y + 3xy2 = a, 2xy + 5x2 − 2y2 = b has a unique solution (x, y) that lies in U . At the end of this section, let us prove the following theorem. Theorem 5.3 Let A be an n×n matrix, and let x0 and y0 be two points in Rn. Define the mapping F : Rn → Rn by F(x) = y0 + A (x− x0) . Then F is infinitely differentiable with DF(x) = A. It is one-to-one and onto if and only if detA ̸= 0. In this case, F−1(y) = x0 + A−1 (y − y0) , and DF−1(y) = A−1. In particular, F−1 is also infinitely differentiable. Proof Obviously, F is a polynomial mapping. Hence, F is infinitely differentiable. By a straightforward computation, we find that DF = A. Notice that F = F2 ◦ T ◦ F1, where F1 : Rn → Rn is the translation F1(x) = x − x0, T : Rn → Rn is the linear transformation T(x) = Ax, and F2 : Rn → Rn is the translation F2(y) = y+y0. 
Since translations are bijective mappings, F is one-to-one and onto if and only if T : Rn → Rn is one-to-one and onto, if and only if detA ̸= 0. If y = y0 + A (x− x0) , then x = x0 + A−1 (y − y0) . This gives the formula for F−1(y). The formula for DF−1(y) follows. Chapter 5. The Inverse and Implicit Function Theorems 296 Exercises 5.1 Question 1 Let f : R → R be the function defined as f(x) = e2x + 4x sinx+ 2 cosx. Show that there is an open interval I containing 0 such that f : I → R is one-to-one, and f−1 : f(I) → R is continuously differentiable. Determine (f−1)′(f(0)). Question 2 Let F : R2 → R2 be the mapping defined by F(x, y) = (3x+ 2y − 5, 7x+ 4y − 3). Show that F is a bijection, and find F−1(x, y) and DF−1(x, y). Question 3 Consider the mapping F : R2 → R2 given by F(x, y) = (x2 + y2, xy). Show that there is a neighbourhood U of the point u0 = (2, 1) such that F : U → R2 is one-to-one, V = F(U) is an open set, and G = F−1 : V → U is continuously differentiable. Then find ∂G2 ∂x (5, 2). Question 4 Let Φ : R3 → R3 be the mapping defined as Φ(ρ, ϕ, θ) = (ρ sinϕ cos θ, ρ sinϕ sin θ, ρ cosϕ). Determine the points (ρ, ϕ, θ) ∈ R3 where the inverse function theorem can be applied to this mapping. Explain the significance of this result. Chapter 5. The Inverse and Implicit Function Theorems 297 Question 5 Consider the system of equations 4x+ y − 5xy = 2, x2 + y2 − 3xy2 = 5. Observe that (x, y) = (−1, 1) is a solution of this system. Show that there is a neighbourhood U of u0 = (−1, 1) and an r > 0 such that for all (a, b) satisfying (a− 2)2 + (b− 5)2 < r2, the system 4x+ y − 5xy = a, x2 + y2 − 3xy2 = b has a unique solution (x, y) that lies in U . Chapter 5. The Inverse and Implicit Function Theorems 298 5.2 The Proof of the Inverse Function Theorem In this section, we prove the inverse function theorem stated in Theorem 5.2. 
The hardest part of the proof is the first statement, which asserts that there is a neighbourhood U of x0 such that, restricted to U , F is one-to-one, and the image of U under F is open in Rn. In the statement of the inverse function theorem, we assume that the derivative matrix of the continuously differentiable mapping F : O → Rn is invertible at the point x0. The continuity of the partial derivatives of F then implies that there is a neighbourhood N of x0 such that the derivative matrix of F at any x in N is also invertible. Theorem 3.38 asserts that a linear transformation T : Rn → Rn is invertible if and only if there is a positive constant c such that
∥T(u) − T(v)∥ ≥ c∥u − v∥ for all u, v ∈ Rn.

Definition 5.1 Stable Mappings
A mapping F : D → Rn is stable if there is a positive constant c such that
∥F(u) − F(v)∥ ≥ c∥u − v∥ for all u, v ∈ D.
In other words, a linear transformation T : Rn → Rn is invertible if and only if it is stable.

Remark 5.2 Stable Mappings vs Lipschitz Mappings
Let D be a subset of Rn. Observe that if F : D → Rn is a stable mapping, there is a constant c > 0 such that
∥F(u1) − F(u2)∥ ≥ c∥u1 − u2∥ for all u1, u2 ∈ D.
This implies that F is one-to-one, and thus the inverse F−1 : F(D) → Rn exists. Notice that for any v1 and v2 in F(D),
∥F−1(v1) − F−1(v2)∥ ≤ (1/c)∥v1 − v2∥.
This means that F−1 : F(D) → Rn is a Lipschitz mapping.

For a mapping F : D → Rn that satisfies the assumptions in the statement of the inverse function theorem, it is stable in a neighbourhood of x0.

Theorem 5.4
Let O be an open subset of Rn that contains the point x0, and let F : O → Rn be a continuously differentiable function defined on O. If detDF(x0) ̸= 0, then there exists a neighbourhood U of x0 such that DF(x) is invertible for all x ∈ U , F maps U bijectively onto the open set V = F(U), and the map F : U → V is stable.
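The stability conclusion of Theorem 5.4 can be observed numerically. The sketch below is our own illustration, not from the text: it takes the mapping F(x, y) = (x2 − y2, xy) of Example 5.5, for which detDF(1, 1) ̸= 0, and checks that the ratios ∥F(u) − F(v)∥/∥u − v∥ stay bounded away from 0 on a small grid of points around (1, 1).

```python
import itertools
import math

def F(x, y):
    # the mapping of Example 5.5; DF(1, 1) is invertible
    return (x*x - y*y, x*y)

# a small grid of sample points around (1, 1)
pts = [(1 + 0.05*i, 1 + 0.05*j) for i in range(-2, 3) for j in range(-2, 3)]

ratios = []
for u, v in itertools.combinations(pts, 2):
    Fu, Fv = F(*u), F(*v)
    dist_uv = math.hypot(u[0] - v[0], u[1] - v[1])
    dist_F = math.hypot(Fu[0] - Fv[0], Fu[1] - Fv[1])
    ratios.append(dist_F / dist_uv)

c = min(ratios)
print(c > 0.5)   # True: an empirical stability constant on this grid
```

The positive lower bound observed here is consistent with the theorem; of course, a finite grid only illustrates, and does not prove, stability on the neighbourhood.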
Recall that when A is a subset of Rn and u is a point in Rn,
A + u = {a + u | a ∈ A}
is the translate of the set A by the vector u. The set A is open if and only if A + u is open, and A is closed if and only if A + u is closed.

Lemma 5.5
It is sufficient to prove Theorem 5.4 when x0 = 0, F(x0) = 0 and DF(x0) = In.

Proof of Lemma 5.5
Assume that Theorem 5.4 holds when x0 = 0, F(x0) = 0 and DF(x0) = In. Now given that F : O → Rn is a continuously differentiable mapping with detDF(x0) ̸= 0, let y0 = F(x0) and A = DF(x0). Then A is invertible. Define the open set D as D = O − x0. It is a neighbourhood of the point 0. Let G : D → Rn be the mapping
G(x) = A−1 (F(x + x0) − y0).
Then G(0) = 0. Using the same reasoning as the proof of Theorem 5.3, we find that G is continuously differentiable and
DG(x) = A−1DF(x + x0).
This gives
DG(0) = A−1DF(x0) = In.
By assumption, Theorem 5.4 holds for the mapping G. Namely, there exist neighbourhoods Ũ and Ṽ of 0 such that G : Ũ → Ṽ is a bijection and DG(x) is invertible for all x ∈ Ũ. Moreover, there is a positive constant a such that
∥G(u1) − G(u2)∥ ≥ a∥u1 − u2∥ for all u1, u2 ∈ Ũ.
Let U be the neighbourhood of x0 given by U = Ũ + x0. By Theorem 5.3, the mapping H : Rn → Rn, H(y) = A−1(y − y0) is a continuous bijection. Therefore, V = H−1(Ṽ) is an open subset of Rn that contains y0. By definition, F maps U bijectively to V . Since
F(x) = y0 + AG(x − x0),
we find that
DF(x) = A (DG(x − x0)).
Since A is invertible, DF(x) is invertible for all x ∈ U . Theorem 3.38 says that there is a positive constant α such that ∥Ax∥ ≥ α∥x∥ for all x ∈ Rn. Therefore, for any u1 and u2 in U ,
∥F(u1) − F(u2)∥ = ∥A (G(u1 − x0) − G(u2 − x0))∥ ≥ α∥G(u1 − x0) − G(u2 − x0)∥ ≥ aα∥u1 − u2∥.
This shows that F : U → V is stable, and thus completes the proof of the lemma.

Now we prove Theorem 5.4.
Proof of Theorem 5.4
By Lemma 5.5, we only need to consider the case where x0 = 0, F(x0) = 0 and DF(x0) = In. Since F : O → Rn is continuously differentiable, the map DF : O → Mn is continuous. Since det : Mn → R is also continuous, and detDF(0) = 1, there is an r0 > 0 such that B(0, r0) ⊂ O and for all x ∈ B(0, r0),
detDF(x) > 1/2.
In particular, DF(x) is invertible for all x ∈ B(0, r0). Let G : O → Rn be the mapping defined as
G(x) = F(x) − x,
so that F(x) = x + G(x). The mapping G is continuously differentiable. It satisfies G(0) = 0 and
DG(0) = DF(0) − In = 0.
Since G is continuously differentiable, for any 1 ≤ i ≤ n, 1 ≤ j ≤ n, there exists ri,j > 0 such that B(0, ri,j) ⊂ O and for all x ∈ B(0, ri,j),
|∂Gi/∂xj (x)| = |∂Gi/∂xj (x) − ∂Gi/∂xj (0)| < 1/(2n).
Let
r = min ({ri,j | 1 ≤ i ≤ n, 1 ≤ j ≤ n} ∪ {r0}).
Then r > 0, B(0, r) ⊂ B(0, r0) and B(0, r) ⊂ B(0, ri,j) for all 1 ≤ i ≤ n, 1 ≤ j ≤ n. The ball B(0, r) is a convex set. If u and v are two points in B(0, r), mean value theorem implies that for 1 ≤ i ≤ n, there exists zi ∈ B(0, r) such that
Gi(u) − Gi(v) = ∑_{j=1}^{n} (uj − vj) ∂Gi/∂xj (zi).
It follows that
|Gi(u) − Gi(v)| ≤ ∑_{j=1}^{n} |uj − vj| |∂Gi/∂xj (zi)| ≤ (1/(2n)) ∑_{j=1}^{n} |uj − vj| ≤ ∥u − v∥/(2√n),
where the last step uses ∑_{j=1}^{n} |uj − vj| ≤ √n ∥u − v∥, a consequence of the Cauchy–Schwarz inequality. Therefore,
∥G(u) − G(v)∥ = ( ∑_{i=1}^{n} (Gi(u) − Gi(v))² )^{1/2} ≤ (1/2)∥u − v∥.
This shows that G : B(0, r) → Rn is a map satisfying G(0) = 0, and
∥G(u) − G(v)∥ ≤ (1/2)∥u − v∥ for all u, v ∈ B(0, r).
By Theorem 2.44, the map F : B(0, r) → Rn is one-to-one, and its image contains the open ball B(0, r/2). Let V = B(0, r/2). Then V is an open subset of Rn that is contained in the image of F. Since F : B(0, r) → Rn is continuous, U = (F|B(0,r))−1(V ) is an open set. By definition, F : U → V is a bijection. Since U is contained in B(0, r0), DF(x) is invertible for all x in U . Finally, since F(x) = x + G(x), the triangle inequality gives, for any u and v in U ,
∥F(u) − F(v)∥ ≥ ∥u − v∥ − ∥G(u) − G(v)∥ ≥ (1/2)∥u − v∥.
This completes the proof of the theorem.

To complete the proof of the inverse function theorem, it remains to prove that F−1 : V → U is continuously differentiable, and DF−1(y) = DF(F−1(y))−1.

Theorem 5.6
Let O be an open subset of Rn that contains the point x0, and let F : O → Rn be a continuously differentiable function defined on O. If detDF(x0) ̸= 0, then there exists a neighbourhood U of x0 such that F maps U bijectively onto the open set V = F(U), the inverse function F−1 : V → U is continuously differentiable, and for any y ∈ V , if x is the point in U such that F(x) = y, then DF−1(y) = DF(x)−1.

Proof
Theorem 5.4 asserts that there exists a neighbourhood U of x0 such that F maps U bijectively onto the open set V = F(U), DF(x) is invertible for all x in U , and there is a positive constant c such that
∥F(u1) − F(u2)∥ ≥ c∥u1 − u2∥ for all u1, u2 ∈ U. (5.1)
Now given y in V , we want to show that F−1 is differentiable at y and DF−1(y) = DF(x)−1, where x = F−1(y). Since V is open, there is an r > 0 such that B(y, r) ⊂ V . For k ∈ Rn such that ∥k∥ < r, let
h(k) = F−1(y + k) − F−1(y).
Then F(x) = y and F(x + h) = y + k. Eq. (5.1) implies that
∥h∥ ≤ (1/c)∥k∥. (5.2)
Let A = DF(x). By assumption, A is invertible. Notice that
F−1(y + k) − F−1(y) − A−1k = −A−1 (k − Ah) = −A−1 (F(x + h) − F(x) − Ah).
There is a positive constant β such that ∥A−1w∥ ≤ β∥w∥ for all w ∈ Rn. Therefore,
∥F−1(y + k) − F−1(y) − A−1k∥ / ∥k∥ ≤ (β/∥k∥) ∥F(x + h) − F(x) − Ah∥ ≤ (β/c) ∥F(x + h) − F(x) − Ah∥ / ∥h∥, (5.3)
where the second inequality uses (5.2). Since F is differentiable at x,
lim_{h→0} (F(x + h) − F(x) − Ah)/∥h∥ = 0.
Eq. (5.2) implies that lim_{k→0} h = 0. Eq. (5.3) then implies that
lim_{k→0} (F−1(y + k) − F−1(y) − A−1k)/∥k∥ = 0.
This proves that F−1 is differentiable at y and DF−1(y) = A−1 = DF(x)−1.
Now the map DF−1 : V → GL(n,R) is the composition of the maps F−1 : V → U , DF : U → GL(n,R) and I : GL(n,R) → GL(n,R) which takes A to A−1. Since each of these maps is continuous, the map DF−1 : V → GL(n,R) is continuous. This completes the proof that F−1 : V → U is continuously differentiable.

At the end of this section, let us give a brief discussion about the concepts of homeomorphism and diffeomorphism.

Definition 5.2 Homeomorphism
Let A be a subset of Rm and let B be a subset of Rn. We say that A and B are homeomorphic if there exists a continuous bijective function F : A → B whose inverse F−1 : B → A is also continuous. Such a function F is called a homeomorphism between A and B.

Definition 5.3 Diffeomorphism
Let O and U be open subsets of Rn. We say that U and O are diffeomorphic if there exists a homeomorphism F : O → U between O and U such that F and F−1 are differentiable.

Example 5.7
Let A = {(x, y) | x2 + y2 < 1} and B = {(x, y) | 4x2 + 9y2 < 36}. Define the map F : R2 → R2 by
F(x, y) = (3x, 2y).
Then F is an invertible linear transformation with
F−1(x, y) = (x/3, y/2).
The mappings F and F−1 are continuously differentiable. It is easy to show that F maps A bijectively onto B. Hence, F : A → B is a diffeomorphism between A and B.

Figure 5.3: A = {(x, y) | x2 + y2 < 1} and B = {(x, y) | 4x2 + 9y2 < 36} are diffeomorphic.

Theorem 5.3 gives the following.

Theorem 5.7
Let A be an invertible n × n matrix, and let x0 and y0 be two points in Rn. Define the mapping F : Rn → Rn by
F(x) = y0 + A (x − x0).
If O is an open subset of Rn, then F : O → F(O) is a diffeomorphism.

The inverse function theorem gives the following.

Theorem 5.8
Let O be an open subset of Rn, and let F : O → Rn be a continuously differentiable mapping such that DF(x) is invertible for all x ∈ O.
If U is an open subset contained in O such that F : U → Rn is one-to-one, then F : U → F(U) is a diffeomorphism. The proof of this theorem is left as an exercise. Chapter 5. The Inverse and Implicit Function Theorems 307 Exercises 5.2 Question 1 Let F : R2 → R2 be the mapping given by F(x, y) = (xey + xy, 2x2 + 3y2). Show that there is a neighbourhood U of (−1, 0) such that the mapping F : U → R2 is stable. Question 2 Let O be an open subset of Rn, and let F : O → Rn be a continuously differentiable mapping such that detDF(x) ̸= 0 for all x ∈ O. Show that F(O) is an open set. Question 3 Let O be an open subset of Rn, and let F : O → Rn be a continuously differentiable mapping such that DF(x) is invertible for all x ∈ O. If U is an open subset contained in O such that F : U → Rn is one-to-one, then F : U → F(U) is a diffeomorphism. Question 4 Let O be an open subset of Rn, and let F : O → Rn be a differentiable mapping. Assume that there is a positive constant c such that ∥F(u)− F(v)∥ ≥ c∥u− v∥ for all u,v ∈ O. Use first order approximation theorem to show that for any x ∈ O and any h ∈ Rn, ∥DF(x)h∥ ≥ c∥h∥. Chapter 5. The Inverse and Implicit Function Theorems 308 Question 5 Let O be an open subset of Rn, and let F : O → Rn be a continuously differentiable mapping. (a) If F : O → Rn is stable, show that the derivative matrix DF(x) is invertible at every x in O. (b) Assume that the derivative matrix DF(x) is invertible at every x in O. If C is a compact subset of O, show that the mapping F : C → Rn is stable. Chapter 5. The Inverse and Implicit Function Theorems 309 5.3 The Implicit Function Theorem The implicit function theorem is about the possibility of solving m variables from a system of m equations with n+m variables. Let us study some special cases. Consider the function f : R2 → R given by f(x, y) = x2 + y2 − 1. 
For a point (x0, y0) that satisfies f(x0, y0) = 0, we want to ask whether there is a neighbourhood I of x0, a neighbourhood J of y0, and a function g : I → R such that for (x, y) ∈ I × J , f(x, y) = 0 if and only if y = g(x). Figure 5.4: The points in the (x, y) plane satisfying x2 + y2 − 1 = 0. If (x0, y0) is a point with y0 > 0 and f(x0, y0) = 0, then we can take the neighbourhoods I = (−1, 1) and J = (0,∞) of x0 and y0 respectively, and define the function g : I → R by g(x) = √ 1− x2. We then find that for (x, y) ∈ I × J , f(x, y) = 0 if and only if y = √ 1− x2 = g(x). If (x0, y0) is a point with y0 < 0 and f(x0, y0) = 0, then we can take the neighbourhoods I = (−1, 1) and J = (−∞, 0) of x0 and y0 respectively, and define the function g : I → R by g(x) = − √ 1− x2. We then find that for (x, y) ∈ I × J , f(x, y) = 0 if and only if y = − √ 1− x2 = g(x). However, if (x0, y0) = (1, 0), any neighbourhood J of y0 must contain an interval of the form (−r, r). If I is a neighbourhood of 1, (x, y) is a point in Chapter 5. The Inverse and Implicit Function Theorems 310 I × (−r, r) such that f(x, y) = 0, then (x,−y) is another point in I × (−r, r) satisfying f(x,−y) = 0. This shows that there does not exist any function g : I → R such that when (x, y) ∈ I × J , f(x, y) = 0 if and only if y = g(x). We say that we cannot solve y as a function of x in a neighbourhood of the point (1, 0). Similarly, we cannot solve y as a function of x in a neighbourhood of the point (−1, 0). However, in a neighbourhood of the points (1, 0) and (−1, 0), we can solve x as a function of y. For a function f : O → R defined on an open subset O of R2, the implicit function theorem takes the following form. Theorem 5.9 Dini’s Theorem Let O be an open subset of R2 that contains the point (x0, y0), and let f : O → R be a continuously differentiable function defined on O such that f(x0, y0) = 0. 
If ∂f/∂y (x0, y0) ̸= 0, then there is a neighbourhood I of x0, a neighbourhood J of y0, and a continuously differentiable function g : I → J such that for any (x, y) ∈ I × J ,
f(x, y) = 0 if and only if y = g(x).
Moreover, for any x ∈ I ,
∂f/∂x (x, g(x)) + ∂f/∂y (x, g(x)) g′(x) = 0.

Dini's theorem says that to be able to solve y as a function of x, a sufficient condition is that the function f has continuous partial derivatives, and fy does not vanish. By interchanging the roles of x and y, we see that if fx does not vanish, we can solve x as a function of y. For the function f : R2 → R, f(x, y) = x2 + y2 − 1, the points on the set x2 + y2 = 1 at which fy(x, y) = 2y vanishes are the points (1, 0) and (−1, 0). In fact, we have seen that we cannot solve y as a function of x in neighbourhoods of these two points.

Proof of Dini's Theorem
Without loss of generality, assume that fy(x0, y0) > 0. Let u0 = (x0, y0). Since fy : O → R is continuous, there is an r1 > 0 such that the closed rectangle R = [x0 − r1, x0 + r1] × [y0 − r1, y0 + r1] lies in O, and for all (x, y) ∈ R,
fy(x, y) > fy(x0, y0)/2 > 0.
For any x ∈ [x0 − r1, x0 + r1], define the function hx : [y0 − r1, y0 + r1] → R by hx(y) = f(x, y). Its derivative h′x(y) = fy(x, y) is positive. Hence, hx(y) = f(x, y) is strictly increasing in y. This implies that
f(x, y0 − r1) < f(x, y0) < f(x, y0 + r1).
When x = x0, we find that
f(x0, y0 − r1) < 0 < f(x0, y0 + r1).
Since f is continuously differentiable, it is continuous. Hence, there is an r2 > 0 such that r2 ≤ r1, and for all x ∈ [x0 − r2, x0 + r2],
f(x, y0 − r1) < 0 and f(x, y0 + r1) > 0.
Let I = (x0 − r2, x0 + r2). For x ∈ I , since hx : [y0 − r1, y0 + r1] → R is continuous, and hx(y0 − r1) < 0 < hx(y0 + r1), intermediate value theorem implies that there is a y ∈ (y0 − r1, y0 + r1) such that hx(y) = 0. Since hx is strictly increasing, this y is unique, and we denote it by g(x). This defines the function g : I → R.
Let J = (y0 − r1, y0 + r1). By our argument, for each x ∈ I , y = g(x) is the unique y ∈ J such that f(x, y) = 0. Thus, for any (x, y) ∈ I × J ,
f(x, y) = 0 if and only if y = g(x).
It remains to prove that g : I → R is continuously differentiable. By the construction above, there is a positive constant c such that
∂f/∂y (x, y) ≥ c for all (x, y) ∈ I × J.
Fix x ∈ I . There exists an r > 0 such that (x − r, x + r) ⊂ I . For h satisfying 0 < |h| < r, x + h is in I . By mean value theorem, there is a ch ∈ (0, 1) such that
f(x + h, g(x + h)) − f(x, g(x)) = h ∂f/∂x (uh) + (g(x + h) − g(x)) ∂f/∂y (uh),
where
uh = (x, g(x)) + ch (h, g(x + h) − g(x)). (5.4)
Since f(x + h, g(x + h)) = 0 = f(x, g(x)), we find that
(g(x + h) − g(x))/h = −fx(uh)/fy(uh). (5.5)
Since fx is continuous on the compact set R, it is bounded. Namely, there exists a constant M such that |fx(x, y)| ≤ M for all (x, y) ∈ R. Eq. (5.5) then implies that
|g(x + h) − g(x)| ≤ (M/c)|h|.
Taking h → 0 proves that g is continuous at x. From (5.4), we find that
lim_{h→0} uh = (x, g(x)).
Since fx and fy are continuous at (x, g(x)), eq. (5.5) gives
lim_{h→0} (g(x + h) − g(x))/h = − lim_{h→0} fx(uh)/fy(uh) = −fx(x, g(x))/fy(x, g(x)).
This proves that g is differentiable at x and
∂f/∂x (x, g(x)) + ∂f/∂y (x, g(x)) g′(x) = 0.

Figure 5.5: Proof of Dini's Theorem.

Example 5.8
Consider the equation
xy3 + sin(x + y) + 4x2y = 3.
Show that in a neighbourhood of (−1, 1), this equation defines y as a function of x. If this function is denoted as y = g(x), find g′(−1).

Solution
Let f : R2 → R be the function defined as
f(x, y) = xy3 + sin(x + y) + 4x2y − 3.
Since the sine function and polynomial functions are infinitely differentiable, f is infinitely differentiable.
∂f/∂y (x, y) = 3xy2 + cos(x + y) + 4x2, ∂f/∂y (−1, 1) = 2 ̸= 0.
By Dini's theorem, there is a neighbourhood of (−1, 1) such that y can be solved as a function of x.
Now,
∂f/∂x (x, y) = y3 + cos(x + y) + 8xy, ∂f/∂x (−1, 1) = −6.
Hence,
g′(−1) = −(−6)/2 = 3.

Now we turn to the general case. First we consider polynomial mappings of degree at most one. Let A = [aij] be an m × n matrix, and let B = [bij] be an m × m matrix. Given x ∈ Rn, y ∈ Rm, c ∈ Rm, the system of equations
Ax + By = c
is the following m equations in m + n variables x1, . . . , xn, y1, . . . , ym.
a11x1 + a12x2 + · · · + a1nxn + b11y1 + b12y2 + · · · + b1mym = c1,
a21x1 + a22x2 + · · · + a2nxn + b21y1 + b22y2 + · · · + b2mym = c2,
...
am1x1 + am2x2 + · · · + amnxn + bm1y1 + bm2y2 + · · · + bmmym = cm.
Let us look at an example.

Example 5.9
Consider the linear system
2x1 + 3x2 − 5x3 + 2y1 − y2 = 1,
3x1 − x2 + 2x3 − 3y1 + y2 = 0.
Show that y = (y1, y2) can be solved as a function of x = (x1, x2, x3). Write down the function G : R3 → R2 such that the solution is given by y = G(x), and find DG(x).

Solution
Let
A = [ 2  3 −5
      3 −1  2 ], B = [  2 −1
                       −3  1 ].
Then the system can be written as Ax + By = c, where c = (1, 0). This implies that
By = c − Ax. (5.6)
For every x ∈ R3, c − Ax is a vector in R2. Since detB = −1 ̸= 0, B is invertible. Therefore, there is a unique y satisfying (5.6). It is given by
G(x) = y = B−1 (c − Ax)
= − [ 1 1
      3 2 ] [ 1
              0 ] + [ 1 1
                      3 2 ] [ 2  3 −5
                              3 −1  2 ] x
= − [ 1
      3 ] + [  5 2  −3
              12 7 −11 ] x
= [  5x1 + 2x2 −  3x3 − 1
    12x1 + 7x2 − 11x3 − 3 ].
It follows that
DG = [  5 2  −3
       12 7 −11 ].

The following theorem gives a general scenario.

Theorem 5.10
Let A = [aij] be an m × n matrix, and let B = [bij] be an m × m matrix. Define the function F : Rm+n → Rm by
F(x, y) = Ax + By − c,
where c is a constant vector in Rm. The equation F(x, y) = 0 defines the variable y = (y1, . . . , ym) as a function of x = (x1, . . . , xn) if and only if the matrix B is invertible. If we denote this function as G : Rn → Rm, then
G(x) = B−1 (c − Ax), and DG(x) = −B−1A.
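Since Example 5.9 is linear, the formula y = G(x) can be verified by direct substitution. The short sketch below is a check we add here, not part of the text (the function names G and residual are ours): substituting G(x) back into the system leaves zero residuals for every x.

```python
def G(x1, x2, x3):
    # the solution found in Example 5.9: y = B^{-1}(c - Ax)
    return (5*x1 + 2*x2 - 3*x3 - 1, 12*x1 + 7*x2 - 11*x3 - 3)

def residual(x1, x2, x3):
    # left-hand sides of the system minus the right-hand sides
    y1, y2 = G(x1, x2, x3)
    eq1 = 2*x1 + 3*x2 - 5*x3 + 2*y1 - y2 - 1
    eq2 = 3*x1 - x2 + 2*x3 - 3*y1 + y2
    return (eq1, eq2)

print(residual(3, -2, 5))   # (0, 0)
```

Integer inputs are used so the cancellation is exact; the residual vanishes identically in x1, x2, x3.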
Proof
The equation F(x, y) = 0 defines the variables y as a function of x if and only if, for each x ∈ Rn, there is a unique y ∈ Rm satisfying
By = c − Ax.
This is a linear system for the variable y. By the theory of linear algebra, a unique solution y exists if and only if B is invertible. In this case, the solution is given by
y = B−1 (c − Ax).
The rest of the assertion follows.

Write a point in Rm+n as (x, y), where x ∈ Rn and y ∈ Rm. If F : Rm+n → Rm is a function that is differentiable at the point (x, y), the m × (m + n) derivative matrix DF(x, y) can be written as
DF(x, y) = [ DxF(x, y)  DyF(x, y) ],
where
DxF(x, y) = [ ∂F1/∂x1  · · ·  ∂F1/∂xn
              ...
              ∂Fm/∂x1  · · ·  ∂Fm/∂xn ],
DyF(x, y) = [ ∂F1/∂y1  · · ·  ∂F1/∂ym
              ...
              ∂Fm/∂y1  · · ·  ∂Fm/∂ym ],
with all partial derivatives evaluated at (x, y). Notice that DyF(x, y) is a square matrix. When A = [aij] is an m × n matrix, B = [bij] is an m × m matrix, c is a vector in Rm, and F : Rm+n → Rm is the function defined as F(x, y) = Ax + By − c, it is easy to compute that
DxF(x, y) = A, DyF(x, y) = B.
Theorem 5.10 says that we can solve y as a function of x from the system of m equations F(x, y) = 0 if and only if B = DyF(x, y) is invertible. In this case, if G : Rn → Rm is the function so that y = G(x) is the solution, then
DG(x) = −B−1A = −DyF(x, y)−1 DxF(x, y).
In fact, this last formula follows from F(x, G(x)) = 0 and the chain rule. The special case of degree one polynomial mappings gives us sufficient insight into the general implicit function theorem. However, for nonlinear mappings, the conclusions can only be made locally.

Theorem 5.11 Implicit Function Theorem
Let O be an open subset of Rm+n, and let F : O → Rm be a continuously differentiable function defined on O.
Assume that x0 is a point in Rn and y0 is a point in Rm such that the point (x0, y0) is in O and F(x0, y0) = 0. If detDyF(x0, y0) ̸= 0, then we have the following.
(i) There is a neighbourhood U of x0, a neighbourhood V of y0, and a continuously differentiable function G : U → Rm such that for any (x, y) ∈ U × V , F(x, y) = 0 if and only if y = G(x).
(ii) For any x ∈ U ,
DxF(x, G(x)) + DyF(x, G(x)) DG(x) = 0.

Here we will give a proof of the implicit function theorem using the inverse function theorem. The idea of the proof is to construct a mapping to which one can apply the inverse function theorem. Let us look at an example first.

Example 5.10
Let F : R5 → R2 be the function defined as
F(x1, x2, x3, y1, y2) = (x1y2^2, x2x3y1^2 + x1y2).
Define the mapping H : R5 → R5 as
H(x, y) = (x, F(x, y)) = (x1, x2, x3, x1y2^2, x2x3y1^2 + x1y2).
Then we find that
DH(x, y) = [ 1     0       0       0        0
             0     1       0       0        0
             0     0       1       0        0
             y2^2  0       0       0        2x1y2
             y2    x3y1^2  x2y1^2  2x2x3y1  x1 ].
Notice that
DH(x, y) = [ I3          0
             DxF(x, y)   DyF(x, y) ].

Proof of the Implicit Function Theorem
Let H : O → Rm+n be the mapping defined as
H(x, y) = (x, F(x, y)).
Notice that F(x, y) = 0 if and only if H(x, y) = (x, 0). Since the first n components of H are infinitely differentiable functions, the mapping H : O → Rm+n is continuously differentiable. Now,
DH(x, y) = [ In          0
             DxF(x, y)   DyF(x, y) ].
Therefore,
detDH(x0, y0) = detDyF(x0, y0) ̸= 0.
By the inverse function theorem, there is a neighbourhood W of (x0, y0) and a neighbourhood Z of H(x0, y0) = (x0, 0) such that H : W → Z is a bijection and H−1 : Z → W is continuously differentiable. For u ∈ Rn, v ∈ Rm so that (u, v) ∈ Z, let
H−1(u, v) = (Φ(u, v), Ψ(u, v)),
where Φ is a map from Z to Rn and Ψ is a map from Z to Rm. Since H−1 is continuously differentiable, Φ and Ψ are continuously differentiable. Given r > 0, let Dr be the open cube Dr = ∏_{i=1}^{m+n} (−r, r).
Since W and Z are open sets that contain (x0, y0) and (x0, 0) respectively, there exists r > 0 such that
(x0, y0) + Dr ⊂ W, (x0, 0) + Dr ⊂ Z.
If
Ar = ∏_{i=1}^{n} (−r, r), Br = ∏_{i=1}^{m} (−r, r), U = x0 + Ar, V = y0 + Br,
then
(x0, y0) + Dr = U × V, (x0, 0) + Dr = U × Br.
Hence, U × V ⊂ W and U × Br ⊂ Z. Define G : U → Rm by
G(x) = Ψ(x, 0).
Since Ψ is continuously differentiable, G is continuously differentiable. If x ∈ U , y ∈ V , then (x, y) ∈ W . For such (x, y), F(x, y) = 0 implies H(x, y) = (x, 0). Since H : W → Z is a bijection, (x, 0) ∈ Z and H−1(x, 0) = (x, y). Comparing the last m components gives
y = Ψ(x, 0) = G(x).
Conversely, since H(H−1(u, v)) = (u, v) for all (u, v) ∈ Z, we find that
(Φ(u, v), F(Φ(u, v), Ψ(u, v))) = (u, v) for all (u, v) ∈ Z.
For all u ∈ U , (u, 0) is in Z. Therefore,
Φ(u, 0) = u, F(Φ(u, 0), Ψ(u, 0)) = 0.
This implies that if u ∈ U , then F(u, G(u)) = 0. In other words, if (x, y) is in U × V and y = G(x), we must have F(x, y) = 0. Since we have shown that G : U → Rm is continuously differentiable, the formula
DxF(x, G(x)) + DyF(x, G(x)) DG(x) = 0
follows from F(x, G(x)) = 0 and the chain rule.

Example 5.11
Consider the system of equations
2x2y + 3xy2u + xyv + uv = 7,
4xu − 5yv + u2y + v2x = 1. (5.7)
Notice that when (x, y) = (1, 1), (u, v) = (1, 1) is a solution of this system. Show that there are neighbourhoods U and V of (1, 1), and a continuously differentiable function G : U → R2 such that if (x, y, u, v) ∈ U × V , then (x, y, u, v) is a solution of the system of equations above if and only if u = G1(x, y) and v = G2(x, y). Also, find the values of ∂G1/∂x (1, 1), ∂G1/∂y (1, 1), ∂G2/∂x (1, 1) and ∂G2/∂y (1, 1).

Solution
Define the function F : R4 → R2 by
F(x, y, u, v) = (2x2y + 3xy2u + xyv + uv − 7, 4xu − 5yv + u2y + v2x − 1).
This is a polynomial mapping. Hence, it is continuously differentiable. It is easy to check that F(1, 1, 1, 1) = 0.
Now,
\[ D_{(u,v)}\mathbf{F}(x, y, u, v) = \begin{bmatrix} 3xy^2 + v & xy + u \\ 4x + 2uy & -5y + 2vx \end{bmatrix}. \]
Thus,
\[ \det D_{(u,v)}\mathbf{F}(1, 1, 1, 1) = \det \begin{bmatrix} 4 & 2 \\ 6 & -3 \end{bmatrix} = -24 \neq 0. \]
By the implicit function theorem, there are neighbourhoods $U$ and $V$ of $(1, 1)$, and a continuously differentiable function $\mathbf{G} : U \to \mathbb{R}^2$ such that, if $(x, y, u, v) \in U \times V$, then $(x, y, u, v)$ is a solution of the system of equations (5.7) if and only if $u = G_1(x, y)$ and $v = G_2(x, y)$.
Finally,
\[ D_{(x,y)}\mathbf{F}(x, y, u, v) = \begin{bmatrix} 4xy + 3y^2 u + yv & 2x^2 + 6xyu + xv \\ 4u + v^2 & -5v + u^2 \end{bmatrix}, \qquad D_{(x,y)}\mathbf{F}(1, 1, 1, 1) = \begin{bmatrix} 8 & 9 \\ 5 & -4 \end{bmatrix}. \]
The chain rule gives
\[ D\mathbf{G}(1, 1) = -D_{(u,v)}\mathbf{F}(1, 1, 1, 1)^{-1} D_{(x,y)}\mathbf{F}(1, 1, 1, 1) = \frac{1}{24} \begin{bmatrix} -3 & -2 \\ -6 & 4 \end{bmatrix} \begin{bmatrix} 8 & 9 \\ 5 & -4 \end{bmatrix} = \frac{1}{24} \begin{bmatrix} -34 & -19 \\ -28 & -70 \end{bmatrix}. \]
Therefore,
\[ \frac{\partial G_1}{\partial x}(1, 1) = -\frac{17}{12}, \quad \frac{\partial G_1}{\partial y}(1, 1) = -\frac{19}{24}, \quad \frac{\partial G_2}{\partial x}(1, 1) = -\frac{7}{6}, \quad \frac{\partial G_2}{\partial y}(1, 1) = -\frac{35}{12}. \]

Remark 5.3 The Rank of a Matrix
In the formulation of the implicit function theorem, the assumption that $\det D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0$ can be replaced by the assumption that there are $m$ variables $u_1, \ldots, u_m$ among the $n + m$ variables $x_1, \ldots, x_n, y_1, \ldots, y_m$ such that $\det D_{(u_1, \ldots, u_m)}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0$.
Recall that the rank $r$ of an $m \times k$ matrix $A$ is the dimension of its row space, which equals the dimension of its column space. Thus, the rank $r$ of an $m \times k$ matrix $A$ is the maximum number of column vectors of $A$ which are linearly independent, or the maximum number of row vectors of $A$ that are linearly independent. Hence, the maximum possible value of $r$ is $\min\{m, k\}$. If $r = \min\{m, k\}$, we say that the matrix $A$ has maximal rank. For an $m \times k$ matrix where $m \leq k$, it has maximal rank if $r = m$. In this case, there is an $m \times m$ submatrix of $A$ consisting of $m$ linearly independent column vectors in $\mathbb{R}^m$, and the determinant of this submatrix is nonzero. Thus, the condition $\det D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) \neq 0$ in the formulation of the implicit function theorem can be replaced by the condition that the $m \times (m + n)$ matrix $D\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0)$ has maximal rank.
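As a sanity check (not part of the text), the matrix arithmetic for $D\mathbf{G}(1,1)$ in Example 5.11 can be reproduced in exact rational arithmetic; all names below are ours.

```python
# Check of DG(1,1) = -A^{-1} B in Example 5.11, where
# A = D_{(u,v)}F(1,1,1,1) and B = D_{(x,y)}F(1,1,1,1).
from fractions import Fraction as Fr

A = [[Fr(4), Fr(2)], [Fr(6), Fr(-3)]]   # D_{(u,v)}F(1,1,1,1)
B = [[Fr(8), Fr(9)], [Fr(5), Fr(-4)]]   # D_{(x,y)}F(1,1,1,1)

detA = A[0][0]*A[1][1] - A[0][1]*A[1][0]
assert detA == -24                      # invertible, as claimed
# inverse of a 2x2 matrix [[a, b], [c, d]] is (1/det) [[d, -b], [-c, a]]
Ainv = [[ A[1][1]/detA, -A[0][1]/detA],
        [-A[1][0]/detA,  A[0][0]/detA]]
DG = [[-(Ainv[i][0]*B[0][j] + Ainv[i][1]*B[1][j]) for j in range(2)]
      for i in range(2)]
assert DG == [[Fr(-17, 12), Fr(-19, 24)],
              [Fr(-7, 6),   Fr(-35, 12)]]
print(DG)
```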
Example 5.12
Consider the system
\[ 2x^2 y + 3xy^2 u + xyv + uv = 7, \qquad 4xu - 5yv + u^2 y + v^2 x = 1 \tag{5.8} \]
defined in Example 5.11. Show that there are neighbourhoods $U$ and $V$ of $(1, 1)$, and a continuously differentiable function $\mathbf{H} : V \to \mathbb{R}^2$ such that if $(x, y, u, v) \in U \times V$, then $(x, y, u, v)$ is a solution of the system of equations if and only if $x = H_1(u, v)$ and $y = H_2(u, v)$. Find $D\mathbf{H}(1, 1)$.

Solution
Define the function $\mathbf{F} : \mathbb{R}^4 \to \mathbb{R}^2$ as in the solution of Example 5.11. Since
\[ \det D_{(x,y)}\mathbf{F}(1, 1, 1, 1) = \det \begin{bmatrix} 8 & 9 \\ 5 & -4 \end{bmatrix} = -77 \neq 0, \]
the implicit function theorem implies that there are neighbourhoods $U$ and $V$ of $(1, 1)$, and a continuously differentiable function $\mathbf{H} : V \to \mathbb{R}^2$ such that if $(x, y, u, v) \in U \times V$, then $(x, y, u, v)$ is a solution of the system of equations (5.8) if and only if $x = H_1(u, v)$ and $y = H_2(u, v)$. Moreover,
\[ D\mathbf{H}(1, 1) = -D_{(x,y)}\mathbf{F}(1, 1, 1, 1)^{-1} D_{(u,v)}\mathbf{F}(1, 1, 1, 1) = \frac{1}{77} \begin{bmatrix} -4 & -9 \\ -5 & 8 \end{bmatrix} \begin{bmatrix} 4 & 2 \\ 6 & -3 \end{bmatrix} = \frac{1}{77} \begin{bmatrix} -70 & 19 \\ 28 & -34 \end{bmatrix}. \]

Remark 5.4
The function $\mathbf{G} : U \to \mathbb{R}^2$ in Example 5.11 and the function $\mathbf{H} : V \to \mathbb{R}^2$ in Example 5.12 are in fact inverses of each other. Notice that $D\mathbf{G}(1, 1)$ is invertible. By the inverse function theorem, there is a neighbourhood $U'$ of $(1, 1)$ such that $V' = \mathbf{G}(U')$ is open, and $\mathbf{G} : U' \to V'$ is a bijection with continuously differentiable inverse. By shrinking the sets $U$ and $V$, we can assume that $U = U'$ and $V = V'$. If $(x, y) \in U$ and $(u, v) \in V$, then $\mathbf{F}(x, y, u, v) = \mathbf{0}$ if and only if $(u, v) = \mathbf{G}(x, y)$, if and only if $(x, y) = \mathbf{H}(u, v)$. This implies that $\mathbf{G} : U \to V$ and $\mathbf{H} : V \to U$ are inverses of each other. In particular, $D\mathbf{H}(1, 1) = D\mathbf{G}(1, 1)^{-1}$.

At the end of this section, let us consider a geometric application of the implicit function theorem. First let us revisit the example where $f(x, y) = x^2 + y^2 - 1$. At each point $(x_0, y_0)$ such that $f(x_0, y_0) = 0$, we have $x_0^2 + y_0^2 = 1$. Hence, $\nabla f(x_0, y_0) = (2x_0, 2y_0) \neq \mathbf{0}$. Notice that the vector $\nabla f(x_0, y_0) = (2x_0, 2y_0)$
is normal to the circle $x^2 + y^2 = 1$ at the point $(x_0, y_0)$.

Figure 5.6: The tangent vector and normal vector at a point on the circle $x^2 + y^2 - 1 = 0$.

If $y_0 > 0$, let $U = (-1, 1) \times (0, \infty)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph of the function $g : (-1, 1) \to \mathbb{R}$, $g(x) = \sqrt{1 - x^2}$. If $y_0 < 0$, let $U = (-1, 1) \times (-\infty, 0)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph of the function $g : (-1, 1) \to \mathbb{R}$, $g(x) = -\sqrt{1 - x^2}$. If $y_0 = 0$, then $x_0 = 1$ or $-1$. In fact, we can consider more generally the cases where $x_0 > 0$ and $x_0 < 0$. If $x_0 > 0$, let $U = (0, \infty) \times (-1, 1)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph of the function $g : (-1, 1) \to \mathbb{R}$, $g(y) = \sqrt{1 - y^2}$. If $x_0 < 0$, let $U = (-\infty, 0) \times (-1, 1)$. Restricted to $U$, the set of points where $f(x, y) = 0$ is the graph of the function $g : (-1, 1) \to \mathbb{R}$, $g(y) = -\sqrt{1 - y^2}$.

Definition 5.4 Surfaces
Let $S$ be a subset of $\mathbb{R}^k$ for some positive integer $k$. We say that $S$ is an $n$-dimensional surface if for each $\mathbf{x}_0$ on $S$, there is an open subset $D$ of $\mathbb{R}^n$, an open neighbourhood $U$ of $\mathbf{x}_0$ in $\mathbb{R}^k$, and a one-to-one differentiable mapping $\mathbf{G} : D \to \mathbb{R}^k$ such that $\mathbf{G}(D) \subset S$, $\mathbf{G}(D) \cap U = S \cap U$, and $D\mathbf{G}(\mathbf{u})$ has rank $n$ at each $\mathbf{u} \in D$.

Example 5.13
We claim that the $n$-sphere
\[ S^n = \{(x_1, \ldots, x_n, x_{n+1}) \mid x_1^2 + \cdots + x_n^2 + x_{n+1}^2 = 1\} \]
is an $n$-dimensional surface. Let $(a_1, \ldots, a_n, a_{n+1})$ be a point on $S^n$. Then at least one of the components $a_1, \ldots, a_n, a_{n+1}$ is nonzero. Without loss of generality, assume that $a_{n+1} > 0$. Let
\[ D = \{(x_1, \ldots, x_n) \mid x_1^2 + \cdots + x_n^2 < 1\}, \qquad U = \{(x_1, \ldots, x_n, x_{n+1}) \mid x_{n+1} > 0\}, \]
and define the mapping $\mathbf{G} : D \to U$ by
\[ \mathbf{G}(x_1, \ldots, x_n) = \left(x_1, \ldots, x_n, \sqrt{1 - x_1^2 - \cdots - x_n^2}\right). \]
Then $\mathbf{G}$ is a differentiable mapping, $\mathbf{G}(D) \subset S^n$ and $\mathbf{G}(D) \cap U = S^n \cap U$. Now,
\[ D\mathbf{G}(x_1, \ldots, x_n) = \begin{bmatrix} I_n \\ \mathbf{v} \end{bmatrix}, \]
where $\mathbf{v} = \nabla G_{n+1}(x_1, \ldots, x_n)$. Since the first $n$ rows of $D\mathbf{G}(x_1, \ldots, x_n)$ form the $n \times n$ identity matrix, it has rank $n$. Thus, $S^n$ is an $n$-dimensional surface.

Generalizing Example 5.13, we find that a large class of surfaces is provided by graphs of differentiable functions.

Theorem 5.12
Let $D$ be an open subset of $\mathbb{R}^n$, and let $g : D \to \mathbb{R}$ be a differentiable mapping. Then the graph of $g$ given by
\[ G_g = \{(x_1, \ldots, x_n, x_{n+1}) \mid (x_1, \ldots, x_n) \in D,\; x_{n+1} = g(x_1, \ldots, x_n)\} \]
is an $n$-dimensional surface.

A hyperplane in $\mathbb{R}^{n+1}$ is the set of points in $\mathbb{R}^{n+1}$ which satisfy an equation of the form
\[ a_1 x_1 + \cdots + a_n x_n + a_{n+1} x_{n+1} = b, \]
where $\mathbf{a} = (a_1, \ldots, a_n, a_{n+1})$ is a nonzero vector in $\mathbb{R}^{n+1}$. By definition, if $\mathbf{u}$ and $\mathbf{v}$ are two points on the plane, then $\langle \mathbf{a}, \mathbf{u} - \mathbf{v} \rangle = 0$. This shows that $\mathbf{a}$ is a vector normal to the plane.

When $D$ is an open subset of $\mathbb{R}^n$ and $g : D \to \mathbb{R}$ is a differentiable mapping, the graph $G_g$ of $g$ is an $n$-dimensional surface. If $\mathbf{u} = (u_1, \ldots, u_n)$ is a point in $D$, so that $(\mathbf{u}, g(\mathbf{u}))$ is a point on $G_g$, we have seen that the equation of the tangent plane at the point $(\mathbf{u}, g(\mathbf{u}))$ is given by
\[ x_{n+1} = g(\mathbf{u}) + \sum_{i=1}^n \frac{\partial g}{\partial x_i}(\mathbf{u})(x_i - u_i). \]
The implicit function theorem gives the following.

Theorem 5.13
Let $O$ be an open subset of $\mathbb{R}^{n+1}$, and let $f : O \to \mathbb{R}$ be a continuously differentiable function. If $\mathbf{x}_0$ is a point in $O$ such that $f(\mathbf{x}_0) = 0$ and $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$, then there is a neighbourhood $U$ of $\mathbf{x}_0$ contained in $O$ such that, restricted to $U$, $f(\mathbf{x}) = 0$ is the graph of a continuously differentiable function $g : D \to \mathbb{R}$, and $\nabla f(\mathbf{x})$ is a vector normal to the tangent plane of the graph at the point $\mathbf{x}$.

Proof
Assume that $\mathbf{x}_0 = (a_1, \ldots, a_n, a_{n+1})$. Since $\nabla f(\mathbf{x}_0) \neq \mathbf{0}$, there is a $1 \leq k \leq n+1$ such that $\dfrac{\partial f}{\partial x_k}(\mathbf{x}_0) \neq 0$. Without loss of generality, assume that $k = n+1$. Given a point $\mathbf{x} = (x_1, \ldots, x_n, x_{n+1})$ in $\mathbb{R}^{n+1}$, let $\mathbf{u} = (x_1, \ldots, x_n)$ so that $\mathbf{x} = (\mathbf{u}, x_{n+1})$. By the implicit function theorem, there is a neighbourhood $D$ of $\mathbf{u}_0 = (a_1, \ldots, a_n)$, an $r > 0$, and a continuously differentiable function $g : D \to \mathbb{R}$ such that if $U = D \times (a_{n+1} - r, a_{n+1} + r)$ and $(\mathbf{u}, u_{n+1}) \in U$, then $f(\mathbf{u}, u_{n+1}) = 0$ if and only if $u_{n+1} = g(\mathbf{u})$. In other words, in the neighbourhood $U$ of $\mathbf{x}_0 = (\mathbf{u}_0, a_{n+1})$, $f(\mathbf{u}, u_{n+1}) = 0$ if and only if $(\mathbf{u}, u_{n+1})$ is a point on the graph of the function $g$. The equation of the tangent plane at the point $(\mathbf{u}, u_{n+1})$ is
\[ x_{n+1} - u_{n+1} = \sum_{i=1}^n \frac{\partial g}{\partial x_i}(\mathbf{u})(x_i - u_i). \]
By the chain rule,
\[ \frac{\partial g}{\partial x_i}(\mathbf{u}) = -\frac{\dfrac{\partial f}{\partial x_i}(\mathbf{u}, u_{n+1})}{\dfrac{\partial f}{\partial x_{n+1}}(\mathbf{u}, u_{n+1})}. \]
Hence, the equation of the tangent plane can be rewritten as
\[ \sum_{i=1}^{n+1} (x_i - u_i) \frac{\partial f}{\partial x_i}(\mathbf{u}, u_{n+1}) = 0. \]
This shows that $\nabla f(\mathbf{u}, u_{n+1})$ is a vector normal to the tangent plane.

Example 5.14
Find the equation of the tangent plane to the surface $x^2 + 4y^2 + 9z^2 = 49$ at the point $(6, 1, -1)$.

Solution
Let $f(x, y, z) = x^2 + 4y^2 + 9z^2$. Then $\nabla f(x, y, z) = (2x, 8y, 18z)$. It follows that $\nabla f(6, 1, -1) = 2(6, 4, -9)$. Hence, the equation of the tangent plane to the surface at $(6, 1, -1)$ is
\[ 6x + 4y - 9z = 36 + 4 + 9 = 49. \]

Exercises 5.3

Question 1
Consider the equation $4yz^2 + 3xz^3 - 11xyz = 14$. Show that in a neighbourhood of $(-1, 1, 2)$, this equation defines $z$ as a function of $(x, y)$. If this function is denoted as $z = g(x, y)$, find $\nabla g(-1, 1)$.

Question 2
Consider the system of equations
\[ 2xu^2 + vyz + 3uv = 2, \qquad 5x + 7yzu - v^2 = 1. \]
(a) Show that when $(x, y, z) = (-1, 1, 1)$, $(u, v) = (1, 1)$ is a solution of this system.
(b) Show that there are neighbourhoods $U$ and $V$ of $(-1, 1, 1)$ and $(1, 1)$, and a continuously differentiable function $\mathbf{G} : U \to \mathbb{R}^2$ such that, if $(x, y, z, u, v) \in U \times V$, then $(x, y, z, u, v)$ is a solution of the system of equations above if and only if $u = G_1(x, y, z)$ and $v = G_2(x, y, z)$.
(c) Find the values of $\dfrac{\partial G_1}{\partial x}(-1, 1, 1)$, $\dfrac{\partial G_2}{\partial x}(-1, 1, 1)$ and $\dfrac{\partial G_2}{\partial z}(-1, 1, 1)$.

Question 3
Let $O$ be an open subset of $\mathbb{R}^{2n}$, and let $\mathbf{F} : O \to \mathbb{R}^n$ be a continuously differentiable function.
Assume that $\mathbf{x}_0$ and $\mathbf{y}_0$ are points in $\mathbb{R}^n$ such that $(\mathbf{x}_0, \mathbf{y}_0)$ is a point in $O$, $\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0) = \mathbf{0}$, and $D_{\mathbf{x}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0)$ and $D_{\mathbf{y}}\mathbf{F}(\mathbf{x}_0, \mathbf{y}_0)$ are invertible. Show that there exist neighbourhoods $U$ and $V$ of $\mathbf{x}_0$ and $\mathbf{y}_0$, and a continuously differentiable bijective function $\mathbf{G} : U \to V$ such that, if $(\mathbf{x}, \mathbf{y})$ is in $U \times V$, then $\mathbf{F}(\mathbf{x}, \mathbf{y}) = \mathbf{0}$ if and only if $\mathbf{y} = \mathbf{G}(\mathbf{x})$.

5.4 Extrema Problems and the Method of Lagrange Multipliers

Optimization problems are very important in our daily life and in the mathematical sciences. Given a function $f : D \to \mathbb{R}$, we would like to know whether it has a maximum value or a minimum value. In Chapter 3, we have discussed the extreme value theorem, which asserts that a continuous function defined on a compact set must have maximum and minimum values. In Chapter 4, we showed that if a function $f : D \to \mathbb{R}$ has a (local) extremum at an interior point $\mathbf{x}_0$ of its domain $D$ and it is differentiable at $\mathbf{x}_0$, then $\mathbf{x}_0$ must be a stationary point. Namely, $\nabla f(\mathbf{x}_0) = \mathbf{0}$. Combining these various results, we can formulate a strategy for solving a special type of optimization problem. Let us first consider the following example.

Example 5.15
Let $K = \{(x, y) \mid x^2 + 4y^2 \leq 100\}$, and let $f : K \to \mathbb{R}$ be the function defined as $f(x, y) = x^2 + y^2$. Find the maximum and minimum values of $f : K \to \mathbb{R}$, and the points where these values appear.

Solution
Let $g : \mathbb{R}^2 \to \mathbb{R}$ be the function defined as $g(x, y) = x^2 + 4y^2 - 100$. It is a polynomial function. Hence, it is continuous. Since $K = g^{-1}((-\infty, 0])$ and $(-\infty, 0]$ is closed in $\mathbb{R}$, $K$ is a closed set. By a previous exercise,
\[ O = \operatorname{int} K = \{(x, y) \mid x^2 + 4y^2 < 100\} \quad \text{and} \quad C = \operatorname{bd} K = \{(x, y) \mid x^2 + 4y^2 = 100\}. \]
For any $(x, y) \in K$,
\[ \|(x, y)\|^2 = x^2 + y^2 \leq x^2 + 4y^2 \leq 100. \]
Therefore, $K$ is bounded. Since $K$ is closed and bounded, and the function $f : K \to \mathbb{R}$, $f(x, y) = x^2 + y^2$ is continuous, the extreme value theorem says that $f$ has maximum and minimum values.
These values appear either in O or on C. Since f : O → R is differentiable, if (x0, y0) is an extremizer of f : O → R, we must have ∇f(x0, y0) = (0, 0), which gives (x0, y0) = (0, 0). The other candidates of extremizers are on C. Therefore, we need to find the maximum and minimum values of f(x, y) = x2 + y2 subject to the constraint x2 + 4y2 = 100. From x2 + 4y2 = 100, we find that x2 = 100 − 4y2, and y can only take values in the interval [−5, 5]. Hence, we want to find the maximum and minimum values of h : [−5, 5] → R, h(y) = 100− 4y2 + y2 = 100− 3y2. When y = 0, h has maximum value 100, and when y = ±5, it has minimum value 100 − 3 × 25 = 25. Notice that when y = 0, x = ±10; while when y = ±5, x = 0. Hence, we have five candidates for the extremizers of f . Namely, u1 = (0, 0), u2 = (10, 0), u3 = (−10, 0), u4 = (0, 5) and u5 = (0,−5). The function values at these 5 points are f(u1) = 0, f(u2) = f(u3) = 100, f(u4) = f(u5) = 25. Therefore, the minimum value of f : K → R is 0, and the maximum value is 100. The minimum value appears at the point (0, 0) ∈ intK, while the maximum value appears at (±10, 0) ∈ bdK. Example 5.15 gives a typical scenario of the optimization problems that we want to study in this section. Chapter 5. The Inverse and Implicit Function Theorems 331 Figure 5.7: The extreme values of f(x, y) = x2 + y2 on the sets K = {(x, y) |x2 + 4y2 ≤ 100} and C = {(x, y) |x2 + 4y2 = 100}. Optimization Problem Let K be a compact subset of Rn with interior O, and let f : K → R be a function continuous on K, differentiable on O. We want to find the maximum and minimum values of f : K → R. (i) By the extreme value theorem, f : K → R has maximum and minimum values. (ii) Since K is closed, K is a disjoint union of its interior O and its boundary C. Since C is a subset of K, it is bounded. On the other hand, being the boundary of a set, C is closed. Therefore, C is compact. (iii) The extreme values of f can appear in O or on C. 
(iv) If x0 is an extremizer of f : K → R and it is in O, we must have ∇f(x0) = 0. Namely, x0 is a stationary point of f : O → R. (v) If x0 is an extremizer of f : K → R and it is not in O, it is an extremizer of f : C → R. (vi) Since C is compact, f : C → R has maximum and minimum values. Chapter 5. The Inverse and Implicit Function Theorems 332 Therefore, the steps to find the maximum and minimum values of f : K → R are as follows. Step 1 Find the stationary points of f : O → R. Step 2 Find the extremizers of f : C → R. Step 3 Compare the values of f at the stationary points of f : O → R and the extremizers of f : C → R to determine the extreme values of f : K → R. Of particular interest is when the boundary ofK can be expressed as g(x) = 0, where g : D → R is a continuously differentiable function defined on an open subset D of Rn. If f is also defined and differentiable on D, the problem of finding the extreme values of f : C → R becomes finding the extreme values of f : D → R subject to the constraint g(x) = 0. In Example 5.15, we have used g(x) = 0 to solve one of the variables in terms of the others and substitute into f to transform the optimization problem to a problem with fewer variables. However, this strategy can be quite complicated because it is often not possible to solve one variable in terms of the others explicitly from the constraint g(x) = 0. The method of Lagrange multipliers provides a way to solve constraint optimization problems without having to explicitly solve some variables in terms ofthe others. The validity of this method is justified by the implicit function theorem. Theorem 5.14 The Method of Lagrange Multiplier (One Constraint) Let O be an open subset of Rn+1 and let f : O → R and g : O → R be continuously differentiable functions defined on O. Consider the subset of O defined as C = {x ∈ O | g(x) = 0} . 
If x0 is an extremizer of the function f : C → R and ∇g(x0) ̸= 0, then there is a constant λ, known as the Lagrange multiplier, such that ∇f(x0) = λ∇g(x0). Chapter 5. The Inverse and Implicit Function Theorems 333 Proof Without loss of generality, assume that x0 is a maximizer of f : C → R. Namely, f(x) ≤ f(x0) for all x ∈ C. (5.9) Given that ∇g(x0) ̸= 0, there exists a 1 ≤ k ≤ n + 1 such that ∂g ∂xk (x0) ̸= 0. Without loss of generality, assume that k = n + 1. Let x0 = (a1, . . . , an, an+1). Given a point x = (x1, . . . , xn, xn+1) in Rn+1, let u = (x1, . . . , xn) so that x = (u, xn+1). By implicit function theorem, there is a neighbourhood D of u0 = (a1, . . . , an), an r > 0, and a continuously differentiable function h : D → R such that for (u, xn+1) ∈ D × (an+1 − r, an+1 + r), g(u, xn+1) = 0 if and only if xn+1 = h(u). Consider the function F : D → R defined as F (u) = f(u, h(u)). By (5.9), we find that F (u0) ≥ F (u) for all u ∈ D. In other words, u0 is a maximizer of the function F : D → R. Since u0 is an interior point of D and F : D → R is continuously differentiable, ∇F (u0) = 0. Since F (u) = f(u, h(u)), we find that for 1 ≤ i ≤ n, ∂F ∂xi (u0) = ∂f ∂xi (u0, an+1) + ∂f ∂xn+1 (u0, an+1) ∂h ∂xi (u0) = 0. (5.10) On the other hand, applying chain rule to g(u, h(u)) = 0 and set u = u0, we find that ∂g ∂xi (u0, an+1) + ∂g ∂xn+1 (u0, an+1) ∂h ∂xi (u0) = 0 for 1 ≤ i ≤ n. (5.11) By assumption, ∂g ∂xn+1 (x0) ̸= 0. Let λ = ∂f ∂xn+1 (x0) ∂g ∂xn+1 (x0) . Chapter 5. The Inverse and Implicit Function Theorems 334 Then ∂f ∂xn+1 (x0) = λ ∂g ∂xn+1 (x0). (5.12) Eqs. (5.10) and (5.11) show that for 1 ≤ i ≤ n, ∂f ∂xi (x0) = −λ ∂g ∂xn+1 (x0) ∂h ∂xi (u0) = λ ∂g ∂xi (x0). (5.13) Eqs. (5.12) and (5.13) together imply that ∇f(x0) = λ∇g(x0). This completes the proof of the theorem. 
Remark 5.5 Theorem 5.14 says that if x0 is an extremizer of the constraint optimization problem max /min f(x) subject to g(x) = 0, then the gradient of f at x0 should be parallel to the gradient of g at x0 if the latter is nonzero. One can refer to Figure 5.7 for an illustration. Recall that the gradient of f gives the direction where f changes most rapidly, while the gradient of g here represents the normal vector to the curve g(x) = 0. Using the method of Lagrange multiplier, there are n + 2 variables x1, . . . , xn+1 and λ to be solved. The equation ∇f(x) = λ∇g(x) gives n + 1 equations, while the equation g(x) = 0 gives one. Therefore, we need to solve n+ 2 variables from n+ 2 equations. Example 5.16 Let us solve the constraint optimization problem that appears in Example 5.15 using the Lagrange multiplier method. Let f : R2 → R and g : R2 → R be respectively the functions f(x, y) = x2 + y2 and g(x, y) = x2+4y2− 100. They are both continuously differentiable. We want to find the maximum and minimum values of the function f(x, y) subject to the constraint g(x, y) = 0. Notice that ∇g(x, y) = (2x, 8y) is the zero vector if and only if (x, y) = (0, 0), but (0, 0) is not on the curve g(x, y) = 0. Hence, for any (x, y) satisfying g(x, y) = 0, ∇g(x, y) ̸= 0. Chapter 5. The Inverse and Implicit Function Theorems 335 By the method of Lagrange multiplier, we need to find (x, y) satisfying ∇f(x, y) = λ∇g(x, y) and g(x, y) = 0. Therefore, 2x = 2λx, 2y = 8λy. This gives x(1− λ) = 0, y(1− 4λ) = 0. The first equation says that either x = 0 or λ = 1. If x = 0, from x2 + 4y2 = 100, we must have y = ±5. If λ = 1, then y(1− 4λ) = 0 implies that y = 0. From x2 +4y2 = 100, we then obtain x = ±10. Hence, we find that the candidates for the extremizers are (±10, 0) and (0,±5). Since f(±10, 0) = 100 and f(0,±5) = 25, we conclude that subject to x2+4y2 = 100, the maximum value of f(x, y) = x2+ y2 is 100, and the minimum value of f(x, y) = x2 + y2 is 25. 
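The candidates found in Example 5.16 can also be double-checked computationally. The following sketch (ours, not part of the text) verifies, for each candidate, the constraint $g(x, y) = 0$ and the multiplier equation $\nabla f = \lambda \nabla g$; the multiplier values $\lambda = 1$ and $\lambda = \tfrac{1}{4}$ are read off from the equations $2x = 2\lambda x$ and $2y = 8\lambda y$.

```python
# Verify the Lagrange candidates of Example 5.16 for
# f(x, y) = x^2 + y^2 subject to g(x, y) = x^2 + 4y^2 - 100 = 0.
candidates = {(10, 0): 1.0, (-10, 0): 1.0, (0, 5): 0.25, (0, -5): 0.25}
for (x, y), lam in candidates.items():
    assert x*x + 4*y*y == 100                        # g(x, y) = 0
    assert 2*x == lam * (2*x) and 2*y == lam * (8*y) # grad f = lam * grad g

values = {p: p[0]**2 + p[1]**2 for p in candidates}
assert max(values.values()) == 100 and min(values.values()) == 25
print(values)
```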
Example 5.17
Use the Lagrange multiplier method to find the maximum and minimum values of the function $f(x, y, z) = 8x + 24y + 27z$ on the set $S = \{(x, y, z) \mid x^2 + 4y^2 + 9z^2 = 289\}$, and the points where each of them appears.

Solution
Let $g : \mathbb{R}^3 \to \mathbb{R}$ be the function $g(x, y, z) = x^2 + 4y^2 + 9z^2 - 289$. The functions $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = 8x + 24y + 27z$ and $g : \mathbb{R}^3 \to \mathbb{R}$ are both continuously differentiable. Notice that $\nabla g(x, y, z) = (2x, 8y, 18z) = \mathbf{0}$ if and only if $(x, y, z) = \mathbf{0}$, and $\mathbf{0}$ does not lie on $S$. By the Lagrange multiplier method, to find the maximum and minimum values of $f : S \to \mathbb{R}$, we need to solve the equations $\nabla f(x, y, z) = \lambda \nabla g(x, y, z)$ and $g(x, y, z) = 0$. These give
\[ 8 = 2\lambda x, \qquad 24 = 8\lambda y, \qquad 27 = 18\lambda z, \qquad x^2 + 4y^2 + 9z^2 = 289. \]
To satisfy the first three equations, none of $\lambda$, $x$, $y$ and $z$ can be zero. We find that
\[ x = \frac{4}{\lambda}, \qquad y = \frac{3}{\lambda}, \qquad z = \frac{3}{2\lambda}. \]
Substituting into the last equation, we have
\[ \frac{64 + 144 + 81}{4\lambda^2} = 289. \]
This gives $4\lambda^2 = 1$. Hence, $\lambda = \pm\dfrac{1}{2}$. When $\lambda = \dfrac{1}{2}$, $(x, y, z) = (8, 6, 3)$. When $\lambda = -\dfrac{1}{2}$, $(x, y, z) = (-8, -6, -3)$. These are the two candidates for the extremizers of $f : S \to \mathbb{R}$. Since $f(8, 6, 3) = 289$ and $f(-8, -6, -3) = -289$, we find that the maximum and minimum values of $f : S \to \mathbb{R}$ are $289$ and $-289$ respectively; the maximum value appears at $(8, 6, 3)$, and the minimum value appears at $(-8, -6, -3)$.

Now we consider more general constraint optimization problems, which can have more than one constraint.

Theorem 5.15 The Method of Lagrange Multiplier (General)
Let $O$ be an open subset of $\mathbb{R}^{m+n}$ and let $f : O \to \mathbb{R}$ and $\mathbf{G} : O \to \mathbb{R}^m$ be continuously differentiable functions defined on $O$. Consider the subset of $O$ defined as
\[ C = \{\mathbf{x} \in O \mid \mathbf{G}(\mathbf{x}) = \mathbf{0}\}. \]
If $\mathbf{x}_0$ is an extremizer of the function $f : C \to \mathbb{R}$ and the matrix $D\mathbf{G}(\mathbf{x}_0)$ has (maximal) rank $m$, then there are constants $\lambda_1, \ldots, \lambda_m$, known as the Lagrange multipliers, such that
\[ \nabla f(\mathbf{x}_0) = \sum_{i=1}^m \lambda_i \nabla G_i(\mathbf{x}_0). \]
Proof Without loss of generality, assume that x0 is a maximizer of f : C → R. Namely, f(x) ≤ f(x0) for all x ∈ C. (5.14) Given that the matrix DG(x0) has rank m, m of the column vectors are linearly independent. Without loss of generality, assume that the column vectors in the last m columns are linearly independent. Write a point x in Rm+n as x = (u,v), where u = (u1, . . . , un) is in Rn and v = (v1, . . . , vm) is in Rm. By our assumption, DvG(u0,v0) is invertible. By implicit function theorem, there is a neighbourhood D of u0, a neighbourhood V of v0, and a continuously differentiable function H : D → Rm such that for (u,v) ∈ D × V , G(u,v) = 0 if and only if v = H(u). Consider the function F : D → R defined as F (u) = f(u,H(u)). By (5.14), we find that F (u0) ≥ F (u) for all u ∈ D. Chapter 5. The Inverse and Implicit Function Theorems 338 In other words, u0 is a maximizer of the function F : D → R. Since u0 is an interior point of D and F : D → R is continuously differentiable, ∇F (u0) = 0. Since F (u) = f(u,H(u)), we find that ∇F (u0) = Duf(u0,v0) +Dvf(u0,v0)DH(u0) = 0. (5.15) On the other hand, applying chain rule to G(u,H(u)) = 0 and set u = u0, we find that DuG(u0,v0) +DvG(u0,v0)DH(u0) = 0. (5.16) Take [ λ1 λ2 · · · λm ] = λ = Dvf(x0)DvG(x0) −1. Then Dvf(x0) = λDvG(x0). (5.17) Eqs. (5.15) and (5.16) show that Duf(x0) = −λDvG(x0)DH(u0) = λDuG(x0). (5.18) Eqs. (5.17) and (5.18) together imply that ∇f(x0) = λDG(x0) = m∑ i=1 λi∇Gi(x0). This completes the proof of the theorem. In the general constraint optimization problem proposed in Theorem 5.15, there are n + 2m variables u1, . . . , un, v1, . . . , vm and λ1, . . . , λm to be solved. The components of ∇f(x) = m∑ i=1 λi∇Gi(x) give n + m equations, while the components of G(x) = 0 give m equations. Hence, we have to solve n+ 2m variablesfrom n+ 2m equations. Let us look at an example. Chapter 5. 
Example 5.18
Let $K$ be the subset of $\mathbb{R}^3$ given by
\[ K = \{(x, y, z) \mid x^2 + y^2 \leq 4,\; x + y + z = 1\}. \]
Find the maximum and minimum values of the function $f : K \to \mathbb{R}$, $f(x, y, z) = x + 3y + z$.

Solution
Notice that $K$ is the intersection of the two closed sets $K_1 = \{(x, y, z) \mid x^2 + y^2 \leq 4\}$ and $K_2 = \{(x, y, z) \mid x + y + z = 1\}$. Hence, $K$ is a closed set. If $(x, y, z)$ is in $K$, then $x^2 + y^2 \leq 4$. Thus, $|x| \leq 2$, $|y| \leq 2$, and hence $|z| \leq 1 + |x| + |y| \leq 5$. This shows that $K$ is bounded. Since $K$ is closed and bounded and $f : K \to \mathbb{R}$ is continuous, $f : K \to \mathbb{R}$ has maximum and minimum values.
Let
\[ D = \{(x, y, z) \mid x^2 + y^2 < 4,\; x + y + z = 1\}, \qquad C = \{(x, y, z) \mid x^2 + y^2 = 4,\; x + y + z = 1\}. \]
Then $K = C \cup D$. We can consider the extremizers of $f : D \to \mathbb{R}$ and $f : C \to \mathbb{R}$ separately.
To find the extremizers of $f : D \to \mathbb{R}$, we can regard this as a constraint optimization problem where we want to find the extreme values of $f : O \to \mathbb{R}$, $f(x, y, z) = x + 3y + z$ on $O = \{(x, y, z) \mid x^2 + y^2 < 4\}$, subject to the constraint $g(x, y, z) = 0$, where $g : O \to \mathbb{R}$ is the function $g(x, y, z) = x + y + z - 1$. Now $\nabla g(x, y, z) = (1, 1, 1) \neq \mathbf{0}$. Hence, at an extremizer, we must have $\nabla f(x, y, z) = \lambda \nabla g(x, y, z)$, which gives $(1, 3, 1) = \lambda(1, 1, 1)$. This says that the two vectors $(1, 3, 1)$ and $(1, 1, 1)$ must be parallel, which is a contradiction. Hence, $f : D \to \mathbb{R}$ does not have extremizers.
Now, to find the extremizers of $f : C \to \mathbb{R}$, we can consider it as finding the extreme values of $f : \mathbb{R}^3 \to \mathbb{R}$, $f(x, y, z) = x + 3y + z$, subject to $\mathbf{G}(x, y, z) = \mathbf{0}$, where $\mathbf{G}(x, y, z) = (x^2 + y^2 - 4,\; x + y + z - 1)$. Now
\[ D\mathbf{G}(x, y, z) = \begin{bmatrix} 2x & 2y & 0 \\ 1 & 1 & 1 \end{bmatrix}. \]
This matrix has rank less than $2$ if and only if $(2x, 2y, 0)$ is parallel to $(1, 1, 1)$, which forces $(2x, 2y, 0) = (0, 0, 0)$, i.e. $x = y = 0$. But no point with $x = y = 0$ lies on $C$, since then $x^2 + y^2 = 0 \neq 4$. Therefore, $D\mathbf{G}(x, y, z)$ has maximal rank for every $(x, y, z) \in C$.
Using the Lagrange multiplier method, to solve for the extremizers of $f : C \to \mathbb{R}$, we need to solve the system
\[ \nabla f(x, y, z) = \lambda \nabla G_1(x, y, z) + \mu \nabla G_2(x, y, z), \qquad \mathbf{G}(x, y, z) = \mathbf{0}. \]
This gives
\[ 1 = 2\lambda x + \mu, \qquad 3 = 2\lambda y + \mu, \qquad 1 = \mu, \qquad x^2 + y^2 = 4, \qquad x + y + z = 1. \]
From $\mu = 1$, we have $2\lambda x = 0$ and $2\lambda y = 2$. The latter implies that $\lambda \neq 0$. Hence, we must have $x = 0$. Then $x^2 + y^2 = 4$ gives $y = \pm 2$. When $(x, y) = (0, 2)$, $z = -1$. When $(x, y) = (0, -2)$, $z = 3$. Hence, we only have two candidates for extremizers, which are $(0, 2, -1)$ and $(0, -2, 3)$. Since
\[ f(0, 2, -1) = 5, \qquad f(0, -2, 3) = -3, \]
we find that $f : K \to \mathbb{R}$ has maximum value $5$ at the point $(0, 2, -1)$, and minimum value $-3$ at the point $(0, -2, 3)$.

Exercises 5.4

Question 1
Find the extreme values of the function $f(x, y, z) = 4x^2 + y^2 + yz + z^2$ on the set $S = \{(x, y, z) \mid 2x^2 + y^2 + z^2 \leq 8\}$.

Question 2
Find the points in the set $S = \{(x, y) \mid 4x^2 + y^2 \leq 36,\; x^2 + 4y^2 \geq 4\}$ that are closest to and farthest from the point $(1, 0)$.

Question 3
Use the Lagrange multiplier method to find the maximum and minimum values of the function $f(x, y, z) = x + 2y - z$ on the set $S = \{(x, y, z) \mid x^2 + y^2 + 4z^2 \leq 84\}$, and the points where each of them appears.

Question 4
Find the extreme values of the function $f(x, y, z) = x$ on the set $S = \{(x, y, z) \mid x^2 = y^2 + z^2,\; 7x + 3y + 4z = 60\}$.

Question 5
Let $K$ be the subset of $\mathbb{R}^3$ given by $K = \{(x, y, z) \mid 4x^2 + z^2 \leq 68,\; y + z = 12\}$. Find the maximum and minimum values of the function $f : K \to \mathbb{R}$, $f(x, y, z) = x + 2y$.

Question 6
Let $A$ be an $n \times n$ symmetric matrix, and let $Q_A : \mathbb{R}^n \to \mathbb{R}$ be the quadratic form $Q_A(\mathbf{x}) = \mathbf{x}^T A \mathbf{x}$ defined by $A$. Show that the minimum and maximum values of $Q_A : S^{n-1} \to \mathbb{R}$ on the unit sphere $S^{n-1}$ are the smallest and largest eigenvalues of $A$.
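Returning to Example 5.18 for a moment: the two candidate points found there can be double-checked numerically. The following sketch (ours, not part of the text) confirms the constraint equations, the multiplier equations with $\mu = 1$ and $\lambda = \pm\tfrac{1}{2}$ (read off from $3 = 2\lambda y + 1$), and the extreme values.

```python
# Check of Example 5.18: f(x, y, z) = x + 3y + z on
# C = {x^2 + y^2 = 4, x + y + z = 1}.
candidates = {
    (0, 2, -1): 0.5,    # lam solving 3 = 2*lam*y + 1 with y = 2
    (0, -2, 3): -0.5,   # lam solving 3 = 2*lam*y + 1 with y = -2
}
for (x, y, z), lam in candidates.items():
    assert x*x + y*y == 4 and x + y + z == 1    # (x, y, z) lies on C
    grad_f = (1, 3, 1)
    grad_G1 = (2*x, 2*y, 0)
    grad_G2 = (1, 1, 1)
    mu = 1
    for gf, g1, g2 in zip(grad_f, grad_G1, grad_G2):
        assert gf == lam * g1 + mu * g2         # multiplier equations

values = {p: p[0] + 3*p[1] + p[2] for p in candidates}
assert max(values.values()) == 5 and min(values.values()) == -3
print(values)
```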
Chapter 6 Multiple Integrals

For a single variable function, we have discussed the Riemann integrability of a function $f : [a, b] \to \mathbb{R}$ defined on a compact interval $[a, b]$. In this chapter, we consider the theory of Riemann integrals for multivariable functions. For a function $\mathbf{F} : D \to \mathbb{R}^m$ that takes values in $\mathbb{R}^m$ with $m \geq 2$, we define the integral componentwise. Namely, we say that the function $\mathbf{F} : D \to \mathbb{R}^m$ is Riemann integrable if and only if each of the component functions $F_j : D \to \mathbb{R}$, $1 \leq j \leq m$, is Riemann integrable, and we define
\[ \int_D \mathbf{F} = \left( \int_D F_1, \int_D F_2, \ldots, \int_D F_m \right). \]
Thus, in this chapter, we will only discuss the theory of integration for functions $f : D \to \mathbb{R}$ that take values in $\mathbb{R}$.
A direct generalization of a compact interval $[a, b]$ to $\mathbb{R}^n$ is a product of compact intervals $I = \prod_{i=1}^n [a_i, b_i]$, which is a closed rectangle. In this chapter, when we say $I$ is a rectangle, it means $I$ can be written as $\prod_{i=1}^n [a_i, b_i]$ with $a_i < b_i$ for all $1 \leq i \leq n$. The edges of $I = \prod_{i=1}^n [a_i, b_i]$ are $[a_1, b_1], [a_2, b_2], \ldots, [a_n, b_n]$.
We first discuss the integration theory of functions defined on closed rectangles of the form $\prod_{i=1}^n [a_i, b_i]$. For applications, we need to consider functions defined on other subsets $D$ of $\mathbb{R}^n$. One of the most useful theoretical tools for evaluating single integrals is the fundamental theorem of calculus. To apply this tool to multiple integrals, we need to consider iterated integrals. Another useful tool is the change of variables formula. For multivariable functions, the change of variables theorem is much more complicated. Nevertheless, we will discuss these in this chapter.

6.1 Riemann Integrals

In this section, we define the Riemann integral of a function $f : D \to \mathbb{R}$ defined on a subset $D$ of $\mathbb{R}^n$. We first consider the case where $D = \prod_{i=1}^n [a_i, b_i]$. Let us first consider partitions. We say that $P = \{x_0, x_1, \ldots$
$\ldots, x_k\}$ is a partition of the interval $[a, b]$ if
\[ a = x_0 < x_1 < \cdots < x_{k-1} < x_k = b. \]
It divides $[a, b]$ into $k$ subintervals $J_1, \ldots, J_k$, where $J_i = [x_{i-1}, x_i]$.

Definition 6.1 Partitions
A partition $P$ of a closed rectangle $I = \prod_{i=1}^n [a_i, b_i]$ is achieved by having a partition $P_i$ of $[a_i, b_i]$ for each $1 \leq i \leq n$. We write $P = (P_1, P_2, \ldots, P_n)$ for such a partition. The partition $P$ divides the rectangle $I$ into a collection $J_P$ of rectangles, any two of which have disjoint interiors. A closed rectangle $J$ in $J_P$ can be written as $J = J_1 \times J_2 \times \cdots \times J_n$, where $J_i$, $1 \leq i \leq n$, is a subinterval in the partition $P_i$. If the partition $P_i$ divides $[a_i, b_i]$ into $k_i$ subintervals, then the partition $P = (P_1, \ldots, P_n)$ divides the rectangle $I = \prod_{i=1}^n [a_i, b_i]$ into $|J_P| = k_1 k_2 \cdots k_n$ rectangles.

Example 6.1
Consider the rectangle $I = [-2, 9] \times [1, 6]$. Let $P_1 = \{-2, 0, 4, 9\}$ and $P_2 = \{1, 3, 6\}$. The partition $P_1$ divides the interval $I_1 = [-2, 9]$ into the three subintervals $[-2, 0]$, $[0, 4]$ and $[4, 9]$. The partition $P_2$ divides the interval $I_2 = [1, 6]$ into the two subintervals $[1, 3]$ and $[3, 6]$. Therefore, the partition $P = (P_1, P_2)$ divides the rectangle $I$ into the following six rectangles:
\[ [-2, 0] \times [1, 3], \quad [0, 4] \times [1, 3], \quad [4, 9] \times [1, 3], \quad [-2, 0] \times [3, 6], \quad [0, 4] \times [3, 6], \quad [4, 9] \times [3, 6]. \]

Figure 6.1: A partition of the rectangle $[-2, 9] \times [1, 6]$ given in Example 6.1.

Definition 6.2 Regular and Uniformly Regular Partitions
Let $I = \prod_{i=1}^n [a_i, b_i]$ be a rectangle in $\mathbb{R}^n$. We say that $P = (P_1, \ldots, P_n)$ is a regular partition of $I$ if for each $1 \leq i \leq n$, $P_i$ is a regular partition of $[a_i, b_i]$ into $k_i$ intervals. We say that $P$ is a uniformly regular partition of $I$ into $k^n$ rectangles if for each $1 \leq i \leq n$, $P_i$ is a regular partition of $[a_i, b_i]$ into $k$ intervals.

Example 6.2
Consider the rectangle $I = [-2, 7] \times [-4, 8]$.
(a) The partition $P = (P_1, P_2)$ where $P_1 = \{-2, 1, 4, 7\}$ and $P_2 = \{-4, -1, 2, 5, 8\}$ is a regular partition of $I$.
(b) The partition $P = (P_1, P_2)$ where $P_1 = \{-2, 1, 4, 7\}$ and $P_2 = \{-4, 0, 4, 8\}$ is a uniformly regular partition of $I$ into $3^2 = 9$ rectangles.

Figure 6.2: A regular and a uniformly regular partition of $[-2, 7] \times [-4, 8]$ discussed in Example 6.2.

The length of an interval $[a, b]$ is $b - a$. The area of a rectangle $[a, b] \times [c, d]$ is $(b - a) \times (d - c)$. In general, we define the volume of a closed rectangle of the form $I = \prod_{i=1}^n [a_i, b_i]$ in $\mathbb{R}^n$ as follows.

Definition 6.3 Volume of a Rectangle
The volume of the closed rectangle $I = \prod_{i=1}^n [a_i, b_i]$ is defined as the product of the lengths of all its edges. Namely,
\[ \operatorname{vol}(I) = \prod_{i=1}^n (b_i - a_i). \]

Example 6.3
The volume of the rectangle $I = [-2, 9] \times [1, 6]$ is $\operatorname{vol}(I) = 11 \times 5 = 55$.

When $P = \{x_0, x_1, \ldots, x_k\}$ is a partition of $[a, b]$, it divides $[a, b]$ into $k$ subintervals $J_1, \ldots, J_k$, where $J_i = [x_{i-1}, x_i]$. Notice that
\[ \sum_{i=1}^k \operatorname{vol}(J_i) = \sum_{i=1}^k (x_i - x_{i-1}) = b - a. \]
Assume that $P = (P_1, \ldots, P_n)$ is a partition of the rectangle $I = \prod_{i=1}^n [a_i, b_i]$ in $\mathbb{R}^n$. Then for $1 \leq i \leq n$, $P_i$ is a partition of $[a_i, b_i]$. If $P_i$ divides $[a_i, b_i]$ into the $k_i$ subintervals $J_{i,1}, J_{i,2}, \ldots, J_{i,k_i}$, then the collection of rectangles in the partition $P$ is
\[ J_P = \{J_{1,m_1} \times \cdots \times J_{n,m_n} \mid 1 \leq m_i \leq k_i \text{ for } 1 \leq i \leq n\}. \]
Notice that
\[ \operatorname{vol}(J_{1,m_1} \times \cdots \times J_{n,m_n}) = \operatorname{vol}(J_{1,m_1}) \times \cdots \times \operatorname{vol}(J_{n,m_n}). \]
From this, we obtain the sum of volumes formula:
\[ \sum_{J \in J_P} \operatorname{vol}(J) = \sum_{m_n=1}^{k_n} \cdots \sum_{m_1=1}^{k_1} \operatorname{vol}(J_{1,m_1}) \times \cdots \times \operatorname{vol}(J_{n,m_n}) = \left[\sum_{m_1=1}^{k_1} \operatorname{vol}(J_{1,m_1})\right] \times \cdots \times \left[\sum_{m_n=1}^{k_n} \operatorname{vol}(J_{n,m_n})\right] = (b_1 - a_1) \times \cdots \times (b_n - a_n) = \operatorname{vol}(I). \]

Proposition 6.1
Let $P$ be a partition of $I = \prod_{i=1}^n [a_i, b_i]$. Then the sum of the volumes of the rectangles $J$ in the partition $P$ is equal to the volume of the rectangle $I$.

One of the motivations to define the integral $\int_I f$ for a nonnegative function $f : I \to \mathbb{R}$ is to find the volume bounded between the graph of $f$ and the rectangle $I$ in $\mathbb{R}^{n+1}$.
To find the volume, we partition I into small rectangles, pick a point ξJ in each of these rectangles J, and approximate the function on J as a constant given by the value f(ξJ). The volume between the rectangle J and the graph of f over J is then approximated by f(ξJ) vol (J). This leads us to the concept of Riemann sums.

If P is a partition of I = n∏ i=1 [ai, bi], we say that A is a set of intermediate points for the partition P if A = {ξJ | J ∈ JP} is a subset of I indexed by JP, such that ξJ ∈ J for each J ∈ JP.

Definition 6.4 Riemann Sums

Let I = n∏ i=1 [ai, bi], and let f : I → R be a function defined on I. Given a partition P of I and a set A = {ξJ | J ∈ JP} of intermediate points for the partition P, the Riemann sum of f with respect to the partition P and the set of intermediate points A = {ξJ} is the sum R(f,P, A) = ∑ J∈JP f(ξJ) vol (J).

Example 6.4

Let I = [−2, 9] × [1, 6], and let P = (P1, P2) be the partition of I with P1 = {−2, 0, 4, 9} and P2 = {1, 3, 6}. Let f : I → R be the function defined as f(x, y) = x^2 + y. Consider a set of intermediate points A as follows.

J | ξJ | f(ξJ) | vol (J)
[−2, 0] × [1, 3] | (−1, 1) | 2 | 4
[−2, 0] × [3, 6] | (0, 3) | 3 | 6
[0, 4] × [1, 3] | (1, 1) | 2 | 8
[0, 4] × [3, 6] | (2, 4) | 8 | 12
[4, 9] × [1, 3] | (4, 2) | 18 | 10
[4, 9] × [3, 6] | (9, 3) | 84 | 15

The Riemann sum R(f,P, A) is equal to 2× 4 + 3× 6 + 2× 8 + 8× 12 + 18× 10 + 84× 15 = 1578.

Example 6.5

If f : I → R is the constant function f(x) = c, then for any partition P of I and any set of intermediate points A = {ξJ}, R(f,P, A) = c vol (I). When c > 0, this is the volume of the rectangle I × [0, c] in Rn+1.

As in the single variable case, Darboux sums provide bounds for Riemann sums.

Definition 6.5 Darboux Sums

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. Given a partition P of I, let JP be the collection of rectangles in the partition P.
For each J in JP, let mJ = inf {f(x) | x ∈ J} and MJ = sup {f(x) | x ∈ J}. The Darboux lower sum L(f,P) and the Darboux upper sum U(f,P) are defined as L(f,P) = ∑ J∈JP mJ vol (J) and U(f,P) = ∑ J∈JP MJ vol (J).

Example 6.6

If f : I → R is the constant function f(x) = c, then L(f,P) = c vol (I) = U(f,P) for any partition P of I.

Example 6.7

Consider the function f : I → R, f(x, y) = x^2 + y defined in Example 6.4, where I = [−2, 9] × [1, 6]. For the partition P = (P1, P2) with P1 = {−2, 0, 4, 9} and P2 = {1, 3, 6}, we have the following.

J | mJ | MJ | vol (J)
[−2, 0] × [1, 3] | 0^2 + 1 = 1 | (−2)^2 + 3 = 7 | 4
[−2, 0] × [3, 6] | 0^2 + 3 = 3 | (−2)^2 + 6 = 10 | 6
[0, 4] × [1, 3] | 0^2 + 1 = 1 | 4^2 + 3 = 19 | 8
[0, 4] × [3, 6] | 0^2 + 3 = 3 | 4^2 + 6 = 22 | 12
[4, 9] × [1, 3] | 4^2 + 1 = 17 | 9^2 + 3 = 84 | 10
[4, 9] × [3, 6] | 4^2 + 3 = 19 | 9^2 + 6 = 87 | 15

Therefore, the Darboux lower sum is L(f,P) = 1× 4 + 3× 6 + 1× 8 + 3× 12 + 17× 10 + 19× 15 = 521, while the Darboux upper sum is U(f,P) = 7× 4 + 10× 6 + 19× 8 + 22× 12 + 84× 10 + 87× 15 = 2649.

Notice that we can only define Darboux sums if the function f : I → R is bounded. This means that there are constants m and M such that m ≤ f(x) ≤ M for all x ∈ I. If P is a partition of the rectangle I, J is a rectangle in the partition P, and ξJ is a point in J, then m ≤ mJ ≤ f(ξJ) ≤ MJ ≤ M. Multiplying throughout by vol (J) and summing over J ∈ JP, we obtain the following.

Proposition 6.2

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. If m ≤ f(x) ≤ M for all x ∈ I, then for any partition P of I, and for any choice of intermediate points A = {ξJ} for the partition P, we have m vol (I) ≤ L(f,P) ≤ R(f,P, A) ≤ U(f,P) ≤ M vol (I).

To study the behaviour of the Darboux sums when we modify the partitions, we first extend the concept of refinement of a partition to rectangles in Rn. Recall that if P and P ∗ are partitions of the interval [a, b], P ∗ is a refinement of P if each partition point of P is also a partition point of P ∗.
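The numbers in Examples 6.4 and 6.7 can be checked by a short computation. The Python sketch below is illustrative and not part of the text; the partition, the intermediate points and the function f(x, y) = x^2 + y are transcribed from the examples. Since f splits as a sum of one-variable functions, the exact infimum and supremum on each subrectangle are easy to compute (the infimum of x^2 on [a, b] is 0 when the interval contains 0).

```python
from itertools import product

# Data transcribed from Examples 6.4 and 6.7: I = [-2, 9] x [1, 6].
P1 = [-2, 0, 4, 9]
P2 = [1, 3, 6]

def f(x, y):
    return x * x + y

def subintervals(P):
    return [(P[i], P[i + 1]) for i in range(len(P) - 1)]

def sq_range(a, b):
    """Exact min and max of x^2 on [a, b]."""
    lo = 0 if a <= 0 <= b else min(a * a, b * b)
    return lo, max(a * a, b * b)

L = U = 0
for (a, b), (c, d) in product(subintervals(P1), subintervals(P2)):
    vol = (b - a) * (d - c)
    mx, Mx = sq_range(a, b)
    # f(x, y) = x^2 + y splits as a sum, so on J = [a,b] x [c,d]:
    # m_J = (min of x^2) + c  and  M_J = (max of x^2) + d.
    L += (mx + c) * vol
    U += (Mx + d) * vol

# Riemann sum with the intermediate points chosen in Example 6.4.
A = {((-2, 0), (1, 3)): (-1, 1), ((-2, 0), (3, 6)): (0, 3),
     ((0, 4), (1, 3)): (1, 1),   ((0, 4), (3, 6)): (2, 4),
     ((4, 9), (1, 3)): (4, 2),   ((4, 9), (3, 6)): (9, 3)}
R = sum(f(*xi) * (J1[1] - J1[0]) * (J2[1] - J2[0])
        for (J1, J2), xi in A.items())

print(L, R, U)  # L = 521, R = 1578, U = 2649
```

Running it reproduces L(f,P) = 521, R(f,P, A) = 1578 and U(f,P) = 2649, consistent with the inequality in Proposition 6.2.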
Chapter 6. Multiple Integrals 351 Definition 6.6 Refinement of a Partition Let I = n∏ i=1 [ai, bi], and let P = (P1, . . . , Pn) and P∗ = (P ∗ 1 , . . . , P ∗ n) be partitions of I. We say that P∗ is a refinement of P if for each 1 ≤ i ≤ n, P ∗ i is a refinement of Pi. Figure 6.3: A refinement of the partition of the rectangle [−2, 9] × [1, 6] given in Figure 6.1. Example 6.8 Let us consider the partition P = (P1, P2) of the rectangle I = [−2, 9] × [1, 6] given in Example 6.1, with P1 = {−2, 0, 4, 9} and P2 = {1, 3, 6}. Let P ∗ 1 = {−2, 0, 1, 4, 6, 9} and P ∗ 2 = {1, 3, 4, 6}. Then P∗ = (P ∗ 1 , P ∗ 2 ) is a refinement of P. If the partition P∗ is a refinement of the partition P, then for each J in JP, P∗ induces a partition of J, which we denote by P∗(J). Example 6.9 The partition P∗ in Example 6.8 induces the partition P∗(J) = (P ∗ 1 (J), P ∗ 2 (J)) of the rectangle J = [0, 4]× [3, 6], where P ∗ 1 (J) = {0, 1, 4} and P ∗ 2 (J) = {3, 4, 6}. The partition P∗(J) divides the rectangle J into 4 rectangles, as shown in Figure 6.3. Chapter 6. Multiple Integrals 352 If the partition P∗ is a refinement of the partition P, then the collection of rectangles in P∗ is the union of the collection of rectangles in P∗(J) when J ranges over the collection of rectangles in P. Namely, JP∗ = ⋃ J∈JP JP∗(J). Using this, we can deduce the following. Proposition 6.3 Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. If P and P∗ are partitions of I and P∗ is a refinement of P, then L(f,P∗) = ∑ J∈JP L(f,P∗(J)), U(f,P∗) = ∑ J∈JP U(f,P∗(J)). From this, we can show that a refinement improves the Darboux sums, in the sense that a lower sum increases, and an upper sum decreases. Theorem 6.4 Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. If P and P∗ are partitions of I and P∗ is a refinement of P, then L(f,P) ≤ L(f,P∗) ≤ U(f,P∗) ≤ U(f,P). Proof For each rectangle J in the partition P, mJ ≤ f(x) ≤MJ for all x ∈ J. 
Applying Proposition 6.2 to the function f : J → R and the partition P∗(J), we find that mJ vol (J) ≤ L(f,P∗(J)) ≤ U(f,P∗(J)) ≤ MJ vol (J). Summing over J ∈ JP, we find that L(f,P) ≤ ∑ J∈JP L(f,P∗(J)) ≤ ∑ J∈JP U(f,P∗(J)) ≤ U(f,P). The assertion follows from Proposition 6.3.

It is difficult to visualize the Darboux sums of a multivariable function. Hence, we illustrate how refinements improve Darboux sums using single variable functions, as shown in Figure 6.4 and Figure 6.5.

Figure 6.4: A refinement of the partition increases the Darboux lower sum.

Figure 6.5: A refinement of the partition decreases the Darboux upper sum.

As a consequence of Theorem 6.4, we can prove the following.

Corollary 6.5

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. For any two partitions P1 and P2 of I, L(f,P1) ≤ U(f,P2).

Proof

Let P1 = (P1,1, P1,2, . . . , P1,n) and P2 = (P2,1, P2,2, . . . , P2,n). For 1 ≤ i ≤ n, let P ∗ i be the common refinement of P1,i and P2,i obtained by taking the union of the partition points in P1,i and P2,i. Then P∗ = (P ∗ 1 , . . . , P ∗ n) is a common refinement of the partitions P1 and P2. By Theorem 6.4, L(f,P1) ≤ L(f,P∗) ≤ U(f,P∗) ≤ U(f,P2).

Now we define lower and upper integrals of a bounded function f : I → R.

Definition 6.7 Lower Integrals and Upper Integrals

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. Let SL(f) be the set of Darboux lower sums of f , and let SU(f) be the set of Darboux upper sums of f .

1. The lower integral of f , denoted by ∫ I f , is defined as the least upper bound of the Darboux lower sums. ∫ I f = sup SL(f) = sup {L(f,P) | P is a partition of I} .

2. The upper integral of f , denoted by ∫ I f , is defined as the greatest lower bound of the Darboux upper sums. ∫ I f = inf SU(f) = inf {U(f,P) | P is a partition of I} .
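The common-refinement construction in the proof of Corollary 6.5 can be sketched numerically. In the Python illustration below (not part of the text), the function and the partition PA come from Example 6.7, while the second partition PB is made up for the sake of illustration; the common refinement is formed by coordinate-wise unions of partition points, and the inequalities of Theorem 6.4 and Corollary 6.5 are checked.

```python
from itertools import product

def f_min_max(a, b, c, d):
    """Exact inf and sup of f(x, y) = x^2 + y on [a, b] x [c, d]."""
    mx = 0 if a <= 0 <= b else min(a * a, b * b)
    return mx + c, max(a * a, b * b) + d

def darboux(P1, P2):
    """Darboux lower and upper sums for the partition (P1, P2)."""
    ivs = lambda P: [(P[i], P[i + 1]) for i in range(len(P) - 1)]
    L = U = 0
    for (a, b), (c, d) in product(ivs(P1), ivs(P2)):
        m, M = f_min_max(a, b, c, d)
        vol = (b - a) * (d - c)
        L += m * vol
        U += M * vol
    return L, U

# Two partitions of I = [-2, 9] x [1, 6]; PB is made up for illustration.
PA = ([-2, 0, 4, 9], [1, 3, 6])
PB = ([-2, 3, 9], [1, 2, 4, 6])

# Common refinement: coordinate-wise union of the partition points.
PC = tuple(sorted(set(p) | set(q)) for p, q in zip(PA, PB))

LA, UA = darboux(*PA)
LB, UB = darboux(*PB)
LC, UC = darboux(*PC)
assert LA <= LC <= UC <= UA and LB <= LC <= UC <= UB  # Theorem 6.4
assert LA <= UB and LB <= UA                          # Corollary 6.5
```

The design mirrors the proof: refining never decreases a lower sum or increases an upper sum, so any lower sum is bounded by any upper sum.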
Example 6.10

If f : I → R is the constant function f(x) = c, then for any partition P of I, L(f,P) = c vol (I) = U(f,P). Therefore, both SL(f) and SU(f) are the one-element set {c vol (I)}. This shows that ∫ I f = ∫ I f = c vol (I).

For a constant function, the lower integral and the upper integral are the same. For a general bounded function, we have the following.

Theorem 6.6

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. Then we have ∫ I f ≤ ∫ I f.

Proof

By Corollary 6.5, every element of SL(f) is less than or equal to any element of SU(f). This implies that ∫ I f = sup SL(f) ≤ inf SU(f) = ∫ I f.

Example 6.11 The Dirichlet Function

Let I = n∏ i=1 [ai, bi], and let f : I → R be the function defined as

f(x) = 1, if all components of x are rational,
       0, otherwise.

This is known as the Dirichlet function. Find the lower integral and the upper integral of f : I → R.

Solution

Let P = (P1, . . . , Pn) be a partition of I. A rectangle J in the partition P can be written in the form J = n∏ i=1 [ui, vi]. By the denseness of the rational numbers and the irrational numbers, there exist a rational number αi and an irrational number βi in (ui, vi). Let α = (α1, . . . , αn) and β = (β1, . . . , βn). Then α and β are points in J, and 0 = f(β) ≤ f(x) ≤ f(α) = 1 for all x ∈ J. Therefore, mJ = inf x∈J f(x) = 0, MJ = sup x∈J f(x) = 1. It follows that L(f,P) = ∑ J∈JP mJ vol (J) = 0, U(f,P) = ∑ J∈JP MJ vol (J) = ∑ J∈JP vol (J) = vol (I). Therefore, SL(f) = {0}, while SU(f) = {vol (I)}. This shows that the lower integral and the upper integral of f : I → R are given respectively by ∫ I f = 0 and ∫ I f = vol (I).

As we mentioned before, one of the motivations to define the integral of a function f : I → R is to calculate volumes. Given that f : I → R is a nonnegative continuous function defined on the rectangle I in Rn, let S = {(x, y) | x ∈ I, 0 ≤ y ≤ f(x)}, which is the solid bounded between I and the graph of f.
It is reasonable to expect that S has a volume, which we denote by vol (S). We want to define the integral ∫ I f so that it gives vol (S). Notice that if P is a partition of I, then the Darboux lower sum L(f,P) = ∑ J∈JP mJ vol (J) is the sum of volumes of the collection of rectangles {J× [0,mJ] | J ∈ JP} in Rn+1, each of which is contained in S. Since any two of these rectangles can only intersect on the boundaries, it is reasonable to expect that L(f,P) ≤ vol (S). Similarly, the Darboux upper sum U(f,P) = ∑ J∈JP MJ vol (J) is the sum of volumes of the collection of rectangles {J× [0,MJ] | J ∈ JP} in Rn+1, the union of which contains S. Therefore, it is reasonable to expect that vol (S) ≤ U(f,P).

Hence, the volume of S should be a number between L(f,P) and U(f,P) for any partition P. To make the volume well-defined, there should be only one number between L(f,P) and U(f,P) for all partitions P. By definition, any number between the lower integral and the upper integral is in between L(f,P) and U(f,P) for any partition P. Hence, to have the volume well-defined, we must require the lower integral and the upper integral to be the same. This motivates the following definition of integrability for a general bounded function.

Definition 6.8 Riemann Integrability

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. We say that f : I → R is Riemann integrable, or simply integrable, if ∫ I f = ∫ I f. In this case, we define the integral of f over the rectangle I as ∫ I f = ∫ I f = ∫ I f. It is the unique number larger than or equal to all Darboux lower sums, and smaller than or equal to all Darboux upper sums.

Example 6.12

Example 6.10 says that a constant function f : I → R, f(x) = c is integrable and ∫ I f = c vol (I).

Example 6.13

The Dirichlet function defined in Example 6.11 is not Riemann integrable since the lower integral and the upper integral are not equal.
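For a well-behaved function, the squeezing of the lower and upper sums can be observed numerically. The following Python sketch is illustrative and not part of the text: it computes Darboux sums of f(x, y) = x^2 + y on I = [−2, 9] × [1, 6] over finer and finer uniformly regular partitions, and checks that both sums bracket the exact value 8525/6, which is obtained by hand from elementary antiderivatives.

```python
from itertools import product

def darboux_uniform(k):
    """Darboux sums of f(x, y) = x^2 + y on I = [-2, 9] x [1, 6]
    for the uniformly regular partition of I into k^2 rectangles."""
    xs = [-2 + 11 * i / k for i in range(k + 1)]
    ys = [1 + 5 * j / k for j in range(k + 1)]
    L = U = 0.0
    for i, j in product(range(k), range(k)):
        a, b, c, d = xs[i], xs[i + 1], ys[j], ys[j + 1]
        # Exact inf/sup of x^2 on [a, b] (it is 0 when 0 lies inside).
        mx = 0.0 if a <= 0 <= b else min(a * a, b * b)
        vol = (b - a) * (d - c)
        L += (mx + c) * vol
        U += (max(a * a, b * b) + d) * vol
    return L, U

exact = 8525 / 6  # integral of x^2 + y over I, computed by hand
gaps = []
for k in (10, 100, 1000):
    L, U = darboux_uniform(k)
    assert L <= exact <= U   # both sums bracket the integral
    gaps.append(U - L)
assert gaps[0] > gaps[1] > gaps[2]  # the gap shrinks as k grows
```

The shrinking gap U(f,P) − L(f,P) is precisely the condition singled out in the equivalent criteria for integrability discussed next.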
Leibniz Notation for Riemann Integrals

The Leibniz notation of the Riemann integral of f : I → R is ∫ I f(x)dx, or equivalently, ∫ I f(x1, . . . , xn)dx1 · · · dxn.

As in the single variable case, there are some criteria for Riemann integrability which follow directly from the requirement that the lower integral and the upper integral be the same.

Theorem 6.7

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. The following are equivalent.

(a) The function f : I → R is Riemann integrable.

(b) For every ε > 0, there is a partition P of the rectangle I such that U(f,P)− L(f,P) < ε.

We define an Archimedes sequence of partitions exactly the same as in the single variable case.

Definition 6.9 Archimedes Sequence of Partitions

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. If {Pk} is a sequence of partitions of the rectangle I such that lim k→∞ (U(f,Pk)− L(f,Pk)) = 0, we call {Pk} an Archimedes sequence of partitions for the function f .

Then we have the following theorem.

Theorem 6.8 The Archimedes-Riemann Theorem

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. The function f : I → R is Riemann integrable if and only if f has an Archimedes sequence of partitions {Pk}. In this case, the integral ∫ I f can be computed by ∫ I f = lim k→∞ L(f,Pk) = lim k→∞ U(f,Pk).

A candidate for an Archimedes sequence of partitions is the sequence {Pk}, where Pk is the uniformly regular partition of I into k^n rectangles.

Example 6.14

Let I = [0, 1]× [0, 1]. Consider the function f : I → R defined as

f(x, y) = 1, if x ≥ y,
          0, if x < y.

For k ∈ Z+, let Pk be the uniformly regular partition of I into k^2 rectangles.

(a) For each k ∈ Z+, compute the Darboux lower sum L(f,Pk) and the Darboux upper sum U(f,Pk).

(b) Show that f : I → R is Riemann integrable and find the integral ∫ I f .

Solution

Fix k ∈ Z+, and let Pk = {u0, u1, . . .
, uk}, where ui = i/k for 0 ≤ i ≤ k. Then Pk = (Pk, Pk), and it divides I = [0, 1]× [0, 1] into the k^2 rectangles Ji,j , 1 ≤ i ≤ k, 1 ≤ j ≤ k, where Ji,j = [ui−1, ui]× [uj−1, uj]. We have vol (Ji,j) = 1/k^2. Let mi,j = inf (x,y)∈Ji,j f(x, y) and Mi,j = sup (x,y)∈Ji,j f(x, y).

Notice that if i < j − 1, then x ≤ ui < uj−1 ≤ y for all (x, y) ∈ Ji,j. Hence, f(x, y) = 0 for all (x, y) ∈ Ji,j. This implies that mi,j = Mi,j = 0 when i < j − 1.

If i ≥ j + 1, then x ≥ ui−1 ≥ uj ≥ y for all (x, y) ∈ Ji,j. Hence, f(x, y) = 1 for all (x, y) ∈ Ji,j. This implies that mi,j = Mi,j = 1 when i ≥ j + 1.

When i = j − 1, if (x, y) is in Ji,j , x ≤ ui = uj−1 ≤ y, and x = y if and only if (x, y) is the point (ui, uj−1). Hence, f(x, y) = 0 for all (x, y) ∈ Ji,j , except for (x, y) = (ui, uj−1), where f(ui, uj−1) = 1. Hence, mi,j = 0, Mi,j = 1 when i = j − 1.

When i = j, 0 ≤ f(x, y) ≤ 1 for all (x, y) ∈ Ji,j . Since (ui−1, uj) and (ui, uj) are in Ji,j , and f(ui−1, uj) = 0 while f(ui, uj) = 1, we find that mi,j = 0, Mi,j = 1 when i = j.

It follows that

L(f,Pk) = k∑ i=1 k∑ j=1 mi,j vol (Ji,j) = k∑ i=2 i−1∑ j=1 1/k^2 = (1/k^2) k∑ i=2 (i − 1) = (1/k^2) k−1∑ i=1 i = k(k − 1)/(2k^2).

U(f,Pk) = k∑ i=1 k∑ j=1 Mi,j vol (Ji,j) = k−1∑ i=1 i+1∑ j=1 1/k^2 + k∑ j=1 1/k^2 = 1/k + (1/k^2) k−1∑ i=1 (i + 1) = (1/k^2) ( k(k + 1)/2 − 1 + k ) = (k^2 + 3k − 2)/(2k^2).

Since U(f,Pk)− L(f,Pk) = (2k − 1)/k^2 for all k ∈ Z+, we find that lim k→∞ (U(f,Pk)− L(f,Pk)) = 0. Hence, {Pk} is an Archimedes sequence of partitions for f . By the Archimedes-Riemann theorem, f : I → R is Riemann integrable, and ∫ I f = lim k→∞ L(f,Pk) = lim k→∞ k(k − 1)/(2k^2) = 1/2.

Figure 6.6: This figure illustrates the different cases considered in Example 6.14 when k = 8.

As in the single variable case, there is an equivalent definition for Riemann integrability using Riemann sums. For a partition P = {x0, x1, . . .
, xk} of an interval [a, b], we define the gap of the partition P as |P | = max {xi − xi−1 | 1 ≤ i ≤ k}. For a closed rectangle I = n∏ i=1 [ai, bi], we replace the length xi − xi−1 of an interval in the partition by the diameter of a rectangle in the partition. Recall that the diameter of a rectangle J = n∏ i=1 [ui, vi] is diam J = √( (v1 − u1)^2 + · · ·+ (vn − un)^2 ).

Definition 6.10 Gap of a Partition

Let P be a partition of the rectangle I = n∏ i=1 [ai, bi]. Then the gap of the partition P is defined as |P| = max {diam J | J ∈ JP}.

Example 6.15

Find the gap of the partition P = (P1, P2) of the rectangle I = [−2, 9] × [1, 6] defined in Example 6.1, where P1 = {−2, 0, 4, 9} and P2 = {1, 3, 6}.

Solution

The lengths of the three intervals in the partition P1 = {−2, 0, 4, 9} of the interval [−2, 9] are 2, 4 and 5 respectively. The lengths of the two intervals in the partition P2 = {1, 3, 6} of the interval [1, 6] are 2 and 3 respectively. Therefore, the diameters of the 6 rectangles in the partition P are √(2^2 + 2^2), √(4^2 + 2^2), √(5^2 + 2^2), √(2^2 + 3^2), √(4^2 + 3^2), √(5^2 + 3^2). From this, we see that the gap of P is √(5^2 + 3^2) = √34.

In the example above, notice that |P1| = 5 and |P2| = 3. In general, it is not difficult to see the following.

Proposition 6.9

Let P = (P1, . . . , Pn) be a partition of the closed rectangle I = n∏ i=1 [ai, bi]. Then |P| = √( |P1|^2 + · · ·+ |Pn|^2 ).

The following theorem gives equivalent definitions of Riemann integrability of a bounded function.

Theorem 6.10 Equivalent Definitions for Riemann Integrability

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. The following three statements are equivalent for saying that f : I → R is Riemann integrable.

(a) The lower integral and the upper integral are the same. Namely, ∫ I f = ∫ I f.

(b) There exists a number I that satisfies the following.
For any ε > 0, there exists a δ > 0 such that if P is a partition of the rectangle I with |P| < δ, then |R(f,P, A)− I| < ε for any choice of intermediate points A = {ξJ} for the partition P.

(c) For any ε > 0, there exists a δ > 0 such that if P is a partition of the rectangle I with |P| < δ, then U(f,P)− L(f,P) < ε.

The most useful definition is in fact the second one, in terms of Riemann sums. It says that a bounded function f : I → R is Riemann integrable if the limit lim |P|→0 R(f,P, A) exists. As a consequence of Theorem 6.10, we have the following.

Theorem 6.11

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. If f : I → R is Riemann integrable, then for any sequence {Pk} of partitions of I satisfying lim k→∞ |Pk| = 0, we have

(i) ∫ I f = lim k→∞ L(f,Pk) = lim k→∞ U(f,Pk).

(ii) ∫ I f = lim k→∞ R(f,Pk, Ak), where for each k ∈ Z+, Ak is a choice of intermediate points for the partition Pk.

The proof is exactly the same as the single variable case. The contrapositive of Theorem 6.11 gives the following.

Theorem 6.12

Let I = n∏ i=1 [ai, bi], and let f : I → R be a bounded function defined on I. Assume that {Pk} is a sequence of partitions of I such that lim k→∞ |Pk| = 0.

(a) If for each k ∈ Z+, there exists a choice of intermediate points Ak for the partition Pk such that the limit lim k→∞ R(f,Pk, Ak) does not exist, then f : I → R is not Riemann integrable.

(b) If for each k ∈ Z+, there exist two choices of intermediate points Ak and Bk for the partition Pk so that the two limits lim k→∞ R(f,Pk, Ak) and lim k→∞ R(f,Pk, Bk) are not the same, then f : I → R is not Riemann integrable.

Theorem 6.12 is useful for justifying that a bounded function is not Riemann integrable, without having to compute the lower integral or the upper integral. To
apply this theorem, we usually consider the sequence of partitions {Pk}, where Pk is the uniformly regular partition of I into k^n rectangles.

Example 6.16

Let I = [0, 1]× [0, 1], and let f : I → R be the function defined as

f(x, y) = 0, if x is rational,
          y, if x is irrational.

Show that f : I → R is not Riemann integrable.

Solution

For k ∈ Z+, let Pk be the uniformly regular partition of I into k^2 rectangles. Then Pk = (Pk, Pk), where Pk = {u0, u1, . . . , uk} with ui = i/k when 0 ≤ i ≤ k. Notice that |Pk| = √2/k, and so lim k→∞ |Pk| = 0.

The partition Pk divides the square I into k^2 squares Ji,j , 1 ≤ i ≤ k, 1 ≤ j ≤ k, where Ji,j = [ui−1, ui] × [uj−1, uj]. For 1 ≤ i ≤ k, since irrational numbers are dense, there is an irrational number ci in the interval (ui−1, ui). For 1 ≤ i ≤ k, 1 ≤ j ≤ k, let αi,j and βi,j be the points in Ji,j given respectively by αi,j = (ui, uj), βi,j = (ci, uj). Then f(αi,j) = 0, f(βi,j) = uj. Let Ak = {αi,j} and Bk = {βi,j}. Then the Riemann sums R(f,Pk, Ak) and R(f,Pk, Bk) are given respectively by R(f,Pk, Ak) = k∑ i=1 k∑ j=1 f(αi,j) vol (Ji,j) = 0, and R(f,Pk, Bk) = k∑ i=1 k∑ j=1 f(βi,j) vol (Ji,j) = k∑ i=1 k∑ j=1 (j/k) × (1/k^2) = k × k(k + 1)/(2k^3) = (k + 1)/(2k). Therefore, we find that lim k→∞ R(f,Pk, Ak) = 0, lim k→∞ R(f,Pk, Bk) = 1/2. Since the two limits are not the same, we conclude that f : I → R is not Riemann integrable.

Now we return to the proof of Theorem 6.10. To prove this theorem, it is easier to show that (a) is equivalent to (c), and (b) is equivalent to (c). We will prove the equivalence of (a) and (c). The proof of the equivalence of (b) and (c) is left to the exercises. It is a consequence of the inequality L(f,P) ≤ R(f,P, A) ≤ U(f,P), which holds for any partition P of the rectangle I, and any choice of intermediate points A for the partition P.

By Theorem 6.7, (a) is equivalent to

(a′) For every ε > 0, there is a partition P of I such that U(f,P)− L(f,P) < ε.
Thus, to prove the equivalence of (a) and (c), it is sufficient to show the equivalence of (a′) and (c). That (c) implies (a′) is obvious. Hence, we are left with the most technical part, which is the proof that (a′) implies (c). We formulate this as a standalone theorem.

Theorem 6.13

Let I = n∏ i=1 [ai, bi], and let P0 be a fixed partition of I. Given that f : I → R is a bounded function defined on I, for any ε > 0, there is a δ > 0 such that for all partitions P of I, if |P| < δ, then U(f,P)− L(f,P) < U(f,P0)− L(f,P0) + ε. (6.1)

If Theorem 6.13 is proved, we can show that (a′) implies (c) in Theorem 6.10 as follows. Given ε > 0, (a′) implies that we can choose a P0 such that U(f,P0)− L(f,P0) < ε/2. By Theorem 6.13, there is a δ > 0 such that for all partitions P of I, if |P| < δ, then U(f,P)− L(f,P) < U(f,P0)− L(f,P0) + ε/2 < ε. This proves that (a′) implies (c). Hence, it remains for us to prove Theorem 6.13.

Let us introduce some additional notation. Given the rectangle I = n∏ i=1 [ai, bi], for 1 ≤ i ≤ n, let Si = vol (I)/(bi − ai) = (b1 − a1)× · · · × (bi−1 − ai−1)× (bi+1 − ai+1)× · · · × (bn − an). (6.2)

This is the area of the boundary of I that is contained in the hyperplane xi = ai or xi = bi. For example, when n = 2, I = [a1, b1] × [a2, b2], S1 = b2 − a2 is the length of the vertical side, while S2 = b1 − a1 is the length of the horizontal side of the rectangle I.

Proof of Theorem 6.13

Since f : I → R is bounded, there is a positive number M such that |f(x)| ≤ M for all x ∈ I. Assume that P0 = (P̃1, . . . , P̃n). For 1 ≤ i ≤ n, let ki be the number of intervals in the partition P̃i. Let K = max{k1, . . . , kn}, and S = S1 + · · ·+ Sn, where Si, 1 ≤ i ≤ n, are defined by (6.2). Given ε > 0, let δ = ε/(4MKS). Then δ > 0. If P = (P1, . . . , Pn) is a partition of I with |P| < δ, we want to show that (6.1) holds. Let P∗ = (P ∗ 1 , . . .
, P ∗ n) be the common refinement of P0 and P such that P ∗ i is the partition of [ai, bi] that contains all the partition points of P̃i and Pi.

For 1 ≤ i ≤ n, let Ui be the collection of intervals in Pi which contain partition points of P̃i, and let Vi be the collection of the intervals of Pi that are not in Ui. Each interval in Vi must be in the interior of one of the intervals in P̃i. Thus, each interval in Vi is an interval in the partition P ∗ i . Since each partition point of P̃i can be contained in at most two intervals of Pi, but the first and last partition points of Pi and P̃i are the same, we find that |Ui| ≤ 2ki. Since |Pi| ≤ |P| < δ, each interval in Pi has length less than δ. Therefore, the sum of the lengths of the intervals in Ui is less than 2kiδ. Let Qi = { J ∈ JP | the i-th edge of J is from Ui }. Then ∑ J∈Qi vol (J) < 2kiδSi ≤ 2KδSi.

Figure 6.7: The partitions P0 and P in the proof of Theorem 6.13; P0 is the partition with red grids, while P is the partition with blue grids. The shaded rectangles are rectangles in P that contain partition points of P0.

Now let Q = n⋃ i=1 Qi. Then ∑ J∈Q vol (J) < 2Kδ n∑ i=1 Si = 2KδS. For each of the rectangles J that is in Q, we use the simple estimate MJ −mJ ≤ 2M. Therefore, ∑ J∈Q (MJ −mJ) vol (J) < 4MKδS = ε.

For the rectangles J that are in JP \ Q, each of them is a rectangle in the partition P∗. Therefore, ∑ J∈JP\Q (MJ −mJ) vol (J) ≤ U(f,P∗)− L(f,P∗) ≤ U(f,P0)− L(f,P0).

Hence, U(f,P)− L(f,P) = ∑ J∈JP (MJ −mJ) vol (J) = ∑ J∈JP\Q (MJ −mJ) vol (J) + ∑ J∈Q (MJ −mJ) vol (J) < U(f,P0)− L(f,P0) + ε. This completes the proof.

Finally, we extend Riemann integrals to functions f : D → R that are defined on bounded subsets D of Rn. If D is bounded, there is a positive number L such that ∥x∥ ≤ L for all x ∈ D. This implies that D is contained in the closed rectangle IL = n∏ i=1 [−L,L].
To define the Riemann integral of f : D → R, we need to extend the domain of f from D to IL. To avoid affecting the integral, we should extend by zero.

Definition 6.11 Zero Extension

Let D be a subset of Rn, and let f : D → R be a function defined on D. The zero extension of f : D → R is the function f̌ : Rn → R which is defined as

f̌(x) = f(x), if x ∈ D,
        0, if x /∈ D.

If U is any subset of Rn that contains D, then the zero extension of f to U is the function f̌ : U → R.

Obviously, if f : D → R is a bounded function, its zero extension f̌ : Rn → R is also bounded. Since we have defined Riemann integrability for a bounded function g : I → R that is defined on a closed rectangle I, it is natural to say that a function f : D → R is Riemann integrable if its zero extension f̌ : I → R to a closed rectangle I is Riemann integrable, and define ∫ D f = ∫ I f̌ .

For this to be unambiguous, we have to check that if I1 and I2 are closed rectangles that contain the bounded set D, the zero extension f̌ : I1 → R is Riemann integrable if and only if the zero extension f̌ : I2 → R is Riemann integrable. Moreover, ∫ I1 f̌ = ∫ I2 f̌ . This small technicality will be proved in Section 6.2. Assuming this, we can give the following formal definition for Riemann integrability of a bounded function defined on a bounded domain.

Definition 6.12 Riemann Integrals of General Functions

Let D be a bounded subset of Rn, and let I = n∏ i=1 [ai, bi] be a closed rectangle in Rn that contains D. Given that f : D → R is a bounded function defined on D, we say that f : D → R is Riemann integrable if its zero extension f̌ : I → R is Riemann integrable. If this is the case, we define the integral of f over D as ∫ D f = ∫ I f̌ .

Example 6.17

Let I = [0, 1]× [0, 1], and let f : I → R be the function defined as

f(x, y) = 1, if x ≥ y,
          0, if x < y,

which is considered in Example 6.14. Let D = {(x, y) ∈ I | y ≤ x}, and let g : D → R be the constant function g(x) = 1.
Then f : I → R is the zero extension of g to the square I that contains D. In Example 6.14, we have shown that f : I → R is Riemann integrable and ∫ I f(x)dx = 1/2. Therefore, g : D → R is Riemann integrable and ∫ D g(x)dx = 1/2.

Remark 6.1

Here we make two remarks about the Riemann integrals.

1. When f : D → R is the constant function f(x) = 1, we should expect that it is Riemann integrable if and only if D has a volume, which should be defined as vol (D) = ∫ D dx.

2. If f : D → R is a nonnegative continuous function defined on the bounded set D that has a volume, we would expect that f : D → R is Riemann integrable, and the integral ∫ D f(x)dx gives the volume of the solid bounded between D and the graph of f .

In Section 6.3, we will give a characterization of sets D that have volumes. We will also prove that if f : D → R is a continuous function defined on a set D that has volume, then f : D → R is Riemann integrable.

Exercises 6.1

Question 1

Let I = [−5, 8] × [2, 5], and let P = (P1, P2) be the partition of I with P1 = {−5,−1, 2, 7, 8} and P2 = {2, 4, 5}. Find the gap of the partition P.

Question 2

Let I = [−5, 8] × [2, 5], and let f : I → R be the function defined as f(x, y) = x^2 + 2y. Consider the partition P = (P1, P2) of I with P1 = {−5,−1, 2, 7, 8} and P2 = {2, 4, 5}. Find the Darboux lower sum L(f,P) and the Darboux upper sum U(f,P).

Question 3

Let I = [−5, 8] × [2, 5], and let f : I → R be the function defined as f(x, y) = x^2 + 2y. Consider the partition P = (P1, P2) of I with P1 = {−5,−1, 2, 7, 8} and P2 = {2, 4, 5}. For each rectangle J = [a, b] × [c, d] in the partition P, let αJ = (a, c) and βJ = (b, d). Find the Riemann sums R(f,P, A) and R(f,P, B), where A = {αJ} and B = {βJ}.

Question 4

Let I = [−1, 1]× [2, 5], and let f : I → R be the function defined as

f(x, y) = 1, if x and y are rational,
          0, otherwise.
(a) Given that P is a partition of I, find the Darboux lower sum L(f,P) and the Darboux upper sum U(f,P). (b) Find the lower integral ∫ I f and the upper integral ∫ I f . (c) Explain why f : I → R is not Riemann integrable. Chapter 6. Multiple Integrals 375 Question 5 Let I = [0, 4]× [0, 2]. Consider the function f : I → R defined as f(x, y) = 2x+ 3y + 1. For k ∈ Z+, let Pk be the uniformly regular partition