
Introduction to Probability Theory and
Random Processes
Lecture Notes
Rafael Izbicki and Rafael Bassi Stern
Last revised: January 17, 2017
Please send remarks, comments, typos and mistakes to rafaelizbicki@gmail.com
Contents

1 Mathematical Prologue
1.1 Combinatorial analysis
1.1.1 Counting
1.1.2 Permutations
1.1.3 Combinations
1.2 (Basic) Set theory
1.2.1 Set operations
1.3 Functions

2 Probability Theory
2.1 The axioms of probability
2.2 *The truth about probability
2.3 Random variables
2.4 *Why were these axioms chosen?
2.5 Equiprobable outcomes
2.6 Conditional probability
2.7 The law of total probability and Bayes theorem

3 Random Variables
3.1 Distribution of random variables
3.2 (Conditional) Expected value
3.3 Variance
3.4 Covariance and the vector space of random variables

4 Bernoulli Processes (a long example...)
4.1 Bernoulli distribution
4.2 Binomial distribution
4.3 Geometric distribution
4.4 Negative Binomial distribution
4.5 Hypergeometric distribution
4.6 Poisson distribution
4.7 Review of special discrete distributions

5 Continuous random variables
5.1 Introduction
5.2 Uniform distribution
5.3 Exponential distribution
5.4 Gamma distribution
5.5 Beta distribution
5.6 Normal distribution
5.7 Review of special continuous distributions

6 Moment generating functions and probability inequalities
6.1 Moment generating functions
6.2 Probability inequalities

7 Vectors of random variables
7.1 Cumulative distribution function
7.2 Probability density function
7.3 Conditional probability density function
7.3.1 Independence
7.4 Conditional expectation
7.5 Conditional variance
7.6 *Some multivariate models
7.6.1 Multivariate Normal distribution
7.6.2 Multinomial distribution
7.7 Review exercises

8 Transformations
8.1 Univariate transformations
8.2 Multivariate transformations
8.2.1 Sums of random variables
8.2.2 The Jacobian method
8.2.3 Moment generating function
8.3 Review exercises

9 Convergence
9.1 Convergence of real numbers
9.2 Convergence of sets
9.3 Almost sure convergence
9.4 Convergence in probability
9.5 Convergence in distribution
9.6 Relationships between the types of convergence
9.7 Law of large numbers
9.8 Central limit theorem

10 Discrete time Markov chains
10.1 Introduction
10.2 Classification of states
10.3 Stationary distribution and invariant measure

11 Poisson Process

12 Bibliography
1 Mathematical Prologue
1.1 Combinatorial analysis
Being able to accurately count the possible results of an experiment is key in probability theory. Combinatorial analysis is a powerful tool towards achieving this goal. This section contains a brief introduction to it.
1.1.1 Counting
Assume r experiments will be performed. If the i-th experiment has $n_i$ possible outcomes, $i = 1, \ldots, r$, then the total number of outcomes of the r experiments is
$$\prod_{i=1}^{r} n_i := n_1 n_2 \cdots n_r.$$
Example 1.1. If we flip a coin r times, the total number of outcomes in each experiment is two: either heads or tails. It follows from the counting principle that there are $\prod_{i=1}^{r} 2 = 2^r$ different results on the joint experiment, i.e., there is a total of $2^r$ sequences of heads or tails.
Example 1.2. If we flip a coin and then roll a die, there is a total of 2 × 6 = 12 results one can observe.
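To make the counting principle concrete, here is a minimal sketch in Python (the outcome labels are just for illustration) that enumerates the joint outcomes of Example 1.2 and confirms the count 2 × 6 = 12:

from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Each outcome of the joint experiment is a (coin, die) pair.
outcomes = list(product(coin, die))
print(len(outcomes))  # 12, as predicted by the counting principle: 2 * 6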
1.1.2 Permutations
Consider a set with n objects. The total number of ways we can order these objects is given by
$$n! := n \times (n-1) \times \cdots \times 2 \times 1$$
Example 1.3. There are 4!=24 ways of displaying four shirts in a closet.
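As a quick check (a sketch in plain Python; the shirt names are made up), one can list the orderings of Example 1.3 directly and compare with 4!:

from itertools import permutations
from math import factorial

shirts = ["red", "blue", "white", "black"]
print(len(list(permutations(shirts))), factorial(4))  # 24 24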
1.1.3 Combinations
Consider a set with n objects. The total number of different groups of size r ≤ n that can be formed with these n objects is given by
$$\binom{n}{r} := \frac{n!}{r!(n-r)!}$$
Example 1.4. There are $\binom{10}{2} = 45$ ways of choosing two questions from a section with 10 questions from this book.
Lemma 1.5 (Vandermonde's Identity).
$$\binom{n+m}{z} = \sum_{x=\max(0,\, z-m)}^{\min(n,\, z)} \binom{n}{x}\binom{m}{z-x}$$
Proof. Consider that you have n + m objects. The total number of ways you can select z of these objects is $\binom{n+m}{z}$. Next, consider that you divide these objects into two groups, the type-1 group with n objects and the type-2 group with m objects. For every x, the total number of ways you can select x objects from the type-1 group and z − x objects from the type-2 group is $\binom{n}{x}\binom{m}{z-x}$. If one considers the set of selections ranging over all values of x, then one obtains exactly the ways to select z objects from the group of size n + m. That is,
$$\binom{n+m}{z} = \sum_{x=\max(0,\, z-m)}^{\min(n,\, z)} \binom{n}{x}\binom{m}{z-x}$$
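The identity is easy to verify numerically. Below is a minimal sketch in Python (the particular values of n, m and z are arbitrary) comparing both sides with math.comb:

from math import comb

def vandermonde_rhs(n, m, z):
    # Sum over the x for which both binomial coefficients are nonzero.
    return sum(comb(n, x) * comb(m, z - x)
               for x in range(max(0, z - m), min(n, z) + 1))

n, m, z = 7, 5, 6
print(comb(n + m, z), vandermonde_rhs(n, m, z))  # 924 924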
Exercises
Exercise 1.6. A restaurant offers three different entrées, five main dishes and six desserts. How many different meals does this restaurant offer?
Exercise 1.7. A student has 2 math books, 3 geography books and 4 chemistry books. In how many ways can these books be displayed on a shelf:
• If the student does not care whether books on the same subject are next to each other?
• If the student wants books on the same subject to stay together?
Exercise 1.8. How many different words can be formed with two letters A, three letters B and one letter C?
Exercise 1.9. A student needs to study for three exams this week. The math teacher gave him 6 exercises to help him study for the test, the geography teacher gave him 7 exercises, and the chemistry teacher gave him 5 exercises. Considering that the student does not have much time, how many different sets of exercises can he pick to do if he wants to choose only 2 math exercises, 4 geography exercises, and 1 chemistry exercise?
Exercise 1.10. Eight people, named A, B, . . . , H, are going to make a line.
• In how many ways can these people be placed?
• In how many ways can these people be placed if A and B have to be next to each other?
• In how many ways can these people be placed if A and B have to be next to each other, and C and D also
have to be next to each other?
Exercise 1.11. Show that
$$\binom{n+1}{k+1} = \sum_{x=h}^{n-k+h} \binom{x}{h}\binom{n-x}{k-h}$$
1.2 (Basic) Set theory
As we saw in the last section, combinatorial analysis is used to count the possible results of an experiment. In
probability theory, we also have to be able to accurately describe such results. This is where set theory becomes
important.
A set is a collection of objects. If a set has a finite number of objects, o1, o2, . . . , on, we denote it by
{o1, o2, . . . , on}. We denote the set of natural numbers by N, the set of integers by Z and the set of reals by
R. In probability, sets are fundamental for describing outcomes of an experiment.
Example 1.12 (Sets).
• The set of possible outcomes of a six sided die: {1, 2, 3, 4, 5, 6}.
• The set of outcomes of a coin flip: {T,H}.
• The set of outcomes of two coin flips: {(T, T ), (T,H), (H,T ), (H,H)}.
• The set of all odd numbers: {2n+ 1 : n ∈ N} or {1, 3, 5, 7, . . .}.
• The set of non-negative real numbers: {x ∈ R : x ≥ 0}.
• A circle of radius 1: $\{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \leq 1\}$.
Definition 1.13 (∈ and /∈). We write o ∈ S if object o is an element of set S and o /∈ S, otherwise.
Example 1.14 (∈ and /∈).
• T ∈ {T,H}.
• 7 /∈ {1, 2, 3, 4, 5, 6}.
• 7 ∈ {2n+ 1 : n ∈ N}.
Definition 1.15 (empty set - ∅). ∅ is the only set with no elements. That is, for every object o, o /∈ ∅.
Definition 1.16 (disjoint sets).
• Two sets A and B are disjoint if, for every o ∈ A we have that o /∈ B and for every o ∈ B, o /∈ A.
• A sequence of sets (An)n∈N is disjoint if, for every i ≠ j, Ai is disjoint from Aj .
Example 1.17 (Disjoint sets).
• {1, 2} and {3, 4} are disjoint.
• {1, 2} and {2, 3} are not disjoint since 2 ∈ {1, 2} and 2 ∈ {2, 3}.
Definition 1.18 (⊂ and =). Let A and B be two sets. We say that:
• A ⊂ B if, for every o ∈ A, o ∈ B.
• A = B if A ⊂ B and B ⊂ A.
Example 1.19 (⊂ and =).
• {1, 2} ⊂ {1, 2, 3, 4}.
• {n ∈ Z : n ≥ 1} ⊂ N.
• {n ∈ Z : n ≥ 0} = N.
We reserve the symbol Ω for the set of all objects we are considering in a given model. Ω is often called the
sample space in probability theory. That is, for every set A we consider in that model, A ⊂ Ω.
1.2.1 Set operations
Definition 1.20 (complement - c). Let A be a set. o is an element of Ac if and only if o /∈ A. That is, the
complement of A is formally defined as Ac = {o ∈ Ω : o /∈ A}.
Example 1.21 (c).
• Let Ω = {T,H}, {T}c = {H}.
• Let Ω = {1, 2, 3, 4, 5, 6}, {1, 2}c = {3, 4, 5, 6}.
• Let Ω = N, {n ∈ N : n > 0}c = {0}.
Definition 1.22 (union - ∪).
• Let A and B be two sets. o ∈ Ω is an element of the union between A and B, A ∪ B, if and only if either
o is an element of A or o is an element of B. That is, A ∪B = {o ∈ Ω : o ∈ A or o ∈ B}.
• Let (An)n∈N be a sequence of sets. o ∈ Ω is an element of the union of (An)n∈N, ∪n∈NAn, if and only if there
exists n ∈ N such that o ∈ An. That is, ∪n∈NAn = {o ∈ Ω : exists n ∈ N such that o ∈ An}
Example 1.23 (∪).
• {T} ∪ {H} = {T,H}.
• {1, 2} ∪ {2, 3} = {1, 2, 3}.
• {1} ∪ {3} ∪ {5} = {1, 3, 5}.
• {n ∈ Z : n > 0} ∪ {n ∈ Z : n < 0} = {n ∈ Z : n ≠ 0}.
• ∪n∈N{n} = N.
• ∪n∈N{x ∈ R : x ≥ n} = {x ∈ R : x ≥ 0}.
• ∪n∈N{x ∈ R : x ≥ 1/(n+ 1)} = {x ∈ R : x > 0}.
Definition 1.24 (intersection - ∩).
• Let A and B be two sets. o is an element of the intersection between A and B, A ∩B, if and only if o ∈ Ω
is an element of A and o is an element of B. That is, A ∩B = {o ∈ Ω : o ∈ A and o ∈ B}.
• Let (An)n∈N be a sequence of sets. o ∈ Ω is an element of the intersection of (An)n∈N, ∩n∈NAn, if and only
if for every n ∈ N, o ∈ An. That is, ∩n∈NAn = {o ∈ Ω : for every n ∈ N, o ∈ An}
Example 1.25 (∩).
• {T} ∩ {H} = ∅.
• {1, 2} ∩ {2, 3} = {2}.
• ({1, 2} ∩ {2, 3}) ∪ {5} = {2, 5}.
• {n ∈ Z : n ≥ 0} ∩ {n ∈ Z : n ≤ 0} = {0}.
• ∩n∈N{i ∈ N : i ≥ n} = ∅.
• ∩n∈N{x ∈ R : x ≤ n} = {x ∈ R : x ≤ 0}.
Theorem 1.26 (De Morgan's laws). Let (An)n∈N be a sequence of subsets of Ω. Then, for every n ∈ N,
• $(\cup_{i=1}^{n} A_i)^c = \cap_{i=1}^{n} A_i^c$
• $(\cap_{i=1}^{n} A_i)^c = \cup_{i=1}^{n} A_i^c$
Moreover,
• $(\cup_{i\in\mathbb{N}} A_i)^c = \cap_{i\in\mathbb{N}} A_i^c$
• $(\cap_{i\in\mathbb{N}} A_i)^c = \cup_{i\in\mathbb{N}} A_i^c$
Definition 1.27 (Partition). Let (An)n∈N be a sequence of sets. We say that (An)n∈N partitions Ω if:
• for every i, j ∈ N such that i ≠ j, Ai and Aj are disjoint.
• ∪n∈NAn = Ω.
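Since the sets in the examples above are finite, the operations, De Morgan's laws and partitions can be checked directly. A minimal sketch in Python (Ω and the events follow Example 1.21):

Omega = {1, 2, 3, 4, 5, 6}
A, B = {1, 2}, {2, 3}
complement = lambda S: Omega - S

# De Morgan's laws (Theorem 1.26) for two sets.
print(complement(A | B) == complement(A) & complement(B))  # True
print(complement(A & B) == complement(A) | complement(B))  # True

# A partition of Omega (Definition 1.27): the singletons {1}, ..., {6}.
parts = [{w} for w in Omega]
disjoint = all(not (parts[i] & parts[j])
               for i in range(len(parts)) for j in range(i + 1, len(parts)))
print(disjoint and set().union(*parts) == Omega)  # True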
Exercises
Exercise 1.28. Let A = {1, 3, 5}, B = {1, 2} and Ω = {1, 2, 3, 4, 5, 6}. Find:
a. A ∪B
b. A ∩Bc
c. B ∪Bc
d. (A ∪B)c
e. (A ∩B)c
Solution:
a. A ∪B = {1, 2, 3, 5}.
b. A ∩Bc = {1, 3, 5} ∩ {3,4, 5, 6} = {3, 5}.
c. B ∪Bc = {1, 2} ∪ {3, 4, 5, 6} = {1, 2, 3, 4, 5, 6}. (Note that, for every set D, D ∪Dc = Ω)
d. (A ∪B)c = {1, 2, 3, 5}c = {4, 6}.
e. (A ∩B)c = {1}c = {2, 3, 4, 5, 6}.
Exercise 1.29. Consider that a given day can be rainy - R - or not rainy - NR. We are interested in the weather
of the next two days.
a. How would you formally write Ω?
b. How would you formally write “The outcomes such that both days are rainy”? Call this set A.
c. How would you formally write “The outcomes such that at least one day is rainy”? Call this set B.
d. Is it true that A ⊂ B?
e. Find Bc. How would you describe this set in English?
f. Is it true that A and Bc are disjoint?
Solution:
a. Ω = {(NR,NR), (NR,R), (R,NR), (R,R)}.
b. A = {(R,R)}.
c. B = {(NR,R), (R,NR), (R,R)}.
d. Since (R,R) is the only element of A and (R,R) ∈ B, A ⊂ B.
e. The only element of Ω that is not in B is (NR,NR). Hence Bc = {(NR,NR)}. This is the set of outcomes
such that there is no rainy day.
f. A ∩Bc = {(NR,NR)} ∩ {(R,R)} = ∅. Hence, A and Bc are disjoint.
Exercise 1.30. Are the following statements true or false?
a. ∅ ∈ ∅.
b. ∅ ∈ {∅}.
c. {∅} ∈ ∅.
d. ∅ ⊂ ∅.
e. ∅ ⊂ {∅}.
f. {∅} ⊂ ∅.
Solution:
a. By definition, there is no element in ∅. That is, ∅ ∉ ∅.

b. {∅} is the set with a single element, that element being ∅. Hence, ∅ ∈ {∅}.

c. By definition, there is no element in ∅. Hence, {∅} ∉ ∅.

d. Since ∅ has no elements, all elements of ∅ are in any other set. Hence ∅ is a subset of every set and ∅ ⊂ ∅.

e. Using the same reasoning as in the previous item, ∅ ⊂ {∅}.

f. ∅ is an element of {∅} and ∅ ∉ ∅. Hence, there exists an element of {∅} that is not an element of ∅. Conclude that {∅} ⊄ ∅.
Exercise 1.31. Prove the following:
a. S and T are disjoint if and only if S ∩ T = ∅.
b. S ∪ T = T ∪ S
c. S ∩ T = T ∩ S
d. S = (Sc)c
e. S ∪ Sc = Ω
f. S ∩ Ω = S
g. S ∪ Ω = Ω
h. (S ∩ T )c = Sc ∪ T c
i. (S ∪ T )c = Sc ∩ T c
j. ({n})n∈N partitions N.
k. ({1, 2}, {3, 4}, ∅, ∅, ∅, . . .) partitions {1, 2, 3, 4}.
Solution:

a. S and T are disjoint if and only if, for every w ∈ S, w ∉ T and, for every w ∈ T , w ∉ S. This happens if and only if {w ∈ Ω : w ∈ S and w ∈ T} = ∅, that is, S ∩ T = ∅.

b. S ∪ T = {w ∈ Ω : w ∈ S or w ∈ T} = {w ∈ Ω : w ∈ T or w ∈ S} = T ∪ S.

c. S ∩ T = {w ∈ Ω : w ∈ S and w ∈ T} = {w ∈ Ω : w ∈ T and w ∈ S} = T ∩ S.

d. If w ∈ S, then w ∉ Sc and w ∈ (Sc)c. Hence S ⊂ (Sc)c. Similarly, if w ∈ (Sc)c, then w ∉ Sc and w ∈ S. Hence (Sc)c ⊂ S. Conclude that S = (Sc)c.

e. S ∪ Sc = {w ∈ Ω : w ∈ S or w ∈ Sc} = {w ∈ Ω : w ∈ S or w ∉ S} = Ω.

f. S ∩ Ω = {w ∈ Ω : w ∈ S and w ∈ Ω} = {w ∈ Ω : w ∈ S} = S.

g. S ∪ Ω = {w ∈ Ω : w ∈ S or w ∈ Ω} = {w ∈ Ω : w ∈ Ω} = Ω.

h. If w ∈ Sc ∪ T c, then w ∉ S or w ∉ T . Hence w ∉ S ∩ T and w ∈ (S ∩ T )c. If w ∈ (S ∩ T )c, then w ∉ S ∩ T , that is, w ∉ S or w ∉ T . That is, w ∈ Sc or w ∈ T c and, therefore, w ∈ Sc ∪ T c.

i. If w ∈ Sc ∩ T c, then w ∉ S and w ∉ T . Hence w ∉ S ∪ T and w ∈ (S ∪ T )c. If w ∈ (S ∪ T )c, then w ∉ S ∪ T , that is, w ∉ S and w ∉ T . That is, w ∈ Sc and w ∈ T c and, therefore, w ∈ Sc ∩ T c.

j. For every i ≠ j, {i} ∩ {j} = ∅ and ∪n∈N{n} = N. Hence, ({n})n∈N partitions N.

k. {1, 2} ∩ {3, 4} = ∅ and A ∩ ∅ = ∅ for every A. Hence, since {1, 2} ∪ {3, 4} ∪ ∅ = {1, 2, 3, 4}, the sequence of events ({1, 2}, {3, 4}, ∅, ∅, ∅, . . .) partitions {1, 2, 3, 4}.
Exercise 1.32. Prove DeMorgan’s law.
Exercise 1.33. In a class of 25 students, consider the following statements:
• 14 will major in Computer Science.
• 12 will major in Engineering.
• 5 will major both in Computer Science and Engineering.
How many won’t major in either Computer Science or Engineering?
Solution: Let A denote the set of students who will major in Computer Science and B denote the set of
students who will major in Engineering. Let |A| stand for the number of elements in A. The problem states that
|Ω| = 25, |A| = 14, |B| = 12 and |A ∩ B| = 5. The problem asks for |Ac ∩ Bc|. We can partition Ω in 4 sets:
A ∩B, A ∩Bc, Ac ∩B and Ac ∩Bc. Observe that:
a. The problem states that |A ∩B| = 5.
b. (A ∩ B) ∪ (A ∩ Bc) = A. Hence, since A ∩ B and A ∩ Bc are disjoint, |A ∩ B| + |A ∩ Bc| = |A|. Thus,
5 + |A ∩Bc| = 14 and |A ∩Bc| = 9.
c. Similarly, (A ∩B) ∪ (Ac ∩B) = B and (A ∩B) and (Ac ∩B) are disjoint. Hence, |A ∩B|+ |Ac ∩B| = |B|.
Conclude that 5 + |Ac ∩B| = 12 and |Ac ∩B| = 7.
d. Since Ω can be partitioned in the sets A ∩B, A ∩Bc, Ac ∩B and Ac ∩Bc, conclude that:
|A ∩B|+ |A ∩Bc|+ |Ac ∩B|+ |Ac ∩Bc| = |Ω|
5 + 9 + 7 + |Ac ∩Bc| = 25
|Ac ∩Bc| = 4
Hence, 4 students won’t major in either Computer Science or Engineering.
Exercise 1.34. Let A,B and C be events of Ω. Using set operations, describe the following sets:
• At least one of A,B or C happens.
• Either A and B happen or C happens.
• A and B happen, but C does not.
• Exactly one of A,B and C happens.
1.3 Functions
Definition 1.35. Function
Table interpretation.
Definition 1.36. Pre-Image
Use the table interpretation to find the pre-image.
Exercises
2 Probability Theory
2.1 The axioms of probability
We denote by Ω the sample space, that is, the set of all possible outcomes in the situation we are interested in.
For example, one might be interested in the outcome of a coin flip (Ω = {H,T}), the number of rainy days next
year (Ω = {n ∈ N : n ≤ 366}) or the time until a given transistor fails (Ω = {x ∈ R : x > 0}).
For every A ⊂ Ω, one might be uncertain whether A is true or not. For example, {T} ⊂ {H,T} and one cannot
usually determine before a coin is tossed whether the outcome will be heads or tails. A probability is a function
defined on subsets of Ω, which is usually interpreted in one of the following ways:
• (Frequency) P(A) denotes the relative frequency with which outcomes in A will be observed if the same experiment is performed many times independently.
• (Degree of Belief) P(A) represents the evaluator's degree of belief that the outcome will be an element of A.
Events are the subsets of Ω to which we attribute probabilities. We denote by F the set of all events. In order
for P to be a probability, it must satisfy the Axioms of Probability.
Definition 2.1 (Axioms of Probability). P : F → R is a probability if:
1. (Non-negativity) For all A ∈ F , P(A) ≥ 0.
2. (Countable additivity) If (An)n∈N is a sequence of disjoint sets in F , $P(\cup_{n\in\mathbb{N}} A_n) = \sum_{n\in\mathbb{N}} P(A_n)$.
3. (Normalization) P(Ω) = 1.
Next, we prove some important consequences that follow from the axioms of probability.
Lemma 2.2. P(∅) = 0.
Proof. Let $(A_n)_{n\in\mathbb{N}}$ be a sequence of subsets of Ω such that $A_0 = \Omega$ and $A_i = \emptyset$ for all $i > 0$. By construction, this is a sequence of disjoint sets and, therefore, by countable additivity:
$$1 = P(\Omega) = P(\cup_{n\in\mathbb{N}} A_n) \qquad \text{(Definition 2.1.3)}$$
$$= \sum_{n\in\mathbb{N}} P(A_n) \qquad \text{(Definition 2.1.2)}$$
$$= P(\Omega) + \sum_{n>0} P(\emptyset) = 1 + \sum_{n>0} P(\emptyset)$$
Obtain from the previous equation that $\sum_{n>0} P(\emptyset) = 0$. That is, $P(\emptyset) = 0$.
Lemma 2.3. If $A_1, A_2, \ldots, A_n$ are disjoint, then
$$P(A_1 \cup A_2 \cup \ldots \cup A_n) = \sum_{i=1}^{n} P(A_i)$$
Proof. Consider the sequence $(B_i)_{i\in\mathbb{N}}$ such that
$$B_i = \begin{cases} A_i & \text{if } 1 \leq i \leq n \\ \emptyset & \text{otherwise.} \end{cases}$$
Observe that $(B_i)_{i\in\mathbb{N}}$ is a sequence of disjoint sets. Hence,
$$P(A_1 \cup A_2 \cup \ldots \cup A_n) = P(\cup_{i\in\mathbb{N}} B_i) \qquad (A_1 \cup A_2 \cup \ldots \cup A_n = \cup_{i\in\mathbb{N}} B_i)$$
$$= \sum_{i\in\mathbb{N}} P(B_i) \qquad \text{(Definition 2.1.2)}$$
$$= \sum_{i=1}^{n} P(A_i) + \sum_{i>n} P(\emptyset) = \sum_{i=1}^{n} P(A_i) \qquad \text{(Lemma 2.2)}$$
Lemma 2.4. For every event A, P(Ac) = 1 − P(A).
Proof. Observe that A and Ac are disjoint and A ∪ Ac = Ω. Hence,
$$1 = P(\Omega) \qquad \text{(Definition 2.1.3)}$$
$$= P(A \cup A^c) = P(A) + P(A^c) \qquad \text{(Lemma 2.3)}$$
Putting the extremes together, 1 = P(A) + P(Ac), that is, P(Ac) = 1 − P(A).
Example 2.5. Consider a coin flip. Let Ω = {H,T} and F = {∅, {H}, {T}, {H,T}}. It follows from the axioms
of probability that P(∅) = 0 and P({H,T}) = 1. We say that the coin is fair if P(H) = 0.5 and, therefore,
P(T ) = 1− P(H)= 0.5.
Lemma 2.6. For every events A and B,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
Proof. Observe that $A^c \cap B$ and $A \cap B$ are disjoint and $(A^c \cap B) \cup (A \cap B) = B$. Hence,
$$P(B) = P(A^c \cap B) + P(A \cap B) \qquad \text{(Lemma 2.3)}$$
$$P(A^c \cap B) = P(B) - P(A \cap B) \tag{1}$$
Also observe that A and $A^c \cap B$ are disjoint and $A \cup (A^c \cap B) = A \cup B$. Hence,
$$P(A \cup B) = P(A) + P(A^c \cap B) \qquad \text{(Lemma 2.3)} \tag{2}$$
Finally,
$$P(A \cup B) = P(A) + P(A^c \cap B) \qquad \text{(eq. 2)}$$
$$= P(A) + P(B) - P(A \cap B) \qquad \text{(eq. 1)}$$
Lemma 2.7. If A ⊆ B, then P(B) ≥ P(A).
Proof. Observe that $A \cap B$ and $A^c \cap B$ are disjoint and $(A \cap B) \cup (A^c \cap B) = B$. Hence,
$$P(B) = P(A \cap B) + P(A^c \cap B) \qquad \text{(Lemma 2.3)}$$
$$= P(A) + P(A^c \cap B) \qquad (A \subseteq B)$$
$$\geq P(A) \qquad (P(A^c \cap B) \geq 0, \text{ Definition 2.1.1})$$
Lemma 2.8. For every events $A_1, \ldots, A_n$,
$$P(\cup_{i=1}^{n} A_i) \leq \sum_{i=1}^{n} P(A_i)$$
Proof. We use the proof technique of finite induction. Since P(A1) ≤ P(A1), the lemma is valid for n = 1. Next, assume that
$$P(\cup_{i=1}^{n-1} A_i) \leq \sum_{i=1}^{n-1} P(A_i) \tag{3}$$
Conclude that
$$P(\cup_{i=1}^{n} A_i) = P((\cup_{i=1}^{n-1} A_i) \cup A_n)$$
$$= P(\cup_{i=1}^{n-1} A_i) + P(A_n) - P((\cup_{i=1}^{n-1} A_i) \cap A_n) \qquad \text{(Lemma 2.6)}$$
$$\leq P(\cup_{i=1}^{n-1} A_i) + P(A_n) \qquad (P((\cup_{i=1}^{n-1} A_i) \cap A_n) \geq 0, \text{ Definition 2.1.1})$$
$$\leq \sum_{i=1}^{n-1} P(A_i) + P(A_n) = \sum_{i=1}^{n} P(A_i) \qquad \text{(eq. 3)}$$
Lemma 2.9 (Continuity of probability). Let $A_1, A_2, \ldots$ be a sequence of events such that, for every $i \geq 1$, $A_i \subseteq A_{i+1}$. In this case, define $\lim_{n\to\infty} A_n := \cup_{i\geq 1} A_i$. It follows that
$$P(\lim_{n\to\infty} A_n) = \lim_{n\to\infty} P(A_n)$$
Proof. Define $F_i := A_i \cap A_{i-1}^c$. Since, for every $i \geq 1$, $A_i \subseteq A_{i+1}$, conclude that the $F_i$ are disjoint and that $\cup_{i=1}^{n} F_i = A_n$ and $\cup_{i\geq 1} F_i = \cup_{i\geq 1} A_i$. Therefore,
$$\lim_{n\to\infty} P(A_n) = \lim_{n\to\infty} P(\cup_{i=1}^{n} F_i)$$
$$= \lim_{n\to\infty} \sum_{i=1}^{n} P(F_i) \qquad \text{(Definition 2.1.2)}$$
$$= \sum_{i=1}^{\infty} P(F_i)$$
$$= P(\cup_{i\geq 1} F_i) \qquad \text{(Definition 2.1.2)}$$
$$= P(\cup_{i\geq 1} A_i) = P(\lim_{n\to\infty} A_n)$$
Lemma 2.10. For every sequence of events $A_1, A_2, \ldots \in \mathcal{F}$,
$$P(\cup_{i\geq 1} A_i) \leq \sum_{i\geq 1} P(A_i)$$
Proof.
$$P(\cup_{i\geq 1} A_i) = P(\lim_{n\to\infty} \cup_{i=1}^{n} A_i)$$
$$= \lim_{n\to\infty} P(\cup_{i=1}^{n} A_i) \qquad \text{(Lemma 2.9)}$$
$$\leq \lim_{n\to\infty} \sum_{i=1}^{n} P(A_i) \qquad \text{(Lemma 2.8)}$$
$$= \sum_{i\geq 1} P(A_i)$$
Lemma 2.11. Let $A_1, A_2, \ldots$ and $B_1, B_2, \ldots$ be sequences of events.
1. If $P(A_i) = 0$ for every $i \geq 1$, then $P(\cup_{i\geq 1} A_i) = 0$.
2. If $P(B_i) = 1$ for every $i \geq 1$, then $P(\cap_{i\geq 1} B_i) = 1$.
Proof.
$$0 \leq P(\cup_{i\geq 1} A_i) \qquad \text{(Definition 2.1.1)}$$
$$\leq \sum_{i\geq 1} P(A_i) \qquad \text{(Lemma 2.10)}$$
$$= 0 \qquad (P(A_i) = 0)$$
Conclude that $P(\cup_{i\geq 1} A_i) = 0$. Since $P(B_i) = 1$, it follows from Lemma 2.4 that $P(B_i^c) = 0$. Conclude that
$$P(\cap_{i\geq 1} B_i) = 1 - P((\cap_{i\geq 1} B_i)^c)$$
$$= 1 - P(\cup_{i\geq 1} B_i^c) \qquad \text{(Theorem 1.26)}$$
$$= 1 \qquad (P(B_i^c) = 0, \text{ Lemma 2.11.1})$$
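The lemmas above can be sanity-checked on a small finite example. A minimal sketch in Python (the sample space and the weights are made up for illustration; on a finite Ω a probability is determined by the weight of each outcome):

from fractions import Fraction

weights = {1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 6),
           4: Fraction(1, 6), 5: Fraction(1, 12), 6: Fraction(1, 12)}
Omega = set(weights)
prob = lambda E: sum(weights[w] for w in E)

assert prob(Omega) == 1  # Definition 2.1.3 (normalization)

A, B = {1, 2, 3}, {3, 4}
print(prob(Omega - A) == 1 - prob(A))                  # Lemma 2.4: True
print(prob(A | B) == prob(A) + prob(B) - prob(A & B))  # Lemma 2.6: True
print(prob(A | B) <= prob(A) + prob(B))                # Lemma 2.8: True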
Exercises
Exercise 2.12. If the probability that I get a pocket pair in a poker match is 1/18, what is the probability that I don't get a pocket pair?

Solution: Let A denote the event that I get a pocket pair. P(A) = 1/18. $A^c$ denotes the event that I don't get the pocket pair. Hence, by Lemma 2.4, P(Ac) = 1 − P(A) = 17/18.
Exercise 2.13. Let A and B be two disjoint events. Is it possible for P(A) = 0.4 and P(B) = 0.7? Why?
Solution: Since A∩B = ∅, using Lemma 2.3, P(A∪B) = P(A) +P(B) = 1.1. This is false, since P(A∪B) ≤ 1.
Hence, it is impossible for P(A) = 0.4, P(B) = 0.7 and A and B to be disjoint.
Exercise 2.14. Consider the following incorrect argument:
Consider a fair coin and a fair six-sided die. Let the set of all possible outcomes for the coin be C,
which implies P(C) = 1. Let the set of all possible outcomes for the die be D, which implies P(D) = 1.
Now P(C ∪D), which is the probability that either C or D occur is 1. Also, C and D are also disjoint
events, hence P(C ∪D) = P(C) + P(D) = 1 + 1. That is, 1 + 1 = 1.
What is the sample space in this problem description? What were the mistakes in the argument?
Solution: According to the problem description, the experiment under consideration is the throw of both the
coin and the die. Hence,
Ω = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
Observe that C = D = Ω. Hence, C and D are not disjoint. This is the flaw in the argument.
Exercise 2.15. Let A and B be two events.
a. If P(A) = 0.7, P(B) = 0.4 and P(A ∪B) = 0.8, what is the value of P(A ∩B)?
b. If P(A ∩B) = 0.25, P(A ∪B) = 0.75 and P(A) = 2P(B), what is P(A)?
c. Prove that if P(A) = P(Bc), then P(Ac) = P(B).
Solution:
a. Using Lemma 2.6, P(A ∪B) = P(A) + P(B)− P(A ∩B). Hence, P(A ∩B) = 0.7 + 0.4− 0.8 = 0.3.
b. Using Lemma 2.6, P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Since P(A) = 2P(B), P(A ∪ B) = 3P(B) − P(A ∩ B). Hence 3P(B) = 0.75 + 0.25 = 1 and P(B) = 1/3. Using P(A) = 2P(B), P(A) = 2/3.
c. Using Lemma 2.4, P(A) = 1−P(Ac) and P(B) = 1−P(Bc). Hence, since P(A) = P(Bc), 1−P(Ac) = 1−P(B).
That is, P(Ac) = P(B).
Exercise 2.16. Show that P((A ∩Bc) ∪ (Ac ∩B)) = P(A) + P(B)− 2P(A ∩B).
Solution: Recall that A∪B = (A∩B)∪ (A∩Bc)∪ (Ac∩B). Since the sets on the right hand side are disjoint,
it follows from Lemma 2.3 that
P(A ∪B) = P(A ∩B) + P(A ∩Bc) + P(Ac ∩B)
Also recall from Lemma 2.6 that
P(A ∪B) = P(A) + P(B)− P(A ∩B)
Putting the above equations together, conclude that,
P(A ∩B) + P(A ∩Bc) + P(Ac ∩B) = P(A) + P (B)− P(A ∩B)
P(A ∩Bc) + P(Ac ∩B) = P(A) + P(B)− 2P(A ∩B)
Exercise 2.17. Prove that
P(A ∪B ∪ C) = P(A) + P(B) + P(C)− P(A ∩B)− P(A ∩ C)− P(B ∩ C) + P(A ∩B ∩ C)
Solution:
P(A ∪ (B ∪ C)) = P(A) + P(B ∪ C)− P(A ∩ (B ∪ C)) Lemma 2.6
= P(A) + P(B ∪ C)− P((A ∩B) ∪ (A ∩ C))
= P(A) + P(B) + P(C)− P(B ∩ C)− P((A ∩B) ∪ (A ∩ C)) Lemma 2.6
Using Lemma 2.6,
P((A ∩B) ∪ (A ∩ C)) = P(A ∩B) + P(A ∩ C)− P((A ∩B) ∩ (A ∩ C))
= P(A ∩B) + P(A ∩ C)− P(A ∩B ∩ C)
Hence, combining the above equations:
P(A ∪ (B ∪ C)) = P(A) + P(B) + P(C)− P(A ∩B)− P(A ∩ C)− P(B ∩ C) + P(A ∩B ∩ C)
Exercise 2.18. Let Ω be the sample space and A and B be events. Prove that:
a. P(A ∪B) = 1− P(Ac ∩Bc).
b. P(A) = P(A ∩B) + P(A ∩Bc).
c. max(P(A),P(B)) ≤ P(A ∪B) ≤ min(1,P(A) + P(B)).
d. max(0,P(A) + P(B)− 1) ≤ P(A ∩B) ≤ min(P(A),P(B)).
Solution:
a. Observe that (A ∪ B)c = Ac ∩ Bc. Hence it follows from Lemma 2.4 that P(Ac ∩ Bc) = 1− P(A ∪ B) and,
arranging the terms, P(A ∪B) = 1− P (Ac ∩Bc).
b. Observe that A = (A ∩B) ∪ (A ∩Bc). Hence, since (A ∩B) and (A ∩Bc) are disjoint,
P(A) = P((A ∩B) ∪ (A ∩Bc))
= P(A ∩B) + P(A ∩Bc)
c. Since A ⊂ A ∪B and B ⊂ A ∪B, P(A) ≤ P(A ∪B) and P(B) ≤ P(A ∪B). Hence,
max{P(A),P(B)} ≤ P(A ∪B)
Recall that P(A∪B) = P(A)+P(B)−P(A∩B). Since P(A∩B) ≥ 0, conclude that P(A∪B) ≤ P(A)+P(B).
Also, since A ∪ B ⊂ Ω, obtain P(A ∪ B) ≤ P(Ω) = 1. Hence,
P(A ∪ B) ≤ min{P(A) + P(B), 1}
d. Since (A ∩B) ⊂ A and (A ∩B) ⊂ B, P(A ∩B) ≤ P(A) and P(A ∩B) ≤ P(B). Hence,
P(A ∩B) ≤ min{P(A),P(B)}
Also recall that
1 ≥ P(A ∪B) = P(A) + P(B)− P(A ∩B)
P(A ∩B) ≥ P(A) + P(B)− 1
Since P(A ∩B) ≥ 0, P(A ∩B) ≥ max{0,P(A) + P(B)− 1}.
2.2 *The truth about probability
Although when presenting the axioms of probability we assume that P is defined over any subset E of the sample space Ω, for uncountable spaces this is actually too strong: for instance, it is not possible to define a uniform distribution on (0, 1) (see Section 5) that satisfies all axioms. Modern probability (more precisely, measure theory) shows us that it is only possible to define such a P over some events. In this section we present a brief overview of this issue. A probability measure is actually defined over a σ-field of the sample space:
Definition 2.19 (σ-field).
We say that a set F of subsets of Ω is a σ-field if
(i) Ω ∈ F
(ii) A ∈ F implies Ac ∈ F
(iii) Ai ∈ F for every i ∈ N implies ∪i∈NAi ∈ F
Example 2.20 (σ-fields). The following are σ-fields:
• F = {∅,Ω},
• F = {∅, {a}, {b, c},Ω} when Ω = {a, b, c},
• F = P(Ω), i.e., the power set of Ω (P(Ω) = {E : E ⊂ Ω}).
We can now state the general probability axioms:
Definition 2.21. Let F be a σ-field over Ω. A probability measure over F is a function P : F → R that satisfies:
1. (Non-negativity) For all A ∈ F , P(A) ≥ 0.
2. (Countable additivity) If (An)n∈N is a sequence of disjoint sets in F , $P(\cup_{n\in\mathbb{N}} A_n) = \sum_{n\in\mathbb{N}} P(A_n)$.
3. (Normalization) P(Ω) = 1.
Some σ-fields F are more useful than others. For instance, it is possible to define the continuous uniform measure over a special σ-field called the Borel σ-field of (0, 1); however, it is not possible to do so over the power set of (0, 1). This is the reason why σ-fields are necessary in modern probability theory. For more on this issue, see for instance (Billingsley, 2008).
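For a finite Ω, Definition 2.19 can be checked mechanically; closure under countable unions then reduces to closure under pairwise unions. A minimal sketch in Python (the function name and the test families are illustrative):

from itertools import combinations

def is_sigma_field(F, Omega):
    F = {frozenset(A) for A in F}
    if frozenset(Omega) not in F:
        return False
    if any(frozenset(Omega - A) not in F for A in F):   # closed under complements
        return False
    # Closed under (pairwise, hence finite) unions.
    return all(frozenset(A | B) in F for A, B in combinations(F, 2))

Omega = {"a", "b", "c"}
print(is_sigma_field([set(), {"a"}, {"b", "c"}, Omega], Omega))  # True (Example 2.20)
print(is_sigma_field([set(), {"a"}, Omega], Omega))              # False: the complement of {"a"} is missing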
Exercises
Exercise 2.22. Let Ω = {a, b, c, d}. Is {∅, {a}, {b, c},Ω} a σ-field of Ω?
Exercise 2.23. Let F be a σ-field. Prove that if Ai ∈ F for every i ∈ N, then ∩i∈NAi ∈ F .
2.3 Random variables
Numbers we are uncertain about occupy a special position in probability theory. Random variables are functions
from Ω to R and are used to represent uncertainty about numbers.
Definition 2.24. A random variable X is a function such that X : Ω→ R.
Example 2.25. Consider a single coin flip. Ω = {H,T}. The number of times we observed tails is represented
through a random variable X such that X(H) = 0 and X(T ) = 1.
Example 2.26. Consider that a given day can be rainy - R - or not rainy - NR. We are interested in the weather
of the next two days. Ω = {(NR,NR), (NR,R), (R,NR), (R,R)}. The number of times it will be rainy in the
next two days is represented through a random variable X defined in table 1.
w ∈ Ω X(w)
(NR,NR) 0
(NR,R), (R,NR) 1
(R,R) 2
Table 1: Number of rainy days (Example 2.26)
Definition 2.27 (Indicator function). The indicator function is a special type of random variable. Consider an
event A ∈ F . The indicator function of A is denoted by IA : Ω→ R and defined as:
$$I_A(w) = \begin{cases} 1 & \text{if } w \in A \\ 0 & \text{otherwise} \end{cases}$$
We will further discuss random variables in Chapter 3.
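Because a random variable on a finite Ω is just a function, it can be written down directly. A minimal sketch in Python encoding the random variable of Example 2.26 and an indicator function (Definition 2.27); the event A below is chosen only for illustration:

# Sample space of Example 2.26: the weather on the next two days.
Omega = [("NR", "NR"), ("NR", "R"), ("R", "NR"), ("R", "R")]

# X(w) = number of rainy days (Table 1).
X = lambda w: sum(day == "R" for day in w)

# Indicator function of an event A.
def indicator(A):
    return lambda w: 1 if w in A else 0

A = {("R", "NR"), ("R", "R")}       # "the first day is rainy" (illustrative)
I_A = indicator(A)
print([X(w) for w in Omega])        # [0, 1, 1, 2]
print([I_A(w) for w in Omega])      # [0, 0, 1, 1]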
Exercises
Exercise 2.28. Let A and B be two disjoint events. Can you write IA + IB as a single indicator function?
Solution: Since A and B are disjoint, for any w ∈ A ∪ B, IA(w) + IB(w) = 1. Similarly, for any w /∈ A ∪ B,
IA(w) + IB(w) = 0. Hence, IA + IB = IA∪B.
Exercise 2.29. Let A and B be two events. Is it true that IA∪B − IA − IB = 0? Why?
Solution: The statement is false. Let A = {1, 2}, B = {2, 3}. Observe that A ∪B = {1, 2, 3}. Hence,
IA∪B(2)− IA(2)− IB(2) = 1− 1− 1 = −1
2.4 *Why were these axioms chosen?
There are many possible justifications for the axioms of probability. A simple one involves certain types of bets.
Consider a lottery ticket in which you get $1 if event A happens and $0, otherwise. Consider this game:
1. You define a price for this ticket Pr(A).
2. An opponent determines if you buy the ticket from him or sell the ticket to him.
Your monetary outcome is αA(IA − Pr(A)), where αA is −1 if the opponent makes you sell and αA = 1 if he
makes you buy. You are a sure loser if the opponent has the option to make you lose money for sure. When can
you be made a sure loser?
Consider that Pr(A) < 0. In this case, IA − Pr(A) > 0 and, therefore, the opponent can make you a sure loser
by setting αA = −1.
Consider that Pr(Ω) ≠ 1. Observe that $I_\Omega = 1$. Hence $I_\Omega - \Pr(\Omega) = 1 - \Pr(\Omega)$.
• If Pr(Ω) < 1, $I_\Omega - \Pr(\Omega) > 0$ and, therefore, the opponent can make you a sure loser by setting $\alpha_\Omega = -1$.
• If Pr(Ω) > 1, $I_\Omega - \Pr(\Omega) < 0$ and, therefore, the opponent can make you a sure loser by setting $\alpha_\Omega = 1$.
Finally, let A and B be disjoint events. Observe that IA∪B − IA − IB = 0. Hence, if the opponent sets
αA∪B = 1, αA = −1, αB = −1, your final outcome is:
αA∪B(IA∪B − Pr(A ∪B)) + αA(IA − Pr(A)) + αB(IB − Pr(B)) =
(IA∪B − IA − IB) + (Pr(A) + Pr(B)− Pr(A ∪B)) =
Pr(A) + Pr(B)− Pr(A ∪B)
Similarly, if αA∪B = −1, αA = 1, αB = 1, then your final outcome is Pr(A ∪ B) − Pr(A) − Pr(B). Hence, if Pr(A ∪ B) ≠ Pr(A) + Pr(B) there exists an option such that the opponent makes you a sure loser. That is, in order for you to avoid sure loss Pr must satisfy:
1. Pr(A) ≥ 0.
2. Pr(Ω) = 1.
3. If A and B are disjoint, Pr(A ∪B) = Pr(A) + Pr(B).
These rules closely resemble the axioms of probability. Hence, if one believes that how much one is willing to
bet can approximate degrees of belief, this reasoning can justify the axioms of probability.
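The sure-loss argument can be illustrated with concrete numbers. A minimal sketch in Python (the events and the incoherent prices are made up): when Pr(A ∪ B) ≠ Pr(A) + Pr(B) for disjoint A and B, the opponent's choice of signs described above produces a negative payoff on every outcome.

Omega = {1, 2, 3, 4}
A, B = {1}, {2}                       # disjoint events
AuB = A | B

# Incoherent prices: Pr(A u B) != Pr(A) + Pr(B).
price = {"A": 0.2, "B": 0.3, "AuB": 0.7}
alpha = {"AuB": 1, "A": -1, "B": -1}  # opponent: you buy A u B, sell A and B

indicator = lambda E: (lambda w: 1 if w in E else 0)
I = {"A": indicator(A), "B": indicator(B), "AuB": indicator(AuB)}

payoff = lambda w: sum(alpha[k] * (I[k](w) - price[k]) for k in price)
print([round(payoff(w), 2) for w in sorted(Omega)])  # [-0.2, -0.2, -0.2, -0.2]: a sure loss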
2.5 Equiprobable outcomes
Consider the specific case in which Ω is a finite set and, for all w1, w2 ∈ Ω, P({w1}) = P({w2}). That is, the probability of every possible outcome is the same. Equiprobability is a special case and does not apply to all probability models.
Lemma 2.30. If all outcomes are equiprobable, then for every w ∈ Ω, $P(\{w\}) = |\Omega|^{-1}$.
Proof. Notice that $\Omega = \cup_{w\in\Omega} \{w\}$. Also observe that, if $w_1 \neq w_2$, then $\{w_1\}$ is disjoint from $\{w_2\}$. Hence,
$$\sum_{w\in\Omega} P(\{w\}) = P(\Omega) \qquad \text{(Lemma 2.3)}$$
$$\sum_{w\in\Omega} P(\{w\}) = 1 \qquad \text{(Definition 2.1)}$$
$$|\Omega| \cdot P(\{w\}) = 1 \qquad \text{(equiprobable outcomes)}$$
$$P(\{w\}) = |\Omega|^{-1}$$
Lemma 2.31. For every event A, $P(A) = \frac{|A|}{|\Omega|}$.
Proof. Observe that $A = \cup_{w\in A} \{w\}$. Since the events are disjoint,
$$P(A) = \sum_{w\in A} P(\{w\}) \qquad \text{(Lemma 2.3)}$$
$$= \sum_{w\in A} |\Omega|^{-1} \qquad \text{(Lemma 2.30)}$$
$$= \frac{|A|}{|\Omega|}$$
That is, in problems with equiprobable outcomes, the probability of a set is proportional to its size. In order to determine sizes of sets, some counting principles are of great help. We saw some combinatorial analysis in Chapter 1; here's a quick review in the context of the examples we have been using:
• Consider you have n different objects. The number of ways you can select k times among them with replacement is $n^k$.
Example 2.32. Consider a six-sided die. It can assume 6 different values. Hence, if we throw the die 3 times in a row, we can observe $6^3$ different outcomes.
• Consider you have n different objects. If the order of selection matters, the number of ways you can select k ≤ n of these objects is $\frac{n!}{(n-k)!}$.
Example 2.33. Consider there are n participants in a given chess tournament. In how many ways can the list of top 3 be composed? $\frac{n!}{(n-3)!}$.
• Consider you have n different objects. If the order of selection doesn't matter, the number of ways you can select k ≤ n of these objects is $\binom{n}{k} = \frac{n!}{k!(n-k)!}$.
Example 2.34. The number of poker hands is $\binom{52}{5}$.
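Lemma 2.31 turns probability computations on equiprobable spaces into counting, which is easy to automate. A minimal sketch in Python (the event "the two throws sum to 7" is just an illustration):

from itertools import product
from fractions import Fraction

Omega = list(product(range(1, 7), repeat=2))   # two throws of a fair six-sided die
A = [w for w in Omega if sum(w) == 7]          # the two throws sum to 7

# Lemma 2.31: P(A) = |A| / |Omega|.
print(Fraction(len(A), len(Omega)))            # 1/6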
Exercises
Exercise 2.35. Consider we throw a fair coin 3 times.
a. What is the sample space?
b. What is the probability of each outcome?
c. What set corresponds to “obtain at least two heads”? What is the probability of this set?
Solution:
a. The sample space is Ω = {(H,H,H), (H,H,T), (H,T,H), (H,T,T), (T,H,H), (T,H,T), (T,T,H), (T,T,T)}.
b. Using Lemma 2.31, for each w ∈ Ω, P({w}) = 1/8.
c. The set is $H_{\geq 2}$ = {(H,H,H), (H,H,T), (H,T,H), (T,H,H)}. Using Lemma 2.31, $P(H_{\geq 2}) = \frac{|H_{\geq 2}|}{|\Omega|} = \frac{4}{8}$.
Exercise 2.36.
a. How many distinct 7-digit phone numbers are there? Assume that the first digit of any telephone number
cannot be 0 or 1.
b. If I pick a number equally likely among all valid phone numbers, what is the probability that I pick 893−9240?
c. What about any number of the form 893− 924z, where z can be any digit between 0 and 8?
d. What about a number such that no 2 digits are the same?
Solution:
a. The first digit can assume 8 different values. The next six digits can each assume 10 different values. Hence, the total number of phone numbers is $8 \times 10^6$.
b. Using Lemma 2.31, since all numbers are equally probable, the probability that you pick any particular one of them is $|\Omega|^{-1} = \frac{1}{8 \times 10^6}$.
d. Let A denote the set of numbers such that no 2 digits are the same. Using Lemma 2.31, $P(A) = \frac{|A|}{|\Omega|}$. Also observe that you can pick 8 different numbers for the 1st digit. Next, for the second digit you can pick one among 9, since it can't match the 1st digit. Next, for the third digit you can pick one among 8, since it can't match the first two digits, and so on. Hence, $|A| = 8 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot 5 \cdot 4$ and $P(A) = \frac{483{,}840}{8 \times 10^6} \approx 0.06$.
Exercise 2.37. Suppose there are 4 people in a room. What is the probability that no two of them celebrate their birthdays in the same month? Assume the probabilities of having a birthday in each month are equal and the birthdays of two people are independent from one another. Can you think of a situation in which assuming independence wouldn't be reasonable?

Solution: Let A denote the possible arrangements such that no two people celebrate their birthdays in the same month. |A| = 12 · 11 · 10 · 9. Let B be the set of all possible arrangements, $|B| = 12^4$. Since all possible arrangements are equally probable, the probability that no two people celebrate their birthday in the same month is $\frac{|A|}{|B|} = \frac{12 \cdot 11 \cdot 10 \cdot 9}{12^4}$.
1 2 3 4 5 6
1 0 1 2 3 4 5
2 1 0 1 2 3 4
3 2 1 0 1 2 3
4 3 2 1 0 1 2
5 4 3 2 1 0 1
6 5 4 3 2 1 0
Table 2: Value of |X − Y | for each possible outcome.
Exercise 2.38. Consider that a fair six-sided die is tossed twice. Consider that all outcomes are equally probable.
Let X be a random variable that denotes the outcome of the first die. Let Y be a random variable that denotes
the outcome of the second die. Find P (|X − Y | ∈ {0, 1, 2}).
Solution: Recall that we can express all possible outcomes in a table such that the rows correspond to the
outcomes in the first toss and the columns to the outcomes in the second toss. Table 2 presents the value of
|X −Y | for each of the possible outcomes. Observe that |X −Y | ∈ {0, 1, 2} in 24 out of the 36 possible outcomes.
Hence, since all outcomes are equiprobable, P(|X − Y | ∈ {0, 1, 2}) = 24/36.
Exercise 2.39. (Challenge) Player A has n+1 coins, while player B has n coins. Both players throw all of their
coins simultaneously and observe the number of heads each one gets. Assuming all the coins are fair, what is the
probability that A obtains more heads than B?
Solution: Let H+ denote the event that player A gets more heads than player B. Let T+ denote the event that
player A gets more tails than player B. Observe that either H+ or T+ always happen. That is, H+ ∪ T+ = Ω.
Also observe that H+ and T+ cannot happen at the same time. That is, H+ and T+ are disjoint. Finally, since
the coins are fair, by symmetry P(H+) = P(T+). Conclude that P(H+) = 1/2.
Exercise 2.40 (Challenge). A rook can “capture” another if they are both in the same row or column of the
chess board. If one puts 8 rooks in the board such that the different placements are equally probable, what is the
probability that no rook can capture another?
Solution: Ω corresponds to all possible ways we can put 8 rooks on a chess board (64 positions). This corresponds to the number of ordered subsets of size 8 out of a set of size 64, that is, $|\Omega| = \frac{64!}{(64-8)!}$. Let A denote all positions such that no rook can capture another. The first rook can be put in any of the 64 positions. Next, the second rook can be put in any position except one in the same row or column as the 1st rook, that is, 7 × 7 positions. The third rook can be put in any position except one in the same row or column as the 1st or 2nd rook, that is, 6 × 6 positions, and so on. Hence, $P(A) = \frac{8! \, 8!}{64!/(64-8)!} \approx 0.000009$.
Exercise 2.41 (Challenge). Two magicians perform the following trick: A person randomly picks 5 cards out of
a fair deck and hands them out to Magician 1. Magician 1 puts the five cards in the following way. The first
card is facing down. The next four cards are facing up in the order he chooses. Magician 2 looks at the arranged
cards and guesses which one is facing down. What plan can the magicians agree upon so that Magician 2 always
answers correctly?
Solution: One can’t reveal a magic trick, right?
2.6 Conditional probability
Conditional probability is an extension of probability. Let A and B be two events, the conditional probability of
A given B is denoted by P(A|B). We interpret this probability as the uncertainty about A assuming that B is
true. In this sense, regular probability is a particular case of conditional probabilities since Ω is always true and,
therefore, P(A) = P(A|Ω). Besides the usual axioms of probability (Definition 2.1), conditional probabilities also
satisfy an extra axiom:
Definition 2.42 (Axiom of Conditional Probabilities).
P(A ∩B) = P(A)P(B|A)
That is, the probability of A and B happening is the same as the probability of A happening multiplied by the
probability of B happening given that A happened.
Example 2.43. Consider the outcome of a fair six-sided die. Ω = {1, 2, 3, 4, 5, 6} and all elements of Ω are equally likely. Let Odd correspond to the event that the outcome is an odd number, Odd = {1, 3, 5}. Observe that P(Odd) = 1/2 and P({3}) = 1/6. Using Definition 2.42,
$$P(\{3\} \cap Odd) = P(Odd)\,P(\{3\} \mid Odd)$$
Since {3} ⊂ Odd, {3} ∩ Odd = {3} and
$$P(\{3\}) = P(Odd)\,P(\{3\} \mid Odd)$$
Substituting the values for P({3}) and P(Odd), we get P({3} | Odd) = 1/3.
Example 2.44. Consider the outcome of a fair six-sided die. Ω = {1, 2, 3, 4, 5, 6} and all elements of Ω are equally likely. Let A = {3, 4, 5} and B = {1, 2, 3, 4}. Using Definition 2.42,
$$P(A \cap B) = P(A \mid B)\,P(B)$$
$$P(\{3, 4\}) = P(A \mid B)\,P(\{1, 2, 3, 4\})$$
Hence, P(A|B) = 2/4. We can also compute P(B|A):
$$P(A \cap B) = P(B \mid A)\,P(A)$$
$$P(\{3, 4\}) = P(B \mid A)\,P(\{3, 4, 5\})$$
That is, P(B|A) = 2/3 and, therefore, P(A|B) ≠ P(B|A).
Example 2.45. Let A ⊂ B. Using Definition 2.42,
$$P(A \cap B) = P(A)\,P(B \mid A)$$
Since A ⊂ B, P(A ∩ B) = P(A). Hence, P(B|A) = 1. Similarly,
$$P(A \cap B) = P(B)\,P(A \mid B)$$
$$P(A) = P(B)\,P(A \mid B)$$
and $P(A \mid B) = \frac{P(A)}{P(B)}$.
Definition 2.46. Two events A and B are independent if P(A ∩B) = P(A)P(B).
Lemma 2.47. A and B are independent if and only if P(A|B) = P(A). In other words, A and B are independent
if and only if one’s uncertainty about A does not change assuming that B is true.
Proof. Assume P(A|B) = P(A). Recall from definition 2.42 that P(A∩B) = P(A|B)P(B). Using the assumption,
P(A ∩B) = P(A)P(B) and A and B are independent.
Assume A and B are independent. Hence, P(A)P(B) = P(A ∩ B) = P(A|B)P(B). Putting both ends of the equality together and dividing by P(B), P(A|B) = P(A).
Example 2.48. Let Ω = {(H,H), (H,T ), (T,H), (T, T )} and all outcomes be equiprobable. Let A denote “The
outcome of the first coin flip is heads”, A = {(H,H), (H,T )}. Let B be “The outcome of the second coin flip
is heads”, B = {(H,H), (T,H)}. Observe that A ∩ B = {(H,H)} and P(A ∩ B) = 1/4. Also, P(A) = P(B) = 1/2.
Hence, since P(A ∩B) = P(A)P(B), A and B are independent.
A common mistake is to believe that one can conclude independence from disjointness or the reverse. Recall
that A and B are disjoint if A ∩ B = ∅, that is P(A ∩ B) = 0. In words, A and B are disjoint if it is impossible
for A and B to happen simultaneously. On the other hand A and B are independent if P(A ∩ B) = P(A)P(B).
Hence, for example, if P(A) > 0 and P(B) > 0 two events cannot simultaneously be independent and disjoint.
Definition 2.49. The events $A_1, A_2, \ldots, A_n$ are jointly independent if, for every $I \subset \{1, 2, \ldots, n\}$,
$$P(\cap_{i\in I} A_i) = \prod_{i\in I} P(A_i)$$
Theorem 2.50 (Multiplication rule). Let $A_1, A_2, \ldots, A_n$ be events. Then
$$P(\cap_{i=1}^{n} A_i) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid \cap_{i=1}^{n-1} A_i)$$
Remark: it is common to denote $P(\cap_{i=1}^{n} A_i)$ by $P(A_1, \ldots, A_n)$.
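On a finite equiprobable space, conditional probabilities can be computed by counting, which makes checks of independence immediate. A minimal sketch in Python reproducing Example 2.48:

from itertools import product
from fractions import Fraction

Omega = set(product("HT", repeat=2))               # two coin flips, equiprobable
prob = lambda E: Fraction(len(E & Omega), len(Omega))
cond = lambda A, B: prob(A & B) / prob(B)          # P(A | B) = P(A n B) / P(B)

A = {w for w in Omega if w[0] == "H"}              # first flip is heads
B = {w for w in Omega if w[1] == "H"}              # second flip is heads

print(cond(A, B) == prob(A))                       # True: Lemma 2.47
print(prob(A & B) == prob(A) * prob(B))            # True: A and B are independent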
Exercises
Exercise 2.51. Let A and B be two events such that P(A) = 1/2, P(B) = 1/3 and P(A ∩ B) = 1/4. Find:
a. P(A|B)
b. P(A|Bc)
c. P(B|A)
d. P(Ac|Bc)

Solution:
a.
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{1/3} = \frac{3}{4} \qquad \text{(Definition 2.42)}$$
b.
$$P(A \mid B^c) = \frac{P(A \cap B^c)}{P(B^c)} = \frac{P(A) - P(A \cap B)}{1 - P(B)} = \frac{1/2 - 1/4}{2/3} = \frac{3}{8} \qquad (A = (A \cap B) \cup (A \cap B^c))$$
c.
$$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{1/4}{1/2} = \frac{1}{2} \qquad \text{(Definition 2.42)}$$
d.
$$P(A^c \mid B^c) = \frac{P(A^c \cap B^c)}{P(B^c)} = \frac{P(B^c) - P(A \cap B^c)}{P(B^c)} = 1 - P(A \mid B^c) = \frac{5}{8} \qquad (B^c = (A \cap B^c) \cup (A^c \cap B^c))$$
Exercise 2.52. Explain how the cartoon in Figure 1 from http://xkcd.com/795/ relates to the concept of
conditional probability.
Exercise 2.53.Consider that a coin is flipped twice. I announce that there was at least 1 heads. What is the
probability that there were 2 heads?
Solution: Observe that Ω = {(H,H), (H,T ), (T,H), (T, T )}. Let A denote the event that there was at least
1 heads. A = {(H,H), (H,T ), (T,H)}. Let B denote the event that there were 2 heads. B = {(H,H)}. Using
Definition 2.42,
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
Since all elements are equiprobable, using Lemma 2.31,
[Figure 1: cartoon from http://xkcd.com/795/]
$$\frac{P(A \cap B)}{P(A)} = \frac{|A \cap B|/|\Omega|}{|A|/|\Omega|} = \frac{|A \cap B|}{|A|} = \frac{1}{3}$$
Exercise 2.54. A drug prescription states the following information:
• There is a 10% chance of experiencing headache (event H).
• There is a 15% chance of experiencing nausea (event N).
• There is a 5% chance of experiencing both side effects.
a. Are the events H and N disjoint? Why?
b. Are the events H and N independent? Why?
c. What is the probability of experiencing at least one of the two side effects?
d. What is the probability of experiencing exactly one of the two side effects?
e. What is the probability of experiencing neither headache nor nausea?
f. What is the probability of experiencing headache given that you experienced at least one of the two side
effects?
Solution:
a. No, since P(H ∩ N) = 5%. Hence H ∩ N ≠ ∅.
b. P(H ∩ N) = 0.05 ≠ 0.015 = 0.1 · 0.15 = P(H)P(N). Hence, H and N are not independent.
c. P(H ∪ N) = P(H) + P(N) − P(H ∩ N) = 0.1 + 0.15 − 0.05 = 0.2.
d.
$$P((H \cap N^c) \cup (H^c \cap N)) = P(H \cap N^c) + P(H^c \cap N) = P(H) - P(H \cap N) + P(N) - P(N \cap H) = 0.1 - 0.05 + 0.15 - 0.05 = 0.15$$
e. P(Hc ∩ Nc) = 1 − P(H ∪ N) = 1 − 0.2 = 0.8.
f. $P(H \mid H \cup N) = \frac{P(H \cap (H \cup N))}{P(H \cup N)} = \frac{P(H)}{P(H \cup N)} = \frac{0.1}{0.2} = 0.5$.
Exercise 2.55. Assume a card is taken from a well shuffled deck. Let A denote “the card is an ace” and B denote
“the card’s suit is diamonds”. Show that A and B are independent.
Solution: There are 52 cards in a deck and, therefore, |Ω| = 52. Observe that A = {(1,♦), (1,♥), (1,♣), (1,♠)}, B = {(1,♦), (2,♦), . . . , (K,♦)} and A ∩ B = {(1,♦)}. Observe that P(A ∩ B) = |A ∩ B|/|Ω| = 1/52, P(A) = |A|/|Ω| = 4/52 and P(B) = |B|/|Ω| = 13/52. Hence,
$$P(A \cap B) = \frac{1}{52} = \frac{4}{52} \cdot \frac{13}{52} = P(A)P(B)$$
That is, A and B are independent (Definition 2.46).
Exercise 2.56. Two cards are taken from a well shuffled deck:
a. Let A be “the first card’s suit is red” and B be “the second card’s suit is red”. Compute P (A ∩B). Are A
and B independent?
b. Let A be “the first card’s suit is red” and B be “the second card is a J , Q or K”. Compute P (A∩B). Are
A and B independent?
Solution:
a. |Ω| corresponds to the number of ways one can pick two cards in order out of a deck, that is, 52!/(52 − 2)! = 52 × 51. In order for a pick to be an element of A the first card has to be red (26 choices) and, then, the second one can be any card (51 choices). That is, |A| = 26 × 51. Similarly, in order for a pick to be an element of B the second card has to be red (26 choices) and, then, the first one can be any card (51 choices). That is, |B| = 51 × 26. Finally, for a pick to be in A ∩ B, the first card must be red (26 choices) and, next, the second card must be red (25 choices left). That is, |A ∩ B| = 26 × 25. Using Lemma 2.31,
$$P(A \cap B) = \frac{|A \cap B|}{|\Omega|} = \frac{26 \times 25}{52 \times 51} \neq \frac{26}{52} \cdot \frac{26}{52} = \frac{|A|}{|\Omega|} \cdot \frac{|B|}{|\Omega|} = P(A)P(B)$$
That is, A and B are not independent (Definition 2.46).
b. Similarly to the last item, observe that P(A) = 26/52 and P(B) = 12/52. Observe that, in order for a pick to be in A ∩ B the first card must be red (26 choices). Out of these 26 choices, 20 are such that you don't pick a J, Q or K. Hence, if you pick any of these 20, you have 12 choices available for the next card. The other 6 choices are such that you pick a J, Q or K. In this case, you only have 11 choices left for the next card.
Hence, |A ∩ B| = 20 × 12 + 6 × 11 = 306. Conclude that
$$P(A \cap B) = \frac{|A \cap B|}{|\Omega|} = \frac{20 \times 12 + 6 \times 11}{52 \times 51} = \frac{3}{26} = \frac{26}{52} \cdot \frac{12}{52} = \frac{|A|}{|\Omega|} \cdot \frac{|B|}{|\Omega|} = P(A)P(B)$$
That is, A and B are independent (Definition 2.46).
Exercise 2.57. Show that:
1. A and B are independent if and only if P(B|A) = P(B).
2. A and B are independent if and only if P(A|Bc) = P(A).
Solution:
a. If P(B|A) = P(B), then P(A ∩ B) = P(A)P(B|A) = P(A)P(B). Similarly if P(A ∩ B) = P(A)P(B), then
P(A)P(B|A) = P(A)P(B) and P(B|A) = P(B).
b. Observe that (A ∩ B) and (A ∩ Bc) are disjoint and (A ∩ B) ∪ (A ∩ Bc) = A. Hence, using Lemma 2.3,
P(A) = P(A ∩B) + P(A ∩Bc). Using Definition 2.42,
P(A) = P(B)P(A|B) + P(Bc)P(A|Bc)
= P(B)P(A|B) + (1− P(B))P(A|Bc)
If A and B are independent P(A|B) = P(A) and
P(A) = P(B)P(A) + (1− P(B))P(A|Bc)
(1− P(B))P(A) = (1− P(B))P(A|Bc)
That is P(A) = P(A|Bc). If P(A) = P(A|Bc), then
P(A) = P(B)P(A|B) + (1− P(B))P(A)
P(B)P(A) = P(B)P(A|B)
That is, P(A|B) = P(A) and A is independent of B.
Exercise 2.58.
a. If A is independent of B and B is independent of C, are A and C independent? Why?
b. If A is independent of B, B is independent of C and A is independent of C, is P(A∩B∩C) = P(A)P(B)P(C)?
Solution:
a. Consider two flips of a fair coin, Ω = {(H,H), (H,T), (T,H), (T,T)}. Let A denote “the first coin flip is heads”, B denote “the second coin flip is heads” and C = A. We showed in class that A is independent of B and B is independent of C. Nevertheless P(A ∩ C) = P(A) = 1/2 ≠ 1/4 = P(A)P(C). Hence, A and C are not independent.
b. Let Ω, A and B be the same as in the previous item. Let C denote the event that the coin flips land on different values, C = {(T,H), (H,T)}. Observe that P(A ∩ B ∩ C) = 0 ≠ P(A)P(B)P(C), but P(A ∩ C) = P(A)P(C), P(B ∩ C) = P(B)P(C) and P(A ∩ B) = P(A)P(B).
Exercise 2.59. Prove Theorem 2.50.
Exercise 2.60. A fair coin is tossed 10 times. What is the probability that no two consecutive heads and no two
consecutive tails appear?
Solution: Let A denote the event “No two consecutive heads and no two consecutive tails appear”. A =
{(H,T,H, T, . . .), (T,H, T,H, . . .)}. Hence,
P(A) = P({(H,T,H, T, . . .)}) + P({(T,H, T,H, . . .)})
Since the coin flips are independent and the coin is fair, $P(\{(H,T,H,T,\ldots)\}) = P(\{(T,H,T,H,\ldots)\}) = 1/2^{10}$. Thus, $P(A) = 2^{-9}$.
Exercise 2.61. Let A be an event. Prove that P(•|A) is indeed a probability measure.
Exercise 2.62. Let X denote the outcome of a fair six-sided die. Find $P((X-3)^2 = 1 \mid X \neq 3)$.

Solution: In a fair six-sided die we have Ω = {1, 2, 3, 4, 5, 6} and all outcomes are equally probable. The problem also states that, for every w ∈ Ω, X(w) = w. Let
$$A = \{(X-3)^2 = 1\} = \{w \in \Omega : (X(w)-3)^2 = 1\} = \{w \in \Omega : (w-3)^2 = 1\} = \{2, 4\}$$
$$B = \{X \neq 3\} = \{w \in \Omega : X(w) \neq 3\} = \{w \in \Omega : w \neq 3\} = \{1, 2, 4, 5, 6\}$$
Hence,
$$P((X-3)^2 = 1 \mid X \neq 3) = P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\{2, 4\})}{P(\{1, 2, 4, 5, 6\})} = \frac{2/6}{5/6} = \frac{2}{5}$$
Exercise 2.63. (Challenge) Player A has n+1 coins, while player B has n coins. Both players throw all of their
coins simultaneously and observe the number of heads each one gets. Assuming all the coins are fair, what is the
probability that A obtains at least as many heads as B? (This problem is quite hard. You are not required to know
how to solve this one.)
Solution: This problem is quite hard. You are not required to know how to solve this one. For now, I’ll leave
it open for anyone who has trouble going to sleep. . .
Exercise 2.64. Prove that if A1, . . . , An are independent events and their union is the sample space, then
P(Ai) = 1 for at least one i.
2.7 The law of total probability and Bayes theorem
Example 2.65. Consider that there are 2 coins inside a box. You know that one of them is fair and that the other one is biased, with probability 1/3 of heads. Suppose you remove either one of the coins with equal probability, flip that coin twice and obtain two heads.
Denote by $C_{1/2}$ the event that you remove the fair coin, by $C_{1/3}$ the event that you remove the biased coin and by $H_2$ the event that you obtain two heads. The problem states that $P(C_{1/2}) = P(C_{1/3}) = 1/2$. We also know that $P(H_2 \mid C_{1/2}) = 1/4$ and $P(H_2 \mid C_{1/3}) = 1/9$. What is your belief about which coin you picked after performing the flips? In other words, can you use the previous probability values and the axioms of probability to determine $P(C_{1/2} \mid H_2)$? What about the value of $P(H_2)$?
The answer is yes and, by the end of class, we’ll be able to solve this problem!
In the previous classes we proved various simple lemmas that follow from the axioms of probability. In this Section we'll use these lemmas as building blocks for two general results. In order to fully appreciate the beauty of the next results, spend a few minutes giving Example 2.65 your best shot.
Theorem 2.66 (Law of Total Probability). Let (An)n∈N be a partition of Ω and B be an event.
P(B) = ∑_{n∈N} P(An)P(B|An)
Proof. Observe that
B = B ∩ Ω (4)
Also, since (An)n∈N is a partition of Ω,
Ω = ∪n∈NAn (5)
Substituting equation 5 into equation 4, we obtain:
B = B ∩ (∪n∈NAn) (6)
= ∪n∈N(B ∩An) (7)
and, therefore,
P(B) = P(∪n∈N(B ∩An)) (8)
Next, since (An)n∈N is a partition, it is a disjoint sequence. Hence, for every i ≠ j, Ai ∩ Aj = ∅ and

(B ∩ Ai) ∩ (B ∩ Aj) = B ∩ (Ai ∩ Aj) (9)
                    = B ∩ ∅ (10)
                    = ∅ (11)
That is, (B ∩ An)n∈N is also a disjoint sequence. Thus, from the axioms of probability (Definition 2.1),

P(∪n∈N(B ∩ An)) = ∑_{n∈N} P(B ∩ An) (12)
Finally, from the axiom of conditional probability (Definition 2.42), for each n ∈ N, P(B∩An) = P(An)P(B|An).
Using equations 8 and 12, conclude that:
P(B) = ∑_{n∈N} P(An)P(B|An) (13)
Lemma 2.67. If A1, . . . , An partition Ω and B is an event, then P(B) = ∑_{i=1}^{n} P(Ai)P(B|Ai).

Proof. If A1, . . . , An partition Ω, then the sequence (Ci)i∈N such that Ci = Ai for i ≤ n and Ci = ∅ for i > n also partitions Ω. The result follows directly from Theorem 2.66 and P(∅) = 0.
The previous result allows us to relate an unconditional probability P(B) to conditional probabilities P(B|An).
Example 2.68 (Example 2.65 continued). The problem description tells us the probability of obtaining two heads once the coin is decided, that is, P(H2|C_{1/2}) and P(H2|C_{1/3}). Nevertheless, it doesn't directly tell us the probability of obtaining H2 without knowing which coin was picked, P(H2). Observe that C_{1/2}, C_{1/3} partition Ω and, therefore, it follows from Lemma 2.67 that

P(H2) = P(C_{1/2})P(H2|C_{1/2}) + P(C_{1/3})P(H2|C_{1/3})
      = (1/2) · (1/4) + (1/2) · (1/9)
      = 13/72
Simple, right? We’re starting to benefit from building a rigorous theory instead of relying on intuition alone.
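If you are suspicious of the number 13/72, a quick Monte Carlo check is possible. The following minimal Python sketch (the function name and the number of trials are arbitrary choices) simulates Example 2.65 directly:

    import random

    def simulate_two_heads(trials=200_000):
        # Pick one of the two coins at random, flip it twice, record whether both flips are heads.
        hits = 0
        for _ in range(trials):
            p_heads = 0.5 if random.random() < 0.5 else 1 / 3  # fair coin or biased coin
            if random.random() < p_heads and random.random() < p_heads:
                hits += 1
        return hits / trials

    print(simulate_two_heads(), 13 / 72)  # the estimate should be close to 13/72 ≈ 0.1806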
Theorem 2.69 (Bayes Theorem). Let (Ai)i∈N be a partition of Ω and B be an event. For every n ∈ N,
P(An|B) = P(An)P(B|An) / ∑_{i∈N} P(Ai)P(B|Ai)
Proof. Recall from the axiom of conditional probability (Definition 2.42) that
P(An|B) = P(An ∩ B)/P(B) (14)
Using the axiom of conditional probability again, P(An ∩B) = P(An)P(B|An). Using equation 14,
P(An|B) = P(An)P(B|An)/P(B) (15)
Finally, observe that (Ai)i∈N is a partition of Ω and, therefore, using Theorem 2.66, P(B) = ∑_{i∈N} P(Ai)P(B|Ai).
Plugging this value into equation 15
P(An|B) = P(An)P(B|An) / ∑_{i∈N} P(Ai)P(B|Ai) (16)
Lemma 2.70. Let A1, . . . , An partition Ω and B be an event. For every i ∈ {1, . . . , n},

P(Ai|B) = P(Ai)P(B|Ai) / ∑_{j=1}^{n} P(Aj)P(B|Aj)
Proof. Follow the proof of Theorem 2.69. In the last step, use Lemma 2.67 instead of Theorem 2.66.
Example 2.71 (Example 2.65 continued). The problem description tells us the probability of obtaining two heads given the bias of the coin, P(H2|C_{1/2}) = 1/4 and P(H2|C_{1/3}) = 1/9. It is less intuitive to find out what one learns about the bias of the coin by observing two heads, P(Ci|H2). Bayes Theorem provides the answer.
Recall that C_{1/2}, C_{1/3} is a partition of Ω. Hence, by Lemma 2.70,

P(C_{1/2}|H2) = P(C_{1/2})P(H2|C_{1/2}) / [P(C_{1/2})P(H2|C_{1/2}) + P(C_{1/3})P(H2|C_{1/3})]
              = (1/2 · 1/4) / (1/2 · 1/4 + 1/2 · 1/9)
              = 9/13
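The whole computation fits in a few lines of Python. This is just an illustrative sketch (the dictionary keys and variable names are ours):

    priors = {"fair": 1 / 2, "biased": 1 / 2}       # P(C_{1/2}) and P(C_{1/3})
    likelihoods = {"fair": 1 / 4, "biased": 1 / 9}  # P(H2 | coin)

    p_h2 = sum(priors[c] * likelihoods[c] for c in priors)        # law of total probability
    posterior_fair = priors["fair"] * likelihoods["fair"] / p_h2  # Bayes theorem
    print(p_h2, posterior_fair)  # 13/72 ≈ 0.1806 and 9/13 ≈ 0.6923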
Exercises
Exercise 2.72. Assume 5% of men and 0.25% of women are color-blind. A random person is selected and he/she is color-blind. Assuming that there are equal numbers of men and women, what is the probability that this person is a man?
Exercise 2.73. Assume that there are 4 fair coins in a bag. A person picks a number of coins with equal probability among {1, 2, 3, 4}. Next, the person throws all the coins he picked. What is the probability that all of the thrown coins land tails, without knowing how many were picked?

Solution: Let Ci be the event that i coins are picked and let T_A denote the event that all of the thrown coins land tails. Observe that C1, C2, C3, C4 partition Ω. Hence, by Theorem 2.66,

P(T_A) = ∑_{i=1}^{4} P(Ci)P(T_A|Ci) = ∑_{i=1}^{4} (1/4) · (1/2)^i = 15/64
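The sum can be checked with exact arithmetic, for instance with Python's fractions module (a minimal sketch; the variable name is ours):

    from fractions import Fraction

    # P(T_A) = sum over i of P(Ci) * P(T_A | Ci) = sum over i of (1/4) * (1/2)^i
    p_all_tails = sum(Fraction(1, 4) * Fraction(1, 2) ** i for i in range(1, 5))
    print(p_all_tails)  # 15/64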
Exercise 2.74. Medical case histories indicate that different illnesses may produce identical symptoms. Suppose that a particular set of symptoms, H, occurs only when one of three illnesses, I1, I2 or I3, occurs. Assume that the simultaneous occurrence of more than one of these illnesses is impossible. Also assume that:
a. P(I1) = 0.01; P(I2) = 0.05; P(I3) = 0.02.
b. P(H|I1) = 0.90; P(H|I2) = 0.95; P(H|I3) = 0.75.
Assuming that an ill person exhibits the symptoms, H, what is the probability that the person has illness I1?
Solution: Observe that, since I1, I2, I3 are disjoint, I1, I2, I3, I1^c ∩ I2^c ∩ I3^c partition Ω. Hence, by Bayes Theorem:

P(I1|H) = P(I1)P(H|I1) / [P(I1)P(H|I1) + P(I2)P(H|I2) + P(I3)P(H|I3) + P(I1^c ∩ I2^c ∩ I3^c)P(H|I1^c ∩ I2^c ∩ I3^c)]

The problem states that H occurs only when one of the illnesses occurs, so P(H|I1^c ∩ I2^c ∩ I3^c) = 0. Hence,

P(I1|H) = P(I1)P(H|I1) / [P(I1)P(H|I1) + P(I2)P(H|I2) + P(I3)P(H|I3)]
        = (0.01 × 0.90) / (0.01 × 0.90 + 0.05 × 0.95 + 0.02 × 0.75) ≈ 0.126

Note that P(I1|H) is very different from P(H|I1).
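The same computation can be packaged as a small function for any finite partition. This is a sketch only (the function name and the "none" label for the event "no listed illness", with prior 0.92 and likelihood 0, are our own bookkeeping choices, consistent with the partition used above):

    def posterior(priors, likelihoods, target):
        # Bayes theorem over a finite partition: P(target | evidence).
        evidence = sum(priors[k] * likelihoods[k] for k in priors)  # law of total probability
        return priors[target] * likelihoods[target] / evidence

    priors = {"I1": 0.01, "I2": 0.05, "I3": 0.02, "none": 0.92}
    likelihoods = {"I1": 0.90, "I2": 0.95, "I3": 0.75, "none": 0.0}
    print(posterior(priors, likelihoods, "I1"))  # approximately 0.126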
Exercise 2.75. Consider that each component C1, . . . , Cn in a system fails independently with probability p. A
system fails if there is no operational path from B to E. What is the probability that the following systems fail?
a. Series system:

   B → C1 → C2 → E

b. Parallel system (two branches from B to E, one through C1 and one through C2):

   B → C1 → E
   B → C2 → E

c. Mixed system (C1 in series with the parallel pair C2, C3):

   B → C1 → C2 → E
   B → C1 → C3 → E
Solution: Let S denote the event that the system fails. Let Fi denote the event that component Ci fails.

a. The series system fails if either C1 or C2 fails. That is, S = F1 ∪ F2. By Lemma 2.3,

   P(F1 ∪ F2) = P(F1) + P(F2) − P(F1 ∩ F2)

   Since F1 and F2 are independent, P(F1 ∩ F2) = P(F1)P(F2) = p^2. Hence, P(S) = 2p − p^2.

b. The parallel system fails only if both C1 and C2 fail. That is, S = F1 ∩ F2. By independence, P(S) = P(F1 ∩ F2) = P(F1)P(F2) = p^2.

c. The mixed system fails if either C1 fails or both C2 and C3 fail. That is, S = F1 ∪ (F2 ∩ F3). Hence,

   P(S) = P(F1) + P(F2 ∩ F3) − P(F1 ∩ F2 ∩ F3)
        = P(F1) + P(F2)P(F3) − P(F1)P(F2)P(F3)
        = p + p^2 − p^3
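The three formulas are easy to tabulate. A minimal sketch (the function names are ours) that evaluates them for a couple of values of p:

    def series_fail(p):
        # Two components in series: the system fails if either C1 or C2 fails.
        return 2 * p - p ** 2

    def parallel_fail(p):
        # Two components in parallel: the system fails only if both C1 and C2 fail.
        return p ** 2

    def mixed_fail(p):
        # C1 in series with the parallel pair (C2, C3).
        return p + p ** 2 - p ** 3

    for p in (0.1, 0.5):
        print(p, series_fail(p), parallel_fail(p), mixed_fail(p))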
Exercise 2.76. (Monty Hall) There are 3 doors: A, B and C. There is a car behind one of these doors. If you
pick the right one, you get the car. At first, you believe that it is equally likely that the car is behind any one of
the doors. You pick door A. Next, the show’s presenter opens door B, shows that it is empty and allows you to
change to door C. Is it a good idea to change? Assume the showman would always open a door with no prize
and offer you the chance to change your door, no matter if you initially chose the door with the prize or not.
Solution: Let Pr_i denote the event that the prize is behind door i. Let O_i denote the event that Monty opens door i and nothing is behind it. The problem states that P(Pr_A) = P(Pr_B) = P(Pr_C) = 1/3. Also, Pr_A, Pr_B, Pr_C is a partition. We want to know P(Pr_A|O_B). Using Lemma 2.70 (Bayes),

P(Pr_A|O_B) = P(Pr_A)P(O_B|Pr_A) / [P(Pr_A)P(O_B|Pr_A) + P(Pr_B)P(O_B|Pr_B) + P(Pr_C)P(O_B|Pr_C)]

We know that Monty would never open door B if the prize were there. Hence, P(O_B|Pr_B) = 0. We also know that, if the prize were behind door C, Monty could open neither door C nor door A (that's the one you picked!). Hence, P(O_B|Pr_C) = 1. Substituting these values:

P(Pr_A|O_B) = (1/3)P(O_B|Pr_A) / [(1/3)P(O_B|Pr_A) + 1/3] = P(O_B|Pr_A) / (P(O_B|Pr_A) + 1)

Observe that P(Pr_A|O_B) is increasing in P(O_B|Pr_A) and assumes values between 0 (when P(O_B|Pr_A) = 0) and 1/2 (when P(O_B|Pr_A) = 1). Hence, as long as P(O_B|Pr_A) ≠ 1, we have P(Pr_A|O_B) < 1/2 < P(Pr_C|O_B), and changing doors improves your chances. If P(O_B|Pr_A) = 1, then P(Pr_A|O_B) = P(Pr_C|O_B) = 1/2. Thus, in the worst case scenario, changing doors doesn't do you any harm.
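Under the extra assumption that Monty opens one of his allowed doors uniformly at random (so P(O_B|Pr_A) = 1/2), the formula above gives P(Pr_A|O_B) = 1/3 and P(Pr_C|O_B) = 2/3. The simulation sketch below (function name and trial count are arbitrary) makes this concrete:

    import random

    def simulate_switching(trials=100_000):
        # Estimate the probability of winning the car when you always switch doors.
        wins = 0
        for _ in range(trials):
            prize = random.choice("ABC")
            pick = "A"  # you always start by picking door A
            # Monty opens a door that is neither your pick nor the prize door.
            opened = random.choice([d for d in "ABC" if d != pick and d != prize])
            switched = next(d for d in "ABC" if d != pick and d != opened)
            wins += (switched == prize)
        return wins / trials

    print(simulate_switching())  # close to 2/3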
Exercise 2.77. (Polya’s urn) Consider that an urn has 1 black ball and 1 white ball. Every time you draw a
ball, you must put it back into the urn and add an extra ball with the same color. What is the probability that
you get a white ball in your 3rd draw from the urn?
Solution: Let Wi denote the event that you get a white ball in the i-th draw. We wish to compute P(W3). Observe that W1 ∩ W2, W1 ∩ W2^c, W1^c ∩ W2, W1^c ∩ W2^c partition Ω. Hence, using Theorem 2.66,

P(W3) = P(W3|W1 ∩ W2)P(W1 ∩ W2) + P(W3|W1 ∩ W2^c)P(W1 ∩ W2^c)
      + P(W3|W1^c ∩ W2)P(W1^c ∩ W2) + P(W3|W1^c ∩ W2^c)P(W1^c ∩ W2^c)
      = P(W1)P(W2|W1)P(W3|W1 ∩ W2) + P(W1)P(W2^c|W1)P(W3|W1 ∩ W2^c)
      + P(W1^c)P(W2|W1^c)P(W3|W1^c ∩ W2) + P(W1^c)P(W2^c|W1^c)P(W3|W1^c ∩ W2^c)
      = (1/2)(2/3)(3/4) + (1/2)(1/3)(2/4) + (1/2)(1/3)(2/4) + (1/2)(2/3)(1/4)
      = 1/2
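Polya's urn is also pleasant to simulate. A minimal sketch (the function name and trial count are arbitrary):

    import random

    def simulate_third_white(trials=100_000):
        # Urn starts with 1 white and 1 black ball; every drawn ball is returned
        # together with an extra ball of the same color. Estimate P(third draw is white).
        hits = 0
        for _ in range(trials):
            white, black = 1, 1
            for draw in range(3):
                is_white = random.random() < white / (white + black)
                if is_white:
                    white += 1
                else:
                    black += 1
            hits += is_white  # is_white now refers to the third draw
        return hits / trials

    print(simulate_third_white())  # close to 1/2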
Exercise 2.78. You have 12 red balls, 12 yellow balls and 2 urns. Consider that the balls are distributed among the urns, one of the urns is randomly selected (each urn is chosen with equal probability) and a ball is drawn from the urn that was selected.
a. Assume you put 2 red balls and 4 yellow balls in the first urn and the rest of the balls in the second urn.
What is the probability that you draw a yellow ball?
b. (Challenge) What way of distributing the balls among the urns maximizes your probability of drawing a
yellow ball?
Solution: Let Ui denote the event that you pick urn i and Y denote the event that you draw a yellow ball.

a. By Theorem 2.66,

   P(Y) = P(Y|U1)P(U1) + P(Y|U2)P(U2) = (4/6) · (1/2) + (8/18) · (1/2) = 5/9
b. This question involves a lot of calculations. You're not required to know the solution; a brute-force sketch is given right after this exercise.
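For those who are curious, item b can be attacked by brute force: there are only 13 × 13 ways to split the yellow and red balls between the urns. The sketch below (an illustration only; variable names are ours) enumerates them with exact arithmetic and prints the best split.

    from fractions import Fraction

    best = (Fraction(0), None)
    for y1 in range(13):          # yellow balls placed in urn 1
        for r1 in range(13):      # red balls placed in urn 1
            y2, r2 = 12 - y1, 12 - r1
            if y1 + r1 == 0 or y2 + r2 == 0:
                continue          # both urns must contain at least one ball
            # P(yellow) = (1/2)*(fraction of yellow in urn 1) + (1/2)*(fraction of yellow in urn 2)
            p = Fraction(1, 2) * Fraction(y1, y1 + r1) + Fraction(1, 2) * Fraction(y2, y2 + r2)
            best = max(best, (p, (y1, r1)))
    print(best)  # maximal probability and the corresponding (yellow, red) counts in urn 1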
Exercise 2.79. Probability Theory was used in the criminal case People v. Collins. In this case, a woman had her purse robbed. Witnesses claimed that a couple running from the scene was composed of a black man with a beard and a mustache and a blond girl with her hair in a ponytail. Witnesses also said the couple drove off in a yellow car. Malcolm and Janet Collins were found to satisfy all the traits previously presented. A professor of mathematics also stated in Court that, if a couple were randomly selected, one would obtain the following probabilities:
Event Probability
Man with moustache 1/4
Girl with blonde hair 1/3
Girl with ponytail 1/10
Black man with beard 1/10
Interracial couple in a car 1/1000
Partly yellow car 1/10
Hence, the probability that all of these characteristics are found in a randomly chosen couple is 1 in 12,000,000.
The prosecution claimed that this constituted proof beyond reasonable doubt that the defendants were guilty. Do
you find this argument compelling? Why?
Solution: There are many arguments one can use to discredit the prosecutor's reasoning. For the purpose of this class, we point out two such arguments (they are not the only applicable ones):

• A person is picked randomly. Let A denote the event that the person is a man with a moustache. Let B denote the event that the person is a black man with a beard. It is hard to believe that A and B are independent; hence, for example, P(A ∩ B) ≠ P(A)P(B) in general. The prosecutor got the number 1 in 12,000,000 by simply multiplying the numbers in the table. This might be incorrect.

• (Prosecutor's fallacy) Let E denote the event that corresponds to all the evidence found in trial. Let I denote the event that the Collins' are innocent. The prosecutor computed P(E|I). This is not the same as P(I|E). Using Theorem 2.69,

  P(I|E) = P(I)P(E|I) / [P(I)P(E|I) + P(I^c)P(E|I^c)]
Exercise 2.80. Prove that if (An)n∈N and (Bn)n∈N are two sequences of events such that lim_{n→∞} P(An) = 1 and lim_{n→∞} P(Bn) = p, then lim_{n→∞} P(An ∩ Bn) = p.
3 Random Variables
3.1 Distribution of random variables
Recall that a random variable is an unknown number, that is, a function from Ω to R (Definition 2.24). A discrete
random variable is a random variable that only assumes a countable number of values.
Example 3.1. Consider that a person flips a fair coin once. The possible outcomes are heads or tails and
Ω = {H,T}. Let X denote the number of heads that are observed. That is, X(H) = 1 and X(T ) = 0.
Example 3.2. Consider that a person throws a fair coin until he obtains the first heads or completes 3 tosses, whichever comes first, so Ω = {H, TH, TTH, TTT}. Denote by X the number of coin flips the person performs. X is a random variable such
that X(H) = 1, X(TH) = 2, X(TTH) = 3 and X(TTT ) = 3. X is discrete and only assumes a finite number of
values, 1, 2 or 3.
Example 3.3. A fair coin is tossed until it lands heads, Ω = {H,TH, TTH, TTTH, . . .}. Denote by X the
number of coin flips the person performs. X is a discrete random variable and can assume a countably infinite
number of values. X can assume any value in N− {0}. For example, X(H) = 1, X(TH) = 2, X(TTH) = 3, . . .
Definition 3.4. Let X be a random variable. For x ∈ R, we define pX(x) = P(X = x) = P({w ∈ Ω : X(w) = x}).
We call the function pX : R→ [0, 1] the probability mass function (pmf) of X.
Remark: The set {w ∈ Ω : X(w) = x} is often denoted by X−1({x}).
Example 3.5. Consider Example 3.1.
pX(0) = P(X = 0) = P({w ∈ Ω : X(w) = 0}) = P({T}) = 0.5
pX(1) = P(X = 1) = P({w ∈ Ω : X(w) = 1}) = P({H}) = 0.5
Doesn’t it feel good to write pX(1) instead of P({w ∈ Ω : X(w) = 1})? This is the power of good notation. . .
Example 3.6. Consider Example 3.2. Observe that the elements of Ω are not equiprobable. Hence, even though
{H} and {TH} are unitary sets, their probabilities are different.
In order to obtain the probabilities of the unitary subsets of Ω, define the event Hi as “the i-th coin flip is
heads”. Observe that all Hi are jointly independent and, since the coin is fair, P(Hi) = 0.5. Hence,
P({H}) = P(H1) = 0.5
P({TH}) = P(H1^c ∩ H2) = P(H1^c)P(H2) = 0.25
P({TTH}) = P(H1^c ∩ H2^c ∩ H3) = P(H1^c)P(H2^c)P(H3) = 0.125
P({TTT}) = P(H1^c ∩ H2^c ∩ H3^c) = P(H1^c)P(H2^c)P(H3^c) = 0.125
Thus, we can compute the pmf of X,
pX(1) = P(X = 1) = P({w ∈ Ω : X(w) = 1}) = P({H}) = 0.5
pX(2) = P(X = 2) = P({w ∈ Ω : X(w) = 2}) = P({TH}) = 0.25
pX(3) = P(X = 3) = P({w ∈ Ω : X(w) = 3}) = P({TTH, TTT}) = 0.25
Example 3.7. Consider Example 3.3. Observe that, once again, the elements of Ω are not equiprobable. Define
the event Hi as “the i-th coin flip is heads”. Observe that all Hi are jointly independent and, since the coin is
fair, P(Hi) = 0.5. Hence,
pX(1) = P(X = 1) = P({w ∈ Ω : X(w) = 1}) = P(H1) = 0.5
pX(2) = P(X = 2) = P({w ∈ Ω : X(w) = 2}) = P(H1^c ∩ H2) = 0.25
. . .
pX(n) = P(X = n) = P({w ∈ Ω : X(w) = n}) = P(H1^c ∩ . . . ∩ H_{n−1}^c ∩ Hn) = 1/2^n
. . .
[Figure 2: plot of a pmf — the probability of obtaining each number of heads in 20 coin flips. Horizontal axis: Number of Heads; vertical axis: Probability.]
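Figure 2 appears to display the pmf of the number of heads in 20 flips of a fair coin. A short sketch that reproduces such a plot (matplotlib is our choice of plotting library; any other would do):

    from math import comb
    import matplotlib.pyplot as plt

    n = 20
    ks = range(n + 1)
    probs = [comb(n, k) * 0.5 ** n for k in ks]  # P(X = k), X = number of heads in n fair flips

    plt.plot(ks, probs, "o")
    plt.title("Probability of obtaining number of heads in 20 coin flips")
    plt.xlabel("Number of Heads")
    plt.ylabel("Probability")
    plt.show()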
Lemma 3.8. Let X be a discrete random variable and pX be the pmf of X. Let χ be the set of possible values of X.

• For every x ∈ χ, 0 ≤ pX(x) ≤ 1.
• ∑_{x∈χ} pX(x) = 1.

Proof.

• pX(x) = P(X = x) = P({w ∈ Ω : X(w) = x}). Recall that 0 ≤ P({w ∈ Ω : X(w) = x}) ≤ 1.
• ∑_{x∈χ} pX(x) = ∑_{x∈χ} P({w ∈ Ω : X(w) = x}). Observe that, for x1 ≠ x2, the events {w ∈ Ω : X(w) = x1} and {w ∈ Ω : X(w) = x2} are disjoint. Hence, by countable additivity,

  ∑_{x∈χ} pX(x) = P(∪_{x∈χ}{w ∈ Ω : X(w) = x}) = P(Ω) = 1
There are many ways to represent a pmf: for example, a formula, a table of values, or simply a list of values. A useful visualization tool is to plot the probability values against χ, as shown in Figure 2.
Any probability associated with X may be calculated using its pmf. One may also use the cumulative distribution function, i.e., FX(x) := P(X ≤ x); however, we leave this for continuous random variables (Chapter 5).
Definition 3.9. Let X1, . . . , Xn be discrete random variables. We say that they are independent if, for every x1, . . . , xn ∈ R,

P(X1 = x1, . . . , Xn = xn) := P(∩_{i=1}^{n} {Xi = xi}) = ∏_{i=1}^{n} P(Xi = xi)

Therefore, for every x1, . . . , xn, the events {ω ∈ Ω : Xi(ω) = xi}, i = 1, . . . , n, are jointly independent.
Exercises
Exercise 3.10. Let X be a random variable such that X ∈ N and pX(i) = c · 2^{-i}. Find c.

Solution: It follows from Lemma 3.8 that ∑_{i∈N} pX(i) = 1. Also,

∑_{i∈N} pX(i) = ∑_{i∈N} c · 2^{-i} = c · ∑_{i∈N} 2^{-i} = c · 1/(1 − 2^{-1}) = 2c

Hence, 2c = 1 and c = 1/2.
Exercise 3.11. Consider that you toss a coin with probability p of heads 3 times. Let X be the number of heads you observe. Find the pmf of X.
Solution: There can be any number of heads from 0 to 3. Hence, we must determine the pmf of X for values
in {0, 1, 2, 3}. Let Hi denote the event that the i-th coin flip is heads. The Hi’s are jointly independent and are
such that P (Hi) = p.
pX(0) = P({TTT}) = P(H1^c ∩ H2^c ∩ H3^c) = P(H1^c)P(H2^c)P(H3^c) = (1 − p)^3
pX(1) = P({TTH, THT, HTT}) = P({TTH}) + P({THT}) + P({HTT})
      = P(H1^c)P(H2^c)P(H3) + P(H1^c)P(H2)P(H3^c) + P(H1)P(H2^c)P(H3^c) = 3p(1 − p)^2
pX(2) = P({THH, HTH, HHT}) = P({THH}) + P({HTH}) + P({HHT})
      = P(H1^c)P(H2)P(H3) + P(H1)P(H2^c)P(H3) + P(H1)P(H2)P(H3^c) = 3p^2(1 − p)
pX(3) = P({HHH}) = P(H1 ∩ H2 ∩ H3) = P(H1)P(H2)P(H3) = p^3
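You can double-check these formulas by enumerating the 8 possible sequences of flips. A minimal sketch (the value p = 0.3 is an arbitrary illustration):

    from itertools import product

    p = 0.3  # any value in [0, 1] works; 0.3 is only for illustration
    pmf = {k: 0.0 for k in range(4)}
    for flips in product("HT", repeat=3):      # all 8 outcomes of 3 flips
        prob = 1.0
        for f in flips:
            prob *= p if f == "H" else 1 - p
        pmf[flips.count("H")] += prob

    print(pmf)
    print((1 - p) ** 3, 3 * p * (1 - p) ** 2, 3 * p ** 2 * (1 - p), p ** 3)  # formulas above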
Exercise 3.12. You throw a four-sided die twice. Let X1 and X2 denote, respectively, the outcomes of the first
and second throw. Let Y = X1 +X2. Find the pmf of Y and of 2X1. Observe that the pmf of summing two die
outcomes is different from the pmf of multiplying a die outcome by two.
Solution: Note that all outcomes are equally probable and

Ω = { (1,1), (1,2), (1,3), (1,4),
      (2,1), (2,2), (2,3), (2,4),
      (3,1), (3,2), (3,3), (3,4),
      (4,1), (4,2), (4,3), (4,4) }

Observe that 2·X1 depends only on the first outcome. Also, using the tabular representation of Ω, observe that X1 + X2 is constant on the diagonals of Ω. Using these ideas, we obtain:

i    {w ∈ Ω : 2·X1 = i}             p_{2·X1}(i)
2    {(1,1), (1,2), (1,3), (1,4)}   4/16
4    {(2,1), (2,2), (2,3), (2,4)}   4/16
6    {(3,1), (3,2), (3,3), (3,4)}   4/16
8    {(4,1), (4,2), (4,3), (4,4)}   4/16

i    {w ∈ Ω : X1 + X2 = i}                p_{X1+X2}(i)
2    {(1,1)}                              1/16
3    {(1,2), (2,1)}                       2/16
4    {(1,3), (2,2), (3,1)}                3/16
5    {(1,4), (2,3), (3,2), (4,1)}         4/16
6    {(2,4), (3,3), (4,2)}                3/16
7    {(3,4), (4,3)}                       2/16
8    {(4,4)}                              1/16
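Both tables can be generated by enumerating Ω. A minimal sketch with exact probabilities (the variable names are ours):

    from collections import Counter
    from fractions import Fraction
    from itertools import product

    omega = list(product([1, 2, 3, 4], repeat=2))   # the 16 equally likely outcomes
    p = Fraction(1, 16)

    pmf_sum, pmf_double = Counter(), Counter()
    for x1, x2 in omega:
        pmf_sum[x1 + x2] += p       # pmf of X1 + X2
        pmf_double[2 * x1] += p     # pmf of 2 * X1

    print(dict(pmf_sum))
    print(dict(pmf_double))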
Exercise 3.13. Consider that you pick, with equal probability, one of two coins: a fair coin or a coin with probability p of landing heads. Next, you throw the coin you picked 3 times. Let X denote the number of heads you observe. Find the pmf of X.
Solution: Let F denote the event that you picked the fair coin. If we knew the outcome of F , we could use
Exercise 3.11 to determine the relevant probabilities. Hence, we use the Law of Total Probability (Theorem 2.66):
pX(0) = P(X = 0) = P(X = 0|F)P(F) + P(X = 0|F^c)P(F^c) = 0.5^3 · 0.5 + (1 − p)^3 · 0.5
pX(1) = P(X = 1) = P(X = 1|F)P(F) + P(X = 1|F^c)P(F^c) = 3 · 0.5^3 · 0.5 + 3p(1 − p)^2 · 0.5
pX(2) = P(X = 2) = P(X = 2|F)P(F) + P(X = 2|F^c)P(F^c) = 3 · 0.5^3 · 0.5 + 3p^2(1 − p) · 0.5
pX(3) = P(X = 3) = P(X = 3|F)P(F) + P(X = 3|F^c)P(F^c) = 0.5^3 · 0.5 + p^3 · 0.5
Exercise 3.14. A box has 4 blue balls and 4 red balls.
a. You draw 3 balls with replacement and X denotes the number of blue balls drawn. Find the pmf of X.
b. You draw 3 balls without replacement and Y denotes the number of blue balls drawn. Find the pmf of Y.
c. Describe in English the difference between the pmf’s of X and Y .
Solution: Let Bi denote the event that you get a blue ball on the i-th draw.
i        0      1      2      3
pX(i)    0.125  0.375  0.375  0.125
pY(i)    0.071  0.429  0.429  0.071

Table 3: Tabular representation of the pmf's of X and Y.
a.
   pX(0) = P(B1^c ∩ B2^c ∩ B3^c) = P(B1^c)P(B2^c)P(B3^c) = 1/8
   pX(1) = P(B1 ∩ B2^c ∩ B3^c) + P(B1^c ∩ B2 ∩ B3^c) + P(B1^c ∩ B2^c ∩ B3) = 3/8
   pX(2) = P(B1 ∩ B2 ∩ B3^c) + P(B1 ∩ B2^c ∩ B3) + P(B1^c ∩ B2 ∩ B3) = 3/8
   pX(3) = P(B1 ∩ B2 ∩ B3) = 1/8

b.
   pY(0) = P(B1^c ∩ B2^c ∩ B3^c) = (4/8) · (3/7) · (2/6) = 1/14
   pY(1) = P(B1 ∩ B2^c ∩ B3^c) + P(B1^c ∩ B2 ∩ B3^c) + P(B1^c ∩ B2^c ∩ B3) = 3 · (4/8) · (4/7) · (3/6) = 3/7
   pY(2) = P(B1 ∩ B2 ∩ B3^c) + P(B1 ∩ B2^c ∩ B3) + P(B1^c ∩ B2 ∩ B3) = 3 · (4/8) · (3/7) · (4/6) = 3/7
   pY(3) = P(B1 ∩ B2 ∩ B3) = (4/8) · (3/7) · (2/6) = 1/14
c. Table 3 summarizes the results obtained in the previous items. Observe that sampling without replacement generates results that are more concentrated in the intermediate values than sampling with replacement.
Exercise 3.15. Let X and Y be two independent discrete random variables. Are e^X and e^Y independent?

Solution: For every a ∈ Im(e^X) and b ∈ Im(e^Y),

P(e^X = a ∩ e^Y = b) = P({w ∈ Ω : e^{X(w)} = a} ∩ {w ∈ Ω : e^{Y(w)} = b})
                     = P({w ∈ Ω : X(w) = log(a)} ∩ {w ∈ Ω : Y(w) = log(b)})
                     = P(X = log(a) ∩ Y = log(b))
                     = P(X = log(a))P(Y = log(b))    (independence of X and Y)
                     = P(e^X = a)P(e^Y = b)

Hence, e^X and e^Y are independent.
Exercise 3.16.
a. Let X and Y be two independent random variables. Show that, for any functions f and g, f(X) is independent of g(Y).

b. Show an example of discrete random variables X, Y and functions f and g such that f(X) is independent of g(Y) but X is not independent of Y.
Solution:

a. For every a ∈ Im(f(X)) and b ∈ Im(g(Y)),

   P(f(X) = a ∩ g(Y) = b) = P({w ∈ Ω : f(X(w)) = a} ∩ {w ∈ Ω : g(Y(w)) = b})
                          = P({w ∈ Ω : X(w) ∈ f^{-1}[a]} ∩ {w ∈ Ω : Y(w) ∈ g^{-1}[b]})
                          = P(X ∈ f^{-1}[a] ∩ Y ∈ g^{-1}[b])
                          = P(X ∈ f^{-1}[a])P(Y ∈ g^{-1}[b])
                          = P(f(X) = a)P(g(Y) = b)

   (For the fourth equality, write each event as a countable disjoint union over the values in f^{-1}[a] and g^{-1}[b] and apply the independence of X and Y term by term.)
b. Let Ω = {0, 1}, P({0}) = P({1}) = 0.5 and X(w) = Y(w) = w. Observe that

   P(X = 0 ∩ Y = 0) = P({0} ∩ {0}) = P({0}) = 0.5
   P(X = 0)P(Y = 0) = P({0})P({0}) = 0.5 · 0.5 = 0.25

   Hence, X is not independent of Y. Let f(0) = f(1) = g(0) = g(1) = 0. Observe that f(X) and g(Y) can only assume the value 0.

   P(f(X) = 0 ∩ g(Y) = 0) = P({0, 1} ∩ {0, 1}) = 1
   P(f(X) = 0)P(g(Y) = 0) = P({0, 1})P({0, 1}) = 1

   Hence, f(X) and g(Y) are independent.
3.2 (Conditional) Expected value
Definition 3.17. Let X be a discrete random variable and A an event. The expected value of X given A is denoted by E[X|A] and

E[X|A] = ∑_{w∈Ω} X(w) P({w}|A)
Definition 3.18. Let X be a discrete random variable. The expected value of X is denoted by E[X] and is equal to E[X|Ω]. That is,

E[X] = ∑_{w∈Ω} X(w) P({w})
Lemma 3.19 (Law of the unconscious statistician). Let X be a discrete random variable with pmf pX that assumes values in χ. For any function f,

E[f(X)] = ∑_{x∈χ} f(x) · pX(x)
Proof.

E[f(X)] = ∑_{w∈Ω} f(X(w)) P({w})
        = ∑_{x∈χ} ∑_{w : X(w)=x} f(X(w)) P({w})
        = ∑_{x∈χ} f(x) ∑_{w : X(w)=x} P({w})
        = ∑_{x∈χ} f(x) · pX(x)
In particular, observe that it follows from Lemma 3.19 (taking f(x) = x) that E[X] = ∑_{x∈χ} x · pX(x). That is, E[X] is the weighted average of the values in χ, using pX as the weights.
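As a small illustration of Lemma 3.19 (our own example, not taken from the text above): let X be the outcome of a fair six-sided die and f(x) = (x − 3)^2, as in Exercise 2.62.

    from fractions import Fraction

    # E[f(X)] = sum over x of f(x) * pX(x), with X a fair six-sided die and f(x) = (x - 3)^2.
    pmf = {x: Fraction(1, 6) for x in range(1, 7)}
    expected = sum((x - 3) ** 2 * pmf[x] for x in pmf)
    print(expected)  # 19/6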
Example 3.20 (Continuation of Example 3.1).