7.3 Conditional probability density function

We often wish to use observations to indirectly learn about other, unobserved quantities. In order to do so, we must define how the observation of one random variable changes the distribution of another random variable. This is done formally through the conditional probability density function, presented in Definition 7.20.

Definition 7.20. Let $\mathbf{X}$ and $\mathbf{Y}$ be two vectors of random variables. The pdf of $\mathbf{X}$ given $\mathbf{Y}$ is defined as
$$f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y}) = \frac{f_{(\mathbf{X},\mathbf{Y})}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}$$

The conditional pdf, $f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})$, is the density function of $\mathbf{X}$ given that $\mathbf{Y}=\mathbf{y}$. When $(\mathbf{X},\mathbf{Y})$ is discrete, there exists a straightforward justification for this definition. Indeed,

Lemma 7.21. If $(\mathbf{X},\mathbf{Y})$ is discrete, then $f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y}) = P(\mathbf{X}=\mathbf{x}|\mathbf{Y}=\mathbf{y})$.

Proof.
$$\begin{aligned}
f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})
&= \frac{f_{(\mathbf{X},\mathbf{Y})}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})} && \text{Definition 7.20} \\
&= \frac{P(\mathbf{X}=\mathbf{x},\mathbf{Y}=\mathbf{y})}{P(\mathbf{Y}=\mathbf{y})} && \text{Definition 7.6} \\
&= P(\mathbf{X}=\mathbf{x}|\mathbf{Y}=\mathbf{y}) && \text{Definition 2.42}
\end{aligned}$$

When $(\mathbf{X},\mathbf{Y})$ is continuous, the justification is more technical. This occurs because $P(\mathbf{Y}=\mathbf{y})=0$ and, therefore, one cannot directly use Definition 2.42 to condition on $\{\mathbf{Y}=\mathbf{y}\}$. However, informally, if $dx \approx 0$ and $dy \approx 0$, then
$$\begin{aligned}
f_{X|Y}(x|y)\,dx
&= \frac{f_{(X,Y)}(x,y)\,dx\,dy}{f_Y(y)\,dy} && \text{Definition 7.20} \\
&\approx \frac{P(x \leq X \leq x+dx,\, y \leq Y \leq y+dy)}{P(y \leq Y \leq y+dy)} && \text{Definition 7.6} \\
&= P(x \leq X \leq x+dx \,|\, y \leq Y \leq y+dy) && \text{Definition 2.42}
\end{aligned}$$

In words, if $(X,Y)$ is continuous, then $f_{X|Y}(x|y)$ determines how much probability is concentrated near $x$ given that $Y=y$.

Example 7.22. In Example 7.10,
$$f_{X|Y}(0|0) = \frac{f_{(X,Y)}(0,0)}{f_Y(0)} = \frac{0.2}{0.5} = 0.4 \qquad \text{Definition 7.20}$$
$$f_{X|Y}(0|1) = \frac{0.4}{0.5} = 0.8$$
Also observe that
$$\begin{aligned}
f_{X|Y}(1|0) &= P(X=1|Y=0) && \text{Lemma 7.21} \\
&= 1 - P(X=0|Y=0) && \text{Lemma 2.4} \\
&= 1 - f_{X|Y}(0|0) = 0.6 && \text{Lemma 7.21} \\
f_{X|Y}(1|1) &= 1 - f_{X|Y}(0|1) = 0.2
\end{aligned}$$

Example 7.23.
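As a sanity check of Lemma 7.21 and Example 7.22, the computations above can be reproduced numerically. The joint pmf below is inferred from the values quoted in this excerpt (Example 7.10 itself is not reproduced here), so the entries for $X = 1$ are an assumption chosen to be consistent with $f_Y(0) = f_Y(1) = 0.5$:

```python
# Sanity check of Lemma 7.21 / Example 7.22.
# Joint pmf inferred from the excerpt: f(0,0) = 0.2 and f(0,1) = 0.4 are given;
# f(1,0) = 0.3 and f(1,1) = 0.1 are assumed so that fY(0) = fY(1) = 0.5.
joint = {(0, 0): 0.2, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.1}

def f_Y(y):
    """Marginal pmf of Y, obtained by summing the joint pmf over x."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

def f_X_given_Y(x, y):
    """Conditional pmf f_{X|Y}(x|y) = f_{(X,Y)}(x,y) / f_Y(y)  (Definition 7.20)."""
    return joint[(x, y)] / f_Y(y)

print(f_X_given_Y(0, 0))  # 0.4
print(f_X_given_Y(0, 1))  # 0.8
print(f_X_given_Y(1, 0))  # 0.6, i.e. 1 - f_X_given_Y(0, 0)
print(f_X_given_Y(1, 1))  # 0.2
```

Note that each conditional is a genuine pmf: for fixed $y$, the values over $x$ sum to 1.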
Let $(X,Y)$ be a continuous vector of random variables such that
$$f_{(X,Y)}(x,y) = \frac{15}{2}\,x(2-x-y)\,I(x>0)I(y<1)$$
Notice that
$$f_Y(y) = \int_{-\infty}^{\infty} f_{(X,Y)}(x,y)\,dx
= \int_0^1 \frac{15}{2}\,x(2-x-y)\,I(y<1)\,dx
= \frac{15}{2}\left(\frac{2}{3}-\frac{y}{2}\right)I(y<1) \qquad \text{(18), Lemma 7.11}$$
Therefore,
$$\begin{aligned}
f_{X|Y}(x|y)
&= \frac{f_{(X,Y)}(x,y)}{f_Y(y)} && \text{Definition 7.20} \\
&= \frac{\frac{15}{2}\,x(2-x-y)\,I(x>0)I(y<1)}{\frac{15}{2}\left(\frac{2}{3}-\frac{y}{2}\right)I(y<1)} && \text{eq. (18)} \\
&= \frac{x(2-x-y)}{\frac{2}{3}-\frac{y}{2}}\,I(x>0) \\
&= \frac{6x(2-x-y)}{4-3y}\,I(x>0)
\end{aligned}$$

Definition 7.24.
$$P(\mathbf{X} \in A \,|\, \mathbf{Y}=\mathbf{y}) = \int_A f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})\,d\mathbf{x}$$

Conditional pdf's have properties similar to those of marginal pdf's. For example, if one integrates out one of the coordinates of a conditional pdf, then one obtains the conditional pdf of the remaining coordinates. This result can be seen as a generalization of Theorem 2.66 to random variables. The formal result is stated in Theorem 7.25.

Theorem 7.25 (Law of total probability for vectors of random variables).
$$\int_{-\infty}^{\infty} f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})\,dx_i = f_{\mathbf{X}_{-i}|\mathbf{Y}}(\mathbf{x}_{-i}|\mathbf{y})$$

Proof.
$$\begin{aligned}
\int_{-\infty}^{\infty} f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})\,dx_i
&= \int_{-\infty}^{\infty} \frac{f_{(\mathbf{X},\mathbf{Y})}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}\,dx_i && \text{Definition 7.20} \\
&= \frac{1}{f_{\mathbf{Y}}(\mathbf{y})}\int_{-\infty}^{\infty} f_{(\mathbf{X},\mathbf{Y})}(\mathbf{x},\mathbf{y})\,dx_i \\
&= \frac{1}{f_{\mathbf{Y}}(\mathbf{y})}\, f_{(\mathbf{X}_{-i},\mathbf{Y})}(\mathbf{x}_{-i},\mathbf{y}) && \text{Lemma 7.11} \\
&= f_{\mathbf{X}_{-i}|\mathbf{Y}}(\mathbf{x}_{-i}|\mathbf{y}) && \text{Definition 7.20}
\end{aligned}$$

Also, the conditional distribution of $\mathbf{X}$ given $\mathbf{Y}$ can be obtained from the distribution of $\mathbf{Y}$ given $\mathbf{X}$ and the marginal distribution of $\mathbf{X}$. This result is presented in Theorem 7.26 and is a generalization of Bayes' theorem to vectors of random variables.

Theorem 7.26 (Bayes' theorem for vectors of random variables).
$$f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y}) = \frac{f_{\mathbf{X}}(\mathbf{x})\,f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})}{\int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x})\,f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\,d\mathbf{x}}$$

Proof.
$$\begin{aligned}
f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})
&= \frac{f_{(\mathbf{X},\mathbf{Y})}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})} && \text{Definition 7.20} \\
&= \frac{f_{\mathbf{X}}(\mathbf{x})\,f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})}{f_{\mathbf{Y}}(\mathbf{y})} && \text{Definition 7.20} \\
&= \frac{f_{\mathbf{X}}(\mathbf{x})\,f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})}{\int_{-\infty}^{\infty} f_{(\mathbf{X},\mathbf{Y})}(\mathbf{x},\mathbf{y})\,d\mathbf{x}} && \text{Lemma 7.11} \\
&= \frac{f_{\mathbf{X}}(\mathbf{x})\,f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})}{\int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x})\,f_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})\,d\mathbf{x}} && \text{Definition 7.20}
\end{aligned}$$

Example 7.27. Let $X \sim \text{Gamma}(a,b)$. Also, $Y|X=x \sim \text{Exponential}(x)$. We can use Theorem 7.26 to find the distribution of $X$ given $Y$.
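A quick numerical sketch of Example 7.23: assuming, as the integration limits used for $f_Y$ suggest, that the support of $x$ is $(0,1)$, the conditional density $6x(2-x-y)/(4-3y)$ should integrate to 1 over $x$ for any fixed $y < 1$:

```python
# Numerical check of Example 7.23, assuming the support of x is (0, 1),
# as indicated by the limits of the integral defining f_Y(y) in the text.

def f_X_given_Y(x, y):
    """Conditional density f_{X|Y}(x|y) = 6x(2 - x - y) / (4 - 3y)."""
    return 6 * x * (2 - x - y) / (4 - 3 * y)

def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

for y in (0.0, 0.25, 0.5, 0.9):
    total = integrate(lambda x: f_X_given_Y(x, y), 0.0, 1.0)
    print(y, round(total, 6))  # each total ≈ 1.0
```

The same code also confirms that dividing the joint density by the marginal in eq. (18) and simplifying to $6x(2-x-y)/(4-3y)$ leaves the normalization intact.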
Note that
$$\begin{aligned}
f_{X|Y}(x|y)
&= \frac{f_X(x)\,f_{Y|X}(y|x)}{\int_{-\infty}^{\infty} f_X(x)\,f_{Y|X}(y|x)\,dx} \\
&= \frac{\frac{b^a}{\Gamma(a)}x^{a-1}\exp(-bx)\cdot x\exp(-xy)}{\int_{-\infty}^{\infty}\frac{b^a}{\Gamma(a)}x^{a-1}\exp(-bx)\cdot x\exp(-xy)\,dx} \\
&= \frac{x^a\exp(-(b+y)x)}{\int_{-\infty}^{\infty} x^{(a+1)-1}\exp(-(b+y)x)\,dx} \\
&= \frac{x^a\exp(-(b+y)x)}{\frac{\Gamma(a+1)}{(b+y)^{a+1}}\int_{-\infty}^{\infty}\frac{(b+y)^{a+1}}{\Gamma(a+1)}x^{(a+1)-1}\exp(-(b+y)x)\,dx} \\
&= \frac{x^a\exp(-(b+y)x)}{\frac{\Gamma(a+1)}{(b+y)^{a+1}}\cdot 1} && \text{Definition 5.41} \\
&= \frac{(b+y)^{a+1}}{\Gamma(a+1)}\,x^a\exp(-(b+y)x)
\end{aligned}$$
That is, $X|Y=y \sim \text{Gamma}(a+1,\, b+y)$.

7.3.1 Independence

Independence describes a particular type of conditional density that is commonly used in Statistics. Informally, $X_1$ and $X_2$ are independent if, no matter what value is observed for $X_1$, this observation brings no information about $X_2$ (and vice versa). This is a generalization of the concept of independence between events (Definition 2.46). Independence between random vectors is formally presented in Definition 7.28.

Definition 7.28. We say that $\mathbf{X}_1,\ldots,\mathbf{X}_d$ are conditionally independent given $\mathbf{Y}$ if, for every $\mathbf{x}_1,\ldots,\mathbf{x}_d$ and $\mathbf{y}$,
$$f_{(\mathbf{X}_1,\ldots,\mathbf{X}_d)|\mathbf{Y}}(\mathbf{x}_1,\ldots,\mathbf{x}_d|\mathbf{y}) = \prod_{i=1}^{d} f_{\mathbf{X}_i|\mathbf{Y}}(\mathbf{x}_i|\mathbf{y})$$
In particular, we say that $\mathbf{X}_1,\ldots,\mathbf{X}_d$ are independent if $\mathbf{Y}$ is empty, that is, for every $\mathbf{x}_1,\ldots,\mathbf{x}_d$,
$$f_{(\mathbf{X}_1,\ldots,\mathbf{X}_d)}(\mathbf{x}_1,\ldots,\mathbf{x}_d) = \prod_{i=1}^{d} f_{\mathbf{X}_i}(\mathbf{x}_i)$$

Example 7.29. Consider that
$$f_{X_1,X_2|\theta}(x_1,x_2|t) = t^{x_1+x_2}(1-t)^{2-x_1-x_2}\,I(x_1\in\{0,1\})\,I(x_2\in\{0,1\})$$
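The conjugacy in Example 7.27 can be checked numerically: normalizing $f_X(x)\,f_{Y|X}(y|x)$ by brute force should reproduce the $\text{Gamma}(a+1,\,b+y)$ density. The values of $a$, $b$, and $y$ below are illustrative choices, not taken from the text:

```python
import math

# Numerical check of Example 7.27 with illustrative values (a, b, y are
# hypothetical choices): X ~ Gamma(a, b) and Y | X = x ~ Exponential(x),
# so f_{Y|X}(y|x) = x * exp(-x*y).
a, b, y = 2.0, 1.0, 3.0

def prior(x):
    """Gamma(a, b) density."""
    return b**a / math.gamma(a) * x**(a - 1) * math.exp(-b * x)

def likelihood(x):
    """Exponential(x) density evaluated at the observed y."""
    return x * math.exp(-x * y)

def posterior_closed_form(x):
    """Gamma(a + 1, b + y) density, as derived above."""
    return (b + y)**(a + 1) / math.gamma(a + 1) * x**a * math.exp(-(b + y) * x)

# Normalize prior * likelihood by midpoint-rule integration over (0, 40);
# the integrand's tail beyond 40 is negligible for these parameter values.
n, upper = 100_000, 40.0
h = upper / n
norm = h * sum(prior((i + 0.5) * h) * likelihood((i + 0.5) * h) for i in range(n))

for x in (0.2, 0.5, 1.0, 2.0):
    numeric = prior(x) * likelihood(x) / norm
    print(x, round(numeric, 6), round(posterior_closed_form(x), 6))  # columns agree
```

The numeric posterior and the closed form match, confirming that the $b^{a}/\Gamma(a)$ factor cancels and only the $\text{Gamma}(a+1,\,b+y)$ kernel survives.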