Statistics II
Lesson 4. Simple linear regression
Academic Year 2010/11
Lesson 4. Simple linear regression
Contents
I The subject of regression analysis
I The specification of a simple linear regression model
I Least squares estimators: construction and properties
I Inferences about the regression model:
I Inference about the slope
I Inference about the variance
I Estimation of a mean response
I Prediction of a new response
Lesson 4. Simple linear regression
Learning objectives
I Know how to construct a simple linear regression model that
describes how a variable X influences another variable Y
I Know how to obtain point estimates of the parameters of this
model
I Know how to construct confidence intervals and perform tests about
the parameters of the model
I Know how to estimate the mean value of Y for a specified value of X
I Know how to predict future values of the dependent (response)
variable Y
Lesson 4. Simple linear regression
Recommended bibliography
I Meyer, P. "Probabilidad y aplicaciones estadísticas" (1992)
I Chapter
I Newbold, P. "Estadística para los negocios y la economía" (1997)
I Chapter 10
I Peña, D. "Regresión y análisis de experimentos" (2005)
I Chapter 5
Introduction
A regression model is a model that describes how a variable X influences
the value of another variable Y .
I X : Independent or explanatory or exogenous variable
I Y : Dependent or response or endogenous variable
The aim is to obtain reasonable estimations of Y for different values of
X from a sample of n pairs of values (x1, y1), . . . , (xn, yn).
Introduction
Examples
I Study how the parents’ height may influence their children’s height
I Estimate the price of a house depending on its surface
I Predict the unemployment level for different ages
I Approximate the grades attained in a subject as a function of the
number of study hours per week
I Forecast the execution time of a program depending on the speed of
the processor
Introduction
Types of relation
I Deterministic: Given the value of X , the value of Y is perfectly
established. They are of the form:
Y = f (X )
Example: The relationship between the temperature measured in
degrees Celsius (X ) and the equivalent measure in degrees
Fahrenheit (Y ) is given by:
Y = 1.8X + 32
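The deterministic Celsius-to-Fahrenheit relation above can be written as a one-line function; a minimal Python sketch:

```python
# Deterministic relation: Fahrenheit is an exact linear function of Celsius.
def celsius_to_fahrenheit(x):
    return 1.8 * x + 32

# Every value of X determines Y perfectly, with no random perturbation.
for c in [0, 10, 20, 30, 40]:
    print(c, "C ->", celsius_to_fahrenheit(c), "F")
```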
[Figure: Plot of degrees Fahrenheit (Grados Fahrenheit) vs degrees Celsius (Grados centígrados)]
Introduction
Types of relation
I Non deterministic: Given the value of X , the value of Y is not
completely determined. They are of the form:
y = f (x) + u
where u is an unknown perturbation (random variable).
Example: A sample is taken regarding the volume of production (X )
and the total cost (Y ) associated with a product in a corporate
group.
[Figure: Scatter plot of total cost (Costos) vs production volume (Volumen)]
A relation exists but it is not exact
Introduction
Types of relation
I Linear: When the function f (x) is linear,
f (x) = β0 + β1x
I If β1 > 0 there is a positive linear relationship
I If β1 < 0 there is a negative linear relationship
[Figure: Scatter plots of a positive linear relation (left) and a negative linear relation (right)]
The data show a linear pattern
Introduction
Types of relation
I Nonlinear: When the function f (x) is nonlinear. For example,
f (x) = log(x), f (x) = x2 + 3, . . .
[Figure: Scatter plot showing a nonlinear relation between X and Y]
The data do not show linear patterns
Introduction
Types of relation
I Absence of relation: Whenever f (x) = c , that is, whenever f (x)
does not depend on x
[Figure: Scatter plot showing no relation between X and Y]
Measures of linear dependency
Covariance
A measure of linear dependency is the covariance:
cov(x, y) = [ Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) ] / (n − 1)
I If there is a positive linear relation, the covariance will be positive
and large
I If there is a negative linear relation, the covariance will be negative
and large in absolute value.
I If there is no relation between the variables, or the relation is
clearly nonlinear, the covariance will be close to zero.
but the covariance depends on the units of measurement of the variables
Measures of linear dependency
The correlation coefficient
A measure of linear dependency that doesn’t depend on the units of
measurement is the correlation coefficient:
r(x, y) = cor(x, y) = cov(x, y) / (s_x s_y)
where
s²_x = [ Σ_{i=1}^n (x_i − x̄)² ] / (n − 1) and s²_y = [ Σ_{i=1}^n (y_i − ȳ)² ] / (n − 1)
I −1 ≤ cor (x , y) ≤ 1
I cor (x , y) = cor (y , x)
I cor (ax + b, cy + d) = sign(a) sign(c) cor (x , y)
for any values a, b, c , d
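These definitions and properties are easy to check numerically. A minimal Python sketch (the sample data here are made up purely for illustration):

```python
import math

def covariance(x, y):
    # Sample covariance with the (n - 1) denominator, as in the slide.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def correlation(x, y):
    # cor(x, y) = cov(x, y) / (s_x * s_y); note cov(x, x) = s_x^2.
    sx = math.sqrt(covariance(x, x))
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]      # roughly y = 2x: positive linear relation
print(correlation(x, y))            # close to +1

# cor(ax + b, cy + d) = sign(a) sign(c) cor(x, y): flipping the sign of x
# flips the sign of the correlation but not its magnitude.
x2 = [-3 * xi + 7 for xi in x]
print(correlation(x2, y))           # close to -1
```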
The simple linear regression model
The simple linear regression model assumes that,
yi = β0 + β1xi + ui
where:
I yi represents the value of the dependent variable for the i-th
observation
I xi represents the value of the independent variable for the i-th
observation
I ui represents the error for the i-th observation, which we will assume
to be normal,
ui ∼ N(0, σ)
I β0 and β1 are the regression coefficients:
I β0 : intercept
I β1 : slope
The parameters to estimate are: β0, β1 and σ
The simple linear regression model
Our goal is to obtain estimates βˆ0 and βˆ1 for β0 and β1 to define the
regression line
yˆ = βˆ0 + βˆ1x
that provides the best fit for the data
Example: Assume that the regression line of the previous example is:
Cost = −15.65 + 1.29 Volume
[Figure: Plot of the fitted model, Costos vs Volumen]
From this model it is estimated that a company that produces 25
thousand units will have a cost given by:
Cost = −15.65 + 1.29× 25 = 16.6 thousand euros
The simple linear regression model
The difference between each value yi of the response variable and its
estimation yˆi is called a residual:
ei = yi − yˆi
[Figure: Observed data point (y) and its vertical distance to the estimated regression line (recta de regresión estimada)]
Example (cont.): Undoubtedly, a certain company that has produced
exactly 25 thousand units is not going to have a cost exactly equal to
16.6 thousand euros. The difference between the estimated cost and the
real one is the error. If for example the real cost of the company is 18
thousand euros, the residual is:
ei = 18− 16.6 = 1.4 thousand euros
Hypotheses of the simple linear regression model
I Linearity: The existing relation between X and Y is linear,
f (x) = β0 + β1x
I Homogeneity: The mean value of the error is zero,
E [ui ] = 0
I Homoscedasticity: The variance of the errors is constant,
Var(u_i) = σ²
I Independence: The observations are independent,
E[u_i u_j] = 0 for i ≠ j
I Normality: The errors follow a normal distribution,
u_i ∼ N(0, σ)
Hypotheses of the simple linear regression model
Linearity
The data have to look reasonably straight
[Figure: Plot of the fitted model, Costos vs Volumen, showing a linear pattern]
Otherwise, the regression line doesn’t represent the structure of the data
[Figure: Plot of a fitted line through data with a clearly curved pattern]
Hypotheses of the simple linear regression model
Homoscedasticity
The dispersion of the data must be constant for the data to be
homoscedastic
[Figure: Scatter plots of Costos vs Volumen with constant dispersion (homoscedastic) and increasing dispersion (heteroscedastic)]
If this condition does not hold, the data are said to be heteroscedastic
Hypotheses of the simple linear regression model
Independence
I The data must be independent
I An observation must not give information about the rest of the
observations
I Usually, it is known from the type of the data if they are adequate or
not for this analysis
I In general, time series do not satisfy the independence hypothesis
Normality
I We will assume that the data are a priori normal
Least squares estimators
Gauss proposed in 1809 the method of least squares for obtaining the
values βˆ0 and βˆ1 that best fit the data:
yˆi = βˆ0 + βˆ1xi
The method consists of minimizing the sum of the squares of the vertical
distances between the data and the estimations, that is, minimize the
sum of the squared residuals
Σ_{i=1}^n e_i² = Σ_{i=1}^n (y_i − ŷ_i)² = Σ_{i=1}^n ( y_i − (β̂0 + β̂1 x_i) )²
Least squares estimators
The results are given by
β̂1 = cov(x, y) / s²_x = [ Σ_{i=1}^n (x_i − x̄)(y_i − ȳ) ] / [ Σ_{i=1}^n (x_i − x̄)² ]
β̂0 = ȳ − β̂1 x̄
Least squares estimators
Exercise 4.1
The data regarding the production of wheat in tons (X ) and the price of the
kilo of flour in pesetas (Y ) in the decade of the 80’s in Spain were:
Wheat production 30 28 32 25 25 25 22 24 35 40
Flour price 25 30 27 40 42 40 50 45 30 25
Fit the regression line using the method of least squares
Results
β̂1 = [ Σ_{i=1}^{10} x_i y_i − n x̄ ȳ ] / [ Σ_{i=1}^{10} x_i² − n x̄² ] = (9734 − 10 × 28.6 × 35.4) / (8468 − 10 × 28.6²) = −1.3537
β̂0 = ȳ − β̂1 x̄ = 35.4 + 1.3537 × 28.6 = 74.116
The regression line is:
ŷ = 74.116 − 1.3537x
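As a numerical check, the closed-form least squares estimators can be computed directly from the data of this exercise; a short Python sketch:

```python
x = [30, 28, 32, 25, 25, 25, 22, 24, 35, 40]   # wheat production (tons)
y = [25, 30, 27, 40, 42, 40, 50, 45, 30, 25]   # flour price (pesetas)
n = len(x)
mx, my = sum(x) / n, sum(y) / n                # sample means: 28.6 and 35.4

# Least squares estimators, same formulas as in the exercise
b1 = (sum(xi * yi for xi, yi in zip(x, y)) - n * mx * my) / \
     (sum(xi**2 for xi in x) - n * mx**2)
b0 = my - b1 * mx

print(round(b1, 4), round(b0, 3))   # -1.3537  74.115
```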
Least squares estimators
[Figure: Plot of the fitted model, Precio en ptas. vs Produccion en kg.]

Regression Analysis - Linear model: Y = a + b*X
Dependent variable: Precio en ptas.
Independent variable: Produccion en kg.

Parameter      Estimate    Standard Error    T Statistic    P-Value
Intercept      74.1151     8.73577           8.4841         0.0000
Slope          -1.35368    0.3002            -4.50924       0.0020

Analysis of Variance
Source          Sum of Squares    Df    Mean Square    F-Ratio    P-Value
Model           528.475           1     528.475        20.33      0.0020
Residual        207.925           8     25.9906
Total (Corr.)   736.4             9

Correlation Coefficient = -0.84714
R-squared = 71.7647 percent
Standard Error of Est. = 5.0981

(The Estimate column contains β̂0 and β̂1.)
Estimation of the variance
To estimate the variance of the errors, σ², we can use
σ̂² = [ Σ_{i=1}^n e_i² ] / n
which is the maximum likelihood estimator of σ², but it is a biased
estimator.
An unbiased estimator of σ² is the residual variance,
s²_R = [ Σ_{i=1}^n e_i² ] / (n − 2)
Estimation of the variance
Exercise 4.2
Compute the residual variance in exercise 4.1
Results
We first compute the residuals, ei , from the regression line,
yˆi = 74.116− 1.3537xi
xi 30 28 32 25 25 25 22 24 35 40
yi 25 30 27 40 42 40 50 45 30 25
yˆi 33.5 36.21 30.79 40.27 40.27 40.27 44.33 41.62 26.73 19.96
ei -8.50 -6.21 -3.79 -0.27 1.72 -0.27 5.66 3.37 3.26 5.03
The residual variance is:
s²_R = [ Σ_{i=1}^n e_i² ] / (n − 2) = 207.92 / 8 = 25.99
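The residual variance computation can be reproduced numerically; a short Python sketch using the data of exercise 4.1:

```python
x = [30, 28, 32, 25, 25, 25, 22, 24, 35, 40]
y = [25, 30, 27, 40, 42, 40, 50, 45, 30, 25]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
Sxx = sum((xi - mx)**2 for xi in x)            # (n - 1) * s_X^2

# Least squares fit (deviation form of the same formulas)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / Sxx
b0 = my - b1 * mx

# Residuals e_i = y_i - yhat_i, and the unbiased residual variance
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s2_R = sum(e * e for e in residuals) / (n - 2)
print(round(s2_R, 2))   # 25.99
```

Note that the least squares residuals always sum to (numerically) zero, since the fitted line passes through (x̄, ȳ).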
Estimation of the variance
The regression output shown earlier reports the Residual Mean Square, 25.9906, which is exactly the residual variance s²_R.
Inference on the regression model
I Up to this point we have obtained only point estimates for the
regression coefficients
I Using confidence intervals we can obtain a measure of the precision
of the above mentioned estimates
I Using hypothesis testing we can verify if a certain value can be the
true value of the parameter
Inference about the slope
The estimator βˆ1 follows a normal distribution because it is a linear
combination of normals,
β̂1 = Σ_{i=1}^n [ (x_i − x̄) / ((n − 1)s²_X) ] y_i = Σ_{i=1}^n w_i y_i
where y_i = β0 + β1 x_i + u_i, satisfying y_i ∼ N(β0 + β1 x_i, σ²).
Additionally, β̂1 is an unbiased estimator for β1,
E[β̂1] = Σ_{i=1}^n [ (x_i − x̄) / ((n − 1)s²_X) ] E[y_i] = β1
and its variance is given by
Var[β̂1] = Σ_{i=1}^n [ (x_i − x̄) / ((n − 1)s²_X) ]² Var[y_i] = σ² / ((n − 1)s²_X)
Thus,
β̂1 ∼ N( β1, σ² / ((n − 1)s²_X) )
Confidence intervals for the slope
We want to obtain a confidence interval for β1 at a 1 − α level. Since σ²
is unknown, it will be estimated using s²_R. The basic result when the
variance is unknown is:
(β̂1 − β1) / √( s²_R / ((n − 1)s²_X) ) ∼ t_{n−2}
that allows us to obtain a confidence interval for β1:
β̂1 ± t_{n−2,α/2} √( s²_R / ((n − 1)s²_X) )
The length of this interval will decrease if:
I The sample size increases
I The variance of the independent observations xi increases
I The residual variance decreases
Hypothesis testing on the slope
Using the previous result we can perform hypothesis testing for β1. In
particular, if the true value of β1 is zero then Y does not depend linearly
on X . Therefore, the following contrast is of special interest:
H0 : β1 = 0
H1 : β1 ≠ 0
The rejection region for the null hypothesis is:
| β̂1 / √( s²_R / ((n − 1)s²_X) ) | > t_{n−2,α/2}
Equivalently, if the value zero is outside of the confidence interval for β1
at a 1 − α level, we reject the null hypothesis at this level. The p-value of
the test is:
p-value = 2 Pr( t_{n−2} > | β̂1 / √( s²_R / ((n − 1)s²_X) ) | )
Inference for the slope
Exercise 4.3
1. Compute a 95% confidence interval for the slope of the regression line
obtained in exercise 4.1
2. Test the hypothesis that the price of flour depends linearly on the
production of wheat, using a 0.05 significance level
Results
1. t_{n−2,α/2} = t_{8,0.025} = 2.306
−2.306 ≤ (−1.3537 − β1) / √( 25.99 / (9 × 32.04) ) ≤ 2.306
−2.046 ≤ β1 ≤ −0.661
2. As the interval does not contain the value zero, we reject β1 = 0 at the 0.05
level. In fact:
| β̂1 / √( s²_R / ((n − 1)s²_X) ) | = | −1.3537 / √( 25.99 / (9 × 32.04) ) | = 4.509 > 2.306
p-value = 2 Pr(t8 > 4.509) = 0.002
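The interval and test statistic can be reproduced numerically. A Python sketch; the critical value 2.306 is the tabulated t_{8,0.025} used in the exercise:

```python
import math

x = [30, 28, 32, 25, 25, 25, 22, 24, 35, 40]
y = [25, 30, 27, 40, 42, 40, 50, 45, 30, 25]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
Sxx = sum((xi - mx)**2 for xi in x)            # (n - 1) * s_X^2
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / Sxx
b0 = my - b1 * mx
s2_R = sum((yi - b0 - b1 * xi)**2 for xi, yi in zip(x, y)) / (n - 2)

t_crit = 2.306                                  # tabulated t_{8, 0.025}
se_b1 = math.sqrt(s2_R / Sxx)                   # standard error of the slope
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)
t_stat = b1 / se_b1                             # signed; the slide uses |t|
print(ci, t_stat)   # about (-2.046, -0.661), t = -4.509
```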
Inference for the slope
In the regression output shown earlier, the Standard Error of the slope, 0.3002, equals √( s²_R / ((n − 1)s²_X) ), and the corresponding T Statistic (−4.50924) and P-Value (0.0020) match this test.
Inference for the intercept
The estimator β̂0 follows a normal distribution, as it can be written as a
linear combination of normal random variables,
β̂0 = Σ_{i=1}^n (1/n − x̄ w_i) y_i
where w_i = (x_i − x̄) / ((n − 1)s²_X) and y_i = β0 + β1 x_i + u_i, satisfying
y_i ∼ N(β0 + β1 x_i, σ²). Additionally, β̂0 is an unbiased estimator for β0,
E[β̂0] = Σ_{i=1}^n (1/n − x̄ w_i) E[y_i] = β0
and its variance is
Var[β̂0] = Σ_{i=1}^n (1/n − x̄ w_i)² Var[y_i] = σ² ( 1/n + x̄² / ((n − 1)s²_X) )
implying
β̂0 ∼ N( β0, σ² ( 1/n + x̄² / ((n − 1)s²_X) ) )
Confidence interval for the intercept
We wish to obtain a confidence interval for β0 at a 1 − α level. Since σ²
is unknown, this value will be estimated using s²_R. The basic result when
the variance is unknown is:
(β̂0 − β0) / √( s²_R ( 1/n + x̄² / ((n − 1)s²_X) ) ) ∼ t_{n−2}
From it we can obtain a confidence interval for β0:
β̂0 ± t_{n−2,α/2} √( s²_R ( 1/n + x̄² / ((n − 1)s²_X) ) )
The length of the confidence interval decreases if:
I The sample size increases
I The variance of the independent observations xi increases
I The residual variance decreases
I The mean of the independent observations decreases
Hypothesis testing on the intercept
Using the previous result we can perform hypothesis testing for β0. In
particular, if the true value of β0 is zero then the regression line passes
through the origin. Therefore, the following test is of special interest:
H0 : β0 = 0
H1 : β0 ≠ 0
The critical region for this test is:
| β̂0 / √( s²_R ( 1/n + x̄² / ((n − 1)s²_X) ) ) | > t_{n−2,α/2}
Equivalently, if zero lies outside the confidence interval for β0 at a level
1 − α, we reject the null hypothesis at that level. The p-value is given by:
p-value = 2 Pr( t_{n−2} > | β̂0 / √( s²_R ( 1/n + x̄² / ((n − 1)s²_X) ) ) | )
Inference for the intercept
Exercise 4.4
1. Compute a 95% confidence interval for the intercept of the regression line
obtained in exercise 4.1
2. Test the hypothesis that the regression line passes through the origin,
using a 0.05 significance level
Results
1. t_{n−2,α/2} = t_{8,0.025} = 2.306
−2.306 ≤ (74.1151 − β0) / √( 25.99 (1/10 + 28.6² / (9 × 32.04)) ) ≤ 2.306 ⇔ 53.969 ≤ β0 ≤ 94.261
2. As the computed interval does not contain the value zero, we reject that β0 = 0 at the 0.05
level. In fact,
| β̂0 / √( s²_R ( 1/n + x̄² / ((n − 1)s²_X) ) ) | = | 74.1151 / √( 25.99 (1/10 + 28.6² / (9 × 32.04)) ) | = 8.484 > 2.306
p-value = 2 Pr(t8 > 8.484) = 0.000
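The intercept interval can be reproduced the same way; a Python sketch, again using the tabulated critical value 2.306 for t_{8,0.025}:

```python
import math

x = [30, 28, 32, 25, 25, 25, 22, 24, 35, 40]
y = [25, 30, 27, 40, 42, 40, 50, 45, 30, 25]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
Sxx = sum((xi - mx)**2 for xi in x)            # (n - 1) * s_X^2
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / Sxx
b0 = my - b1 * mx
s2_R = sum((yi - b0 - b1 * xi)**2 for xi, yi in zip(x, y)) / (n - 2)

t_crit = 2.306                                       # tabulated t_{8, 0.025}
se_b0 = math.sqrt(s2_R * (1 / n + mx**2 / Sxx))      # standard error of b0
ci = (b0 - t_crit * se_b0, b0 + t_crit * se_b0)
print(ci)   # about (53.97, 94.26)
```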
Inference for the intercept
In the regression output shown earlier, the Standard Error of the intercept, 8.73577, equals √( s²_R ( 1/n + x̄² / ((n − 1)s²_X) ) ), with T Statistic 8.4841 and P-Value 0.0000.
Inference for the variance
The basic result is:
(n − 2) s²_R / σ² ∼ χ²_{n−2}
Using this result we can:
I Compute a confidence interval for the variance:
(n − 2) s²_R / χ²_{n−2,α/2} ≤ σ² ≤ (n − 2) s²_R / χ²_{n−2,1−α/2}
I Perform hypothesis testing of the form:
H0 : σ² = σ0²
H1 : σ² ≠ σ0²
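A sketch of the 95% variance interval for the data of exercise 4.1; the chi-square quantiles for 8 degrees of freedom are assumed table values, not computed here:

```python
# 95% confidence interval for sigma^2 from (n - 2) s_R^2 / sigma^2 ~ chi2_{n-2}.
n = 10
s2_R = 25.99       # residual variance from exercise 4.2

# Assumed tabulated chi-square quantiles for 8 degrees of freedom:
chi2_hi = 17.535   # chi2_{8, 0.025}: P(chi2_8 > 17.535) = 0.025
chi2_lo = 2.180    # chi2_{8, 0.975}: P(chi2_8 > 2.180) = 0.975

ci = ((n - 2) * s2_R / chi2_hi, (n - 2) * s2_R / chi2_lo)
print(ci)
```

Note the interval is markedly asymmetric around s²_R, reflecting the skewness of the chi-square distribution.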
Estimation of the mean response and prediction of a new
response
We consider two types of problems:
1. Estimate the mean value of the variable Y corresponding to a given
value X = x0
2. Predict a future value of the variable Y for a given value X = x0
For example, in exercise 4.1 we might be interested in the following
questions:
1. What will be the mean price of a Kg of flour in those years where
wheat production equals 30 tons?
2. If in a given year wheat production is 30 tons, which will be the price
of a Kg of flour?
In both cases the estimation is:
yˆ0 = βˆ0 + βˆ1x0
= y¯ + βˆ1 (x0 − x¯)
but the precision in the estimations is different
Estimation of a mean response
Taking into account that:
Var(ŷ0) = Var(ȳ) + (x0 − x̄)² Var(β̂1) = σ² ( 1/n + (x0 − x̄)² / ((n − 1)s²_X) )
the confidence interval for the mean response is:
ŷ0 ± t_{n−2,α/2} √( s²_R ( 1/n + (x0 − x̄)² / ((n − 1)s²_X) ) )
Prediction of a new response
The variance of the prediction of a new response is the mean squared
error of the prediction:
E[(y0 − ŷ0)²] = Var(y0) + Var(ŷ0) = σ² ( 1 + 1/n + (x0 − x̄)² / ((n − 1)s²_X) )
The confidence interval for the prediction of a new response is:
ŷ0 ± t_{n−2,α/2} √( s²_R ( 1 + 1/n + (x0 − x̄)² / ((n − 1)s²_X) ) )
The length of this interval is larger than the one for the preceding case
(we have less precision), as we are not estimating a mean value but a
specific one.
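The two intervals can be compared numerically at, say, x0 = 30 tons (a value chosen here purely for illustration); a Python sketch with the exercise 4.1 data and the tabulated t_{8,0.025} = 2.306:

```python
import math

x = [30, 28, 32, 25, 25, 25, 22, 24, 35, 40]
y = [25, 30, 27, 40, 42, 40, 50, 45, 30, 25]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
Sxx = sum((xi - mx)**2 for xi in x)            # (n - 1) * s_X^2
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / Sxx
b0 = my - b1 * mx
s2_R = sum((yi - b0 - b1 * xi)**2 for xi, yi in zip(x, y)) / (n - 2)

x0 = 30                                  # wheat production of 30 tons
y0_hat = b0 + b1 * x0                    # same point estimate in both cases
t_crit = 2.306                           # tabulated t_{8, 0.025}

# Half-widths of the mean-response and new-response intervals
half_mean = t_crit * math.sqrt(s2_R * (1 / n + (x0 - mx)**2 / Sxx))
half_pred = t_crit * math.sqrt(s2_R * (1 + 1 / n + (x0 - mx)**2 / Sxx))
print(y0_hat, half_mean, half_pred)      # the prediction interval is wider
```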
Estimation of the mean response and prediction of a new
response
The intervals for the estimated means are shown in red in the figure
below, while those for the predictions are drawn in pink
It is readily apparent that the size of the latter ones is considerably larger
[Figure: Plot of the fitted model, Precio en ptas. vs Produccion en kg., with confidence intervals for the mean response (red) and prediction intervals for a new response (pink)]