
Exact Bagging of k-Nearest Neighbor Predictors
Brian Steele and Dave Patterson
Dept. of Mathematical Sciences
University of Montana
1
Overview
• Definitions
• k-nearest neighbor predictors
• Accuracy estimation
• Bagging and exact bagging
• Comparisons
2
Definitions
• Zn denotes a random sample of n observations from a population P = {z1, z2, . . .}, where zi = (xi, yi)
• xi is a covariate vector observed for zi ∈ P
• yi is observed only for zi ∈ Zn
3
Objective: predict y or a function of y using a predictor η constructed from Zn
• Distance between z and zi ∈ Zn is measured on the covariates
• {z(1), . . . , z(k)} denotes the k nearest neighbors of z among Zn
• The elementary k-nearest neighbor predictor of y is
$$\hat{y} = k^{-1} \sum_{i=1}^{k} y_{(i)}$$
4
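A minimal Python sketch of this elementary predictor, assuming Euclidean distance on the covariates; the function and argument names are illustrative, not from the slides:

```python
import numpy as np

def knn_predict(x, X_train, y_train, k):
    """Elementary k-NN predictor: average the responses of the k training
    observations whose covariate vectors are closest to x."""
    d = np.linalg.norm(X_train - x, axis=1)   # distances from x to each x_i in Z_n
    nn = np.argsort(d)[:k]                    # indices of the k nearest neighbors
    return y_train[nn].mean()                 # yhat = k^{-1} * sum of y_(i)
```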
The classification problem
• P is partitioned into c classes
• y is a scalar identifying the group to which z belongs
• Suppose that the groups are labeled 1, . . . , c
• Ψ is the indicator function of an event. Hence
$$\Psi(y = g) = \begin{cases} 1 & \text{if } z \in G_g \\ 0 & \text{if } z \notin G_g \end{cases}, \qquad g = 1, \ldots, c.$$
5
• Suppose that z is drawn at random from
P and that x is observed but not y
• Ψ(y = g) is Bernoulli given x, and
E[Ψ(y = g) | x] = P (y = g | x)
• Let Pg(x) = P (y = g | x)
6
• We prefer to estimate Pg(x) using those training observations with covariates identical to x
• There may not be any such xi, so we use the k nearest neighbors
• The k-NN estimator of Pg(x) is
$$\hat{P}_g(x) = k^{-1} \sum_{i=1}^{k} \Psi[y_{(i)} = g]$$
• The k-NN predictor of class membership is
$$\eta(x) = \arg\max_g \hat{P}_g(x)$$
7
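A minimal sketch of the k-NN class-probability estimator and the resulting classifier, again assuming Euclidean distance; names are illustrative:

```python
import numpy as np

def knn_class_probs(x, X_train, y_train, k, classes):
    """k-NN class-probability estimates P_g(x) = k^{-1} * #{neighbors labeled g},
    together with the plug-in classifier arg max_g P_g(x)."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]                                    # k nearest neighbors of x
    probs = np.array([(y_train[nn] == g).mean() for g in classes])
    return probs, classes[np.argmax(probs)]
```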
• Assuming there are no ties, the realization of P̂g(x) is one of 0, 1/k, . . . , 1
• The k-NN classifier is relatively imprecise when k is small
• The cost of choosing a small k is a potential increase in the mean square error of P̂g(x)
8
Bootstrap aggregation
• Bagging is a method of reducing the variance of a classifier prediction
• Bagging is carried out by drawing B bootstrap samples from Zn
• The bth sample produces an estimator of Pg(x), denoted P^{*b}_g(x), computed from the k nearest neighbors within the bootstrap sample
9
• The bagging estimator of Pg(x) is
$$P^{BA}_g(x) = B^{-1} \sum_{b=1}^{B} P^{*b}_g(x)$$
• The bagging classifier is
$$\eta^{BA}(x) = \arg\max_g P^{BA}_g(x) \qquad (1)$$
10
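A sketch of the Monte Carlo bagging step under the same assumptions as the earlier sketches (B, the seed handling, and the names are illustrative):

```python
import numpy as np

def bagged_knn_probs(x, X_train, y_train, k, classes, B=100, seed=0):
    """Monte Carlo bagging: average the k-NN class-probability estimates
    P*_g(x) computed from B bootstrap samples of the training data."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    total = np.zeros(len(classes))
    for _ in range(B):
        idx = rng.integers(0, n, size=n)                      # bootstrap sample drawn from F-hat
        Xb, yb = X_train[idx], y_train[idx]
        nn = np.argsort(np.linalg.norm(Xb - x, axis=1))[:k]   # k nearest within the bootstrap sample
        total += np.array([(yb[nn] == g).mean() for g in classes])
    probs = total / B                                         # P^BA_g(x)
    return probs, classes[np.argmax(probs)]
```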
• P^{BA}_g(x) is a Monte Carlo approximation to the exact bagging estimator E_F̂[P*_g(x)]
• F̂ is the empirical distribution function placing probability mass n−1 at each zi ∈ Zn
• P*_g(x) is the k-NN estimator of Pg(x) obtained from a random sample drawn from F̂
11
• The exact bagging estimator E_F̂[P*_g(x)] is usually computationally intractable
• Hence, bagging uses Monte Carlo approximation to approximate E_F̂[P*_g(x)]
• With the k-NN classifier, it is possible to compute E_F̂[P*_g(x)] analytically
12
A comparison
• Data were provided by the USDA Forest
Service, Northern Region
• 3709 observations on a c = 15 class population, used for land cover mapping via Landsat imagery (10 covariates)
13
Difference in estimated percent accuracy is plotted against k, along with approximate 95% confidence intervals
[Figure: difference in accuracy (%) plotted against number of neighbors (k), k = 0 to 50]
• This is typical: exact bagging produces
small but consistent improvements
14
A digression
• Accuracy estimation for k-nearest neighbor
classifiers requires care
• Cross-validation estimators have small biases but relatively large variances
15
[Figure: estimated accuracy (%) as a function of neighborhood size k, for ordinary k-NN and exact bagging k-NN; accuracy estimated by cross-validation, the leave-one-out bootstrap, the .632 bootstrap, and the .632+ bootstrap]
• The .632 and .632+ bootstrap estimators
are optimistically biased for small k
• The leave-one-out bootstrap estimator is
pessimistically biased
• n-fold cross-validation is most appropriate
for assessing the conditional accuracy of η
given Zn
• Bootstrap estimators are most appropriate
for assessing the unconditional accuracy of
η
k-NN regression
• y is a continuous random variable
• Use a linear combination of the k nearest
covariate vectors x(1), . . . ,x(k) to predict y
• Nk(z) denotes the set of the k nearest neighbors of z among Zn
16
• Assume that in Nk(z),
$$E(y) = x^T \beta(z)$$
• β(z) is specific to Nk(z)
• This model is a compromise between a continuously varying β and the global parameter of ordinary least squares regression
17
• Let
$$\psi(z_i) = \Psi[z_i \in N_k(z)]$$
• An estimator of β(z) can be obtained by minimizing the prediction error
$$S(z) = \sum_{i=1}^{n} \psi(z_i)\left[y_i - x_i^T \beta(z)\right]^2$$
18
• A matrix formulation is
$$S(z) = [y - X\beta(z)]^T \Psi(z)\, [y - X\beta(z)]$$
where
• y = (y1, . . . , yn)^T
• X = (xij) is the n × p covariate matrix, and
• Ψ(z) is the diagonal matrix with ψ(z1), . . . , ψ(zn) on the diagonal.
19
• Differentiation of S(z) with respect to β(z) yields the normal equations
$$X^T\Psi(z)y = X^T\Psi(z)X\beta(z)$$
• If X^TΨ(z)X is of full rank, then
$$\hat{\beta}(z) = [X^T\Psi(z)X]^{-1} X^T\Psi(z)y$$
and
$$\hat{y} = X\Psi(z)\hat{\beta}(z)$$
20
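A sketch of this local least-squares fit; it predicts at x with x'β̂(z), which is one natural use of the fitted coefficients rather than the slides' stated expression, and it assumes X^TΨ(z)X is invertible (k at least the number of columns of X). Names are illustrative:

```python
import numpy as np

def knn_regression(x, X_train, y_train, k):
    """k-NN regression: solve the normal equations X' Psi(z) X beta(z) = X' Psi(z) y,
    where Psi(z) selects the k nearest neighbors of z, then predict at x."""
    d = np.linalg.norm(X_train - x, axis=1)
    psi = np.zeros(len(y_train))
    psi[np.argsort(d)[:k]] = 1.0                       # psi(z_i) = 1 if z_i is in N_k(z)
    XtPX = X_train.T @ (psi[:, None] * X_train)        # X' Psi(z) X
    XtPy = X_train.T @ (psi * y_train)                 # X' Psi(z) y
    beta = np.linalg.solve(XtPX, XtPy)                 # beta-hat(z)
    return x @ beta                                    # predicted response at x
```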
Problems
• Using only k observations implies that the mean square error of ŷ will be large
• Presumably, the β(z)'s vary more smoothly than the k-NN model implies, and information about β(z) can be extracted from neighbors of z that are close to z but not in the k-neighborhood
• Observations near the boundary of the k-neighborhood may be less informative than observations near the center of the neighborhood
• X^TΨ(z)X will not be of full rank if k < p
21
Improving on k-nearest neighbor regression
• We propose to use the exact bootstrap aggregation approach to smooth the parameter vectors
• Equate the bootstrap expectations of the two sides of the normal equations:
$$E_{\hat F}[X^T\Psi(z)y] = E_{\hat F}[X^T\Psi(z)X\beta(z)] = E_{\hat F}[X^T\Psi(z)X]\,\beta(z)$$
• The solution is
$$\beta_s(z) = E_{\hat F}[X^T\Psi(z)X]^{-1}\, E_{\hat F}[X^T\Psi(z)y].$$
22
Computing expectations
• Consider
$$E_{\hat F}[X^T\Psi(z)y] = \begin{pmatrix} E_{\hat F}[x_1^T\Psi(z)y] \\ \vdots \\ E_{\hat F}[x_p^T\Psi(z)y] \end{pmatrix}$$
where xj is the jth column of X
• The jth element of this vector is
$$E_{\hat F}[x_j^T\Psi(z)y] = \sum_{i=1}^{n} x_{ij}\, y_i\, P_{\hat F}[z_i \in N^*_k(z)]$$
where N*_k(z) is the set of the k closest neighbors of z among a bootstrap sample drawn from F̂
23
Another expression
• Suppose that the rows of X and the elements of y have been arranged according to the distance of xi from x, i = 1, . . . , n
• The ordered matrix X is denoted Xo
• The ordered vector y is denoted yo
• The diagonal matrix Ak contains P_F̂[z(i) ∈ N*_k(z)] on the diagonal
24
Then
$$E_{\hat F}[X^T\Psi(z)y] = X_o^T A_k y_o, \qquad E_{\hat F}[X^T\Psi(z)X] = X_o^T A_k X_o$$
and
$$\beta_s(z) = (X_o^T A_k X_o)^{-1} X_o^T A_k y_o$$
25
Computing P_F̂[z(i) ∈ N*_k(z)]
• This term is the probability that the ith nearest neighbor of z in Zn falls among the k nearest neighbors of z in a bootstrap sample
• Note that
$$P_{\hat F}[z_{(i)} \in N^*_k(z)] = \sum_{r=1}^{k} P_{\hat F}[z^*_{(r)} = z_{(i)}]$$
26
It can be shown that
$$P_{\hat F}[z^*_{(r)} = z_{(i)}] = F_{i/n}(r,\, n-r+1) - F_{(i-1)/n}(r,\, n-r+1)$$
where
$$F_p(r,\, n-r+1) = \frac{n!}{(r-1)!\,(n-r)!} \int_0^p u^{r-1}(1-u)^{n-r}\, du$$
is the cumulative distribution function of a beta random variable with parameters r and n − r + 1
27
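The beta CDF representation makes the diagonal of Ak, and hence βs(z), easy to compute. A sketch using NumPy and scipy.stats.beta; the names are illustrative, and predicting at x via x'βs(z) is an assumption about how the smoothed coefficients are used:

```python
import numpy as np
from scipy.stats import beta

def neighbor_inclusion_probs(n, k):
    """P_Fhat[z_(i) in N*_k(z)] for i = 1, ..., n: the diagonal of A_k,
    computed as sums of beta CDF differences."""
    i = np.arange(1, n + 1)
    p = np.zeros(n)
    for r in range(1, k + 1):
        # P_Fhat[z*_(r) = z_(i)] = F_{i/n}(r, n-r+1) - F_{(i-1)/n}(r, n-r+1)
        p += beta.cdf(i / n, r, n - r + 1) - beta.cdf((i - 1) / n, r, n - r + 1)
    return p

def exact_bagging_knn_regression(x, X_train, y_train, k):
    """Exact bagging k-NN regression: beta_s(z) = (Xo' A_k Xo)^{-1} Xo' A_k yo,
    with rows ordered by distance from x."""
    order = np.argsort(np.linalg.norm(X_train - x, axis=1))
    Xo, yo = X_train[order], y_train[order]
    a = neighbor_inclusion_probs(len(yo), k)            # diagonal of A_k
    XtAX = Xo.T @ (a[:, None] * Xo)                     # Xo' A_k Xo
    XtAy = Xo.T @ (a * yo)                              # Xo' A_k yo
    return x @ np.linalg.solve(XtAX, XtAy)              # x' beta_s(z)
```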
Some remarks
• Inverting X_o^T A_k X_o is much less of a problem than inverting X^TΨ(z)X because
$$\mathrm{rank}(X_o^T A_k X_o) = \min[\mathrm{rank}(X_o^T X_o),\, \mathrm{rank}(A_k)]$$
• Both k-nearest neighbor and exact bagging k-nearest neighbor regression are varieties of local linear regression
• Resampling methods are necessary for estimating prediction error and intervals
28
Some examples
• Time series smoothing: for zt ∈ Nk(z), assume the model
$$E(y_t) = \beta_0(t) + \beta_1(t)\, t$$
where yt is the response variable at time t, and β0(t) and β1(t) are intercept and slope parameters at time t
• β̂1(t) is the estimated rate of change as a function of time
29
Temperature data for the Northern Hemisphere, k = 18.*
[Figure: two panels plotted against year (1880–1980). Top panel: k-NN mean, exact bagging k-NN mean, k-NN regression, and exact bagging k-NN regression. Bottom panel: k-NN regression and exact bagging k-NN regression.]
*Jones, P.D. (1988) Journal of Climatology, 1, 654-60.
30
USDA Forest Service Forest Inventory and Analysis (FIA) data from Northern Idaho (n = 973)
• Monitoring program for forest management with systematically located permanent plot installations
• Objective: predict forest attributes away from plot locations using remotely sensed predictor variables such as Landsat 7 TM imagery and topographic variables
• For comparative purposes, we measure accuracy according to
$$1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
31
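A one-function sketch of this accuracy measure (the function name is illustrative):

```python
import numpy as np

def relative_accuracy(y, y_hat):
    """1 - sum (y_i - yhat_i)^2 / sum (y_i - ybar)^2."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```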
Basal area (entire plot; plots with less than 20 sq. ft. were excluded)
[Figure: 1 − MSE/Var plotted against number of neighbors (k), estimated by the .632 bootstrap and by cross-validation, for k-NN, exact bagging k-NN, k-NN regression, exact bagging k-NN regression, and ordinary least squares]
32
Canopy cover (entire plot)
[Figure: 1 − MSE/Var plotted against number of neighbors (k), estimated by the .632 bootstrap and by cross-validation, for k-NN, exact bagging k-NN, k-NN regression, exact bagging k-NN regression, and ordinary least squares]
33
We fit all possible models (2^p) and use the leave-one-out bootstrap estimates of prediction error (B = 200) to identify a best model
Model fit plotted against p
[Figure: relative accuracy plotted against the number of predictors for k-NN, exact bagging k-NN, k-NN regression, exact bagging k-NN regression, and ordinary least squares]
34
Remarks
• Exact bagging provides modest and consistent improvements in accuracy
• When k is large, there is little difference between ordinary and exact bagging k-nearest neighbor predictors
• k-nearest neighbor regression is potentially valuable in remote sensing applications
35
Some references
• Breiman, L. (1996) Bagging predictors. Machine Learning, 24, 123-140.
• Hutson, A.D., Ernst, M.D. (2000) The exact bootstrap mean and variance of an L-estimator. J. R. Statist. Soc. B, 62, 89-94.
• Steele, B.M., Patterson, D.A. (2000) Ideal bootstrap estimation of expected prediction error for k-nearest neighbor classifiers: Applications for classification and error assessment. Statistics and Computing, 10, 349-355.
36
Additional references
• Efron, B., Tibshirani, R. (1997) Improvements on cross-validation: the .632+ bootstrap method. Journal of the American Statistical Association, 92, 548-560.
• Hall, P., Samworth, R.J. (2005) Properties of bagged nearest neighbour classifiers. J. R. Statist. Soc. B, 67, 363-379.
• Hastie, T., Tibshirani, R., Friedman, J. (2001) The Elements of Statistical Learning. Springer, New York.
• Steele, B.M., Patterson, D.A., Redmond, R.L. (2003) Toward estimation of map accuracy without a probability sample. Environmental and Ecological Statistics, 10, 333-356.
37
