RCodeLab2

•

UFV

0

Lorena Ribeiro

23/09/2014

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Você viu 3, do total de 9 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Você viu 6, do total de 9 páginas

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Faça como milhares de estudantes: teste grátis o Passei Direto

Esse e outros conteúdos desbloqueados

16 milhões de materiais de várias disciplinas

Impressão de materiais

Agora você pode testar o

Passei Direto grátis

Você também pode ser Premium ajudando estudantes

Você viu 9, do total de 9 páginas

E aí, curtiu este material?

Ajude a incentivar outros estudantes a melhorar o conteúdo

Gostou desse material? Compartilhe! 🧡

Estatística Aplicada

24.048 Materiais compartilhados

Baixe o app para aproveitar ainda mais

Leia os materiais offline, sem usar a internet. Além de vários outros recursos!

Prévia do material em texto

R Notes 2011 LAB 2 
Topics covered: 
One-way ANOVA
Levene’s test
ANOVA with nested design
> setwd("G:/Courses/A205/R/Lab2")
> lab2a<-read.table('Lab2a.txt', header=T)
# read.table reads white spaces as separators as default. You can specify in 
# the parameters of read.table the type of separator: e.g.: sep=”\t“ for tab
# only as separator. When you use a new command it is useful to ask the
# manual what are the default values and its usage (> ?read.table). 
# alternatively: read.csv can be used for comma separated values or
# read.delim for tab separated values. Remember to look at the manual for the
# usable parameters: you will discover the very handy parameter sep, that
# allows to set how the values are separated: sep=“ “ is the default value
# for read.table, sep=”\t” or sep=”,” can be used for tab and comma separated
# values respectively. 
> lab2a
 Culture N_level
1 3DOk1 24.1
2 3DOk1 32.6
3 3DOk1 27.0
4 3DOk1 28.9
5 3DOk1 31.4
6 3DOk5 19.1
7 3DOk5 24.8
8 3DOk5 26.3
9 3DOk5 25.2
10 3DOk5 24.3
11 3DOk4 17.9
12 3DOk4 16.5
13 3DOk4 10.9
14 3DOk4 11.9
15 3DOk4 15.8
16 3DOk7 20.7
17 3DOk7 23.4
18 3DOk7 20.5
19 3DOk7 18.1
20 3DOk7 16.7
21 3DOk13 14.3
22 3DOk13 14.4
23 3DOk13 11.8
24 3DOk13 11.6
25 3DOk13 14.2
26 Comp 17.3
27 Comp 19.4
28 Comp 19.1
29 Comp 16.9
30 Comp 20.8
# When input data in R you should always check how R has interpreted them.
# Eventually you may need to make some adjustments. You can look at the data
# by typing again the name of the data.frame you created (Lab2_N), if the
# data are composed of thousand or rows you can visualize only the first of
# the last lines with the head and tail commands, respectively. 
# The function str is useful to get a summary of the data and how R is
# interpreting them. If numerical factors, R by default will consider them
# continuous numbers; factors need to be recognized as such and composed of
# discrete levels; a very useful command is str. 
 
> str(lab2a)
'data.frame':	30 obs. of 2 variables:
 $ Culture: Factor w/ 6 levels "3DOk1","3DOk13",..: 1 1 1 1 1 4 4 4 4 4 ...
 $ N_level: num 24.1 32.6 27 28.9 31.4 19.1 24.8 26.3 25.2 24.3 ...
# Or we can ask the question using the is.factor function (if you need to
# convert numerical values to factors you may use the function as.factor)
> lab2a<-read.table('Lab2a.txt', header=T)
> is.factor(lab2a$Culture)
[1] TRUE
> is.factor(lab2a$N_level)
[1] FALSE
# To select specific part of the table 
> lab2a[lab2a$Culture=="Comp",]
 Culture N_level
26 Comp 17.3
27 Comp 19.4
28 Comp 19.1
29 Comp 16.9
30 Comp 20.8
> lab2a[lab2a$N_level>="25",]
 Culture N_level
2 3DOk1 32.6
3 3DOk1 27.0
4 3DOk1 28.9
5 3DOk1 31.4
8 3DOk5 26.3
9 3DOk5 25.2
One way ANOVA
# To define the one-way ANOVA model (result_variable~classification_variable)
# we can use the function lm. It is important to note the order of the
# arguments: the first argument is always the dependent variable (N_level).
# It is followed by the tilde symbol (~) and the independent variable(s).
> model<-lm(N_level~Culture, data=lab2a)
> anova(model)
Analysis of Variance Table
Response: N_level
 Df Sum Sq Mean Sq F value Pr(>F) 
Culture 5 845.72 169.144 25.363 7.537e-09 ***
Residuals 24 160.05 6.669 
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# We use the anova function to visualize the ANOVA table of a linear model
# defined using lm (for aov model use summary). Summary can be used also with
# lm models and returns the R2:
> summary(model)
Call:
lm(formula = N_level ~ Culture, data = lab2a)
Residuals:
 Min 1Q Median 3Q Max 
-4.840 -1.750 0.660 1.245 3.800 
Coefficients:
 Estimate Std. Error t value Pr(>|t|) 
(Intercept) 19.8633 0.4715 42.130 < 2e-16 ***
Culture1 -7.7700 0.8166 -9.515 1.29e-09 ***
Culture2 -2.1433 0.4715 -4.546 0.000132 ***
Culture3 1.2633 0.3334 3.789 0.000896 ***
Culture4 -0.0540 0.2582 -0.209 0.836129 
Culture5 -0.2327 0.2109 -1.103 0.280771 
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
Residual standard error: 2.582 on 24 degrees of freedom [= Root MSE in SAS]
Multiple R-squared: 0.8409,	Adjusted R-squared: 0.8077 
F-statistic: 25.36 on 5 and 24 DF, p-value: 7.537e-09 
# Reminder from previous lab: it is easy to calculate exact p-value from know
# F-values in R using pf. 
> pf(25.36, 5, 24, lower.tail=F)
[1] 7.546316e-09
	?pf
	lower.tail= logical; if TRUE (default), probabilities are P[X <= x], otherwise, P[X > x].
# We can easily calculate the coefficient of variation: 
	
> Nlevel_Mean<-mean(lab2a$N_level)
> Root_MSE<-sqrt(6.67)
> Root_MSE
[1] 2.582634
> Coeff_Var<-Root_MSE/Nlevel_Mean*100
> Coeff_Var
[1] 13.00202
In SAS
R-Square Coeff Var Root MSE Nlevel Mean
0.840866 13.00088 2.582408 19.86333
> plot(lab2a) 
# same result boxplot(N_level ~ Culture, data=lab2a)
# We can easily extract the predicted and the residual values using the
# predict and residual functions. 
> predi<-predict(model)
> predi
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 
28.80 28.80 28.80 28.80 28.80 23.94 23.94 23.94 23.94 23.94 14.60 14.60 14.60 14.60 14.60 19.88 19.88 
 18 19 20 21 22 23 24 25 26 27 28 29 30 
19.88 19.88 19.88 13.26 13.26 13.26 13.26 13.26 18.70 18.70 18.70 18.70 18.70
> resi<-residuals(model)
> resi
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 
-4.70 3.80 -1.80 0.10 2.60 -4.84 0.86 2.36 1.26 0.36 3.30 1.90 -3.70 -2.70 1.20 0.82 3.52 
 18 19 20 21 22 23 24 25 26 27 28 29 30 
 0.62 -1.78 -3.18 1.04 1.14 -1.46 -1.66 0.94 -1.40 0.70 0.40 -1.80 2.10
# Adding the Predicted and residual vectors to the data.frame lab2a 
> lab2a$predi<-predi
> lab2a$resi<-resi
# We can print the first 10 rows using head function to visualize the new
# table lab2a and use the write.table function to save it a tab delimited
# text file.
> head(lab2a, 10) 
# 10 specifies the number of rows: 5 is default. 
 Culture N_level predi resi
1 3DOk1 24.1 28.80 -4.70
2 3DOk1 32.6 28.80 3.80
3 3DOk1 27.0 28.80 -1.80
4 3DOk1 28.9 28.80 0.10
5 3DOk1 31.4 28.80 2.60
6 3DOk5 19.1 23.94 -4.84
7 3DOk5 24.8 23.94 0.86
8 3DOk5 26.3 23.94 2.36
9 3DOk5 25.2 23.94 1.26
10 3DOk5 24.3 23.94 0.36
> write.table(Lab2a, "try3.txt", sep='\t')
Levene’s test
1- Calculating Levene’s test by hand (ANOVA of the square of the residuals)
> lab2a$resi2<-lab2a$resi^2
> modelLevene<-lm(lab2a$resi2~Culture, data=lab2a)
> anova(modelLevene)
Analysis of Variance Table
Response: lab2a$resi2
 Df Sum Sq Mean Sq F value Pr(>F)
Culture 5 226.99 45.398 1.1528 0.3606
Residuals 24 945.17 39.382 
2- Levene’s test using the ‘car’ package
# To install a new package, we can use the install.packages function.
> install.packages("car")
--- Please select a CRAN mirror for use in this session select close US location]
Content type 'application/zip' length 728229 bytes (711 Kb)
opened URL
downloaded 711 Kb
package 'car' successfully unpacked and MD5 sums checked
# The package is downloaded but not available. Each R session you need to
# activate the downloaded library (it saves memory to have only the necessary
# libraries open)
> library(car)
> help(car)# If you want to change the list of default packages you need to modify the
# Rprofile file. Search the Rprofile file in your computer and open it with a
# txt editor (e.g. word pad). Once it’s open, you have to search for this
# part: 
local({dp <- as.vector(Sys.getenv("R_DEFAULT_PACKAGES"))
 if(identical(dp, "")) # marginally faster to do methods last
 dp <- c("datasets", "utils", "grDevices", "graphics",
 "stats", "methods", "car", "agricolae")
 else if(identical(dp, "NULL")) dp <- character(0)
 else dp <- strsplit(dp, ",")[[1]]
 dp <- sub("[[:blank:]]*([[:alnum:]]+)", "\\1", dp) # strip whitespace
 options(defaultPackages = dp)
 })
# The part I have highlighted are the names of the two additional packages I
# want to be loaded at startup. Add the names of the packages between
# quotation mark. Save the new Rprofile file and restart R console to see the
# changes you have made. A word of advice: don’t change anything else in the
# Rprofile file, if you don’t know what you are doing…
# Running the Levene’s test: deviation from medians is the default. SAS 
# calculates deviation from means. 
> levene.test(model, center=mean)
Levene's Test for Homogeneity of Variance (center = mean)
 Df F value Pr(>F)
group 5 0.5841 0.7119
 24 
# Caution: R/car uses a different version of the Levene’s test than SAS: 
# R version is of Levene is based on absolute deviations (instead of the
# square deviations in SAS). 
3- Alternative homogeneity of variance tests: Bartlett and Fligner-Killeen Tests
> bartlett.test(N_level~Culture, lab2a)
 Bartlett test of homogeneity of variances
data: N_level by Culture 
Bartlett's K-squared = 3.9834, df = 5, p-value = 0.5518
> fligner.test(N_level~Culture, lab2a)
 Fligner-Killeen test of homogeneity of variances
data: N_level by Culture 
Fligner-Killeen:med chi-squared = 2.7889, df = 5, p-value = 0.7325
ANOVA with nested design
> lab2b<-read.table('lab2b.txt', header=T)
> str(lab2b)
'data.frame':	72 obs. of 4 variables:
 $ Trtmt : int 1 1 1 1 1 1 1 1 1 1 ...
 $ Pot : int 1 1 1 1 2 2 2 2 3 3 ...
 $ Plant : int 1 2 3 4 1 2 3 4 1 2 ...
 $ Growth: num 3.5 4 3 4.5 2.5 4.5 5.5 5 3 3 ...
# Since R is not interpreting the first three variables as factors we need to
# declare that they are factors using the as.factor function. 
> lab2b$Plant<-as.factor(lab2b$Plant)
> lab2b$Pot<-as.factor(lab2b$Pot)
> lab2b$Trtmt<-as.factor(lab2b$Trtmt)
> str(lab2b)
'data.frame':	72 obs. of 4 variables:
 $ Trtmt : Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Pot : Factor w/ 3 levels "1","2","3": 1 1 1 1 2 2 2 2 3 3 ...
 $ Plant : Factor w/ 4 levels "1","2","3","4": 1 2 3 4 1 2 3 4 1 2 ...
 $ Growth: num 3.5 4 3 4.5 2.5 4.5 5.5 5 3 3 ...
To have all the variance components: nested design using linear mixed-effects models with lmer (lme4 package)
> install.packages("lme4")
> library(lme4)
# We design a model where the dependent variable growth is function only of a
# random effect that considers the hierarchical design nesting of ‘Pot’
# within ‘Trtmt’ (1|Trtmt/Pot):
> model<-lmer(Growth ~ 1 + (1|Trtmt/Pot), data=lab2b)
> model
Linear mixed model fit by REML 
Formula: Growth ~ 1 + (1 | Trtmt/Pot) 
 AIC BIC logLik deviance REMLdev
 237.2 246.3 -114.6 230.3 229.2
Random effects:
 Groups Name Variance Std.Dev.
 Pot:Trtmt (Intercept) 0.30469 0.55198 
 Trtmt (Intercept) 2.81464 1.67769 
 Residual 0.93403 0.96645 
Number of obs: 72, groups: Pot:Trtmt, 18; Trtmt, 6
Fixed effects:
 Estimate Std. Error t value
(Intercept) 5.7847 0.7064 8.19
# Now we can calculate the variance components as percentage:
> variances<-c(0.30,2.8,0.93)
> variances/sum(variances)*100
[1] 7.444169 69.478908 23.076923
In SAS
SAS. Nested Random Effects Analysis of Variance for Variable Growth
 Variance Sum of Error Variance Percent
 Source DF Squares F Value Pr > F Term Mean Square Component of Total
 Total 71 255.913194 3.604411 4.053356 100.0000
 Trtmt 5 179.642361 16.69 <.0001 Pot 35.928472 2.814641 69.4398
 Pot 12 25.833333 2.30 0.0186 Error 2.152778 0.304688 7.5169
 Error 54 50.437500 0.934028 0.934028 23.0433
 Growth Mean 5.78472222
 Standard Error of Growth Mean 0.70640396
To have the p-value:
# Using lm
> anova(lm(Growth~Trtmt/Pot, lab2b))
Analysis of Variance Table
Response: Growth
 Df Sum Sq Mean Sq F value Pr(>F) 
Trtmt 5 179.642 35.928 38.4662 < 2e-16 ***
Trtmt:Pot 12 25.833 2.153 2.3048 0.01858 * 
Residuals 54 50.438 0.934
# 38.5 is the incorrect F value for Trtmt because it is using the wrong error
# term. R computes F values using the residual MS as the error term (0.934
# in this case). The calculation of the correct F and P needs to be
# completed by hand. You need to know that the MSE Trtmt needs to be divided
# by the Trtmt/Pot error term (2.153) and not the residual.
> Fvalue_trtmt<-35.928/2.1528
> Fvalue_trtmt 
16.69 	
> pf(16.69, 5, 12, lower.tail=F)
4.880102e-05
> Fvalue_treatment_pot<-2.1528/0.934
> Fvalue_treatment_pot
2.30		
> pf(2.30, 12, 54, lower.tail=F)
0.01882839 
In SAS
SAS: Tests of Hypotheses Using the Type III MS for Pot(Trtmt) as an Error Term
Source DF Type III SS Mean Square F Value Pr > F
Trtmt 5 179.6423611 35.9284722 16.687 <.0001
# We could use lm only and calculate the variance components by hand:
# MSSE= 2error = 0.93
# MSEE= 2 pot= (MSE - 2error)/4= (2.15-0.93)/4= 0.30
# MST= (MST –MSEE)/12= (35.93-2.15)/12=2.81
Nested design with ANOVA with multiple error terms (aov) [not covered in class]
> model_nest<-aov(Growth~Trtmt+Error(Trtmt/Pot), lab2b)
> summary(model_nest)
Error: Trtmt
 Df Sum Sq Mean Sq
Trtmt 5 179.642 35.928
Error: Trtmt:Pot
 Df Sum Sq Mean Sq F value Pr(>F)
Residuals 12 25.8333 2.1528 
Error: Within
 Df Sum Sq Mean Sq F value Pr(>F)
Residuals 54 50.437 0.934 
# Again then the calculation of F and P needs to be completed by hand. You
# need to know that the MSE Trtmt needs to be divided by the Trtmt:Pot error
# term and not the residual. 
PLS205 2011	2.8	R Lab 2