Worksheet 8: Kruskal-Wallis and Wilcoxon, Resolution

CASE STUDY 8: 
RESOLUTION HYPOTHESIS TESTS
 
GRAÇA TRINDADE 
ISCTE – IUL 
2012-2013
 
 
CASE STUDY 8: RESOLUTION 
 
Given the importance of an event like Rock in Rio, the municipality commissioned a study
from a research center, which collected a random sample of Lisbon residents. The first aim
was to evaluate the impact of this event on the residents of the parish where it took
place.
A. To see whether residents have become more receptive to holding Rock in Rio in Parque
da Bela Vista, their degree of satisfaction with the 2004 and 2006 events was evaluated.
To this end, two indices of the degree of satisfaction were constructed (measured on a
scale of 1 - not at all satisfied to 10 - very satisfied), and the following results were
obtained:
TABLE A: Paired Samples Statistics

                                                      N     Mean   Std. Deviation   Std. Error of Mean
Pair 1  Degree of Satisfaction Rock in Rio in 2004    207   4.11   2.681            0.186
        Degree of Satisfaction Rock in Rio in 2006    207   8.37   2.034            0.141
 
 
TABLE B: Paired Samples Correlation

                                                          N     Correlation   Sig
Pair 1  Degree of Satisfaction Rock in Rio in 2004 and
        Degree of Satisfaction Rock in Rio in 2006        207   -0.451        0.000
 
TABLE C:

                                                     Paired Differences
                                            Mean     Std. Deviation   Std. Error Mean   t         df    Sig (2-tailed)
Pair 1  Degree of Satisfaction Rock in Rio
        in 2004 - Degree of Satisfaction
        Rock in Rio in 2006                 -4.256   4.030            0.280             -15.195   206   0.000
 
 
 
a) Given the variables in the analysis and the assumptions underlying it, do you consider 
the statistical procedure appropriate? Justify. 
The parametric t test for the mean difference of paired samples is appropriate: the
variables are quantitative and measure the degree of satisfaction of the same
individuals in two different years (2004 and 2006), and the aim is to test whether
the mean difference between the degree of satisfaction with the 2004 event and with
the 2006 event is zero. This test has one condition and one assumption:
1. CONDITION - The original variables should be correlated;
2. ASSUMPTION - Normality of the new variable Difference.
 
b) What is the relevance of presenting the information contained in Table B? What may
be concluded from the results presented in that table (α = 0.05)? Justify.
Table B allows us to check the condition required to perform a test for paired samples.
The decision is based on the test of the population correlation coefficient, in which H0
states that the coefficient is zero and Ha that it is different from zero:
H0: ρ = 0
Ha: ρ ≠ 0
Decision: There is a moderate negative sample correlation between the variables
(r = -0.451) with an associated probability of 0.000, almost zero (< α = 0.05), so the
hypothesis of no correlation in the population between the variables under analysis
can be rejected. We can proceed with this statistical procedure.
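
As a rough check on this decision, the test statistic behind the significance in Table B can be recomputed from r and N alone. The sketch below assumes the usual t statistic for a Pearson correlation, with n - 2 degrees of freedom, and uses the normal quantile 1.96 as an approximate two-sided critical value for such a large sample; neither detail is stated in the original output.

```python
import math

# Values read from Table B.
n = 207
r = -0.451

# t statistic for H0: rho = 0, with n - 2 degrees of freedom.
df = n - 2
t_stat = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Approximate two-sided critical value for alpha = 0.05 (assumption:
# with df this large the normal quantile is close to the t quantile).
critical = 1.96

print(f"t = {t_stat:.3f}, df = {df}, reject H0: {abs(t_stat) > critical}")
```

The statistic is far beyond the critical value, which agrees with the reported significance of 0.000.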
 
c) Based on these results, can we conclude that the inhabitants of that parish increased
their level of satisfaction with Rock in Rio from 2004 to 2006 (α = 0.05)? Justify.
Before any decision, it is necessary to validate the assumption that the variable
Difference follows a normal distribution. Since the sample size is 207 > 30, the central
limit theorem ensures that the sampling distribution of the mean difference is
approximately normal, so the assumption can be taken as validated for the variable
Difference in the degree of satisfaction with Rock in Rio from 2004 to 2006.
The hypotheses of the main test are the following:
H0: μD ≥ 0, that is, the mean difference between the degrees of satisfaction with
Rock in Rio in 2004 and in 2006 is greater than or equal to zero
Ha: μD < 0, the mean difference between the degrees of satisfaction with Rock in Rio
in 2004 and in 2006 is less than zero, which means that individuals increased
their level of satisfaction from 2004 to 2006
DECISION: with a test value t = -15.195 (its sign is consistent with the alternative
hypothesis) and an associated one-tailed probability of 0.000/2, almost zero
(< α = 0.05), H0 is rejected: there is statistical evidence to claim that, in the
population, the mean difference between the degrees of satisfaction with Rock in Rio
in 2004 and in 2006 is less than zero. This means that there was an increase in the
degree of satisfaction of individuals with this event between 2004 and 2006.
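
The figures in Table C can be reproduced, up to rounding, from Tables A and B. The sketch below assumes the standard identity for the standard deviation of a difference of two correlated variables; the variable names are illustrative only.

```python
import math

# Values read from Tables A and B.
n = 207
mean_2004, sd_2004 = 4.11, 2.681
mean_2006, sd_2006 = 8.37, 2.034
r = -0.451

# Difference is defined as 2004 minus 2006, as in Table C.
mean_diff = mean_2004 - mean_2006

# Standard deviation of the difference of two correlated variables.
sd_diff = math.sqrt(sd_2004 ** 2 + sd_2006 ** 2 - 2 * r * sd_2004 * sd_2006)

# Standard error of the mean difference and the paired t statistic.
se_diff = sd_diff / math.sqrt(n)
t_stat = mean_diff / se_diff

print(f"mean diff = {mean_diff:.2f}, sd = {sd_diff:.3f}, "
      f"se = {se_diff:.3f}, t = {t_stat:.2f}, df = {n - 1}")
```

Because the inputs are rounded to two or three decimals, the recomputed t differs slightly from the -15.195 in Table C. Note how the negative correlation inflates the standard deviation of the difference.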
 
d) What is the non-parametric alternative to the test performed? Formulate the statistical
hypotheses for that test. What is the main difference from the test shown in Table C?
The non-parametric alternative to the parametric t test for equality of means of paired
samples is the Wilcoxon test.
H0: the distribution of the degree of satisfaction with Rock in Rio in 2004 is at least
equal to the distribution of the degree of satisfaction with the same event in 2006
Ha: the distribution of the degree of satisfaction with Rock in Rio in 2004 is lower
than the distribution of the degree of satisfaction with the same event in 2006
The main difference is that, as in any non-parametric test, the variable Difference is
treated not as a quantitative variable but as a qualitative ordinal one: its values are
ranked, so that we speak in terms of the ranks of the differences between the values of
the two variables instead of the mean of those differences. Also, in this
non-parametric test there is no normality assumption to validate.
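
To make the ranking idea concrete, here is a minimal sketch of how the Wilcoxon signed-rank sums are built. The scores are hypothetical (the raw data behind Tables A to C are not available), and the helper names are illustrative, not taken from any library.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def signed_rank_sums(before, after):
    """Return (W+, W-): rank sums of positive and negative differences."""
    # Zero differences are dropped before ranking.
    diffs = [b - a for b, a in zip(before, after) if b != a]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus


# Hypothetical satisfaction scores for 8 residents (NOT the study data).
sat_2004 = [3, 5, 2, 7, 4, 6, 1, 8]
sat_2006 = [8, 9, 7, 6, 9, 6, 5, 10]

w_plus, w_minus = signed_rank_sums(sat_2004, sat_2006)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

A negative rank sum much larger than the positive one points in the direction of Ha, that is, lower satisfaction in 2004 than in 2006. Only the ranks of the differences enter the statistic, not their magnitudes.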
 
B. The aim is to analyze the relationship between the willingness of residents to
participate in this event ("Number of days you think you are going to the Rock in Rio in
2008") and their perception of how easy it is to buy tickets, in terms of price, to
watch the performances (measured on a scale of 1 - not accessible to 10 - very
accessible). The following results were obtained:
TABLE D: Descriptives
Degree of perception of buying the tickets

                                                      95% Confidence Interval for Mean
           N     Mean   Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
None       34    2.62   2.374            0.407        1.79          3.45          1         10
1 day      105   2.92   1.455            0.102        2.63          3.20          1         8
2 days     51    4.75   0.440            0.062        4.62          4.87          4         5
3-5 days   33    9.09   1.128            0.196        8.69          9.49          6         10
Total      223   4.20   2.617            0.175        3.86          4.55          1         10

TABLE E: Test of Homogeneity of Variances

Levene Statistic   df1   df2   Sig
13.387             3     219   0.000
 
TABLE F: ANOVA

                 Sum of Squares   df    Mean Square   F         Sig
Between Groups   1063.248         3     354.416       169.963   0.000
Within Groups    456.672          219   2.085
Total            1519.919         222
 
TABLE G: Ranks
Degree of perception of buying the tickets

           N     Mean Rank
None       34    63.79
1 day      105   79.13
2 days     51    151.66
3-5 days   33    204.97
Total      223

TABLE H: Test Statistics

              Degree of perception of buying the tickets
Chi-Square    137.590
df            3
Asymp. Sig.   0.000
 
 
TABLE I: Multiple Comparisons
Dependent variable: Degree of perception to buy the tickets in 2008

(I) Number of days to    (J) Number of days to    Mean Difference
go to the Rock in Rio    go to the Rock in Rio    (I-J)             Std. Error   Sig
None                     1 day                    -0.297            0.431        0.901
                         2 days                   -2.127*           0.412        0.000
                         3-5 days                 -6.473*           0.452        0.000
1 day                    None                     0.297             0.431        0.901
                         2 days                   -1.831*           0.155        0.000
                         3-5 days                 -6.177*           0.242        0.000
2 days                   None                     2.127*            0.412        0.000
                         1 day                    1.831*            0.155        0.000
                         3-5 days                 -4.346*           0.206        0.000
3-5 days                 None                     6.473*            0.452        0.000
                         1 day                    6.177*            0.242        0.000
                         2 days                   4.346*            0.206        0.000
* The mean difference is significant at the 0.05 level.
 
a) Given the variables under analysis, what is the proper procedure? Justify.
We are presented with two variables: an ordinal variable treated as nominal, with more
than two categories ("Number of days to go to the Rock in Rio"), and a quantitative
variable, "Degree of perception of buying the tickets", measured on a Likert scale
(1 - not accessible to 10 - very accessible). The aim is to analyze the relationship
between the willingness of residents to participate in this event and the perception of
the ease of buying tickets (dependent variable).
The appropriate procedure is the one-factor analysis of variance (One-Way ANOVA).
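
The One-Way ANOVA decomposition can be sketched from the group sizes, means and standard deviations in Table D alone. Since those inputs are rounded, the result only approximates Table F; the dictionary layout and names are illustrative.

```python
# (n, mean, standard deviation) per category, read from Table D.
groups = {
    "None": (34, 2.62, 2.374),
    "1 day": (105, 2.92, 1.455),
    "2 days": (51, 4.75, 0.440),
    "3-5 days": (33, 9.09, 1.128),
}

n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups: weighted squared distance of each group mean
# from the grand mean.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())

# Within-groups: each group's variance times its degrees of freedom.
ss_within = sum((n - 1) * s ** 2 for n, _, s in groups.values())

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}, "
      f"F({df_between}, {df_within}) = {f_stat:.1f}")
```

The recomputed sums of squares and F land close to the values in Table F (1063.248, 456.672 and 169.963), with the small gaps attributable to rounding in Table D.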
 
b) What are the assumptions of the test shown in Table F? What can be concluded about
the verification of these assumptions (α = 0.05)? Justify.
Table F shows the One-Way ANOVA test, which should meet three assumptions:
1. The samples are independent.
2. Normality of the variable "Degree of perception to buy the tickets" in each of the
categories of the independent factor "Number of days to go to the Rock in Rio": as
all four samples are larger than 30, by the central limit theorem it can be
assumed that the variable "Degree of perception to buy the tickets" follows
approximately a normal distribution in all categories.
3. Homoscedasticity (equal population variances): Levene's test (Table E) is used to
test the assumption of equal variances:
H0: the variance of the variable "Degree of perception to buy the tickets" is the
same for all categories ("Number of days to go to the Rock in Rio")
Ha: the variance of the variable "Degree of perception to buy the tickets" is
different in at least one of the categories ("Number of days to go to the Rock in
Rio")
Decision: with a Levene statistic of 13.387 and an associated significance of virtually
zero (< α = 0.05), H0 is rejected and it is concluded that the variance of the
variable "Degree of perception to buy the tickets" is different in at least one
of the categories ("Number of days to go to the Rock in Rio").
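
Levene's statistic cannot be recomputed from the summary tables, but its logic can be sketched: it is a one-way ANOVA F computed on the absolute deviations of each observation from its group centre. The version below is mean-centred; whether that matches the exact variant behind Table E is an assumption, and the data are hypothetical.

```python
def levene_statistic(groups):
    """Mean-centred Levene statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviations from each group's own mean.
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    z_means = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total

    # One-way ANOVA F computed on the deviations.
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical samples (NOT the study data).
same_spread = [[1, 2, 3, 4], [5, 6, 7, 8]]               # shifted copies
different_spread = [[4, 5, 5, 4, 5, 5], [1, 10, 2, 9, 3, 8]]

w_same = levene_statistic(same_spread)
w_diff = levene_statistic(different_spread)
print(f"same spread: W = {w_same:.2f}, different spread: W = {w_diff:.2f}")
```

Shifting a group changes its mean but not its spread, so the first pair gives W = 0, while the second pair, with very different dispersions, gives a large W. This mirrors the contrast between the standard deviations of 0.440 and 2.374 in Table D.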
 
c) What can be concluded about the relationship between the variables under analysis?
Formulate the hypotheses of the test and take the appropriate decision (α = 0.05).
Since the assumption of equal variances was rejected in b), the appropriate test is the
non-parametric Kruskal-Wallis test.
H0: the mean rank of the values of the variable "Degree of perception of buying the
tickets" is the same for all categories of the variable "Number of days to go to the
Rock in Rio"
Ha: the mean rank of the values of the variable "Degree of perception of buying the
tickets" is different in at least one of the categories of the variable "Number of days
to go to the Rock in Rio"
Decision: as the significance associated with the test value χ² = 137.59 (Table H) is
practically zero (< α = 0.05), we reject H0 and conclude that the mean rank of the
values of the "Degree of perception of buying the tickets" is different in at
least one of the categories of "Number of days to go to the Rock in Rio".
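
The Kruskal-Wallis statistic can be approximated from the mean ranks in Table G. The sketch below uses the textbook formula without a correction for ties, so it is not expected to match Table H exactly.

```python
# Group sizes and mean ranks read from Table G
# (None, 1 day, 2 days, 3-5 days).
group_sizes = [34, 105, 51, 33]
mean_ranks = [63.79, 79.13, 151.66, 204.97]

n_total = sum(group_sizes)
expected_rank = (n_total + 1) / 2  # mean rank of every group under H0

# Kruskal-Wallis H without tie correction.
h_uncorrected = 12 / (n_total * (n_total + 1)) * sum(
    n * (r - expected_rank) ** 2 for n, r in zip(group_sizes, mean_ranks)
)

# 0.95 quantile of the chi-square distribution with 3 degrees of freedom.
critical_chi2_df3 = 7.815

print(f"H (no tie correction) = {h_uncorrected:.2f}, "
      f"reject H0: {h_uncorrected > critical_chi2_df3}")
```

The uncorrected value comes out a few units below the 137.590 in Table H. A tie correction, which always increases H, would be a plausible explanation on a 1-to-10 scale with many repeated scores, though that is an assumption about how the reported value was obtained. Either way the statistic is far above the critical value.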
 
d) What is the underlying purpose of the statistical procedure in Table I? Say whether
you think it fits the problem and draw the conclusions that seem most relevant to the
question under analysis.
The procedure in Table I is the Dunnett C test of multiple comparisons. This test
is appropriate when the quantitative variable under testing fails the assumption of
equal population variances. Because H0 was rejected in the Kruskal-Wallis test,
which means that there is a relationship between the variables, it is desirable to know
which group(s) is (are) responsible for this rejection.
It is concluded that those who plan to attend Rock in Rio for one day or not at all are
significantly different from those who plan to go for two days or for 3-5 days, and
these last two groups are also significantly different from each other.
Three groups can be identified:
1. Those who do not plan to go or plan to go for just one day report, on average, great
difficulty in buying tickets due to the high price. Their means, 2.62 and 2.92
respectively, are not significantly different from each other, despite the high dispersion
(2.374 and 1.455, respectively) and despite the sample of those who plan to go for just
one day (105) being about three times the size of the sample of those who do not plan
to go (34).
2. Those who plan to go for two days are located in the middle of the scale (4.75), with
a small dispersion; that is, they consider the prices to be only moderately high.
3. Finally, those who plan to go for 3 to 5 days are, on average, at the top end of the
scale (9.09) and therefore consider the prices to be reasonable.
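
The mean differences and standard errors in Table I can be reproduced from Table D. The sketch below assumes that each standard error is the unpooled one, built from the two groups' own variances; that assumption is consistent with a procedure designed for unequal variances.

```python
import math
from itertools import combinations

# (n, mean, standard deviation) per category, read from Table D.
groups = {
    "None": (34, 2.62, 2.374),
    "1 day": (105, 2.92, 1.455),
    "2 days": (51, 4.75, 0.440),
    "3-5 days": (33, 9.09, 1.128),
}

pairs = {}
for (name_i, (n_i, m_i, s_i)), (name_j, (n_j, m_j, s_j)) in combinations(
    groups.items(), 2
):
    diff = m_i - m_j
    # Unpooled standard error: each group contributes its own variance.
    se = math.sqrt(s_i ** 2 / n_i + s_j ** 2 / n_j)
    pairs[(name_i, name_j)] = (diff, se)
    print(f"{name_i:>8} vs {name_j:<8} diff = {diff:6.2f}  se = {se:.3f}")
```

All six standard errors agree with Table I to three decimals, and the differences agree up to the rounding of the means. The "None" versus "1 day" difference is smaller than its standard error, in line with its non-significant result (Sig = 0.901).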
e) After the analysis in the preceding paragraphs, do you consider it appropriate to
calculate a coefficient to measure the degree of association between the variables?
Compute it and justify.
Since it was concluded that there is a relationship between the variables (the null
hypothesis was rejected), it is appropriate to determine the degree of association
between them.
Two different coefficients can be proposed:
1. If we consider one variable (the dependent) as metric and the other (the
independent) as nominal, the appropriate coefficient is eta (η), which varies
between 0 and 1 and is obtained from the ANOVA results (Table F).
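$$\eta = \sqrt{\frac{SS_{\text{between}}}{SS_{\text{total}}}} = \sqrt{\frac{1063.248}{1519.919}} = \sqrt{0.6995} \approx 0.836$$

Conclusion: The degree of association between the variables is very high (η ≈ 0.836).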
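
For reference, eta follows directly from the sums of squares in Table F. This small sketch also reports eta squared, which reads as the share of the variance in the perception score associated with the number of days.

```python
import math

# Sums of squares read from Table F.
ss_between = 1063.248
ss_total = 1519.919

eta_squared = ss_between / ss_total
eta = math.sqrt(eta_squared)

print(f"eta^2 = {eta_squared:.4f}, eta = {eta:.3f}")
```

With these inputs eta comes out at roughly 0.836, so about 70% of the total variation lies between the groups.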
 
2. If we consider both variables (the independent and the dependent) as ordinal,
the appropriate coefficient is Spearman's coefficient, which corresponds to
the rank-based Kruskal-Wallis test.
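
Spearman's coefficient cannot be computed from the summary tables, since it needs the individual observations. A minimal sketch on hypothetical data is given below: it is the Pearson correlation of the two sets of ranks, with tied values receiving average ranks. The helper names and the data are illustrative only.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data (NOT the study data): category of days coded 0..3
# (None, 1 day, 2 days, 3-5 days) and perception scores on the 1-10 scale.
days = [0, 0, 1, 1, 2, 2, 3, 3]
perception = [2, 3, 3, 2, 5, 4, 9, 10]

rho = spearman(days, perception)
print(f"Spearman's rho = {rho:.3f}")
```

Coding the categories 0 to 3 only fixes their order; any other increasing coding would give the same ranks and therefore the same coefficient.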
