Worksheet 8: Kruskal-Wallis and Wilcoxon (Resolution)


UNIP


Flavio Loss

21/03/2019


CASE STUDY 8:
RESOLUTION: HYPOTHESIS TESTS

GRAÇA TRINDADE
ISCTE – IUL
2012-2013

CASE STUDY 8: RESOLUTION

Given the importance of an event like Rock in Rio, the municipality commissioned a study
from a research center, which collected a random sample of Lisbon residents. The first aim
was to evaluate the impact of the event on the residents of the parish where it took place.
A. To see whether residents have become more receptive to Rock in Rio being held in
Parque da Bela Vista, their degree of satisfaction with the 2004 and 2006 editions was
evaluated. To this end, two indices measuring the degree of satisfaction were constructed
(on a scale from 1 - not at all satisfied to 10 - very satisfied), and the following
results were obtained:
TABLE A: Paired Samples Statistics
                                                       N     Mean   Std. Deviation   Std. Error of Mean
Pair 1   Degree of Satisfaction, Rock in Rio in 2004   207   4.11   2.681            0.186
         Degree of Satisfaction, Rock in Rio in 2006   207   8.37   2.034            0.141

TABLE B: Paired Samples Correlations
                                                         N     Correlation   Sig.
Pair 1   Degree of Satisfaction, Rock in Rio in 2004 &
         Degree of Satisfaction, Rock in Rio in 2006     207   -0.451        0.000

TABLE C:
                                                         Paired Differences
                                                         Mean     Std. Deviation   Std. Error Mean   t         df    Sig. (2-tailed)
Pair 1   Degree of Satisfaction, Rock in Rio in 2004 -
         Degree of Satisfaction, Rock in Rio in 2006     -4.256   4.030            0.280             -15.195   206   0.000

a) Given the variables in the analysis and the assumptions underlying it, do you consider
the statistical procedure appropriate? Justify.
The parametric t test for the mean difference of paired samples is appropriate: the
variables are quantitative, the degree of satisfaction is measured on the same individuals
in two different years (2004 and 2006), and the aim is to test whether the mean difference
between the degree of satisfaction with the event in 2004 and in 2006 is zero. This test
has one condition and one assumption:
1. CONDITION - The original variables should be correlated;
2. ASSUMPTION - Normality of the new variable, the Difference.
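The correlation between the two years matters because it feeds directly into the variance of the new variable D = (satisfaction in 2004) - (satisfaction in 2006). As an illustrative sketch (the variable names are ours, not the study's), the Python snippet below uses only the summary values in Tables A and B and the identity Var(D) = s₁² + s₂² - 2·r·s₁·s₂. It recovers the 4.030 reported in Table C; because r is negative here, the differences are more dispersed than they would be if the two years were uncorrelated.

```python
import math

# Summary values from Table A (standard deviations) and Table B (correlation).
s_2004 = 2.681  # std. deviation of satisfaction with Rock in Rio 2004
s_2006 = 2.034  # std. deviation of satisfaction with Rock in Rio 2006
r = -0.451      # sample correlation between the two years

# Variance of a difference of two correlated variables:
# Var(X - Y) = Var(X) + Var(Y) - 2 * r * sd(X) * sd(Y)
var_diff = s_2004 ** 2 + s_2006 ** 2 - 2 * r * s_2004 * s_2006
sd_diff = math.sqrt(var_diff)

# The same quantity if the two years were uncorrelated (r = 0), for comparison.
sd_if_uncorrelated = math.sqrt(s_2004 ** 2 + s_2006 ** 2)

print(f"sd of D with r = {r}: {sd_diff:.3f}")
print(f"sd of D with r = 0:      {sd_if_uncorrelated:.3f}")
```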


b) What is the relevance of presenting the information contained in Table B? What may
be concluded from the results presented in that table (α = 0.05)? Justify.
Table B allows us to check one of the conditions for performing a test for paired samples:
that the original variables are correlated. The decision comes from the test of the
population correlation coefficient, with the null hypothesis that it is equal to zero
against the alternative that it is different from zero:
H0: ρ = 0
Ha: ρ ≠ 0
Decision: the sample correlation between the variables is negative and moderate
(r = -0.451), with an associated probability of 0.000, practically zero (< α = 0.05), so
the hypothesis of no correlation in the population between the variables under analysis
is rejected. We can proceed with this statistical procedure.
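The decision above rests on the t statistic for testing ρ = 0, t = r·√(n - 2)/√(1 - r²), with n - 2 degrees of freedom. A minimal Python sketch using the values of Table B follows. To stay within the standard library it approximates the Student t distribution (df = 205) with the standard normal; this is an assumption of the sketch, harmless at this sample size but not advisable for small samples.

```python
import math
from statistics import NormalDist

n = 207      # number of pairs (Table B)
r = -0.451   # sample correlation (Table B)

# t statistic for H0: rho = 0 against Ha: rho != 0, with n - 2 degrees of freedom
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-sided p-value; the normal approximation to Student's t (df = 205)
# is an assumption of this sketch.
p_two_sided = 2 * NormalDist().cdf(-abs(t_stat))

alpha = 0.05
print(f"t = {t_stat:.3f}, approximate p = {p_two_sided:.1e}")
print("reject H0: rho = 0" if p_two_sided < alpha else "do not reject H0")
```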

c) Based on these results, can we conclude that, from 2004 to 2006, the inhabitants of that
parish increased their level of satisfaction with Rock in Rio (α = 0.05)? Justify.
Before any decision, it is necessary to validate that the variable Difference follows a
normal distribution. Since the sample size is n = 207 > 30, the central limit theorem
allows this assumption to be relaxed: the sample mean of the Difference in the Degree of
satisfaction with Rock in Rio from 2004 to 2006 approximately follows a normal
distribution.
The hypotheses of the main test are the following (D = satisfaction in 2004 - satisfaction
in 2006):
H0: μD ≥ 0, that is, the mean difference between the degrees of satisfaction with
Rock in Rio in 2004 and in 2006 is greater than or equal to zero
Ha: μD < 0, that is, the mean difference between the degrees of satisfaction with Rock in
Rio in 2004 and in 2006 is less than zero, which means that individuals increased their
level of satisfaction from 2004 to 2006
DECISION: with a test value t = -15.195 (whose sign is consistent with the alternative
hypothesis) and a one-tailed probability of 0.000/2, practically zero (< α = 0.05), H0 is
rejected. There is statistical evidence that, in the population, the mean difference
between the degree of satisfaction with Rock in Rio in 2004 and in 2006 is less than zero,
that is, individuals' satisfaction with this event increased between 2004 and 2006.
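The test value in Table C is the mean difference divided by its standard error, t = d̄/(s_D/√n), and the one-tailed probability is half the two-tailed Sig. reported by SPSS. The Python sketch below recomputes both from Table C. As in the previous check, the left-tail probability uses a normal approximation to the t distribution with 206 degrees of freedom; that simplification is ours, not what SPSS does.

```python
import math
from statistics import NormalDist

n = 207
mean_d = -4.256  # mean of D = satisfaction 2004 - satisfaction 2006 (Table C)
sd_d = 4.030     # standard deviation of D (Table C)

se_d = sd_d / math.sqrt(n)  # standard error of the mean difference
t_stat = mean_d / se_d      # paired-samples t statistic
df = n - 1

# Ha: mu_D < 0, so the p-value is the left-tail probability.
# Normal approximation to Student's t with df = 206 (assumption of this sketch).
p_left = NormalDist().cdf(t_stat)

alpha = 0.05
print(f"se = {se_d:.3f}, t = {t_stat:.3f}, df = {df}, one-tailed p ~ {p_left:.1e}")
print("reject H0: mu_D >= 0" if p_left < alpha else "do not reject H0")
```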

d) What is the non-parametric alternative to the test performed? Formulate the statistical
hypotheses for that test. What is the main difference from the test shown in Table C?
The non-parametric alternative to the parametric t test for the mean difference of paired
samples is the Wilcoxon (signed-rank) test.
H0: the distribution of the Degree of satisfaction with Rock in Rio in 2004 is equal to or
higher than the distribution of the Degree of satisfaction with the same event in 2006
Ha: the distribution of the Degree of satisfaction with Rock in Rio in 2004 is lower than
the distribution of the Degree of satisfaction with the same event in 2006

The main difference is that, as in any non-parametric test, the variable Difference is
treated not as a quantitative variable but as an ordinal one: its values are ranked, so the
test is stated in terms of the ranks of the differences between the values of the two
variables instead of the mean of those differences. Also, this non-parametric test has no
normality assumption to validate.
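The raw answers behind Table C are not available, so the Python sketch below runs the Wilcoxon signed-rank test on a small, hypothetical set of twelve paired answers (`sat_2004` and `sat_2006` are invented for illustration, as are the function names). It follows the ranking logic just described: compute the differences, drop zeros, rank their absolute values (averaging ties), sum the ranks of the positive differences (W+), and compare W+ with its null mean n(n + 1)/4 using the large-sample normal approximation, here without tie or continuity correction. A small W+ supports Ha, i.e. lower satisfaction in 2004.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 0-based positions i..j -> average 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank_less(x, y):
    """Left-tailed Wilcoxon signed-rank test of Ha: x tends to be lower than y.

    Zero differences are discarded; the p-value uses the normal approximation
    without tie or continuity correction.
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    ranks = average_ranks([abs(v) for v in d])
    w_plus = sum(rk for rk, v in zip(ranks, d) if v > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    return w_plus, z, NormalDist().cdf(z)


# Hypothetical paired answers (1-10 scale) for twelve residents.
sat_2004 = [3, 5, 2, 4, 7, 1, 3, 7, 2, 4, 5, 3]
sat_2006 = [8, 9, 7, 8, 6, 9, 10, 8, 7, 9, 6, 8]

w_plus, z, p = wilcoxon_signed_rank_less(sat_2004, sat_2006)
print(f"W+ = {w_plus}, z = {z:.3f}, one-tailed p = {p:.4f}")
```

In practice, `scipy.stats.wilcoxon(x, y, alternative="less")` performs this test with exact or normal-approximation p-values; the hand-written version only makes the ranking step visible.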

B. The aim is to analyze the relationship between residents' willingness to attend this
event ("Number of days you think you are going to Rock in Rio in 2008") and their
perception of how easy it is to buy tickets for the performances, in terms of price
(measured on a scale from 1 - not accessible to 10 - very accessible). The following
results were obtained:
TABLE D: Descriptives
Degree of perception of buying the tickets
                                                      95% Confidence Interval for Mean
           N     Mean   Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
None       34    2.62   2.374            0.407        1.79          3.45          1         10
1 day      105   2.92   1.455            0.102        2.63          3.20          1         8
2 days     51    4.75   0.440            0.062        4.62          4.87          4         5
3-5 days   33    9.09   1.128            0.196        8.69          9.49          6         10
Total      223   4.20   2.617            0.175        3.86          4.55          1         10
TABLE E: Test of Homogeneity of Variances
Levene Statistic   df1   df2   Sig.
13.387             3     219   0.000

TABLE F: ANOVA
                 Sum of Squares   df    Mean Square   F         Sig.
Between Groups   1063.248         3     354.416       169.963   0.000
Within Groups    456.672          219   2.085
Total            1519.919         222

TABLE G: Ranks
Degree of perception of buying the tickets
            N     Mean Rank
None        34    63.79
1 day       105   79.13
2 days      51    151.66
3-5 days    33    204.97
Total       223

TABLE H: Test Statistics
              Degree of perception of buying the tickets
Chi-Square    137.590
df            3
Asymp. Sig.   0.000


TABLE I: Multiple Comparisons
Dependent variable: Degree of perception to buy the tickets in 2008
(I) Number of days to    (J) Number of days to    Mean Difference   Std. Error   Sig.
go to the Rock in Rio    go to the Rock in Rio    (I-J)
None                     1 day                    -0.297            0.431        0.901
                         2 days                   -2.127*           0.412        0.000
                         3-5 days                 -6.473*           0.452        0.000
1 day                    None                      0.297            0.431        0.901
                         2 days                   -1.831*           0.155        0.000
                         3-5 days                 -6.177*           0.242        0.000
2 days                   None                      2.127*           0.412        0.000
                         1 day                     1.831*           0.155        0.000
                         3-5 days                 -4.346*           0.206        0.000
3-5 days                 None                      6.473*           0.452        0.000
                         1 day                     6.177*           0.242        0.000
                         2 days                    4.346*           0.206        0.000
* The mean difference is significant at the 0.05 level.

a) Given the variables under analysis, what is the proper procedure? Justify.
We have two variables: an ordinal variable with more than two categories, treated as
nominal ("Number of days to go to the Rock in Rio"), and a quantitative variable, "Degree
of perception of buying the tickets", measured on a Likert-type scale (1 - not accessible
to 10 - very accessible). The aim is to analyze the relationship between residents'
willingness to attend this event and their perception of how easy it is to buy tickets
(the dependent variable).
The appropriate procedure is therefore the one-factor analysis of variance (One-Way
ANOVA).
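One-way ANOVA compares the variability of the group means with the variability inside the groups, F = [SS_Between/(k - 1)] / [SS_Within/(N - k)]. The group sizes, means, and standard deviations in Table D are enough to rebuild these sums of squares, so the Python sketch below recomputes Table F from them (the variable names are ours). Because Table D reports rounded means, the results match Table F only approximately.

```python
# Table D: (group, n, mean, std. deviation) of "Degree of perception of buying the tickets"
groups = [
    ("None", 34, 2.62, 2.374),
    ("1 day", 105, 2.92, 1.455),
    ("2 days", 51, 4.75, 0.440),
    ("3-5 days", 33, 9.09, 1.128),
]

N = sum(n for _, n, _, _ in groups)  # total number of respondents
k = len(groups)                      # number of categories
grand_mean = sum(n * m for _, n, m, _ in groups) / N

# Between-groups SS: spread of the group means around the grand mean
ss_between = sum(n * (m - grand_mean) ** 2 for _, n, m, _ in groups)
# Within-groups SS: spread inside each group, sum of (n_i - 1) * s_i^2
ss_within = sum((n - 1) * s ** 2 for _, n, _, s in groups)

df_between, df_within = k - 1, N - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(f"Between: SS = {ss_between:.1f}, df = {df_between}, MS = {ms_between:.2f}")
print(f"Within:  SS = {ss_within:.1f}, df = {df_within}, MS = {ms_within:.3f}")
print(f"F = {f_stat:.1f}")
```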

b) What are the assumptions of the test shown in Table F? What can be concluded about
the verification of these assumptions (α = 0.05)? Justify.
Table F shows the One-Way ANOVA test, which must meet three assumptions:
1. The samples are independent.
2. Normality of the variable "Degree of perception to buy the tickets" in each of the
categories of the independent factor "Number of days to go to the Rock in Rio". As all
four samples are larger than 30, by the central limit theorem this assumption can be
relaxed: the sample mean of "Degree of perception to buy the tickets" approximately
follows a normal distribution in every category.
3. Homoscedasticity (equal population variances): Levene's test (Table E) tests the
assumption of equal variances:
H0: the variance of the variable "Degree of perception to buy the tickets" is the
same for all categories of "Number of days to go to the Rock in Rio"
Ha: the variance of the variable "Degree of perception to buy the tickets" is
different in at least one of the categories of "Number of days to go to the Rock in
Rio"

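Decision: with a Levene statistic F = 13.387 and an associated significance of practically
zero (< α = 0.05), H0 is rejected: the variances of the variable "Degree of perception to
buy the tickets" are considered different in at least one of the categories of "Number of
days to go to the Rock in Rio". Since the homoscedasticity assumption is not met, the
classic ANOVA F test in Table F should not be used to draw conclusions.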

c) What can be concluded about the relationship between the variables under analysis?
Formulate the hypotheses of the test and take the appropriate decision (α = 0.05).
The appropriate test is the non-parametric Kruskal-Wallis test.
H0: the mean rank of the values of the variable "Degree of perception of buying the
tickets" is the same for all categories of the variable "Number of days to go to the
Rock in Rio"
Ha: the mean rank of the values of the variable "Degree of perception of buying the
tickets" is different in at least one of the categories of the variable "Number of days
to go to the Rock in Rio"
Decision: as the significance associated with the test value χ² = 137.590 (df = 3,
Table H) is practically zero (< α = 0.05), we reject H0 and conclude that the mean rank
of the values of "Degree of perception of buying the tickets" is different in at least
one of the categories of "Number of days to go to the Rock in Rio".
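The Kruskal-Wallis statistic can be written in terms of the group sizes and mean ranks in Table G: H = 12/[N(N + 1)] · Σ nᵢR̄ᵢ² - 3(N + 1), referred to a chi-square distribution with k - 1 = 3 degrees of freedom. The Python sketch below evaluates it (names are ours). The result, about 134.0, is a little below the 137.590 in Table H because SPSS divides H by a tie correction factor, 1 - Σ(t³ - t)/(N³ - N), which needs the tie counts of the raw data and is not reproduced here. For 3 degrees of freedom the chi-square upper tail has the closed form erfc(√(x/2)) + √(2x/π)·e^(-x/2), which the sketch uses for the p-value.

```python
import math

# Table G: (group, n, mean rank) for "Degree of perception of buying the tickets"
mean_ranks = [
    ("None", 34, 63.79),
    ("1 day", 105, 79.13),
    ("2 days", 51, 151.66),
    ("3-5 days", 33, 204.97),
]

N = sum(n for _, n, _ in mean_ranks)
k = len(mean_ranks)

# Sanity check: the rank sums must add up to N(N + 1)/2.
rank_total = sum(n * rbar for _, n, rbar in mean_ranks)

# Kruskal-Wallis H without the tie correction applied by SPSS.
h_stat = 12 / (N * (N + 1)) * sum(n * rbar ** 2 for _, n, rbar in mean_ranks) - 3 * (N + 1)


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


print(f"rank total = {rank_total:.2f} (expected {N * (N + 1) / 2:.0f})")
print(f"H (no tie correction) = {h_stat:.2f}, df = {k - 1}")
print(f"p-value for the reported 137.590: {chi2_sf_df3(137.590):.1e}")
```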

d) What is the purpose of the statistical procedure underlying Table I? Say whether you
think it fits the problem, and draw the conclusions that seem most relevant to the
question under analysis.
The procedure in Table I is Dunnett's C multiple comparisons test. This test is
appropriate when the quantitative variable under analysis fails the assumption of equal
population variances, as happened here (Table E). Because H0 was rejected in the
Kruskal-Wallis test, which means that there is a relationship between the variables, it is
desirable to know which group(s) is (are) responsible for this rejection.
It is concluded that those who plan to attend Rock in Rio on one day or on no days do not
differ significantly from each other (Sig. = 0.901), but both differ significantly from
those who plan to attend on two days or on 3-5 days, and these last two groups also differ
significantly from each other.
Three groups can therefore be identified:
1. Those who do not plan to go, or plan to go on just one day, on average perceive great
difficulty in buying tickets because of the high price. Their means, 2.62 and 2.92
respectively, are not significantly different from each other, despite their dispersion
(standard deviations of 2.374 and 1.455, respectively) and the fact that the sample of
those who plan to go on just one day (105) is about three times the size of the sample
of those who do not plan to go (34).
2. Those who plan to go on two days are already located in the middle of the scale
(4.75), with a small dispersion; that is, they consider the prices moderately high.
3. Finally, those who plan to go on 3 to 5 days are, on average, at the top of the scale
(9.09) and therefore consider the prices reasonable.
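The Std. Error column of Table I does not use a pooled variance: each pairwise comparison uses √(sᵢ²/nᵢ + sⱼ²/nⱼ), the unequal-variance standard error on which Dunnett's C is built. The Python sketch below recomputes the pairwise mean differences and standard errors from Table D (the helper name is ours). It does not reproduce the Dunnett C critical values, which depend on studentized range quantiles and are left to the statistical package; mean differences differ from Table I in the third decimal because Table D's means are rounded.

```python
import math
from itertools import combinations

# Table D: group -> (n, mean, std. deviation)
groups = {
    "None": (34, 2.62, 2.374),
    "1 day": (105, 2.92, 1.455),
    "2 days": (51, 4.75, 0.440),
    "3-5 days": (33, 9.09, 1.128),
}


def unequal_variance_se(g1, g2):
    """Standard error of a difference in means without pooling the variances."""
    n1, _, s1 = groups[g1]
    n2, _, s2 = groups[g2]
    return math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


results = {}
for g1, g2 in combinations(groups, 2):
    diff = groups[g1][1] - groups[g2][1]
    se = unequal_variance_se(g1, g2)
    results[(g1, g2)] = (diff, se)
    print(f"{g1:>8} vs {g2:<8}  mean diff = {diff:7.3f}  std. error = {se:.3f}")
```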
e) After the analysis in the preceding items, do you consider it appropriate to calculate
a coefficient to measure the degree of association between the variables? Compute it and
justify.
Since we concluded that there is a relationship between the variables (the null
hypothesis was rejected), it is appropriate to measure the degree of association between
them.
Two different coefficients can be proposed and calculated:
1. If we treat the independent variable ("Number of days to go to the Rock in Rio") as
nominal and the dependent variable ("Degree of perception of buying the tickets") as
metric, the appropriate coefficient is eta, which ranges from 0 to 1 and is computed from
the ANOVA sums of squares.
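$$\eta = \sqrt{\frac{SS_{Between}}{SS_{Total}}} = \sqrt{\frac{1063.248}{1519.919}} = \sqrt{0.6995} \approx 0.836$$

Conclusion: The degree of association between the variables is very high (η ≈ 0.836).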
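Eta is the square root of the share of the total sum of squares explained by the groups, η = √(SS_Between/SS_Total). A short Python check with the sums of squares of Table F follows; η² also reads directly as the proportion of the variability in the perception of ticket prices that is associated with the number of days respondents plan to attend.

```python
import math

# Sums of squares from Table F (One-Way ANOVA)
ss_between = 1063.248
ss_total = 1519.919

eta_squared = ss_between / ss_total  # share of variability explained by the groups
eta = math.sqrt(eta_squared)

print(f"eta^2 = {eta_squared:.3f}, eta = {eta:.3f}")
```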

2. If we treat both variables as ordinal, the appropriate coefficient is Spearman's
coefficient, in line with the rank-based Kruskal-Wallis test.