[R] question for aov and kruskal
Rolf Turner
r.turner at auckland.ac.nz
Wed Mar 12 22:33:10 CET 2008
I thought your question was well expressed and that you followed the
posting guide better than most.
I'm no expert on such issues, but I'd like to kick in a few opinions
(with which others may disagree).
(1) All of the anova stuff is based on the assumption of homogeneity
of variance. However, my understanding is that the model is quite
robust to departures from this assumption. Problems may arise if there
are small sample sizes in some cells and if the small samples are
associated with large variances. Otherwise there is not all that much
to worry about.
(2) The Tukey test is indeed based on the assumption of equal sample
sizes. The version of the test for unbalanced data (usually called
the Tukey-Kramer method) is an approximation. My understanding is
that it's a pretty good approximation.
(3) For multiple comparisons after applying the Kruskal-Wallis test:
Experts on non-parametric statistics may know about more powerful
methods, but I would be inclined simply to apply a Bonferroni
correction to a collection of pairwise tests (e.g. wilcox.test). Just
multiply the p-values by the number of pairwise comparisons (capping
the result at 1), which is k-choose-2 where k is the number of groups
(= 3-choose-2 = 3 in your toy example).
(4) Generally speaking I would say that if a classical test and a
non-parametric test give different answers, then your data are being
coy about revealing their true import. I would have very little faith
in either answer, and would claim that you really need more data.
Unfortunately this need can rarely be satisfied. If you have to make
a decision one way or another, then you should go with the
non-parametric answer.
(5) Finally, my universal prescription is: ``When in doubt, simulate.''
I.e. simulate multiple data sets on the basis of models fitted to, or
related to, your real data. Run the possible tests on the simulated
data sets. Since these data are simulated, you know what the right
answer is. Count up how often you get the right answer. Such an
exercise can be quite revealing.
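
To make point (3) concrete, here is a rough sketch using the toy data
from the original post. The object names (z, grp, pairs, p_raw, p_bonf)
are mine, chosen for illustration only. Because the data contain ties,
I pass exact = FALSE so that wilcox.test uses its normal approximation
without warning. The last line shows that pairwise.wilcox.test (in the
stats package) does the same bookkeeping in one call.

```r
# Toy data from the original post
a <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
b <- c(101, 1010, 200, 300, 400, 202, 121, 234, 55, 555, 66, 76, 88,
       34, 239, 30, 40, 50, 50, 60)
z <- data.frame(grp = factor(a), b = b)

# Overall Kruskal-Wallis test
print(kruskal.test(b ~ grp, data = z))

# All k-choose-2 pairs of group labels (3 pairs when k = 3)
pairs <- combn(levels(z$grp), 2, simplify = FALSE)

# Raw p-value from a Wilcoxon rank-sum test for each pair
p_raw <- sapply(pairs, function(pr) {
  x <- z$b[z$grp == pr[1]]
  y <- z$b[z$grp == pr[2]]
  wilcox.test(x, y, exact = FALSE)$p.value
})
names(p_raw) <- sapply(pairs, paste, collapse = " vs ")

# Bonferroni: multiply by the number of comparisons, cap at 1
p_bonf <- pmin(1, p_raw * length(pairs))
print(cbind(raw = p_raw, bonferroni = p_bonf))

# The same adjustment in a single call
print(pairwise.wilcox.test(z$b, z$grp,
                           p.adjust.method = "bonferroni", exact = FALSE))
```

The hand-rolled version is only there to show what the adjustment
does; in practice the one-call form is less error-prone.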
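
The simulation suggested in (5) might look something like the sketch
below. Everything specific in it is an assumption of mine and not
taken from the real data: normal errors, the group sizes (copied from
the toy example), the standard deviations in sigma, the shift of 1.5
in the second scenario, the 0.05 level and the 1000 replicates. The
function names one_run and reject_rate are likewise made up. Replace
all of these with a model fitted to, or related to, the real data.

```r
set.seed(42)                       # arbitrary seed, for reproducibility

n     <- c(10, 3, 7)               # group sizes as in the toy example
grp   <- factor(rep(1:3, times = n))
sigma <- c(3, 0.5, 1)              # assumed unequal standard deviations
nsim  <- 1000                      # number of simulated data sets
alpha <- 0.05                      # nominal significance level

# Simulate one data set with group means mu; return both p-values.
one_run <- function(mu) {
  y <- rnorm(sum(n), mean = rep(mu, times = n), sd = rep(sigma, times = n))
  c(aov     = oneway.test(y ~ grp, var.equal = TRUE)$p.value,  # classical F test
    kruskal = kruskal.test(y ~ grp)$p.value)
}

# Proportion of simulated data sets in which each test rejects at alpha.
reject_rate <- function(mu) {
  p <- replicate(nsim, one_run(mu))   # 2 x nsim matrix of p-values
  rowMeans(p < alpha)
}

# Scenario A: no true difference, so every rejection is a wrong answer.
type1 <- reject_rate(mu = c(0, 0, 0))

# Scenario B: group 3 really is shifted, so every rejection is a right answer.
power <- reject_rate(mu = c(0, 0, 1.5))

print(rbind(type1 = type1, power = power))
```

In scenario A a well-behaved test should reject in roughly a
proportion alpha of the data sets; in scenario B the rejection rate
estimates the power. Comparing the two rows for the two tests shows
how each copes with the imbalance and unequal variances that were
built in.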
HTH
cheers,
Rolf Turner
On 13/03/2008, at 9:19 AM, eugen pircalabelu wrote:
> Hi,
>
> My data was only a toy example that matched the real situation. I
> could not post the entire real data set, so I gave a self-contained
> example of what I thought was my problem. Of course you can see with
> the naked eye that the data are unbalanced (this was done
> intentionally), but like I said this was only a toy example,
> mimicking a problem from a real data set.
>
> Thank you and have a great day ahead!
>
>
> David Hewitt <dhewitt37 at gmail.com> wrote:
>
>
>> I have the following problem: how appropriate is my aov model under
>> a violation of the anova assumptions?
>>
>> Example:
>> a <- c(1,1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3,3)
>> b <- c(101, 1010, 200, 300, 400, 202, 121, 234, 55, 555, 66, 76, 88,
>>        34, 239, 30, 40, 50, 50, 60)
>> z <- data.frame(a = factor(a), b = b)
>> fligner.test(b ~ a, data = z)   # homogeneity of variances
>> ll <- aov(b ~ a, data = z)      # one-way anova
>> TukeyHSD(ll)                    # pairwise comparisons
>>
>> Now from the aov I found that my model is unbalanced, and from the
>> Fligner test I found that the assumption of homogeneity of variances
>> is rejected. Could my Tukey comparison be valid under these
>> violations? From what I read, the Tukey test is valid only when the
>> model is balanced and when the assumption of homogeneity of variances
>> is not rejected; am I wrong? Can anyone tell me what would be the
>> correct test in this case?
>>
>> Doing a non-parametric Kruskal-Wallis test would give me a different
>> result. But what would be the correct multiple comparison test in
>> this case?
>>
>
> You shouldn't have needed aov to tell you that the data (not the
> model) are unbalanced. I could see that without running the code!
> Seriously, you might need to think more about the type of model
> you're using, and what you want to know, and then consider how to
> estimate the effect sizes of interest.
>
>
> -----
> David Hewitt
> Virginia Institute of Marine Science
> http://www.vims.edu/fish/students/dhewitt/
> --
> View this message in context:
> http://www.nabble.com/question-for-aov-and-kruskal-tp15955385p15976643.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.