[R] [OT] 1 vs 2-way anova technical question

Mon Nov 21 20:31:54 CET 2011

Thanks Thierry:

I had missed that the OP's failure to read the formula docs and use of
I(A*B) was what caused the error. Mea Culpa.
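
For the archive, a minimal sketch of the formula distinction at issue, on
made-up data (the data frame d and its columns are invented for
illustration and are not the OP's throughput data). It only shows how R
expands the formula terms and what "*" does to two factors; it says
nothing about whether the resulting fit is statistically sensible.

```r
# Hypothetical toy data: two 2-level factors, three runs per cell,
# and a strictly positive response so that log(R) is defined.
set.seed(1)
d <- expand.grid(A = factor(c("lo", "hi")),
                 B = factor(c("lo", "hi")),
                 run = 1:3)
d$R <- exp(rnorm(nrow(d)))

# Inside a formula, A*B is shorthand for A + B + A:B,
# so both formulas expand to the same model terms.
t1 <- attr(terms(log(R) ~ A * B), "term.labels")
t2 <- attr(terms(log(R) ~ A + B + A:B), "term.labels")
print(identical(t1, t2))

# I(A*B) protects "*" from formula interpretation and asks for the
# arithmetic product instead. For factors that product is undefined:
# R warns and returns NA for every row.
prod_AB <- suppressWarnings(d$A * d$B)
print(all(is.na(prod_AB)))

# The interaction model itself is specified without I():
fit <- aov(log(R) ~ A * B, data = d)
print(summary(fit))
```

In other words, A*B in a formula already carries the A:B interaction,
while wrapping it in I() switches "*" back to plain arithmetic.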

However, I actually agree with Giovanni's remarks about the difference
between what is typically taught and what one faces in practice. Where
we disagree is that I think data analysts with limited statistical
backgrounds should consult with local statisticians instead of trying
to muddle through on their own through lists like this. This is not meant
to be arrogance on my part -- though it may seem to come across that
way -- but rather a plea for good science. I believe that bad
statistics --> bad science, a problem that I see as pervasive and
inimical to scientific progress, especially in today's data saturated
world.

But enough of my off topic B.S. Please reply privately to not waste
yet more space here (positively or negatively -- stone throwers need
to catch them, too).

Cheers,

-- Bert

On Mon, Nov 21, 2011 at 8:20 AM, ONKELINX, Thierry
<Thierry.ONKELINX at inbo.be> wrote:
> Giovanni,
>
> Have you tried Bert's suggestion 2)? Because his log(R) ~ A*B + C + D is NOT the same as your log(R)~A+B+I(A*B)+C+D
>
> Note that I(A * B) means: create a new variable that is the product of A and B, which is not meaningful if A and B are factors (hence the warning you got).
> So I(A * B) is not the interaction between A and B. You need A:B if you want the interaction.
>
> Thierry
>
>
>> -----Original message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On behalf of Giovanni Azua
>> Sent: Monday 21 November 2011 17:00
>> To: r-help at r-project.org
>> Subject: Re: [R] [OT] 1 vs 2-way anova technical question
>>
>> Hello Bert,
>>
>> Thank you for taking the time to try to answer.
>>
>> 1) I know this, however if one is interested in only interaction between two
>> specific factors then in R one uses I(A*B*C) meaning 3-way anova for that and
>> not the implicit 2-ways that would otherwise be computed.
>>
>> 2) True, but it fails.
>>
>> 3) No, I don't have any factors with one level, I never said that. It would not be a
>> 2^k experiment otherwise, my OP states this clearly, this is a 2^k experimental
>> design ___2___
>>
>> 4) this is only your judgmental attitude that many people unfortunately have in
>> some of these lists, focussing on ad-hominem judgements or even attacks to try
>> to prove their superiority without actually answering nor adding any value to the
>> question at hand. I have taken many graduate courses in subjects that have all
>> Statistics in the title and passed all of them. However, as an experienced
>> Software Engineer working for more than 10 years in the field, I can tell you that
>> there is a huge difference between solving toy problems and implementing
>> real-life complex projects. The same rules apply here: one thing is the toy examples one
>> finds in R books and course exercises and another totally different story is the
>> real life data I am trying to model. I'm a student in the quantitative part and
>> learning, so I do have some gaps, I am curious and trying to learn and I think
>> there is no shame in that. If this makes you upset maybe you should ask to split
>> the list in two or more: "Advanced-PhD-black-belt-10th-dan-in-Statistics-and-R level"
>> list and "newbies" list.
>>
>> Best regards,
>> Giovanni
>>
>> On Nov 21, 2011, at 3:55 PM, Bert Gunter wrote:
>>
>> > Giovanni:
>> >
>> > 1. Please read ?formula and/or An Introduction to R for how to specify
>> > linear models in R.
>> >
>> > 2. Correct specification of what you want (if I understand correctly)
>> > is
>> > log(R) ~ A*B + C + D
>> >
>> > 3. ... which presumably will also fail because some of your factors
>> > have only one level, which means that you cannot use them in your
>> > model.
>> >
>> > 4. ... which, in turn, suggests you don't know what you're doing
>> > statistically and should seek local assistance, especially in trying
>> > to interpret a fit to an unbalanced model (you can't do it as you
>> > probably think you can).
>> >
>> > I should say in your defense that posts on this list indicate that
>> > point 4 is a widely shared problem among posters here.
>> >
>> > Cheers,
>> > Bert
>> >
>> > On Mon, Nov 21, 2011 at 5:02 AM, Giovanni Azua <bravegag at gmail.com> wrote:
>> >> Hello,
>> >>
>> >> Couple of clarifications:
>> >> - A,B,C,D are factors and I am also interested in possible
>> >> interactions but the model that comes out from aov R~A*B*C*D violates
>> >> the model assumptions
>> >> - My 2^k is unbalanced i.e. missing data and an additional level I
>> >> also include in one of the factors i.e. C
>> >> - I was referring in the OP to the 4-way interactions and not 2-way, I'm sorry
>> >> for my confusion.
>> >> - I tried to create an aov model with fewer interactions this way but I get the
>> >> following error:
>> >>
>> >> model.aov <- aov(log(R)~A+B+I(A*B)+C+D,data=throughput)
>> >> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
>> >>   contrasts can be applied only to factors with 2 or more levels
>> >> In addition: Warning message:
>> >> In Ops.factor(A, B) : * not meaningful for factors
>> >>
>> >> Here I was trying to say: do a one-way anova except for the A and B factors
>> >> for which I would like to get their 2-way interactions ...
>> >>
>> >> Thanks in advance,
>> >> Best regards,
>> >> Giovanni
>> >>
>> >> On Nov 21, 2011, at 12:04 PM, Giovanni Azua wrote:
>> >>
>> >>> Hello,
>> >>>
>> >>> I know there are plenty of people in this group who can give me a
>> >>> good answer :)
>> >>>
>> >>> I have a 2^k model where k=4 like this:
>> >>> Model 1) R~A*B*C*D
>> >>>
>> >>> If I use the "*" in R among all elements it means to me to explore all
>> >>> interactions and include them in the model i.e. I think this would be the so called
>> >>> 2-way anova. However, if I do this, it leads to model violations i.e. the
>> >>> homoscedasticity is violated, the normality assumption of the sample errors i.e.
>> >>> residuals is violated etc. I tried correcting the issues using different standard
>> >>> transformations: log, sqrt, Box-Cox forms etc but none really improve the result.
>> >>> In this case even though the model assumptions do not hold, some of the
>> >>> interactions are found to significantly influence the response variable. But then
>> >>> shall I trust the results of this Model 1) given that the assumptions do not hold?
>> >>>
>> >>> Then I try this other model where I exclude the interactions (is this the
>> >>> 1-way anova?):
>> >>> Model 2) R~A+B+C+D
>> >>>
>> >>> In this one the model assumptions hold except the existence of some
>> >>> outliers and a slightly heavy tail in the QQ-plot.
>> >>>
>> >>> Given that the assumptions for Model 1) do not hold, I assume I should
>> >>> ignore the results of Model 1) altogether. Or can I instead safely use the
>> >>> Sum Sq. of Model 1) to get my table of percent of variations?
>> >>>
>> >>> This to me was a bit counter-intuitive since I assumed that if there was
>> >>> collinearity among factors (and there is e.g. I(A*B*C)) and I included those
>> >>> interactions as in Model 1), my model would be more accurate ... ok this turned
>> >>> into a brand new topic of model selection but I am mostly interested in the
>> >>> question: if the model assumptions are violated, can I or must I not use the
>> >>> results e.g. Sum Sq. for that model?
>> >>>
>> >>> Can anyone advise, please?
>> >>>
>> >>> btw I have bought most books on R and statistical analysis. I have
>> >>> researched them all and the ANOVA coverage is very shallow in most of them,
>> >>> especially in the R-sy ones; they just offer a slightly pimped up version of the
>> >>> R-help.
>> >>>
>> >>> I am also unofficially following a course on ANOVA from the university I am
>> >>> registered in and most examples are too simplistic and either the assumptions
>> >>> just hold easily or the assumptions don't hold and nothing happens.
>> >>>
>> >>> Thanks in advance,
>> >>> Best regards,
>> >>> Giovanni
>> >>>
>> >>
>> >>
>> >>
>> >> ______________________________________________
>> >> R-help at r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > Bert Gunter
>> > Genentech Nonclinical Biostatistics
>> >
>> > Internal Contact Info:
>> > Phone: 467-7374
>> > Website:
>> > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
>>
>
>

-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm