[R] [OT] 1 vs 2-way anova technical question

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Mon Nov 21 17:20:17 CET 2011


Giovanni,

Have you tried Bert's suggestion 2)? His log(R) ~ A*B + C + D is NOT the same as your log(R) ~ A + B + I(A*B) + C + D.

Note that I(A * B) means: create a new variable that is the product of A and B, which is not meaningful if A and B are factors (hence the warning you got).
So I(A * B) is not the interaction between A and B. You need A:B if you want the interaction; A*B in a formula is shorthand for A + B + A:B.
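
A minimal sketch of the difference, in case it helps. The data below are simulated and the values are made up; only the names A, B, C, D, R and throughput are taken from the thread, and the balanced two-level layout is an assumption for illustration, not your actual design.

```r
# Simulated stand-in for the real data (assumption: four two-level factors,
# three replicates per cell, log-normal response).
set.seed(1)
throughput <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                          C = c("c1", "c2"), D = c("d1", "d2"),
                          rep = 1:3, stringsAsFactors = TRUE)
throughput$R <- exp(rnorm(nrow(throughput)))

# A*B expands to A + B + A:B, so both formulas contain the same terms.
terms_star  <- attr(terms(log(R) ~ A*B + C + D), "term.labels")
terms_colon <- attr(terms(log(R) ~ A + B + A:B + C + D), "term.labels")
print(terms_star)
print(terms_colon)

# Inside I(), "*" is ordinary arithmetic. Arithmetic on factors is not
# defined, so the result is NA for every row (plus the Ops.factor warning).
prod_ab <- suppressWarnings(throughput$A * throughput$B)
print(all(is.na(prod_ab)))

# Main effects for all four factors plus only the A:B interaction.
fit <- aov(log(R) ~ A*B + C + D, data = throughput)
print(summary(fit))
```

The summary table should contain one row per main effect and a single A:B row, which is the "main effects plus one 2-way interaction" model you described.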

Thierry


> -----Original message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On behalf of Giovanni Azua
> Sent: Monday 21 November 2011 17:00
> To: r-help at r-project.org
> Subject: Re: [R] [OT] 1 vs 2-way anova technical question
> 
> Hello Bert,
> 
> Thank you for taking the time to try to answer.
> 
> 1) I know this, however if one is interested in only interaction between two
> specific factors then in R one uses I(A*B*C) meaning 3-way anova for that and
> not the implicit 2-ways that would otherwise be computed.
> 
> 2) True, but it fails.
> 
> 3) No, I don't have any factors with one level, I never said that. It would not be a
> 2^k experiment otherwise, my OP states this clearly, this is a 2^k experimental
> design ___2___
> 
> 4) this is only your judgmental attitude that many people unfortunately have in
> some of these lists, focussing on ad-hominem judgements or even attacks to try
> to prove their superiority without actually answering or adding any value to the
> question at hand. I have taken many graduate courses in subjects that have all
> Statistics in the title and passed all of them. However, as an experienced
> Software Engineer working for more than 10 years in the field, I can tell you that
> there is a huge difference between solving toy problems and implementing
> real-life complex projects. The same rule applies here: the toy examples one
> finds in R books and course exercises are one thing, and the real-life data I am
> trying to model is a totally different story. I'm a student in the quantitative part and
> learning, so I do have some gaps, I am curious and trying to learn and I think
> there is no shame in that. If this makes you upset maybe you should ask to split
> the list in two or more: "Advanced-PhD-black-belt-10th-dan-in-Statistics-and-R
> level" list and "newbies" list.
> 
> Best regards,
> Giovanni
> 
> On Nov 21, 2011, at 3:55 PM, Bert Gunter wrote:
> 
> > Giovanni:
> >
> > 1. Please read ?formula and/or An Introduction to R for how to specify
> > linear models in R.
> >
> > 2. Correct specification of what you want (if I understand correctly)
> > is
> > log(R) ~ A*B + C + D
> >
> > 3. ... which presumably will also fail because some of your factors
> > have only one level, which means that you cannot use them in your
> > model.
> >
> > 4. ... which, in turn, suggests you don't know what you're doing
> > statistically and should seek local assistance, especially in trying
> > to interpret a fit to an unbalanced model (you can't do it as you
> > probably think you can).
> >
> > I should say in your defense that posts on this list indicate that
> > point 4 is a widely shared problem among posters here.
> >
> > Cheers,
> > Bert
> >
> > On Mon, Nov 21, 2011 at 5:02 AM, Giovanni Azua <bravegag at gmail.com> wrote:
> >> Hello,
> >>
> >> Couple of clarifications:
> >> - A,B,C,D are factors and I am also interested in possible
> >> interactions but the model that comes out from aov R~A*B*C*D violates
> >> the model assumptions
> >> - My 2^k is unbalanced i.e. missing data and an additional level I
> >> also include in one of the factors i.e. C
> >> - I was referring in the OP to the 4-way interactions and not 2-way, I'm sorry
> >> for my confusion.
> >> - I tried to create an aov model with fewer interactions this way, but I get the
> >> following error:
> >>
> >> model.aov <- aov(log(R)~A+B+I(A*B)+C+D,data=throughput)
> >> Error in `contrasts<-`(`*tmp*`, value = "contr.treatment") :
> >>  contrasts can be applied only to factors with 2 or more levels
> >> In addition: Warning message:
> >> In Ops.factor(A, B) : * not meaningful for factors
> >>
> >> Here I was trying to say: do a one-way anova except for the A and B factors,
> >> for which I would like to get their 2-way interactions ...
> >>
> >> Thanks in advance,
> >> Best regards,
> >> Giovanni
> >>
> >> On Nov 21, 2011, at 12:04 PM, Giovanni Azua wrote:
> >>
> >>> Hello,
> >>>
> >>> I know there are plenty of people in this group who can give me a
> >>> good answer :)
> >>>
> >>> I have a 2^k model where k=4 like this:
> >>> Model 1) R~A*B*C*D
> >>>
> >>> If I use the "*" in R among all elements, it means to me to explore all
> >>> interactions and include them in the model, i.e. I think this would be the
> >>> so-called 2-way anova. However, if I do this, it leads to model violations, i.e.
> >>> homoscedasticity is violated, the normality assumption for the sample errors
> >>> (i.e. the residuals) is violated, etc. I tried correcting the issues using different
> >>> standard transformations (log, sqrt, Box-Cox forms, etc.) but none really
> >>> improves the result. In this case, even though the model assumptions do not
> >>> hold, some of the interactions are found to significantly influence the response
> >>> variable. But then shall I trust the results of this Model 1) given that the
> >>> assumptions do not hold?
> >>>
> >>> Then I try this other model where I exclude the interactions (is this the
> >>> 1-way anova?):
> >>> Model 2) R~A+B+C+D
> >>>
> >>> In this one the model assumptions hold, except for the existence of some
> >>> outliers and a slightly heavy tail in the QQ-plot.
> >>>
> >>> Given that the assumptions for Model 1) do not hold, I assume I should
> >>> ignore the results of Model 1) altogether. Or can I instead safely use the Sum
> >>> Sq. of Model 1) to get my table of percent of variation?
> >>>
> >>> This to me was a bit counter-intuitive, since I assumed that if there was
> >>> collinearity among factors (and there is, e.g. I(A*B*C)) and I included those
> >>> interactions as in Model 1), my model would be more accurate ... OK, this has
> >>> turned into a brand new topic of model selection, but I am mostly interested in
> >>> the question: if the model assumptions are violated, can I or must I not use the
> >>> results, e.g. the Sum Sq., for that model?
> >>>
> >>> Can anyone advise please?
> >>>
> >>> btw I have bought most books on R and statistical analysis. I have
> >>> researched them all and the ANOVA coverage is very shallow in most of them,
> >>> especially in the R-sy ones; they just offer a slightly pimped-up version of the
> >>> R-help.
> >>>
> >>> I am also unofficially following a course on ANOVA from the university I am
> >>> registered in, and most examples are too simplistic: either the assumptions
> >>> just hold easily, or the assumptions don't hold and nothing happens.
> >>>
> >>> Thanks in advance,
> >>> Best regards,
> >>> Giovanni
> >>>
> >>
> >>
> >>
> >> ______________________________________________
> >> R-help op r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> >
> >
> > --
> >
> > Bert Gunter
> > Genentech Nonclinical Biostatistics
> >
> > Internal Contact Info:
> > Phone: 467-7374
> > Website:
> > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> 
