[BioC] when do linear models work?

Arne.Muller at aventis.com
Fri Mar 5 11:21:29 MET 2004


Hello,

thanks for your reply. This clarifies the situation a bit. In terms of ANOVA
this makes a lot more sense!

Nevertheless, if you fit a model with lm() in R, you can apply summary() or
anova() to it, and the two give you different p-values. What is the
difference? Are the p-values from summary() the ones for the individual
coefficients?
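A minimal sketch of the difference, written in Python rather than R and using hypothetical toy data for one gene at three factor levels. For a one-factor model, anova() reports a single F statistic testing whether all level means are equal. summary() reports one t statistic per coefficient, and under R's default treatment contrasts each coefficient is the difference between a level's mean and the baseline level's mean. The function name `one_way_stats` and the data are illustrative only. Turning the statistics into p-values needs the F and t distributions (pf() and pt() in R), which the sketch leaves out.

```python
# Hypothetical expression values for one gene at three factor levels.
GROUPS = {
    "A": [5.0, 5.2, 4.8],  # baseline level
    "B": [5.1, 5.3, 4.9],
    "C": [6.0, 6.2, 5.8],
}


def mean(values):
    return sum(values) / len(values)


def one_way_stats(groups, baseline):
    """Return (F, t_stats) for a one-factor model.

    F is the single statistic anova() reports for the factor (H0: all level
    means equal). t_stats maps each non-baseline level to the t statistic
    summary() reports for its coefficient under treatment contrasts
    (H0: that level's mean equals the baseline mean).
    """
    values = [y for ys in groups.values() for y in ys]
    n, k = len(values), len(groups)
    grand = mean(values)
    means = {g: mean(ys) for g, ys in groups.items()}

    ss_between = sum(len(ys) * (means[g] - grand) ** 2 for g, ys in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)
    ms_within = ss_within / (n - k)  # residual variance, shared by both tests

    f_stat = (ss_between / (k - 1)) / ms_within

    n_base = len(groups[baseline])
    t_stats = {}
    for g, ys in groups.items():
        if g == baseline:
            continue
        diff = means[g] - means[baseline]  # the coefficient estimate
        se = (ms_within * (1 / len(ys) + 1 / n_base)) ** 0.5
        t_stats[g] = diff / se
    return f_stat, t_stats


if __name__ == "__main__":
    f_stat, t_stats = one_way_stats(GROUPS, baseline="A")
    print(f"F = {f_stat:.2f}")
    for level, t in t_stats.items():
        print(f"t({level} vs A) = {t:.3f}")
```

With these toy numbers the F test says the three means are not all equal, while the per-coefficient t statistics show that only level C stands out from the baseline. For a factor with just two levels the two tests coincide (F = t²).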

In addition, since the ANOVA is based on the lm fit, does it matter if the
relationship between the factor levels is not linear?

	kind regards,

	Arne

PS: please let me know if you think this discussion is getting too far off
topic, i.e. too much statistics rather than BioC.

> -----Original Message-----
> From: James MacDonald [mailto:jmacdon at med.umich.edu]
> Sent: 04 March 2004 21:28
> To: Muller, Arne PH/FR; bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] when do linear models work?
> 
> 
> The linear model fit here is not what you think. Since we are using
> factors, this is an analysis of variance model, so there is no
> assumption of linearity per se. In other words, we are not testing to
> see if there is a linear relationship between say, treatment and no
> treatment. Instead what we are testing is to see if there is a
> difference in the mean expression of each gene at the two (or more)
> factor levels.
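A minimal sketch of why no linearity across levels is assumed, in Python with hypothetical data. When dose is coded as a factor (here with treatment dummy coding, R's default for unordered factors), the least-squares coefficients are simply the baseline mean and each level's difference from it, so the fitted values are the group means whatever pattern they follow. The helper names `treatment_design` and `least_squares` are illustrative, not an R or Bioconductor API.

```python
# Hypothetical data: one gene, five dose levels (as factor labels), 2 replicates.
LEVELS = ["0", "0.1", "0.25", "0.5", "1.0"]
LABELS = [lev for lev in LEVELS for _ in range(2)]
Y = [1.0, 1.2, 1.1, 1.3, 3.0, 3.2, 3.3, 3.5, 3.4, 3.6]


def treatment_design(levels, labels):
    """Intercept column plus one 0/1 indicator per non-baseline level."""
    others = levels[1:]  # levels[0] is the baseline, absorbed by the intercept
    return [[1.0] + [1.0 if lab == lev else 0.0 for lev in others] for lab in labels]


def least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [r + [v] for r, v in zip(xtx, xty)]
    for c in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(c, p), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(p):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    return [aug[i][p] / aug[i][i] for i in range(p)]


if __name__ == "__main__":
    coef = least_squares(treatment_design(LEVELS, LABELS), Y)
    print("intercept (mean at dose 0):", round(coef[0], 3))
    for lev, b in zip(LEVELS[1:], coef[1:]):
        print(f"dose {lev} minus dose 0:", round(b, 3))
```

The coefficients reproduce the group means exactly, including the jump between 0.1 and 0.25 and the plateau after it; the numeric spacing of the doses never enters the model.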
> 
> So if you are testing the five different treatment levels you mention,
> you are really testing to see if the mean expression level for each
> gene is the same at all levels or not. If they are not, you then have
> to fit contrasts to see where they differ. You can also fit different
> contrasts to see if, say, the mean expression is the same at 0 mM and
> 0.1 mM, but then changes at 0.25 mM (here you would be comparing the
> mean expression of the 0 mM and 0.1 mM samples to the 0.25 mM samples).
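A minimal sketch of the contrast described above, in Python with hypothetical data for a five-level dose factor with two replicates each. The contrast weights are -1/2 for 0 mM and 0.1 mM and +1 for 0.25 mM, so the estimate is the 0.25 mM mean minus the average of the 0 mM and 0.1 mM means; its standard error uses the residual variance from the full factor model. The function name `contrast_t` is illustrative; in Bioconductor, limma's makeContrasts() and contrasts.fit() are the usual tools for doing this across many genes.

```python
# Hypothetical expression values for one gene, keyed by dose level (mM).
DATA = {
    "0": [1.0, 1.2],
    "0.1": [1.1, 1.3],
    "0.25": [3.0, 3.2],
    "0.5": [3.3, 3.5],
    "1.0": [3.4, 3.6],
}

# Contrast: 0.25 mM versus the average of 0 mM and 0.1 mM (weights sum to 0).
WEIGHTS = {"0": -0.5, "0.1": -0.5, "0.25": 1.0, "0.5": 0.0, "1.0": 0.0}


def contrast_t(data, weights):
    """Return (estimate, standard error, t) for a linear contrast of level means."""
    means = {g: sum(ys) / len(ys) for g, ys in data.items()}
    n = sum(len(ys) for ys in data.values())
    ss_within = sum((y - means[g]) ** 2 for g, ys in data.items() for y in ys)
    ms_within = ss_within / (n - len(data))  # residual variance of the factor model
    estimate = sum(w * means[g] for g, w in weights.items())
    se = (ms_within * sum(w ** 2 / len(data[g]) for g, w in weights.items())) ** 0.5
    return estimate, se, estimate / se


if __name__ == "__main__":
    est, se, t = contrast_t(DATA, WEIGHTS)
    print(f"estimate = {est:.3f}, SE = {se:.3f}, t = {t:.2f}")
```

Because the weights sum to zero, the overall expression level of the gene cancels out and only the comparison of means remains.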
> 
> If the book(s) you are reading cover ANOVA, you should take a look at
> those sections, especially the parts about design matrices and
> contrasts.
> 
> HTH,
> 
> Jim
> 
> 
> 
> James W. MacDonald
> Affymetrix and cDNA Microarray Core
> University of Michigan Cancer Center
> 1500 E. Medical Center Drive
> 7410 CCGC
> Ann Arbor MI 48109
> 734-647-5623
> 
> >>> <Arne.Muller at aventis.com> 03/04/04 01:48PM >>>
> Hello All,
> 
> I have two fundamental problems with linear models (lm); maybe you can
> help me to clarify these issues:
> 
> 1. Irrespective of how many factors you use in your experiment, the
> relationship is always assumed to be linear. If you have a response
> vector Y and a vector X of independent variables, Y ~ X basically
> assumes a straight line (with some kind of slope). If you use Y ~ X + Z,
> then one can think of the lm as a *flat* surface. The same is true for
> higher dimensions (Y ~ dose + time + batch + gender + ...).
> 
> This assumption is really dangerous, I think, since many
> treatment/response relationships are not linear. For example, think
> about an experiment with 5 doses of a drug (0.0 mM, 0.10 mM, 0.25 mM,
> 0.5 mM and 1.0 mM) with which cell cultures get treated. The 0.1 mM
> dose causes hardly any change in gene expression, whereas there is a
> big difference in gene expression at 0.25 mM. Then at 0.5 mM and
> 1.0 mM the response is not much stronger than at 0.25 mM.
> 
> If one just looks at a single gene, its expression goes up quite
> strongly from 0.1 mM to 0.25 mM and then flattens out for the higher
> doses: the response reaches saturation. Other responses are more like a
> logistic curve. This is a typical scenario.
> 
> The problem is that within one experiment many genes behave as
> described above, while others change linearly, others exponentially,
> and so on.
> 
> Could I still use lm for this kind of experiment? Would I have to
> decide on a gene-by-gene basis?
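A minimal sketch of the distinction, in Python with hypothetical saturating data. Entering dose as a number (Y ~ dose) forces a straight line, whereas entering it as a factor (Y ~ factor(dose)) fits a separate mean for each dose. Comparing the two nested fits with a lack-of-fit F test, which is what anova(line_fit, factor_fit) computes for two nested lm fits in R, shows whether a straight line is adequate for a given gene. The function names are illustrative.

```python
# Hypothetical saturating response: one gene, five doses (mM), 2 replicates each.
DOSE = [0.0, 0.0, 0.1, 0.1, 0.25, 0.25, 0.5, 0.5, 1.0, 1.0]
Y = [1.0, 1.2, 1.1, 1.3, 3.0, 3.2, 3.3, 3.5, 3.4, 3.6]


def rss_straight_line(x, y):
    """Residual sum of squares of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def rss_group_means(x, y):
    """Residual sum of squares when each distinct x gets its own mean (factor fit)."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    return sum((yi - sum(ys) / len(ys)) ** 2 for ys in groups.values() for yi in ys)


def lack_of_fit_f(x, y):
    """F statistic comparing the straight line (2 parameters) with the factor fit."""
    n, k = len(x), len(set(x))
    rss_line, rss_factor = rss_straight_line(x, y), rss_group_means(x, y)
    return ((rss_line - rss_factor) / (k - 2)) / (rss_factor / (n - k))


if __name__ == "__main__":
    print("RSS, straight line :", round(rss_straight_line(DOSE, Y), 3))
    print("RSS, dose as factor:", round(rss_group_means(DOSE, Y), 3))
    print("lack-of-fit F      :", round(lack_of_fit_f(DOSE, Y), 1))
```

The factor fit only describes the doses that were actually measured and says nothing about doses in between, but it imposes no shape, so the same model can be fitted to every gene whether its response is linear, saturating or logistic.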
> 
> 2. Some of the factors, such as treatment (T), can only take, say, 2
> distinct values in an experiment: treated (t) and untreated (ut). Does
> a model such as Y ~ T make any sense in this case?
> 
> Doesn't this assume a linear relationship between just 2 "clouds" of
> data (assuming there are many samples for each factor level)? Even if
> one can clearly distinguish between t and ut, assuming a straight line
> may be wrong. This is like drawing a straight line between two points.
> Just like in my dose example above, you may have already reached some
> kind of saturation, so using such a model for prediction would give
> wrong results.
> 
> However, if one just wants to distinguish between t and ut, would lm
> be a valid method?
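A minimal sketch, in Python with hypothetical data, of why Y ~ T is still valid for a two-level factor. With a 0/1 indicator for "treated", the least-squares slope is exactly the difference between the two group means, and its t statistic equals the classical pooled-variance two-sample t statistic. The "straight line" only joins the two group means; nothing is assumed between or beyond them. The function names are illustrative.

```python
import math

# Hypothetical expression values for one gene.
UNTREATED = [5.0, 5.2, 4.9, 5.1]
TREATED = [6.0, 6.3, 5.8, 6.1]


def pooled_t(a, b):
    """Classical two-sample t statistic (equal variances) for mean(b) - mean(a)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)  # pooled variance
    return (mb - ma) / math.sqrt(sp2 * (1 / na + 1 / nb))


def regression_t(x, y):
    """Slope and its t statistic from the least-squares fit y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


if __name__ == "__main__":
    # Dummy-code the treatment factor: 0 = untreated (baseline), 1 = treated.
    x = [0.0] * len(UNTREATED) + [1.0] * len(TREATED)
    y = UNTREATED + TREATED
    slope, t_lm = regression_t(x, y)
    print("slope (treated - untreated):", round(slope, 3))
    print("t from Y ~ T :", round(t_lm, 4))
    print("pooled t-test:", round(pooled_t(UNTREATED, TREATED), 4))
```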
> 
> I'm reading some "beginners" literature about linear models and am just
> trying to understand what's going on.
> 
> Maybe you could comment on this. I'd be very interested in any
> explanation or clarification.
> 
> 	kind regards,
> 
> 	Arne
> 
> --
> Arne Muller, Ph.D.
> Toxicogenomics, Aventis Pharma
> arne dot muller domain=aventis com
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch 
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>


