[BioC] Fwd: limma modeling, paired samples

Mon Jun 9 16:07:58 CEST 2014

Hi Riba,

On 6/9/2014 8:14 AM, Riba Michela wrote:
> Hi,
> I'm writing again dealing with a paired sample design:
> the experimental setting involves 9 patients, 3 disease stages and
> microarray expression data
> according to the included target file
>
>
>
>
> target<- readTargets("targetPT.txt")
> head(target)
>
>
> Genotype <- factor(target$Genotype)
> Disease<- factor(target$Disease, levels=c("stageA", "stageB", "stageC"))
>
> I have performed a paired samples analysis using
> design <- model.matrix(~Genotype+Disease)
>
> in order to find genes differentially expressed between stages A and
> B, for example,
> but I noticed that the first patient and the first disease stage (in
> alphabetical order) disappear from the fit when I inspect it
> using
> colnames(fit)

The patient and disease don't disappear; they are absorbed into the 
intercept term. The model you are fitting is called a 'factor effects' 
model, and all the coefficients are interpreted as differences between a 
given sample type and the 'baseline', which in this case is Stage A 
disease for the first patient (the first level of Genotype).

In other words:

 > colnames(design)
  [1] "(Intercept)"   "DiseasestageB" "DiseasestageC" "Genotypept02"
  [5] "Genotypept03"  "Genotypept04"  "Genotypept06"  "Genotypept09"
  [9] "Genotypept10"  "Genotypept13"  "Genotypept14"
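
To see where the baseline goes, here is a minimal sketch on a made-up 
target table. The three patients pt01 to pt03 and the complete layout 
(every patient at every stage) are assumptions for illustration only, 
not your data. It also shows what the ~0 variant of the formula does to 
the column names.

```r
# Toy targets: 3 hypothetical patients x 3 stages (assumed, not targetPT.txt)
toy <- expand.grid(Genotype = c("pt01", "pt02", "pt03"),
                   Disease  = c("stageA", "stageB", "stageC"),
                   stringsAsFactors = FALSE)
Genotype <- factor(toy$Genotype)
Disease  <- factor(toy$Disease, levels = c("stageA", "stageB", "stageC"))

# Default treatment contrasts: the first level of each factor is the baseline
design <- model.matrix(~Genotype + Disease)
colnames(design)

# The row for pt01 at stageA has a 1 for the intercept and 0 elsewhere,
# so the intercept stands for that baseline combination
baseline_row <- design[toy$Genotype == "pt01" & toy$Disease == "stageA", ]
baseline_row

# Removing the intercept gives the first factor in the formula a column
# for every level; the second factor still loses its first level
design0 <- model.matrix(~0 + Disease + Genotype)
colnames(design0)

# Both parameterizations have the same number of columns and the same rank
c(qr(design)$rank, qr(design0)$rank)
```

The two designs describe the same fit; only the meaning of the 
individual coefficients changes.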

DiseasestageB can be interpreted as Stage B - Stage A after controlling 
for the paired nature of your data. The DiseasestageC coefficient is 
interpreted analogously.
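
As a check on that interpretation, here is a small sketch with simulated 
values for a single gene, using plain lm() rather than limma. The 
numbers, the three hypothetical patients, and the complete balanced 
layout are all assumptions. With missing stages, as in your data, the 
coefficients are still within-patient stage differences, but they will 
no longer equal simple means of paired differences.

```r
set.seed(1)
# Toy layout: 3 hypothetical patients, each measured at all 3 stages
toy <- expand.grid(Genotype = c("pt01", "pt02", "pt03"),
                   Disease  = c("stageA", "stageB", "stageC"))
toy$Disease <- factor(toy$Disease, levels = c("stageA", "stageB", "stageC"))
# Simulated log-expression values for one gene (made up)
toy$y <- rnorm(nrow(toy), mean = 8)

fit <- lm(y ~ Genotype + Disease, data = toy)
b <- coef(fit)

# Values per stage, in the same patient order within each stage
yA <- toy$y[toy$Disease == "stageA"]
yB <- toy$y[toy$Disease == "stageB"]
yC <- toy$y[toy$Disease == "stageC"]

# In this balanced layout each stage coefficient equals the mean of the
# within-patient differences from stage A
all.equal(unname(b["DiseasestageB"]), mean(yB - yA))
all.equal(unname(b["DiseasestageC"]), mean(yC - yA))

# Stage C - Stage B is the difference of the two stage coefficients
all.equal(unname(b["DiseasestageC"] - b["DiseasestageB"]), mean(yC - yB))
```

In limma the corresponding steps would be lmFit() and eBayes() on the 
same design, followed by topTable() with coef="DiseasestageB" for 
Stage B - Stage A; a comparison such as Stage C - Stage B needs a 
contrast, which is what contrasts.fit() is for.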

This is a basic concept of linear modeling, and if you are getting 
tripped up on the basics then I would highly recommend finding a local 
statistician who can help you.

Best,

Jim

>
> I tried to use
> design <- model.matrix(~0+Genotype+Disease)
> to make the intercept coefficient explicit,
> and the first Disease type disappears
>
> I tried again
> design <- model.matrix(~0+Disease+Genotype)
> and again the first patient in alphabetical order disappears
>
> I do not have sufficient mathematical education to understand exactly
> what would fit my needs.
> I would prefer this last model formula, because it makes all the disease
> stages explicit, so that I can use a contrast matrix to extract the genes
> differentially expressed between stages while accounting for the
> variability due to different patients.
> Anyhow, I would like to ask what the best way to address this problem is
> and what mistakes might be behind it (e.g. I do not have all disease
> conditions for all the 9 patients).
>
> I thank you very much for attention,
>
>
>
> Michela
>
>
>  > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] limma_3.18.13
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099