[BioC] Fwd: limma modeling, paired samples
James W. MacDonald
jmacdon at uw.edu
Mon Jun 9 16:07:58 CEST 2014
Hi Riba,
On 6/9/2014 8:14 AM, Riba Michela wrote:
> Hi,
> I'm writing again about a paired-sample design:
> the experimental setting involves 9 patients, 3 disease stages and
> microarray expression data,
> as described in the included targets file.
>
> target <- readTargets("targetPT.txt")
> head(target)
>
> Genotype <- factor(target$Genotype)
> Disease<- factor(target$Disease, levels=c("stageA", "stageB", "stageC"))
>
> I have performed a paired-samples analysis using
> design <- model.matrix(~Genotype+Disease)
>
> in order to find genes differentially expressed between, for example,
> stages A and B, but I noticed that the first patient and the first
> disease stage (in alphabetical order) disappear from the fit when I check
> colnames(fit)
The patient and the disease stage don't disappear; they are absorbed into
the intercept term. The model you are fitting is called a 'factor effects'
model, and all the coefficients are interpreted as differences between a
given sample type and the 'baseline', which in this case is disease
Stage A for the first patient (the first level of Genotype).
In other words:
> colnames(design)
[1] "(Intercept)" "DiseasestageB" "DiseasestageC" "Genotypept02" "Genotypept03" "Genotypept04" "Genotypept06"
[8] "Genotypept09" "Genotypept10" "Genotypept13" "Genotypept14"
DiseasestageB can be interpreted as Stage B - Stage A after controlling
for the paired nature of your data. The DiseasestageC coefficient is
interpreted analogously.
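A minimal base-R sketch of this interpretation, assuming a hypothetical toy data set (three patients pt01 to pt03, one missing stage C sample) and simulated expression for a single gene. It uses lm() rather than limma's lmFit() only so that it runs without Bioconductor. It checks that DiseasestageB recovers the stage B minus stage A shift, and that the ~0 + Disease + Genotype formula from the original question gives the same estimate through the difference DiseasestageB - DiseasestageA.

```r
# Hypothetical toy data: 3 patients, 3 stages, pt03 lacks a stage C sample.
set.seed(1)
Genotype <- factor(c("pt01", "pt01", "pt01", "pt02", "pt02", "pt02",
                     "pt03", "pt03"))
Disease <- factor(c("stageA", "stageB", "stageC", "stageA", "stageB",
                    "stageC", "stageA", "stageB"),
                  levels = c("stageA", "stageB", "stageC"))

# Simulated log-expression for one gene: patient offset + stage shift + noise.
patient_effect <- c(pt01 = 5, pt02 = 7, pt03 = 6)
stage_effect <- c(stageA = 0, stageB = 2, stageC = -1)
y <- unname(patient_effect[as.character(Genotype)] +
            stage_effect[as.character(Disease)]) +
     rnorm(length(Genotype), sd = 0.1)

# Factor-effects model: the intercept is pt01 at stageA, so neither stageA
# nor pt01 has a column of its own.
fit1 <- lm(y ~ Genotype + Disease)
colnames(model.matrix(~Genotype + Disease))
# DiseasestageB estimates stage B - stage A within patients (true value 2).
coef(fit1)["DiseasestageB"]

# Same model without an intercept: every stage gets a column, and each stage
# coefficient is that stage's level for the baseline patient pt01.
fit2 <- lm(y ~ 0 + Disease + Genotype)
colnames(model.matrix(~0 + Disease + Genotype))
# The contrast stageB - stageA reproduces the factor-effects coefficient.
coef(fit2)["DiseasestageB"] - coef(fit2)["DiseasestageA"]
```

Both formulas describe the same model, so the choice is only about which comparisons are convenient to read off. In limma, the corresponding contrast on the no-intercept design would be built with makeContrasts(DiseasestageB - DiseasestageA, levels = design) and applied to the lmFit() result with contrasts.fit() before eBayes(). A patient missing one stage is not a problem in itself as long as the design matrix stays full rank.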
This is a basic concept of linear modeling, and if you are getting
tripped up on the basics then I would highly recommend finding a local
statistician who can help you.
Best,
Jim
>
> I tried to use
> design <- model.matrix(~0+Genotype+Disease)
> to make the intercept coefficient explicit,
> and then the first Disease level disappears.
>
> I tried again with
> design <- model.matrix(~0+Disease+Genotype)
> and again the first patient in alphabetical order disappears.
>
> I do not have sufficient mathematical education to understand exactly
> which model would fit my needs.
> I would prefer this last model formula, because it makes all the disease
> stages explicit, so that I can use a contrast matrix to extract the genes
> differentially expressed between stages while accounting for the
> variability due to different patients.
> In any case, I would like to ask what the best way to address this problem
> is, and what mistakes might lie behind it (e.g., I do not have all disease
> conditions for all 9 patients).
>
> Thank you very much for your attention,
>
> Michela
>
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] it_IT.UTF-8/it_IT.UTF-8/it_IT.UTF-8/C/it_IT.UTF-8/it_IT.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] limma_3.18.13
>
> loaded via a namespace (and not attached):
> [1] tools_3.0.2
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099