[BioC] LIMMA lmscFit: difference between linear models
Koen Bossers
k.bossers at nih.knaw.nl
Mon Sep 26 10:46:36 CEST 2005
Dear all,
I'm currently studying gene expression in human brain samples using
Agilent arrays (4 patients, 4 controls). I am analyzing the data using a
single channel approach (lmscFit), which I think valid for my dataset,
as all channels from each individual nicely cluster together, apart from
all other channels.
The hybridization setup is as follows:
--------------
Cy3 Cy5
US12302316_251182110152_S01_A01 pat2 ctrl3
US12302316_251182110153_S01_A01 pat3 ctrl2
US12302316_251182110154_S01_A01 pat1 ctrl4
US12302316_251182110155_S01_A01 ctrl4 pat4
US12302316_251182110156_S01_A01 ctrl1 pat1
US12302316_251182110157_S01_A01 pat4 ctrl1
US12302316_251182110158_S01_A01 ctrl2 pat4
US12302316_251182110159_S01_A01 pat2 ctrl1
US12302316_251182110160_S01_A01 ctrl3 pat1
US12302316_251182110176_S01_A01 pat3 ctrl4
US12302316_251182110177_S01_A01 ctrl2 pat2
US12302316_251182110178_S01_A01 ctrl3 pat3
--------------
The first analysis I tried was the following: I replaced all individual
labels with a generic one (thus: pat1 becomes pat, ctrl2 becomes ctrl),
and calculated a contrast between pat&ctrl:
--------------
targets2 <- targetsA2C(targets)
u <- unique(targets2$Target)
f <- factor(targets2$Target, levels=u)
design <- model.matrix(~0+f)
colnames(design) <- u
corfit <- intraspotCorrelation(MA, design)
fit <- lmscFit(MA, design, correlation=corfit$consensus)
cont.matrix <- makeContrasts(pat-ctrl,levels=design)
fit2 <- contrasts.fit(fit, cont.matrix)
fit2 <- eBayes(fit2)
--------------
This analysis does not yield many significantly regulated genes (5 or
so), which is likely due to the small number of biological replicates
and the large diversity in the human population.
I also tried another approach, leaving the individual labels intact, and
fitting a linear model in the following manner:
--------------
cont.matrix <- makeContrasts((pat1+pat2+pat3+pat4-ctrl1-ctrl2-ctrl3-ctrl4)/4
,levels=design)
--------------
Is this linear model valid?
This analysis yields loads of significantly regulated genes (hundreds)!
Neither the MA plot or the M values in fit2 look suspicious, so I do not
have a reason to distrust this data.
I do not really understand why there is such a large discrepancy between
the two analysis methods. Is this due to the way replication is handled?
Could anybody comment on the validity of these two analyses, taking into
account individual variation in the human population, and the way
replication is handled in LIMMA?
Thank you very much,
Koen Bossers
--
Koen Bossers, PhD student
Netherlands Institute for Brain Research
Meibergdreef 33
1105 AZ Amsterdam, The Netherlands
Phone: 0031-20-5665512
Email: k.bossers at nih.knaw.nl
More information about the Bioconductor
mailing list