[BioC] To many differentially expressed genes produced by LIMMA and dye-effect question
Vladimir Krasikov
krasikov at science.uva.nl
Tue Jan 8 01:40:35 CET 2008
Dear Naomi
Thanks a lot for the fast reply.
Naomi Altman wrote:
> Dear Vladimir,.
> I am a bit confused by the output because, while the code seems to
> indicate that the analysis is on the log2 scale, the DecideTests
> output does not.
>
> If the data are on the natural scale, the distributional assumptions
> of limma are incorrect. The denominator adjustment will pull the
> estimated variance down too much, leading to large values of the test
> statistic and high significance values.
>
> To check the scale of the data, just type:
>
> MA.scale[1:5,]
The scale of the data should be ordinary log2 scale of limma as I do
everything according to the limma vignette and help-pages of the
particular functions.
But anyway I checked the M values and they are indeed in log-scale.
> which prints the first 5 genes. If the data are on the log scale, all
> values should be beween 4 and 16.
>
> Also, in a dye-swap design, you really do need to consider the blocks
> (see the limma manual). Presumably the technical replicates are less
> variable than the biological replicates. Also, the additional
> replication inflates the degrees of freedom. Both of these things
> also pull the estimated variance down.
I would need some help here...
I would like to use the existing power of dye-swaps technical replicates
to get as much info as possible
Here are some variants which I've tried and all of them are producing
strange staff:
1.
(the one I already described earlier)
>design <- modelMatrix(targets, ref="Ref")
>design <-cbind(Dye=1, design)
>fit1 <-lmFit(MA.scale,design)
>cont.matrix<-makeContrasts(Dye, A=(A1+....+A11)/11, B=(B1+...+B7)/7,
AvsB=((A1+....+A11)/11-(B1+...+B7)/7), design)
>fit2<-contrasts.fit(fit1,cont.matrix)
>fit3<-eBayes(fit2)
>d<-decideTests(fit3, adjust.method="fdr", p.value=0.001, lfc=log2(1.5))
> summary(d)
Dye A B AvsB
-1 0 7012 7045 448
0 38127 23331 23479 36656
1 6 7790 7609 1029
amount of genes without cutoff on fold change is enormous - more than
16.000 !!!
2a. Blocked dye-swaps
>design <- modelMatrix(targets, ref="Ref")
>design <-cbind(Dye=1, design)
>biolrep<-c(1,1,2,2,3,3,...,18,18)
>corfit <- duplicateCorrelation(MA.scale, design, ndups = 1, block=biolrep)
>corfit$consensus.correlation
[1] NaN
> What is wrong here?
2b.
>design <- modelMatrix(targets, ref="Ref")
>biolrep<-c(1,1,2,2,3,3,...,18,18)
>corfit <- duplicateCorrelation(MA.scale, design, ndups = 1, block=biolrep)
>corfit$consensus.correlation
[1] NaN
> What is wrong here?
2c.
>corfit <- duplicateCorrelation(MA.scale, ndups = 1, block=biolrep)
>corfit$consensus.correlation
[1] -0.9645838
# finally I've got good correlation by excluding the design from the
function
>design <- modelMatrix(targets, ref="Ref")
>design <-cbind(Dye=1, design)
>fit1 <- lmFit(MA.scale, design, block = biolrep, cor = corfit$consensus)
>cont.matrix<-makeContrasts(Dye, A=(A1+....+A11)/11, B=(B1+...+B7)/7,
AvsB=((A1+....+A11)/11-(B1+...+B7)/7), design)
>fit2<-contrasts.fit(fit1,cont.matrix)
>fit3<-eBayes(fit2)
> d <- decideTests(fit3, adjust.method="fdr", p.value=0.01)
> summary(d)
Dye A B AvsB
-1 4818 6360 7950 7
0 28704 25049 22021 38098
1 4611 6724 8162 28
>
# Oops :) No regulation
# 2d. all the same as 2c but in lmFit() excluded block variable:
>fit1 <- lmFit(MA.scale, design, cor = corfit$consensus)
>cont.matrix<-makeContrasts(Dye, A=(A1+....+A11)/11, B=(B1+...+B7)/7,
AvsB=((A1+....+A11)/11-(B1+...+B7)/7), design)
>fit2<-contrasts.fit(fit1,cont.matrix)
>fit3<-eBayes(fit2)
> d <- decideTests(fit3, adjust.method="fdr", p.value=0.01)
> summary(d)
Dye RA SPA RAvsSPA
-1 4818 15172 15440 10159
0 28704 6369 5335 18074
1 4611 16592 17358 9900
>
> d <- decideTests(fit3, adjust.method="fdr", p.value=0.01, lfc=log2(1.5))
> summary(d)
Dye RA SPA RAvsSPA
-1 0 7013 7046 450
0 38127 23330 23477 36653
1 6 7790 7610 1030
> The same amount of regulation as in variant 1
>
Please may you explain me how to use blocks and duplicateCorrelation
because I don't understand outcome
and which of the above scenarios is correct?
The question about enormous amount of statistically reliable regulated
genes still worries me...
I can't explain it
I will appreciate any comments on this matter
Regards
Vladimir Krasikov
> --Naomi
>
> At 07:51 PM 1/6/2008, you wrote:
>> Dear List
>> I hope somebody may give me an explanation of strange phenomena
>>
>> I need some explanation about my results obtained by LIMMA analysis of
>> my microarray data.
>> I have some experience with limma and some other packages from
>> Bioconductor
>> however I must say that I'm rather biologist than bioinformatician.
>>
>> Briefly description of my experiment:
>> I'm comparing two groups of patients
>> with different forms of one disease (let's say group A and group B)
>> RNA from 18 patients (11 from A and 7 from B) were hybridized to 36 44K
>> Agilent human microarrays
>> All of microarrays were performed against common reference and each one
>> had a dye-swap hybridization.
>>
>> targets:
>> filename sampleID Cy3 Cy5
>> 1 A1.ST.txt A1 Ref A1
>> 2 A1.DS.txt A1 A1 Ref
>> ....
>> 21 A11.ST.txt A11 Ref A11
>> 22 A11.DS.txt A11 A11 Ref
>> 23 B1.ST.txt B1 Ref B1
>> 24 B1.DS.txt B1 B1 Ref
>> ....
>> 35 B7.ST.txt B7 Ref B7
>> 36 B7.DS.txt B7 B7 Ref
>>
>> after importing the data, removing all types of control spots from
>> dataset,
>> performing "loess" within array normalization like and "scale" between
>> normalization:
>> >MA.offset <- normalizeWithinArrays(RG, method="loess",
>> bc.method="normexp", offset = 50)
>> >MA.scale <- normalizeBetweenArrays(MA.offset, method="scale")
>>
>> >dim(MA.scale)
>> [1] 38133 36
>>
>> Design matrix is rather simple in my case:
>> >design <- modelMatrix(targets, ref="Ref")
>> and to account for the possible dye-effect include:
>> >design <-cbind(Dye=1, design)
>>
>> and then linear model:
>> >fit1 <-lmFit(MA.scale,design)
>> >cont.matrix<-makeContrasts(Dye,
>> + A=(A1+....+A11)/11,
>> + B=(B1+...+B7)/7,
>> +
>> AvsB=((A1+....+A11)/11-(B1+...+B7)/7),
>> + design)
>> >fit2<-contrasts.fit(fit1,cont.matrix)
>> >fit3<-eBayes(fit2)
>>
>> So far so good but then:
>>
>> >d<-decideTests(fit3, adjust.method="fdr", p.value=0.001)
>> >summary(d)
>> Dye A B AvsB
>> -1 3026 14909 14530 8161
>> 0 32497 6720 7881 21544
>> 1 2610 16504 15722 8428
>> gives me terrible amount of regulations and dye-effects:
>>
>> even with incredibly stricken adjustments and p.value cutoff I'm getting
>> ennormous amount of regulation:
>> >d<-decideTests(fit3, adjust.method="bonferroni", p.value=0.0001)
>> >summary(d)
>> Dye A B AvsB
>> -1 320 12390 11606 2584
>> 0 37496 12633 14242 31896
>> 1 317 13110 12285 3653
>>
>> Even if to play with cut-off on fold change the amount of the data with
>> fold change above 1.5 is 1000 genes for up-regulation...:(
>>
>> In general it can't be true as far as I understand biological problem
>> and statistical surrounding of the data,
>> It's virtualy not possible that different human patients could produce
>> such a similar expression profile to each other (I mean within one
>> group).
>> Form other hand it is violating major assumption of microarray
>> dif.expression testing that only small proportion of the genes could be
>> differentially expressed.
>>
>> May anybody give me a hint on what I'm doing not correct in data
>> treatment
>>
>> I tried to fit the model slightly different way and got completely
>> ironic results which I can't interpret myself at all,
>> so I'm lost afterwards
>>
>> >biolrep <- c(1,1,2,2,3,3,....,18,18)
>> >corfit<-duplicateCorrelation(MA.scale, ndups=1, block=biolrep)
>> >corfit$consensus
>> [1] -0.9645838
>> which is not bad as I do understand that there is nice negative
>> correlation between dye-swaps
>> however there is nowhere stated that all this arrays are belonging to
>> two different groups (A and B),
>> (probably there is no matter for this function as it calculates the
>> correlation between each pair only)
>>
>> Thanks a lot for any help and advise.
>>
>> > sessionInfo()
>> R version 2.6.1 (2007-11-26)
>> i386-pc-mingw32
>>
>> locale:
>> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>> States.1252;LC_MONETARY=English_United
>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] statmod_1.3.1 limma_2.12.0
>> >
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Which afterwards gives a really simple design:
>>
>>
>>
>>
>>
>>
>> --
>> V. Krasikov
>> Swammerdam Institute for Life Sciences
>> Plant Pathology
>> University of Amsterdam
>> Kruislaan 318
>> 1098SM Amsterdam
>>
>> Telephone: +31(0)20 5257839
>> Telefax: +31(0)20 5257934
>> E-mail: krasikov at science.uva.nl
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> Naomi S. Altman 814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics 814-863-7114 (fax)
> Penn State University 814-865-1348 (Statistics)
> University Park, PA 16802-2111
>
--
V. Krasikov
Swammerdam Institute for Life Sciences
Plant Pathology
University of Amsterdam
Kruislaan 318
1098SM Amsterdam
Telephone: +31(0)20 5257839
Telefax: +31(0)20 5257934
E-mail: krasikov at science.uva.nl
More information about the Bioconductor
mailing list