[BioC] Understanding the columns in the limma results output
James W. MacDonald
jmacdon at uw.edu
Wed Aug 29 22:18:50 CEST 2012
Hi Jorge,
On 8/29/2012 4:15 PM, Jorge Miró wrote:
> Hi,
>
> My bad too :) What I really meant with fold change was the fold ratio
> change between the groups. I though that was what was in topTable but
> now I think I get it. The fold change that is printed in limma is the
> difference between the means in log2 and not the ratio of the mean of
> the gene expressions in their linear form.
>
> The data I'm analysing is from RMA and from PLIER so I have linear
> values for the gene expressions summarized with PLIER. Should I
> log2-transform them before passing them to the functions in limma?
Yes.
>
> Also, if I now wanted to calculated the ratios between the RMA gene
> expressions, do you recommend that I transform the expressions to
> linear values before calculating the ratio or is it OK to take the
> ratio in log2?
You don't take the ratio, you take the difference. Remember that
log(x/y) = log(x) - log(y).
Best,
Jim
>
> Best regards
> Jorge
>
> On Wed, Aug 29, 2012 at 4:56 PM, James W. MacDonald <jmacdon at uw.edu
> <mailto:jmacdon at uw.edu>> wrote:
>
> Hi Jorge,
>
> My bad. The coefficients for write.fit() are the fold changes, and
> you do get a column for each. As an example, with fake data.
>
> > library(limma)
> > mat <- matrix(rnorm(12e5), ncol = 12)
> > design <- model.matrix(~0+factor(rep(1:4, each = 3)))
> > contrast <- matrix(c(1,-1,0,0,1,0,-1,0,1,0,0,-1), ncol = 3)
> > fit2 <- contrasts.fit(fit, contrast)
> > fit2 <- eBayes(fit2)
> > rslt <- decideTests(fit2)
> > write.fit(fit2, rslt, "tmp.txt")
> > dat <- read.table("tmp.txt", header=T)
> > head(dat)
> Coef.1 Coef.2 Coef.3 t.1 t.2 t.3 p.value.1 p.value.2
> p.value.3 F
> 1 -0.837 -0.352 -0.430 -1.02 -0.43 -0.53 0.30621 0.66699
> 0.59932 0.35
> 2 -1.200 -0.599 0.524 -1.47 -0.73 0.64 0.14148 0.46259
> 0.52034 1.67
> 3 -1.014 -1.533 -1.515 -1.24 -1.87 -1.85 0.21627 0.06176
> 0.06492 1.54
> 4 0.520 0.308 1.059 0.64 0.38 1.30 0.52390 0.70573
> 0.19449 0.60
> 5 -0.848 0.269 -0.611 -1.04 0.33 -0.75 0.29880 0.74132
> 0.45410 0.81
> 6 -0.094 -0.245 -0.474 -0.11 -0.30 -0.58 0.90847 0.76396
> 0.56088 0.13
> F.p.value Res.1 Res.2 Res.3
> 1 0.78697 0 0 0
> 2 0.17130 0 0 0
> 3 0.20359 0 0 0
> 4 0.61671 0 0 0
> 5 0.48698 0 0 0
> 6 0.94300 0 0 0
>
> > nrow(mat)
> [1] 100000
> > nrow(dat)
> [1] 100000
>
> An alternative you might consider is the writeFit() function from
> my affycoretools package, which outputs the data in a slightly
> different format.
>
> Best,
>
> Jim
>
>
>
>
> On 8/29/2012 2:55 AM, Jorge Miró wrote:
>
> Hi James,
>
> thanks for the explanation. I do not really understand the
> columns yet. Shouldn't the FC be printed for every comparison
> as is done for the Coef-columns? I just get one A-column.
> Is there any way of printing the results to a file with the
> same columns as in topTable()?
> What I'm really want to have in the output is a p-value
> column, an FC column and an adjusted p-value column.
>
> Best regards
>
>
> On Tue, Aug 28, 2012 at 5:41 PM, James W. MacDonald
> <jmacdon at uw.edu <mailto:jmacdon at uw.edu> <mailto:jmacdon at uw.edu
> <mailto:jmacdon at uw.edu>>> wrote:
>
> Hi Jorge,
>
>
> On 8/28/2012 11:20 AM, Jorge Miró wrote:
>
> Hi,
>
> I have run the commands below to get an analysis of
> differential
> expressions in my Affymetrix arrays
>
> #Prepare the design and contrast matrices for my
> comparisons
> of the three
> groups in a loop-manner.
>
> design<- model.matrix(~0+factor(c(1,1,1,2,2,2,3,3,3)))
> colnames(design)<- c('GroupA', 'GroupB', 'GroupC')
> contrast.matrix<- makeContrasts(GroupB-GroupA,
> GroupC-GroupA,
>
> GroupC-GroupB, levels=design)
>
> #Check design and contrast matrices
>
> design
>
> GroupA GroupB GroupC
> 1 1 0 0
> 2 1 0 0
> 3 1 0 0
> 4 0 1 0
> 5 0 1 0
> 6 0 1 0
> 7 0 0 1
> 8 0 0 1
> 9 0 0 1
> attr(,"assign")
> [1] 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$`factor(c(1, 1, 1, 2, 2, 2, 3, 3, 3))`
> [1] "contr.treatment"
>
> contrast.matrix
>
> Contrasts
> Levels GroupB - GroupA GroupC - GroupA GroupC - GroupB
> GroupA -1 -1 0
> GroupB 1 0 -1
> GroupC 0 1 1
>
> #Fitting the eset to to the design and contrast
>
> fit<- lmFit(exprs, design)
> fit2<- contrasts.fit(fit, contrast.matrix)
>
> #Computing the statistics
>
> fit2<- eBayes(fit2)
>
>
> Then I check the results with topTable and get the
> following
> columns in the
> output
>
> topTable(fit2)
>
> GroupB...GroupA GroupC...GroupA GroupC...GroupB
> AveExpr F
> P.Value adj.P.Val
> 25031 2.3602203 2.4273830 0.06716267
> 5.021412
> 29.06509
> 7.844834e-05 0.9587773
> 12902 -0.4572897 -0.5680943 -0.11080467
> 7.516681
> 25.41608
> 1.365021e-04 0.9587773
> 7158 -0.4478660 -0.4296077 0.01825833
> 7.057833
> 23.48871
> 1.880100e-04 0.9587773
> 18358 -0.1002647 0.3304903 0.43075500
> 7.352807
> 22.78417
> 2.125096e-04 0.9587773
> 28768 -0.7695883 -1.3837750 -0.61418667
> 3.983044
> 22.47514
> 2.244612e-04 0.9587773
> 28820 -0.1708800 -0.9939680 -0.82308800
> 5.470826
> 18.25071
> 5.081473e-04 0.9587773
> 15238 -0.4850297 -0.4658157 0.01921400
> 7.071662
> 17.15191
> 6.440979e-04 0.9587773
> 24681 -0.3759717 -0.3486450 0.02732667
> 9.281578
> 16.47813
> 7.493077e-04 0.9587773
> 19246 -0.8675393 -0.5082140 0.35932533
> 8.123538
> 16.27776
> 7.845150e-04 0.9587773
> 28808 0.2601277 0.6909140 0.43078633
> 4.814602
> 16.21283
> 7.963487e-04 0.9587773
>
> I want to export my results and write
>
> results<- decideTests(fit2)
> write.fit(fit2, results, "limma_results.txt",
> adjust="BH")
>
> Now don't get the same columns as when using topTable
> which is
> quite
> confusing. Why don't I get the FC for the comparisons
> between
> the different
> groups as if I run topTable with the coef parameter (
> "topTable(fit2,
> coef=1)" )? The columns I get are the following
>
>
> The simple answer is that they are two different functions
> with
> different goals. But note that you do get the same
> information.
>
>
>
> A
>
> Coef.GroupB - GroupA
> Coef.GroupC - GroupA
> Coef.GroupC - GroupB
>
> t.GroupB - GroupA
> t.GroupC - GroupA
> t.GroupC - GroupB
>
> p.value.GroupB - GroupA
> p.value.GroupC - GroupA
> p.value.GroupC - GroupB
>
> p.value.adj.GroupB - GroupA
> p.value.adj.GroupC - GroupA
> p.value.adj.GroupC - GroupB
>
> F
> F.p.value
>
> Res.GroupB - GroupA
> Res.GroupC - GroupA
> Res.GroupC - GroupB
>
>
> Could some body please try to explain what do the
> columns A,
> Coef, F,
> F.p.value and Res mean?
>
>
> A - are your log fold change values
> Coef - are your coefficients (you set up a cell means
> model, so
> these are the sample means)
> F - is an F-statistic, which tests the null hypothesis
> that none
> of the sample means are different
> F.p.value - is the p-value for the F-statistic
> Res - is the results matrix you passed into write.fit(),
> showing
> which contrast(s) were significant
>
> Best,
>
> Jim
>
>
>
>
>
> #Session info
>
> sessionInfo()
>
> R version 2.15.0 (2012-03-30)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=Swedish_Sweden.1252
> LC_CTYPE=Swedish_Sweden.1252
> LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C
> LC_TIME=Swedish_Sweden.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets
> methods
> base
>
> other attached packages:
> [1] limma_3.12.1 Biobase_2.16.0
> BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] affylmGUI_1.30.0 IRanges_1.14.4
> oneChannelGUI_1.22.10
> stats4_2.15.0 tcltk_2.15.0
> Best regards
> JMA
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
> <mailto:Bioconductor at r-project.org
> <mailto:Bioconductor at r-project.org>>
>
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
> -- James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list