[BioC] edgeR topTags output (logFC)
Gordon K Smyth
smyth at wehi.EDU.AU
Wed Aug 28 07:52:33 CEST 2013
Hi Susan,
Thanks, I see now.
Your experiment has 152 genotype x treatment combinations, and all of them
occur in your experiment. I strongly suggest that, instead of using an
interaction model formula, you simply define a factor that takes 152
levels and then make comparisons between the levels. This fits the same
model as you have fitted, but is parametrized in a way that is more
interpretable and useful. See Section 3.3.1 of the edgeR User's Guide.
This is almost always the best approach because you know exactly what each
comparison means.
I am not a fan of interaction model formulas like geno*trt for genomic
experiments. I don't think that the coefficients have useful meanings for
most experiments. In particular, the linear effect terms (like trtshade)
cannot be interpreted because of the present of the interactions.
On Tue, 27 Aug 2013, Susan Bush wrote:
> Okay, sorry. Here's more information:
>
> The model design I am using is *design <- model.matrix( ~ geno * trt)*,
> where "geno" has 76 values and "trt" has 2 (sun, shade)
>
> Using glmLRT, I am testing "trt", so * lrt <- glmLRT(dge, fit , coef
> = grep("trt", colnames(design))*
>
> The logFC values returned by topTags, then, are for columns
> *trtshade, geno2.trtshade, geno3.trtshade
> *, etc.
This will give you a test of whether any of the genotypes show a treatment
effect. This test is on 76 degrees of freedom, so there should be 76
columns for this test. I could explain what the individual coefficients
mean mathematically, but I don't think that they are useful for you.
> I also test "geno", where the logFC values are for columns *geno2,
> geno3, geno2.trtshade, geno3.trtshade*, etc.
This is a test of whether there any differences between any of the
genotypes in either of the treatment conditions (on 150 degrees of
freedom).
> Ultimately I test the interaction, so logFC values are for columns
> *geno2.trtshade,
> geno3.trtshade, geno4.trtshade*, etc.
All the genotypes are versus geno1, and treatment is versus sun, so the
coefficient "geno2.trtshade" measures how much larger the logFC for shade
vs sun is for geno2 as compared to geno1. In other words, the coefficient
means:
geno2.trtshade =
(geno2.shade - geno2.sun) - (geno1.shade - geno1.sun)
Best wishes
Gordon
> I am trying to figure out what these logFC coefficients are compared to, in
> each case. If I am interested in specific genotype effects, will I get a
> more appropriate answer by testing "geno2" individually, rather than "geno"
> all at once?
>
> Thanks - I do appreciate it.
>
>
>
>
>
>
>
> *****
> Susan Bush, Ph. D.
> Postdoctoral Scholar
> Maloof Lab, Dept of Plant Biology
> University of California, Davis
> 1 Shields Avenue
> Davis, CA 95616
>
> email: smbush at ucdavis.edu
> lab: 530-752-8086
> cell: 507-828-8958
>
>
> On Tue, Aug 27, 2013 at 5:05 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Susan,
>>
>> Date: Mon, 26 Aug 2013 18:28:03 -0700 (PDT)
>>> From: "Susan [guest]" <guest at bioconductor.org>
>>> To: bioconductor at r-project.org, smbush at ucdavis.edu
>>> Subject: [BioC] edgeR topTags output (logFC)
>>>
>>>
>>> I'm just wanting to confirm my understanding of the edgeR topTags()
>>> output. I'm using a design matrix with many coefficients (76 genotypes, 2
>>> treatments[sun, shade]).
>>>
>>> 1) In the topTags output, I am looking at logFC which represents the
>>> estimated or predicted means -- NOT coefficients, correct?
>>>
>>
>> Incorrect. As the abbreviation suggests, logFC gives you
>> log-fold-changes. logFC are indeed coefficients, not group means.
>>
>> 2) I have selected one parental genotype to be my reference genotype, and
>>> I am comfortable interpreting the logFC values for the other genotypes as
>>> comparisons to this genotype. I have selected one treatment as the
>>> reference (sun). When I see the logFC values for 'trtshade', I assume this
>>> is also in comparison to the reference genotype in the reference treatment.
>>>
>>
>> Yes, assuming that trt is what you have called the treatment factor, then
>> trtshade would the coefficient estimating the log-fold-change between the
>> shade and sun treatments.
>>
>> My question is about looking at 'geno2.trtshade' values - are these
>>> values in comparison to 'trtshade' or 'geno2' or the reference
>>> genotype/reference treatment values?
>>>
>>
>> Without knowing what model you have fitted, or which coefficients you have
>> tested for, it is impossible to tell.
>>
>> Best wishes
>> Gordon
>>
>> For example: this is calling glmLRT with coef = grep("trt",
>>> colnames(Design)) to identify treatment effects
>>>
>>> Row.names logFC.trtsha logFC.genoIL1.1.trtsha logCPM
>>> LR PValue FDR
>>> 1 Solyc08g078300.2.1 0.9440264 -0.379030370 4.207443
>>> 1135.8061 1.450273e-188 1.637938e-184
>>> 2 Solyc06g060830.2.1 0.9630343 0.038988710 4.511893
>>> 1095.1525 2.539194e-180 1.433883e-176
>>> 3 Solyc06g067910.2.1 1.2468599 -0.096499016 4.795789
>>> 1040.5813 2.722511e-169 1.024935e-165
>>> 4 Solyc06g008580.2.1 1.1500683 -0.451731914 3.300982
>>> 696.1677 6.043868e-101 1.706486e-97
>>> 5 Solyc07g017600.2.1 0.5026292 0.007208458 5.209938
>>> 694.2070 1.451776e-100 3.279271e-97
>>> 6 Solyc10g083760.1.1 -0.2877568 0.045952894 7.033394
>>> 651.8629 2.232759e-92 4.202796e-89
>>>
>>> Thanks so much!
>>>
>>>
>>> -- output of sessionInfo():
>>>
>>> sessionInfo()
>>>>
>>> R version 3.0.1 (2013-05-16)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C
>>> LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C LC_PAPER=C
>>> [8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
>>> LC_MEASUREMENT=C LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] splines stats graphics grDevices utils datasets methods
>>> base
>>>
>>> other attached packages:
>>> [1] ggplot2_0.9.3.1 reshape_0.8.4 plyr_1.8 edgeR_3.2.3
>>> limma_3.16.4
>>>
>>> loaded via a namespace (and not attached):
>>> [1] MASS_7.3-26 RColorBrewer_1.0-5 colorspace_1.2-2
>>> dichromat_2.0-0 digest_0.6.3 grid_3.0.1 gtable_0.1.2
>>> labeling_0.1
>>> [9] munsell_0.4 proto_0.3-10 reshape2_1.2.2 scales_0.2.3
>>> stringr_0.6.2 tools_3.0.1
>>>
>>
>>
>> ______________________________**______________________________**__________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________**______________________________**__________
>>
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list