[BioC] edgeR: glmLRT test

James W. MacDonald jmacdon at uw.edu
Tue Mar 26 14:24:04 CET 2013


Hi KJ,

On 3/26/2013 6:31 AM, KJ Lim wrote:
> Dear Jim and community,
>
> I read the Limma User's Guide and read again the edgeR User's Guide.
>
> I believe the section 3.5 "Comparison both Between and Within 
> Subjects"  in the edgeR User's Guide is suitable to answer my 
> question: Do phenotypes HS & LS have effect on the area (SW is control)?
>
> I constructed the design matrix as:
>
> > Phenotype <- factor(targets$phenotype, levels=c("HS", "LS"))
> > Area <- factor(targets$area, levels=c("SW","TZ"))
> > Tree <- gl(2,2, length=8)
>
> > design <- model.matrix(~Phenotype+Phenotype:Tree+Phenotype:Area)
>
> > colnames(design)
>   [1] "(Intercept)"       "PhenotypeLS"        "PhenotypeHS:Tree2"
>   [4] "PhenotypeLS:Tree2"  "PhenotypeHS:AreaTZ" "PhenotypeLS:AreaTZ"
>
> Then carried out the Likelihood Ratio Test (LRT):
>
> >lrtHvsL <- glmLRT(fit, contrast=c(0,0,0,0,1,-1))
>
> > summary(decideTestsDGE(lrtHvsL))
>           [,1]
>   -1       1
>    0  27849
>    1       0
>
> Based on the LRT result, can I concluded that phenotypes HS & LS have 
> NO effect on the area? If this is not the way, kindly please advice.

This brings us to a pedantic discussion of what a p-value means, what a 
statistical test is actually testing, etc. This is in general boring for 
non statisticians, but is a fairly critical point that people should 
understand.

Without getting into the nuances (if interested, there was a recent 
discussion on Simply Statistics; see point 4 here; 
http://simplystatistics.org/2013/03/17/sunday-datastatistics-link-roundup-31713/), 
when you are doing a statistical test you are only trying to reject the 
null hypothesis, which in this case is that there is no interaction 
between area and treatment. Failure to reject the null does not imply 
the converse of proving the null true.

There are any number of reasons why you couldn't reject the null here. 
You might not have enough data, there might be an uncontrolled variable 
that is messing things up, etc. At this point all you can say is that 
you have no evidence to support an interaction between treatment and area.

Best,

Jim


>
> Thank you very much for your time and help.
>
> Have a nice day.
>
> Best regards,
> KJ Lim
>
>
>
>
> On 25 March 2013 09:20, KJ Lim <jinkeanlim at gmail.com 
> <mailto:jinkeanlim at gmail.com>> wrote:
>
>     Dear Jim,
>
>     Thanks for your prompt relied and suggestion.  I will have a look
>     on the Limma User's Guide.
>     Thank you.
>
>     Best regards,
>     KJ Lim
>
>
>     On 22 March 2013 17:28, James W. MacDonald <jmacdon at uw.edu
>     <mailto:jmacdon at uw.edu>> wrote:
>
>         Hi KJ,
>
>
>         On 3/22/2013 11:17 AM, KJ Lim wrote:
>
>             Dear Jim,
>
>             Thanks for your replied.
>
>             I used read.delim to read my data into R session. The
>             target object is:
>
>                files        phenotype   area
>             1 H3TZ.txt   HS           TZ
>             2 H3SW.txt  HS           SW
>             3 H4TZ.txt   HS           TZ
>             4 H4SW.txt  HS           SW
>             5 L2TZ.txt    LS           TZ
>             6 L2SW.txt   LS           SW
>             7 L3TZ.txt    LS           TZ
>             8 L3SW.txt   LS           SW
>
>             I built the factors as:
>
>             > phenotype <- factor(targets$phenotype)
>             > area <- factor(ts.targets$area, levels=c("TZ","SW"))
>
>             I want to know do the phenotypes (HS & LS) have effect on
>             the area. I built the design matrix in each way and I
>             found the colnames is not as you have suggested.
>
>
>         Right. But that isn't the point I was trying to make (that the
>         colnames would be something in particular). Instead, I was
>         pointing out that the coefficients being estimated will be the
>         same regardless, and the only difference is the order in the
>         design matrix.
>
>
>
>             > colnames(model.matrix(~phenotype+area))
>               [1] "(Intercept)" "phenotypeL"  "areaTZ"
>             > colnames(model.matrix(~area+phenotype))
>               [1] "(Intercept)" "areaTZ"      "phenotypeL"
>
>             Perhaps, I made mistakes in between?
>
>
>         No, you illustrated my point as well. The two coefficients of
>         note here are areaTZ and phenotypeL, and the only difference
>         between the design matrices is the order that they appear.
>
>         But you make a further statement above 'I want to know do the
>         phenotypes (HS & LS) have effect on the area.'. This looks to
>         me like you want the interaction term, which captures the
>         difference in phenotype, given the area (or conversely, the
>         differences in area, given phenotype).
>
>         I believe there is at least one example of an interaction term
>         in the limma User's Guide that you can look at to gain
>         understanding.
>
>         Best,
>
>         Jim
>
>
>
>             Thanks for your time. Have a nice weekend.
>
>             Best regards,
>             KJ Lim
>
>             On 22 March 2013 16:17, James W. MacDonald <jmacdon at uw.edu
>             <mailto:jmacdon at uw.edu> <mailto:jmacdon at uw.edu
>             <mailto:jmacdon at uw.edu>>> wrote:
>
>                 Hi KJ,
>
>                 The short answer is that it doesn't matter, contingent
>             upon you
>                 having set up the experiment correctly. Have you tried
>                 constructing the design matrix each way?
>
>             > phenotype <- factor(rep(1:2, each=8))
>             > area <- factor(rep(1:2, times = 8))
>             > colnames(model.matrix(~area+phenotype))
>                 [1] "(Intercept)" "area2"       "phenotype2"
>             > colnames(model.matrix(~phenotype+area))
>                 [1] "(Intercept)" "phenotype2"  "area2"
>
>                 The interpretation of each coefficient will be the
>             same in both
>                 instances.
>
>                 Best,
>
>                 Jim
>
>
>
>
>                 On 3/22/2013 9:57 AM, KJ Lim wrote:
>
>                     Dear edgeR community,
>
>                     Good day.
>
>                     I have sets of RNA-Seq data of 2 phenotype (HighS,
>             LowS)
>                     plants with area
>                     TZ&  SW (control).  I used exactTest to study
>             which genes that
>                     are
>
>                     differential expressed in area TZ compare to SW of
>             each phenotype.
>
>                     I would like to know do the phenotypes have effect
>             on the
>                     area, in this
>                     case I should use glmLRT test. Before fit into the
>             GLM, I
>                     defined my design
>                     matrix as:
>
>                        ~phenotype+area
>
>                     Could someone please advice me regarding the
>             design matrix?
>                     Should I define
>                     my design matrix as  ~area+phenotype ?
>
>                     Thank you very much for your time and help.
>
>                     Have a nice weekend.
>
>                     Best regards,
>                     KJ Lim
>
>                             [[alternative HTML version deleted]]
>
>                     _______________________________________________
>                     Bioconductor mailing list
>             Bioconductor at r-project.org
>             <mailto:Bioconductor at r-project.org>
>             <mailto:Bioconductor at r-project.org
>             <mailto:Bioconductor at r-project.org>>
>
>             https://stat.ethz.ch/mailman/listinfo/bioconductor
>                     Search the archives:
>             http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>                 --     James W. MacDonald, M.S.
>                 Biostatistician
>                 University of Washington
>                 Environmental and Occupational Health Sciences
>                 4225 Roosevelt Way NE, # 100
>                 Seattle WA 98105-6099
>
>
>
>         -- 
>         James W. MacDonald, M.S.
>         Biostatistician
>         University of Washington
>         Environmental and Occupational Health Sciences
>         4225 Roosevelt Way NE, # 100
>         Seattle WA 98105-6099
>
>
>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list