[BioC] Questions about edgeR

Mark Robinson mark.robinson at imls.uzh.ch
Tue Dec 20 09:51:37 CET 2011


Hi Rabe,

You could make it easier by giving us more information.  It's always a good plan to describe your objects and send the output of  sessionInfo().  Since I don't know what is in all of your objects (e.g. 'my_groups_factor', 'my_data', etc.), I'll make some guesses given what you've described. 

Below is a snippet of code (taken almost entirely from the glmFit/glmLRT documentation -- have you read this?) that hard-codes dispersion at 0.1 for 5 samples from different groups with no replication:

--------------
     nlibs <- 5
     ntags <- 100
     dispersion.true <- 0.1
     
     # Make first a factor with 5 levels
     x <- factor(1:5)
     design <- model.matrix(~x)
     
     # Generate count data
     y <- rnbinom(ntags*nlibs,mu=8,size=1/dispersion.true)
     y <- matrix(y,ntags,nlibs)
     colnames(y) <- paste("s",1:5,sep="")
     rownames(y) <- paste("Gene",1:ntags,sep="")
     d <- DGEList(y)
     
     # Normalize
     d <- calcNormFactors(d)
     
     # Fit the NB GLMs
     fit <- glmFit(d, design, dispersion=0.1)
     
     # Likelihood ratio test for differences b/w groups
     results <- glmLRT(d, fit, coef=2:5)
     topTags(results)
--------------

Perhaps, you can modify this to suit your purpose.

Of course, as I hope you can appreciate, this is not as ideal as having biological replicates and estimating the dispersion through the sophisticated methods available ... 

Regards,
Mark



On 20.12.2011, at 09:31, Asma rabe wrote:

> Hi Mark,
> 
> Thank you for help.
> 
> when i tried :
> 
> >mm<-model.matrix(~0+my_groups_factor)
> > fit<-glmfit(my_data,mm,dispresion_value)
> > fit<-glmLRT(my_data,fit)
> > topTags(fit)
> ---------------
> 
> i got:
> 
> Error in abs(object$table$logFC) : 
>   Non-numeric argument to mathematical function
> ----------------
> Any idea?
> 
> Thank you,
> Rabe
> 
> 
> On Tue, Dec 20, 2011 at 5:02 AM, Mark Robinson <mark.robinson at imls.uzh.ch> wrote:
> Hi Rabe,
> 
> Some comments below.
> 
> 
> On 19.12.2011, at 10:57, Asma rabe wrote:
> 
> > Hi All,
> >
> > I have RNA-seq data for different time points 0hr,2hr,4hr,6hrs and 8hrs.
> >
> > I would like to use edgeR to get genes that are differentially expressed
> > throughout the experiment.
> > In a recent edgeR user's manual ,it has been suggested to use a dispersion
> > value of 0.1 in case of no replicates for organisms that are genetically
> > similar to humans.
> > so it is possible to be used for mouse data for example,what about other
> > organisms like yeast ??
> 
> Yes, it is possible.  But, as mentioned in the manual, none of these solutions are ideal.
> 
> 
> > As exact test is used to get DE genes across 2 samples and i need to get DE
> > genes across my time course experiment,i used GLM.
> >
> > when i  used GLM model to find DE genes across my samples i ran the
> > following commands:
> >
> > mm<-model.matrix(~0+my_groups_factor)
> > fit<-glmfit(my_data,mm,dispresion_value)
> > fit<-glmLRT(data,fit)
> > topTags(fit)
> >
> > -----
> > i got:
> >
> > coefficient: group8
> >                 logConc         logFC          PValue            FDR
> > gene1         .....                 .....               ....
> >    ....
> > gene2         ..                       ....
> > ....                   ....
> > ..
> > .
> > ..
> >
> >
> > ---
> > I have a basic question(sorry i'm not statistician)
> > Why the last gorup 8hrs was chooosen to be the coef.??
> 
> Have a look at the documentation for glmLRT and specifically the 'coef' argument:
> 
> ?glmLRT
> 
> ----
> coef:     … By default, the
>          last column of the design matrix is dropped to form the
>          design matrix for the null model.
> ----
> 
> You have 5 levels for your time course (therefore, design matrix has 5 columns).  To test for any difference between groups, I would suggest reparameterizing your design matrix and then specifying the 'coef' argument.  Something like the following:
> 
> mm <- model.matrix(~my_groups_factor)
> fit <- glmFit(my_data,mm,dispresion_value)
> fit <- glmLRT(data,fit,coef=2:5)
> topTags(fit)
> 
> Hope that helps.
> 
> Regards,
> Mark
> 
> 
> >
> > Thank you in advance.
> > Best Regards,
> > Rabe
> 
> 
> ----------
> Prof. Dr. Mark Robinson
> Bioinformatics
> Institute of Molecular Life Sciences
> University of Zurich
> Winterthurerstrasse 190
> 8057 Zurich
> Switzerland
> 
> v: +41 44 635 4848
> f: +41 44 635 6898
> e: mark.robinson at imls.uzh.ch
> o: Y32-J-34
> w: http://tiny.cc/mrobin
> 
> 
> 

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y32-J-34
w: http://tiny.cc/mrobin



More information about the Bioconductor mailing list