[BioC] Regarding an issue with easyRNASeq - from Surjyendu Ray.

Sun Aug 26 16:27:24 CEST 2012

Dear Surjyendu Ray,

The issue is corrected in package version 1.2.5 (stable) and version 1.3.13 (devel).

These will be available in a couple of days from Bioconductor. In the meanwhile, the following code resolves the issue and gives the expected results.

## libs
library(easyRNASeq)
library(RnaSeqTutorial)
library(BSgenome.Dmelanogaster.UCSC.dm3)

## conditions
conditions <- rep(c("A","B"),each=2)
names(conditions) <- dir(system.file("extdata",package="RnaSeqTutorial"),pattern="[A,C,T,G]{6}\\.bam$")

## count
obj <- easyRNASeq(
                  filesDirectory=
                  system.file(
                              "extdata",package="RnaSeqTutorial"),
                  pattern="[A,C,T,G]{6}\\.bam$",
                  readLength=30L,
                  organism="Dmelanogaster",
                  chr.sizes=as.list(seqlengths(Dmelanogaster)),
                  annotationMethod="rda",
                  annotationFile=system.file("data","gAnnot.rda",package="RnaSeqTutorial"),
                  count="exons",outputFormat="edgeR",
                  format="bam",
                  normalize=FALSE,
                  conditions=conditions)

## calculate the normalization factors
obj <- calcNormFactors(obj)

## plot the normalization factor per sample pairs
apply(combn(rownames(obj$samples),2),2,function(co,obj){plotNormalizationFactors(obj,co[1],co[2])},obj)

## plot the MDS
plotMDS.DGEList(obj, main = "MDS of all conditions", labels = rownames(obj$samples))

## calculate the dispersion
obj <- estimateCommonDisp(obj)

## plot the dispersion estimate
obj <- estimateTagwiseDisp(obj)
plotDispersionEstimates(obj)
plotMeanVar(obj, show.raw.vars = TRUE, show.tagwise.vars = TRUE,
            NBline = TRUE, main="Mean-variance (tag variances against tag abundance)")
legend("bottomright",col=c("gray60","lightskyblue","darkred","dodgerblue3",1),
       pch=c("o","o","x",rep(NA,2)),lty=c(rep(NA,3),1,1),lwd=c(rep(NA,3),4,1),
       ,pt.cex=0.6,c("raw tagwise variances","gene estimated variance",
          "100 bin averaged raw variance","common dispersion est. var.",
          "poisson variance"))

Best,

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------

On 22 Aug 2012, at 23:17, Surjyendu Ray wrote:

> Dear Sir,
>              Please do let me know what you think is going wrong in the analysis.
>          Thanking you,
>                                   Yours faithfully,
>                                   Surjyendu Ray. 
> 
> On Thu, Aug 16, 2012 at 5:13 PM, Surjyendu Ray <surjray at gmail.com> wrote:
> Dear Sir,
>              The following is the output of the sessioninfo() command:
> 
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
>  [7] LC_PAPER=C                 LC_NAME=C                 
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods  
> [8] base     
> 
> other attached packages:
>  [1] easyRNASeq_1.2.4       ShortRead_1.14.4       latticeExtra_0.6-19   
>  [4] RColorBrewer_1.0-5     lattice_0.20-6         Rsamtools_1.8.6       
>  [7] DESeq_1.8.3            locfit_1.5-8           BSgenome_1.24.0       
> [10] GenomicRanges_1.8.12   Biostrings_2.24.1      IRanges_1.14.4        
> [13] edgeR_2.6.10           limma_3.12.1           biomaRt_2.12.0        
> [16] Biobase_2.16.0         genomeIntervals_1.12.0 BiocGenerics_0.2.0    
> [19] intervals_0.13.3      
> 
> loaded via a namespace (and not attached):
>  [1] annotate_1.34.1      AnnotationDbi_1.18.1 bitops_1.0-4.1      
>  [4] DBI_0.2-5            genefilter_1.38.0    geneplotter_1.34.0  
>  [7] grid_2.15.1          hwriter_1.3          RCurl_1.91-1        
> [10] RSQLite_0.11.1       splines_2.15.1       stats4_2.15.1       
> [13] survival_2.36-14     XML_3.9-4            xtable_1.7-0        
> [16] zlibbioc_1.2.0      
> 
> Please advise.
> 
>             Thanking you,
>                                          Yours faithfully,
>                                          Surjyendu Ray.             
> 
> On Thu, Aug 16, 2012 at 3:48 AM, Nicolas Delhomme <delhomme at embl.de> wrote:
> Dear Surjyendu Ray,
> 
> It seems to be an issue with the edgeR API. I would need more information from you, can you please run in your R session:
> 
> library(easyRNASeq)
> sessionInfo()
> 
> and send me the results?
> 
> Thanks,
> 
> Nico
> 
> P.S. I've Cc'ed the Bioconductor mailing list as it might help other users. If you do not know this mailing list and are doing analyses in R/Bioc, I would advise you to subscribe to it.
> 
> ---------------------------------------------------------------
> Nicolas Delhomme
> 
> Genome Biology Computational Support
> 
> European Molecular Biology Laboratory
> 
> Tel: +49 6221 387 8310
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
> 
> 
> 
> 
> 
> On Aug 16, 2012, at 5:53 AM, Surjyendu Ray wrote:
> 
> > Dear Sir,
> >              I was running an experiment with two samples and three replicates for each sample. Here are the commands I used:
> >
> > conditions = c( "HCV1", "HCV1", "HCV1", "HCM1", "HCM1", "HCM1" )
> > names(conditions) <- c( "accepted_hits_ATCACG.sorted.bam", "accepted_hits_CGATGT.sorted.bam", "accepted_hits_TTAGGC.sorted.bam", "accepted_hits_CAGATC.sorted.bam", "accepted_hits_ACTTGA.sorted.bam", "accepted_hits_GATCAG.sorted.bam" )
> >
> > and the easyRNASeq command:
> > FM_HCV1_HCM1.count.table <- easyRNASeq( filesDirectory = "/panfs/storage.local/genacc/home/sr09m/Biomedicine research/New Drosophila analysis/Tophat_temp", organism = "Dmelanogaster", chr.sizes = as.list( seqlengths( Dmelanogaster )), readLength = 76L, annotationMethod = "gff", annotationFile = "/panfs/storage.local/genacc/home/sr09m/Biomedicine research/New Drosophila analysis/dmel-all-r5.46.gff", format = "bam", count = "exons", filenames = c( "accepted_hits_ATCACG.sorted.bam", "accepted_hits_CGATGT.sorted.bam", "accepted_hits_TTAGGC.sorted.bam", "accepted_hits_CAGATC.sorted.bam", "accepted_hits_ACTTGA.sorted.bam", "accepted_hits_GATCAG.sorted.bam" ), normalize = TRUE, outputFormat = "edgeR", conditions = conditions )
> >
> > However, I got an error:
> > Error in match.arg(trend, c("none", "movingave", "tricube")) :
> >   'arg' must be NULL or a character vector
> >
> > The entire log of outputs is:
> > Checking arguments...
> > Fetching annotations...
> > Read 12568191 records
> > Summarizing counts...
> > Processing accepted_hits_ATCACG.sorted.bam
> > Processing accepted_hits_CGATGT.sorted.bam
> > Processing accepted_hits_TTAGGC.sorted.bam
> > Processing accepted_hits_CAGATC.sorted.bam
> > Processing accepted_hits_ACTTGA.sorted.bam
> > Processing accepted_hits_GATCAG.sorted.bam
> > Preparing output
> > Calculating library sizes from column totals.
> > Normalizing counts
> > Error in match.arg(trend, c("none", "movingave", "tricube")) :
> >   'arg' must be NULL or a character vector
> >
> >                Please advise as to what may be going wrong.
> >           Thanking you,
> >                                      Yours faithfully,
> >                                      Surjyendu Ray.
> 
> 
>