[BioC] easyRNASeq Error in counting from C.elegans

Fri May 25 08:55:21 CEST 2012

Dear Yonggan,

Can you please rerun the command with validity.check=TRUE? That would tell us whether there are inconsistencies between the chromosome names retrieved from biomaRt and the ones present in your bam file, which is what I suspect. Could you also tell us your R and easyRNASeq versions, i.e. paste the output of the sessionInfo() command once you have loaded easyRNASeq?
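To illustrate the kind of chromosome name inconsistency I have in mind, here is a minimal base R sketch. The two name vectors are written out by hand as assumptions (UCSC-style names for a ce6 alignment, Ensembl-style names for a biomaRt annotation); in practice they would come from your bam header and your annotation object. The helper ucscToEnsembl is hypothetical and not part of easyRNASeq.

```r
# Chromosome names as they typically appear in a bam file aligned
# against UCSC ce6 (assumption: typed by hand, not read from a file).
bamChr <- c("chrI", "chrII", "chrIII", "chrIV", "chrV", "chrX", "chrM")

# Chromosome names as Ensembl (and hence biomaRt) reports them for
# C. elegans (assumption: typed by hand as well).
annotChr <- c("I", "II", "III", "IV", "V", "X", "MtDNA")

# Names shared by both sets; an empty result means that no read can
# overlap any annotated feature, so nothing can be counted.
common <- intersect(bamChr, annotChr)

# Hypothetical helper: map UCSC-style names to Ensembl-style names.
ucscToEnsembl <- function(x) {
  y <- sub("^chr", "", x)   # drop the "chr" prefix
  y[y == "M"] <- "MtDNA"    # the mitochondrion is named differently
  y
}

renamed <- ucscToEnsembl(bamChr)

# Names still absent from the annotation after renaming.
missingAfter <- setdiff(renamed, annotChr)

print(common)
print(missingAfter)
```

If the first result is empty, the two naming schemes do not match at all, which would be consistent with a "no rows to aggregate" kind of failure; whether that is what happens in your case is something the validity check should confirm.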

There have been similar questions on the mailing list previously; please see the posts:
http://thread.gmane.org/gmane.science.biology.informatics.conductor/38983
http://thread.gmane.org/gmane.science.biology.informatics.conductor/39629/focus=39684
as they might be of some help to you.

In the development version (http://bioconductor.org/packages/2.11/bioc/html/easyRNASeq.html, version 1.3.3), the vignette has been updated to cover these use cases. It has also been clarified so that the important points for processing RNA-Seq data now stand out.

HTH,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------

On May 24, 2012, at 7:15 PM, Yonggan Wu wrote:

> 
> Dear All and Nico,
> 
> Has anyone been able to do the counting for C. elegans?
> 
> Here is what I tried, without success:
> 1) The bam file was generated with tophat 1.4.1, using UCSC ce6 as the reference. For easyRNASeq counting, the *.rda file was used as the reference (see the code below):
> obj <- fetchAnnotation(new('RNAseq',
>                            organismName="Celegans"
>                            ),
>                        method="biomaRt")
> gAnnot <- genomicAnnotation(obj)
> save(gAnnot,file="ensemble_gAnnot_hsa.rda")
> Here is the error message:
> > rnaSeq=easyRNASeq(filesDirectory="/cluster/easyrnaseq",
> + #organism="Celegans",
> + organism="Celegans",
> + chr.sizes=chr.sizes,
> + readLength=50L,
> + annotationMethod="rda",
> + annotationFile="/cluster/database/gtf/cel/ensemble_gAnnot_cel.rda",
> + #annotationMethod="biomaRt",
> + format="bam",
> + outputFormat="RNAseq",
> + gapped=TRUE,#for tophat bam
> + normalize=TRUE,
> + filenames="ucsc.bam",
> + count="genes",
> + summarization="geneModels",
> + validity.check=FALSE,
> + nbCore=5
> + )
> Checking arguments... 
> Fetching annotations... 
> Computing gene models... 
> Summarizing counts... 
> Processing ucsc.bam 
> Error in aggregate.data.frame(as.data.frame(x), ...) : 
>   no rows to aggregate
> In addition: Warning messages:
> 1: In easyRNASeq(filesDirectory = "/cluster/easyrnaseq",  :
>   There are 8526 synthetic exons as determined from your annotation that overlap! This implies that some reads will be counted more than once! Is that really what you want?
> 2: In min(match(uniqueClasses, classOrder)) :
>   no non-missing arguments to min; returning Inf
> 3: In SimpleAtomicList(result) : NAs introduced by coercion
> 2) I tried the same code with a human bam file (with the species changed to human), and it works.
> 3) I also redid the alignment with the Ensembl genome reference, but counting still failed.
> 4) I changed the annotation method to "gtf", using a gtf file downloaded from Ensembl. Again, it did not work.
> 5) I changed the organism to "custom"; it still did not work.
> 
> Any hints on this issue would be very much appreciated.
> 
> -- 
> All The Best,
> Yonggan
> ------------------------------------------------------------------------------------------
> Yonggan Wu, M.S.
> Bioinformatic Scientist
> Ocean Ridge Biosciences LLC
> 10475 Riverside Drive Suite 1
> Palm Beach Gardens, FL 33410
> Phone : 561-223-3152
> Fax : 561-740-8710
> yongganw at oceanridgebio.com
> http://www.oceanridgebio.com
>