[BioC] Devel version of easyRNASeq: using the simpleRNASeq method gives an error

Thu Mar 27 15:40:13 CET 2014

Hej Sylvain!

I just brought it back to the list so that everyone benefits from the answer :-) ; see inline.

---------------------------------------------------------------
Nicolas Delhomme

Nathaniel Street Lab
Department of Plant Physiology
Umeå Plant Science Center

Tel: +46 90 786 5478
Email: nicolas.delhomme at plantphys.umu.se
SLU - Umeå universitet
Umeå S-901 87 Sweden
---------------------------------------------------------------

On 27 Mar 2014, at 15:24, Sylvain Foisy Ph. D. <sylvain.foisy at diploide.net> wrote:

> Hi Nicolas,
> 
> Thanks for the input ;-) I am running this:
> 
> [1] easyRNASeq_1.9.5

OK the latest is 1.9.7 but that’s not going to solve your problem yet… hopefully by the end of the week, I’ll have fixed the simpleRNASeq function.

> 
> I actually started to use the R-devel branch to try to see if I could overcome a problem I am having with the current version (1.8.6).

Actually 1.8.7 is the latest and fixes a bug that addresses your last question below.

> I realized that my first attempts at DE analysis were using tophat2 alignments unfiltered for badly mapped reads.

That’s a good idea since easyRNASeq was not paying attention to that previously apart from mentioning it in the vignette. The new function which you tried is meant to fix this among other things.

> I therefore used samtools to keep only the properly paired reads (-f 2) and tried to run easyRNASeq again on the new BAM files. Whenever I used “exons" or "transcripts" as counts, it would work ok but I would get lots (about half of my BAM files) of these:
> 
> 20: In fetchCoverage(rnaSeq, format = format, filename = filename,  ... :
>  You enforce UCSC chromosome conventions, however the provided alignments are not compliant. Correcting it.
> 21: In fetchCoverage(rnaSeq, format = format, filename = filename,  ... :
>  The read length stored in the object (probably provided as argument): 100 
> is not the same as the ones: 100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88, 87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75, 74, 73, 72, 71, 7
> 0, 69, 68, 67, 66, 65, 64, 63, 62, 61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50 determined from the file: /shares/new-data/htseq_datastore/RNASeq_dat
> aster/123456_CELLTYPE_accepted_hits_properly_paired.bam
> 
> For the UCSC warning, I do believe that it stands from the fact that I am using the genes.gtf file from Illumina's iGenome archive for the NCBI annotation source.

Yes. This is also a remnant of early coding, when RNA-Seq analyses were not so standardised and that’s also going to disappear in the new function.

> The second one leaves me clueless :-( 

Again, a remnant of the time when aligner could not deal with variable length reads. easyRNASeq used to check for the read size as a sanity check, which he does not anymore obviously, but the warning stayed. I have to admit that I simply got used to see that warning but could have removed it eons ago. It will be gone in the next function also.

> 
> if I tried using "genes" and summeraization=“geneModels”, i would get these but I am also unable to write the count data to file, complaining that the count.table object cannot be written to file using the write.table function…

Try 1.8.7 then. And instead of the genes / geneModels paradigm, have a look at the section 7.1 of the vignettes for a more accurate (IMO) and faster approach at looking for geneModels (what I now call "synthetic transcripts” rather than "gene models" as that last one was confusing because of its many meanings, e.g. if you think of a genic locus in an assembly project.

> All of these were not observed on the unfiltered data, which flags me as something that samtools did…

This looks more like some quality-based read trimming was done on the data rather than having anything to do with samtools. Anyway, using 1.8.7 should help getting your count.table written out properly. Let me know if not.

Cheers,

Nico

> 
> Any ideas?
> 
> Best regards
> 
> S
> 
> On Mar 27, 2014, at 10:01 AM, Nicolas Delhomme <delhomme at embl.de> wrote:
> 
>> Hej Sylvain!
>> 
>> Indeed that function is under very active development and I’ve broken it times and again :-)
>> 
>> I would suggest you to keep using the easyRNASeq function until the next Bioc release (planned in ~10 days). I’m rushing to get the simpleRNASeq in the next release as bug-free as possible and I expect it to be fairly unstable by then.
>> 
>> Anyway, can you tell me which easyRNASeq version you are using (run sessionInfo() or package.version(“easyRNASeq”) in your R session)? Just so that I make sure I’ve got that bug squashed already.
>> 
>> Thanks,
>> 
>> Nico
>> 
>> ---------------------------------------------------------------
>> Nicolas Delhomme
>> 
>> Genome Biology Computational Support
>> 
>> European Molecular Biology Laboratory
>> 
>> Tel: +49 6221 387 8310
>> Email: nicolas.delhomme at embl.de
>> Meyerhofstrasse 1 - Postfach 10.2209
>> 69102 Heidelberg, Germany
>> ---------------------------------------------------------------
>> 
>> 
>> 
>> 
>> 
>> On 27 Mar 2014, at 14:40, Sylvain Foisy Ph. D. <sylvain.foisy at diploide.net> wrote:
>> 
>>> Hi to all,
>>> 
>>> I am trying to use the R-devel branch for easyRNASeq to get gene counts from filtered BAM files and I get the following error:
>>> 
>>>> exp<-simpleRNASeq(bamFiles = bfl, param = rParam, nnodes = 1, verbose = TRUE)
>>> Error in (function (classes, fdef, mtable)  : 
>>> unable to find an inherited method for function ‘simpleRNASeq’ for signature ‘"BamFileList”’
>>> 
>>> According to the help material, simpleRNASeq should need this:
>>> 
>>>   simpleRNASeq(bamFiles = BamFileList(),
>>>     param = RnaSeqParam(), nnodes = 1, verbose = FALSE)
>>> 
>>> When I check the nature of my bfl object, I get this:
>>> 
>>>> bfl
>>> BamFileList of length 103
>>> names(103): 123456_CELLTYPE_B_accepted_hits_properly_paired.bam …
>>> 
>>> So bfl is the right type… Any ideas or is this a bug? It’s the devel version after all ;-)
>>> 
>>> Best regards
>>> 
>>> S
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> 
>> 
>> Email secured by Check Point
>