[BioC] QuasR: how to use an indexed reference genome?

Michael Stadler michael.stadler at fmi.ch
Fri May 17 08:41:25 CEST 2013


Dear Paul,

You are right, it is not possibile to use pre-built indices with QuasR,
such as those downloaded from the bowtie developers.

This is because QuasR needs to be able to go back to the original
sequences, for example when working with bisulfite converted reads,
which it does by accessing the fasta or BSgenome. Since we cannot
control the sequences that went into an externally provided index, it
would be more complicated to guarantee a consistent state (identical
sequences in BSgenome and index, but also identical sequence names and
order, etc.).

I do understand that building the index is tedious and will take a few
hours, but it has to be done only once, and from there on will be
available for futher QuasR analyses. This is still fast compared to the
CPU-time that goes into the analysis of most NGS datasets.

If more people would prefer to use pre-built indices, and if the BioC
agrees to host such big packages, it would also be conceivable to
provide index packages corresponding to the BSgenome packages for download.

Regards,
Michael



On 16.05.2013 22:48, Paul Shannon wrote:
> I am new to QuasR, and alos quite new to aligning short reads to reference genomes more generally.
> I cannot figure out how to use a pre-built indexed reference genome file with QuasR.   The examples supplied with the package work nicely.
> Scaling up to using all of hg19 raises problems for me.  I apologize if I am missing the obvious.
> 
> To illustrate the problem, I call QuasR's qAlign method with just two arguments (quoting from the man page):
> 
>     sampleFile:  a text file listing input sequence files and  sample names
> 
>     genome: the reference genome for primary alignments, one of:
> 
>             * a string referring to a "BSgenome" package (e.g.
>               ""BSgenome.Hsapiens.UCSC.hg19""), which will be
>               downloaded automatically from Bioconductor if not present
> 
>             * the name of a fasta sequence file containing one or
>               several sequences (chromosomes) to be used as a reference
> 
> QuasR apparently invokes the bowtie indexing program when supplied either of the two "genome" options:  a BSgenome package, or a fasta file.  But since indexing takes a long time -- hours, apparently --  I hoped to provide a ready-made index file, and found some described here:
> 
> 
>    http://bowtie-bio.sourceforge.net/tutorial.shtml
> 
> specifically 
> 
>    ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/hg19.ebwt.zip
> 
> Various attempts to specify this file, or any of its contents (unzipped) to QuasR fail with these messages:
> 
> 
>    Error: The specified genome /Users/pshannon/s/data/public/bowtie/indexes/hg19.1.ebwt does not have the extension of a fasta file (fa,fasta,fna)> 
>    Error: The specified genome has to be a file and not a directory: /Users/pshannon/s/data/public/bowtie/indexes
> 
> 
> I'll be grateful for advice on how to do this properly.
> 
> Thanks,
> 
>  - Paul
> 
> 
>> sessionInfo()
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
>  [1] Rsamtools_1.13.14     BSgenome_1.29.0       Biostrings_2.29.2     QuasR_1.1.4           GenomicRanges_1.13.12 XVector_0.1.0        
>  [7] IRanges_1.19.8        BiocGenerics_0.7.2    Rbowtie_1.1.3         BiocInstaller_1.11.1 
> 
> loaded via a namespace (and not attached):
>  [1] AnnotationDbi_1.23.11  Biobase_2.21.2         DBI_0.2-7              GenomicFeatures_1.13.8 RCurl_1.95-4.1        
>  [6] RSQLite_0.11.3         ShortRead_1.19.3       XML_3.95-0.2           biomaRt_2.17.0         bitops_1.0-5          
> [11] compiler_3.0.0         grid_3.0.0             hwriter_1.3            lattice_0.20-15        rtracklayer_1.21.5    
> [16] stats4_3.0.0           tools_3.0.0            zlibbioc_1.7.0        
> 
> 
> 
>



More information about the Bioconductor mailing list