[BioC] QuasR: how to use an indexed reference genome?
Paul Shannon
pshannon at fhcrc.org
Thu May 16 22:48:47 CEST 2013
I am new to QuasR, and alos quite new to aligning short reads to reference genomes more generally.
I cannot figure out how to use a pre-built indexed reference genome file with QuasR. The examples supplied with the package work nicely.
Scaling up to using all of hg19 raises problems for me. I apologize if I am missing the obvious.
To illustrate the problem, I call QuasR's qAlign method with just two arguments (quoting from the man page):
sampleFile: a text file listing input sequence files and sample names
genome: the reference genome for primary alignments, one of:
* a string referring to a "BSgenome" package (e.g.
""BSgenome.Hsapiens.UCSC.hg19""), which will be
downloaded automatically from Bioconductor if not present
* the name of a fasta sequence file containing one or
several sequences (chromosomes) to be used as a reference
QuasR apparently invokes the bowtie indexing program when supplied either of the two "genome" options: a BSgenome package, or a fasta file. But since indexing takes a long time -- hours, apparently -- I hoped to provide a ready-made index file, and found some described here:
http://bowtie-bio.sourceforge.net/tutorial.shtml
specifically
ftp://ftp.ccb.jhu.edu/pub/data/bowtie_indexes/hg19.ebwt.zip
Various attempts to specify this file, or any of its contents (unzipped) to QuasR fail with these messages:
Error: The specified genome /Users/pshannon/s/data/public/bowtie/indexes/hg19.1.ebwt does not have the extension of a fasta file (fa,fasta,fna)>
Error: The specified genome has to be a file and not a directory: /Users/pshannon/s/data/public/bowtie/indexes
I'll be grateful for advice on how to do this properly.
Thanks,
- Paul
> sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] Rsamtools_1.13.14 BSgenome_1.29.0 Biostrings_2.29.2 QuasR_1.1.4 GenomicRanges_1.13.12 XVector_0.1.0
[7] IRanges_1.19.8 BiocGenerics_0.7.2 Rbowtie_1.1.3 BiocInstaller_1.11.1
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.23.11 Biobase_2.21.2 DBI_0.2-7 GenomicFeatures_1.13.8 RCurl_1.95-4.1
[6] RSQLite_0.11.3 ShortRead_1.19.3 XML_3.95-0.2 biomaRt_2.17.0 bitops_1.0-5
[11] compiler_3.0.0 grid_3.0.0 hwriter_1.3 lattice_0.20-15 rtracklayer_1.21.5
[16] stats4_3.0.0 tools_3.0.0 zlibbioc_1.7.0
More information about the Bioconductor
mailing list