[BioC] gmapR problem using a GmapGenome package with gsnap()
Robert Castelo
robert.castelo at upf.edu
Wed May 15 17:59:16 CEST 2013
hi,
using the gmapR package i built the GmapGenome package from the
Bioconductor hg19 human genome package as follows:
library(gmapR)
library(BSgenome.Hsapiens.UCSC.hg19)
gmapGenomePath <- file.path("./HsapGenome")
gmapGenomeDirectory <- GmapGenomeDirectory(gmapGenomePath, create=TRUE)
gmapGenome <- GmapGenome(genome=Hsapiens, directory=gmapGenomeDirectory,
name="hg19", create=TRUE)
makeGmapGenomePackage(gmapGenome=gmapGenome,
version="1.0.0",
maintainer="<robert.castelo at upf.edu>",
author="Robert Castelo",
destDir=".",
license="Artistic-2.0",
pkgName="GmapGenome.Hsapiens.UCSC.hg19")
afterwards, i bundled the package from the shell with:
$ R CMD build GmapGenome.Hsapiens.UCSC.hg19
and i install it with
$ R CMD INSTALL GmapGenome.Hsapiens.UCSC.hg19_1.0.0.tar.gz
when i try to use the package in a call to gsnap() it seems that there
is no way to pass the *name* of the GmapGenome object to the
command-line that works under the hoods, i.e.:
library(GmapGenome.Hsapiens.UCSC.hg19)
gsnapParam <- GsnapParam(genome=GmapGenome.Hsapiens.UCSC.hg19,
unique_only=TRUE, nthreads=8)
gsnapOutput <- gsnap(input_a=fastqs[1], input_b=NULL, params=gsnapParam)
Error in .system_gsnap(commandLine("gsnap")) :
Execution of gsnap failed, command-line:
'/opt/R/R-3.0.0/lib64/R/aprilLibrary/gmapR/usr/bin/gsnap
--db=GmapGenome.Hsapiens.UCSC.hg19
--dir=/opt/R/R-3.0.0/lib64/R/aprilLibrary/GmapGenome.Hsapiens.UCSC.hg19/extdata
--nthreads=8 --npaths=1 --quiet-if-excessive --nofails
--format=sam f.fastq > f.sam 2>&1'; last output line: ''
the error is not very explanatory, but basically the command-line
argument "--db=GmapGenome.Hsapiens.UCSC.hg19" should have been
"--db=hg19". if i proceed that way in command-line, then it works.
i've looked at the arguments of gsnap() and GsnapParam() but found no
way to specify the genome object *name* that it would go to the "--db"
parameter. i guess it should be a matter for the function GsnapParam()
to fetch the prefix of the files in
list.files(system.file("extdata", package="GmapGenome.Hsapiens.UCSC.hg19"))
[1] "hg19.chromosome" "hg19.chromosome.iit"
"hg19.chrsubset" "hg19.contig"
[5] "hg19.contig.iit" "hg19.genomecomp"
"hg19.ref12153gammaptrs" "hg19.ref12153offsetscomp"
[9] "hg19.ref12153positions" "hg19.version"
and plug it into the command-line call.
by the way, not related but another but i incidentally found is then one
tries to pass one *single* end read gzipped fastq file.
library(LungCancerLines)
fastqs <- LunCancerFastqFiles()
gsnapOutput <- gsnap(input_a=fastqs[1], input_b=NULL, params=gsnapParam)
Error in if (file_ext(input_a) == "gz" && file_ext(input_b) == "gz") { :
missing value where TRUE/FALSE needed
cheers,
robert.
ps: sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
[5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8 LC_PAPER=C
LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
base
other attached packages:
[1] GmapGenome.Hsapiens.UCSC.hg19_1.0.0 LungCancerLines_0.0.8
BSgenome.Hsapiens.UCSC.hg19_1.3.19
[4] BSgenome_1.28.0 gmapR_1.2.0
ShortRead_1.18.0
[7] latticeExtra_0.6-24 RColorBrewer_1.0-5
Rsamtools_1.12.3
[10] lattice_0.20-15 Biostrings_2.28.0
GenomicRanges_1.12.3
[13] IRanges_1.18.1 BiocGenerics_0.6.0
vimcom_0.9-8
[16] setwidth_1.0-3 colorout_1.0-0
loaded via a namespace (and not attached):
[1] AnnotationDbi_1.22.5 Biobase_2.20.0 biomaRt_2.16.0
bitops_1.0-5
[5] DBI_0.2-7 GenomicFeatures_1.12.1 grid_3.0.0
hwriter_1.3
[9] RCurl_1.95-4.1 RSQLite_0.11.3 rtracklayer_1.20.2
stats4_3.0.0
[13] tools_3.0.0 VariantAnnotation_1.6.5 XML_3.96-1.1
zlibbioc_1.6.0
More information about the Bioconductor
mailing list