[BioC] gmapR problem using a GmapGenome package with gsnap()

Robert Castelo robert.castelo at upf.edu
Wed May 15 17:59:16 CEST 2013


hi,

using the gmapR package i built the GmapGenome package from the 
Bioconductor hg19 human genome package as follows:

library(gmapR)
library(BSgenome.Hsapiens.UCSC.hg19)
gmapGenomePath <- file.path("./HsapGenome")
gmapGenomeDirectory <- GmapGenomeDirectory(gmapGenomePath, create=TRUE)
gmapGenome <- GmapGenome(genome=Hsapiens, directory=gmapGenomeDirectory, 
name="hg19", create=TRUE)

makeGmapGenomePackage(gmapGenome=gmapGenome,
                       version="1.0.0",
                       maintainer="<robert.castelo at upf.edu>",
                       author="Robert Castelo",
                       destDir=".",
                       license="Artistic-2.0",
                       pkgName="GmapGenome.Hsapiens.UCSC.hg19")

afterwards, i bundled the package from the shell with:

$ R CMD build GmapGenome.Hsapiens.UCSC.hg19

and i install it with

$ R CMD INSTALL GmapGenome.Hsapiens.UCSC.hg19_1.0.0.tar.gz

when i try to use the package in a call to gsnap() it seems that there 
is no way to pass the *name* of the GmapGenome object to the 
command-line that works under the hoods, i.e.:

library(GmapGenome.Hsapiens.UCSC.hg19)

gsnapParam <- GsnapParam(genome=GmapGenome.Hsapiens.UCSC.hg19, 
unique_only=TRUE, nthreads=8)

gsnapOutput <- gsnap(input_a=fastqs[1], input_b=NULL, params=gsnapParam)
Error in .system_gsnap(commandLine("gsnap")) :
   Execution of gsnap failed, command-line: 
'/opt/R/R-3.0.0/lib64/R/aprilLibrary/gmapR/usr/bin/gsnap 
--db=GmapGenome.Hsapiens.UCSC.hg19 
--dir=/opt/R/R-3.0.0/lib64/R/aprilLibrary/GmapGenome.Hsapiens.UCSC.hg19/extdata 
--nthreads=8 --npaths=1 --quiet-if-excessive --nofails
--format=sam f.fastq > f.sam 2>&1'; last output line: ''

the error is not very explanatory, but basically the command-line 
argument "--db=GmapGenome.Hsapiens.UCSC.hg19" should have been 
"--db=hg19". if i proceed that way in command-line, then it works.

i've looked at the arguments of gsnap() and GsnapParam() but found no 
way to specify the genome object *name* that it would go to the "--db" 
parameter. i guess it should be a matter for the function GsnapParam() 
to fetch the prefix of the files in

list.files(system.file("extdata", package="GmapGenome.Hsapiens.UCSC.hg19"))
  [1] "hg19.chromosome"          "hg19.chromosome.iit" 
"hg19.chrsubset"           "hg19.contig"
  [5] "hg19.contig.iit"          "hg19.genomecomp" 
"hg19.ref12153gammaptrs"   "hg19.ref12153offsetscomp"
  [9] "hg19.ref12153positions"   "hg19.version"

and plug it into the command-line call.

by the way, not related but another but i incidentally found is then one 
tries to pass one *single* end read gzipped fastq file.

library(LungCancerLines)
fastqs <- LunCancerFastqFiles()

gsnapOutput <- gsnap(input_a=fastqs[1], input_b=NULL, params=gsnapParam)
Error in if (file_ext(input_a) == "gz" && file_ext(input_b) == "gz") { :
   missing value where TRUE/FALSE needed


cheers,
robert.
ps: sessionInfo()
R version 3.0.0 (2013-04-03)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C 
LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
  [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8    LC_PAPER=C 
             LC_NAME=C
  [9] LC_ADDRESS=C              LC_TELEPHONE=C 
LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods 
   base

other attached packages:
  [1] GmapGenome.Hsapiens.UCSC.hg19_1.0.0 LungCancerLines_0.0.8 
       BSgenome.Hsapiens.UCSC.hg19_1.3.19
  [4] BSgenome_1.28.0                     gmapR_1.2.0 
       ShortRead_1.18.0
  [7] latticeExtra_0.6-24                 RColorBrewer_1.0-5 
       Rsamtools_1.12.3
[10] lattice_0.20-15                     Biostrings_2.28.0 
      GenomicRanges_1.12.3
[13] IRanges_1.18.1                      BiocGenerics_0.6.0 
      vimcom_0.9-8
[16] setwidth_1.0-3                      colorout_1.0-0

loaded via a namespace (and not attached):
  [1] AnnotationDbi_1.22.5    Biobase_2.20.0          biomaRt_2.16.0 
       bitops_1.0-5
  [5] DBI_0.2-7               GenomicFeatures_1.12.1  grid_3.0.0 
       hwriter_1.3
  [9] RCurl_1.95-4.1          RSQLite_0.11.3          rtracklayer_1.20.2 
      stats4_3.0.0
[13] tools_3.0.0             VariantAnnotation_1.6.5 XML_3.96-1.1 
      zlibbioc_1.6.0



More information about the Bioconductor mailing list