[BioC] Strange error using the \"narrow\" function for a ShortReadQ object

Tomas Bjorklund [guest] guest at bioconductor.org
Tue Apr 29 19:14:58 CEST 2014


I have an issue where I try to do something rather simple but run into a strange issue. 

The application is rather simple. I have a "Barcode" sequence in my amplicon sequences using Illumina MiSeq, that follows a certain pattern.

What I want to achieve is to identify the correct reads containing a barcode and then crop the reads to cover only the barcode and then save it back to a FASTQ file retaining quality and ID information of each barcode. 

Everything works well in this code except for the narrow command.
 

Here is my code:

rawReads <- readFastq("ChAT12-13_GGACTCCT-TATCCTCT_L001_R1_001_50k.fastq")     
rawReads <- rawReads[nFilter()(rawReads)] #Removes reads containing N's
rawReads <- rawReads[width(rawReads) > 55] #drops reads shorter than 55 nt

refBC <- DNAString("TCAGCVHDBVHDBVHDBVHDBVHDBAATTA")
hits <- vcountPattern(refBC, sread(rawReads), max.mismatch=0,fixed=FALSE, with.indels=FALSE)
selected <- rawReads[as.logical(hits)]

trimList <- vmatchPattern(refBC, sread(selected), max.mismatch=0,fixed=FALSE)

selected <- narrow(selected, start = unlist(startIndex(trimList)), end = unlist(endIndex(trimList)))


>From the last row I get the message:

> selected <- narrow(selected, start = unlist(startIndex(trimList)), end = unlist(endIndex(trimList)))
Error in solveUserSEW(width(x), start = start, end = end, width = width) : 
  'start', 'end' or 'width' is longer than 'refwidths'

I cannot figure out how this is possible. The “endIndex" should never be possible to be larger then the length of the sequence that is analyzed in the “vmatchPattern", correct? Am I missing something fundamental?


 -- output of sessionInfo(): 

R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] ShortRead_1.22.0        GenomicAlignments_1.0.0 BSgenome_1.32.0         Rsamtools_1.16.0        GenomicRanges_1.16.2   
 [6] GenomeInfoDb_1.0.2      Biostrings_2.32.0       XVector_0.4.0           IRanges_1.22.3          BiocParallel_0.6.0     
[11] BiocGenerics_0.10.0    

loaded via a namespace (and not attached):
 [1] BatchJobs_1.2       BBmisc_1.6          Biobase_2.24.0      bitops_1.0-6        brew_1.0-6          codetools_0.2-8    
 [7] DBI_0.2-7           digest_0.6.4        fail_1.2            foreach_1.4.2       grid_3.1.0          hwriter_1.3        
[13] iterators_1.0.7     lattice_0.20-29     latticeExtra_0.6-26 plyr_1.8.1          RColorBrewer_1.0-5  Rcpp_0.11.1        
[19] RSQLite_0.11.4      sendmailR_1.1-2     stats4_3.1.0        stringr_0.6.2       tools_3.1.0         zlibbioc_1.10.0   

--
Sent via the guest posting facility at bioconductor.org.



More information about the Bioconductor mailing list