[BioC] Newcomer's question on subsetting IRangesList
Tomas Bjorklund [guest]
guest at bioconductor.org
Mon May 12 19:32:47 CEST 2014
Hi,
I'm new to R and Bioconductor, so this is probably a trivial question, but I cannot find a solution for it anywhere.
In my workflow, I currently use a temporary version of vmatchPattern (found on the net) that allows for indels. This works great, but it outputs an IRangesList object that I have trouble subsetting. Here is an example of the output:
IRangesList of length 96979
[[1]]
IRanges of length 2
    start end width
[1]     1   7     7
[2]   278 283     6
[[2]]
IRanges of length 2
    start end width
[1]     1   7     7
[2]   281 286     6
[[3]]
IRanges of length 2
    start end width
[1]     1   7     7
[2]   256 261     6
...
<96976 more elements>
In this case, the same sequence is found twice in each read. What I would like to extract is the "end" of the first occurrence of the string in each read, i.e., 7 in the cases above.
Say that matchList is the IRangesList object. If I use end(matchList), I get a list with the ends of both the first and the second occurrence of the string. Every way I have tried to subset it gives errors. I can get it to work by going through as.data.frame, but this is very slow when you have millions of matches, as in my case.
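To make the question concrete, here is a minimal sketch of the situation. It assumes a hand-built toy matchList holding the three elements shown above (not the real output of the modified vmatchPattern), and the name firstEnds is only illustrative. The plain per-element loop at the end shows the values being asked for on the toy data; it is not claimed to be fast enough for millions of matches.

```r
library(IRanges)

# Toy stand-in for the real matchList: three reads, two matches each
# (values copied from the printed output above).
matchList <- IRangesList(
  IRanges(start = c(1, 278), end = c(7, 283)),
  IRanges(start = c(1, 281), end = c(7, 286)),
  IRanges(start = c(1, 256), end = c(7, 261))
)

# end() returns one integer vector per read, holding the ends of BOTH matches.
allEnds <- end(matchList)

# Desired result: only the end of the first match in each read.
# Plain loop over an ordinary list; assumes every read has at least one match.
firstEnds <- vapply(as.list(allEnds), function(x) x[1L], integer(1))
firstEnds
# 7 7 7
```

So the goal is an integer vector with one value per read, obtained without converting the whole object to a data frame.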
I hope that this was reasonably clear.
Thank you all for your help
All the best
Tomas
-- output of sessionInfo():
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin10.8.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] xlsx_0.5.5 muscle_3.8.31-2 Rlibstree_0.3-2 xlsxjars_0.6.0 rJava_0.9-6 ShortRead_1.22.0
[7] GenomicAlignments_1.0.1 BSgenome_1.32.0 Rsamtools_1.16.0 GenomicRanges_1.16.3 GenomeInfoDb_1.0.2 Biostrings_2.32.0
[13] XVector_0.4.0 IRanges_1.22.6 BiocParallel_0.6.0 BiocGenerics_0.10.0
loaded via a namespace (and not attached):
[1] BatchJobs_1.2 BBmisc_1.6 Biobase_2.24.0 bitops_1.0-6 brew_1.0-6 codetools_0.2-8 DBI_0.2-7 digest_0.6.4
[9] fail_1.2 foreach_1.4.2 grid_3.1.0 hwriter_1.3 iterators_1.0.7 lattice_0.20-29 latticeExtra_0.6-26 plyr_1.8.1
[17] RColorBrewer_1.0-5 Rcpp_0.11.1 RSQLite_0.11.4 sendmailR_1.1-2 stats4_3.1.0 stringr_0.6.2 tools_3.1.0 zlibbioc_1.10.0
--
Sent via the guest posting facility at bioconductor.org.