[Bioc-devel] BamTallyParam argument 'which'

Thomas Sandmann sandmann.thomas at gene.com
Fri Feb 20 23:00:37 CET 2015


Hi Michael,

I noticed that when the tallyVariants function receives a 'which' arguments
(via BamTallyParam), that contains overlapping or duplicated regions,
duplicated rows are returned.

(See below for an example.)

It took me a little while to understand where I was picking duplicates.

Would it be useful to 'reduce' the 'which' GRanges/RangesList object by
default, e.g. before tallying variants, to make sure each base is only
tallied once ?

Best,
Thomas

library(VariantTools)

## 'which' is a set of non-overlapping regions
tally.param <- TallyVariantsParam(gmapR::TP53Genome(),
                                  high_base_quality = 23L,
                                  which = gmapR::TP53Which())
bams <- LungCancerLines::LungCancerBamFiles()
raw.variants <- tallyVariants(bams$H1993, tally.param)
any(duplicated( raw.variants ))  ## FALSE

## 'which' is a set of duplicated regions
tally.param <- TallyVariantsParam(
  gmapR::TP53Genome(),
  high_base_quality = 23L,
  which = c(
    gmapR::TP53Which(),
    gmapR::TP53Which()
    )
)
raw.variants <- tallyVariants(bams$H1993, tally.param)
any(duplicated( raw.variants )) ## TRUE

sort(raw.variants)[1:4]


### SessionInfo()

R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices
[6] utils     datasets  methods   base

other attached packages:
 [1] VariantTools_1.8.0       VariantAnnotation_1.12.9
 [3] Rsamtools_1.18.2         Biostrings_2.34.1
 [5] XVector_0.6.0            GenomicRanges_1.18.4
 [7] GenomeInfoDb_1.2.4       IRanges_2.0.1
 [9] S4Vectors_0.4.0          BiocGenerics_0.12.1
[11] BiocInstaller_1.16.1     roxygen2_4.1.0
[13] devtools_1.7.0

loaded via a namespace (and not attached):
 [1] AnnotationDbi_1.28.1    base64enc_0.1-2
 [3] BatchJobs_1.5           BBmisc_1.9
 [5] Biobase_2.26.0          BiocParallel_1.0.3
 [7] biomaRt_2.22.0          bitops_1.0-6
 [9] brew_1.0-6              BSgenome_1.34.1
[11] checkmate_1.5.1         codetools_0.2-10
[13] DBI_0.3.1               digest_0.6.8
[15] fail_1.2                foreach_1.4.2
[17] GenomicAlignments_1.2.1 GenomicFeatures_1.18.3
[19] gmapR_1.8.0             grid_3.1.2
[21] iterators_1.0.7         lattice_0.20-29
[23] LungCancerLines_0.3.1   Matrix_1.1-5
[25] Rcpp_0.11.4             RCurl_1.95-4.5
[27] RSQLite_1.0.0           rtracklayer_1.26.2
[29] sendmailR_1.2-1         stringr_0.6.2
[31] tools_3.1.2             XML_3.98-1.1
[33] zlibbioc_1.12.0

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list