[Bioc-devel] BamTallyParam argument 'which'

Michael Lawrence lawrence.michael at gene.com
Sat Feb 21 02:48:28 CET 2015


Hmm. I guess we could do at least two things:

1) Return the ID of the which range for each variant, like readVcf does
with its paramRangeID column in the rowData.

2) Do as you suggest, and reduce() the which.

Obviously, these address different use cases. The user can always achieve
#2 by just reduce()ing the which range first. #1 would be more difficult
for the user to implement.

If tallyVariants() did #2, then #1 would be painful to achieve, so if #1
sounds useful at all, then we might want to hold off on #2.

Thoughts?

Michael




On Fri, Feb 20, 2015 at 2:00 PM, Thomas Sandmann <sandmann.thomas at gene.com>
wrote:

> Hi Michael,
>
> I noticed that when the tallyVariants function receives a 'which'
> arguments (via BamTallyParam), that contains overlapping or duplicated
> regions, duplicated rows are returned.
>
> (See below for an example.)
>
> It took me a little while to understand where I was picking duplicates.
>
> Would it be useful to 'reduce' the 'which' GRanges/RangesList object by
> default, e.g. before tallying variants, to make sure each base is only
> tallied once ?
>
> Best,
> Thomas
>
> library(VariantTools)
>
> ## 'which' is a set of non-overlapping regions
> tally.param <- TallyVariantsParam(gmapR::TP53Genome(),
>                                   high_base_quality = 23L,
>                                   which = gmapR::TP53Which())
> bams <- LungCancerLines::LungCancerBamFiles()
> raw.variants <- tallyVariants(bams$H1993, tally.param)
> any(duplicated( raw.variants ))  ## FALSE
>
> ## 'which' is a set of duplicated regions
> tally.param <- TallyVariantsParam(
>   gmapR::TP53Genome(),
>   high_base_quality = 23L,
>   which = c(
>     gmapR::TP53Which(),
>     gmapR::TP53Which()
>     )
> )
> raw.variants <- tallyVariants(bams$H1993, tally.param)
> any(duplicated( raw.variants )) ## TRUE
>
> sort(raw.variants)[1:4]
>
>
> ### SessionInfo()
>
> R version 3.1.2 (2014-10-31)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats4    stats     graphics  grDevices
> [6] utils     datasets  methods   base
>
> other attached packages:
>  [1] VariantTools_1.8.0       VariantAnnotation_1.12.9
>  [3] Rsamtools_1.18.2         Biostrings_2.34.1
>  [5] XVector_0.6.0            GenomicRanges_1.18.4
>  [7] GenomeInfoDb_1.2.4       IRanges_2.0.1
>  [9] S4Vectors_0.4.0          BiocGenerics_0.12.1
> [11] BiocInstaller_1.16.1     roxygen2_4.1.0
> [13] devtools_1.7.0
>
> loaded via a namespace (and not attached):
>  [1] AnnotationDbi_1.28.1    base64enc_0.1-2
>  [3] BatchJobs_1.5           BBmisc_1.9
>  [5] Biobase_2.26.0          BiocParallel_1.0.3
>  [7] biomaRt_2.22.0          bitops_1.0-6
>  [9] brew_1.0-6              BSgenome_1.34.1
> [11] checkmate_1.5.1         codetools_0.2-10
> [13] DBI_0.3.1               digest_0.6.8
> [15] fail_1.2                foreach_1.4.2
> [17] GenomicAlignments_1.2.1 GenomicFeatures_1.18.3
> [19] gmapR_1.8.0             grid_3.1.2
> [21] iterators_1.0.7         lattice_0.20-29
> [23] LungCancerLines_0.3.1   Matrix_1.1-5
> [25] Rcpp_0.11.4             RCurl_1.95-4.5
> [27] RSQLite_1.0.0           rtracklayer_1.26.2
> [29] sendmailR_1.2-1         stringr_0.6.2
> [31] tools_3.1.2             XML_3.98-1.1
> [33] zlibbioc_1.12.0
>
>

	[[alternative HTML version deleted]]



More information about the Bioc-devel mailing list