[BioC] Filtering BAM files by start position for VariantTools
Valerie Obenchain
vobencha at fhcrc.org
Thu Jul 11 19:22:12 CEST 2013
Sorry, the file here should be the filtered 'dest', not 'fl':
> tally <- tallyVariants(fl, param)
>
>
Valerie
> Valerie
>
>
> On 07/09/2013 02:06 PM, Taylor, Sean D wrote:
>> I am trying to read a specific set of records from a bam file for use
>> in the VariantTools package. I'm trying to construct a which argument
>> (a GRanges object) that will pull in a set of records from all reads
>> that only start at a specified position. (i.e. all reads that start at
>> position 100). So far I have only been able to specify reads that
>> overlap position 100, but have not been able to find a way to define
>> the start site.
>>
>> #Example code:
>>> bams <- LungCancerLines::LungCancerBamFiles()
>>> bam <- bams$H1993
>>> which<-GRanges(seqnames=c('TP53'), IRanges(1110426, width=1), '+')
>>> tally.param <- VariantTallyParam(gmapR::TP53Genome(),
>> + readlen = 100L,
>> + high_base_quality = 23L,
>> + which = which)
>>> raw.variants <- tallyVariants(bam, tally.param)
>>
>> This code shows all the variants at position 1110426, but not all the
>> variants from the reads that start at position 1110426.
>>
>> Ultimately, I am trying to do this for all start positions in my data
>> set, so I would want something that looks like this pseudocode:
>>> raw.variants<-lapply (start(bam), function (x){
>> which<-GRanges(seqnames=c('chrM'), '+', start=x)
>> tally.param<-VariantTallyParam(gmap, readlen=100L, which=which)
>> tallyVariants(bamfile, tally.param)
>> })
>>
>> Thanks,
>> Sean
>>
>>> sessionInfo()
>> R version 3.0.1 (2013-05-16)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> LC_TIME=en_US.UTF-8
>> [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
>> LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C
>> [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
>> LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] grid parallel stats graphics grDevices utils
>> datasets methods base
>>
>> other attached packages:
>> [1] RSQLite_0.11.4 DBI_0.2-7 BiocInstaller_1.10.2
>> [4] LungCancerLines_0.0.8 GenomicFeatures_1.12.1
>> AnnotationDbi_1.22.5
>> [7] Biobase_2.20.0 gmapR_1.2.0 latticeExtra_0.6-24
>> [10] lattice_0.20-15 RColorBrewer_1.0-5 genoPlotR_0.8
>> [13] ade4_1.5-2 VariantTools_1.2.2
>> VariantAnnotation_1.6.6
>> [16] Rsamtools_1.12.3 Biostrings_2.28.0 GenomicRanges_1.12.4
>> [19] IRanges_1.18.1 BiocGenerics_0.6.0
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.16.0 bitops_1.0-5
>> [3] BSgenome_1.28.0
>> BSgenome.Hsapiens.UCSC.hg19_1.3.19
>> [5] graph_1.38.2 Matrix_1.0-12
>> [7] org.Hs.eg.db_2.9.0 RBGL_1.36.2
>> [9] RCurl_1.95-4.1 rtracklayer_1.20.2
>> [11] stats4_3.0.1 tools_3.0.1
>> [13] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2 XML_3.96-1.1
>> [15] zlibbioc_1.6.0
>>
>>
>> [[alternative HTML version deleted]]
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list