[Bioc-sig-seq] Rsamtools -getting allele frequencies

Martin Morgan mtmorgan at fhcrc.org
Thu Nov 4 13:56:56 CET 2010


On 11/03/2010 10:32 PM, Paul Leo wrote:
> Hi
> 
> What I want to do is calculate the minor allele frequency of a SNP/
> small indel from  bam files 
> 
> 
> I am not sure what is the PREFERRED way to do this
>  
> I have been running samtools and just do pileup in those regions of
> interest... and read the pileup file with Rsamtools...
> 
> 
> With scanBam on a single position I get back all the reads that span
> that location. So I've tried using the start position and the cigar and
> the DNAStringSet in the seq slot to get the alleles at the position I
> want (works ok)
> 
> Is there a function written that already does that? seems like something
> of general utility , the only complication is the cigar

The GenomicRanges::GappedAlignments class has some facilities for
manipulating cigars; the idea would be to refine the queries with
GappedAlignments, and then to separately extract the relevant sequence;
this might be no better than what you are doing currently.

> 
> Would also be nice to have a existing function with writes the  output
> of scanBam back into sam format (assuming necessary attributes are
> available).

noted.

> 
> ALSO perhaps I am being daft:
> but in scanBamFlag
> 
> isValidVendorRead: A logical(1) indicating whether invalid (FALSE),
>           valid (TRUE), or any (NA) read should be returned. A 'valid'
>           read is one flagged by the vendor as passing quality control
>           criteria.
> 
> so isValidVendorRead=TRUE would return VALID vendor reads?
> at the moment it is returning the INVALID vendor reads
> isValidVendorRead=FALSE returns the valid ones. (Solid reads)

yes, the logic here is reversed. Fixed in devel, in  1.2.1  (release) /
1.3.4 (devel) when they appear.

Thanks,

Martin

> 
> Thanks
> Paul
> 
>> sessionInfo()
> R version 2.13.0 Under development (unstable) (2010-11-03 r53517)
> Platform: x86_64-unknown-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
> LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
> LC_MONETARY=C             
>  [6] LC_MESSAGES=en_AU.UTF-8    LC_PAPER=en_AU.UTF-8       LC_NAME=C
> LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C       
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> base     
> 
> other attached packages:
> [1] ShortRead_1.9.2       lattice_0.19-13       Biobase_2.11.2
> Rsamtools_1.3.3       Biostrings_2.19.0     GenomicFeatures_1.3.5
> [7] GenomicRanges_1.3.2   IRanges_1.9.6        
> 
> loaded via a namespace (and not attached):
>  [1] biomaRt_2.7.0      BSgenome_1.19.0    DBI_0.2-5
> grid_2.13.0        hwriter_1.2        RCurl_1.4-4
> RSQLite_0.9-2     
>  [8] rtracklayer_1.11.3 tcltk_2.13.0       tools_2.13.0
> XML_3.2-0         
>>
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioc-sig-sequencing mailing list