On Thu, Nov 4, 2010 at 1:32 AM, Paul Leo <p.leo@uq.edu.au> wrote:

> Hi
>
> What I want to do is calculate the minor allele frequency of a SNP/
> small indel from  bam files
>
>
> I am not sure what is the PREFERRED way to do this
>
>
samtools mpileup

If followed by bcftools, it produces VCF4 files and includes estimates of
allele frequencies from a set of bam files.  Be sure that you are using the
latest version of samtools (or even svn checkout, which includes a
per-sample depth and strand-bias estimate).

Sean


> I have been running samtools and just do pileup in those regions of
> interest... and read the pileup file with Rsamtools...
>
>
> With scanBam on a single position I get back all the reads that span
> that location. So I've tried using the start position and the cigar and
> the DNAStringSet in the seq slot to get the alleles at the position I
> want (works ok)
>
> Is there a function written that already does that? seems like something
> of general utility , the only complication is the cigar
>
> Would also be nice to have a existing function with writes the  output
> of scanBam back into sam format (assuming necessary attributes are
> available).
>
> ALSO perhaps I am being daft:
> but in scanBamFlag
>
> isValidVendorRead: A logical(1) indicating whether invalid (FALSE),
>          valid (TRUE), or any (NA) read should be returned. A 'valid'
>          read is one flagged by the vendor as passing quality control
>          criteria.
>
> so isValidVendorRead=TRUE would return VALID vendor reads?
> at the moment it is returning the INVALID vendor reads
> isValidVendorRead=FALSE returns the valid ones. (Solid reads)
>
> Thanks
> Paul
>
> > sessionInfo()
> R version 2.13.0 Under development (unstable) (2010-11-03 r53517)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
> LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
> LC_MONETARY=C
>  [6] LC_MESSAGES=en_AU.UTF-8    LC_PAPER=en_AU.UTF-8       LC_NAME=C
> LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods
> base
>
> other attached packages:
> [1] ShortRead_1.9.2       lattice_0.19-13       Biobase_2.11.2
> Rsamtools_1.3.3       Biostrings_2.19.0     GenomicFeatures_1.3.5
> [7] GenomicRanges_1.3.2   IRanges_1.9.6
>
> loaded via a namespace (and not attached):
>  [1] biomaRt_2.7.0      BSgenome_1.19.0    DBI_0.2-5
> grid_2.13.0        hwriter_1.2        RCurl_1.4-4
> RSQLite_0.9-2
>  [8] rtracklayer_1.11.3 tcltk_2.13.0       tools_2.13.0
> XML_3.2-0
> >
>
>
>
>
>
>
>
>
>
>
>        [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing@r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>

	[[alternative HTML version deleted]]

