[BioC] Please help: SNP Frequency Analysis Package in Bioconductor

Sean Davis sdavis2 at mail.nih.gov
Fri Mar 23 12:11:06 CET 2012


On Thu, Mar 22, 2012 at 10:39 PM, Vincent Carey
<stvjc at channing.harvard.edu> wrote:
> On Thu, Mar 22, 2012 at 5:36 PM, Javerjung Sandhu <jsandhu at bcgsc.ca> wrote:
>
>> Dear Sir or Madam,
>>
>> I am interested in using Bioconductor to analyze and compare some data
>> I’ve recently acquired and was wondering if you could help direct me to an
>> appropriate analysis package.
>>
>>
>> I have two sets of data, from two separate groups of AML patients.  One set
>> is of 96 patients, the other set is of 178 patients, and I have sequencing
>> data for both groups.  For a few genes of interest in the first group’s
>> dataset, it looked like there were a number of SNPs that were found with a
>> higher than normal frequency among the 96 patients. What I would like to do
>> is find out if the frequency of these SNPs is the same in the dataset of
>> 178 patients.  I just need a way of analyzing the frequency of SNPs in a
>> large set of sequences.
>>
>
> This question is not very clear.  If you have already done SNP calling with
> your sequences (using samtools or some other external resource) I suppose
> it would be typical to have results in VCF format.  This can be parsed
> using the VariantAnnotation package, and you can tabulate variants and do
> some kind of categorical analysis to compare populations downstream. Some of
> the relevant computations for variant tabulation on the basis of VCF are
> given in the cgdv17 experimental data package vignette.  There are no
> high-level functions that I know of for doing population comparisons, so you should
> involve an experienced statistical geneticist if possible.

I will second Vince's comment about involving a statistical
geneticist; it is really easy to do these types of analyses
incorrectly.  As for "how to do it" in R, you could take a look at:

http://www.genabel.org/tutorials/ABEL-tutorial

Outside R, you might take a look at PLINK.  Samtools also has
rudimentary association testing capabilities.
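
To make the "categorical analysis" idea concrete, here is a minimal
sketch in Python (standard library only) of comparing carrier counts for
a single SNP between the two cohorts with a two-sided Fisher exact test.
The carrier counts and the function name are made up for illustration;
only the cohort sizes (96 and 178) come from the question.  It ignores
genotype coding, population structure, and multiple testing, which is
exactly where a statistical geneticist should weigh in.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x carriers in the first cohort
        # when all margins of the table are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical carrier counts for one SNP (made up for illustration).
carriers_a, n_a = 12, 96
carriers_b, n_b = 9, 178

p_value = fisher_exact_two_sided(carriers_a, n_a - carriers_a,
                                 carriers_b, n_b - carriers_b)
print(f"carrier frequency: {carriers_a / n_a:.3f} vs {carriers_b / n_b:.3f}")
print(f"two-sided Fisher exact p-value: {p_value:.4f}")
```

The table rows are the two cohorts and the columns are carriers versus
non-carriers; repeating this per SNP would need a multiple-testing
correction on top.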

Sean

> If you have not done SNP calling, and your sequence data are in SAM or BAM
> format, you could use the pileup manipulations of Rsamtools to enumerate
> and tabulate variants.  Examples of such manipulations in the domain of
> transcript variants can be found in the vignette of the ggtut experimental
> data package.
>
>
>> Thanks very much,
>> Jung
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor


