[R] Character SNP data to binary MAF data
Patrick Aboyoun
paboyoun at fhcrc.org
Thu Jan 29 09:41:09 CET 2009
Hadassa,
You may want to check out the snpMatrix package in Bioconductor
http://bioconductor.org/packages/2.3/bioc/html/snpMatrix.html
http://bioconductor.org/packages/2.4/bioc/html/snpMatrix.html
It contains classes that manage this type of information and should
minimize your coding effort.
Patrick
Quoting Thomas Lumley <tlumley at u.washington.edu>:
>
> The first step is to convert your data to all uppercase with toupper().
>
> Then it depends on how tidy the data are: are there missing data, are
> some SNPs monomorphic in your sample, etc.
>
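A quick way to see how tidy the data are might look like the sketch below. The matrix name `snps`, the injected NA, and the all-G third row are assumptions made up for illustration; they are not part of the original data.

```r
# Hypothetical 3x6 character matrix; rows are SNPs, columns are samples.
snps <- toupper(matrix(c("a", "A", "a", "T", "A", "t",
                         "G", "g", "t", "T", "T", "t",
                         "g", "G", "G", "g", "G", "G"),
                       nrow = 3, byrow = TRUE))
snps[2, 5] <- NA   # pretend one genotype is missing

# Number of missing calls per SNP (row).
n_missing <- rowSums(is.na(snps))

# Number of distinct non-missing alleles per SNP.
n_alleles <- apply(snps, 1, function(r) length(unique(r[!is.na(r)])))

# A SNP with a single observed allele is monomorphic in this sample.
monomorphic <- n_alleles == 1
```

Rows with `n_missing > 0` or `monomorphic == TRUE` are the ones that need special handling before any majority-based coding.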
> If there are no missing data you can use
>
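> N <- ncol(the_data)
> halfN <- N/2
>
> # Code one SNP (row): 1 for the major allele, 0 otherwise.
> maf_one_row <- function(arow) {
>   rval <- numeric(N)
>   if (sum(i <- arow == "A") > halfN) {
>     rval[i] <- 1
>   } else if (sum(i <- arow == "C") > halfN) {
>     rval[i] <- 1
>   } else if (sum(i <- arow == "T") > halfN) {
>     rval[i] <- 1
>   } else if (sum(i <- arow == "G") > halfN) {
>     rval[i] <- 1
>   }
>   rval
> }
>
> # apply() returns one column per SNP, so transpose to restore the layout.
> t(apply(the_data, 1, maf_one_row))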
>
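As a check, here is a compact sketch of the same majority rule run on the 3x6 example from this thread. The function name `code_major` and the loop over the four bases are my own choices for illustration, not the code quoted above; it assumes no missing data.

```r
# The example matrix from the thread, converted to uppercase.
the_data <- toupper(matrix(c("a", "A", "a", "T", "A", "t",
                             "G", "g", "t", "T", "T", "t",
                             "A", "a", "C", "C", "c", "c"),
                           nrow = 3, byrow = TRUE))
N <- ncol(the_data)

# Return 1 where the row carries the allele seen in more than half
# of the samples, 0 elsewhere.
code_major <- function(arow) {
  rval <- numeric(N)
  for (allele in c("A", "C", "G", "T")) {
    i <- arow == allele
    if (sum(i) > N / 2) {
      rval[i] <- 1
      break
    }
  }
  rval
}

# apply() over rows returns an N x nrow matrix, hence the t().
result <- t(apply(the_data, 1, code_major))
```

Note that with a strict "more than half" test, a row split exactly 50/50 is left as all zeros, so ties need a separate decision.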
> You could also use table() to find the two alleles, but you have to
> make sure that the code still works when there is only one allele.
>
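A table()-based version could be sketched as follows. The function name `code_major_table`, the test matrix `snps`, and the choice to return NA for missing calls are assumptions for illustration, not something specified in this thread.

```r
# Code one SNP (row) using table(); handles monomorphic rows and NAs.
code_major_table <- function(arow) {
  counts <- table(arow)              # NAs are dropped by default
  if (length(counts) == 0) {
    return(rep(NA_real_, length(arow)))   # every call is missing
  }
  # which.max() picks the first maximum, so ties go to the allele
  # that sorts first.
  major <- names(counts)[which.max(counts)]
  as.numeric(arow == major)          # missing calls stay NA
}

# Hypothetical data: row 2 has a missing call, row 3 is monomorphic.
snps <- toupper(matrix(c("a", "A", "a", "T", "A", "t",
                         "G", "g", "t", "T", "T", "t",
                         "g", "G", "G", "g", "G", "G"),
                       nrow = 3, byrow = TRUE))
snps[2, 5] <- NA

coded <- t(apply(snps, 1, code_major_table))
```

A monomorphic row comes out as all 1s here, since its only allele is trivially the most frequent one.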
> -thomas
>
> On Thu, 29 Jan 2009, Hadassa Brunschwig wrote:
>
>> Hi
>>
>> An example is as follows. Consider the character 3x6 matrix:
>>
>> a A a T A t
>> G g t T T t
>> A a C C c c
>>
>> For each row I would like to identify the most frequent letter and
>> assign a 1 to it and a 0 to the less frequent letter. That is, in row 1
>> the most frequent letter is A (I do not differentiate between upper-
>> and lowercase letters), in row 2 T, and in row 3 C. After the binary
>> conversion the resulting matrix would look like this:
>>
>> 1 1 1 0 1 0
>> 0 0 1 1 1 1
>> 0 0 1 1 1 1
>>
>> Any suggestions on how to do that? (I am sure I am not the first
>> one to try this.)
>>
>> Thanks
>> Hadassa
>>
>>
>> On Thu, Jan 29, 2009 at 1:50 AM, Jorge Ivan Velez
>> <jorgeivanvelez at gmail.com> wrote:
>>>
>>> Hi Hadassa,
>>> Do you have a sample of your data and the output you want? It might be
>>> useful for us in order to provide any help to you.
>>> Regards,
>>>
>>> Jorge
>>>
>>>
>>> On Wed, Jan 28, 2009 at 8:36 AM, Hadassa Brunschwig
>>> <hadassa.brunschwig at mail.huji.ac.il> wrote:
>>>>
>>>> Hi
>>>>
>>>> I am sure there is a function out there already but I couldn't find it.
>>>> I have SNP data, that is, a matrix which contains in each row two
>>>> characters (they are different in each row) and I would like to
>>>> convert this matrix to a binary one according to the minor allele
>>>> frequency. For non-geneticists: I want to have a binary matrix
>>>> for which in each row the 0 stands for the less frequent character
>>>> and 1 for the more frequent character.
>>>>
>>>> Thanks for any suggestions.
>>>> Hadassa
>>>>
>>>> --
>>>> Hadassa Brunschwig
>>>> PhD Student
>>>> Department of Statistics
>>>> The Hebrew University of Jerusalem
>>>> http://www.stat.huji.ac.il
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>
>>
>>
>
> Thomas Lumley                 Assoc. Professor, Biostatistics
> tlumley at u.washington.edu   University of Washington, Seattle
>