[R] Character SNP data to binary MAF data

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Thu Jan 29 09:28:44 CET 2009


2009/1/29 Hadassa Brunschwig <hadassa.brunschwig at mail.huji.ac.il>:
> Hi
>
> An example is as follows. Consider the character 3x6 matrix:
>
> a A a T A t
> G g t T T t
> A a C C c c
>
> For each row I would like to identify the most frequent letter and
> assign a 1 to it and a 0 to the less frequent letter. That is, in row 1
> the most frequent letter is A (I do not differentiate between capital
> and non-capital letters), in row 2 T and in row 3 C. After the binary
> conversion the resulting matrix would look like this:
>
> 1 1 1 0 1 0
> 0 0 1 1 1 1
> 0 0 1 1 1 1

 What if there's a tie for most frequent? Do you want 1s for all the
most frequent characters? Or choose one randomly? Or zeroes?

 Examples: what do the following become:

 A A C C T G
 A A C C T T
 A A A A A A

Or are such cases not possible?
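
 To make the tie question concrete, here is a small sketch that counts
the letters in the three rows above, ignoring case. The names rows,
counts and top are only illustrative, not anything from the original
post.

```r
# The three rows from the tie question, as character vectors
rows <- list(
  c("A", "A", "C", "C", "T", "G"),
  c("A", "A", "C", "C", "T", "T"),
  c("A", "A", "A", "A", "A", "A")
)

# Count each letter per row, ignoring case
counts <- lapply(rows, function(r) table(tolower(r)))

# Letters that share the maximum count in each row
top <- lapply(counts, function(tb) names(tb)[tb == max(tb)])
print(top)
```

 The first row has two letters tied for most frequent, the second has
three, and the third has only one letter at all, so a rule for each
case is needed before the conversion is well defined.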

 Some hints for you to work on this yourself:

   help('table') - the table function works out counts of elements of vectors
   help('tolower') - for changing upper to lower case
   help('apply') - for working on rows of matrices (a data frame is coerced to a matrix first)
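
 As a rough illustration of those three functions on the matrix from
the question (the variable names m, low, row1 and most are my own
choices):

```r
# The 3x6 character matrix from the question
m <- matrix(c("a", "A", "a", "T", "A", "t",
              "G", "g", "t", "T", "T", "t",
              "A", "a", "C", "C", "c", "c"),
            nrow = 3, byrow = TRUE)

# tolower works element-wise and keeps the matrix dimensions
low <- tolower(m)

# table counts the distinct values in a vector, here row 1
row1 <- table(low[1, ])
print(row1)

# apply with MARGIN = 1 calls the function once per row;
# which.max picks the first entry with the largest count
most <- apply(low, 1, function(r) names(which.max(table(r))))
print(most)
```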

 then check out any basic R tutorial on subscripting and replacement,
and you may need to work out how to loop over things with 'for'. You
should be able to make a working solution in a dozen or so lines of R.
Don't be surprised if some R guru on here does it in 2 or 3 lines of
dense, obfuscated stuff!
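
 For reference, one possible sketch along those lines, using a plain
'for' loop. The function name snp_to_binary is hypothetical, and the
tie handling is an assumption on my part: which.max returns the first
maximum, so a tie goes to whichever tied letter comes first in table()
order. Change that if ties should be treated differently.

```r
# Hypothetical helper: 1 for the most frequent letter in each row, 0 otherwise.
# Assumption: ties go to the first tied letter in table() order,
# because which.max returns the first maximum.
snp_to_binary <- function(m) {
  low <- tolower(m)                       # ignore case
  out <- matrix(0L, nrow = nrow(low), ncol = ncol(low))
  for (i in seq_len(nrow(low))) {
    major <- names(which.max(table(low[i, ])))
    out[i, ] <- as.integer(low[i, ] == major)
  }
  out
}

# The 3x6 character matrix from the question
m <- matrix(c("a", "A", "a", "T", "A", "t",
              "G", "g", "t", "T", "T", "t",
              "A", "a", "C", "C", "c", "c"),
            nrow = 3, byrow = TRUE)

res <- snp_to_binary(m)
print(res)
```

 On the example matrix this gives the same 0/1 rows as in the question.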

Barry



