[R] Character SNP data to binary MAF data
Barry Rowlingson
b.rowlingson at lancaster.ac.uk
Thu Jan 29 09:28:44 CET 2009
2009/1/29 Hadassa Brunschwig <hadassa.brunschwig at mail.huji.ac.il>:
> Hi
>
> An example is as follows. Consider the character 3x6 matrix:
>
> a A a T A t
> G g t T T t
> A a C C c c
>
> For each row I would like to identify the most frequent letter and
> assign a 1 to it and 0
> to the less frequent character. That is, in row 1 the most frequent
> letter is A (I do not differentiate between capital and non-capital
> letters), in row 2 T and in row 3 C. After the binary conversion
> the resulting matrix would look like that:
>
> 1 1 1 0 1 0
> 0 0 1 1 1 1
> 0 0 1 1 1 1

What if there's a tie for most frequent? Do you want 1s for all the
most frequent characters? Or choose one randomly? Or zeroes?

Examples: what do the following become:

 A A C C T G
 A A C C T T
 A A A A A A

Or are such cases not possible?
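
To see what is at stake in those three rows, here is a small sketch that counts the letters in each one and lists every letter reaching the maximum count. The names `rows` and `top_letters` are just illustrative, and this only reports ties; it does not decide what to do about them.

```r
# The three edge-case rows from the questions above
rows <- list(c("A", "A", "C", "C", "T", "G"),
             c("A", "A", "C", "C", "T", "T"),
             c("A", "A", "A", "A", "A", "A"))

# For each row, count letters case-insensitively and keep every
# letter whose count equals the row maximum
top_letters <- lapply(rows, function(r) {
  counts <- table(tolower(r))
  names(counts)[as.vector(counts) == max(counts)]
})

top_letters
```

The first row has two letters tied for most frequent, the second has three, and the third has only one letter at all, so a rule is needed for each of those situations.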

Some hints for you to work on this yourself:

 help('table') - the table function counts how often each value occurs in a vector
 help('tolower') - for changing upper case to lower case
 help('apply') - for applying a function to each row (or column) of a matrix

Then check out any basic R tutorial on subscripting and replacement,
and you may need to work out how to loop over things with 'for'. You
should be able to make a working solution in a dozen or so lines of R.
Don't be surprised if some R guru on here does it in 2 or 3 lines of
dense, obfuscated stuff!
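
For what it's worth, here is one rough sketch of how table, tolower and apply might fit together on the 3x6 example from the question. The names `snp`, `to_binary` and `binary` are made up for illustration, and the tie handling is purely an assumption: which.max() takes the first maximum, so a tie goes to the alphabetically first letter, which may not be what is wanted.

```r
# The 3x6 character matrix from the original question
snp <- matrix(c("a", "A", "a", "T", "A", "t",
                "G", "g", "t", "T", "T", "t",
                "A", "a", "C", "C", "c", "c"),
              nrow = 3, byrow = TRUE)

# Illustrative helper: 1 where an entry matches the row's most
# frequent letter (ignoring case), 0 otherwise.
# Assumption: on a tie, which.max() picks the first letter in
# table()'s sorted order.
to_binary <- function(row) {
  row <- tolower(row)
  counts <- table(row)
  major <- names(counts)[which.max(counts)]
  as.integer(row == major)
}

# apply() over rows returns one column per row, so transpose back
binary <- t(apply(snp, 1, to_binary))
binary
```

On the example matrix this should reproduce the 0/1 matrix given in the question. A row with a tie would silently get the alphabetically first letter as its "1", so check for ties first if that matters.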
Barry