[R] Counting occurrences of a letter by a factor
Brian Diggs
diggsb at ohsu.edu
Fri Sep 10 22:19:06 CEST 2010
On 9/10/2010 12:40 PM, Davis, Brian wrote:
> I'm trying to find a more elegant way of doing this. What I'm trying
> to accomplish is to count the frequency of letters (major / minor
> alleles) in a string grouped by the factor levels in another column
> of my data frame.
>
> Ex.
>> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
>> colnames(DF)<-c("X", "Y")
>> DF
>      X    Y
> 1   CC    L
> 2   CC    U
> 3 <NA>    L
> 4   CG    U
> 5   GG    L
> 6   GC <NA>
>
> I have an ugly solution, which works if you know the factor levels of Y in advance.
>
>> ans<-rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), ""))),
> + table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'U', 1]), ""))))
>> rownames(ans)<-c("L", "U")
>> ans
>   C G
> L 2 2
> U 3 1
>
>
> I've played with table, xtabs, tabulate, aggregate, tapply, etc., but
> haven't found a combination that gives a more general solution to
> this problem.
>
> Any ideas?
>
> Brian
You are almost there. The "plyr" package gets you the rest of the way.
You already have something that will, for a group of cases with the
same "Y" value, tabulate the "X" values the way you want. ddply will
split the dataframe up by "Y" values and run that on each part.
library("plyr")
tab <- ddply(DF, .(Y),
             function(x) {table(unlist(strsplit(as.character(x$X), "")))})
tab
#      Y C G
# 1    L 2 2
# 2    U 3 1
# 3 <NA> 1 1
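If it helps to see what ddply is doing there, here is a rough base-R
sketch of the same split-and-tabulate idea. It is not plyr's actual
implementation, and the names count_letters and all_letters are ones I
made up for illustration. Note that split() silently drops the rows
where "Y" is NA, so this version has no <NA> row.

```r
# Same example data as in the original question.
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

# All letters that occur anywhere in X (NA excluded), used as fixed levels
# so every group is tabulated over the same set of columns.
all_letters <- sort(unique(na.omit(unlist(strsplit(as.character(DF$X), "")))))

# Hypothetical helper: count single letters in a vector of strings.
count_letters <- function(x) {
  chars <- unlist(strsplit(as.character(x), ""))
  table(factor(chars, levels = all_letters))
}

# Split X by the values of Y (rows with NA in Y are dropped by split()),
# tabulate each piece, and stack the results into a matrix.
parts <- split(DF$X, DF$Y)
per_group <- lapply(parts, count_letters)
res <- do.call(rbind, per_group)
res
```

Fixing the factor levels to all_letters keeps the columns aligned even
if some group happens to contain no "G" (or no "C").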
The ddply result "tab" is almost what you asked for, except that it is
a data frame. If you really want it as a matrix with named rows:
tab2 <- as.matrix(tab[,-1])
rownames(tab2) <- tab[,1]
It still has an entry for the NA value of "Y", but that can be filtered
out at whatever step you like.
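As a small sketch of the "filter it afterwards" option: the data frame
below is typed in by hand from the ddply output printed above (so the
snippet runs without plyr), and tab_noNA is just an illustrative name.

```r
# Hand-typed copy of the ddply result shown above (an assumption made so
# that this snippet is self-contained).
tab <- data.frame(Y = c("L", "U", NA),
                  C = c(2L, 3L, 1L),
                  G = c(2L, 1L, 1L))

# Drop the row belonging to the NA level of Y.
tab_noNA <- tab[!is.na(tab$Y), ]

# Convert the count columns to a matrix and use Y as the row names.
tab2 <- as.matrix(tab_noNA[, -1])
rownames(tab2) <- as.character(tab_noNA$Y)
tab2
```

Filtering earlier should work as well, for example by passing
DF[!is.na(DF$Y), ] to ddply instead of DF.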
--
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University