[R] Counting occurances of a letter by a factor
Davis, Brian
Brian.Davis at uth.tmc.edu
Fri Sep 10 22:11:32 CEST 2010
In my quest for brevity I think I sacrificed too much clarity.
I'll try to be a little less brief in the hopes of being more clear.
Say I have a data frame like this, as before:
> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF)<-c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>
I need to count the frequency of each unique individual character in DF$X at each factor level of DF$Y.
So for DF$Y == "L" there are 2 "C"s and 2 "G"s,
and for DF$Y == "U" there are 3 "C"s and 1 "G".
The NAs should not contribute to the counts.
If I had an individual character in DF$X instead of a string, like:
> DF2<-data.frame(c("C", "C", NA, "C", "G", "G"), c("L", "U", "L", "U", "L", NA))
> colnames(DF2)<-c("X", "Y")
> DF2
     X    Y
1    C    L
2    C    U
3 <NA>    L
4    C    U
5    G    L
6    G <NA>
Then table gives me exactly what I need.
> table(DF2)
   Y
X   L U
  C 1 2
  G 1 0
Hopefully this makes it a little clearer what I'm trying to accomplish.
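One way to reduce the string case to the single-character case above might be to split each string into its characters and repeat the matching Y value once per character, so that table can be applied as before. What follows is only a minimal sketch, not a reply from the thread: the names chars, long and counts are made up for illustration, and lengths() assumes R 3.2.0 or later (sapply(chars, length) would do the same job on older versions).

```r
# Same example data as in the post
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

# Split every string into single characters; an NA string yields a single NA
chars <- strsplit(as.character(DF$X), "")

# One row per character, with Y repeated once for each character of its string
long <- data.frame(letter = unlist(chars),
                   Y      = rep(DF$Y, lengths(chars)))

# table() drops NA in either variable by default, so NAs do not contribute
counts <- table(long$Y, long$letter)
counts
```

Because nothing here names the levels of Y or the letters in X, the same lines should work unchanged if more levels, more letters, or longer strings appear in the data.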
Brian
-----Original Message-----
From: Phil Spector [mailto:spector at stat.berkeley.edu]
Sent: Friday, September 10, 2010 2:52 PM
To: Davis, Brian
Subject: Re: [R] Counting occurances of a letter by a factor
Brian -
Here's the only thing I can come up with to give the
same result as your "ans", but it doesn't seem to correspond
with your description of the problem.
> DF1 = DF
> DF1$X = sapply(strsplit(as.character(DF$X),''),'[',1)
> DF2 = DF
> DF2$X = sapply(strsplit(as.character(DF$X),''),'[',2)
> newDF = rbind(DF1,DF2)
> table(newDF$Y,newDF$X)
   
    C G
  L 2 2
  U 3 1
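The code above picks out character positions 1 and 2 by hand. If the strings could be longer than two characters, the same stack-and-tabulate idea might be written as a loop over positions. This is a hedged sketch rather than part of the original reply: n_pos, pieces, stacked and tab are hypothetical names, and it assumes that positions past the end of a shorter string should simply be ignored.

```r
# Same example data as in the thread
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA))

x <- as.character(DF$X)

# Longest string determines how many character positions to extract
n_pos <- max(nchar(x), na.rm = TRUE)

# One data frame per position, each holding the i-th character of every string
pieces <- lapply(seq_len(n_pos), function(i) {
  data.frame(X = substr(x, i, i), Y = DF$Y, stringsAsFactors = FALSE)
})

# Stack the per-position data frames, as rbind(DF1, DF2) does above
stacked <- do.call(rbind, pieces)

# Drop NA strings and the empty strings substr() returns past a string's end
stacked <- stacked[!is.na(stacked$X) & nzchar(stacked$X), ]

tab <- table(stacked$Y, stacked$X)
tab
```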
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Berkeley
spector at stat.berkeley.edu
On Fri, 10 Sep 2010, Davis, Brian wrote:
> I'm trying to find a more elegant way of doing this. What I'm trying to accomplish is to count the frequency of letters (major / minor alleles) in a string grouped by the factor levels in another column of my data frame.
>
> Ex.
>> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
>> colnames(DF)<-c("X", "Y")
>> DF
>      X    Y
> 1   CC    L
> 2   CC    U
> 3 <NA>    L
> 4   CG    U
> 5   GG    L
> 6   GC <NA>
>
> I have an ugly solution, which works if you know the factor levels of Y in advance.
>
>> ans<-rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), ""))),
> + table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'U', 1]), ""))))
>> rownames(ans)<-c("L", "U")
>> ans
>   C G
> L 2 2
> U 3 1
>
>
> I've played with table, xtabs, tabulate, aggregate, tapply, etc., but haven't found a combination that gives a more general solution to this problem.
>
> Any ideas?
>
> Brian