[R] Counting occurrences of a letter by a factor

Davis, Brian Brian.Davis at uth.tmc.edu
Fri Sep 10 22:11:32 CEST 2010


In my quest for brevity I think I sacrificed too much clarity.

I'll try to be a little less brief in the hopes of being more clear.

Say I have a data frame like the one before:
> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
> colnames(DF)<-c("X", "Y")
> DF
     X    Y
1   CC    L
2   CC    U
3 <NA>    L
4   CG    U
5   GG    L
6   GC <NA>

I need to count the frequency of each unique individual character in DF$X at each factor level in DF$Y.

So for DF$Y == "L"  there are 2 "C"'s and 2 "G"'s
and for DF$Y == "U" there are 3 "C"'s and 1 "G"

The NA's should not contribute to the counts.
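
One way to get these counts without knowing the levels of Y in advance is to expand the data to one character per element and then call table() once. This is only a sketch: the names chars, y_rep and counts are made up for illustration, and lengths() assumes R 3.2.0 or later (sapply(chars, length) does the same job in older versions).

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps X and Y as character
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA),
                 stringsAsFactors = FALSE)

# Split every string into single characters; an NA string stays a single NA
chars <- strsplit(as.character(DF$X), "")

# Repeat each Y value once per character so the two vectors line up
y_rep <- rep(DF$Y, times = lengths(chars))

# table() drops pairs where either value is NA by default
counts <- table(Y = y_rep, letter = unlist(chars))
counts
```

Because the NA in X and the NA in Y both end up as NA entries in the paired vectors, table() leaves them out of the counts on its own.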

If I had an individual character in DF$X instead of a string, like:

> DF2<-data.frame(c("C", "C", NA, "C", "G", "G"), c("L", "U", "L", "U", "L", NA))
> colnames(DF2)<-c("X", "Y")
> DF2
     X    Y
1    C    L
2    C    U
3 <NA>    L
4    C    U
5    G    L
6    G <NA>

Then table gives me exactly what I need. 

> table(DF2)
   Y
X   L U
  C 1 2
  G 1 0



Hopefully this makes it a little clearer what I'm trying to accomplish.

Brian

-----Original Message-----
From: Phil Spector [mailto:spector at stat.berkeley.edu] 
Sent: Friday, September 10, 2010 2:52 PM
To: Davis, Brian
Subject: Re: [R] Counting occurrences of a letter by a factor

Brian -
    Here's the only thing I can come up with to give the 
same result as your "ans", but it doesn't seem to correspond
with your description of the problem.

> DF1 = DF
> DF1$X = sapply(strsplit(as.character(DF$X),''),'[',1)
> DF2 = DF
> DF2$X = sapply(strsplit(as.character(DF$X),''),'[',2)
> newDF = rbind(DF1,DF2)
> table(newDF$Y,newDF$X)

     C G
   L 2 2
   U 3 1
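
The two copies above cover strings of exactly two characters. The same stacking idea might be extended to strings of any length along these lines. This is a hedged sketch, not part of the reply itself; x, n_max and stacked are illustrative names, and the data frame is rebuilt with stringsAsFactors = FALSE so the example is self-contained.

```r
# Rebuild the example data with character columns
DF <- data.frame(X = c("CC", "CC", NA, "CG", "GG", "GC"),
                 Y = c("L", "U", "L", "U", "L", NA),
                 stringsAsFactors = FALSE)

x <- as.character(DF$X)

# Longest string in X, ignoring NA entries
n_max <- max(nchar(x), na.rm = TRUE)

# One copy of the data per character position, stacked row-wise
stacked <- do.call(rbind, lapply(seq_len(n_max), function(i) {
  data.frame(X = substr(x, i, i), Y = DF$Y, stringsAsFactors = FALSE)
}))

# Positions past the end of a shorter string come back as "", so mark them NA
stacked$X[which(stacked$X == "")] <- NA

res <- table(stacked$Y, stacked$X)
res
```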

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu



On Fri, 10 Sep 2010, Davis, Brian wrote:

> I'm trying to find a more elegant way of doing this. What I'm trying to accomplish is to count the frequency of letters (major / minor alleles) in a string, grouped by the factor levels in another column of my data frame.
>
> Ex.
>> DF<-data.frame(c("CC", "CC", NA, "CG", "GG", "GC"), c("L", "U", "L", "U", "L", NA))
>> colnames(DF)<-c("X", "Y")
>> DF
>     X    Y
> 1   CC    L
> 2   CC    U
> 3 <NA>    L
> 4   CG    U
> 5   GG    L
> 6   GC <NA>
>
> I have an ugly solution, which works if you know the factor levels of Y in advance.
>
>> ans<-rbind(table(unlist(strsplit(as.character(DF[DF[ ,'Y'] == 'L', 1]), ""))),
> + table(unlist(strsplit(as.character(DF[DF[ ,'Y']  == 'U', 1]), ""))))
>> rownames(ans)<-c("L", "U")
>> ans
>  C G
> L 2 2
> U 3 1
>
>
> I've played with table, xtabs, tabulate, aggregate, tapply, etc., but haven't found a combination that gives a more general solution to this problem.
>
> Any ideas?
>
> Brian
>


