[R] matching each row

Wed Jul 8 21:57:18 CEST 2009

On Jul 8, 2009, at 12:17 PM, tathta wrote:

>
>> From an email suggestion, here are two sample datasets, and my  
>> ideal output:
>
> dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2)
> dataB <- data.frame(unique.id=c("A","B","A","B","A","C","D","A"),
>                     x=27:20,y=22:29)
>
> ## mystery operation(s) happen here....
>
> ## ideal output would be:
> dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2,
>                     countA=c(1,2,1,2),countB=c(4,2,1,2))
>
>
> so my mystery operation(s) would count the number of times the
> unique id shows up in a given dataset.
> my ideal outputs are as follows:
> countA is the "mystery operation" applied to dataA (counting
> occurrences within the same dataset)
> countB is applied to dataB (counting occurrences within a second
> dataset).
>
>
>
> My best try so far is to do:
> tempA <- aggregate(dataA$unique.id,list(dataA$unique.id),length)
>
> which gives me a matrix with ONE instance of each unique.id and the
> counts...
> (and which I thought was kinda cute)
> but it only works for within a single dataset!

<snip>

Modify my initial proposal:

countA <- as.data.frame(table(dataA$unique.id), responseName = "countA")
countB <- as.data.frame(table(dataB$unique.id), responseName = "countB")

 > countA
   Var1 countA
1    A      1
2    B      2
3    C      1

 > countB
   Var1 countB
1    A      4
2    B      2
3    C      1
4    D      1

dataA <- merge(dataA, countA, by.x = "unique.id", by.y = "Var1")
dataA <- merge(dataA, countB, by.x = "unique.id", by.y = "Var1")

 > dataA
   unique.id  x y countA countB
1         A 11 5      1      4
2         B 12 4      2      2
3         B 14 2      2      2
4         C 13 3      1      1

Note that without 'all.x = TRUE' in the merge() calls, only those
unique.id's that are common to both datasets will be in the result. If
you want to include unique.id's that are in A but not in B, use
'all.x = TRUE'; those rows will then have NA in the countB column.
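
To illustrate the 'all.x = TRUE' case, here is a minimal sketch. The data
below are made up for the illustration (the id "E" is hypothetical and does
not appear in the original post), and replacing the resulting NA with 0 is my
assumption about what a "count" should be for an id that is absent from B:

```r
## Toy data: "E" occurs in dataA but not in dataB (hypothetical example)
dataA <- data.frame(unique.id = c("A", "B", "E"), x = 1:3)
dataB <- data.frame(unique.id = c("A", "B", "A", "D"))

## Count occurrences of each id in dataB
countB <- as.data.frame(table(dataB$unique.id), responseName = "countB")

## all.x = TRUE keeps every row of dataA; unmatched ids get NA
res <- merge(dataA, countB, by.x = "unique.id", by.y = "Var1",
             all.x = TRUE)

## Assumption: an id missing from dataB should be counted as 0
res$countB[is.na(res$countB)] <- 0
```

If you would rather keep the NA to distinguish "not present in B" from a
real count, just skip the last line.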

Note also that by default, 'unique.id' will be alpha sorted in the  
output.
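
If the original row order of dataA matters (as in the "ideal output" quoted
above), one possible alternative is to skip merge() and look the counts up
with match(). This is only a sketch of a different approach, not a change to
the merge() solution; the names 'tabA' and 'tabB' are mine:

```r
dataA <- data.frame(unique.id = c("A", "B", "C", "B"), x = 11:14, y = 5:2)
dataB <- data.frame(unique.id = c("A", "B", "A", "B", "A", "C", "D", "A"),
                    x = 27:20, y = 22:29)

## Frequency tables of the ids in each dataset
tabA <- table(dataA$unique.id)
tabB <- table(dataB$unique.id)

## Look up each row's id in the tables; rows stay in their original order.
## An id that is absent from a table would get NA rather than an error.
ids <- as.character(dataA$unique.id)
dataA$countA <- as.vector(tabA)[match(ids, names(tabA))]
dataA$countB <- as.vector(tabB)[match(ids, names(tabB))]
```

Here the rows of dataA are neither reordered nor dropped, so no 'all.x'
argument is needed.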

HTH,

Marc Schwartz