Matthew Dowle
mdowle at mdowle.plus.com
Wed Jan 26 12:13:02 CET 2011
Note that a key is not actually required, so the syntax can be even simpler:
dX = as.data.table(X)
dX[,length(unique(z)),by="x,y"]
     x y V1
[1,] 1 1  2
[2,] 1 2  2
[3,] 2 3  2
[4,] 2 4  2
[5,] 3 5  2
[6,] 3 6  2
or, passing list() to 'by' gives exactly the same result:
dX[,length(unique(z)),by=list(x,y)]
The advantage of the list() form is that you can group by expressions
of columns, for example if x were a date column:
dX[,length(unique(z)),by=list(month(x),y)]
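As a minimal sketch of grouping by an expression: the table dD, its
columns d, y and z, and the dates below are all made up for illustration
and are not part of the original data. It assumes the month() helper
that data.table provides for Date columns.

```r
library(data.table)

# Hypothetical data: 'd' is a Date column, 'y' a group id, 'z' the values.
dD <- data.table(
  d = as.Date(c("2011-01-05", "2011-01-20", "2011-01-20",
                "2011-02-03", "2011-02-03", "2011-02-17")),
  y = c(1, 1, 1, 1, 2, 2),
  z = c(10, 11, 11, 12, 13, 14)
)

# Group by an expression (the month of 'd') together with the plain column 'y',
# and count the distinct z values in each group.
res <- dD[, list(nz = length(unique(z))), by = list(mon = month(d), y)]
print(res)
```

Naming the expression inside list() (mon = ...) gives the result column
a readable name instead of a default one.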
Matthew
"Dennis Murphy" <djmuser at gmail.com> wrote in message
news:AANLkTi=8TYSrRfzfm01m7fpzydh-cLS-J-cMbkAkjXxf at mail.gmail.com...
> Hi:
>
> Here are two more candidates, using the plyr and data.table packages:
>
> library(plyr)
> ddply(X, .(x, y), function(d) length(unique(d$z)))
>   x y V1
> 1 1 1  2
> 2 1 2  2
> 3 2 3  2
> 4 2 4  2
> 5 3 5  2
> 6 3 6  2
>
> The function counts the number of unique z values in each sub-data frame
> with the same x and y values. The argument d in the anonymous function is
> a data frame object.
>
> # data.table version:
>
> library(data.table)
> dX <- data.table(X, key = 'x, y')
> dX[, list(nz = length(unique(z))), by = 'x, y']
>      x y nz
> [1,] 1 1  2
> [2,] 1 2  2
> [3,] 2 3  2
> [4,] 2 4  2
> [5,] 3 5  2
> [6,] 3 6  2
>
> The key sorts the data by x, y combinations; nz is then computed within
> each data subset.
>
> If you intend to do a lot of summarization/data manipulation in R, these
> packages are worth learning.
>
> HTH,
> Dennis
>
> On Tue, Jan 25, 2011 at 11:25 AM, Ryan Utz <utz.ryan at gmail.com> wrote:
>
>> Hi R-users,
>>
>> I'm trying to find an elegant way to count the number of rows in a
>> dataframe with a unique combination of 2 values in the dataframe. My
>> data is specifically one column with a year, one with a month, and one
>> with a day. I'm trying to count the number of days in each year/month
>> combination. But for simplicity's sake, the following dataset will do:
>>
>> x<-c(1,1,1,1,2,2,2,2,3,3,3,3)
>> y<-c(1,1,2,2,3,3,4,4,5,5,6,6)
>> z<-c(1,2,3,4,5,6,7,8,9,10,11,12)
>> X<-data.frame(x, y, z)
>>
>> So with dataset X, how would I count the number of z values (3rd column
>> in X) with unique combinations of the first two columns (x and y)? (for
>> instance, in the above example, there are 2 instances per unique
>> combination of the first two columns). I can do this in Matlab and it's
>> easy, but since I'm new to R this is royally stumping me.
>>
>> Thanks,
>> Ryan
