[R] Grouping data in a data frame: is there an efficient way to do it?
David Winsemius
dwinsemius at comcast.net
Thu Sep 3 00:59:25 CEST 2009
table() is reasonably fast. I have more than 4 x 10^6 records, and
building a 2D table takes very little time:
nUA <- with(TRdta, table(URwbc, URrbc))  # both URwbc and URrbc are factors
nUA
This does the same thing and took about 5 seconds just now:
xtabs( ~ URwbc + URrbc, data=TRdta)
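For the single-column count in the original question, the same idea applies with one argument to table(). The following is only a sketch under assumptions: my.df and my.field are the names from the question, filled here with made-up data, and my.field is assumed to be a factor. tabulate() on the integer codes is shown as a possible lower-overhead alternative, not as something I have timed on the poster's data.

```r
# Made-up stand-in for the poster's data frame (assumption: my.field is a factor)
set.seed(1)
my.df <- data.frame(
  my.field = factor(sample(c("a", "b", "c"), 1e6, replace = TRUE))
)

# Count the rows in each group with table()
counts <- table(my.df$my.field)

# Optionally turn the counts into a two-column data frame
counts.df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(counts.df) <- c("my.field", "n")

# Alternative: tabulate() works directly on the factor's integer codes
n2 <- tabulate(as.integer(my.df$my.field), nbins = nlevels(my.df$my.field))
names(n2) <- levels(my.df$my.field)
```

If something other than a count is needed per group, tapply(values, my.df$my.field, FUN) follows the same pattern, where values stands for whichever column is being summarized.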
On Sep 2, 2009, at 6:39 PM, Leo Alekseyev wrote:
> I have a data frame with about 10^6 rows; I want to group the data
> according to entries in one of the columns and do something with it.
> For instance, suppose I want to count up the number of elements in
> each group. I tried something like aggregate(my.df$my.field,
> list(my.df$my.field), length) but it seems to be very slow. Likewise,
> the split() function was slow (I killed it before it completed). Is
> there a way to efficiently accomplish this in R?.. I am almost
> tempted to write an external Perl/Python script entering every row
> into a hashtable keyed by my.field and iterating over the keys...
> Might this be faster?..
David Winsemius, MD
Heritage Laboratories
West Hartford, CT