[R] Merging rows in dataframes
schragas at post.tau.ac.il
Mon Mar 23 22:58:33 CET 2009
I have a data frame with 40 columns and around 450,000 rows. The first
column is a factor id and the remaining columns are numeric. Some
rows share the same id. What I want to do is merge each set of rows
sharing an id (an id set) into one single row (a summarizing row)
with that id. To create the summarizing row, I'd like to apply a
different function to each of the original columns in the id set: some
columns of the summarizing row will hold the mean of that column over
the id set, others the minimum, and others the maximum.
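To make the intended result concrete, here is a minimal sketch on toy
data. The column names (a, b, c) and the choice of function for each
column are made up for illustration, and this base R version only shows
what the output should look like; it is not a claim about what scales
to 450,000 rows.

```r
# Toy data frame: a factor id followed by numeric columns.
# Column names a, b, c are hypothetical.
df <- data.frame(
  id = factor(c("x", "x", "y", "z", "z", "z")),
  a  = c(1, 3, 5, 2, 4, 6),
  b  = c(10, 20, 30, 40, 50, 60),
  c  = c(7, 8, 9, 1, 2, 3)
)

# One summary function per numeric column (assumed mapping).
funs <- list(a = mean, b = min, c = max)

# For each column, split its values by id and reduce each id set
# to a single number with that column's function.
cols <- lapply(names(funs), function(nm) {
  unname(vapply(split(df[[nm]], df$id), funs[[nm]], numeric(1)))
})
names(cols) <- names(funs)

# One summarizing row per id, in the order of levels(df$id).
result <- data.frame(id = levels(df$id), cols)
print(result)
```

Here the id set for "x" (two rows) collapses to a = 2 (mean),
b = 10 (minimum), and c = 8 (maximum).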
To do this, I tried using the by() function. However, this was
extremely slow (it ran for more than two hours before I stopped it).
It also used up all 16 GB of memory on my machine. Is there a more
efficient function, in terms of both time and memory, for this sort
of thing?
Thank you very much,