[R] Merging rows in dataframes

Gabor Grothendieck ggrothendieck at gmail.com
Wed Mar 25 14:44:12 CET 2009


I've added an example to FAQ 3 on the home page that
illustrates group_concat.
http://sqldf.googlecode.com

On Wed, Mar 25, 2009 at 9:06 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> In the Links box to the right on the sqldf home page click on
> "SQLite - aggregate functions" and lookup group_concat.
>
> On Wed, Mar 25, 2009 at 9:05 AM, Schragi Schwartz
> <schragas at post.tau.ac.il> wrote:
>> Thank you, your answer was extremely helpful. One last problem though: one
>> of the aggregate functions I'd like to apply on the columns is
>> concatentation (equivalent to the paste() function). So if I have a given
>> character column in three separate rows sharing the same ids with the value
>> "apple" in the first, "banana" in the second, and "orange" in the third, in
>> the summarizing row I'd like to receive output in the form
>> "apple|banana|orange". Is there any way to do this?
>>
>> Thanks again,
>> Schragi
>>
>> -----Original Message-----
>> From: Gabor Grothendieck [mailto:ggrothendieck at gmail.com]
>> Sent: Tuesday, March 24, 2009 12:50 AM
>> To: Schraga Schwartz
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Merging rows in dataframes
>>
>> Using sqldf you only need two statements, infile <- file(...) and
>> DF <- sqldf("select min(a), max(b), mean(c), ... from infile group by id").
>> The file statement identifies the filename and the second reads it
>> into sqlite (without
>> going through R), summarizes it and then reads the summarized version
>> into R.  You may also need to provide info on its format if its not in the
>> default format.  See example 4a on home page and the other examples
>> there:
>> http://sqldf.googlecode.com
>>
>>
>> On Mon, Mar 23, 2009 at 5:58 PM, Schraga Schwartz
>> <schragas at post.tau.ac.il> wrote:
>>> Hello,
>>>
>>> I have a dataframe with 40 columns and around 450,000 rows. The first
>> column
>>> in each row is a factor id and the remaining are numeric. Some rows have
>> the
>>> same ids. What I want to do is to merge each set of rows sharing the same
>>> ids (id set) into one single row (summarizing row) with that id. To create
>>> the summarizing row, I'd like to apply a different function on each of the
>>> original columns in the id set. Some columns within the summarizing row
>> will
>>> equal the mean of the columns in the id set, others will equal the
>> minimum,
>>> others the maximum.
>>>
>>> To do this, I tried using the by() function. However, this was extremely
>>> slow (it ran for more than two hours before I stopped it). Also, it used
>> up
>>> all of 16 GB of memory on my machine. Is there any more efficient
>> function,
>>> both in terms of time and memory, to do this sort of thing?
>>>
>>> Thank you very much,
>>> Schraga Schwartz
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>




More information about the R-help mailing list