[R] conditionally merging adjacent rows in a data frame

Gabor Grothendieck ggrothendieck at gmail.com
Wed Dec 9 00:11:40 CET 2009


Here are a couple of solutions.  The first uses by and the second sqldf:

> Lines <- " rt dur tid  mood roi  x
+ 55 5523 200   4  subj   9  5
+ 56 5523  52   4  subj   7 31
+ 57 5523 209   4  subj   4  9
+ 58 5523 188   4  subj   4  7
+ 70 4016 264   5 indic   9 51
+ 71 4016 195   5 indic   4 14"
> d <- read.table(textConnection(Lines), header = TRUE)
>
>
> # solution 1 - see ?by and ?transform
>
> idx <- cumsum( c(TRUE,diff(d$roi)!=0) )
> do.call(rbind, by(d, idx, function(x)
+ transform(x, dur = sum(dur), x = mean(x))[1,,drop = FALSE ]))
    rt dur tid  mood roi  x
1 5523 200   4  subj   9  5
2 5523  52   4  subj   7 31
3 5523 397   4  subj   4  8
4 4016 264   5 indic   9 51
5 4016 195   5 indic   4 14
>
> # solution 2 - see http://sqldf.googlecode.com
>
> dd <- data.frame(d, idx) # idx computed above
> library(sqldf)
> sqldf("select rt, sum(dur) dur, tid, mood, roi, avg(x) x from dd group by idx")
    rt dur tid  mood roi  x
1 5523 200   4  subj   9  5
2 5523  52   4  subj   7 31
3 5523 397   4  subj   4  8
4 4016 264   5 indic   9 51
5 4016 195   5 indic   4 14


On Tue, Dec 8, 2009 at 7:50 AM, Titus von der Malsburg
<malsburg at gmail.com> wrote:
> Hi, I have a data frame and want to merge adjacent rows if some condition is
> met.  There's an obvious solution using a loop but it is prohibitively slow
> because my data frame is large.  Is there an efficient canonical solution for
> that?
>
>> head(d)
>     rt dur tid  mood roi  x
> 55 5523 200   4  subj   9  5
> 56 5523  52   4  subj   7 31
> 57 5523 209   4  subj   4  9
> 58 5523 188   4  subj   4  7
> 70 4016 264   5 indic   9 51
> 71 4016 195   5 indic   4 14
>
> The desired result would have consecutive rows with the same roi value merged.
> dur values should be added and x values averaged, other values don't differ in
> these rows and should stay the same.
>
>> head(result)
>     rt dur tid  mood roi  x
> 55 5523 200   4  subj   9  5
> 56 5523  52   4  subj   7 31
> 57 5523 397   4  subj   4  8
> 70 4016 264   5 indic   9 51
> 71 4016 195   5 indic   4 14
>
> There's also a solution using reshape.  It uses an index for blocks
>
>  d$index <- cumsum(c(TRUE,diff(d$roi)!=0))
>
> melts and then casts for every column using an appropriate fun.aggregate.
> However, this is a bit cumbersome and also I'm not sure how to make sure that
> I get the original order of rows.
>
> Thanks for any suggestion.
>
>  Titus
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>




More information about the R-help mailing list