[R] aggregating using 'with' function

AC Del Re delre at wisc.edu
Sun Feb 21 03:55:30 CET 2010


OK, this is great, Jim.  Last question: how about if I want the one copy
of each id to be selected randomly rather than taking the first one?
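
A minimal sketch of one possibility (base R only; assuming 'x' is the
aggregated data frame from your example below, and that any one row per
id is acceptable):

  set.seed(1)   # for a reproducible random draw
  ## split rows by id, sample one row within each group, then recombine
  x.random <- do.call(rbind,
                      lapply(split(x, x$id),
                             function(d) d[sample(nrow(d), 1), ]))
  x.random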

Thank you,

AC

> On Sat, Feb 20, 2010 at 8:37 PM, jim holtman <jholtman at gmail.com> wrote:
>> I am not sure what you mean by eliminating a row.  Now if you want only one
>> copy of each 'id', and it is the first one, then you can use 'duplicated':
>>
>>> x <- with(datas,  aggregate(list(r = r),  by = list(id = id, mod1 =
>>> mod1),mean))
>>> x
>>    id mod1      r
>> 1   1    1  0.980
>> 2   4    1  0.640
>> 3   7    1  0.490
>> 4  10    1  0.180
>> 5   1    2  0.295
>> 6   5    2  0.490
>> 7   8    2  0.330
>> 8  11    2  0.600
>> 9   6    3 -0.040
>> 10  9    3  0.580
>> 11 12    3  0.210
>>> subset(x, !duplicated(id))
>>    id mod1     r
>> 1   1    1  0.98
>> 2   4    1  0.64
>> 3   7    1  0.49
>> 4  10    1  0.18
>> 6   5    2  0.49
>> 7   8    2  0.33
>> 8  11    2  0.60
>> 9   6    3 -0.04
>> 10  9    3  0.58
>> 11 12    3  0.21
>>
>>
>> On Sat, Feb 20, 2010 at 8:07 PM, AC Del Re <delre at wisc.edu> wrote:
>>>
>>> Perfect! Thanks Jim.
>>>
>>> Do you know how I could then reduce the data even further?
>>> Specifically, reducing it to one row per id? In this dataset, id 1 would
>>> have one row eliminated.
>>> Assume the data is much larger and cannot be reduced by visual
>>> inspection, eliminating one row at a time.
>>>
>>>
>>> Thank you,
>>>
>>> AC
>>>
>>> On Sat, Feb 20, 2010 at 6:26 PM, jim holtman <jholtman at gmail.com> wrote:
>>> > This seems to work fine (notice that the 'c(...)' has been removed; why
>>> > did you think you needed it?):
>>> >
>>> >>  with(datas,  aggregate(list(r = r),  by = list(id = id, mod1 =
>>> >> mod1),mean))
>>> >    id mod1      r
>>> > 1   1    1  0.980
>>> > 2   4    1  0.640
>>> > 3   7    1  0.490
>>> > 4  10    1  0.180
>>> > 5   1    2  0.295
>>> > 6   5    2  0.490
>>> > 7   8    2  0.330
>>> > 8  11    2  0.600
>>> > 9   6    3 -0.040
>>> > 10  9    3  0.580
>>> > 11 12    3  0.210
>>> >>
>>> >
>>> >
>>> > On Sat, Feb 20, 2010 at 6:54 PM, AC Del Re <delre at wisc.edu> wrote:
>>> >>
>>> >> Hi All,
>>> >>
>>> >> I am interested in aggregating a data frame based on two
>>> >> categories--the mean effect size (r) for each combination of 'id' and
>>> >> 'mod1'. The 'with' function works well when aggregating on one
>>> >> category (e.g., based on 'id' below) but doesn't work if I try two
>>> >> categories. How can this be accomplished?
>>> >>
>>> >> # sample data
>>> >>
>>> >> id<-c(1,1,1,rep(4:12))
>>> >> n<-c(10,20,13,22,28,12,12,36,19,12, 15,8)
>>> >> r<-c(.98,.56,.03,.64,.49,-.04,.49,.33,.58,.18, .6,.21)
>>> >> mod1<-factor(c(1,2,2, rep(c(1,2,3),3)))
>>> >> mod2<-c(1,2,15,rep(3,9))
>>> >> datas<-data.frame(id,n,r,mod1,mod2)
>>> >>
>>> >> # one category works perfectly:
>>> >>
>>> >> with(datas,  aggregate(list(r = r),  by = list(id = id),mean))
>>> >>
>>> >>  id          r
>>> >> 1   1  0.5233333
>>> >> 2   4  0.6400000
>>> >> 3   5  0.4900000
>>> >> 4   6 -0.0400000
>>> >> 5   7  0.4900000
>>> >> 6   8  0.3300000
>>> >> 7   9  0.5800000
>>> >> 8  10  0.1800000
>>> >> 9  11  0.6000000
>>> >> 10 12  0.2100000
>>> >>
>>> >> # trying with 2 categories:
>>> >>
>>> >>  with(datas,  aggregate(list(r = r),  by = list(c(id = id, mod1 =
>>> >> mod1)),mean))
>>> >>
>>> >> Error in FUN(X[[1L]], ...) : arguments must have same length
>>> >>
>>> >> Thank you,
>>> >>
>>> >> AC
>>> >>
>>> >> ______________________________________________
>>> >> R-help at r-project.org mailing list
>>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>>> >> PLEASE do read the posting guide
>>> >> http://www.R-project.org/posting-guide.html
>>> >> and provide commented, minimal, self-contained, reproducible code.
>>> >
>>> >
>>> >
>>> > --
>>> > Jim Holtman
>>> > Cincinnati, OH
>>> > +1 513 646 9390
>>> >
>>> > What is the problem that you are trying to solve?
>>> >
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>
>
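
A side note on the error in the original message (a sketch, not from the
thread): wrapping the grouping variables in c(...) concatenates 'id' and
'mod1' into a single vector of length 24, while 'r' has length 12, which
is why aggregate() complains that the arguments must have the same length.
The 'by' argument expects a list with one element per grouping variable:

  length(with(datas, c(id = id, mod1 = mod1)))      # 24 -- one flattened vector
  length(with(datas, list(id = id, mod1 = mod1)))   # 2  -- two length-12 variables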


