[R] aggregating using 'with' function
AC Del Re
delre at wisc.edu
Sun Feb 21 14:40:30 CET 2010
Wow! Jim, this is really impressive. I can't wrap my head around how
you figured this out.
Thank you,
AC
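
For anyone reading this in the archive, here is a rough sketch of how the random-selection one-liner quoted below can be unpacked into separate steps. The data are the sample data from the original question; the names pieces, pick_one, picked, one_per_id and shuffled, and the seed value, are made up for illustration and are not part of Jim's code. The last two lines show an alternative (shuffle the rows, then keep the first row per id) that should give an equivalent kind of result, assuming a random row per id is all that is needed.

```r
# Sample data from the original post (only the columns used here)
id    <- c(1, 1, 1, 4:12)
r     <- c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21)
mod1  <- factor(c(1, 2, 2, rep(c(1, 2, 3), 3)))
datas <- data.frame(id, r, mod1)

# Mean r for each observed id/mod1 combination (id 1 appears twice)
x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))

set.seed(42)  # hypothetical seed, only so the random pick is repeatable

# Step 1: split() returns a list holding one small data frame per id
pieces <- split(x, x$id)

# Step 2: keep one randomly chosen row from each piece
pick_one <- function(piece) piece[sample(nrow(piece), 1), ]
picked <- lapply(pieces, pick_one)

# Step 3: stack the one-row data frames back into a single data frame
one_per_id <- do.call(rbind, picked)

# Alternative: shuffle all rows, then keep the first occurrence of each id
shuffled <- x[sample(nrow(x)), ]
one_per_id_alt <- shuffled[!duplicated(shuffled$id), ]
```

Without the set.seed() call, repeated runs can return either of the two rows for id 1, which is the behaviour shown in the two outputs quoted below.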
On Sun, Feb 21, 2010 at 12:02 AM, jim holtman <jholtman at gmail.com> wrote:
> This will do it. You can see two different values for id=1:
>
>> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
>> x
>    id mod1      r
> 1   1    1  0.980
> 2   4    1  0.640
> 3   7    1  0.490
> 4  10    1  0.180
> 5   1    2  0.295
> 6   5    2  0.490
> 7   8    2  0.330
> 8  11    2  0.600
> 9   6    3 -0.040
> 10  9    3  0.580
> 11 12    3  0.210
>> # choose random duplicate to use
>> do.call(rbind, lapply(split(x, x$id), function(.data)
>>     .data[sample(nrow(.data), 1), ]))
>    id mod1     r
> 1   1    1  0.98
> 4   4    1  0.64
> 5   5    2  0.49
> 6   6    3 -0.04
> 7   7    1  0.49
> 8   8    2  0.33
> 9   9    3  0.58
> 10 10    1  0.18
> 11 11    2  0.60
> 12 12    3  0.21
>>
>> # choose random duplicate to use - try to see if a different one comes up
>> do.call(rbind, lapply(split(x, x$id), function(.data)
>>     .data[sample(nrow(.data), 1), ]))
>    id mod1      r
> 1   1    2  0.295
> 4   4    1  0.640
> 5   5    2  0.490
> 6   6    3 -0.040
> 7   7    1  0.490
> 8   8    2  0.330
> 9   9    3  0.580
> 10 10    1  0.180
> 11 11    2  0.600
> 12 12    3  0.210
>>
>>
>
>
> On Sat, Feb 20, 2010 at 9:50 PM, AC Del Re <acdelre at gmail.com> wrote:
>>
>> OK, this is great, Jim. Last question: How about if I want the 1 copy
>> of each id to be selected randomly versus taking the first one?
>>
>> AC
>>
>> On Sat, Feb 20, 2010 at 8:37 PM, jim holtman <jholtman at gmail.com> wrote:
>> > I am not sure what you mean by eliminating a row. If you want only one
>> > copy of each 'id', and it is the first one, then you can use 'duplicated':
>> >
>> >> x <- with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
>> >> x
>> >    id mod1      r
>> > 1   1    1  0.980
>> > 2   4    1  0.640
>> > 3   7    1  0.490
>> > 4  10    1  0.180
>> > 5   1    2  0.295
>> > 6   5    2  0.490
>> > 7   8    2  0.330
>> > 8  11    2  0.600
>> > 9   6    3 -0.040
>> > 10  9    3  0.580
>> > 11 12    3  0.210
>> >> subset(x, !duplicated(id))
>> >    id mod1     r
>> > 1   1    1  0.98
>> > 2   4    1  0.64
>> > 3   7    1  0.49
>> > 4  10    1  0.18
>> > 6   5    2  0.49
>> > 7   8    2  0.33
>> > 8  11    2  0.60
>> > 9   6    3 -0.04
>> > 10  9    3  0.58
>> > 11 12    3  0.21
>> >
>> >
>> > On Sat, Feb 20, 2010 at 8:07 PM, AC Del Re <delre at wisc.edu> wrote:
>> >>
>> >> Perfect! Thanks Jim.
>> >>
>> >> Do you know how I could then reduce the data even further?
>> >> Specifically, reducing it to one row per id? In this dataset, id 1 would
>> >> have one row eliminated.
>> >> Assume the data is much larger and cannot be deleted by visual
>> >> inspection and elimination one row at a time.
>> >>
>> >>
>> >> Thank you,
>> >>
>> >> AC
>> >>
>> >> On Sat, Feb 20, 2010 at 6:26 PM, jim holtman <jholtman at gmail.com> wrote:
>> >> > This seems to work fine (notice the missing 'c(...)'; why did you think
>> >> > you needed it?):
>> >> >
>> >> >> with(datas, aggregate(list(r = r), by = list(id = id, mod1 = mod1), mean))
>> >> >    id mod1      r
>> >> > 1   1    1  0.980
>> >> > 2   4    1  0.640
>> >> > 3   7    1  0.490
>> >> > 4  10    1  0.180
>> >> > 5   1    2  0.295
>> >> > 6   5    2  0.490
>> >> > 7   8    2  0.330
>> >> > 8  11    2  0.600
>> >> > 9   6    3 -0.040
>> >> > 10  9    3  0.580
>> >> > 11 12    3  0.210
>> >> >>
>> >> >
>> >> >
>> >> > On Sat, Feb 20, 2010 at 6:54 PM, AC Del Re <delre at wisc.edu> wrote:
>> >> >>
>> >> >> Hi All,
>> >> >>
>> >> >> I am interested in aggregating a data frame based on 2 categories:
>> >> >> the mean effect size (r) for each combination of 'id' and 'mod1'.
>> >> >> Wrapping 'aggregate' in 'with' works well when aggregating on one
>> >> >> category (e.g., based on 'id' below) but doesn't work if I try 2
>> >> >> categories. How can this be accomplished?
>> >> >>
>> >> >> # sample data
>> >> >>
>> >> >> id<-c(1,1,1,rep(4:12))
>> >> >> n<-c(10,20,13,22,28,12,12,36,19,12, 15,8)
>> >> >> r<-c(.98,.56,.03,.64,.49,-.04,.49,.33,.58,.18, .6,.21)
>> >> >> mod1<-factor(c(1,2,2, rep(c(1,2,3),3)))
>> >> >> mod2<-c(1,2,15,rep(3,9))
>> >> >> datas<-data.frame(id,n,r,mod1,mod2)
>> >> >>
>> >> >> # one category works perfect:
>> >> >>
>> >> >> with(datas, aggregate(list(r = r), by = list(id = id),mean))
>> >> >>
>> >> >>    id          r
>> >> >> 1   1  0.5233333
>> >> >> 2   4  0.6400000
>> >> >> 3   5  0.4900000
>> >> >> 4   6 -0.0400000
>> >> >> 5   7  0.4900000
>> >> >> 6   8  0.3300000
>> >> >> 7   9  0.5800000
>> >> >> 8  10  0.1800000
>> >> >> 9  11  0.6000000
>> >> >> 10 12  0.2100000
>> >> >>
>> >> >> # trying with 2 categories:
>> >> >>
>> >> >> with(datas, aggregate(list(r = r), by = list(c(id = id, mod1 = mod1)), mean))
>> >> >>
>> >> >> Error in FUN(X[[1L]], ...) : arguments must have same length
>> >> >>
>> >> >> Thank you,
>> >> >>
>> >> >> AC
>> >> >>
>> >> >> ______________________________________________
>> >> >> R-help at r-project.org mailing list
>> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> PLEASE do read the posting guide
>> >> >> http://www.R-project.org/posting-guide.html
>> >> >> and provide commented, minimal, self-contained, reproducible code.
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Jim Holtman
>> >> > Cincinnati, OH
>> >> > +1 513 646 9390
>> >> >
>> >> > What is the problem that you are trying to solve?
>> >> >
>> >
>> >
>
>
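
As a footnote to the error in the original question at the bottom of the thread, the following is a small sketch of what appears to go wrong with list(c(id = id, mod1 = mod1)), using the same sample data. The names bad_by, good_by, by_list and by_formula are made up for illustration. The last call uses the formula interface of aggregate(), which is available in current versions of R and was not used anywhere in the thread; it is offered only as an alternative spelling of the same computation.

```r
# Sample data from the original post (only the columns used here)
id    <- c(1, 1, 1, 4:12)
r     <- c(.98, .56, .03, .64, .49, -.04, .49, .33, .58, .18, .6, .21)
mod1  <- factor(c(1, 2, 2, rep(c(1, 2, 3), 3)))
datas <- data.frame(id, r, mod1)

# c() flattens both columns into ONE vector (12 ids followed by the 12
# integer codes of the factor), so 'by' holds a single grouping variable
# whose length does not match the 12 values in r.
bad_by <- with(datas, list(c(id = id, mod1 = mod1)))
length(bad_by)       # 1 grouping variable
length(bad_by[[1]])  # 24 values, but r has only 12

# Without c(), 'by' holds two grouping variables of length 12 each
good_by <- with(datas, list(id = id, mod1 = mod1))
by_list <- aggregate(list(r = datas$r), by = good_by, FUN = mean)

# Formula interface: mean of r for each observed id/mod1 combination
by_formula <- aggregate(r ~ id + mod1, data = datas, FUN = mean)
```

Both by_list and by_formula should contain the same 11 cell means as the output Jim posted, including 0.295 for id 1 with mod1 = 2.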