[R] aggregation question

Christoph Lehmann christoph.lehmann at gmx.ch
Sat Apr 16 00:31:11 CEST 2005


Great, Andy! Thanks a lot; I didn't know split.
So 'split' can be used as an alternative to 'aggregate', with the advantage
that the user-defined function passed to it can use more than one
variable of the data.frame being aggregated?
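
A minimal sketch of that difference, assuming the data frame is called 'dat' and holds the values printed further down in this thread (the original post drew them with runif(), so they are typed in here as fixed numbers). The names 'sum_split' and 'sum_agg' are only illustrative:

```r
# Data frame with the 'meas' values shown in the original post.
dat <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),
  meas = c(0.637513747, 0.187710063, 0.247098459, 0.306447690,
           0.407573577, 0.783255085, 0.344265082, 0.103893068,
           0.738649586, 0.614154037, 0.949924371, 0.008187858),
  date = c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4)
)

# split() hands each group to the function as a complete data frame,
# so the function can combine several columns (here 'meas' and 'date').
sum_split <- sapply(split(dat, dat$id), function(d) sum(d$meas[d$date != 2]))

# aggregate() applies FUN to one column at a time, so the condition on
# 'date' is applied by subsetting the rows before aggregating.
sum_agg <- aggregate(meas ~ id, data = dat[dat$date != 2, ], FUN = sum)

print(sum_split)
print(sum_agg)
```

One difference to keep in mind: an id whose rows all have date == 2 would still appear (with sum 0) in the split() result, but would be missing from the aggregate() result because its rows are removed before grouping.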

Christoph
> If I understood you correctly, here's one way:
> 
> > sumWO2 <- sapply(split(dat, dat$id), function(d) sum(d$meas[d$date != 2]))
> > sumWO2
>         a         b         c 
> 0.9439614 0.4481582 1.6967618 
> 
> Andy
> 
> 
> > From: Christoph Lehmann 
> > 
> > Dear Sundar, dear Andy
> > many thanks for the length(unique(x)) hint. It of course solves my
> > problem in a very elegant way. Just out of curiosity (or for potential
> > future problems): how could I solve a conceptually different problem,
> > namely one where the computation on 'meas' depends on the variable
> > 'date'? That is, the computation on a variable x in the function passed
> > to aggregate is conditional on the value of another variable y. I hope
> > you understand what I mean; let's think of an example:
> > 
> > E.g. for the example data.frame below, the sum shall be taken over the
> > variable 'meas' only for entries with a corresponding 'date' != 2.
> > 
> > For this, do I have to nest two aggregate statements, or is there a way
> > using sapply or similar apply-based commands?
> > 
> > thanks a lot for your kind help.
> > 
> > Cheers!
> > 
> > Christoph
> > 
> > aggregate(data$meas, list(id = data$id), sum)
> > > 
> > > 
> > > Christoph Lehmann wrote on 4/15/2005 9:51 AM:
> > > > Hi I have a question concerning aggregation
> > > > 
> > > > (simple demo code: see below)
> > > > 
> > > > I have the data.frame
> > > > 
> > > >    id        meas date
> > > > 1   a 0.637513747    1
> > > > 2   a 0.187710063    2
> > > > 3   a 0.247098459    2
> > > > 4   a 0.306447690    3
> > > > 5   b 0.407573577    2
> > > > 6   b 0.783255085    2
> > > > 7   b 0.344265082    3
> > > > 8   b 0.103893068    3
> > > > 9   c 0.738649586    1
> > > > 10  c 0.614154037    2
> > > > 11  c 0.949924371    3
> > > > 12  c 0.008187858    4
> > > > 
> > > > When I want for each id the sum of its meas I do:
> > > > 
> > > >     aggregate(data$meas, list(id = data$id), sum)
> > > > 
> > > > If I want to know the number of meas(ures) for each id I do, eg
> > > > 
> > > >     aggregate(data$meas, list(id = data$id), length)
> > > > 
> > > > NOW: Is there a way to compute, for each id, the number of meas(ures)
> > > > with distinct dates (e.g. using diff())?
> > > > So that I get e.g.:
> > > > 
> > > >   id x
> > > > 1  a 3
> > > > 2  b 2
> > > > 3  c 4
> > > > 
> > > > 
> > > > I am sure it must be possible
> > > > 
> > > > thanks for any (even short) hint
> > > > 
> > > > cheers
> > > > Christoph
> > > > 
> > > > 
> > > > 
> > > > --------------
> > > > data <- data.frame(c(rep("a", 4), rep("b", 4), rep("c", 4)),
> > > >                    runif(12), c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))
> > > > names(data) <- c("id", "meas", "date")
> > > > 
> > > > m <- aggregate(data$meas, list(id = data$id), sum)
> > > > names(m) <- c("id", "cum.meas")
> > > > 
> > > 
> > > 
> > > How about:
> > > 
> > > m <- aggregate(data["date"], data["id"],
> > >                 function(x) length(unique(x)))
> > > 
> > > --sundar
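
A small self-contained check of this suggestion, assuming the 'id' and 'date' columns from the original post ('meas' is left out because it does not affect the count). The names 'n_dates' and 'n_dates_tapply' are only illustrative, and the tapply() call is shown as one possible equivalent, not as part of the suggestion above:

```r
# Only the grouping column and the date column are needed for the count.
dat <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),
  date = c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4)
)

# Number of distinct dates per id, following the aggregate() call above.
# The count ends up in a column named 'date'.
n_dates <- aggregate(dat["date"], dat["id"],
                     function(x) length(unique(x)))

# The same count as a named vector, using tapply().
n_dates_tapply <- tapply(dat$date, dat$id, function(x) length(unique(x)))

print(n_dates)
print(n_dates_tapply)
```

To match the column name 'x' in the desired output, the result could be renamed afterwards with names(n_dates) <- c("id", "x").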
> > > 
> > 
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html
> 
