[R] aggregation question

Fri Apr 15 21:23:54 CEST 2005

Dear Sundar, dear Andy,
many thanks for the length(unique(x)) hint. It does of course solve my 
problem in a very elegant way. Just out of curiosity (or for potential 
future problems): how could I solve it in a conceptually different way, 
namely by making the computation on 'meas' depend on the variable 'date'? 
That is, the computation on a variable x in the function passed to 
aggregate is conditional on the value of another variable y. I hope you 
understand what I mean; let's think of an example:

E.g. for the example data.frame below, the sum shall be taken over the 
variable 'meas' only for those entries with a corresponding 'date' != 2.

Do I have to nest two aggregate statements for this, or is there a way 
using sapply or similar apply-based commands?
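
To make the question concrete, here is a minimal sketch of two ways such a
conditional sum might be written. It assumes the example data.frame from the
quoted message; the set.seed() call and the names keep, m1 and m2 are
additions for illustration only.

```r
# Example data as in the quoted message (seed added only for reproducibility)
set.seed(1)
data <- data.frame(id   = rep(c("a", "b", "c"), each = 4),
                   meas = runif(12),
                   date = c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))

# Variant 1: filter the rows first, then aggregate as usual
keep <- data$date != 2
m1 <- aggregate(data$meas[keep], list(id = data$id[keep]), sum)

# Variant 2: split the whole data.frame by id, so that the function
# sees both 'meas' and 'date' for each group
m2 <- sapply(split(data, data$id),
             function(d) sum(d$meas[d$date != 2]))
```

The two variants differ in one respect: if an id has no rows with
date != 2, variant 1 drops that id from the result, whereas variant 2
keeps it with a sum of 0.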

thanks a lot for your kind help.

Cheers!

Christoph

aggregate(data$meas, list(id = data$id), sum)
> 
> 
> Christoph Lehmann wrote on 4/15/2005 9:51 AM:
> > Hi I have a question concerning aggregation
> > 
> > (simple demo code, see below)
> > 
> > I have the data.frame
> > 
> >    id        meas date
> > 1   a 0.637513747    1
> > 2   a 0.187710063    2
> > 3   a 0.247098459    2
> > 4   a 0.306447690    3
> > 5   b 0.407573577    2
> > 6   b 0.783255085    2
> > 7   b 0.344265082    3
> > 8   b 0.103893068    3
> > 9   c 0.738649586    1
> > 10  c 0.614154037    2
> > 11  c 0.949924371    3
> > 12  c 0.008187858    4
> > 
> > When I want for each id the sum of its meas I do:
> > 
> >     aggregate(data$meas, list(id = data$id), sum)
> > 
> > If I want to know the number of meas(ures) for each id I do, eg
> > 
> >     aggregate(data$meas, list(id = data$id), length)
> > 
> > NOW: Is there a way to compute the number of meas(ures) for each id with
> > not identical date (e.g using diff()?
> > so that I get eg:
> > 
> >   id x
> > 1  a 3
> > 2  b 2
> > 3  c 4
> > 
> > 
> > I am sure it must be possible
> > 
> > thanks for any (even short) hint
> > 
> > cheers
> > Christoph
> > 
> > 
> > 
> > --------------
> > data <- data.frame(c(rep("a", 4), rep("b", 4), rep("c", 4)),
> >                    runif(12), c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))
> > names(data) <- c("id", "meas", "date")
> > 
> > m <- aggregate(data$meas, list(id = data$id), sum)
> > names(m) <- c("id", "cum.meas")
> > 
> 
> 
> How about:
> 
> m <- aggregate(data["date"], data["id"],
>                 function(x) length(unique(x)))
> 
> --sundar
> 
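
For comparison, the same count of distinct dates per id could probably also
be obtained with tapply instead of aggregate. This is only a sketch under the
assumption that the 'id' and 'date' columns are as in the example above; the
name n_dates is made up for illustration.

```r
# Only the columns needed for the count, as in the example data.frame
data <- data.frame(id   = rep(c("a", "b", "c"), each = 4),
                   date = c(1, 2, 2, 3, 2, 2, 3, 3, 1, 2, 3, 4))

# Number of distinct 'date' values within each 'id'
n_dates <- tapply(data$date, data$id, function(x) length(unique(x)))
n_dates
# a b c
# 3 2 4
```

The result is a named vector rather than a data.frame, which can be
handier when the counts are only needed for lookup by id.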

--