[R] Grouping data.frames

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Jan 7 00:46:00 CET 2004


Olaf Mersmann <olafm at tako.de> writes:

> Hello all,
> 
> I'm new to R (and the S language in general) so go easy on me if this is really simple.
> 
> Given a data.frame df which looks like this:
>   f1 f2 f3 f4 c1 c2
> 1  y  y  a  b 10 20
> 2  n  y  b  a 20 20
> 3  n  n  b  b  8 10
> 4  y  n  a  a 30  5
> 
> I'd like to aggregate it by the factors f1 and f2 (or f2 and f3, or any other combination of the factors) and compute the sums of c1 and c2 (as separate values). I can do this just fine as long as there is only one column of counts, using tapply or mApply from Hmisc, but I've been unable to come up with a solution that works with two or more columns.
> 
> In SQL a query to achieve this would look something like this:
> SELECT f1, f2, sum(c1), sum(c2) FROM df GROUP BY f1, f2
> 
> Any hints on how this is done efficiently in R would be greatly appreciated.

I think aggregate() will do what you want. If not, notice that
whatever you can do with a single factor, you can also do with
interaction(f1,f2) or maybe interaction(f1,f2, drop=TRUE). 
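
A minimal sketch of both suggestions, assuming the data frame from the
question is rebuilt by hand as below. The object names sums12, sums13,
g and sums_by_g are made up for illustration. In the four sample rows
every (f1, f2) pair occurs exactly once, so the f1/f3 grouping is
included to show rows actually being added together.

```r
# Rebuild the example data frame from the question
df <- data.frame(
  f1 = factor(c("y", "n", "n", "y")),
  f2 = factor(c("y", "y", "n", "n")),
  f3 = factor(c("a", "b", "b", "a")),
  f4 = factor(c("b", "a", "b", "a")),
  c1 = c(10, 20, 8, 30),
  c2 = c(20, 20, 10, 5)
)

# aggregate(): sum c1 and c2 separately within each (f1, f2) combination,
# roughly "SELECT f1, f2, sum(c1), sum(c2) ... GROUP BY f1, f2"
sums12 <- aggregate(df[c("c1", "c2")],
                    by = list(f1 = df$f1, f2 = df$f2),
                    FUN = sum)

# Same idea for another pair of factors; here rows 1 and 4 share (y, a)
# and rows 2 and 3 share (n, b), so each group sums two rows
sums13 <- aggregate(df[c("c1", "c2")],
                    by = list(f1 = df$f1, f3 = df$f3),
                    FUN = sum)

# interaction(): collapse f1 and f2 into a single factor, then apply the
# single-factor tool (tapply) to each count column in turn
g <- interaction(df$f1, df$f2, drop = TRUE)
sums_by_g <- sapply(df[c("c1", "c2")], function(x) tapply(x, g, sum))

print(sums12)
print(sums13)
print(sums_by_g)
```

aggregate() returns a data frame with one row per observed combination
and one column per summed variable, which is probably closest to the
SQL result; the interaction() route gives a matrix with one row per
combined factor level.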

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907



