[R] Aggregating data (with more than one function)

Liaw, Andy andy_liaw at merck.com
Wed Mar 30 03:22:27 CEST 2005


> agg.dat <- do.call("rbind", by(dat$Salary, dat$Department, 
+                function(x) c(mean=mean(x), total=sum(x))))
> agg.dat <- data.frame(dept=rownames(agg.dat), agg.dat)
> agg.dat
           dept     mean  total
Finance Finance 83925.67 251777
HR           HR 63333.33 190000
IT           IT 59928.67 179786
Sales     Sales 62481.67 187445
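
In case the one-liner above is hard to follow, here is a rough step-by-step
sketch of the same idea. It assumes the data are typed in directly rather
than read from the clipboard, and the intermediate names per.dept and
agg.mat are only there for illustration. It also passes row.names = NULL so
the department names appear only in the dept column, which is a small
departure from the output shown above.

```r
# Rebuild the example data inline instead of reading it from the clipboard.
dat <- data.frame(
  LastName   = c("Johnson", "James", "Howe", "Jones", "Norwood", "Benson",
                 "Smith", "Baker", "Dempsey", "Nolan", "Garth", "Jameson"),
  Department = c("IT", "HR", "Finance", "Finance", "IT", "Sales",
                 "Sales", "HR", "HR", "Sales", "Finance", "IT"),
  Salary     = c(56000, 54223, 80000, 82000, 67000, 76000,
                 65778, 56778, 78999, 45667, 89777, 56786),
  stringsAsFactors = TRUE
)

# by() applies the function once per department and returns a list-like
# object holding one named vector c(mean=, total=) per department.
per.dept <- by(dat$Salary, dat$Department,
               function(x) c(mean = mean(x), total = sum(x)))

# rbind() stacks those vectors into a matrix; the row names are the
# department names.
agg.mat <- do.call("rbind", per.dept)

# Copy the row names into an ordinary column and reset the row names.
agg.dat <- data.frame(dept = rownames(agg.mat), agg.mat, row.names = NULL)
print(agg.dat)
```

Both statistics come out of a single pass over the groups, so there is no
need to call aggregate() twice and merge the results afterwards.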

Andy 

> From: Robin Schroeder 
> 
> Dear list & Andy, 
> 
> I am hopelessly stumped: how would one add the department 
> names as a variable?
> 
> Robin
> 
> > Robin Tori Schroeder
> > International Institute for Sustainability 
> > P.O. Box 873211
> > Arizona State University
> > Tempe, Arizona 85287-3211
> > Phone: (480) 727-7290
> > 
> > 
> 
> 
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch]On Behalf Of Liaw, Andy
> Sent: Monday, March 28, 2005 6:45 PM
> To: 'Sivakumaran Raman'; r-help at stat.math.ethz.ch
> Subject: RE: [R] Aggregating data (with more than one function)
> 
> 
> Here's one possible way, using the data you supplied:
> 
> > dat <- read.table("clipboard", header=TRUE, row.names=1)
> > do.call("rbind", by(dat$Salary, dat$Department,
> +                function(x) c(mean=mean(x), total=sum(x))))
>             mean  total
> Finance 83925.67 251777
> HR      63333.33 190000
> IT      59928.67 179786
> Sales   62481.67 187445
> 
> If you need the department names as a variable, you can add that easily.
> 
> HTH,
> Andy
> 
> > From: Sivakumaran Raman
> > 
> > I have the data similar to the following in a data frame:
> >     LastName   Department  Salary
> > 1   Johnson    IT          56000
> > 2   James      HR          54223
> > 3   Howe       Finance     80000
> > 4   Jones      Finance     82000
> > 5   Norwood    IT          67000
> > 6   Benson     Sales       76000
> > 7   Smith      Sales       65778
> > 8   Baker      HR          56778
> > 9   Dempsey    HR          78999
> > 10  Nolan      Sales       45667
> > 11  Garth      Finance     89777
> > 12  Jameson    IT          56786
> > 
> > I want to calculate both the mean salary broken down by Department and
> > also the total amount paid out per department i.e. I want both
> > sum(Salary) and mean(Salary) for each Department. Right now, I am using
> > aggregate.data.frame twice, creating two data frames, and then combining
> > them using data.frame. However, this seems to be very memory and
> > processor intensive and is taking a very long time on my data set. Is
> > there a quicker way to do this?
> > 
> > Thanks in advance,
> > Siv Raman
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! 
> > http://www.R-project.org/posting-guide.html