[R] Aggregating data (with more than one function)

Tue Mar 29 03:59:21 CEST 2005

In the Arguments section of help(aggregate), you will find:

     FUN: a scalar function to compute the summary statistics which can
          be applied to all data subsets.

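Since aggregate() expects FUN to return a single value per subset, one call
cannot return both the mean and the sum.

a) So you can try the 'by' function instead: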

> by( df[ , 3], df$Department, function(x) c(mean(x), sum(x)) )
  INDICES: Finance
  [1]  83925.67 251777.00
  ------------------------------------------------------------
  INDICES: HR
  [1]  63333.33 190000.00
  ------------------------------------------------------------
  INDICES: IT
  [1]  59928.67 179786.00
  ------------------------------------------------------------
  INDICES: Sales
  [1]  62481.67 187445.00
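
If one table is handier than the printed blocks, the pieces of the 'by'
result can be bound together. This is only a minimal sketch: it rebuilds the
example data from the question into a data frame called df, and the labels
'mean' and 'sum' and the names res and tab are mine, not part of the
original code.

```r
# Example data retyped from the original question (layout assumed).
df <- data.frame(
  LastName   = c("Johnson", "James", "Howe", "Jones", "Norwood", "Benson",
                 "Smith", "Baker", "Dempsey", "Nolan", "Garth", "Jameson"),
  Department = c("IT", "HR", "Finance", "Finance", "IT", "Sales",
                 "Sales", "HR", "HR", "Sales", "Finance", "IT"),
  Salary     = c(56000, 54223, 80000, 82000, 67000, 76000,
                 65778, 56778, 78999, 45667, 89777, 56786)
)

# One pass over the groups; each group yields a named vector of length 2.
res <- by(df$Salary, df$Department,
          function(x) c(mean = mean(x), sum = sum(x)))

# Stack the per-department vectors into a matrix, one row per department.
tab <- do.call(rbind, res)
print(tab)
```

Naming the elements inside c() is what gives the columns readable headers.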

b) or use tapply more directly:

> tmp <- tapply(df$Salary, df$Department, function(x) 
                                            c( mean(x), sum(x) ) )
> tmp
  $Finance
  [1]  83925.67 251777.00

  $HR
  [1]  63333.33 190000.00

  $IT
  [1]  59928.67 179786.00

  $Sales
  [1]  62481.67 187445.00

Calling 'sapply( tmp, c )' on this result then gives a slightly more
compact output:

       Finance        HR        IT     Sales
[1,]  83925.67  63333.33  59928.67  62481.67
[2,] 251777.00 190000.00 179786.00 187445.00
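
The rows above are only labelled [1,] and [2,]. A small variation, sketched
here under the assumption that the data sits in a data frame called df as in
the question, names the two statistics and transposes the result so that
each department becomes a row. The labels 'mean' and 'sum' and the name out
are my own choices.

```r
# Example data retyped from the original question (layout assumed).
df <- data.frame(
  Department = c("IT", "HR", "Finance", "Finance", "IT", "Sales",
                 "Sales", "HR", "HR", "Sales", "Finance", "IT"),
  Salary     = c(56000, 54223, 80000, 82000, 67000, 76000,
                 65778, 56778, 78999, 45667, 89777, 56786)
)

# Same tapply call as above, but with named elements in the result.
tmp <- tapply(df$Salary, df$Department,
              function(x) c(mean = mean(x), sum = sum(x)))

# sapply() builds a 2 x 4 matrix; t() turns departments into rows.
out <- t(sapply(tmp, c))
print(out)
```

The result is a plain numeric matrix, so as.data.frame(out) can be applied
if a data frame is needed afterwards.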

Regards, Adai

On Mon, 2005-03-28 at 19:15 -0600, Sivakumaran Raman wrote:
> I have data similar to the following in a data frame:
>     LastName   Department  Salary
> 1   Johnson    IT          56000
> 2   James      HR          54223
> 3   Howe       Finance     80000
> 4   Jones      Finance     82000
> 5   Norwood    IT          67000
> 6   Benson     Sales       76000
> 7   Smith      Sales       65778
> 8   Baker      HR          56778
> 9   Dempsey    HR          78999
> 10  Nolan      Sales       45667
> 11  Garth      Finance     89777
> 12  Jameson    IT          56786
> 
> I want to calculate both the mean salary broken down by Department and also
> the total amount paid out per department, i.e. I want both sum(Salary) and
> mean(Salary) for each Department. Right now, I am using aggregate.data.frame
> twice, creating two data frames, and then combining them using data.frame.
> However, this seems to be very memory and processor intensive and is taking a
> very long time on my data set. Is there a quicker way to do this?
> 
> Thanks in advance,
> Siv Raman