[R] Aggregating data (with more than one function)

Marc Schwartz MSchwartz at MedAnalytics.com
Tue Mar 29 03:56:13 CEST 2005


On Mon, 2005-03-28 at 19:15 -0600, Sivakumaran Raman wrote:
> I have data similar to the following in a data frame:
>     LastName   Department  Salary
> 1   Johnson    IT          56000
> 2   James      HR          54223
> 3   Howe       Finance     80000
> 4   Jones      Finance     82000
> 5   Norwood    IT          67000
> 6   Benson     Sales       76000
> 7   Smith      Sales       65778
> 8   Baker      HR          56778
> 9   Dempsey    HR          78999
> 10  Nolan      Sales       45667
> 11  Garth      Finance     89777
> 12  Jameson    IT          56786
> 
> I want to calculate both the mean salary broken down by Department and also
> the total amount paid out per department, i.e. I want both sum(Salary) and
> mean(Salary) for each Department. Right now, I am using aggregate.data.frame
> twice, creating two data frames, and then combining them using data.frame.
> However, this seems to be very memory and processor intensive and is taking a
> very long time on my data set. Is there a quicker way to do this?
> 
> Thanks in advance,
> Siv Raman


Here is one approach:

Presuming that 'df' is your data frame:

# Create a function that returns both values
my.summ <- function(x)
{
  c(mean = mean(x), sum = sum(x))
}
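
As a quick sanity check, the function can be tried on a single group before
applying it to the whole data set. The sketch below uses the three Finance
salaries from the posted data; the names 'finance' and 'res' are only
illustrative and do not appear in the original question.

```r
# Same summary function as above: returns a named vector with both values
my.summ <- function(x)
{
  c(mean = mean(x), sum = sum(x))
}

# The three Finance salaries from the posted data (illustrative variable name)
finance <- c(80000, 82000, 89777)

# 'res' is a named numeric vector with elements "mean" and "sum"
res <- my.summ(finance)
print(res)
```

Because the result is a named vector, sapply() can later bind one such vector
per department into a matrix with rows "mean" and "sum".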


# Now split() the Salary column of 'df' by Department
df.s <- split(df$Salary, df$Department)
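
To try this step without the real data, here is a sketch that rebuilds 'df'
from the twelve rows shown in the question and then inspects what split()
returns. How the actual data frame was created is not stated in the post, so
the data.frame() call below is an assumption.

```r
# Rebuild the example data from the question (construction is an assumption)
df <- data.frame(
  LastName   = c("Johnson", "James", "Howe", "Jones", "Norwood", "Benson",
                 "Smith", "Baker", "Dempsey", "Nolan", "Garth", "Jameson"),
  Department = c("IT", "HR", "Finance", "Finance", "IT", "Sales",
                 "Sales", "HR", "HR", "Sales", "Finance", "IT"),
  Salary     = c(56000, 54223, 80000, 82000, 67000, 76000,
                 65778, 56778, 78999, 45667, 89777, 56786)
)

# split() returns a named list: one numeric vector of salaries per department
df.s <- split(df$Salary, df$Department)
str(df.s)
```

Only the Salary column is split, so each list element is a plain numeric
vector rather than a sub data frame, which keeps the per-group work small.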


# Now run the summary, using sapply()
> sapply(df.s, my.summ)
     Finance  HR       IT       Sales   
mean 83925.67 63333.33 59928.67 62481.67
sum  251777   190000   179786   187445
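
If a data frame with one row per department is preferred over the matrix
above, the result can be transposed and converted. The following is a minimal
end-to-end sketch using the rows from the question; the names 'res.mat' and
'res.df' are illustrative, and the data.frame() construction of 'df' is an
assumption about how the data are stored.

```r
# Rebuild the example data from the question (construction is an assumption)
df <- data.frame(
  LastName   = c("Johnson", "James", "Howe", "Jones", "Norwood", "Benson",
                 "Smith", "Baker", "Dempsey", "Nolan", "Garth", "Jameson"),
  Department = c("IT", "HR", "Finance", "Finance", "IT", "Sales",
                 "Sales", "HR", "HR", "Sales", "Finance", "IT"),
  Salary     = c(56000, 54223, 80000, 82000, 67000, 76000,
                 65778, 56778, 78999, 45667, 89777, 56786)
)

# Summary function returning both values at once
my.summ <- function(x)
{
  c(mean = mean(x), sum = sum(x))
}

# One pass over the groups: split, summarise, then transpose so that
# departments become rows and "mean"/"sum" become columns
res.mat <- t(sapply(split(df$Salary, df$Department), my.summ))

# Convert to a data frame with Department as an ordinary column
res.df <- data.frame(Department = rownames(res.mat), res.mat,
                     row.names = NULL)
print(res.df)
```

This computes both statistics in a single pass over the groups, so there is
no need to build and combine two separate aggregated data frames.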


HTH,

Marc Schwartz



