[R] aggregate data.frame based on column class
Ista Zahn
istazahn at gmail.com
Fri Jan 11 16:27:42 CET 2013
Please see inline.
On Fri, Jan 11, 2013 at 10:07 AM, Martin Batholdy
<batholdy at googlemail.com> wrote:
> Hi,
>
> When using the aggregate function to aggregate a data.frame by one or more grouping variables, I often want the mean for the numeric variables but the unique value for the factor variables.
>
> So for example in this data-frame:
>
> data <- data.frame(x = rnorm(10,1,2), group = c(rep(1,5), rep(2,5)), gender =c(rep('m',5), rep('f',5)))
> aggregate(data, by=list(data$group), FUN=mean)
>
>
> I would like to have 'm' and 'f' in the third column, not NA.
>
>
> I see the problem that a group might not have a unique factor level –
> but is there an alternative function that at least tries to do what I am aiming at?
>
> That is:
>
> "aggregate the data.frame by a list of grouping variables,
> for numeric variables compute the mean,
> for factor variables return the unique factor value"
R is a language, so you just have to do the translation:
mt <- function(x) {
  if (is.numeric(x)) {                  # if x is numeric
    return(mean(x))                     # compute the mean
  } else {                              # otherwise
    tab <- table(x)                     # tabulate x
    return(paste(paste(names(tab),      # and format it for display
                       tab, sep = ": "),
                 collapse = ", "))
  }
}
aggregate(data, by=list(data$group), FUN=mt)
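
If you want the single value itself rather than a tabulated string, a variant along these lines might do. This is only a sketch: mean_or_unique is a made-up name, returning NA for a group with more than one distinct value is my assumption about what you would want, and set.seed() is there only to make the example reproducible.

```r
# Hypothetical helper (not part of base R): mean for numeric columns,
# the single distinct value for anything else, NA if the group is mixed.
mean_or_unique <- function(x) {
  if (is.numeric(x)) {
    return(mean(x))
  }
  vals <- unique(as.character(x))   # as.character() handles factors too
  if (length(vals) == 1) vals else NA_character_
}

set.seed(1)                         # assumption: only for reproducibility
data <- data.frame(x = rnorm(10, 1, 2),
                   group = c(rep(1, 5), rep(2, 5)),
                   gender = c(rep("m", 5), rep("f", 5)))

# Aggregate every column except the grouping variable itself
res <- aggregate(data[c("x", "gender")],
                 by = list(group = data$group),
                 FUN = mean_or_unique)
print(res)
```

Passing only the non-grouping columns avoids getting the group variable twice in the result, once as the key and once as its own mean.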
Best,
Ista
>
>
> Thanks!
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.