[R] aggregate function oddity
Mihalicza Péter
mihalicza.peter at eski.hu
Mon Sep 17 14:29:17 CEST 2007
Dear All,
I tried to aggregate the rows of a data frame by several grouping
variables and got this error message:
"Error in Summary.factor(..., na.rm = na.rm) :
sum not meaningful for factors"
This problem was already discussed on this list in 2003, where the
following solution was given: pass only the non-factor columns to
aggregate().
That also worked for me, but the solution seems a bit odd, since there
is no need to sum the factors given as grouping variables. Of course, I
may be doing something completely wrong.
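For illustration, here is a minimal sketch of both the error and the 2003 workaround. The data frame `toy` and all of its values are made up; only the column names are borrowed from the data described below.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  Name    = factor(c("Estonia", "Estonia", "Latvia", "Latvia")),
  Country = c(4055L, 4055L, 4186L, 4186L),
  Year    = c(1997L, 1997L, 1997L, 1998L),
  Deaths1 = c(33L, 179L, 143L, 83L)
)

# FUN is applied to every column of x, so the factor column Name makes sum() fail.
err <- tryCatch(
  aggregate(toy, by = list(Group = toy$Country), FUN = sum),
  error = function(e) conditionMessage(e)
)

# Workaround: keep only the non-factor columns in x.
not_factor <- !vapply(toy, is.factor, logical(1))
res <- aggregate(toy[not_factor], by = list(Group = toy$Country), FUN = sum)
```

Note that `res` still contains a summed `Country` and `Year` column, because those columns are part of `x` as well as being used for grouping.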
help(aggregate) says:
## S3 method for class 'data.frame': aggregate(x, by, FUN, ...)
|x| an R object.
|by| a list of grouping elements, each as long as the variables in |x|.
Names for the grouping variables are provided if they are not given. The
elements of the list will be coerced to factors (if they are not already
factors).
In my interpretation this means that the factor variables and the
numeric variables are in the same data frame, namely x.
The data frame looks like this (it contains mortality data for
cerebrovascular diseases):
> str(agyer)
'data.frame':   102 obs. of  65 variables:
 $ Country            : int  4055 4055 4055 4055 4055 4055 4055 4055 4055 4055 ...
 $ Name               : Factor w/ 5 levels "Estonia","Latvia",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Year               : int  1997 1997 1998 1999 1999 1999 2000 2000 2000 2001 ...
 $ List               : int  103 103 103 103 103 103 103 103 103 103 ...
 $ Sex                : int  2 1 2 2 1 2 2 1 1 2 ...
 $ Morticd10_103_Frmat: int  1 1 1 1 1 1 1 1 1 1 ...
 $ IM_Frmat           : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Deaths1            : int  33 179 143 1428 83 61 3 759 29 4 ...
and a bunch of other int variables.
After omitting agyer$Name, I do
> agyerpr=aggregate(agyer, by=list(agyer$Country, agyer$Year,
agyer$List, agyer$Sex, agyer$Morticd10_103_Frmat, agyer$IM_Frmat), sum)
The sum is still attempted on the (already omitted) factor "Cause".
I do not understand why aggregate() tries to sum a factor that is
included in the "by" list, since the idea is not to sum the variables
listed there but to use them for grouping. I am lucky with this data set
because all the factors can be interpreted as integers, so I do not have
to omit them one by one, but what if they could not?
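As a hedged sketch of one way around this: aggregate() treats `x` and `by` independently, so a column used in `by` is still summed if it is also present in `x`. Passing only the measurement columns as `x` lets a factor serve as a grouping variable without being summed. The data frame `toy`, the column `Deaths2` and all values below are made up for illustration. The formula call at the end assumes an R version whose aggregate() has a formula method.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  Name    = factor(c("Estonia", "Estonia", "Latvia", "Latvia")),
  Year    = c(1997L, 1997L, 1997L, 1998L),
  Deaths1 = c(33L, 179L, 143L, 83L),
  Deaths2 = c(1L, 2L, 3L, 4L)
)

group_cols <- c("Name", "Year")
value_cols <- setdiff(names(toy), group_cols)  # everything that is not a grouping column

# x holds only the columns to be summed; by holds only the grouping columns.
res <- aggregate(toy[value_cols], by = toy[group_cols], FUN = sum)

# Same grouping via the formula method (assumed to be available in the R version used).
res_formula <- aggregate(cbind(Deaths1, Deaths2) ~ Name + Year, data = toy, FUN = sum)
```

Here the factor `Name` appears only in `by`, so sum() is never called on it.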
Am I missing something with aggregate or classes?
Thanks for your help!
Sincerely,
Peter Mihalicza