[R] Re: aggregate function oddity
Petr PIKAL
petr.pikal at precheza.cz
Tue Sep 18 09:21:59 CEST 2007
Hi
r-help-bounces at r-project.org wrote on 17.09.2007 14:29:17:
> Dear All,
>
> I tried to aggregate the rows according to some factors in a data frame.
> I got the
> "Error in Summary.factor(..., na.rm = na.rm) :
> sum not meaningful for factors"
> message. This problem was once already discussed in 2003 on this list,
> where the following solution was given: include only those columns -when
> giving it to aggregate() - that are not factors.
>
> It also worked for me, but this solution is a bit odd, since there is no
> need to sum the factors given as grouping variables. Of course I may do
> something completely wrong.
> help(aggregate) says:
> ## S3 method for class 'data.frame': aggregate(x, by, FUN, ...)
> |x| an R object.
> |by| a list of grouping elements, each as long as the variables in |x|.
> Names for the grouping variables are provided if they are not given. The
> elements of the list will be coerced to factors (if they are not already
> factors).
>
> In my interpretation this means that the factor variables and the
> numeric variables are in the same data frame, namely x.
>
> The data frame looks like this (its mortality from cerebrovascular
> diseases):
> > str(agyer)
> 'data.frame': 102 obs. of 65 variables:
> $ Country : int 4055 4055 4055 4055 4055 4055 4055 4055 4055 4055 ...
> $ Name : Factor w/ 5 levels "Estonia","Latvia",..: 1 1 1 1 1 1 1 1 1 1 ...
> $ Year : int 1997 1997 1998 1999 1999 1999 2000 2000 2000 2001 ...
> $ List : int 103 103 103 103 103 103 103 103 103 103 ...
> $ Sex : int 2 1 2 2 1 2 2 1 1 2 ...
> $ Morticd10_103_Frmat: int 1 1 1 1 1 1 1 1 1 1 ...
> $ IM_Frmat : int 1 1 1 1 1 1 1 1 1 1 ...
> $ Deaths1 : int 33 179 143 1428 83 61 3 759 29 4 ...
> and a bunch of other int variables.
>
> After omitting agyer$Name, I do
> > agyerpr=aggregate(agyer, by=list(agyer$Country, agyer$Year,
> agyer$List, agyer$Sex, agyer$Morticd10_103_Frmat, agyer$IM_Frmat), sum)
If this is the command you issued, it tries to aggregate the whole data
frame agyer, including the factor variable Name, hence the error.
You probably want to sum only the Deaths1 column, grouped by the values of
the other variables, so you can do
agyerpr <- with(agyer, aggregate(Deaths1, by=list(Country, Year,List,Sex,
Morticd10_103_Frmat, IM_Frmat), sum))
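To make the call above easier to try out, here is a minimal sketch on a small made-up data frame. The object `toy` and all of its values are invented for illustration and are not the poster's data; only the column names loosely follow agyer.

```r
# Hypothetical toy data, loosely mimicking the structure of agyer
toy <- data.frame(
  Name    = factor(c("Estonia", "Estonia", "Latvia", "Latvia")),
  Year    = c(1997L, 1997L, 1997L, 1998L),
  Sex     = c(1L, 1L, 2L, 2L),
  Deaths1 = c(33L, 179L, 143L, 61L)
)

# Sum only Deaths1; the grouping variables go in 'by', not in 'x',
# so the factor column Name is never handed to sum()
toy_sum <- with(toy, aggregate(Deaths1,
                               by = list(Year = Year, Sex = Sex),
                               FUN = sum))
print(toy_sum)
```

The result has one row per combination of Year and Sex that occurs in the data, with the summed values in a column named x. In current versions of R the same grouping can also be written with the formula interface, aggregate(Deaths1 ~ Year + Sex, data = toy, FUN = sum).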
aggregate() applies the function to every column of the object given as x,
and if a column is of a type the function cannot handle (such as sum() on
a factor), the call stops with an error.
If you want to omit some columns from the aggregation, just put agyer[,
-c(column.numbers)] in the x position of the aggregate() call.
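If counting column numbers by hand is inconvenient, the columns to drop can be picked by type instead. The following is only a sketch under assumptions: the data frame `toy`, its values, and the helper vector `keep` are invented for illustration, and excluding the grouping column from x is one possible choice, not something the original post prescribes.

```r
# Hypothetical toy data (invented for illustration)
toy <- data.frame(
  Name    = factor(c("Estonia", "Estonia", "Latvia", "Latvia")),
  Year    = c(1997L, 1997L, 1997L, 1998L),
  Deaths1 = c(33L, 179L, 143L, 61L),
  Deaths2 = c(1L, 2L, 3L, 4L)
)

# Passing the whole data frame makes sum() run on the factor column too,
# which reproduces the "not meaningful for factors" error
res_err <- try(aggregate(toy, by = list(Year = toy$Year), FUN = sum),
               silent = TRUE)
failed <- inherits(res_err, "try-error")

# Keep only columns that are neither factors nor the grouping variable
keep <- !sapply(toy, is.factor) & names(toy) != "Year"
toy_sum <- aggregate(toy[, keep, drop = FALSE],
                     by = list(Year = toy$Year),
                     FUN = sum)
print(toy_sum)
```

Here drop = FALSE keeps x a data frame even if only one column survives the selection, so the summed columns keep their original names.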
Regards
Petr
>
> The sum is done on -the already omitted - factor of "Cause".
>
> I do not understand why it tries to sum a factor that is included in the
> "by" list, since the concept is not to sum for those included, but use
> them for grouping. I am lucky with this database because all the factors
> can be interpreted as integers and I do not have to omit them one by
> one, but what if not?
>
> Am I missing something with aggregate or classes?
>
> Thanks for your help!
>
> Sincerely,
> Peter Mihalicza
>