[R] Re: aggregate function oddity

Petr PIKAL petr.pikal at precheza.cz
Tue Sep 18 09:21:59 CEST 2007


Hi

r-help-bounces at r-project.org wrote on 17.09.2007 14:29:17:

> Dear All,
> 
> I tried to aggregate the rows according to some factors in a data frame.
> I got the
> "Error in Summary.factor(..., na.rm = na.rm) :
>         sum not meaningful for factors"
> message. This problem was once already discussed in 2003 on this list,
> where the following solution was given: include only those columns - when
> giving it to aggregate() - that are not factors.
> 
> It also worked for me, but this solution is a bit odd, since there is no
> need to sum the factors given as grouping variables. Of course I may do
> something completely wrong.
> help(aggregate) says:
> ## S3 method for class 'data.frame': aggregate(x, by, FUN, ...)
> |x|    an R object.
> |by|    a list of grouping elements, each as long as the variables in |x|.
> Names for the grouping variables are provided if they are not given. The
> elements of the list will be coerced to factors (if they are not already
> factors).
> 
> In my interpretation this means that the factor variables and the 
> numeric variables are in the same data frame, namely x.
> 
> The data frame looks like this (it's mortality from cerebrovascular 
> diseases):
>  > str(agyer)
> 'data.frame':   102 obs. of  65 variables:
>  $ Country            : int  4055 4055 4055 4055 4055 4055 4055 4055 
> 4055 4055 ...
>  $ Name               : Factor w/ 5 levels "Estonia","Latvia",..: 1 1 1 
> 1 1 1 1 1 1 1 ...
>  $ Year               : int  1997 1997 1998 1999 1999 1999 2000 2000 
> 2000 2001 ...
>  $ List               : int  103 103 103 103 103 103 103 103 103 103 ...
>  $ Sex                : int  2 1 2 2 1 2 2 1 1 2 ...
>  $ Morticd10_103_Frmat: int  1 1 1 1 1 1 1 1 1 1 ...
>  $ IM_Frmat           : int  1 1 1 1 1 1 1 1 1 1 ...
>  $ Deaths1            : int  33 179 143 1428 83 61 3 759 29 4 ...
> and a bunch of other int variables.
> 
> After omitting agyer$Name, I do
>  > agyerpr=aggregate(agyer, by=list(agyer$Country, agyer$Year, 
> agyer$List, agyer$Sex, agyer$Morticd10_103_Frmat, agyer$IM_Frmat), sum)

If this is the command you issued, it tries to aggregate the whole data
frame agyer, including the factor variable Name, hence the error.

You probably want to sum only the Deaths1 column, grouped by the values of
the other variables, so you can do

agyerpr <- with(agyer, aggregate(Deaths1,
    by = list(Country, Year, List, Sex, Morticd10_103_Frmat, IM_Frmat),
    sum))
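
A minimal sketch of the same idea on made-up data, in case it helps to see the error and the fix side by side. The data frame toy and all its values are invented for illustration; they are not the original agyer data.

```r
# Toy stand-in for agyer; all values are invented for illustration.
toy <- data.frame(
  Name    = factor(c("Estonia", "Estonia", "Latvia", "Latvia")),
  Year    = c(1997L, 1997L, 1997L, 1998L),
  Sex     = c(1L, 1L, 2L, 2L),
  Deaths1 = c(10L, 5L, 7L, 3L)
)

# Passing the whole data frame makes aggregate() apply sum() to every
# column, including the factor Name, so the call fails.
# try() is used here only so that the script keeps running.
res_all <- try(aggregate(toy, by = list(toy$Year, toy$Sex), sum),
               silent = TRUE)

# Summing only Deaths1 within each Year/Sex group works.
# The summed values end up in a column named "x".
res_ok <- with(toy, aggregate(Deaths1, by = list(Year = Year, Sex = Sex), sum))
print(res_ok)
```

Naming the elements of the by list (Year = Year, ...) is optional; it only gives the grouping columns of the result readable names instead of Group.1, Group.2.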

aggregate() applies the function to each variable (column) of the R object
passed as x, and if a variable is not suitable for that function (as a
factor is not for sum), the call fails with an error.
If you want to omit some columns from the aggregation, just put
agyer[, -c(column.numbers)] in the x position of the aggregate() call.
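
If the column numbers are not known in advance, one possible way to build the selection is to test each column with is.factor() and also leave out the grouping columns. This is only a sketch on made-up data; toy, by_cols, is_fac and keep are hypothetical names, not anything from the original data.

```r
# Toy stand-in for agyer; all values are invented for illustration.
toy <- data.frame(
  Name    = factor(c("Estonia", "Estonia", "Latvia", "Latvia")),
  Year    = c(1997L, 1997L, 1997L, 1998L),
  Sex     = c(1L, 1L, 2L, 2L),
  Deaths1 = c(10L, 5L, 7L, 3L),
  Deaths2 = c(1L, 2L, 3L, 4L)
)

# Columns used only for grouping; they should not be summed themselves.
by_cols <- c("Year", "Sex")

# Keep every column that is neither a factor nor a grouping column.
is_fac <- sapply(toy, is.factor)
keep   <- !is_fac & !(names(toy) %in% by_cols)

# A data frame is a list, so toy[by_cols] can be passed directly as 'by'.
res <- aggregate(toy[, keep, drop = FALSE], by = toy[by_cols], sum)
print(res)
```

Leaving the grouping columns out of x also avoids summing them within each group, which is the oddity raised in the original question.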


Regards
Petr


> 
> The sum is done on -the already omitted - factor of "Cause".
> 
> I do not understand why it tries to sum a factor that is included in the
> "by" list, since the concept is not to sum for those included, but use
> them for grouping. I am lucky with this database because all the factors
> can be interpreted as integers and I do not have to omit them one by
> one, but what if not?
> 
> Am I missing something with aggregate or classes?
> 
> Thanks for your help!
> 
> Sincerely,
> Peter Mihalicza
> 