[R] Odp: aggregate function oddity

Mihalicza Péter mihalicza.peter at eski.hu
Tue Sep 18 10:51:04 CEST 2007


Sorry for the confusion, I was not clear enough, so I made a small 
example to illustrate:

 >m=data.frame(fac1=rep(c(1,2),3), fac2=c("a","b","b","b","a","b"), 
num1=1:6, num2=7:12)
 > m$fac1=as.factor(m$fac1)
 > m
  fac1 fac2 num1 num2
1    1    a    1    7
2    2    b    2    8
3    1    b    3    9
4    2    b    4   10
5    1    a    5   11
6    2    b    6   12
 >#I would like to get the sum of num1 and num2 grouped by c(1,2) and c(a,b)
 > ag=aggregate(m, list(m$fac1, m$fac2), sum)
Error in Summary.factor(..., na.rm = na.rm) :
        sum not meaningful for factors

 >#I understand, that it is possible to do...

 >ag=aggregate(m[,3:4], list(m$fac1, m$fac2), sum)
 > ag
  Group.1 Group.2 num1 num2
1       1       a    6   18
2       1       b    3    9
3       2       b   12   30

but I do not understand why aggregate tries to sum fac1 and fac2, since 
they are grouping variables that need not, and must not, be summed. As far 
as I can tell, the aggregate help text also says nothing about omitting 
factor variables from the data frame.

My question is whether I am missing something, or whether this is how 
aggregate works. If the latter, then what is the reason for it?
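
For reference, the workaround can be written without hard-coding column positions. This is only a sketch on the small example above: it assumes the data frame m as defined earlier, and the name num_cols is merely illustrative. It selects the numeric columns automatically, so the grouping factors are never handed to sum().

```r
# Rebuild the small example from this message.
m <- data.frame(fac1 = rep(c(1, 2), 3),
                fac2 = c("a", "b", "b", "b", "a", "b"),
                num1 = 1:6, num2 = 7:12)
m$fac1 <- as.factor(m$fac1)
m$fac2 <- as.factor(m$fac2)  # explicit, in case strings are kept as character

# aggregate() applies FUN to every column of x, so pass only numeric columns.
num_cols <- sapply(m, is.numeric)   # illustrative name, not part of any API
ag <- aggregate(m[, num_cols, drop = FALSE],
                by = list(fac1 = m$fac1, fac2 = m$fac2),
                FUN = sum)
print(ag)
```

The result should have the same three rows of sums as the m[,3:4] call shown above, with the grouping columns named fac1 and fac2 instead of Group.1 and Group.2.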

Thanks, and sorry again!

Yours,
Peter Mihalicza



Petr PIKAL wrote:
> Hi
>
> r-help-bounces at r-project.org wrote on 17.09.2007 14:29:17:
>
>   
>> Dear All,
>>
>> I tried to aggregate the rows according to some factors in a data frame.
>> I got the
>> "Error in Summary.factor(..., na.rm = na.rm) :
>>         sum not meaningful for factors"
>> message. This problem was once already discussed in 2003 on this list,
>> where the following solution was given: include only those columns - when
>> giving it to aggregate() - that are not factors.
>>
>> It also worked for me, but this solution is a bit odd, since there is no
>> need to sum the factors given as grouping variables. Of course I may do
>> something completely wrong.
>> help(aggregate) says:
>> ## S3 method for class 'data.frame': aggregate(x, by, FUN, ...)
>> |x|    an R object.
>> |by|    a list of grouping elements, each as long as the variables in |x|.
>> Names for the grouping variables are provided if they are not given. The
>> elements of the list will be coerced to factors (if they are not already
>> factors).
>>
>> In my interpretation this means that the factor variables and the
>> numeric variables are in the same data frame, namely x.
>>
>> The data frame looks like this (it's mortality from cerebrovascular 
>> diseases):
>>  > str(agyer)
>> 'data.frame':   102 obs. of  65 variables:
>>  $ Country            : int  4055 4055 4055 4055 4055 4055 4055 4055 
>> 4055 4055 ...
>>  $ Name               : Factor w/ 5 levels "Estonia","Latvia",..: 1 1 1 
>> 1 1 1 1 1 1 1 ...
>>  $ Year               : int  1997 1997 1998 1999 1999 1999 2000 2000 
>> 2000 2001 ...
>>  $ List               : int  103 103 103 103 103 103 103 103 103 103 ...
>>  $ Sex                : int  2 1 2 2 1 2 2 1 1 2 ...
>>  $ Morticd10_103_Frmat: int  1 1 1 1 1 1 1 1 1 1 ...
>>  $ IM_Frmat           : int  1 1 1 1 1 1 1 1 1 1 ...
>>  $ Deaths1            : int  33 179 143 1428 83 61 3 759 29 4 ...
>> and a bunch of other int variables.
>>
>> After omitting agyer$Name, I do
>>  > agyerpr=aggregate(agyer, by=list(agyer$Country, agyer$Year, 
>> agyer$List, agyer$Sex, agyer$Morticd10_103_Frmat, agyer$IM_Frmat), sum)
>>     
>
> If this is the command you issued, it tries to aggregate the whole data 
> frame agyer including a factor variable Name, hence the error.
>
> You probably want to sum only the Deaths1 column, grouped by the values 
> of the other variables, so you can do
>
> agyerpr <- with(agyer, aggregate(Deaths1, by=list(Country, Year,List,Sex, 
> Morticd10_103_Frmat, IM_Frmat), sum))
>
> aggregate() applies the function to every variable in the R object passed 
> as x, and if a variable is not valid input for that function (a factor 
> given to sum, for example), the call results in an error.
> If you want to omit some columns from the aggregation, just put agyer[, 
> -c(column.numbers)] in the x position of the aggregate command.
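
A possible alternative to indexing columns by number, for anyone reading this later: more recent versions of R also provide a formula method for aggregate(), which names the columns to be summed and the grouping variables separately. The sketch below assumes such a version of R and uses the toy data frame m from the top of this message rather than agyer.

```r
# Toy data frame from the top of this message.
m <- data.frame(fac1 = rep(c(1, 2), 3),
                fac2 = c("a", "b", "b", "b", "a", "b"),
                num1 = 1:6, num2 = 7:12)
m$fac1 <- as.factor(m$fac1)

# Left of ~ : the columns to summarise; right of ~ : the grouping variables.
# Only num1 and num2 are passed to sum(), so the factors cause no error.
ag2 <- aggregate(cbind(num1, num2) ~ fac1 + fac2, data = m, FUN = sum)
print(ag2)
```

With this form the grouping columns keep their own names (fac1, fac2) in the result, and no column has to be dropped from the data frame beforehand.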
>
>
> Regards
> Petr
>
>
>   
>> The sum is done on - the already omitted - factor of "Cause".
>>
>> I do not understand why it tries to sum a factor that is included in the
>> "by" list, since the concept is not to sum for those included, but use
>> them for grouping. I am lucky with this database because all the factors
>> can be interpreted as integers and I do not have to omit them one by
>> one, but what if not?
>>
>> Am I missing something with aggregate or classes?
>>
>> Thanks for your help!
>>
>> Sincerely,
>> Peter Mihalicza
>>
>>
>>
>> -- 
>> This message has been scanned for viruses and dangerous con...{{dropped}}
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


