[R] aggregate by factor

David Winsemius dwinsemius at comcast.net
Sat Jan 30 23:13:55 CET 2010


On Jan 30, 2010, at 4:46 PM, david hilton shanabrook wrote:

>
> On 30 Jan 2010, at 4:20 PM, David Winsemius wrote:
>
>>
>> On Jan 30, 2010, at 4:09 PM, david hilton shanabrook wrote:
>>
>>> I have a data frame with two columns, a factor and a numeric.  I
>>> want to create a data frame with the factor, its frequency, and the
>>> median of the numeric column.
>>>> head(motifList)
>>> events     score
>>> 1  aeijm -0.25000000
>>> 2  begjm -0.25000000
>>> 3  afgjm -0.25000000
>>> 4  afhjm -0.25000000
>>> 5  aeijm -0.25000000
>>> 6  aehjm  0.08333333
>>>
>>> To get the frequency table of events:
>>>
>>>> motifTable <- as.data.frame(table(motifList$events))
>>>> head(motifTable)
>>> Var1 Freq
>>> 1 aeijm  110
>>> 2 begjm   46
>>> 3 afgjm  337
>>> 4 afhjm  102
>>> 5 aehjm  190
>>> 6 adijm   18
>>>>
>>>
>>> Now get the score column back in.
>>>
>>>> motifTable2 <- merge(motifList, motifTable, by="events")
>>>> head(motifTable2)
>>> events     percent freq
>>> 1  adgjm  0.00000000  111
>>> 2  adgjm          NA  111
>>> 3  adgjm  0.13333333  111
>>> 4  adgjm  0.06666667  111
>>> 5  adgjm -0.16666667  111
>>> 6  adgjm          NA  111
>>>>
>>>
>>> Then lastly to aggregate on the events column getting the median  
>>> of the score
>>>> motifTable3 <- aggregate.data.frame(motifTable2,  
>>>> by=list(motifTable2$events), FUN=median, na.rm=TRUE)
>>> Error in median.default(X[[1L]], ...) : need numeric data
>>>
>>> Which gives the error as events are a factor.  Can someone  
>>> enlighten me to a more obvious approach?
>>
>> I don't think grouping on a factor is the source of your error. You  
>> have NA's in your data and median will choke on those unless you  
>> specify na.rm=TRUE.
>>
>> -- 
>
> I thought the na.rm=TRUE in the aggregate function would do this  
> (see above).  I also tried it with
>
I missed that.
>> medianRmNa <- function(data) {
> 	return(median(data, na.rm=TRUE))}
>
>> motifTable3 <- aggregate.data.frame(motifTable2,  
>> by=list(motifTable2$events), FUN=medianRmNa)
> Error in median.default(data, na.rm = TRUE) : need numeric data

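The error comes from the first argument, not from the by= list:
aggregate applies FUN to every column of the data frame you pass it,
so median gets called on the factor column events and fails. Drop the
grouping variable from the first argument to aggregate: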

motifTable3 <- aggregate(motifTable2[ , -1],
                         by=list(motifTable2$events), FUN=median, na.rm=TRUE)

 > motifTable3
   Group.1       score freq
1   aehjm  0.08333333    1
2   aeijm -0.25000000    2
3   afgjm -0.25000000    1
4   afhjm -0.25000000    1
5   begjm -0.25000000    1
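
For anyone who wants to try this without the original data, here is a
minimal sketch on a toy data frame. The name toy and its values are
invented for illustration and are not taken from motifTable2. It shows
the failing call, the corrected call, and one possible alternative
that gets the frequency and the median straight from the two-column
data with table() and tapply(), assuming you only need one row per
level of events.

```r
# Toy stand-in for motifTable2; the values are invented for illustration.
toy <- data.frame(
  events = factor(c("aeijm", "aeijm", "begjm", "begjm", "begjm")),
  score  = c(-0.25, 0.75, NA, 0.5, 1.5),
  freq   = c(2, 2, 3, 3, 3)
)

# Passing the whole data frame sends the factor column to median(), which fails.
failed <- tryCatch(
  aggregate(toy, by = list(toy$events), FUN = median, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)

# Dropping the grouping column from the first argument works.
fixed <- aggregate(toy[, -1], by = list(events = toy$events),
                   FUN = median, na.rm = TRUE)

# Alternative without merge(): table() and tapply() both order by factor level.
direct <- data.frame(
  events = levels(toy$events),
  freq   = as.vector(table(toy$events)),
  score  = as.vector(tapply(toy$score, toy$events, median, na.rm = TRUE))
)

print(failed)
print(fixed)
print(direct)
```

Note that table() counts every row, including those where score is NA,
while the median is taken over the non-missing scores only.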


>
> same error.
>
> I did leave a line out of the above script,
>
> names(motifTable) <- c("events", "freq")
> which helps explain why the merge works
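
To make the renaming step concrete, here is a small sketch on made-up
data (toy, tab and merged are illustrative names, not objects from the
original script). as.data.frame(table(...)) names its columns Var1 and
Freq, so merge(..., by="events") only finds a common column after the
rename.

```r
# Toy stand-in for motifList; the values are invented for illustration.
toy <- data.frame(
  events = factor(c("aeijm", "begjm", "aeijm")),
  score  = c(-0.25, 0.5, 0.75)
)

tab <- as.data.frame(table(toy$events))
print(names(tab))  # "Var1" "Freq"

# Without the rename there is no "events" column in tab, so merge() stops.
no_rename <- tryCatch(
  merge(toy, tab, by = "events"),
  error = function(e) "merge failed"
)

# After the rename both data frames share the key column "events".
names(tab) <- c("events", "freq")
merged <- merge(toy, tab, by = "events")
print(merged)
```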
>
> dhs

David Winsemius, MD
Heritage Laboratories
West Hartford, CT