[R] aggregate by factor

David Winsemius dwinsemius at comcast.net
Sat Jan 30 22:20:57 CET 2010


On Jan 30, 2010, at 4:09 PM, david hilton shanabrook wrote:

> I have a data frame with two columns, a factor and a numeric. I
> want to create a data frame with the factor, its frequency, and the
> median of the numeric column.
>> head(motifList)
>  events     score
> 1  aeijm -0.25000000
> 2  begjm -0.25000000
> 3  afgjm -0.25000000
> 4  afhjm -0.25000000
> 5  aeijm -0.25000000
> 6  aehjm  0.08333333
>
> To get the frequency table of events:
>
>> motifTable <- as.data.frame(table(motifList$events))
>> head(motifTable)
>   Var1 Freq
> 1 aeijm  110
> 2 begjm   46
> 3 afgjm  337
> 4 afhjm  102
> 5 aehjm  190
> 6 adijm   18
>>
>
> Now get the score column back in.
>
>> motifTable2 <- merge(motifList, motifTable, by="events")
>> head(motifTable2)
>  events     percent freq
> 1  adgjm  0.00000000  111
> 2  adgjm          NA  111
> 3  adgjm  0.13333333  111
> 4  adgjm  0.06666667  111
> 5  adgjm -0.16666667  111
> 6  adgjm          NA  111
>>
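As posted, motifTable's columns are Var1 and Freq (the defaults from as.data.frame(table(...))), so merge(..., by="events") has no events column to match on that side. Below is a minimal sketch on hypothetical data (the real motifList is not available) that names the key column on each side explicitly with by.x and by.y:

```r
# Hypothetical stand-in for motifList: a factor column plus a numeric column
motifList <- data.frame(
  events = factor(c("aeijm", "begjm", "aeijm", "afgjm", "aeijm", "begjm")),
  score  = c(-0.25, -0.25, NA, 0.5, 0.1, 0.0)
)

# as.data.frame(table(...)) names its columns Var1 and Freq by default
motifTable <- as.data.frame(table(motifList$events))

# The key is "events" in one frame and "Var1" in the other,
# so name each side explicitly instead of using by = "events"
motifTable2 <- merge(motifList, motifTable, by.x = "events", by.y = "Var1")
head(motifTable2)
```

The merged frame keeps the by.x name, so the key column is still called events, followed by score and Freq.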
>
> Then, lastly, aggregate on the events column to get the median of
> the score:
>> motifTable3 <- aggregate.data.frame(motifTable2,  
>> by=list(motifTable2$events), FUN=median, na.rm=TRUE)
> Error in median.default(X[[1L]], ...) : need numeric data
>
> This gives the error because events is a factor. Can someone point
> me to a more obvious approach?
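A minimal reproduction on hypothetical data (the real motifTable2 is not available) shows which call raises this error, and that an NA on its own does not raise one:

```r
# Hypothetical stand-in for motifTable2: a factor key plus a numeric column with an NA
motifTable2 <- data.frame(
  events = factor(c("adgjm", "adgjm", "aeijm", "aeijm", "aeijm")),
  score  = c(0, NA, 0.25, -0.25, 0.5)
)

# Passing the whole data frame makes aggregate() apply median() to every column,
# including the factor 'events'; capture the error message instead of stopping
msg <- tryCatch(
  aggregate(motifTable2, by = list(motifTable2$events), FUN = median, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(msg)   # the message reported in the post

# By contrast, an NA in numeric data is not an error: median() just returns NA
print(median(c(0, NA)))
```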

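I don't think grouping on a factor is the source of your error.
aggregate.data.frame() applies FUN to every column of the data frame
you give it, including events itself, and median() stops with "need
numeric data" when handed a factor. Pass only the score column. You
do also have NAs in your data; median() returns NA for those groups
unless you specify na.rm=TRUE, as you did.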
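One working approach, sketched on hypothetical data (the real motifList is not available): build the frequency table with an events column, aggregate only the numeric score column so median() never sees the factor, then merge the two on events. The column names freq and median are illustrative choices, not anything from the original post.

```r
# Hypothetical stand-in for motifList (the real data are not available)
motifList <- data.frame(
  events = factor(c("aeijm", "begjm", "aeijm", "afgjm", "aeijm", "begjm")),
  score  = c(-0.25, -0.25, NA, 0.5, 0.1, 0.0)
)

# Frequency of each event; naming the table argument names the key column
freq <- as.data.frame(table(events = motifList$events), responseName = "freq")

# Median score per event: hand aggregate() only the numeric column,
# so median() never sees the factor; na.rm = TRUE drops the NA scores
med <- aggregate(motifList["score"], by = list(events = motifList$events),
                 FUN = median, na.rm = TRUE)
names(med)[names(med) == "score"] <- "median"

# One row per event with its frequency and median score
motifSummary <- merge(freq, med, by = "events")
motifSummary
```

Note that freq counts every row for an event, including rows whose score is NA, while the median uses only the non-missing scores.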

-- 

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


