[R] aggregate by factor
David Winsemius
dwinsemius at comcast.net
Sat Jan 30 22:20:57 CET 2010
On Jan 30, 2010, at 4:09 PM, david hilton shanabrook wrote:
> I have a data frame with two columns, a factor and a numeric. I
> want to create a data frame with the factor, its frequency and the
> median of the numeric column.
>> head(motifList)
> events score
> 1 aeijm -0.25000000
> 2 begjm -0.25000000
> 3 afgjm -0.25000000
> 4 afhjm -0.25000000
> 5 aeijm -0.25000000
> 6 aehjm 0.08333333
>
> To get the frequency table of events:
>
>> motifTable <- as.data.frame(table(motifList$events))
>> head(motifTable)
> Var1 Freq
> 1 aeijm 110
> 2 begjm 46
> 3 afgjm 337
> 4 afhjm 102
> 5 aehjm 190
> 6 adijm 18
>>
>
> Now get the score column back in.
>
>> motifTable2 <- merge(motifList, motifTable, by="events")
>> head(motifTable2)
> events percent freq
> 1 adgjm 0.00000000 111
> 2 adgjm NA 111
> 3 adgjm 0.13333333 111
> 4 adgjm 0.06666667 111
> 5 adgjm -0.16666667 111
> 6 adgjm NA 111
>>
>
> Then lastly to aggregate on the events column getting the median of
> the score
>> motifTable3 <- aggregate.data.frame(motifTable2,
>> by=list(motifTable2$events), FUN=median, na.rm=TRUE)
> Error in median.default(X[[1L]], ...) : need numeric data
>
> Which gives the error as events are a factor. Can someone enlighten
> me to a more obvious approach?
I don't think grouping on a factor is the source of your error. You
have NAs in your data, and median will choke on those unless you
specify na.rm=TRUE.
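
For what it is worth, "need numeric data" is the message median() gives
when it is handed a factor, and aggregate.data.frame applies FUN to every
column of its first argument, the events column included. Below is a
minimal sketch of one possible way around that, aggregating only the
numeric column. The data are made up for illustration, and the column
names freq and median are placeholders chosen here, not taken from the
original post.

```r
# Made-up stand-in for motifList; the values are illustrative only.
motifList <- data.frame(
  events = factor(c("aeijm", "begjm", "aeijm", "afgjm", "begjm", "aeijm")),
  score  = c(-0.25, 0.5, NA, 0.125, 0.25, 0.75)
)

# median() rejects a factor whether or not na.rm is set; capture the message.
msg <- tryCatch(median(motifList$events, na.rm = TRUE),
                error = function(e) conditionMessage(e))
msg

# Aggregate only the numeric column, grouped by the factor.
med <- aggregate(motifList["score"],
                 by = list(events = motifList$events),
                 FUN = median, na.rm = TRUE)
names(med)[2] <- "median"   # placeholder column name

# Frequency of each level, named so the two tables share an "events" key.
freq <- as.data.frame(table(events = motifList$events),
                      responseName = "freq")

# One row per level: events, freq, median.
result <- merge(freq, med, by = "events")
result
```

Merging the two summaries, rather than merging the counts back into the
full data first, keeps the factor out of the columns passed to median().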
--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT