[R] aggregate by factor
David Winsemius
dwinsemius at comcast.net
Sat Jan 30 23:13:55 CET 2010
On Jan 30, 2010, at 4:46 PM, david hilton shanabrook wrote:
>
> On 30 Jan 2010, at 4:20 PM, David Winsemius wrote:
>
>>
>> On Jan 30, 2010, at 4:09 PM, david hilton shanabrook wrote:
>>
>>> I have a data frame with two columns, a factor and a numeric. I
>>> want to create a data frame with the factor, its frequency and the
>>> median of the numeric column for each level.
>>>> head(motifList)
>>> events score
>>> 1 aeijm -0.25000000
>>> 2 begjm -0.25000000
>>> 3 afgjm -0.25000000
>>> 4 afhjm -0.25000000
>>> 5 aeijm -0.25000000
>>> 6 aehjm 0.08333333
>>>
>>> To get the frequency table of events:
>>>
>>>> motifTable <- as.data.frame(table(motifList$events))
>>>> head(motifTable)
>>> Var1 Freq
>>> 1 aeijm 110
>>> 2 begjm 46
>>> 3 afgjm 337
>>> 4 afhjm 102
>>> 5 aehjm 190
>>> 6 adijm 18
>>>>
>>>
>>> Now get the score column back in.
>>>
>>>> motifTable2 <- merge(motifList, motifTable, by="events")
>>>> head(motifTable2)
>>> events percent freq
>>> 1 adgjm 0.00000000 111
>>> 2 adgjm NA 111
>>> 3 adgjm 0.13333333 111
>>> 4 adgjm 0.06666667 111
>>> 5 adgjm -0.16666667 111
>>> 6 adgjm NA 111
>>>>
>>>
>>> Then lastly to aggregate on the events column getting the median
>>> of the score
>>>> motifTable3 <- aggregate.data.frame(motifTable2,
>>>> by=list(motifTable2$events), FUN=median, na.rm=TRUE)
>>> Error in median.default(X[[1L]], ...) : need numeric data
>>>
>>> Which gives the error as events are a factor. Can someone
>>> enlighten me to a more obvious approach?
>>
>> I don't think grouping on a factor is the source of your error. You
>> have NA's in your data and median will choke on those unless you
>> specify na.rm=TRUE.
>>
>> --
>
> I thought the na.rm=TRUE in the aggregate function would do this
> (see above). I also tried it with
>
I missed that.
>> medianRmNa <- function(data) {
> return(median(data, na.rm=TRUE))}
>
>> motifTable3 <- aggregate.data.frame(motifTable2,
>> by=list(motifTable2$events), FUN=medianRmNa)
> Error in median.default(data, na.rm = TRUE) : need numeric data
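Apparently you cannot include the grouping variable in the first
argument to aggregate: FUN is applied to every column of that data
frame, and median() stops with "need numeric data" when it is handed
the events factor. Dropping that column works: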
motifTable3 <- aggregate(motifTable2[ , -1],
by=list(motifTable2$events), FUN=median, na.rm=TRUE)
> motifTable3
  Group.1       score freq
1   aehjm  0.08333333    1
2   aeijm -0.25000000    2
3   afgjm -0.25000000    1
4   afhjm -0.25000000    1
5   begjm -0.25000000    1
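
For the archive, here is a minimal sketch of the same idea on made-up data. The values below are invented for illustration and are not the poster's motifList, and motifSummary is a hypothetical name. It shows that median() rejects a factor, that aggregate() works once only the numeric column is passed, and one possible way to get the frequency and the median per level without the merge step:

```r
# Toy data, invented for illustration (not the poster's real motifList)
motifList <- data.frame(
  events = factor(c("aeijm", "begjm", "afgjm", "aeijm", "aehjm", "aeijm")),
  score  = c(-0.25, -0.25, -0.25, 0.5, 0.08, NA)
)

# median() refuses a factor; this is the error aggregate() runs into when
# the grouping column is left inside the data frame being summarised
err <- tryCatch(median(motifList$events),
                error = function(e) conditionMessage(e))

# Passing only the numeric column avoids the problem;
# na.rm = TRUE is handed on to median() through '...'
motifMedian <- aggregate(motifList["score"],
                         by = list(events = motifList$events),
                         FUN = median, na.rm = TRUE)

# Frequency and median per level in one go, no merge needed.
# length() counts rows with NA scores too, as table() would.
freq <- tapply(motifList$score, motifList$events, length)
med  <- tapply(motifList$score, motifList$events, median, na.rm = TRUE)

motifSummary <- data.frame(
  events = names(freq),
  freq   = as.vector(freq),
  median = as.vector(med)
)
motifSummary
```

Whether tapply() or aggregate() reads better is a matter of taste. The tapply() version just keeps the count and the median aligned on the factor levels, so nothing has to be merged back in.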
>
> same error.
>
> I did leave a line out of the above script,
>
> names(motifTable) <- c("events", "freq")
> which helps explain why the merge works
>
> dhs
David Winsemius, MD
Heritage Laboratories
West Hartford, CT