[R] best way to apply a list of functions to a dataset ?

Glen Barnett glnbrntt at gmail.com
Wed Jul 21 02:37:16 CEST 2010


Hi Dennis,

Thanks for the reply.

Yes, that's easier, but the conversion to a matrix with rbind has
converted the output of that final function to a numeric.

I included that last function in the example secifically to preclude
people assuming that functions would always return the same type.

I guess this doesn't matter too much for a logical, but what if
instead the function returned a character (say "mean", "median", or
"equal" - indicating which one was larger, or "equal" which could
easily happen with discrete data). This precludes using rbind (which I
also used at first, before I noticed that sometimes I could have
functions that don't return numerics).

Glen


On Tue, Jul 20, 2010 at 6:55 PM, Dennis Murphy <djmuser at gmail.com> wrote:
> Hi:
>
> This might be a little easier (?):
>
> library(datasets)
> skewness <- function(x) mean(scale(x)^3)
> mean.gt.med <- function(x) mean(x)>median(x)
>
> # ------
> # construct the function to apply to each variable in the data frame
> f <- function(x) c(mean = mean(x), sd = sd(x), skewness = skewness(x),
>      median = median(x), mean.gt.med = mean.gt.med(x))
>
> # map function to each variable with lapply and combine with do.call():
> do.call(rbind, lapply(attitude, f))
>                mean        sd    skewness median mean.gt.med
> rating     64.63333 12.172562 -0.35792491   65.5           0
> complaints 66.60000 13.314757 -0.21541749   65.0           1
> privileges 53.13333 12.235430  0.37912287   51.5           1
> learning   56.36667 11.737013 -0.05403354   56.5           0
> raises     64.63333 10.397226  0.19754317   63.5           1
> critical   74.76667  9.894908 -0.86577893   77.5           0
> advance    42.93333 10.288706  0.85039799   41.0           1
>
> HTH,
> Dennis
>
>
> On Mon, Jul 19, 2010 at 10:51 PM, Glen Barnett <glnbrntt at gmail.com> wrote:
>>
>> Assuming I have a matrix of data (or under some restrictions that will
>> become obvious, possibly a data frame), I want to be able to apply a
>> list of functions (initially producing a single number from a vector)
>> to the data and produce a data frame (for compact output) with column
>> 1 being the function results for the first function, column 2 being
>> the results for the second function and so on - with each row being
>> the columns of the original data.
>>
>> The obvious application of this is to produce summaries of data sets
>> (a bit like summary() does on numeric matrices), but with user
>> supplied functions. I am content for the moment to leave it to the
>> user to supply functions that work with the data they supply so as to
>> produce results that will actually be data-frame-able, though I'd like
>> to ultimately make it a bit nicer than it currently is without
>> compromising the niceness of the output in the "good" cases.
>>
>> The example below is a simplistic approach to this problem (it should
>> run as is). I have named it "fapply" for fairly obvious reasons, but
>> added the ".1" because it doesn't accept multidimensional arrays. I
>> have included the output I generated, which is what I want. There are
>> some obvious generalizations (e.g. being able to include functions
>> like range(), say, that produce several values on a vector, rather
>> than one, making the user's life simpler when a function already does
>> most of what they need).
>>
>> The question is: this looks like a silly approach, growing a list
>> inside a for loop. Also I recall reading that if you find yourself
>> using "do.call" you should probably be doing something else.
>>
>> So my question: Is there a better way to implement a function like this?
>>
>> Or, even better, is there already a function that does this?
>>
>> ## example function and code to apply a list of functions to a matrix
>> (here a numeric data frame)
>>
>> library(datasets)
>>
>> fapply.1 <- function(x, fun.l, colnames=fun.l){
>> out.l <- list()   # starts with an empty list
>> for (i in seq_along(fun.l)) out.l[[i]] <- apply(x,2,fun.l[[i]])   #
>> loop through list of functions
>>
>> # set up names and make into a data frame
>> names(out.l) <- colnames
>> attr(out.l,"row.names") <- names(out.l[[1]])
>> attr(out.l,"class") <- "data.frame"
>> out.l
>> }
>>
>> skewness <- function(x) mean(scale(x)^3)      #define a simple numeric
>> function
>> mean.gt.med <- function(x) mean(x)>median(x)  # define a simple
>> non-numeric fn
>> flist <- c("mean","sd","skewness","median","mean.gt.med") # make list
>> of fns to apply
>>
>> fapply.1(attitude,flist)
>>               mean        sd    skewness median mean.gt.med
>> rating     64.63333 12.172562 -0.35792491   65.5       FALSE
>> complaints 66.60000 13.314757 -0.21541749   65.0        TRUE
>> privileges 53.13333 12.235430  0.37912287   51.5        TRUE
>> learning   56.36667 11.737013 -0.05403354   56.5       FALSE
>> raises     64.63333 10.397226  0.19754317   63.5        TRUE
>> critical   74.76667  9.894908 -0.86577893   77.5       FALSE
>> advance    42.93333 10.288706  0.85039799   41.0        TRUE
>>
>> ## end code and output
>>
>> So did I miss something obvious?
>>
>> Any suggestions as far as style or simple stability-enhancing
>> improvements would be handy.
>>
>> regards,
>> Glen
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>



More information about the R-help mailing list