[R] correct function formation in R

Duncan Murdoch murdoch.duncan at gmail.com
Tue Nov 20 19:35:50 CET 2012


On 20/11/2012 12:39 PM, Omphalodes Verna wrote:
> Dear list!
>   
> I have a question about 'correct function formation'. Which function (fun1 or fun2; see below) is written more correctly: one that uses 'structure' to build the output, or one that creates an empty 'data.frame' and then fills it in? (fun1 and fun2 are just for illustration.)
>   
> Thanks a lot, OV
>   
> code:
> input <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
> fun1 <- function(x) {
>      ID <- NULL; minimum <- NULL; maximum <- NULL
>      for(i in seq_along(names(x))) {
>          ID[i]      <- names(x)[i]
>          minimum[i] <- min(x[, names(x)[i]])
>          maximum[i] <- max(x[, names(x)[i]])
>      }
>      output <- structure(list(ID, minimum, maximum),
>                          row.names = seq_along(names(x)),
>                          .Names = c("ID", "minimum", "maximum"),
>                          class = "data.frame")
>      return(output)
> }

fun1 above relies on the internal implementation of the data.frame 
class (a list with names, row.names and class attributes).  That's 
really unlikely to change, but you still shouldn't rely on it.
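
To make that concrete, here is a minimal sketch of what fun1 depends on. The 
column values are made up for illustration. A data frame happens to be stored 
as a list carrying a few attributes, and the documented constructor sets those 
up without the caller needing to know the details.

```r
# Hand-built object, as in fun1: depends on how data frames are stored.
by_hand <- structure(list(c("x1", "x2"), c(-1.5, -2.0), c(1.2, 2.5)),
                     row.names = 1:2,
                     .Names = c("ID", "minimum", "maximum"),
                     class = "data.frame")

# Same content through the documented constructor.
by_constructor <- data.frame(ID = c("x1", "x2"),
                             minimum = c(-1.5, -2.0),
                             maximum = c(1.2, 2.5),
                             stringsAsFactors = FALSE)

# The attributes set by hand above are the ones data.frame() manages itself.
names(attributes(by_constructor))
all.equal(by_hand, by_constructor)
```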

> fun2 <- function(x) {
>      output <- data.frame(ID = character(), minimum = numeric(), maximum = numeric(), stringsAsFactors = FALSE)
>      for(i in seq_along(names(x))) {
>          output[i, "ID"]      <- names(x)[i]
>          output[i, "minimum"] <- min(x[, names(x)[i]])
>          output[i, "maximum"] <- max(x[, names(x)[i]])
>      }
>      return(output)
> }

This one is going to be really slow, because every output[i, ...] 
assignment goes through data frame indexing, which is much more 
expensive than assigning into a plain vector.
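
A rough way to see the cost is to time the two styles side by side. This is 
only a sketch: grow_df, fill_vector and the size n are hypothetical names and 
values chosen for illustration, and the timings will vary by machine.

```r
n <- 2000  # arbitrary size, for illustration only

# Row-by-row assignment into a data frame, in the style of fun2.
grow_df <- function(n) {
  out <- data.frame(value = numeric(), stringsAsFactors = FALSE)
  for (i in seq_len(n)) out[i, "value"] <- i^2
  out
}

# Same values assigned into a plain vector, wrapped in a data frame once.
fill_vector <- function(n) {
  value <- numeric(n)
  for (i in seq_len(n)) value[i] <- i^2
  data.frame(value = value)
}

system.time(a <- grow_df(n))
system.time(b <- fill_vector(n))
all.equal(a$value, b$value)  # both styles compute the same column
```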

I would combine the approaches:  assign to local variables in the loop 
the way fun1 does, then construct a dataframe at the end.  That is,

output <- data.frame(ID, minimum, maximum)
return(output)

One other change:  don't initialize the local variables to NULL; 
allocate them at their final size so they aren't regrown on every 
iteration, e.g.

ID <- character(ncol(x))
minimum <- numeric(ncol(x))
maximum <- numeric(ncol(x))
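
Put together, the two suggestions might look like the sketch below. The name 
fun3 is hypothetical, and stringsAsFactors = FALSE is an assumption carried 
over from fun2 so that ID stays a character column on older versions of R.

```r
# fun3 is a hypothetical name combining the two suggestions above.
fun3 <- function(x) {
  n <- ncol(x)
  # Preallocate each result vector at its final length.
  ID      <- character(n)
  minimum <- numeric(n)
  maximum <- numeric(n)
  for (i in seq_len(n)) {
    ID[i]      <- names(x)[i]
    minimum[i] <- min(x[[i]])
    maximum[i] <- max(x[[i]])
  }
  # Build the data frame once, through the public constructor.
  data.frame(ID, minimum, maximum, stringsAsFactors = FALSE)
}

input <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
fun3(input)
```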

(And if the contents are as simple as in the example, you don't need the 
loop, but I assume the real case is more complicated.)
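
For a case as simple as the example, a loop-free version could look like this 
sketch. The name fun4 is hypothetical, and it assumes every column of x is 
numeric, so that min and max each return a single number.

```r
# fun4 is a hypothetical name; assumes all columns of x are numeric.
fun4 <- function(x) {
  data.frame(ID      = names(x),
             minimum = vapply(x, min, numeric(1), USE.NAMES = FALSE),
             maximum = vapply(x, max, numeric(1), USE.NAMES = FALSE),
             stringsAsFactors = FALSE)
}

input <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))
fun4(input)
```

vapply is used rather than sapply here because it checks that each result 
is a single number, which fails early if a column is not what was expected.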

Duncan Murdoch

>
> fun1(input)
> fun2(input)
>



