[R] Applying function to a TABLE and also "apply, tapply, sapply etc"

Brian Diggs diggsb at ohsu.edu
Wed Dec 15 20:25:46 CET 2010


On 12/15/2010 7:18 AM, Amelia Vettori wrote:
> Dear R-help forum members,
>
> Suppose I have a data-frame having two variables and single data for
>  each of them, as described below.
>
> variable_1           variable_2
> 10                   20
>
> I have written a function, say, 'fun' which uses input 10 and 20 and
>  gives me desired result.
>
> fun = function(X, Y) {
>   X + Y   # (I am just giving an example of the process. The actual
>           # process is quite different.)
> }
>
> result = fun(variable_1[1], variable_2[1])
> # Thus, I should be getting the answer 30, which I am storing in, say,
> # 'ans1.csv'
>
> #
> ____________________________________________________________________
>
> # My problem
>
> Suppose instead of having above dataframe having single data for
> variable 1 and variable 2, I have following data as
>
> variable_1           variable_2
>
> 10                   20
> 40                   30
> 3                    11
>
> I need to run the function 'fun' for each pair of values taken by
> variable_1 and variable_2 separately. Also, the results (= 30, 70 and
> 14) obtained for each of above pairs should be stored in different
> csv files, say "ans1.csv", "ans2.csv" and "ans3.csv" respectively
> which I can use for further analysis. (In reality each of these
> output files will consist of 1000 records).
>
> As I had mentioned in my earlier mail, I am new to R and I think I
> should be using apply or sapply or tapply etc., which I have tried
> but I am not able to proceed further as I am not able to understand
> it properly.
>
> It will be a great help to me if I receive the guidance w.r.t
>
> (a) how do I tackle above problem i.e. how do I apply the function to
> a table so that it will generate different csv files pertaining to
> pair of values "10 and 20", "40 and 30" and "3 and 11";
>
> (b) I am not that sharp to understand the programming aspects of R
> that easily, though I am really keen to learn R, so I will be highly
> obliged if someone helps me understand with some simple examples as
> to how "apply", "sapply", "tapply", "mapply" etc. can be used?
>
> I am sure this will go a long way in helping new learners like me
> to understand the proper use of these wonderful commands.
>
> I hope I am able to put forward my problem properly.
>
> Thanking all in advance for the anticipated guidance
>
> Amelia Vettori, Auckland

# a slightly more complicated demonstration function, which
# gives a result that makes sense for writing to a CSV file.
fun <- function(X, Y) {
	data.frame(result=X + Y)
}

foo <- data.frame(variable_1=c(10,40,3), variable_2=c(20,30,11))

# using apply
# This only really works if the columns in foo are all the same
# type, because apply first converts foo into a matrix (which
# holds a single type). Also, since the column names of the data.frame
# don't match the argument names of fun, the unname is needed.
# do.call is a somewhat advanced function that lets you call
# a function with arguments that are stored in a list.
apply(foo, 1, function(x) do.call("fun", as.list(unname(x))))
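
# A small sketch of why the "same type" caveat above matters. The
# data.frame 'mixed' and its 'label' column are made up for this
# illustration; they are not part of the original question.

```r
mixed <- data.frame(variable_1 = c(10, 40, 3),
                    variable_2 = c(20, 30, 11),
                    label      = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

# apply() converts the data.frame to a matrix first, and a matrix holds
# one type only, so every value in a row arrives as a character string.
row.types <- apply(mixed, 1, function(x) class(x))
row.types

# Restricting apply() to the numeric columns keeps the values numeric.
row.sums <- apply(mixed[, c("variable_1", "variable_2")], 1, sum)
row.sums
```

# If the columns really are of mixed type, a row-index loop with lapply
# or a call to mapply is likely the safer route.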

# version using apply, where foo has been transformed into
# something more like what apply would expect.
foo.m <- as.matrix(foo)
colnames(foo.m) <- c("X","Y")
apply(foo.m, 1, function(x){do.call("fun", as.list(x))})

# using lapply
# lapply takes a list or vector, which for this looping purpose would
# be the row indexes of foo. This version does not require
# the different arguments to be the same type.
lapply(seq_len(nrow(foo)), function(i) {fun(foo[i,1], foo[i,2])})

# using mapply
# This one is designed for the case where several arguments to a
# function vary together: fun is called with the first elements of
# each vector, then the second elements, and so on.
mapply(fun, foo[,1], foo[,2])
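
# By default mapply tries to simplify what it returns (SIMPLIFY = TRUE).
# When the function returns a data.frame, it may be easier to ask for
# the unsimplified list. A minimal sketch, repeating fun and foo so that
# it runs on its own; the names res.list and res.map are just
# illustrative.

```r
fun <- function(X, Y) {
  data.frame(result = X + Y)
}
foo <- data.frame(variable_1 = c(10, 40, 3), variable_2 = c(20, 30, 11))

# SIMPLIFY = FALSE returns a plain list with one data.frame per row of foo.
res.list <- mapply(fun, foo$variable_1, foo$variable_2, SIMPLIFY = FALSE)
str(res.list)

# Map() is a wrapper around mapply() that does not simplify.
res.map <- Map(fun, foo$variable_1, foo$variable_2)

# The per-row data.frames can be stacked back into one if that is wanted.
stacked <- do.call(rbind, res.list)
stacked
```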

# using Vectorize
# A different approach: instead of creating the looping
# structure yourself, create a new function which is vectorized
# over its arguments.
fun.v <- Vectorize(fun)
fun.v(foo[,1], foo[,2])

# storing the results to disk
# SIMPLIFY=FALSE keeps each returned data.frame intact rather than
# letting mapply try to collapse the results.
results <- mapply(fun, foo[,1], foo[,2], SIMPLIFY=FALSE)
# results is a list, each element of which is one of the returned
# sets of results corresponding to a row in the original data.frame
invisible(lapply(seq_along(results), function(r) {
	write.csv(results[[r]], file=paste("ans", r, ".csv", sep=""))
}))
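
# To check that the files contain what you expect, one option is to
# write them to a scratch directory and read one back. This is only a
# sketch: the use of tempdir() and the names out.dir, out.files and
# check are assumptions for the example, and fun and foo are repeated
# so that it runs on its own.

```r
fun <- function(X, Y) {
  data.frame(result = X + Y)
}
foo <- data.frame(variable_1 = c(10, 40, 3), variable_2 = c(20, 30, 11))
results <- Map(fun, foo$variable_1, foo$variable_2)

# assumption: a scratch directory; substitute your own output directory
out.dir <- tempdir()
out.files <- file.path(out.dir,
                       paste("ans", seq_along(results), ".csv", sep = ""))

# write each data.frame to its own file; row.names = FALSE leaves out
# the extra column of row numbers
for (i in seq_along(results)) {
  write.csv(results[[i]], file = out.files[i], row.names = FALSE)
}

# read the second file back in to inspect it
check <- read.csv(out.files[2])
check
```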

# if you didn't need different file names (the name of which depends on
# the position of the result in the list, not anything in the result
# itself), it could be simpler.
lapply(results, summary)
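
# The question also mentioned sapply and tapply, which are not used
# above. A rough sketch of what each does; the 'group' column is made
# up for this example and is not in the original data.

```r
foo <- data.frame(variable_1 = c(10, 40, 3), variable_2 = c(20, 30, 11))

# sapply: like lapply, but it simplifies the resulting list to a vector
# (or matrix) when it can.
row.totals <- sapply(seq_len(nrow(foo)),
                     function(i) foo[i, 1] + foo[i, 2])
row.totals

# tapply: split a vector according to a grouping variable and apply a
# function to each piece.
foo$group <- c("a", "b", "a")
group.totals <- tapply(foo$variable_1 + foo$variable_2, foo$group, sum)
group.totals
```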


-- 
Brian S. Diggs, PhD
Senior Research Associate, Department of Surgery
Oregon Health & Science University


