[R] Applying function to a TABLE and also "apply, tapply, sapply etc"

Liviu Andronic landronimirc at gmail.com
Wed Dec 15 17:24:08 CET 2010


On Wed, Dec 15, 2010 at 4:18 PM, Amelia Vettori
<amelia_vettori at yahoo.co.nz> wrote:
> Dear R-help forum members,
>
> Suppose I have a data frame with two variables and a single value for each of them, as described below.
>
> variable_1           variable_2
>         10                          20
>
> I have written a function, say, 'fun' which uses input 10 and 20 and gives me desired result.
>
> fun = function(X, Y)
>          {
>          X + Y              # (I am just giving an example of the process. The actual process is quite different.)
>          }
>
> result = fun(variable_1[1], variable_2[1])   # Thus, I should get the answer 30, which I store in, say, 'ans1.csv'
>
> # ____________________________________________________________________
>
> # My problem
>
> Suppose that instead of the above data frame, with a single value for each of variable_1 and variable_2, I have the following data:
>
> variable_1           variable_2
>
>         10                   20
>         40                   30
>          3                   11
>
> I need to run the function 'fun' separately for each pair of values taken by variable_1 and variable_2. Also, the results (= 30, 70 and 14) obtained for each of the above pairs should be stored in different csv files, say "ans1.csv", "ans2.csv" and "ans3.csv" respectively, which I can use for further analysis. (In reality each of these output files will consist of 1000 records.)
>
> As I mentioned in my earlier mail, I am new to R and I think I should be using apply or sapply or tapply etc., which I have tried, but I am not able to proceed further because I do not understand them properly.
>
> It will be a great help to me if I receive the guidance w.r.t
>
> (a) how do I tackle above problem i.e. how do I apply the function to a table so that it will generate different csv files pertaining to pair of values "10 and 20", "40 and 30" and "3 and 11";
>

Say you have the following data frame
> df
  Var1 V2
1   10 20
2   40 30
3    3 11
> str(df)
'data.frame':	3 obs. of  2 variables:
 $ Var1: num  10 40 3
 $ V2  : num  20 30 11

Then
> apply(df, 1, sum)  ##compute sum() for each row
 1  2  3
30 70 14
> apply(df, 2, sum)  ##compute sum() for each column
Var1   V2
  53   61
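
The apply() calls above compute the sums but do not write the separate csv files you asked about. The following is only a minimal sketch of one way to do that, assuming your real function takes one pair of values at a time as 'fun' does. The output directory 'out_dir' and the column name 'result' are my own placeholders, not something from your post.

```r
# Rebuild the example data frame shown above
df <- data.frame(Var1 = c(10, 40, 3), V2 = c(20, 30, 11))

# Stand-in for the real function (the actual process is different)
fun <- function(X, Y) X + Y

# Hypothetical output location; a temporary directory is used here
out_dir <- tempdir()

# One call to fun() and one csv file per row of the data frame
for (i in seq_len(nrow(df))) {
  result <- fun(df$Var1[i], df$V2[i])
  out_file <- file.path(out_dir, paste0("ans", i, ".csv"))
  write.csv(data.frame(result = result), out_file, row.names = FALSE)
}
```

A plain loop is used here because writing a file is a side effect, so there is nothing to collect into a vector or list. Each file can later be read back with read.csv().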


> (b) I am not sharp enough to understand the programming aspects of R that easily, though I am really keen to learn R, so I will be highly obliged if someone helps me understand, with some simple examples, how "apply", "sapply", "tapply", "mapply" etc. can be used.
>

Here are only some examples, the ones that I understand well.
##apply function to each element of a list (data frames are lists)
##compute sum() for each column
> lapply(df, sum)
$Var1
[1] 53

$V2
[1] 61

##sapply() is a variation of lapply(); see the docs
> sapply(df, sum)
Var1   V2
  53   61
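
To make the difference between the two a bit more concrete, here is a small sketch using the same example data frame. The variable names are mine, and vapply() is an addition that I mention only as a stricter alternative to sapply().

```r
df <- data.frame(Var1 = c(10, 40, 3), V2 = c(20, 30, 11))

# lapply() always returns a list, one element per column
as_list <- lapply(df, sum)

# sapply() simplifies the result when it can: here a named numeric vector
as_vec <- sapply(df, sum)

# When each result has length 2, sapply() simplifies to a matrix instead
ranges <- sapply(df, range)

# vapply() is like sapply(), but you declare the expected type and length
checked <- vapply(df, sum, numeric(1))
```

Because the shape returned by sapply() depends on the results, vapply() can be the safer choice inside functions.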

##using the 'iris' data frame, for each Species level compute mean()
##of the Sepal.Length column
> with(iris, tapply(Sepal.Length, Species, mean))
    setosa versicolor  virginica
     5.006      5.936      6.588

##a friendlier interface is provided by by()
> with(iris, by(Sepal.Length, Species, mean))
Species: setosa
[1] 5.006
------------------------------------------------------------
Species: versicolor
[1] 5.936
------------------------------------------------------------
Species: virginica
[1] 6.588

##the same, now for four variables at the same time
##(colMeans() is used because mean() is not meant for whole data frames)
> by(iris[1:4], iris$Species, colMeans)
iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.006        3.428        1.462        0.246
------------------------------------------------------------
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       5.936        2.770        4.260        1.326
------------------------------------------------------------
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width
       6.588        2.974        5.552        2.026


For an example of mapply see this recent post:
http://r.789695.n4.nabble.com/calculating-mean-of-list-components-tp3088986p3089057.html
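
In case the link goes stale, here is a minimal sketch of mapply() applied to the pairs from the question. It assumes the stand-in 'fun' from the original post; the vectors 'x' and 'y' are just my names for the two columns.

```r
# Stand-in for the real function (the actual process is different)
fun <- function(X, Y) X + Y

x <- c(10, 40, 3)
y <- c(20, 30, 11)

# mapply() calls fun(x[1], y[1]), fun(x[2], y[2]), ... and simplifies the result
res <- mapply(fun, x, y)

# Map() does the same pairing but always returns a list
res_list <- Map(fun, x, y)
```

For a function as simple as this one, x + y is already vectorized and needs no mapply() at all. mapply() earns its keep when the function only works on one pair of values at a time.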

For more on vectorization, see sections 3 and 4 of the 'R inferno'
[1]. Also check 'Some Hints for the R Beginner' [2].
[1] http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
[2] http://www.burns-stat.com/pages/Tutor/hints_R_begin.html

Regards
Liviu


> I am sure this will go a long way in helping new learners like me to understand the proper use of these wonderful commands.
>
> I hope I am able to put forward my problem properly.
>
> Thanking all in advance for the anticipated guidance
>
> Amelia Vettori, Auckland
>





