[R] Re: How do I make proper use of the by() function?

Petr PIKAL petr.pikal at precheza.cz
Mon Jun 13 11:06:33 CEST 2011


Hi

> [R] How do I make proper use of the by() function?
> 
> Dear list,
> 
> I have a function that uses values from two vectors and spits out one new
> vector based on all the values of the two original vectors, and with the
> same length as them.
> Ok, I would now like to be able to apply that function simply on two columns
> in a data frame, divided by the levels of factors in the data frame.
> 
> I'm trying to use by() for this, but the output is too hard to use. What I
> am doing is this:
> 
> by(df, list(df$Factor1,df$Factor2),function(x)
> my_function(x$col1,x$col2),simplify=TRUE)

by(df, list(df$Factor1,df$Factor2),function(x) 
my_function(x$col1,x$col2),simplify=TRUE)
Error in df$Factor1 : object of type 'closure' is not subsettable

I get this error because I have no object called df in my workspace other
than the df() function from the stats package, so R tries to subset that
function.
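
A minimal sketch of what happens, assuming a fresh session with no
user-defined df. The toy data frame at the end is hypothetical and only
shows that the name has to refer to a data frame before by() can use it.

```r
# In a fresh session, df is the F-distribution density function from stats
is.function(df)

# Subsetting a function with $ is what raises the error; capture its message
msg <- tryCatch(df$Factor1, error = function(e) conditionMessage(e))
print(msg)

# Once a data frame of that name exists (hypothetical toy data),
# it masks the function and $ works as expected
df <- data.frame(Factor1 = c("a", "b"), col1 = 1:2)
df$Factor1
```

So a small reproducible data frame in the question would have avoided the
guessing.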

If I understand correctly, you may want either aggregate() or ave():

> milan
   pc lot bettong    bet  sira vyber betctc
1   1  56   89.95 109.25 3.000     b  88.37
2   2  66   86.87 100.96 3.156     a     NA
3   3  84   93.91 101.21 3.120     a     NA
4   4  89   89.48 100.00 3.010     a     NA
5   5  41  110.15 116.92 3.597     b 106.37
6   6  44   96.53 106.54 3.057     b     NA
7   7  47   94.94 104.91 2.857     b     NA
8   8  62   90.30 111.05 3.210     b     NA
9   9  64   96.41 102.56 3.180     a  90.79
10 10  65   95.04 101.15 3.200     a     NA
11 11  57   88.27 104.71 3.060     b     NA
12 13  74   91.98 104.93 3.470     a     NA

> aggregate(milan[,2:5], list(milan$vyber), sum) 
  Group.1 lot bettong    bet   sira
1       a 442  553.69 610.81 19.136
2       b 307  570.14 653.38 18.781

> sapply(milan[,2:5], function(x) ave(x, milan$vyber, FUN = sum))
      lot bettong    bet   sira
 [1,] 307  570.14 653.38 18.781
 [2,] 442  553.69 610.81 19.136
 [3,] 442  553.69 610.81 19.136
 [4,] 442  553.69 610.81 19.136
 [5,] 307  570.14 653.38 18.781
 [6,] 307  570.14 653.38 18.781
 [7,] 307  570.14 653.38 18.781
 [8,] 307  570.14 653.38 18.781
 [9,] 442  553.69 610.81 19.136
[10,] 442  553.69 610.81 19.136
[11,] 307  570.14 653.38 18.781
[12,] 442  553.69 610.81 19.136
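
The difference between the two is that aggregate() returns one row per
group, while ave() returns a vector as long as its input. ave() takes its
grouping variables through ..., so the summary function has to be passed by
name as FUN =. An unnamed "sum" is treated as one more grouping variable and
the default, mean, is used. A small sketch with hypothetical toy data (not
the milan data frame):

```r
# Hypothetical toy data, not the milan data frame above
toy <- data.frame(g = c("a", "a", "b", "b", "b"),
                  x = c(1, 2, 3, 4, 5))

# FUN must be named; an unnamed "sum" would be taken as a grouping variable
group_sum  <- ave(toy$x, toy$g, FUN = sum)

# With no FUN given, ave() falls back to the group mean
group_mean <- ave(toy$x, toy$g)

# Both results line up row by row with the original data
cbind(toy, group_sum, group_mean)
```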

But maybe I am wrong; I am still only learning to read minds.

> 
> and the output is too complex to be used in a simple way. Actually, I just
> want something like a data frame, where the results vectors are placed in
> one column and the conditions under which they were produced (i.e. the
> values of the factors according to which the data set were divided) in other
> columns.
> 
> This does not seem to be provided by by(), and aggregate only provides me
> with the ability to use values from one column, right?
> So, are there other functions I could use?
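
One possibility, sketched under assumptions: split() the data frame by the
factor combinations, apply the function to each piece, and rbind() the pieces
back together. The data frame dat and the body of my_function below are
hypothetical stand-ins, since the original data and function were not posted;
only the column names follow the question.

```r
# Hypothetical stand-ins for the poster's function and data
my_function <- function(a, b) (a - mean(a)) * b   # result has the input length
dat <- data.frame(Factor1 = rep(c("u", "v"), each = 4),
                  Factor2 = rep(c("p", "q"), times = 4),
                  col1 = 1:8,
                  col2 = seq(10, 80, by = 10))

# Split the rows by every factor combination, dropping empty combinations
pieces <- split(dat, list(dat$Factor1, dat$Factor2), drop = TRUE)

# Apply the function per piece and keep the factor values next to the result
res_list <- lapply(pieces, function(p) {
  data.frame(Factor1 = p$Factor1,
             Factor2 = p$Factor2,
             result  = my_function(p$col1, p$col2))
})

# Stack the pieces into one data frame with plain row names
res <- do.call(rbind, res_list)
rownames(res) <- NULL
res
```

The result has one row per original row, with the result vector in one column
and the factor values in the others. Note that the rows come back ordered by
group, not in the original row order.
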
> 
> Thanks!
> 
> /Fredrik
> 


