[R] A calculation in data.frame

Duncan Murdoch murdoch.duncan at gmail.com
Tue Jan 7 22:58:27 CET 2014


On 14-01-07 3:21 PM, Ron Michael wrote:
> Hi,
>
> I have to perform a formula-driven calculation on a data.frame (defined below). Let's say I have the following DF:
>
>> DF <- data.frame(A1 = c('a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'), A2 = c('m', 'n', 'p', 'm', 'n', 'p', 'm', 'n', 'p'), A3 = c(1,2,3,4,5,6,7,8,9))
>> DF
>    A1 A2 A3
> 1  a  m  1
> 2  a  n  2
> 3  a  p  3
> 4  b  m  4
> 5  b  n  5
> 6  b  p  6
> 7  c  m  7
> 8  c  n  8
> 9  c  p  9
>
>
> Now let's say the user gives a formula that will be applied to the elements of the A1 column. Let's say the formula looks like:
>
> z = a + 2*b + c (in fact the formula will be arbitrary, like z = f(a, b, c))
>
> Once such a formula is given, the result will look like this (for the columns A1, A2, A3 respectively):
>
> z m 16
> z n 20
> z p 24
>
> The last column comes from the fact that 1 + 2*4 + 7 = 16, 2 + 2*5 + 8 = 20, 3 + 2*6 + 9 = 24.
>
> Given that the formula will be user-defined and applied to a data.frame like DF, I am looking for an automated way to accomplish the task for a really big DF of this kind and a fairly complex formula.
>
> Can somebody suggest an efficient way to perform this task in R?

A dataframe isn't really the best structure for this problem.  What you 
really have in R terms are three environments, indexed by A2, each 
containing bindings to a, b and c.  Within each of those environments 
you want to create a new binding to z, according to the user-supplied 
formula.

The way I'd implement that pretty much matches my description.  Have a 
named list of environments, write a function to evaluate the formula and 
assign the value, then just lapply it to your list.
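
A minimal sketch of that approach, using the example data from the question. It is an illustration under stated assumptions, not code from the original reply: the helper name add_binding is hypothetical, and the user's formula is assumed to arrive as an unevaluated R expression (for instance from quote() or parse(text = ...)).

```r
# Example data from the question.
DF <- data.frame(A1 = c('a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'),
                 A2 = c('m', 'n', 'p', 'm', 'n', 'p', 'm', 'n', 'p'),
                 A3 = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 stringsAsFactors = FALSE)

# Named list of environments, one per A2 value; each binds the A1 names
# (a, b, c) to the matching A3 values.
envs <- lapply(split(DF, DF$A2), function(d) {
  list2env(setNames(as.list(d$A3), d$A1), parent = globalenv())
})

# Hypothetical helper: evaluate the expression inside one environment and
# store the result there under `name`.
add_binding <- function(env, expr, name = "z") {
  assign(name, eval(expr, envir = env), envir = env)
  invisible(env)
}

# Assumed form of the user-supplied formula: an unevaluated expression.
user_formula <- quote(a + 2*b + c)

# Environments are modified in place, so the lapply result can be discarded.
invisible(lapply(envs, add_binding, expr = user_formula))

# Collect z from each environment: m = 16, n = 20, p = 24.
z_values <- sapply(envs, function(e) get("z", envir = e))
```

Because environments have reference semantics, no copies of the data are made when z is added, which may be part of why this layout suits large inputs better than repeatedly modifying a data frame.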

If you really do want things in the dataframe format, then write 
functions to convert to it at the beginning, and from it at the very 
end.   Don't work with that format if efficiency matters to you.
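
The conversion at the two ends could look something like the sketch below. The function names df_to_envs and envs_to_df, and their default column arguments, are hypothetical and chosen only to match the A1/A2/A3 layout in the question.

```r
# Hypothetical helper: one environment per `key` value, binding each
# `name` entry to its `value`.
df_to_envs <- function(df, name = "A1", key = "A2", value = "A3") {
  lapply(split(df, df[[key]]), function(d) {
    list2env(setNames(as.list(d[[value]]), as.character(d[[name]])),
             parent = globalenv())
  })
}

# Hypothetical helper: pull one binding out of every environment and
# return it as rows in the original three-column layout.
envs_to_df <- function(envs, name = "z") {
  values <- vapply(envs, function(e) get(name, envir = e), numeric(1))
  data.frame(A1 = name,
             A2 = names(envs),
             A3 = unname(values),
             row.names = NULL, stringsAsFactors = FALSE)
}

DF <- data.frame(A1 = c('a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'),
                 A2 = c('m', 'n', 'p', 'm', 'n', 'p', 'm', 'n', 'p'),
                 A3 = c(1, 2, 3, 4, 5, 6, 7, 8, 9))

envs <- df_to_envs(DF)                        # convert once at the start
for (e in envs) {
  assign("z", eval(quote(a + 2*b + c), e), envir = e)
}
result <- envs_to_df(envs)                    # convert back at the very end
# result has the rows (z, m, 16), (z, n, 20), (z, p, 24)
```

All intermediate work happens on the environments; the data frame appears only as input and output.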

Duncan Murdoch



