[R] a vectorized solution to some simple dataframe math?

Sat Mar 27 15:17:46 CET 2010

On Mar 27, 2010, at 6:53 AM, Dennis Murphy wrote:

> Hi:
>
> Does this do what you want?
>
> # Create some fake data...
>
> df <- data.frame(id = factor(rep(c('cell1', 'cell2'), each = 10)),
>                  cond = factor(rep(rep(c('A', 'B'), each = 5), 2)),
>                  time = round(rnorm(20, 350, 10), 2))
>
> # Create a function to subtract the mean of a vector from each of its elements
> f <- function(x) x - mean(x)

Base R already has a function, ave(), that computes means within groups
and returns one value per original row. Subtracting that from the time
variable is straightforward:

 > df$dev <- df$time - ave(df$time, df$id, df$cond)
 > df$dev
  [1]  -1.346  -8.586  -2.366   2.714   9.584 -12.108  13.052   0.742  -7.438
 [10]   5.752   1.434  10.854   0.514 -21.166   8.364   4.128  -4.502  -5.602
 [19]  -4.322  10.298
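
For anyone wanting to reproduce this, here is a minimal self-contained sketch. The set.seed() call and the names grp_mean and dev_sums are my additions and not part of the original posts, so the numbers will differ from the output shown above. It builds the same kind of fake data, centers time within each id/cond cell using ave(), and checks that the deviations sum to zero within every cell.

```r
# Reproducible sketch; set.seed() is an assumption added for repeatability
set.seed(42)
df <- data.frame(id   = factor(rep(c("cell1", "cell2"), each = 10)),
                 cond = factor(rep(rep(c("A", "B"), each = 5), 2)),
                 time = round(rnorm(20, 350, 10), 2))

# ave() returns one group mean per row, in the original row order
grp_mean <- ave(df$time, df$id, df$cond)
df$dev   <- df$time - grp_mean

# Sanity check: deviations should sum to (numerically) zero in every cell
dev_sums <- tapply(df$dev, list(df$id, df$cond), sum)
stopifnot(all(abs(dev_sums) < 1e-8))
```

Because ave() keeps the original row order, no sorting or merging step is needed before the subtraction.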

Although the default for ave() is the mean function, other functions  
can be used with the FUN= argument.
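
As a rough illustration of the FUN= argument (the seed and the names dev_direct, dev_two_step and med_dev are made up for this sketch), the centering function f from the quoted message can be handed to ave() directly, and a different summary such as the median can be swapped in. Note that FUN has to be passed by name, since any unnamed argument after the first is treated as a grouping variable.

```r
# Fake data in the same shape as the quoted example; the seed is an assumption
set.seed(1)
df <- data.frame(id   = factor(rep(c("cell1", "cell2"), each = 10)),
                 cond = factor(rep(rep(c("A", "B"), each = 5), 2)),
                 time = round(rnorm(20, 350, 10), 2))

# Centering function from the quoted message
f <- function(x) x - mean(x)

# One step: apply f within each id/cond cell (FUN must be named)
dev_direct <- ave(df$time, df$id, df$cond, FUN = f)

# Two steps: default group mean, then subtract
dev_two_step <- df$time - ave(df$time, df$id, df$cond)
stopifnot(isTRUE(all.equal(dev_direct, dev_two_step)))

# A different summary: deviation from the within-cell median
med_dev <- df$time - ave(df$time, df$id, df$cond, FUN = median)
```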

-- 
David.

> # Load the plyr package, which contains the function ddply():
> library(plyr)
> df2 <- ddply(df, .(id, cond), transform, dev = f(time))
>
> # output
>> df2
>      id cond   time     dev
> 1  cell1    A 353.01   7.226
> 2  cell1    A 351.06   5.276
> 3  cell1    A 343.59  -2.194
> 4  cell1    A 341.50  -4.284
> 5  cell1    A 339.76  -6.024
> 6  cell1    B 351.18   0.644
> 7  cell1    B 340.53 -10.006
> 8  cell1    B 345.09  -5.446
> 9  cell1    B 347.44  -3.096
> 10 cell1    B 368.44  17.904
> 11 cell2    A 343.48  -3.776
> 12 cell2    A 352.35   5.094
> 13 cell2    A 350.78   3.524
> 14 cell2    A 340.38  -6.876
> 15 cell2    A 349.29   2.034
> 16 cell2    B 364.45  15.524
> 17 cell2    B 354.52   5.594
> 18 cell2    B 350.41   1.484
> 19 cell2    B 345.78  -3.146
> 20 cell2    B 329.47 -19.456
>
> # cell means
>> with(df, aggregate(time, list(id = id, cond = cond), mean))
>     id cond       x
> 1 cell1    A 345.784
> 2 cell2    A 347.256
> 3 cell1    B 350.536
> 4 cell2    B 348.926
>
> HTH,
> Dennis
>
> On Fri, Mar 26, 2010 at 1:31 PM, Dgnn <sharkbrainpdx at gmail.com> wrote:
>
>>
>> I have a data frame containing the results of time measurements taken
>> from several cells. Each cell was measured in conditions A and B, and
>> there are an arbitrary number of measurements in each condition. I am
>> trying to calculate the difference of each measurement from the mean
>> of a given cell in a given condition without relying on loops.
>>
>>> my.df
>>          id       cond    time
>> 1         cell1     A       343.5
>> 2         cell1     A       355.2
>> ...
>> 768      cell1     B       454.0
>> ...
>> 2106    cell2     A       433.9
>> ...
>>
>> As a first approach I tried:
>>
>>> mews<-aggregate(my.df$time, list(id=my.df$id, cond=my.df$cond), mean)
>> id      cond      time
>> cell1    A         352
>> cell1    B         446
>> cell2    A         244
>> cell2    B         ...
>>
>> I then tried to use %in% to match id and cond of mews with my.df, but
>> I haven't been able to get it to work.
>> Am I on the right track? What are some other solutions?
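
The aggregate-then-match idea can be made to work, though probably not with %in%, which only tests membership of one vector in another and does not pair up id and cond. One possible sketch uses merge() on both key columns. The toy data, the seed, and the names row, cell_means, cell_mean and merged are all hypothetical, chosen only for illustration.

```r
# Toy data standing in for my.df; values and seed are assumptions
set.seed(7)
my.df <- data.frame(id   = rep(c("cell1", "cell2"), each = 6),
                    cond = rep(rep(c("A", "B"), each = 3), 2),
                    time = round(rnorm(12, 350, 10), 1))

# Remember the original row order, since merge() may reorder rows
my.df$row <- seq_len(nrow(my.df))

# One mean per id/cond cell
cell_means <- aggregate(time ~ id + cond, data = my.df, FUN = mean)
names(cell_means)[names(cell_means) == "time"] <- "cell_mean"

# Attach each cell mean to every matching row, then restore the order
merged <- merge(my.df, cell_means, by = c("id", "cond"))
merged <- merged[order(merged$row), ]
merged$dev <- merged$time - merged$cell_mean
```

This should give the same deviations as the ave() one-liner above, at the cost of an extra merge and reorder.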
>>
>> Thanks for any help.
>>
>> jason
>>
>>
>> --
>> View this message in context:
>> http://n4.nabble.com/a-vectorized-solution-to-some-simple-dataframe-math-tp1692810p1692810.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>

David Winsemius, MD
West Hartford, CT