[R] Centering data frame by factor

David Winsemius dwinsemius at comcast.net
Tue Jul 19 18:36:33 CEST 2011


On Jul 19, 2011, at 11:58 AM, William Dunlap wrote:

>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Malter
>> Sent: Tuesday, July 19, 2011 1:51 AM
>> To: r-help at r-project.org
>> Subject: Re: [R] Centering data frame by factor
>>
>>
>> P1-tapply(P1,Experiment,mean)[Experiment]
>
> Note that the above solution works in this example
> because Experiment takes the values 1 and 2.  If
> Experiment were coded as, say, 101 and 102 the above
> would not work.  This is a case where converting
> Experiment to a factor would avoid problems.

I checked to see if my ave solution was subject to the same caveats  
and it is not. The help page is less categorical about what the  
grouping variables' structure should be, saying only that they are  
"typically factors".

>  E.g.,
>> RAW <- data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
>> RAW$E <- RAW$Experiment + 100 # relabeled Experiment
>> with(RAW, P1-tapply(P1,Experiment,mean)[Experiment]) # good
>   2  2  2  1  1  1
>  -2  0  2  1 -1  0
>> with(RAW, P1-tapply(P1,E,mean)[E]) # bad
>  <NA> <NA> <NA> <NA> <NA> <NA>
>    NA   NA   NA   NA   NA   NA

with(RAW, ave(P1, E, FUN=function(x) scale(x,  scale=FALSE) ) )
# [1] -2  0  2  1 -1  0   good
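The same ave idea can be applied to both P1 and P2 while keeping the rest of the data frame untouched, which is what the original question asked for. This is only a minimal sketch, assuming the RAW data frame from the original post; the helper name center is hypothetical, and the result is stored under NORMALIZED, the name the original post used for the desired output.

```r
# Data from the original question
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))

# Hypothetical helper: subtract the mean of a vector from each of its values
center <- function(x) x - mean(x)

NORMALIZED <- RAW
# ave() returns its result in the original row order, so rows stay aligned.
# An explicit anonymous function is used because lapply() has its own FUN
# argument, which would otherwise capture FUN = center.
NORMALIZED[c("P1", "P2")] <- lapply(RAW[c("P1", "P2")],
                                    function(x) ave(x, RAW$Experiment, FUN = center))
NORMALIZED
```

Because ave does the grouping itself, it should not matter here whether Experiment is coded as 1 and 2 or as 101 and 102.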


>> RAW$E <- factor(RAW$E) # convert to factor
>> with(RAW, P1-tapply(P1,E,mean)[E]) # good
>  102 102 102 101 101 101
>   -2   0   2   1  -1   0

Note that Bill converted his variable to a factor before calling tapply.
If he had wrapped it in factor() only inside the tapply call (as I often
do, possibly unwisely in light of this gotcha) while still indexing with
the numeric E, it would fail:

 > with(RAW, P1-tapply(P1, factor(E), mean)[E])
<NA> <NA> <NA> <NA> <NA> <NA>
   NA   NA   NA   NA   NA   NA

... that is unless you also use factor(E) as the index:

 > with(RAW, P1-tapply(P1, factor(E), mean)[factor(E)])
102 102 102 101 101 101
  -2   0   2   1  -1   0
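My reading of why the index matters: a numeric subscript is taken as a position, a factor subscript is taken as its integer codes, and a character subscript is matched against names. A small sketch under that assumption, with standalone E and P1 vectors rebuilt from the original question (the name means is only illustrative):

```r
E  <- c(102, 102, 102, 101, 101, 101)   # relabeled Experiment, still numeric
P1 <- c(10, 12, 14, 5, 3, 4)            # raw P1 values from the original question

means <- tapply(P1, E, mean)            # named "101" and "102"

means[E]                 # positions 101 and 102 do not exist, so all NA
means[factor(E)]         # factor codes 2,2,2,1,1,1 pick the right means
means[as.character(E)]   # lookup by name also picks the right means

# Centering by name lookup, without converting E to a factor first
centered <- as.vector(P1 - means[as.character(E)])
centered
```

Indexing by as.character(E) is one more way to sidestep the gotcha, since it does not depend on E's values happening to coincide with positions.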

Thanks, Bill. I've learned a lot of R from you.

-- 
David.

>
> Another way to approach the problem is to think of
> your normalized data as the residuals from a linear model:
>> residuals(lm(data=RAW, cbind(P1,P2) ~ E))
>               P1            P2
>  1 -2.000000e+00 -4.000000e+00
>  2  4.385598e-17  8.771196e-17
>  3  2.000000e+00  4.000000e+00
>  4  1.000000e+00 -1.000000e+00
>  5 -1.000000e+00  8.771196e-17
>  6  4.385598e-17  1.000000e+00
>> zapsmall(.Last.value) # make reading easier
>    P1 P2
>  1 -2 -4
>  2  0  0
>  3  2  4
>  4  1 -1
>  5 -1  0
>  6  0  1
> That approach can make generalizations to more factors
> or to smoothing approaches easier.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>>
>> HTH,
>> Daniel
>>
>>
>> ronny wrote:
>>>
>>> Hi,
>>>
>>> I would like to center P1 and P2 of the following data frame by the
>>> factor "Experiment", i.e. subtract from each value the average of its
>>> experiment, and keep the original data structure, i.e. the experiment
>>> and the group of each value.
>>>
>>> RAW=
>>> data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("A","A","B","A","A","B"),"P1"=c(10,12,14,5,3,4),"P2"=c(8,12,16,2,3,4))
>>>
>>> Desired result:
>>>
>>> NORMALIZED=
>>> data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
>>>
>>> I tried using "by", but then I lose the original order, and the
>>> "Group" variable. Can you help?
>>>
>>>> RAW
>>>  Experiment Group P1 P2
>>>         2     A 10  8
>>>         2     A 12 12
>>>         2     B 14 16
>>>         1     A  5  2
>>>         1     A  3  3
>>>         1     B  4  4
>>>
>>> NOT.OK<- within (RAW,
>>> {P1<-do.call(rbind,by(RAW$P1,RAW$Experiment,scale,scale=F))})
>>>
>>>> NOT.OK
>>>  Experiment Group P1 P2
>>>          2     A  1  8
>>>          2     A -1 12
>>>          2     B  0 16
>>>          1     A -2  2
>>>          1     A  0  3
>>>          1     B  2  4
>>>
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/Centering-data-frame-by-factor-tp3677609p3677620.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

David Winsemius, MD
West Hartford, CT


