# [R] Centering data frame by factor

William Dunlap wdunlap at tibco.com
Tue Jul 19 17:58:54 CEST 2011

```
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Malter
> Sent: Tuesday, July 19, 2011 1:51 AM
> To: r-help at r-project.org
> Subject: Re: [R] Centering data frame by factor
>
>
> P1-tapply(P1,Experiment,mean)[Experiment]

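Note that the above solution works in this example
only because Experiment takes the values 1 and 2: a numeric
subscript selects by position, and tapply() returns the
group means in that order.  If Experiment were coded as,
say, 101 and 102, the subscript would point past the end of
the two-element result and give NAs.  Converting Experiment
to a factor avoids the problem, since a factor subscript
uses its integer codes (1, 2, ...), which match the order
of tapply()'s output.  E.g.,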
> RAW <- data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
> RAW$E <- RAW$Experiment + 100 # relabeled Experiment
> with(RAW, P1-tapply(P1,Experiment,mean)[Experiment]) # good
 2  2  2  1  1  1
-2  0  2  1 -1  0
> with(RAW, P1-tapply(P1,E,mean)[E]) # bad
<NA> <NA> <NA> <NA> <NA> <NA>
  NA   NA   NA   NA   NA   NA
> RAW$E <- factor(RAW$E) # convert to factor
> with(RAW, P1-tapply(P1,E,mean)[E]) # good
102 102 102 101 101 101
 -2   0   2   1  -1   0
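# A sketch, not part of the original reply: two other ways to sidestep
# positional indexing. The data are the original poster's RAW values,
# stored here under the hypothetical name `d`. Both variants look up
# the group mean by value, so the numeric coding (101/102) is harmless.
d <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                P1 = c(10, 12, 14, 5, 3, 4))
E_num <- d$Experiment + 100                  # numeric codes 101 and 102
m <- tapply(d$P1, E_num, mean)               # group means, named "101" and "102"
by_name <- d$P1 - m[as.character(E_num)]     # character subscript selects by name
by_ave  <- d$P1 - ave(d$P1, E_num)           # ave() gives each row its group mean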

Another way to approach the problem is to think of
your normalized data as the residuals from a linear model:
> residuals(lm(data=RAW, cbind(P1,P2) ~ E))
             P1            P2
1 -2.000000e+00 -4.000000e+00
2  4.385598e-17  8.771196e-17
3  2.000000e+00  4.000000e+00
4  1.000000e+00 -1.000000e+00
5 -1.000000e+00  8.771196e-17
6  4.385598e-17  1.000000e+00
> zapsmall(.Last.value) # make reading easier
  P1 P2
1 -2 -4
2  0  0
3  2  4
4  1 -1
5 -1  0
6  0  1
That approach can make generalizations to more factors
or to smoothing approaches easier.
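
# A sketch of the generalization mentioned above. The data are the
# original poster's RAW values under the hypothetical name `d`; the
# two-factor model is an illustrative assumption, not something the
# thread asked for.
d <- data.frame(Experiment = factor(c(2, 2, 2, 1, 1, 1)),
                Group = c("A", "A", "B", "A", "A", "B"),
                P1 = c(10, 12, 14, 5, 3, 4),
                P2 = c(8, 12, 16, 2, 3, 4))
# One factor: the residuals are the values centered within each experiment.
centered <- residuals(lm(cbind(P1, P2) ~ Experiment, data = d))
zapsmall(centered)
# Two factors: the additive effect of Group is removed as well.
adjusted <- residuals(lm(cbind(P1, P2) ~ Experiment + Group, data = d))
zapsmall(adjusted)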

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> HTH,
> Daniel
>
>
> ronny wrote:
> >
> > Hi,
> >
> > I would like to center P1 and P2 of the following data frame by the factor
> > "Experiment", i.e. subtract from each value the average of its
> > experiment, and keep the original data structure, i.e. the experiment and
> > the group of each value.
> >
> > RAW=
> >
> > data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("A","A","B","A","A","B"),"P1"=c(10,12,14,5,3,4),"P2"=c(8,12,16,2,3,4))
> >
> > Desired result:
> >
> > NORMALIZED=
> > data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
> >
> > I tried using "by", but then I lose the original order, and the "Group"
> > variable. Can you help?
> >
> >> RAW
> >   Experiment Group P1 P2
> >          2     A 10  8
> >          2     A 12 12
> >          2     B 14 16
> >          1     A  5  2
> >          1     A  3  3
> >          1     B  4  4
> >
> > NOT.OK<- within (RAW,
> > {P1<-do.call(rbind,by(RAW$P1,RAW$Experiment,scale,scale=F))})
> >
> >> NOT.OK
> >   Experiment Group P1 P2
> >           2     A  1  8
> >           2     A -1 12
> >           2     B  0 16
> >           1     A -2  2
> >           1     A  0  3
> >           1     B  2  4
> >