[R] Centering data frame by factor

William Dunlap wdunlap at tibco.com
Tue Jul 19 17:58:54 CEST 2011


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Malter
> Sent: Tuesday, July 19, 2011 1:51 AM
> To: r-help at r-project.org
> Subject: Re: [R] Centering data frame by factor
> 
> 
> P1-tapply(P1,Experiment,mean)[Experiment]

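Note that the above solution works in this example
only because Experiment takes the values 1 and 2, so
indexing the tapply() result by Experiment happens to
pick out its first and second elements.  If Experiment
were coded as, say, 101 and 102, the index would point
past the end of the length-2 result and give NA.
Converting Experiment to a factor avoids the problem,
because a factor subscript is treated as its integer
level codes.  E.g.,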
  > RAW <- data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
  > RAW$E <- RAW$Experiment + 100 # relabeled Experiment
  > with(RAW, P1-tapply(P1,Experiment,mean)[Experiment]) # good
   2  2  2  1  1  1 
  -2  0  2  1 -1  0 
  > with(RAW, P1-tapply(P1,E,mean)[E]) # bad
  <NA> <NA> <NA> <NA> <NA> <NA> 
    NA   NA   NA   NA   NA   NA 
  > RAW$E <- factor(RAW$E) # convert to factor
  > with(RAW, P1-tapply(P1,E,mean)[E]) # good
  102 102 102 101 101 101 
   -2   0   2   1  -1   0
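
The same point can be sketched with the data from the original
question (quoted below), so that the group means are not zero.
This is only an illustration: the names 'means', 'by_position',
'by_name' and 'by_ave' are made up for it, and the as.character()
and ave() variants are alternatives I am adding here, not part of
the solution quoted above.

```r
# Data as posted in the original question, with Experiment relabeled.
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))
RAW$E <- RAW$Experiment + 100  # numeric codes 101 and 102

# Group means of P1; the result has names "101" and "102".
means <- with(RAW, tapply(P1, E, mean))

# Numeric E subscripts by position (elements 101 and 102),
# which do not exist in a length-2 result, so everything is NA.
by_position <- with(RAW, P1 - means[E])

# A character subscript looks the means up by name instead.
by_name <- with(RAW, P1 - means[as.character(E)])

# ave() returns the group mean aligned with each row,
# so no subscripting of a summary is needed at all.
by_ave <- with(RAW, P1 - ave(P1, E))
```

Both by_name and by_ave keep the original row order, so either can
be assigned straight back into a column of RAW.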

Another way to approach the problem is to think of
your normalized data as the residuals from a linear model:
  > residuals(lm(data=RAW, cbind(P1,P2) ~ E))
               P1            P2
  1 -2.000000e+00 -4.000000e+00
  2  4.385598e-17  8.771196e-17
  3  2.000000e+00  4.000000e+00
  4  1.000000e+00 -1.000000e+00
  5 -1.000000e+00  8.771196e-17
  6  4.385598e-17  1.000000e+00
  > zapsmall(.Last.value) # make reading easier 
    P1 P2
  1 -2 -4
  2  0  0
  3  2  4
  4  1 -1
  5 -1  0
  6  0  1
That approach can make generalizations to more factors
or to smoothing approaches easier.
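
As a rough sketch of what that could look like, again using the
data from the original question: the first fit reproduces the
centering by Experiment and writes it back into a copy of the data
frame, so the Experiment and Group columns and the row order are
kept.  The second fit is a hypothetical extension that also removes
an additive Group effect; nobody in this thread asked for that
model, it only shows the shape of the generalization.  The names
NORMALIZED, ADJUSTED, fit1 and fit2 are chosen for the example.

```r
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))

# One factor: residuals are the values centered within Experiment.
fit1 <- lm(cbind(P1, P2) ~ factor(Experiment), data = RAW)
NORMALIZED <- RAW
NORMALIZED[c("P1", "P2")] <- as.data.frame(zapsmall(residuals(fit1)))

# Two factors (an assumed extension): residuals after removing
# additive Experiment and Group effects.
fit2 <- lm(cbind(P1, P2) ~ factor(Experiment) + Group, data = RAW)
ADJUSTED <- RAW
ADJUSTED[c("P1", "P2")] <- as.data.frame(residuals(fit2))
```

Only P1 and P2 are overwritten; the other columns are carried over
from RAW unchanged.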

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> 
> HTH,
> Daniel
> 
> 
> ronny wrote:
> >
> > Hi,
> >
> > I would like to center P1 and P2 of the following data frame by the factor
> > "Experiment", i.e. subtract from each value the average of its
> > experiment, and keep the original data structure, i.e. the experiment and
> > the group of each value.
> >
> > RAW=
> >
> > data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("A","A","B","A","A","B"),"P1"=c(10,12,14,5,3,4),"P2"=c(8,12,16,2,3,4))
> >
> > Desired result:
> >
> > NORMALIZED=
> > data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
> >
> > I tried using "by", but then I lose the original order, and the "Group"
> > variable. Can you help?
> >
> >> RAW
> >   Experiment Group P1 P2
> >          2     A 10  8
> >          2     A 12 12
> >          2     B 14 16
> >          1     A  5  2
> >          1     A  3  3
> >          1     B  4  4
> >
> > NOT.OK<- within (RAW,
> > {P1<-do.call(rbind,by(RAW$P1,RAW$Experiment,scale,scale=F))})
> >
> >> NOT.OK
> >   Experiment Group P1 P2
> >           2     A  1  8
> >           2     A -1 12
> >           2     B  0 16
> >           1     A -2  2
> >           1     A  0  3
> >           1     B  2  4
> >
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Centering-data-frame-by-factor-tp3677609p3677620.html
> Sent from the R help mailing list archive at Nabble.com.