[R] Centering data frame by factor
William Dunlap
wdunlap at tibco.com
Tue Jul 19 17:58:54 CEST 2011
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Daniel Malter
> Sent: Tuesday, July 19, 2011 1:51 AM
> To: r-help at r-project.org
> Subject: Re: [R] Centering data frame by factor
>
>
> P1-tapply(P1,Experiment,mean)[Experiment]
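Note that the above solution works in this example
because Experiment takes the values 1 and 2. If
Experiment were coded as, say, 101 and 102, the above
would not work: tapply() returns one mean per level,
so a numeric Experiment is used as a position into
that length-2 result, and positions 101 and 102 are
out of range and give NA. This is a case where converting
Experiment to a factor would avoid problems, since
indexing by a factor uses its integer codes. E.g.,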
> RAW <- data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
> RAW$E <- RAW$Experiment + 100 # relabeled Experiment
> with(RAW, P1-tapply(P1,Experiment,mean)[Experiment]) # good
2 2 2 1 1 1
-2 0 2 1 -1 0
> with(RAW, P1-tapply(P1,E,mean)[E]) # bad
<NA> <NA> <NA> <NA> <NA> <NA>
NA NA NA NA NA NA
> RAW$E <- factor(RAW$E) # convert to factor
> with(RAW, P1-tapply(P1,E,mean)[E]) # good
102 102 102 101 101 101
-2 0 2 1 -1 0
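A minimal sketch of two base R alternatives that are not used in the thread, assuming ronny's original (uncentered) data with Experiment relabeled as 101/102. ave() returns each row's group mean aligned with the original rows, and indexing the tapply() result with as.character() selects by level name rather than by position, so neither depends on how Experiment is coded. The column names P1c, P2c and P1c_named are illustrative only.

```r
# Assumption: ronny's original (uncentered) data, with Experiment
# relabeled as 101/102 (the case where positional indexing breaks)
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))
RAW$E <- RAW$Experiment + 100

# ave() applies FUN (mean by default) within each group of E and returns
# a vector aligned with the original rows, so row order is preserved
RAW$P1c <- with(RAW, P1 - ave(P1, E))
RAW$P2c <- with(RAW, P2 - ave(P2, E))

# Alternative: index the tapply() result by level name, not by position
P1c_named <- with(RAW, P1 - tapply(P1, E, mean)[as.character(E)])

print(RAW[, c("Experiment", "Group", "P1c", "P2c")])
```

For this data both approaches give P1 values -2, 0, 2, 1, -1, 0 and P2 values -4, 0, 4, -1, 0, 1, with Experiment and Group left in their original rows.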
Another way to approach the problem is to think of
your normalized data as the residuals from a linear model:
> residuals(lm(data=RAW, cbind(P1,P2) ~ E))
P1 P2
1 -2.000000e+00 -4.000000e+00
2 4.385598e-17 8.771196e-17
3 2.000000e+00 4.000000e+00
4 1.000000e+00 -1.000000e+00
5 -1.000000e+00 8.771196e-17
6 4.385598e-17 1.000000e+00
> zapsmall(.Last.value) # make reading easier
P1 P2
1 -2 -4
2 0 0
3 2 4
4 1 -1
5 -1 0
6 0 1
That approach makes it easier to generalize to more factors
or to smoothing approaches.
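A minimal sketch of what that generalization can look like, assuming ronny's original data. With one factor the residuals are each value minus its experiment's mean; taking residuals on interaction(E, Group) instead centers each value within its Experiment-by-Group cell. The two-factor case is an illustration only and was not asked for in the thread; the ave() line is included just as a cross-check.

```r
# Assumption: ronny's original (uncentered) data
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))
RAW$E <- factor(RAW$Experiment)

# One factor: residuals are each value minus its experiment's mean
res1 <- zapsmall(residuals(lm(cbind(P1, P2) ~ E, data = RAW)))

# Two factors: residuals on the Experiment:Group interaction center
# each value within its Experiment-by-Group cell
res2 <- zapsmall(residuals(lm(cbind(P1, P2) ~ interaction(E, Group),
                              data = RAW)))

# Cross-check: the same cell-wise centering done directly with ave()
P1_cell <- with(RAW, P1 - ave(P1, E, Group))

print(res1)
print(res2)
```

Because each Group B cell here holds a single row, its residual is 0; with real data the cells would normally contain several observations.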
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
>
> HTH,
> Daniel
>
>
> ronny wrote:
> >
> > Hi,
> >
> > I would like to center P1 and P2 of the following data frame by the factor
> > "Experiment", i.e. subtract from each value the average of its
> > experiment, and keep the original data structure, i.e. the experiment and
> > the group of each value.
> >
> > RAW=data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("A","A","B","A","A","B"),"P1"=c(10,12,14,5,3,4),"P2"=c(8,12,16,2,3,4))
> >
> > Desired result:
> >
> > NORMALIZED=data.frame("Experiment"=c(2,2,2,1,1,1),"Group"=c("B","A","B","B","A","B"),"P1"=c(-2,0,2,1,-1,0),"P2"=c(-4,0,4,-1,0,1))
> >
> > I tried using "by", but then I lose the original order and the "Group"
> > variable. Can you help?
> >
> >> RAW
> > Experiment Group P1 P2
> > 2 A 10 8
> > 2 A 12 12
> > 2 B 14 16
> > 1 A 5 2
> > 1 A 3 3
> > 1 B 4 4
> >
> > NOT.OK<- within (RAW,
> > {P1<-do.call(rbind,by(RAW$P1,RAW$Experiment,scale,scale=F))})
> >
> >> NOT.OK
> > Experiment Group P1 P2
> > 2 A 1 8
> > 2 A -1 12
> > 2 B 0 16
> > 1 A -2 2
> > 1 A 0 3
> > 1 B 2 4
> >
>
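The by() attempt above goes wrong because by() returns its pieces in sorted level order (experiment 1 before experiment 2), so rbind() stacks the centered values in that order, and within() then writes them back over rows that are still in the original order. A minimal sketch that keeps the split-apply idea but restores the original row order with base R's split() and unsplit(), assuming ronny's RAW data; center and FIXED are hypothetical names used only for illustration.

```r
# Assumption: ronny's original data as posted
RAW <- data.frame(Experiment = c(2, 2, 2, 1, 1, 1),
                  Group = c("A", "A", "B", "A", "A", "B"),
                  P1 = c(10, 12, 14, 5, 3, 4),
                  P2 = c(8, 12, 16, 2, 3, 4))

# Hypothetical helper: subtract a group's mean from each of its values
center <- function(x) x - mean(x)

# split() groups values by Experiment (levels sorted, experiment 1 first);
# unsplit() puts the centered pieces back into the original row positions
FIXED <- within(RAW, {
  P1 <- unsplit(lapply(split(P1, Experiment), center), Experiment)
  P2 <- unsplit(lapply(split(P2, Experiment), center), Experiment)
})

print(FIXED)
```

Experiment and Group are untouched, and P1 and P2 come out as -2, 0, 2, 1, -1, 0 and -4, 0, 4, -1, 0, 1.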