[R] sapply following using by with a list of factors

Gabor Grothendieck ggrothendieck at gmail.com
Mon May 30 05:58:51 CEST 2005


On 5/29/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> On 5/29/05, McClatchie, Sam (PIRSA-SARDI)
> <mcclatchie.sam at saugov.sa.gov.au> wrote:
> > Background:
> > OS: Linux Mandrake 10.1
> > release: R 2.0.0
> > editor: GNU Emacs 21.3.2
> > front-end: ESS 5.2.3
> > ---------------------------------
> > Colleagues
> >
> > I am having some trouble extracting results from the function by, used to
> > average variables in a data.frame first by one factor (depth) and then by a
> > second factor (station). The real data.frame is quite large
> > > dim(data.2001)
> > [1] 32049  11
> >
> > Here is a snippet of code:
> >
> > ## bin density data for each station into 1 m depth bins, containing means
> >    data.2001.test$integer.Depth <- as.factor(round(data.2001.test$Depth,
> > digits=0))
> >    attach(data.2001.test)
> >    binned.data.2001 <- by(data.2001.test[,5:11], list(depth=integer.Depth,
> > station=Station), mean)
> >
> > and here is a snippet of the data.frame
> >
> > > dim(data.2001.test)
> > [1] 150  11
> > > dump("data.2001.test", file=stdout())
> > data.2001.test <-
> > structure(list(Cruise = structure(as.integer(c(1, 1, 1, 1, 1,
> 
> 
> Try the following.  To keep this short, let's just take a subset
> of rows called dd.  Also, we drop the Station levels
> that are not being used, since this test only uses 2 levels
> and there are 288 Station levels in total.  The function that we apply using
> by returns a vector consisting of the integer.Depth, Station
> and the column means of columns 5 to 10.  (Asking for just the
> mean of those, as in your example, would take all the numbers
> in all the columns passed to mean and give back a grand mean
> rather than a mean per column.)  Finally we rbind it all back together.
> 
> > # data.2001.test is your data frame including the integer.Depth column
> > dd <- data.2001.test[50:60,]
> > dd$Station <- dd$Station[drop = TRUE]
> > dd.bin <- by(dd, list(dd$integer.Depth, dd$Station), function(x)
> + c(integer.Depth = x$integer.Depth[1], Station = x$Station[1],
> + colMeans(x[,5:10])))
> > do.call("rbind", dd.bin)
>     integer.Depth Station    Depth Temperature.oC Salinity Fluoresence.Volts
> [1,]            20       1 23.90167       17.67420 35.47650          1.107433
> [2,]            21       1 24.75350       17.33355 35.59050          1.060400
> [3,]             1       2  5.19000       19.61510 35.54870          0.726500
> [4,]             2       2  5.82950       19.61305 35.55025          0.719200
> [5,]             3       2  6.81250       19.61300 35.58345          0.741150
> [6,]             4       2  7.55000       19.61180 35.60460          0.754600
>     Density.kg.m3 Brunt.Vaisala.Freq.cycl.h
> [1,]      25.82400                 -5.095467
> [2,]      25.99820                 16.030975
> [3,]      25.30560                 -6.261240
> [4,]      25.31015                  4.051561
> [5,]      25.33985                  8.893225
> [6,]      25.35960                 -8.167610
> 

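Here is a correction for the fact that the first two columns are
factors: c() reduced them to their integer codes, which is why the
output above shows 20 and 1 rather than the actual labels 24 and a2.
This time, instead of creating a vector in the function, we create a
one-row data frame.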

> # previous lines as above
> dd.bin <- by(dd, list(dd$integer.Depth, dd$Station), function(x)
+   cbind(data.frame(integer.Depth = x$integer.Depth[1], 
+   Station = x$Station[1]), t(colMeans(x[,5:10]))))
> do.call("rbind", dd.bin)
   integer.Depth Station    Depth Temperature.oC Salinity Fluoresence.Volts
1             24      a2 23.90167       17.67420 35.47650          1.107433
11            25      a2 24.75350       17.33355 35.59050          1.060400
12             5      a3  5.19000       19.61510 35.54870          0.726500
13             6      a3  5.82950       19.61305 35.55025          0.719200
14             7      a3  6.81250       19.61300 35.58345          0.741150
15             8      a3  7.55000       19.61180 35.60460          0.754600
   Density.kg.m3 Brunt.Vaisala.Freq.cycl.h
1       25.82400                 -5.095467
11      25.99820                 16.030975
12      25.30560                 -6.261240
13      25.31015                  4.051561
14      25.33985                  8.893225
15      25.35960                 -8.167610
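
For anyone who wants to try the idea without the original data, here is
a minimal self-contained sketch of the same pattern.  The data frame toy,
its column names (depth.bin, station, temp, sal), its values and the
helper cell.means are all made up for illustration; they are not from
the poster's data set.  It first shows c() reducing a factor to its
integer code, then the one-row data frame approach.

```r
# Toy data: hypothetical names and values, not the poster's data.
toy <- data.frame(
  depth.bin = factor(c("24", "24", "25", "5", "5", "6")),
  station   = factor(c("a2", "a2", "a2", "a3", "a3", "a3")),
  temp      = c(17.6, 17.8, 17.3, 19.6, 19.7, 19.5),
  sal       = c(35.4, 35.6, 35.5, 35.5, 35.7, 35.6)
)

# Combining a factor with a number via c() keeps only the factor's
# integer code, not its label.
codes <- c(d = toy$depth.bin[1], m = 1.5)

# Hypothetical helper: build a one-row data frame per
# (depth.bin, station) cell so that the factor labels survive.
cell.means <- function(x)
  cbind(data.frame(depth.bin = x$depth.bin[1], station = x$station[1]),
        t(colMeans(x[, c("temp", "sal")])))

toy.bin <- by(toy, list(toy$depth.bin, toy$station), cell.means)

# Cells with no rows come back as NULL; rbind skips them.
result <- do.call("rbind", toy.bin)
print(result)
```

The result should have one row per non-empty depth.bin/station
combination, with depth.bin and station still stored as factors.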



