[R] Calculation of group summaries

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jul 15 04:55:57 CEST 2005


There was an error in my code (after advising you to use data.frame
rather than cbind I used it myself!).  Here it is again:


nsites <- 6
yearList <- 1999:2001
fakesub <- data.frame(
	year = rep(yearList, nsites/length(yearList), each = 11),
	site_id  = rep(c('site1','site2'), each = 11*nsites),
	visit_no = rep(1, 11*2*nsites),
	transect = rep(LETTERS[1:11], nsites, each = 2),
	transdir = rep(c('LF','RT'), 11*nsites),
	undercut = abs(rnorm(11*2*nsites, 10)),
	angle    = runif(11*2*nsites, 0, 180)
)


f <- function(x) data.frame(year = x[1,1], site_id = x[1,2], visit_no = x[1,3], 
	mean = mean(x[,6]), sd = sd(x[,6]), length = length(x[,6]))
do.call("rbind", by(fakesub, fakesub[,1:3], f))



On 7/14/05, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> 1. Try using more spaces so your code is easier to read.
> 
> 2. Use data.frame to define your data frame (since the method
> in your post creates data frames of factors rather than
> the desired classes).
> 
> 3. Given the appropriate function, f, a single 'by' statement rbind'ed
> together, as shown, will create the result.
> 
> nsites <- 6
> yearList <- 1999:2001
> fakesub <- data.frame(
>        year = rep(yearList, nsites/length(yearList), each = 11),
>        site_id  = rep(c('site1','site2'), each = 11*nsites),
>        visit_no = rep(1, 11*2*nsites),
>        transect = rep(LETTERS[1:11], nsites, each = 2),
>        transdir = rep(c('LF','RT'), 11*nsites),
>        undercut = abs(rnorm(11*2*nsites, 10)),
>        angle    = runif(11*2*nsites, 0, 180)
> )
> 
> 
> f <- function(x) cbind(year = x[1,1], site_id = x[1,2], visit_no = x[1,3],
>        mean = mean(x[,6]), sd = sd(x[,6]), length = length(x[,6]))
> do.call("rbind", by(fakesub, fakesub[,1:3], f))
> 
> 
> 
> 
> 
> On 7/14/05, Seeliger.Curt at epamail.epa.gov <Seeliger.Curt at epamail.epa.gov> wrote:
> > Several people suggested specific functions (by, tapply, sapply and
> > others); thanks for not blowing off a simple question regarding how to
> > do the following SQL in R:
> > >   select year,
> > >          site_id,
> > >          visit_no,
> > >          mean(undercut) AS meanUndercut,
> > >          count(undercut) AS nUndercut,
> > >          std(undercut) AS stdUndercut
> > >   from channelMorphology
> > >   group by year, site_id, visit_no
> > >   ;
> >
> > I'd spent quite a bit of time with the suggested functions earlier but
> > had no luck as I'd misread the docs and put the entire dataframe where
> > it only wants the columns to be processed.  Sometimes it's the simplest
> > of things.
> >
> > This has lead to another confoundment-- sd() acts differently than
> > mean() for some reason, at least with R 1.9.0.  For some reason, means
> > generate NA results and a warning message for each group:
> >
> >  argument is not numeric or logical: returning NA in:
> > mean.default(data[x, ], ...)
> >
> > Of course, the argument is numeric, or there'd be no sd value.  Or more
> > likely, I'm still missing something really basic. If I wrap the value in
> > as.numeric() things work fine.  Why should I have to do this for mean
> > and median, but not sd? The code below should reproduce this error
> >
> >  # Fake data for demo:
> >  nsites<-6
> >  yearList<-1999:2001
> >  fakesub<-as.data.frame(cbind(
> >                 year     =rep(yearList,nsites/length(yearList),each=11)
> >                ,site_id  =rep(c('site1','site2'),each=11*nsites)
> >                ,visit_no =rep(1,11*2*nsites)
> >                ,transect =rep(LETTERS[1:11],nsites,each=2)
> >                ,transdir =rep(c('LF','RT'),11*nsites)
> >                ,undercut =abs(rnorm(11*2*nsites,10))
> >                ,angle    =runif(11*2*nsites,0,180)
> >                ))
> >
> >  # Create group summaries:
> >  sdmets<-by(fakesub$undercut
> >            ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
> >            ,sd
> >            )
> >  nmets<-by(fakesub$undercut
> >           ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
> >           ,length
> >           )
> >  xmets<-by(fakesub$undercut
> >           ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
> >           ,mean
> >           )
> >   xmets<-by(as.numeric(fakesub$undercut)
> >           ,list(fakesub$year,fakesub$site_id,fakesub$visit_no)
> >           ,mean
> >           )
> >
> >  # Put site id values (year, site_id and visit_no) into results:
> >  # List unique id combinations as a list of lists.  Then
> >  # reorganize that into 3 vectors for final results.
> >  # Certainly, there MUST be a better way...
> >  foo<-strsplit(unique(paste(fakesub$year
> >                            ,fakesub$site_id
> >                            ,fakesub$visit_no
> >                            ,sep='#'))
> >               ,split='#'
> >               )
> >  year<-list()
> >  for(i in 1:length(foo)) {year<-rbind(year,foo[[i]][1])}
> >  site_id<-list()
> >  for(i in 1:length(foo)) {site_id<-rbind(site_id,foo[[i]][2])}
> >  visit_no<-list()
> >  for(i in 1:length(foo)) {visit_no<-rbind(visit_no,foo[[i]][3])}
> >
> >  # Final result, more or less
> >  data.frame(cbind(a=year,b=site_id,c=visit_no,sdmets,nmets,xmets))
> >
> >
> > cur
> >
> > --
> > Curt Seeliger, Data Ranger
> > CSC, EPA/WED contractor
> > 541/754-4638
> > seeliger.curt at epa.gov
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>




More information about the R-help mailing list