[R] Calculation of group summaries

Seeliger.Curt at epamail.epa.gov
Tue Jul 12 19:51:03 CEST 2005


I know R has a steep learning curve, but from where I stand the slope
looks like a sheer cliff.  I'm pawing through the available docs and
have come across examples which come close to what I want but are
proving difficult for me to modify for my use.

Calculating simple group means is fairly straightforward:
  data(PlantGrowth)
  attach(PlantGrowth)
  stack(sapply(unstack(PlantGrowth), mean))

I'd like to do something slightly more complex, using a data frame and
groups identified by unique combinations of three id variables.  There
may be thousands of such combinations in the data.  This is easy in SQL:

  select year,
         site_id,
         visit_no,
         mean(undercut) AS meanUndercut,
         count(undercut) AS nUndercut,
         std(undercut) AS stdUndercut
  from channelMorphology
  group by year, site_id, visit_no
      ;

Reading a CSV written by SAS and selecting only the records expected to
have values is also straightforward in R, but getting those summary
values for each site visit is currently beyond me:

  sub<-read.csv('c:/data/channelMorphology.csv'
               ,header=TRUE
               ,na.strings='.'
               ,sep=','
               ,strip.white=TRUE
               )

  undercut<-subset(sub
                  ,TRANSDIR %in% c('LF','RT')
                  ,select=c('YEAR','SITE_ID','VISIT_NO','TRANSECT','TRANSDIR'
                           ,'UNDERCUT'
                           )
                  ,drop=TRUE
                  )
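
For comparison with the SQL above, one possible way to get the same three summaries per site visit is aggregate() with a function that returns several values at once. This is only a sketch under assumptions: the small data frame stands in for the real file and its values are made up, the helper name summarize is hypothetical, and only the column names are taken from the select list above.

```r
# Hypothetical stand-in for the subsetted data; the values are made up.
undercut <- data.frame(
  YEAR     = c(2004, 2004, 2004, 2004, 2005, 2005),
  SITE_ID  = c('A', 'A', 'A', 'B', 'A', 'A'),
  VISIT_NO = c(1, 1, 1, 1, 1, 1),
  UNDERCUT = c(0.10, 0.30, NA, 0.20, 0.40, 0.60)
)

# Hypothetical helper: mean, count and standard deviation of the
# non-missing values, mirroring how SQL aggregates skip NULLs.
summarize <- function(x) {
  x <- x[!is.na(x)]
  c(meanUndercut = mean(x), nUndercut = length(x), stdUndercut = sd(x))
}

# One row per unique YEAR / SITE_ID / VISIT_NO combination.
# na.pass keeps rows with a missing UNDERCUT so summarize() sees them.
result <- aggregate(UNDERCUT ~ YEAR + SITE_ID + VISIT_NO,
                    data = undercut, FUN = summarize,
                    na.action = na.pass)

# aggregate() returns the three statistics as one matrix column;
# flatten it into ordinary columns (names get an "UNDERCUT." prefix).
result <- do.call(data.frame, result)
print(result)
```

A group with a single non-missing value gets NA for its standard deviation, since sd() needs at least two values.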


Thanks all for your help.
cur
--
Curt Seeliger, Data Ranger
CSC, EPA/WED contractor
541/754-4638
seeliger.curt at epa.gov



