[R] tapply and names
Göran Broström
gb at tal.stat.umu.se
Tue Jan 25 19:43:16 CET 2005
On Tue, Jan 25, 2005 at 10:43:24AM -0500, Liaw, Andy wrote:
> > From: Göran Broström
> >
> > I have a data frame containing children, with variables 'year' = birth
> > year, and 'm.id' = mother's id number. Let's assume that all the births
> > of each mother are represented in the data frame.
> >
> > Now I want to create a subset of this data frame containing all
> > children whose mother's first birth was in the year 1816 or later.
> > This seems to work:
> >
> > mid <- tapply(dat$year, dat$m.id, min)
> > mid <- as.numeric(names(mid)[mid >= 1816])
> > dat <- dat[dat$m.id %in% mid, ]
> >
> > but I'm worried about the second line, because the output from
> > 'tapply' isn't documented to have a 'dimnames' attribute (although it
> > has one, at least in R-2.1.0, 2005-01-19). Another aspect is that this
> > code relies on m.id being numeric; I would have to change it if the
> > type of m.id changes to, e.g., character.
> >
> > So, question: Is there a better way of doing this?
>
> Would this work?
>
> dat <- dat[ave(dat$year, dat$m.id, min) >= 1816, ]
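
Yes, but you (or I) need

> dat <- dat[ave(dat$year, dat$m.id, FUN = min) >= 1816, ]
                                     ^^^^^

(took me some time to figure out), because 'FUN' comes after '...' in
the usage of 'ave', so it has to be named; an unnamed 'min' is taken as
part of '...', i.e. as one more grouping variable:

?ave

Usage:

     ave(x, ..., FUN = mean)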
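
For the archive, a small sketch of the 'ave' approach on toy data. The
values below are made up for illustration and are not from my data
frame; the last lines only check that the result agrees with the
'tapply' version from my original post.

```r
# Toy data: hypothetical values, not from the original post.
dat <- data.frame(
  m.id = c(1, 1, 2, 2, 3),
  year = c(1814, 1817, 1816, 1820, 1830)
)

# ave() returns a vector as long as dat$year, holding each mother's
# first birth year on every one of her children's rows.
# FUN must be named, since it comes after '...' in ave(x, ..., FUN = mean).
first <- ave(dat$year, dat$m.id, FUN = min)

# Keep children whose mother's first birth was in 1816 or later.
sub <- dat[first >= 1816, ]

# Same rows as the tapply() version, without going through names().
mid <- tapply(dat$year, dat$m.id, min)
keep <- as.numeric(names(mid)[mid >= 1816])
stopifnot(identical(sub, dat[dat$m.id %in% keep, ]))
```

Since the comparison is made row by row on the result of 'ave', nothing
is converted with 'as.numeric', so the subsetting line should not need
to change if m.id becomes character.
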
Thanks Andy for giving me 'ave'! And thanks to Dimitris for his suggestion.
Göran