[R] converting stata's by syntax to R
Thomas Lumley
tlumley at u.washington.edu
Mon Aug 1 18:43:03 CEST 2005
On Mon, 1 Aug 2005, Chris Wallace wrote:
> I am struggling with migrating some stata code to R. I have a data
> frame containing, sometimes, repeat observations (rows) of the same
> family. I want to keep only one observation per family, selecting
> that observation according to some other variable. An example data
> frame is:
>
> # construct example data
> fam <- c(1,2,3,3,4,4,4)
> wt <- c(1,1,0.6,0.4,0.4,0.4,0.2)
> keep <- c(1,1,1,0,1,0,0)
> dat <- as.data.frame(cbind(fam,wt,keep))
> dat
>
> I want to keep the observation for which wt is a maximum, and where
> this doesn't identify a unique observation, to keep just one anyway,
> not caring which. Those observations are indicated above by keep==1.
> (Note, keep <- c(1,1,1,0,0,1,0) would be fine too, but not
> c(1,1,1,0,0,0,1)).
>
> The stata code I would use is
> bys fam (wt): keep if _n==_N
A reasonably direct translation of the Stata code is
index <- order(fam, -wt)
keep <- !duplicated(fam[index])
dat <- data.frame(fam=fam[index], wt=wt[index], keep=keep)
which sorts wt into decreasing order within family, then flags the first
observation in each family (the one with the largest wt) as keep=TRUE.
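If the goal is to drop the other rows, as the Stata 'keep if' does, rather
than just flag them, the same idea can be sketched as below. This is only
an illustration on the example data from the question; the names 'sorted'
and 'result' are made up here and are not part of the code above.

```r
# Example data from the question, built directly as a data frame
fam <- c(1, 2, 3, 3, 4, 4, 4)
wt  <- c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2)
dat <- data.frame(fam = fam, wt = wt)

# Sort by family, then by decreasing wt within family
index  <- order(dat$fam, -dat$wt)
sorted <- dat[index, ]           # 'sorted' is an illustrative name

# Keep only the first row of each family, i.e. a row with maximal wt
result <- sorted[!duplicated(sorted$fam), ]   # 'result' is an illustrative name
result
```

Ties in wt within a family are broken by original row order, since order()
leaves unresolved ties as they were.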
This is less general than solutions other people have given, but I'd
expect it to be faster for large data sets. 'keep' ends up TRUE/FALSE
rather than 1/0; if this is a problem use as.numeric() on it.
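A possible variant, if the 1/0 flag is wanted in the original row order
rather than the sorted order, is to write the flags back through the
index. This is a sketch under that assumption; 'keep_orig' and 'keep_num'
are names invented for the illustration.

```r
# Example data from the question
fam <- c(1, 2, 3, 3, 4, 4, 4)
wt  <- c(1, 1, 0.6, 0.4, 0.4, 0.4, 0.2)

# Sort order: by family, then decreasing wt within family
index <- order(fam, -wt)

# Flag the first sorted row of each family, stored at its original position
keep_orig <- logical(length(fam))            # illustrative name
keep_orig[index] <- !duplicated(fam[index])

# Convert TRUE/FALSE to 1/0
keep_num <- as.numeric(keep_orig)            # illustrative name
dat <- data.frame(fam = fam, wt = wt, keep = keep_num)
dat
```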
-thomas