[R] subsetting like in SAS
Denis Chabot
chabotd at globetrotter.net
Thu Jan 13 11:52:04 CET 2005
Hi,
Being in the process of translating some of my SAS programs to R, I
encountered one difficulty. I have a solution, but it is not elegant
(and not pleasant to implement).
I have a large dataset with many variables needed to identify the
origin of a sample, many to describe sample characteristics, others to
describe site characteristics.
I want only a (shorter) list of sites and their characteristics.
If "origin", "ship_cat", "ship_nb", "trip" and "set" are needed to
identify a site, in SAS you'd sort on those variables, then read the
data with:
    data sites;
      set alldata;
      by origin ship_cat ship_nb trip set;
      if first.set;
      keep list-of-variables-detailing-sites;
    run;
In R I did this with the Lag function of Hmisc, and the original data
set also needs to be sorted first:
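    library(Hmisc)  # provides Lag()

    # value of each identifying variable on the previous row
    oL  <- Lag(origin)
    scL <- Lag(ship_cat)
    snL <- Lag(ship_nb)
    tL  <- Lag(trip)
    sL  <- Lag(set)

    # TRUE when a row has the same identifiers as the row before it
    same <- origin==oL & ship_cat==scL & ship_nb==snL & trip==tL & set==sL

    # keep only the first row of each site
    sites <- subset(alldata, !same,
                    select=c(list-of-variables-detailing-sites))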
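
To make the idea concrete, here is a minimal self-contained sketch of the same previous-row comparison, written in base R so that it runs without Hmisc. The data values, the depth and weight columns, and the helper same_as_prev are made up for illustration; only the five identifying variable names come from the description above. It assumes the data are already sorted on those variables.

```r
# Toy data (hypothetical values), already sorted by the identifying variables.
alldata <- data.frame(
  origin   = c("A", "A", "A", "B", "B"),
  ship_cat = c(1, 1, 1, 2, 2),
  ship_nb  = c(10, 10, 10, 20, 20),
  trip     = c(1, 1, 2, 1, 1),
  set      = c(1, 1, 1, 3, 3),
  depth    = c(50, 50, 80, 120, 120),    # hypothetical site characteristic
  weight   = c(2.1, 3.4, 1.7, 0.9, 4.2)  # hypothetical sample characteristic
)

keys <- c("origin", "ship_cat", "ship_nb", "trip", "set")

# Hypothetical helper: TRUE where an element equals the one just before it.
# The first element has no predecessor, so it is marked FALSE explicitly.
same_as_prev <- function(x) c(FALSE, x[-1] == x[-length(x)])

# A row repeats the previous site only if every identifying variable matches.
same <- Reduce(`&`, lapply(alldata[keys], same_as_prev))

# Keep the first row of each site, with the identifiers and site columns only.
sites <- alldata[!same, c(keys, "depth")]
print(sites)
```

Marking the first row FALSE by hand avoids having to decide what a lagged value should be when there is no previous row.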
Could I do better than this?
Thanks in advance,
Denis Chabot