[R] subsetting like in SAS

Denis Chabot chabotd at globetrotter.net
Thu Jan 13 11:52:04 CET 2005


While translating some of my SAS programs to R, I ran into one 
difficulty. I have a solution, but it is neither elegant nor pleasant 
to implement.

I have a large dataset with many variables: some identify the origin 
of a sample, some describe sample characteristics, and others describe 
site characteristics.

I want only a (shorter) list of sites and their characteristics, that 
is, one row per site.

If "origin", "ship_cat", "ship_nb", "trip" and "set" are needed to 
identify a site, in SAS you'd sort on those variables, then read the 
data with:

data sites;
	set alldata;
	by origin ship_cat ship_nb trip set;
	if first.set;
	keep list-of-variables-detailing-sites;
run;

In R I did this with the Lag function of Hmisc, and the original data 
set also needs to be sorted first:

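library(Hmisc)  # provides Lag()
# assumes attach(alldata), so the key columns are visible by name
oL <- Lag(origin)
scL <- Lag(ship_cat)
snL <- Lag(ship_nb)
tL <- Lag(trip)
sL <- Lag(set)
# TRUE where a row has the same keys as the previous row
same <- origin == oL & ship_cat == scL & ship_nb == snL & trip == tL & set == sL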
sites <- subset(alldata, !same, 
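
For reference, the lag comparison described above can be sketched in a 
self-contained form. This is only an illustration under assumptions: 
the toy values and the site variable "depth" are hypothetical, and a 
small base R helper stands in for Hmisc's Lag so that the block runs 
without extra packages. The first row has no predecessor, so the helper 
marks it explicitly as the start of a group.

```r
# Toy data standing in for `alldata`; `depth` and `weight` are hypothetical.
alldata <- data.frame(
  origin   = c("A", "A", "A", "B", "B"),
  ship_cat = c(1, 1, 1, 2, 2),
  ship_nb  = c(10, 10, 10, 20, 20),
  trip     = c(1, 1, 1, 1, 1),
  set      = c(1, 1, 2, 1, 1),
  depth    = c(50, 50, 80, 120, 120),   # site characteristic
  weight   = c(1.2, 0.7, 3.1, 0.4, 2.2), # sample characteristic
  stringsAsFactors = FALSE
)

keys <- c("origin", "ship_cat", "ship_nb", "trip", "set")

# Sort on the key variables, as the SAS BY statement requires.
alldata <- alldata[do.call(order, unname(alldata[keys])), ]

# TRUE where a value equals the value on the previous row;
# the first row has no previous row, so it is FALSE by construction.
same_as_prev <- function(x) c(FALSE, x[-1] == x[-length(x)])

# A row repeats the previous site only if every key is unchanged.
same <- Reduce(`&`, lapply(alldata[keys], same_as_prev))

# Keep the first row of each site and only the site-level columns.
sites <- alldata[!same, c(keys, "depth")]
print(sites)
```

On data without missing key values, alldata[!duplicated(alldata[keys]), ] 
selects the same rows in base R and does not depend on the sort order.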

Could I do better than this?

Thanks in advance,

Denis Chabot
