[R] subset using noncontiguous variables by name (not index)
Thomas Lumley
tlumley at u.washington.edu
Mon Aug 27 16:24:30 CEST 2007
On Mon, 27 Aug 2007, Muenchen, Robert A (Bob) wrote:
> Gabor, That works great!
>
> I think this would be a very helpful addition to the main R
> distribution. Perhaps with a single colon representing numerical order
> (exactly as you have written it) and two colons representing the order
> of the variables as they appear in the data frame (your first example).
> That's analogous to SAS' x1-xN, which you know gets those N variables,
> and a--z, which selects an unknown number of variables a through z. How
> many that is depends upon their order in the data frame. That would not
> only be very useful in general, but it would also make transitioning to
> R from SAS or SPSS less confusing.
>
> Is R still being extended in such basic ways, or does that muck up
> existing programs too much?
>
In principle base R can be extended like that, but a strong case is needed
for non-standard evaluation rules and for depleting the restricted supply
of short binary operator names.
The reason for subset() and its behaviour is that 'variables as they
appear in the data frame' is typically ambiguous -- which data frame? In
SPSS you have only one and in SAS there is a default one, so there is no
ambiguity in X1--Y2, but in R it needs another argument specifying the
data frame, so it can't really be a binary operator.
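
As a rough illustration of the point above (the data frames and column names here are invented for the example, not taken from the thread), the select argument of subset() already accepts name ranges, and the same range picks different columns depending on which data frame it is evaluated against:

```r
# Hypothetical data frame; the column names are made up for illustration.
df <- data.frame(id = 1:3, x1 = 4:6, x2 = 7:9,
                 y1 = 10:12, y2 = 13:15, z = 16:18)

# Inside select=, each name stands for its column position in df,
# so x1:y2 means "every column from x1 through y2 in df's order".
picked <- subset(df, select = c(id, x1:y2))
names(picked)

# The same columns in a different order: x1:y2 now spans x1, z, y2.
df2 <- df[c("id", "x1", "z", "y2", "x2", "y1")]
picked2 <- subset(df2, select = x1:y2)
names(picked2)
```

The range only has a meaning once subset() knows which data frame supplies the column order, which is why it takes the data frame as its first argument.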
The double colon :: and triple colon ::: are already used for namespaces,
and a search of r-help reveals two previous, different, suggestions for
%:%.
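
A minimal sketch of the two points above, under stated assumptions: name_range() is a hypothetical helper written for this example, not a base R function, and iris is used only because it ships with R. It shows :: used for namespace lookup, and an ordinary function that takes the data frame as an explicit third input, which a two-operand %name% operator has no slot for.

```r
# pkg::name fetches an exported object from a package's namespace.
m <- stats::median(c(1, 3, 5))

# User-defined binary operators must be spelled %name% and receive
# exactly two operands. An ordinary function can take the data frame
# explicitly. name_range() is hypothetical, not part of base R.
name_range <- function(data, from, to) {
  pos <- match(c(from, to), names(data))
  stopifnot(!anyNA(pos))          # both names must exist in data
  names(data)[pos[1]:pos[2]]      # names between them, in data's order
}

cols <- name_range(iris, "Sepal.Width", "Petal.Width")
cols
```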
-thomas
Thomas Lumley                   Assoc. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle