[R] dropping rows
Richard A. O'Keefe
ok at cs.otago.ac.nz
Fri Dec 3 00:05:45 CET 2004
Douglas Bates <bates at stat.wisc.edu> wrote:
> In R this is called subsetting and the simplest way to do this
> is with the subset function.
>
>   older <- subset(master, year < 1960)

I'm not sure that it's the "simplest".
Since rows for year < 1960 were to be dropped,
I'd say the _simplest_ way to do it is one which exploits
a primitive feature of R:

    master[master$year >= 1960,]

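To make the comparison concrete, here is a small sketch. Only the names
master and year come from the thread; the id column and all the values
are made up for illustration.

```r
# Hypothetical data; only the names 'master' and 'year' come from the thread.
master <- data.frame(id   = 1:5,
                     year = c(1955, 1958, 1960, 1972, 1959))

# Logical indexing: keep rows from 1960 onwards, i.e. drop year < 1960.
kept_idx <- master[master$year >= 1960, ]

# The same selection written with subset(), for comparison.
kept_sub <- subset(master, year >= 1960)

print(kept_idx)
print(kept_sub)
```

With this data both forms keep the rows whose id is 3 and 4.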
For me, the fact that the 'subset' argument of subset() is evaluated
in the scope of the data frame makes subset() quite a complicated way
to do things. It's certainly something I'd hesitate to use inside a
function which might be given a data frame without knowing _exactly_
which column names were going to be in scope for the 2nd argument.
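A sketch of the kind of surprise meant here. The helper names keep_from
and keep_from_idx, the argument name cutoff, and the data are all
hypothetical; the point is only that a column which happens to share a
name with a local variable takes precedence inside subset().

```r
# Hypothetical helper: keep rows whose 'year' is at least 'cutoff'.
keep_from <- function(df, cutoff) {
  subset(df, year >= cutoff)
}

# No column called 'cutoff': the function argument is used, as intended.
a <- data.frame(year = c(1950, 1965, 1980))
n_a <- nrow(keep_from(a, 1960))        # 2

# A column called 'cutoff' shadows the function argument inside subset(),
# so the comparison is made row by row against the column instead.
b <- data.frame(year = c(1950, 1965, 1980), cutoff = c(1900, 2000, 2000))
n_b <- nrow(keep_from(b, 1960))        # 1

# Plain indexing evaluates 'cutoff' in the function, as usual.
keep_from_idx <- function(df, cutoff) {
  df[df$year >= cutoff, ]
}
n_idx <- nrow(keep_from_idx(b, 1960))  # 2
```

The indexing version is longer to type, but its result does not depend
on which other columns the caller's data frame happens to contain.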
The fact that the 'subset' argument is *not* evaluated in the scope
of the 1st argument in other cases (when the 1st argument is a plain
vector rather than a data frame, for example) also makes subset() a
somewhat confusing function, compared with simple logical indexing.
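The difference can be seen side by side in a short sketch. The names x,
df and yr, and the values, are invented for the example.

```r
# Plain vector: the second argument is an ordinary expression, so the
# vector has to be named explicitly in the condition.
x <- c(1950, 1965, 1980)
v <- subset(x, x >= 1960)

# Data frame: the bare name 'yr' is looked up among the columns of df,
# even though no variable called 'yr' has been created by this code.
df <- data.frame(yr = c(1950, 1965, 1980))
d <- subset(df, yr >= 1960)

print(v)
print(d)
```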
Strengths of subset() include
 - you can select which columns you want, either instead of choosing
   a subset of rows or at the same time (but you can do this with
   indexing too)
 - its drop= argument, which is passed on to the indexing operation,
   defaults to FALSE instead of TRUE
   (but this is not a problem for indexing data frames, where
   master[master$year == 1960,] will give you a data frame even if
   there is exactly one row with year 1960)
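Both points can be checked with a small sketch. The data and the id and
score columns are hypothetical; only master and year come from the
thread. The last three lines show the one situation where the different
drop= defaults are visible, namely when a single column is selected.

```r
# Hypothetical data for illustration.
master <- data.frame(id    = 1:3,
                     year  = c(1955, 1960, 1972),
                     score = c(2.5, 3.1, 4.0))

# Rows and columns at once, with subset() and with indexing.
s_sub <- subset(master, year >= 1960, select = c(id, score))
s_idx <- master[master$year >= 1960, c("id", "score")]

# Exactly one matching row, several columns: still a data frame.
one_row <- master[master$year == 1960, ]
one_row_is_df <- is.data.frame(one_row)                          # TRUE

# Selecting a single column is where the defaults differ.
col_idx  <- master[master$year >= 1960, "id"]                    # plain vector
col_sub  <- subset(master, year >= 1960, select = id)            # data frame
col_keep <- master[master$year >= 1960, "id", drop = FALSE]      # data frame
```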
I would suggest that people who aren't yet thoroughly familiar with
what a simple "[" can do should add subset() to the list of things to
learn about _after_ they've done learning about "[". On second thoughts,
maybe looking at the implementation of subset.default and subset.data.frame
would be helpful in learning about "[".
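One way to look at those implementations, assuming an R installation
where getS3method() is available (it lives in the utils package in
current versions of R):

```r
# Fetch the two methods mentioned above and print their source.
sub_default <- getS3method("subset", "default")
sub_df      <- getS3method("subset", "data.frame")

print(sub_default)
print(sub_df)
```

The exact source differs between R versions, so it is worth reading the
output of your own installation rather than a copy from elsewhere.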