[R] The end of Matlab
hadley wickham
h.wickham at gmail.com
Fri Dec 12 17:38:13 CET 2008
On Fri, Dec 12, 2008 at 8:41 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 12/12/2008 8:25 AM, hadley wickham wrote:
>>>
>>> From which you might conclude that I don't like the design of subset, and
>>> you'd be right. However, I don't think this is a counterexample to my
>>> general rule. In the subset function, the select argument is treated as an
>>> unevaluated expression, and then there are rules about what to do with it.
>>> (I.e. try to look up name `a` in the data frame, if that fails, ...)
>>>
>>> For the requested behaviour to similarly fall within the general rule, we'd
>>> have to treat all indices to all kinds of things (vectors, matrices,
>>> dataframes, etc.) as unevaluated expressions, with special handling for the
>>> particular symbol `end`.
>>
>> Except you wouldn't have to necessarily change indexing - you could
>> change seq instead. Then 5:end could produce some kind of special
>> data structure (maybe an iterator) that was recognised by the various
>> indexing functions.
>
> Ummm, doesn't that require changes to *both* indexing and seq?
Oops, yes. I meant it wouldn't require indexing to use unevaluated
expressions.
>> This would still be a lot of work for not a lot
>> of payoff, but it would be a logically consistent way of adding this
>> behaviour to indexing, and the basic work would make it possible to
>> develop other sorts of indexing, eg df[evens(), ], or df[last(5),
>> last(3)].
>
> I agree: it would be a nice addition, but a fair bit of work. I think it
> would be quite doable for the indexable things in the base packages, but
> there are a lot of contributed packages that define [ methods, and those
> methods would all need to be modified too.
That's true, although I suspect many contributed `[` methods eventually
delegate to base methods and might work without further modification.
> (Just to be clear, when I say doable, I'm thinking that your iterators
> return functions that compute subsets of index ranges. For example, evens()
> might be implemented as
>
> evens <- function() {
>   result <- function(indices) {
>     indices[indices %% 2 == 0]
>   }
>   class(result) <- "iterator"
>   return(result)
> }
>
> and then `[` in v[evens()] would recognize that it had been passed an
> iterator, and would pass 1:length(v) to the iterator to get the subset of
> even indices. Is that what you had in mind?)
Yes, that's exactly what I was thinking, although you'd have to put
some thought into the conventions - would it be better to pass in the
length of the vector instead of a vector of indices? Should all
iterators return logical vectors? That way you could do x[evens() &
last(5)] to get the even indices out of the last 5, as opposed to
x[evens()][last(5)] which would return the last 5 even indices.
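A rough sketch of the logical-vector convention, to make the difference concrete. Everything here is an assumption for illustration: the iterator() constructor, the choice to pass in the length n rather than a vector of indices, and the pick() helper, which stands in for a modified `[` that does not exist in base R.

```r
# Hypothetical constructor: tag a function of the vector length n
# (returning a logical vector of length n) with class "iterator".
iterator <- function(f) structure(f, class = "iterator")

# TRUE at the even positions.
evens <- function() iterator(function(n) seq_len(n) %% 2 == 0)

# TRUE at the last k positions (all positions if k >= n).
last <- function(k) iterator(function(n) seq_len(n) > n - k)

# Combining two iterators with & or | gives a new iterator that
# combines their logical vectors elementwise.
Ops.iterator <- function(e1, e2) {
  if (!.Generic %in% c("&", "|")) {
    stop("only & and | are supported for iterators")
  }
  combine <- match.fun(.Generic)
  iterator(function(n) combine(e1(n), e2(n)))
}

# Hypothetical stand-in for an iterator-aware `[`.
pick <- function(x, it) x[it(length(x))]

x <- 1:20
pick(x, evens() & last(5))       # even indices among the last 5: 16 18 20
pick(pick(x, evens()), last(5))  # last 5 even indices: 12 14 16 18 20
```

Passing the length keeps each iterator independent of what is being indexed, which is what makes the & combination well defined.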
You could also imagine similar iterators for random sampling, like
samp(0.2) to choose 20% of the indices, or boot(0.8) to choose 80%
with replacement. first(n) could also be useful, selecting the first
min(n, length(vector)) observations. An iterator version of rev()
would also be handy.
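A minimal sketch of what these might look like, assuming each one takes the vector length n and returns integer indices. That is a different convention from logical vectors: boot() and a reversing iterator can't be expressed as a logical vector, since they repeat or reorder indices. The names rev_idx() and pick() are made up for illustration, as is rounding the proportion to a whole number of indices.

```r
# First min(k, n) indices.
first <- function(k) function(n) seq_len(min(k, n))

# A proportion p of the indices, sampled without replacement.
samp <- function(p) function(n) sample.int(n, size = round(p * n))

# A proportion p of the indices, sampled with replacement.
boot <- function(p) function(n) sample.int(n, size = round(p * n), replace = TRUE)

# All indices in reverse order (hypothetical name, to avoid masking rev()).
rev_idx <- function() function(n) rev(seq_len(n))

# Hypothetical stand-in for an iterator-aware `[`.
pick <- function(x, it) x[it(length(x))]

x <- letters[1:10]
pick(x, first(3))    # "a" "b" "c"
pick(x, samp(0.2))   # 2 distinct elements, chosen at random
pick(x, boot(0.8))   # 8 elements, possibly with repeats
pick(x, rev_idx())   # "j" "i" ... "a"
```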
Maybe selector would be a better name than iterator though, as these
don't have the same feel as iterators in other languages.
Hadley
--
http://had.co.nz/