[Rd] ranges and contiguity checking

James Bullard bullard at stat.Berkeley.EDU
Wed May 12 22:27:40 CEST 2010


>> -----Original Message-----
>> From: r-devel-bounces at r-project.org
>> [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
>> Sent: Wednesday, May 12, 2010 11:35 AM
>> To: bullard at stat.berkeley.edu
>> Cc: r-devel at stat.math.ethz.ch
>> Subject: Re: [Rd] ranges and contiguity checking
>>
>> On 12/05/2010 2:18 PM, James Bullard wrote:
>> > Hi All,
>> >
>> > I am interfacing to some C libraries (hdf5) and I have methods
>> > defined for '['. These methods do hyperslab selection; however,
>> > currently I am limiting slab selection to contiguous blocks, i.e.,
>> > things defined like i:(i+k). I don't do any contiguity checking at
>> > this point; I just grab the max and min of the range and then
>> > potentially do an in-memory subselection, which is what I am
>> > definitely trying to avoid. Besides using deparse, I can't see any
>> > way to figure out that these things (i:(i+k) and
>> > c(i, i+1, ..., i+k)) are different.
>> >
>> > I have always liked how 1:10 was a valid expression in R (as
>> > opposed to python, where it is not by itself); however, I'd somehow
>> > like to know that the thing was a contiguous range without
>> > examining the un-evaluated expression or, worse,
>> > all(diff(i:(i+k)) == 1).
>
> You could define a sequence class, say 'hfcSeq'
> and insist that the indices given to [.hfc are
> hfcSeq objects.  E.g., instead of
>     hcf[i:(i+k)]
> the user would use
>     hcf[hfcSeq(i,i+k)]
> or
>     index <- hfcSeq(i,i+k)
>     hcf[index]
> max, min, and range methods for hfcSeq
> would just inspect one or both of its
> elements.

I could do this, but I wanted it to not matter to the user whether they
were dealing with an HDF5Dataset or a plain-old matrix.

It seems like I cannot define methods on ':'. If I could, then I could
implement an immutable 'range' class, which would be good, but then I'd
also have to implement '['(matrix, range) -- which would be easy, but
still more work than I wanted to do.
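
To make the idea concrete, here is a minimal sketch of such a range type
using plain S3 classes. Every name in it (contig_range, take_rows) is
hypothetical and not part of base R, hdf5 or IRanges, and it uses an
explicit helper function rather than methods on ':' or '['. It only shows
that construction and the min/max/range summaries can be constant in time
and memory.

```r
# Hypothetical S3 range type: stores only the two endpoints.
contig_range <- function(start, end) {
  stopifnot(length(start) == 1L, length(end) == 1L, start <= end)
  structure(list(start = as.integer(start), end = as.integer(end)),
            class = "contig_range")
}

# Constant-time summaries: only the endpoints are inspected.
length.contig_range <- function(x) x$end - x$start + 1L
min.contig_range <- function(..., na.rm = FALSE) list(...)[[1L]]$start
max.contig_range <- function(..., na.rm = FALSE) list(...)[[1L]]$end
range.contig_range <- function(..., na.rm = FALSE) {
  x <- list(...)[[1L]]
  c(x$start, x$end)
}

# Expand to an ordinary integer vector only when it is really needed.
as.integer.contig_range <- function(x, ...) seq.int(x$start, x$end)

# Explicit helper standing in for '['(matrix, range): select a block of rows.
take_rows <- function(m, r) m[seq.int(r$start, r$end), , drop = FALSE]

m <- matrix(1:20, nrow = 10)
r <- contig_range(3, 6)
take_rows(m, r)   # same rows as m[3:6, , drop = FALSE]
```

The cost is exactly the one described above: the user has to write
contig_range(i, i+k) instead of i:(i+k).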

I guess I was thinking that there is some inherent value in an immutable
native range type which is constant in time and memory for construction.
Then I could define methods on '['(matrix, range) and '['(matrix,
integer). I'm pretty confident this is more or less what is happening in
the IRanges package in Bioconductor, but (maybe for lack of support for
setting methods on ':') it is happening in a way that makes things very
non-transparent to a user. As it stands, I can optimize for performance by
using an IRanges-type wrapper or I can optimize for code clarity by killing
performance.
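
A possible middle ground, sketched under assumptions: keep the ordinary
i:(i+k) syntax and do the contiguity check inside the '[' method, so the
user never sees a wrapper. The names is_contiguous and slab_store are
hypothetical; slab_store is an in-memory stand-in for a dataset object, and
the "fast path" only marks where a single hyperslab read would go. Indices
are assumed to be positive row numbers.

```r
# Hypothetical check: TRUE if i is an increasing run with step 1.
is_contiguous <- function(i) {
  n <- length(i)
  if (n == 0L || anyNA(i)) return(FALSE)
  if (n == 1L) return(TRUE)
  # Cheap necessary condition first; the O(n) scan runs only if it holds.
  (i[n] - i[1L] == n - 1L) && all(diff(i) == 1L)
}

# In-memory stand-in for a dataset object (not the real hdf5 interface).
slab_store <- function(m) structure(list(data = m), class = "slab_store")

"[.slab_store" <- function(x, i, ...) {
  if (is_contiguous(i)) {
    # Fast path: stands in for one hyperslab read of rows i[1]..i[n].
    x$data[i[1L]:i[length(i)], , drop = FALSE]
  } else {
    # Slow path: read the bounding block, then subselect in memory.
    lo <- min(i)
    hi <- max(i)
    block <- x$data[lo:hi, , drop = FALSE]
    block[i - lo + 1L, , drop = FALSE]
  }
}

s <- slab_store(matrix(1:20, nrow = 10))
s[3:6]            # contiguous, takes the fast path
s[c(2L, 5L, 9L)]  # not contiguous, falls back to subselection
```

The check is still linear in the length of the index in the contiguous
case, so this trades a scan of the index vector for avoiding the in-memory
subselection of the data.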

thanks again, jim





>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>>
>> You can implement all(diff(x) == 1) more efficiently in C, but I
>> don't see how you could hope to do any better than that without
>> putting very un-R-like restrictions on your code.  Do you really
>> want to say that
>>
>> A[i:(i+k)]
>>
>> is legal, but
>>
>> x <- i:(i+k)
>> A[x]
>>
>> is not?  That will be very confusing for your users.  The problem is
>> that objects don't remember where they came from, only arguments to
>> functions do, and functions that make use of this fact mainly do it
>> for decorating the output (nice labels in plots) or making error
>> messages more intelligible.
>>
>> Duncan Murdoch
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>


