[Bioc-devel] split on IRanges semantic change 3.3->3.4

Michael Lawrence lawrence.michael at gene.com
Mon Dec 5 22:18:54 CET 2016


Hi Marcin,

I'm not sure I agree with that Stack Overflow answer. Sure,
ceiling(1:100/10) is a neat arithmetic trick to get what you want, but it
would be a lot more obvious, to me at least, to use rep(1:10, each=10). It
is even better to use Partitioning objects when you have IRanges around. If
you really wanted to use tricky math, you could at least do
(1:100-1L)%/%10L, which yields integers, or just coerce the result of
ceiling() to integer.
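
A minimal base-R sketch of the comparison above (no IRanges needed). The
variable names are illustrative and not from the thread. It only shows that
the three expressions define the same groups but differ in type, and what
happens in general when numeric group labels are sorted as strings; it does
not reproduce the IRanges internals.

```r
# Three ways to label 100 elements in consecutive chunks of 10.
f_ceiling <- ceiling(1:100 / 10)    # double-valued labels 1..10
f_rep     <- rep(1:10, each = 10)   # integer labels 1..10
f_intdiv  <- (1:100 - 1L) %/% 10L   # integer labels 0..9 (zero-based)

typeof(f_ceiling)
typeof(f_rep)
typeof(f_intdiv)

# All three describe the same group membership.
same_groups <- identical(as.integer(f_ceiling), f_rep) &&
  identical(f_intdiv + 1L, f_rep)

# Sorting the labels as strings puts "10" right after "1", which is the
# kind of reordering a lexicographic sort of group names would cause.
labels_sorted <- sort(as.character(1:10), method = "radix")
labels_sorted
```

With integer labels there is no ambiguity about whether the groups should be
ordered as numbers or as strings.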

The idea is to communicate the intent (the semantics) of the code to the
reader, and, secondarily, to the program. Not only is your code easier to
read using rep(), IRanges does smarter things when it gets an integer vs. a
real vector. It can be even more efficient if you give it a Partitioning or
(to a lesser extent) an Rle with rep(Rle(1:10), each=10).

Also, btw, there is a low-level function in IRanges called breakInChunks()
that generates partitionings, so you could do:
relist(ir, breakInChunks(100, 10))
but PartitioningByEnd(seq(10, 100, 10)) might be more obvious.
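
A sketch of the partitioning approach, assuming the IRanges package is
installed. The set.seed() call and the names parts, irl and ir2 are
illustrative additions; ir is built as in the original report.

```r
library(IRanges)

set.seed(42)  # illustrative; makes the random ranges reproducible
ir <- IRanges(sample(100), sample(100) + 100)

# Ten consecutive groups of ten, described by the end index of each group.
parts <- PartitioningByEnd(seq(10, 100, 10))

# relist() cuts ir into groups following the partitioning, in order.
irl <- relist(ir, parts)

# Unlisting the groups and comparing against the input checks that the
# original order was preserved.
ir2 <- unlist(irl, use.names = FALSE)
all(ir == ir2)
```

Because the partitioning states the group boundaries directly, no grouping
factor is involved and there are no group labels to sort.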

Related to this is tile:
tile(IRanges(1, 100), 10)
but that generates an IRanges (inside of a list), not specifically a
Partitioning.

Hope that helps,
Michael


On Mon, Dec 5, 2016 at 11:42 AM, Marcin Cieślik <marcin.cieslik at gmail.com>
wrote:

> Thanks Michael!
>
> I am looking into relist; I did not know about it. The split/ceiling
> approach is the top answer on a popular R Stack Overflow question, so I
> guess it would be good to maintain compatibility with base R:
>
> http://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r
>
> Yours,
> Marcin
>
>
>
> On Mon, Dec 5, 2016 at 12:32 PM, Michael Lawrence <
> lawrence.michael at gene.com> wrote:
>
>> Sorry about that. I pushed a fix to devel (2.9.14) and will push it to
>> release soon. Btw, it's not typically a good idea to use a real-valued
>> vector as a factor. In this case, you could use %/% or, better yet, a
>> partitioning, i.e., relist(ir, PartitioningByEnd(seq(10, 100, 10))).
>>
>> Michael
>>
>>
>> On Sun, Dec 4, 2016 at 8:40 PM, Marcin Cieślik <marcin.cieslik at gmail.com>
>> wrote:
>>
>>> Dear All,
>>>
>>> I ran into the following change in behaviour between the 3.3 and 3.4
>>> Bioconductor releases.
>>>
>>> The following code returns TRUE for 3.3 and FALSE for 3.4:
>>>
>>> library(GenomicRanges)
>>> ir <- IRanges(sample(100),sample(100)+100)
>>> ir2 <- unlist(split(ir, ceiling(1:100 / 10)))
>>> all(ir==ir2)
>>>
>>> The reason is that the list returned by split() is now sorted
>>> lexicographically.
>>>
>>> Thanks!
>>>
>>> Yours,
>>> Marcin