[Rd] HOW TO AVOID LOOPS
Martin Morgan
mtmorgan at fhcrc.org
Sun Apr 13 05:45:18 CEST 2008
Taking Hadley's clue, I guess
x * unlist(lapply(rle(x)$lengths, seq_len))
is faster than my previous suggestion (which Dan had inspired in the
first place).
For
> A vector of the following format:
> (0,0,1,0,0,0,3,0,0,0,2,0,1,0,0,0,0,0,6)
one might
> z <- numeric(length(x))
> r <- rle(x)
> i <- r$values==1
> z[cumsum(r$lengths)[i]] <- r$lengths[i]
> z
[1] 0 0 1 0 0 0 3 0 0 0 2 0 1 0 0 0 0 0 0 6
but since 'rle' keeps coming up, perhaps you're really wanting to know
basic things about run-length encoding, such as the length of each run
of 1's
> r$lengths[r$values==1]
[1] 1 3 2 1 6
? If space and time are an issue, you might also consider reprsenting
your data as 'raw' to save space
> x <- rbinom(1000000, 1, .5)
> object.size(x)
[1] 8000040
> system.time(x*unlist(lapply(rle(x)$lengths, seq_len)), gcFirst=TRUE)
user system elapsed
1.800 0.000 1.798
> x <- as.raw(x)
> head(x)
[1] 01 01 01 01 00 01
> object.size(x)
[1] 1000040
> system.time((x==1)*unlist(lapply(rle(x)$lengths, seq_len)), gcFirst=TRUE)
user system elapsed
1.730 0.000 1.730
(the timings are quite variable; perhaps they're about equal?)
Martin
"carlos martinez" <martinezbula at earthlink.net> writes:
> Appreciate the ingenious and effective suggestions and feedback from:
>
> Dan Davison
> Vincent Goulet
> Martin Morgan
> Hadley Wickham
>
> The variety of technical approaches proposes so far are clear prove of the
> strong and flexible capabilites of the R system, and specially the dynamics
> and technical understanding of the R user base.
>
> We tested all four recommendations with an input vector of more than 850000
> components, and got time-responses from about 40-second to 20-seconds.
>
> All four approches produced the desired vector. The Wickham's approach
> produced and extra vector, but the second vector included the correct
> format.
>
> Just one additional follow up, to obtain from the same input vector:
> c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
>
> A vector of the following format:
> (0,0,1,0,0,0,3,0,0,0,2,0,1,0,0,0,0,0,6)
>
> Will be easier and more efficient to start from the original input vector,
> or start from the above second vector
> (0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6)
>
> Thanks for your responses.
>
> -------------------------------------------------------------------------
> Hadley Wickham Approach
>
> How about:
>
> unlist(lapply(split(x, cumsum(x == 0)), seq_along)) - 1
>
> Hadley
> --------------------------------------------------------------------------
> -----Original Message-----
> From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
> Sent: Saturday, April 12, 2008 5:00 PM
> To: Dan Davison
> Cc: martinezbula at earthlink.net
> Subject: Re: [Rd] HOW TO AVOID LOOPS
>
> (anonymous 'off-list' response; some extra calcs but tidy)
>
>> x=c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
>> x * unlist(lapply(rle(x)$lengths, seq))
> [1] 0 0 1 0 1 2 3 0 0 1 2 0 1 0 1 2 3 4 5 6
>
>
> Dan Davison <davison at stats.ox.ac.uk> writes:
>
>> On Sat, Apr 12, 2008 at 06:45:00PM +0100, Dan Davison wrote:
>>> On Sat, Apr 12, 2008 at 01:30:13PM -0400, Vincent Goulet wrote:
>>> > Le sam. 12 avr. à 12:47, carlos martinez a écrit :
>>> > >> Looking for a simple, effective a minimum execution time solution.
>>> > >>
>>> > >> For a vector as:
>>> > >>
>>> > >> c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
>>> > >>
>>> > > To transform it to the following vector without using any loops:
>>> > >
>>> > >> (0,0,1,0,1,2,3,0,0,1,2,0,1,0,1,2,3,4,5,6)
>>> > >>
>>> > > Appreciate any suggetions.
>>> >
>>> > This does it -- but it is admittedly ugly:
>>> >
>>> > > x <- c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)
>>> > > ind <- which(x == 0)
>>> > > unlist(lapply(mapply(seq, ind, c(tail(ind, -1) - 1, length(x))),
>>> > function(y) cumsum(x[y])))
>>> > [1] 0 0 1 0 1 2 3 0 0 1 2 0 1 0 1 2 3 4 5 6
>>> >
>>> > (The mapply() part is used to create the indexes of each sequence
>>> > in x starting with a 0. The rest is then straightforward.)
>>>
>>>
>>> Here's my effort. Maybe a bit easier to digest? Only one *apply so
> probably more efficient.
>>>
>>> function(x=c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)) {
>>> d <- diff(c(0,x,0))
>>> starts <- which(d == 1)
>>> ends <- which(d == -1)
>>> x[x == 1] <- unlist(lapply(ends - starts, function(n) 1:n))
>>> x
>>> }
>>>
>>
>> Come to think of it, I suggest using the existing R function rle(), rather
> than my dodgy substitute.
>>
>> e.g.
>>
>> g <- function(x=c(0,0,1,0,1,1,1,0,0,1,1,0,1,0,1,1,1,1,1,1)) {
>>
>> runs <- rle(x)
>> runlengths <- runs$lengths[runs$values == 1]
>> x[x == 1] <- unlist(lapply(runlengths, function(n) 1:n))
>> x
>> }
>>
>> Dan
>>
>> p.s. R-help would perhaps have been more appropriate than R-devel
>>
>>
>>> Dan
>>>
>>>
>>> >
>>> > HTH
>>> >
>>> > ---
>>> > Vincent Goulet, Associate Professor
>>> > École d'actuariat
>>> > Université Laval, Québec
>>> > Vincent.Goulet at act.ulaval.ca http://vgoulet.act.ulaval.ca
>>> >
>>> > ______________________________________________
>>> > R-devel at r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview
> Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793
More information about the R-devel
mailing list