[R] local sequence function

William Dunlap wdunlap at tibco.com
Mon Sep 14 19:21:47 CEST 2009


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Henrique 
> Dallazuanna
> Sent: Monday, September 14, 2009 9:59 AM
> To: smu
> Cc: r-help at r-project.org
> Subject: Re: [R] local sequence function
> 
> Try this also:
> 
> with(rle(v), unlist(sapply(lengths, FUN = seq)) * v)

Note that the built-in sequence() function is essentially the
unlist(sapply(lengths, FUN = seq)) part of the second argument
to that call to with():
  > sequence
  function (nvec) 
  unlist(lapply(nvec, seq_len))
  <environment: namespace:base>
It uses lapply because there is no need to waste
the time sapply spends "simplifying" the answer
and it uses seq_len since that is faster than seq
(and gives more predictable results for odd inputs
like 0).
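
To make the remark about odd inputs concrete, here is a small sketch of how seq() and seq_len() differ for a length of 0. The run lengths c(2, 0, 3) are made up for illustration and do not come from the original question.

```r
# seq() treats a single number n as 1:n, which counts DOWN when n < 1.
seq(0)        # 1 0
seq_len(0)    # integer(0)

# A zero-length run therefore contributes nothing with seq_len() ...
unlist(lapply(c(2, 0, 3), seq_len))   # 1 2 1 2 3

# ... but two spurious elements (1 and 0) with seq().
unlist(lapply(c(2, 0, 3), seq))       # 1 2 1 0 1 2 3
```

rle() never produces a zero-length run, so this does not bite in the example above, but it matters for a general-purpose function like sequence().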

For long input vectors the following Sequence() function
is faster and probably uses less memory, because it makes a
fixed number of vectorized calls instead of one seq_len() call
per run:

  Sequence <- function(nvec) {
      seq_len(sum(nvec)) - rep(cumsum(c(0L, nvec[-length(nvec)])), nvec)
  }
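
For anyone who wants to see why that one-liner works, the sketch below splits it into its intermediate steps for the run lengths c(2, 4, 1, 2). The names idx, offset and offset_rep are mine, introduced only for illustration.

```r
nvec <- c(2L, 4L, 1L, 2L)

# Running index over the whole output: 1 2 3 4 5 6 7 8 9
idx <- seq_len(sum(nvec))

# Number of elements that precede each run: 0 2 6 7
offset <- cumsum(c(0L, nvec[-length(nvec)]))

# Repeat each offset once per element of its run: 0 0 2 2 2 2 6 7 7
offset_rep <- rep(offset, nvec)

# Subtracting the offsets restarts the count at 1 within every run.
idx - offset_rep   # 1 2 1 2 3 4 1 1 2
```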

Hence the following can be considerably faster than the
sapply-based one-liner quoted above:

  f2 <- function(v) v * Sequence(rle(is.na(v))$lengths)

Note that I use rle(is.na(v)) instead of rle(v), since R's rle considers
a run of n NA's to be n runs of singleton NA's, which would cause
the lapply-based function to waste time computing seq_len(1) for
each NA (which we intend to throw away).  When using Sequence
instead of sequence you may as well omit the is.na.
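
As a quick check of the point about NA runs, this sketch applies both rle() calls and f2 to the vector from the original question. Sequence and f2 are repeated here only so the snippet runs on its own.

```r
v <- c(NA, NA, TRUE, TRUE, NA, TRUE, NA, TRUE, TRUE, TRUE)

# rle() on v itself: every NA is its own run of length 1.
rle(v)$lengths          # 1 1 2 1 1 1 3

# rle() on is.na(v): consecutive NAs collapse into a single run.
rle(is.na(v))$lengths   # 2 2 1 1 1 3

Sequence <- function(nvec) {
    seq_len(sum(nvec)) - rep(cumsum(c(0L, nvec[-length(nvec)])), nvec)
}
f2 <- function(v) v * Sequence(rle(is.na(v))$lengths)

# Multiplying by v turns the counts inside NA runs back into NA.
f2(v)                   # NA NA 1 2 NA 1 NA 1 2 3
```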

For a vector of 10 million NA's and TRUE's with c. 5 million runs
in it,
  v1 <- sample(c(TRUE, NA), replace=TRUE, size=1e7, prob=c(.7, .3))
f2 takes 2.7 seconds on my machine and the sequence()-based
approach takes 19.0 seconds.  sequence() is a candidate for
implementing in C, as it is a natural companion to the rep()
function, which is a .Primitive:
   > noquote(rep(c("A","B","C","D"), c(2,4,1,2)))
   [1] A A B B B B C D D
   > sequence(c(2,4,1,2))
   [1] 1 2 1 2 3 4 1 1 2
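
If you want to repeat the comparison yourself, something like the following should do. It is only a sketch: f_lapply is my name for an lapply/seq_len version of the quoted one-liner, the vector is smaller than the 1e7 used above so that it finishes quickly, and the timings will depend on your machine and R version, so none are shown.

```r
Sequence <- function(nvec) {
    seq_len(sum(nvec)) - rep(cumsum(c(0L, nvec[-length(nvec)])), nvec)
}

# One seq_len() call per run, as in the definition of sequence() shown earlier.
f_lapply <- function(v) with(rle(v), unlist(lapply(lengths, seq_len)) * v)

# Vectorized version: a fixed number of calls regardless of the number of runs.
f2 <- function(v) v * Sequence(rle(is.na(v))$lengths)

set.seed(1)  # arbitrary seed, only for reproducibility
v1 <- sample(c(TRUE, NA), size = 1e5, replace = TRUE, prob = c(0.7, 0.3))

# Both approaches should agree before their timings are compared.
stopifnot(isTRUE(all.equal(f_lapply(v1), f2(v1))))

system.time(f_lapply(v1))
system.time(f2(v1))
```

Increase size towards 1e7 to approach the case timed above.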

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com 

> 
> On Mon, Sep 14, 2009 at 12:20 PM, smu <ml at z107.de> wrote:
> 
> > hey,
> >
> > I can not find a function for the following problem, hopefully you can
> > help me.
> >
> > I have a vector like this one
> >
> > v = c(NA,NA,TRUE,TRUE,NA,TRUE,NA,TRUE,TRUE,TRUE)
> >
> > and I would like to replace the TRUE values by their "local sequence
> > number".
> >
> > This means, the result should look like this:
> >
> > c(NA,NA,1,2,NA,1,NA,1,2,3)
> >
> > Of course I could solve the problem using a loop, but this would be
> > much too slow, because the real vector is much larger.
> > Can you point me in the right direction?
> >
> > thank you!
> >
> > regards,
> >  Stefan
> >
> 
> 
> 
> -- 
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
> 
> 
> 
