[R] bad performance of a function

Fri Nov 14 13:52:10 CET 2003

On Fri, 14 Nov 2003, Petr Pikal wrote:

> Dear all
> 
> I need to find the lengths of the runs of TRUE in a logical vector (see example 1).
> I found a possible solution which works well, but when I use it on a larger data set
> performance drops substantially (example 2).
> 
> Example 1
> set.seed(111)
> x <- sample(c(T,F),50, replace=T)
> system.time(cetnost <- as.numeric(table(which(x)-cumsum(x[which(x)]))))
> [1] 0.00 0.00 0.03   NA   NA
> cetnost
> [1] 1 3 2 5 1 4 1 1 1 3 1 1 2

Have you looked at rle()?

> rlex <- rle(x)
> str(rlex)
List of 2
 $ lengths: int [1:27] 2 1 1 3 1 2 2 5 1 1 ...
 $ values : logi [1:27] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE ...
 - attr(*, "class")= chr "rle"
> rlex$lengths[rlex$values]
 [1] 1 3 2 5 1 4 1 1 1 3 1 1 2
> cetnost
 [1] 1 3 2 5 1 4 1 1 1 3 1 1 2

rle() is itself written in R (interpreted code), like your solution, so I am
not sure how well it will scale.
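
If it helps, the rle() idiom above can be wrapped in a small helper. This is
only a minimal sketch: the name true_run_lengths is made up for illustration
and is not part of R, and it assumes the input is a logical vector without NAs.

```r
# Hypothetical helper (not part of base R): lengths of the runs of TRUE
# in a logical vector, computed with rle(). Assumes x contains no NAs.
true_run_lengths <- function(x) {
  r <- rle(x)            # run lengths plus the value of each run
  r$lengths[r$values]    # keep only the runs whose value is TRUE
}

x <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)
true_run_lengths(x)
# [1] 2 1 3
```

On the 50-element vector from Example 1 this is the same computation as
rlex$lengths[rlex$values].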

> 
> Example 2
> x<-sample(c(T,F),40321*51, replace=T)
> dd<-matrix(x,40321,51)
> system.time(cetnost <- lapply(dd, function(x)
>     as.numeric(table(which(x)-cumsum(x[which(x)])))))
> Timing stopped at: 750.63 1 775.6 NA NA 
> 
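
One thing to check here: lapply() on a matrix treats it as a plain vector, so
the function is called once per element rather than once per column. Below is
a sketch of a column-wise version, assuming the intent was one result per
column of dd. The small 20 x 5 matrix is a stand-in for the real data, and I
have not timed this on the full 40321 x 51 case.

```r
# Small stand-in for the 40321 x 51 matrix in Example 2.
set.seed(111)
dd <- matrix(sample(c(TRUE, FALSE), 20 * 5, replace = TRUE), 20, 5)

# lapply() over a matrix visits every element, not every column.
length(lapply(dd, identity))   # 100: one call per cell

# Assumed intent: one vector of TRUE-run lengths per column.
# split() by column index always yields a list, one entry per column.
cols <- split(dd, col(dd))
cetnost <- lapply(cols, function(v) {
  r <- rle(v)
  r$lengths[r$values]
})
length(cetnost)                # 5: one entry per column
```

apply(dd, 2, ...) is the more usual spelling, but it simplifies the result to
a matrix whenever every column happens to return the same number of runs,
which is why the sketch splits by col(dd) instead.
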
> Please give me any hints on how to improve performance, or advise a different
> (more efficient) solution.
> 
> R 1.8.0, W2000,  512M memory, Pentium4
> 
> Thank you in advance.
> 
> 
> 
> Petr Pikal
> petr.pikal at precheza.cz
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no