[R] bad performance of a function
Roger Bivand
Roger.Bivand at nhh.no
Fri Nov 14 13:52:10 CET 2003
On Fri, 14 Nov 2003, Petr Pikal wrote:
> Dear all
>
> I need to find the lengths of the runs of TRUE in a logical vector (see Example 1).
> I found a possible solution that works well, but when I use it on a larger data set
> performance drops substantially (Example 2).
>
> Example 1
> set.seed(111)
> x <- sample(c(T,F),50, replace=T)
> system.time(cetnost <- as.numeric(table(which(x)-cumsum(x[which(x)]))))
> [1] 0.00 0.00 0.03 NA NA
> cetnost
> [1] 1 3 2 5 1 4 1 1 1 3 1 1 2
Have you looked at rle()?
> rlex <- rle(x)
> str(rlex)
List of 2
$ lengths: int [1:27] 2 1 1 3 1 2 2 5 1 1 ...
$ values : logi [1:27] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE ...
- attr(*, "class")= chr "rle"
> rlex$lengths[rlex$values]
[1] 1 3 2 5 1 4 1 1 1 3 1 1 2
> cetnost
[1] 1 3 2 5 1 4 1 1 1 3 1 1 2
rle() is written in interpreted R code too, like your solution, so I'm not
sure how well it will scale.
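
As a minimal sketch, the rle() idiom above could be wrapped in a small helper and checked against the table()/cumsum() approach from Example 1. The name true_run_lengths is hypothetical (it is not part of R or of the original post), and the sketch assumes the vector contains no NA values.

```r
# Hypothetical helper (not from the original post): lengths of the
# runs of TRUE in a logical vector. Assumes x contains no NA values.
true_run_lengths <- function(x) {
  r <- rle(x)            # run-length encoding: $lengths and $values
  r$lengths[r$values]    # keep only the runs whose value is TRUE
}

x <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE)

# rle()-based result
res_rle <- true_run_lengths(x)

# The approach from Example 1, for comparison
res_table <- as.numeric(table(which(x) - cumsum(x[which(x)])))

res_rle
# [1] 2 1 3
all(res_rle == res_table)
# [1] TRUE
```

Both give the same run lengths here; whether rle() is actually faster on large inputs would need to be timed on the real data.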
>
> Example 2
> x<-sample(c(T,F),40321*51, replace=T)
> dd<-matrix(x,40321,51)
> system.time(cetnost <- lapply(dd,function(x) as.numeric(table(which(x)-cumsum(x[which(x)])))))
> Timing stopped at: 750.63 1 775.6 NA NA
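
One thing that may explain the timing: lapply() on a matrix visits every single element, not every column, so the function above would be called 40321*51 times. Assuming the intent was one result per column (that is my guess, not something stated in the post), a sketch on a smaller matrix could look like this; the dimensions 1000 x 5 are chosen only for illustration.

```r
set.seed(111)
# Smaller stand-in for the 40321 x 51 matrix (illustrative size only)
dd <- matrix(sample(c(TRUE, FALSE), 1000 * 5, replace = TRUE), 1000, 5)

# lapply() over a matrix makes one call per element
n_calls <- length(lapply(dd, identity))
n_calls
# [1] 5000

# One call per column instead, using rle() on each column
cetnost <- lapply(seq_len(ncol(dd)), function(j) {
  r <- rle(dd[, j])
  r$lengths[r$values]   # lengths of the runs of TRUE in column j
})
length(cetnost)
# [1] 5
```

A list is used rather than apply(dd, 2, ...) because the columns generally have different numbers of runs.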
>
> Please give me a hint on how to improve performance, or advise a different (and
> more efficient) solution.
>
> R 1.8.0, W2000, 512M memory, Pentium4
>
> Thank you in advance.
>
>
>
> Petr Pikal
> petr.pikal at precheza.cz
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no