[R] bad performance of a function
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Fri Nov 14 14:55:58 CET 2003
Peter Dalgaard <p.dalgaard at biostat.ku.dk> writes:
> Roger Bivand <Roger.Bivand at nhh.no> writes:
>
> > > rlex$lengths[rlex$values]
> > [1] 1 3 2 5 1 4 1 1 1 3 1 1 2
> > > cetnost
> > [1] 1 3 2 5 1 4 1 1 1 3 1 1 2
> >
> > rle() is interpreted too, like your solution, so I'm not sure how it will
> > scale.
>
> Not spectacularly better, but I don't think Peter is doing what he
> thinks he's doing...
Argh. Petr, not Peter....
> > >
> > > Example 2
> > > x<-sample(c(T,F),40321*51, replace=T)
> > > dd<-matrix(x,40321,51)
> > > system.time(cetnost <- lapply(dd,function(x) as.numeric(table(which(x)-
> > > cumsum(x[which(x)])))))
> > > Timing stopped at: 750.63 1 775.6 NA NA
>
> dd is not a list or data frame, so lapply calls the function once for each
> of the 2 million cells. Was this intended instead:
>
> > system.time(cetnost <- apply(dd,2,function(x) as.numeric(table(which(x)-
> + cumsum(x[which(x)])))))
> [1] 8.45 0.10 13.84 0.00 0.00
>
> rle() helps a bit but not orders of magnitude:
>
> > system.time(cetnost <- apply(dd,2,function(x) ((z <- rle(x))$lengths)[z$values]))
> [1] 2.88 0.03 5.32 0.00 0.00
>
> (This problem has a memory footprint of more than 200MB, so total
> timings vary wildly depending on whether swapping occurs.)
>
> --
> O__ ---- Peter Dalgaard Blegdamsvej 3
> c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
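A minimal sketch of the points above, using a tiny hand-made 10 x 2 logical matrix (hypothetical, not the thread's data) and two hypothetical helper names, f_table and f_rle, that wrap the two column functions quoted above. It shows that lapply() on a matrix works cell by cell, that apply(dd, 2, ...) works column by column, and that the table() and rle() versions give the same TRUE-run lengths. No timings are reproduced.

```r
# Tiny hand-made logical matrix (hypothetical; not the 40321 x 51 data above)
dd <- matrix(c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE,
               FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE),
             nrow = 10, ncol = 2)

# table() version quoted above: which(x) - cumsum(x[which(x)]) is the number
# of FALSEs preceding each TRUE, which stays constant within a run of TRUEs,
# so table() counts the length of each TRUE run, in order
f_table <- function(x) as.numeric(table(which(x) - cumsum(x[which(x)])))

# rle() version quoted above, split into two statements for readability
f_rle <- function(x) {
  z <- rle(x)
  z$lengths[z$values]
}

# A matrix is a vector with a dim attribute, so lapply() calls the function
# once per cell: 10 * 2 = 20 calls on length-one vectors
per_cell <- lapply(dd, f_table)
length(per_cell)        # 20

# apply(dd, 2, ...) calls it once per column; the number of runs differs
# between the two columns, so the result is a list rather than a matrix
per_col_table <- apply(dd, 2, f_table)
per_col_rle   <- apply(dd, 2, f_rle)
per_col_rle             # [[1]] 2 1 3   [[2]] 1 2 1 2

# Same values; f_table returns doubles, f_rle returns integers
all.equal(per_col_table, per_col_rle)   # TRUE
```

On the full problem the per-cell version does roughly 2 million function calls against 51 for the per-column one, which is where most of the difference in the quoted timings comes from.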