[R] Efficiently calculate sd on an array?

Charles C. Berry cberry at tajo.ucsd.edu
Sun Jun 17 18:56:31 CEST 2007


On Sun, 17 Jun 2007, Gavin Simpson wrote:

> Dear list,
>
> Consider the following problem:
>
> n.obs <- 167
> n.boot <- 100
> arr <- array(runif(n.obs*n.obs*n.boot), dim = c(n.obs, n.obs, n.boot))
> arr[sample(n.obs, 3), sample(n.obs, 3), ] <- NA
>
> Given the array arr, with dims = 167*167*100, I would like to calculate
> the sd of the values in the 3rd dimension of arr, and an obvious way to
> do this is via apply():
>
> system.time(res <- apply(arr, c(2,1), sd, na.rm = TRUE))
>
> This takes over 4 seconds on my desktop.
>
> I have found an efficient way to calculate the means of the 3rd
> dimension using
>
> temp <- t(rowMeans(arr, na.rm = TRUE, dims = 2))
>
> instead of
>
> temp <- apply(arr, c(2,1), mean, na.rm = TRUE)
>
> but I am having difficulty seeing how to calculate the standard
> deviations efficiently.
>
> Any idea how I might go about this?
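
The rowMeans() shortcut in the question can be checked on a small array. The sketch below is only illustrative: the name arr_small, the seed, and the 4 x 4 x 10 size are assumptions chosen so that it runs instantly, not values from the original post.

```r
# Small array so the comparison runs instantly; sizes are arbitrary.
set.seed(1)
arr_small <- array(runif(4 * 4 * 10), dim = c(4, 4, 10))
arr_small[2, 3, ] <- NA   # one all-NA fibre, as in the question
arr_small[1, 1, 5] <- NA  # one partially missing fibre

# dims = 2 treats the first two dimensions as "rows" and averages over the rest.
fast <- t(rowMeans(arr_small, na.rm = TRUE, dims = 2))
slow <- apply(arr_small, c(2, 1), mean, na.rm = TRUE)

# Both give a 4 x 4 matrix whose [j, i] entry is the mean of arr_small[i, j, ].
stopifnot(isTRUE(all.equal(fast, slow)))
```

The t() is needed only because apply() was called with c(2,1) rather than c(1,2).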

Here are timings on my system, comparing apply() with a vectorized
version built on rowSums() and rowMeans():

> system.time(res <- apply(arr, c(2,1), sd, na.rm = TRUE))
    user  system elapsed
    3.49    0.00    3.52
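> system.time(res2 <- {
+   # number of non-missing values for each [i, j]
+   ns <- rowSums(!is.na(arr), dims = 2)
+   # means as a plain vector, recycled across the 3rd dimension below
+   mns <- as.vector(rowMeans(arr, na.rm = TRUE, dims = 2))
+   sds <- t(sqrt(rowSums((arr - mns)^2, na.rm = TRUE, dims = 2) / as.vector(ns - 1)))
+   # cells with no data at all get NA, as sd() would return
+   sds[t(ns) == 0] <- NA
+   sds})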
    user  system elapsed
    0.36    0.02    0.37
> all.equal(res,res2)
[1] TRUE
>
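
The same approach can be wrapped in a function and checked against apply() on a small array. This is a sketch rather than a tested general-purpose routine: the name sd_dim3 and the test array are hypothetical, and the ns < 2 guard is an assumption added here (the transcript above only handles ns == 0) so that a cell with a single non-missing value also yields NA, as sd() does.

```r
# Hypothetical helper: sd over the 3rd dimension of a 3-d array, ignoring NAs.
# Returns the same orientation as apply(arr, c(2, 1), sd, na.rm = TRUE).
sd_dim3 <- function(arr) {
  ns  <- rowSums(!is.na(arr), dims = 2)                    # non-missing count per [i, j]
  mns <- as.vector(rowMeans(arr, na.rm = TRUE, dims = 2))  # mean per [i, j]
  # mns has length dim1 * dim2, so it is recycled once per slice arr[, , k].
  ss  <- rowSums((arr - mns)^2, na.rm = TRUE, dims = 2)    # sum of squared deviations
  sds <- sqrt(ss / (ns - 1))
  sds[ns < 2] <- NA  # assumption: mirror sd(), which gives NA for fewer than 2 values
  t(sds)
}

set.seed(42)
arr_small <- array(runif(5 * 4 * 20), dim = c(5, 4, 20))
arr_small[2, 3, ] <- NA         # all-NA fibre
arr_small[1, 1, c(2, 7)] <- NA  # partially missing fibre
arr_small[4, 2, -1] <- NA       # only one value left

ref <- apply(arr_small, c(2, 1), sd, na.rm = TRUE)
stopifnot(isTRUE(all.equal(sd_dim3(arr_small), ref)))
```

Subtracting the vector mns from arr works because R stores arrays with the first index varying fastest, so each recycled copy of mns lines up with one slice arr[, , k].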

HTH,

Chuck

>
> All the best,
>
> G

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


