[R] Suggestion on how to improve efficiency when using MASS:::hubers on high-dimensional arrays

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Jan 20 08:08:49 CET 2007


The usual advice would seem to apply here:

- profile (the real application, not a toy example) to see where the
   bottlenecks are.
- rewrite those in C.

It is quite possible that hubers would benefit in your problem by being 
rewritten in C.  However, there is a reason why it was not.  It is a 
univariate location estimator, and using multiple univariate location 
estimators is not a robust multivariate location estimator.  So in so far 
as I understand your problem sketch, you would be better off with a 
multivariate estimator for your N outcomes.

There may be problems which need the application of very large numbers of 
univariate robust estimators, but they are not commonplace.

On Fri, 19 Jan 2007, Benilton Carvalho wrote:

> Hi Everyone,
>
> Given the scenario I have, I was wondering if anyone would be able to
> give me a hind on how to get the results from hubers() in a more
> efficient way.
>
> I have an outcome on an array [N x S x D].
>
> I also have a factor (levels 1,2,3) stored on a matrix N x S.
>
> My objective is to get "mu" and "sigma" for each of the N rows
> (outcome) stratified by the factor (levels 1, 2 and 3) for each of
> the D "levels", but using MASS:hubers().
>
> Ideally the final result would be an array [N x D x 3 x 2].
>
> The following toy example demonstrates what I want to do, and I'd
> like to improve the performance when working on my case, where S=400
> and N > 200000
>
> Thank you very much for any suggestion.
>
> benilton
>
> ## begin toy example
> set.seed(1)
> N <- 100
> S <- 5
> D <- 2
>
> outcome <- array(rnorm(N*S*D), dim=c(N, S, D))
> classes <- matrix(sample(c(1:3, NA), N*S, rep=T), ncol=S)
>
> results <- array(NA, dim=c(N, D, 3, 2))
>
> library(MASS)
> myHubers <- function(x)
>   if (length(x)>1) as.numeric(hubers(x))  else c(NA, NA)
>
> for (n in 1:N)
>   for (d in 1:D){
>     tmp <- outcome[n,,d]
>     grp <- classes[n,]
>     results[n, d,,] <- t(sapply(split(tmp, factor(grp, levels=1:3)),
> myHubers))
>   }
> ## end
>
> --
> Benilton Carvalho
> PhD Candidate
> Department of Biostatistics
> Johns Hopkins University
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



More information about the R-help mailing list