[R] Suggestion on how to improve efficiency when using MASS:::hubers on high-dimensional arrays
Prof Brian Ripley
ripley at stats.ox.ac.uk
Sat Jan 20 08:08:49 CET 2007
The usual advice would seem to apply here:
- profile (the real application, not a toy example) to see where the
bottlenecks are.
- rewrite those in C.
It is quite possible that hubers would benefit in your problem by being
rewritten in C. However, there is a reason why it was not. It is a
univariate location estimator, and using multiple univariate location
estimators is not a robust multivariate location estimator. So in so far
as I understand your problem sketch, you would be better off with a
multivariate estimator for your N outcomes.
There may be problems which need the application of very large numbers of
univariate robust estimators, but they are not commonplace.
On Fri, 19 Jan 2007, Benilton Carvalho wrote:
> Hi Everyone,
>
> Given the scenario I have, I was wondering if anyone would be able to
> give me a hind on how to get the results from hubers() in a more
> efficient way.
>
> I have an outcome on an array [N x S x D].
>
> I also have a factor (levels 1,2,3) stored on a matrix N x S.
>
> My objective is to get "mu" and "sigma" for each of the N rows
> (outcome) stratified by the factor (levels 1, 2 and 3) for each of
> the D "levels", but using MASS:hubers().
>
> Ideally the final result would be an array [N x D x 3 x 2].
>
> The following toy example demonstrates what I want to do, and I'd
> like to improve the performance when working on my case, where S=400
> and N > 200000
>
> Thank you very much for any suggestion.
>
> benilton
>
> ## begin toy example
> set.seed(1)
> N <- 100
> S <- 5
> D <- 2
>
> outcome <- array(rnorm(N*S*D), dim=c(N, S, D))
> classes <- matrix(sample(c(1:3, NA), N*S, rep=T), ncol=S)
>
> results <- array(NA, dim=c(N, D, 3, 2))
>
> library(MASS)
> myHubers <- function(x)
> if (length(x)>1) as.numeric(hubers(x)) else c(NA, NA)
>
> for (n in 1:N)
> for (d in 1:D){
> tmp <- outcome[n,,d]
> grp <- classes[n,]
> results[n, d,,] <- t(sapply(split(tmp, factor(grp, levels=1:3)),
> myHubers))
> }
> ## end
>
> --
> Benilton Carvalho
> PhD Candidate
> Department of Biostatistics
> Johns Hopkins University
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-help
mailing list