[R] Complex sort problem
Petr Savicky
savicky at cs.cas.cz
Mon May 21 21:43:59 CEST 2012
On Fri, May 18, 2012 at 09:20:59PM -0400, Axel Urbiz wrote:
[...]
> Petr: I kind of see your line of thought, but still cannot see how it works
> on a specific example like this one.
I did not have email in the last few days.
The previous suggestion from
https://stat.ethz.ch/pipermail/r-help/2012-May/313197.html
was meant for the situation where we want to keep the result of
sorting according to several variables, so that later a subset
can be sorted by sorting according to a single variable only.
Now I see that all sortings are already according to a single
variable, so that suggestion is not helpful here.
Try the following, which uses the example from your code.
In particular, it uses a matrix (not a data frame) and
there are no duplicates in the data.
set.seed(1)
dframe <- matrix(runif(250), 50, 5)
### store sort indexes
sort_matrix <- matrix(ncol = ncol(dframe), nrow = nrow(dframe))
for (i in 1:ncol(dframe)) {
    xtemp <- dframe[, i]
    sort_matrix[, i] <- sort.list(xtemp, method = "shell")
}
### take a bootstrap sample
nr_samples <- nrow(dframe)
b.ind <- sample(1:nr_samples, nr_samples*0.5, replace = TRUE)
freq <- tabulate(b.ind, nbins=nr_samples)
### create bootstrap sample sorted with respect to an arbitrary variable
var1 <- 1
ind <- sort_matrix[, var1]
DF1 <- dframe[ind, ] # this can be computed in advance (before b.ind)
NDF1 <- DF1[rep(1:nrow(DF1), times=freq[ind]), ]
### compare with a straightforward method
subDF <- dframe[b.ind, ]
subDF1 <- subDF[order(subDF[, var1]), ]
identical(NDF1, subDF1)
[1] TRUE
The main step is that "ind" is used to transform both the data
and the frequency table. So, they remain consistent and the
reordered frequencies may be used for the reordered data.
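To see the same step on something small enough to check by hand, here
is a minimal sketch on a single vector of 5 distinct values. The toy
vector "x", the fixed draw in "b.ind" and the names "x_sorted",
"freq_sorted", "fast" and "direct" are made up for this illustration
and are not part of the code above.

```r
# Toy data (hypothetical, for illustration only): 5 values, no duplicates
x <- c(0.7, 0.2, 0.9, 0.4, 0.1)

# Sort index, computed once in advance
ind <- sort.list(x)

# A fixed "bootstrap" draw, so that the result can be followed by hand
b.ind <- c(1, 1, 3, 5)
freq <- tabulate(b.ind, nbins = length(x))

# Reorder the data and the frequencies with the same index
x_sorted <- x[ind]
freq_sorted <- freq[ind]

# Repeat each sorted value as many times as it was drawn
fast <- rep(x_sorted, times = freq_sorted)

# Straightforward method: subset first, then sort
direct <- sort(x[b.ind])

identical(fast, direct)
```

Here freq_sorted[k] is the number of times the k-th smallest value was
drawn, so rep() produces the sample already in sorted order and no
sorting is needed after the draw.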
Hope this helps.
Petr Savicky.