[R] Complex sort problem

Petr Savicky savicky at cs.cas.cz
Mon May 21 21:43:59 CEST 2012


On Fri, May 18, 2012 at 09:20:59PM -0400, Axel Urbiz wrote:
[...]
> Petr: I kind of see your line of thought, but still cannot see how it works
> on a specific example like this one.

I did not have access to email in the last few days.

The previous suggestion from

  https://stat.ethz.ch/pipermail/r-help/2012-May/313197.html

was meant for the situation where we want to keep the result of
sorting according to several variables, so that later a subset
can be sorted by sorting according to a single variable only. Now
I see that all of your sortings are already according to a single
variable, so that suggestion is not helpful.

Try the following, which uses the example from your code.
In particular, it uses a matrix (not a data frame) and
there are no duplicates in the data.

  set.seed(1)
 
  dframe <- matrix(runif(250), 50, 5)
 
  ### store sort indexes
 
  sort_matrix <- matrix(ncol = ncol(dframe), nrow = nrow(dframe))
 
  for (i in seq_len(ncol(dframe))) {
    xtemp <- dframe[, i]
    sort_matrix[, i] <- sort.list(xtemp, method = "shell")
  }
 
  ### take a bootstrap sample
 
  nr_samples <- nrow(dframe)
  b.ind <- sample(1:nr_samples, nr_samples*0.5, replace = TRUE)
  freq <- tabulate(b.ind, nbins=nr_samples)
 
  ### create bootstrap sample sorted with respect to an arbitrary variable
 
  var1 <- 1
  ind <- sort_matrix[, var1]
  DF1 <- dframe[ind, ]    # this can be computed in advance (before b.ind)
  NDF1 <- DF1[rep(1:nrow(DF1), times=freq[ind]), ]
 
  ### compare with a straightforward method

  subDF <- dframe[b.ind, ]
  subDF1 <- subDF[order(subDF[, var1]), ]
  identical(NDF1, subDF1)

  [1] TRUE
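For repeated use, the steps above can be wrapped in a small function and checked against the straightforward method for every column, not only for var1. This is a minimal sketch under stated assumptions: the name sorted_boot is hypothetical (not from the post), apply() replaces the loop that builds sort_matrix, and, like the example above, it assumes no tied values within a column. With ties, the precomputed order breaks them by original row position while order() on the subset breaks them by position in b.ind, so identical() may return FALSE.

```r
# Hypothetical helper (not from the original post): bootstrap sample of the
# rows of `x`, sorted by column `var`, built from precomputed sort indexes
# and the bootstrap frequency vector, without any sorting per sample.
sorted_boot <- function(x, sort_matrix, freq, var) {
  ind <- sort_matrix[, var]              # permutation that sorts column `var`
  xs <- x[ind, , drop = FALSE]           # independent of the sample; can be cached
  # repeat each sorted row as many times as its original row was drawn
  xs[rep(seq_len(nrow(xs)), times = freq[ind]), , drop = FALSE]
}

set.seed(1)
dframe <- matrix(runif(250), 50, 5)

# sort indexes for all columns at once (same result as the loop above)
sort_matrix <- apply(dframe, 2, sort.list, method = "shell")

nr <- nrow(dframe)
b.ind <- sample(seq_len(nr), nr %/% 2, replace = TRUE)
freq <- tabulate(b.ind, nbins = nr)

# compare with the straightforward method for every column
ok <- vapply(seq_len(ncol(dframe)), function(v) {
  sub <- dframe[b.ind, , drop = FALSE]
  identical(sorted_boot(dframe, sort_matrix, freq, v),
            sub[order(sub[, v]), , drop = FALSE])
}, logical(1))
print(ok)
```

The drop = FALSE arguments keep the result a matrix even if a sample happens to contain a single row.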

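The main step is that the same permutation "ind" is applied to both
the data and the frequency vector. So they remain consistent, and the
reordered frequencies can be used directly for the reordered data:
each sorted row is repeated as many times as it was drawn. Since DF1
does not depend on b.ind, no sorting is needed for a new bootstrap
sample, only tabulate() and the row expansion with rep().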
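The alignment argument can be seen on a tiny hand-checkable case. This sketch uses a single made-up column x and a fixed draw b.ind (both illustrative, not from the post) and shows that permuting the frequencies with the same ind gives the sorted sample, while using the unpermuted frequencies pairs counts with the wrong values.

```r
x <- c(0.3, 0.1, 0.2)                        # one column, original row order
ind <- sort.list(x)                          # permutation that sorts x: 2 3 1
b.ind <- c(1, 1, 3)                          # fixed "bootstrap" draw of row indexes
freq <- tabulate(b.ind, nbins = length(x))   # counts per original row: 2 0 1

# same permutation applied to the data and to the frequencies
via_freq <- rep(x[ind], times = freq[ind])
direct <- sort(x[b.ind])
print(via_freq)                              # [1] 0.2 0.3 0.3
print(identical(via_freq, direct))           # [1] TRUE

# without permuting freq, the counts refer to the wrong rows
wrong <- rep(x[ind], times = freq)
print(wrong)                                 # [1] 0.1 0.1 0.3
```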

Hope this helps.

Petr Savicky.