[R] sorting variable names containing digits
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Dec 22 02:51:17 CET 2008
The mixedsort() function in the gtools package gives the same result as
mysort(s), but a different result for t.
On Sun, Dec 21, 2008 at 8:33 PM, John Fox <jfox at mcmaster.ca> wrote:
> Dear r-helpers,
>
> I'm looking for a way of sorting variable names in a "natural" order, when
> the names are composed of digits and other characters. I know that this is a
> vague idea, and that sorting character strings is a complex topic, but
> perhaps a couple of examples will clarify what I mean:
>
>> s <- c("x1b", "x1a", "x02b", "x02a", "x02", "y1a1", "y10a2",
> + "y10a10", "y10a1", "y2", "var10a2", "var2", "y10")
>
>> sort(s)
> [1] "var10a2" "var2" "x02" "x02a" "x02b" "x1a"
> [7] "x1b" "y10" "y10a1" "y10a10" "y10a2" "y1a1"
> [13] "y2"
>
>> mysort(s)
> [1] "var2" "var10a2" "x1a" "x1b" "x02" "x02a"
> [7] "x02b" "y1a1" "y2" "y10" "y10a1" "y10a2"
> [13] "y10a10"
>
>> t <- c("q10.1.1", "q10.2.1", "q2.1.1", "q10.10.2")
>
>> sort(t)
> [1] "q10.1.1" "q10.10.2" "q10.2.1" "q2.1.1"
>
>> mysort(t)
> [1] "q2.1.1" "q10.1.1" "q10.2.1" "q10.10.2"
>
> Here, sort() is the standard R function and mysort() is a replacement, which
> sorts the names into the order that seems natural to me, at least in the
> cases that I've tried:
>
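> mysort <- function(x){
>   sort.helper <- function(x){
>     # leading run of non-digits ("" if there is none)
>     prefix <- strsplit(x, "[0-9]")
>     prefix <- sapply(prefix, "[", 1)
>     prefix[is.na(prefix)] <- ""
>     # first run of digits as a number (-Inf if there is none); splitting on
>     # runs of non-digits ("[^0-9]+") keeps a multi-letter prefix such as
>     # "var" from leaving empty strings in position 2
>     suffix <- strsplit(x, "[^0-9]+")
>     suffix <- as.numeric(sapply(suffix, "[", 2))
>     suffix[is.na(suffix)] <- -Inf
>     # strip that non-digit/digit pair and recurse on what is left
>     remainder <- sub("[^0-9]+", "", x)
>     remainder <- sub("[0-9]+", "", remainder)
>     if (all(remainder == "")) list(prefix, suffix)
>     else c(list(prefix, suffix), Recall(remainder))
>   }
>   # order by the (prefix, number) keys of every level in turn
>   ord <- do.call("order", sort.helper(x))
>   x[ord]
> }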
>
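The step in sort.helper() that extracts the first digit run depends on the split pattern. Splitting on a single non-digit character leaves one empty string for each letter of a multi-letter prefix such as "var", so element 2 is not the number; splitting on runs of non-digits does pick it out. A quick check in R (the name "var10a2" is taken from the example vector s):

```r
# Single non-digit separator: each of "v", "a", "r" splits off an empty string.
strsplit("var10a2", "[^0-9]")[[1]]
# c("", "", "", "10", "2"), so element 2 is "" and as.numeric() gives NA

# Runs of non-digits as the separator: element 2 is the first digit run.
strsplit("var10a2", "[^0-9]+")[[1]]
# c("", "10", "2"), so element 2 is "10"
```

With the first pattern, "var10" and "var2" would both get -Inf at the first level and tie.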
> I have a couple of applications in mind, one of which is recognizing
> repeated-measures variables, which are often named x1, x2, ..., xn, in
> "wide" longitudinal datasets.
>
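For the repeated-measures case, here is a minimal sketch. The helper name repeated_groups() is hypothetical and is not part of base R or gtools. It keeps only names that end in a digit run, groups them by the stem in front of that run, and orders each group numerically. It assumes the stem ends in a non-digit, so a name like y10a2 would fall under the stem "y10a".

```r
# Hypothetical helper: group names of the form <stem><number>, e.g. x1, x2, x10,
# by stem, with each group in numeric order of the trailing number.
repeated_groups <- function(nms) {
  # stem = everything up to the last non-digit; number = trailing digit run
  m <- regmatches(nms, regexec("^(.*[^0-9])([0-9]+)$", nms))
  ok <- lengths(m) == 3           # non-matching names give character(0)
  name <- nms[ok]
  stem <- vapply(m[ok], `[`, character(1), 2)
  num  <- as.numeric(vapply(m[ok], `[`, character(1), 3))
  # indices of the matching names, split by stem, then ordered by number
  groups <- split(seq_along(name), stem)
  lapply(groups, function(i) name[i][order(num[i])])
}

repeated_groups(c("id", "x2", "x10", "x1", "age", "y1", "y2"))
# list(x = c("x1", "x2", "x10"), y = c("y1", "y2"))
```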
> mysort(), which works by recursively slicing off pairs of non-digit and
> digit strings, seems more complicated than it needs to be, and I
> wonder whether anyone has a more elegant solution. I don't think that
> efficiency is a serious issue for the applications I'm considering, but
> a more efficient solution would of course be of interest.
>
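One possible non-recursive approach, offered as a sketch rather than a tested replacement: tokenize every name once into its runs of non-digits and digits with regmatches() and gregexpr(), build one text key and one numeric key per token position, and pass all the keys to order(). The name natsort() is hypothetical, and the sketch assumes x is a character vector without NA values.

```r
# Hypothetical non-recursive natural sort (a sketch, not the gtools or
# base-R implementation). Assumes x is a character vector without NAs.
natsort <- function(x) {
  if (length(x) == 0) return(x)
  # split each name into its runs of digits and of non-digits, in order
  tokens <- regmatches(x, gregexpr("[0-9]+|[^0-9]+", x))
  n <- max(0L, lengths(tokens))
  if (n == 0) return(x)
  keys <- list()
  for (i in seq_len(n)) {
    # i-th token of each name, NA when a name has fewer tokens
    tok <- vapply(tokens,
                  function(tk) if (i <= length(tk)) tk[i] else NA_character_,
                  character(1))
    is_num <- !is.na(tok) & grepl("^[0-9]+$", tok)
    # text key: non-digit runs compare as strings; digit runs and missing
    # tokens get "", so a shorter name sorts first among otherwise equal ones
    txt <- ifelse(is.na(tok) | is_num, "", tok)
    # numeric key: digit runs compare by value; everything else gets -Inf
    num <- ifelse(is_num, suppressWarnings(as.numeric(tok)), -Inf)
    keys <- c(keys, list(txt, num))
  }
  x[do.call(order, keys)]
}
```

Padding missing tokens with "" and -Inf mirrors the choices in mysort(), so "y10" sorts before "y10a1". On the s and t vectors from the question, this sketch should give the same orders as the mysort() output shown there.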
> Thanks,
> John
>
> ------------------------------
> John Fox, Professor
> Department of Sociology
> McMaster University
> Hamilton, Ontario, Canada
> web: socserv.mcmaster.ca/jfox
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>