[R] sorting variable names containing digits
Gabor Grothendieck
ggrothendieck at gmail.com
Mon Dec 22 05:28:51 CET 2008
Another possibility is to use strapply from the gsubfn package, which
gives a solution that is non-recursive and shorter:
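library(gsubfn)

mysort2 <- function(s) {
  # split into digit and non-digit runs; pad digit runs to width 9
  L <- strapply(s, "([0-9]+)|([^0-9]+)",
    ~ if (nchar(x)) sprintf("%9d", as.numeric(x)) else y)
  # bind the components into a matrix, one row per string, NA-padded
  L2 <- t(do.call(cbind, lapply(L, ts)))
  L3 <- replace(L2, is.na(L2), "")
  # order by the columns, left to right
  ord <- do.call(order, as.data.frame(L3, stringsAsFactors = FALSE))
  s[ord]
}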
First, strapply breaks up each string into a character vector of its
numeric and non-numeric components. We pad each numeric component on the
left with spaces using sprintf so they are all 9 wide and can be ordered
as strings. The next line turns that into a matrix, L2, with one row per
string; strings with fewer components leave NAs, which we replace with ""
to give L3. Finally we order the rows of L3 column by column and apply
the ordering, ord, to s to get the sorted version.
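
For anyone who would rather not depend on gsubfn, the same steps can be
sketched in base R. This is only an illustration and not the code above:
the name natural_sort is made up for this example, regmatches/gregexpr
stand in for strapply, and the digit runs are zero-padded rather than
space-padded so that the comparison does not depend on how the locale
collates spaces. It assumes a non-empty input and digit runs of at most
9 digits.

```r
# Hypothetical base-R variant of the split / pad / order idea.
natural_sort <- function(s) {
  # Split each string into alternating runs of digits and non-digits.
  tokens <- regmatches(s, gregexpr("[0-9]+|[^0-9]+", s))
  # Zero-pad every digit run to width 9 (assumption: at most 9 digits).
  tokens <- lapply(tokens, function(tok) {
    is_num <- grepl("^[0-9]+$", tok)
    tok[is_num] <- sprintf("%09d", as.integer(tok[is_num]))
    tok
  })
  # Pad shorter token vectors with "" and stack them, one row per string.
  n <- max(lengths(tokens))
  padded <- lapply(tokens, function(tok) c(tok, rep("", n - length(tok))))
  keys <- matrix(unlist(padded), ncol = n, byrow = TRUE)
  # Order by the key columns from left to right.
  ord <- do.call(order, as.data.frame(keys, stringsAsFactors = FALSE))
  s[ord]
}

s <- c("x1b", "x1a", "x02b", "x02a", "x02", "y1a1", "y10a2",
       "y10a10", "y10a1", "y2", "var10a2", "var2", "y10")
qs <- c("q10.1.1", "q10.2.1", "q2.1.1", "q10.10.2")
print(natural_sort(s))
print(natural_sort(qs))
```

On the two vectors from this thread it should give the same order as
mysort(); I have not checked it beyond those cases.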
The gsubfn home page is at:
http://gsubfn.googlecode.com
Here is some sample output:
> mysort2(s)
[1] "var2" "var10a2" "x1a" "x1b" "x02" "x02a"
    "x02b" "y1a1" "y2" "y10" "y10a1" "y10a2" "y10a10"
> mysort(s)
[1] "var2" "var10a2" "x1a" "x1b" "x02" "x02a"
    "x02b" "y1a1" "y2" "y10" "y10a1" "y10a2" "y10a10"
> mysort2(t)
[1] "q2.1.1" "q10.1.1" "q10.2.1" "q10.10.2"
> mysort(t)
[1] "q2.1.1" "q10.1.1" "q10.2.1" "q10.10.2"
On Sun, Dec 21, 2008 at 9:57 PM, John Fox <jfox at mcmaster.ca> wrote:
> Dear Gabor,
>
> Thanks for this -- I was unaware of mixedsort(). As you point out,
> however, mixedsort() doesn't cover all of the cases in which I'm
> interested and which are handled by mysort().
>
> Regards,
> John
>
> On Sun, 21 Dec 2008 20:51:17 -0500
> "Gabor Grothendieck" <ggrothendieck at gmail.com> wrote:
>> mixedsort in gtools will give the same result as mysort(s) but
>> differs in the case of t.
>>
>> On Sun, Dec 21, 2008 at 8:33 PM, John Fox <jfox at mcmaster.ca> wrote:
>> > Dear r-helpers,
>> >
>> > I'm looking for a way of sorting variable names in a "natural" order,
>> > when the names are composed of digits and other characters. I know
>> > that this is a vague idea, and that sorting character strings is a
>> > complex topic, but perhaps a couple of examples will clarify what I
>> > mean:
>> >
>> >> s <- c("x1b", "x1a", "x02b", "x02a", "x02", "y1a1", "y10a2",
>> > + "y10a10", "y10a1", "y2", "var10a2", "var2", "y10")
>> >
>> >> sort(s)
>> > [1] "var10a2" "var2" "x02" "x02a" "x02b" "x1a"
>> > [7] "x1b" "y10" "y10a1" "y10a10" "y10a2" "y1a1"
>> > [13] "y2"
>> >
>> >> mysort(s)
>> > [1] "var2" "var10a2" "x1a" "x1b" "x02" "x02a"
>> > [7] "x02b" "y1a1" "y2" "y10" "y10a1" "y10a2"
>> > [13] "y10a10"
>> >
>> >> t <- c("q10.1.1", "q10.2.1", "q2.1.1", "q10.10.2")
>> >
>> >> sort(t)
>> > [1] "q10.1.1" "q10.10.2" "q10.2.1" "q2.1.1"
>> >
>> >> mysort(t)
>> > [1] "q2.1.1" "q10.1.1" "q10.2.1" "q10.10.2"
>> >
>> > Here, sort() is the standard R function and mysort() is a replacement,
>> > which sorts the names into the order that seems natural to me, at
>> > least in the cases that I've tried:
>> >
>> > mysort <- function(x){
>> >     sort.helper <- function(x){
>> >         prefix <- strsplit(x, "[0-9]")
>> >         prefix <- sapply(prefix, "[", 1)
>> >         prefix[is.na(prefix)] <- ""
>> >         suffix <- strsplit(x, "[^0-9]")
>> >         suffix <- as.numeric(sapply(suffix, "[", 2))
>> >         suffix[is.na(suffix)] <- -Inf
>> >         remainder <- sub("[^0-9]+", "", x)
>> >         remainder <- sub("[0-9]+", "", remainder)
>> >         if (all (remainder == "")) list(prefix, suffix)
>> >         else c(list(prefix, suffix), Recall(remainder))
>> >     }
>> >     ord <- do.call("order", sort.helper(x))
>> >     x[ord]
>> > }
>> >
>> > I have a couple of applications in mind, one of which is recognizing
>> > repeated-measures variables in "wide" longitudinal datasets, which
>> > often are named in the form x1, x2, ... , xn.
>> >
>> > mysort(), which works by recursively slicing off pairs of non-digit
>> > and digit strings, seems more complicated than it should have to be,
>> > and I wonder whether anyone has a more elegant solution. I don't think
>> > that efficiency is a serious issue for the applications I'm
>> > considering, but of course a more efficient solution would be of
>> > interest.
>> >
>> > Thanks,
>> > John
>> >
>> > ------------------------------
>> > John Fox, Professor
>> > Department of Sociology
>> > McMaster University
>> > Hamilton, Ontario, Canada
>> > web: socserv.mcmaster.ca/jfox
>> >