[R] why is 9 after 10?

Fri Feb 12 23:10:13 CET 2016

It can also happen if you use colClasses, since that applies as.factor to the input column without first converting it to numeric. To wit:

> read.table(text="
+ 9
+ 10", colClasses="factor")$V1
[1] 9  10
Levels: 10 9
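
Once the column has ended up as a factor, either of the following should restore the numeric order. This is only a minimal sketch: f is a made-up stand-in for the problem column (not Federico's actual data), and it assumes every level really is a number.

```r
# Hypothetical stand-in for the problem column: numbers stored as a factor.
f <- factor(c("9", "10", "6", "41", "10"))
levels(f)   # character sort order, so "10" comes before "6" and "9"

# Option 1: recover the numbers. Go via the labels, because
# as.numeric(f) on its own would return the internal integer codes.
n <- as.numeric(as.character(f))
table(n)    # columns now run 6, 9, 10, 41

# Option 2: keep a factor, but put the levels in numeric order.
lv <- levels(f)
g <- factor(f, levels = lv[order(as.numeric(lv))])
levels(g)
```

Option 1 is the simpler fix if the column is meant to be numeric anyway; option 2 is for the case where it has to stay a factor and only the order of the table columns matters.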

-pd

> On 12 Feb 2016, at 22:43 , Jim Lemon <drjimlemon at gmail.com> wrote:
> 
> It depends upon what goes into the "data reshaping pipeline". If there is a
> single non-numeric value in the data read in, the whole column is treated as
> character and its levels are sorted alphabetically upon conversion to a factor:
> 
> x<-factor(c(sample(6:37,1000,TRUE)," "))
> z<-factor(x)
> levels(z)
>  [1] " "  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23"
> [16] "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "6"
> [31] "7"  "8"  "9"
> 
> Jim
> 
> 
> On Sat, Feb 13, 2016 at 2:41 AM, Fox, John <jfox at mcmaster.ca> wrote:
> 
>> Dear Federico,
>> 
>>> -----Original Message-----
>>> From: Federico Calboli [mailto:federico.calboli at helsinki.fi]
>>> Sent: February 12, 2016 10:27 AM
>>> To: Fox, John <jfox at mcmaster.ca>
>>> Cc: R Help <r-help at r-project.org>
>>> Subject: Re: [R] why is 9 after 10?
>>> 
>>> Dear John,
>>> 
>>> that is fortunately not the case. I just managed to figure out that the
>>> problem was that in the data reshaping pipeline the numeric column was
>>> transformed into a factor.
>> 
>> But that shouldn't have this effect, I think:
>> 
>>> z <- as.factor(x)
>>> table(z)
>> z
>>  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
>> 29 30 35 29 41 33 27 21 38 36 34 35 31 29 27 26 28 22 21 34 32 33 31 34 23
>> 31 32 33 34 35 36 37
>> 32 35 39 31 40 35 29
>> 
>>> levels(z)
>>  [1] "6"  "7"  "8"  "9"  "10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
>> [15] "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31"
>> [27] "32" "33" "34" "35" "36" "37"
>> 
>> Best,
>> John
>> 
>>> 
>>> Many thanks for your time.
>>> 
>>> BW
>>> 
>>> F
>>> 
>>> 
>>> 
>>>> On 12 Feb 2016, at 17:22, Fox, John <jfox at mcmaster.ca> wrote:
>>>> 
>>>> Dear Federico,
>>>> 
>>>> Might my.data[, 2] contain character data, which therefore would be
>>>> sorted in this manner? For example:
>>>> 
>>>>> x <- sample(6:37, 1000, replace=TRUE)
>>>>> table(x)
>>>> x
>>>>  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
>>>> 29 30 35 29 41 33 27 21 38 36 34 35 31 29 27 26 28 22 21 34 32 33 31 34 23
>>>> 31 32 33 34 35 36 37
>>>> 32 35 39 31 40 35 29
>>>>> y <- as.character(x)
>>>>> table(y)
>>>> y
>>>> 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
>>>> 41 33 27 21 38 36 34 35 31 29 27 26 28 22 21 34 32 33 31 34 23 32 35
>>>> 33 34 35 36 37  6  7  8  9
>>>> 39 31 40 35 29 29 30 35 29
>>>> 
>>>> I hope this helps,
>>>> John
>>>> 
>>>> -----------------------------
>>>> John Fox, Professor
>>>> McMaster University
>>>> Hamilton, Ontario
>>>> Canada L8S 4M4
>>>> Web: socserv.mcmaster.ca/jfox
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of
>>>>> Federico Calboli
>>>>> Sent: February 12, 2016 10:13 AM
>>>>> To: R Help <r-help at r-project.org>
>>>>> Subject: [R] why is 9 after 10?
>>>>> 
>>>>> Hi All,
>>>>> 
>>>>> I have some data, one of the columns is a bunch of numbers from 6 to 41.
>>>>> 
>>>>> table(my.data[,2])
>>>>> 
>>>>> returns
>>>>> 
>>>>>   10   11   12   13   14   15   16   17   18   19   20   21   22   23
>>>>> 1761 1782 1897 1749 1907 1797 1734 1810 1913 1988 1914 1822 1951 1973
>>>>>   24   25   26   27   28   29   30   31   32   33   34   35   36   37
>>>>> 1951 1947 2067 1967 1812 2119 1999 2086 2133 2081 2165 2365 2330 2340
>>>>>   38   39   40   41    6    7    8    9
>>>>> 2681 2905 3399 3941 1648 1690 1727 1668
>>>>> 
>>>>> whereas the reasonable expectation is that the numbers from 6 to 9
>>>>> would come before 10 to 41.
>>>>> 
>>>>> How do I sort this incredibly silly behaviour so that my table
>>>>> follows a reasonable expectation that 9 comes before 10 (and so on and
>>>>> so forth)?
>>>>> 
>>>>> BW
>>>>> 
>>>>> F
>>>>> 
>>>>> --
>>>>> Federico Calboli
>>>>> Ecological Genetics Research Unit
>>>>> Department of Biosciences
>>>>> PO Box 65 (Biocenter 3, Viikinkaari 1)
>>>>> FIN-00014 University of Helsinki
>>>>> Finland
>>>>> 
>>>>> federico.calboli at helsinki.fi
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html and provide commented,
>>>>> minimal, self-contained, reproducible code.
>>> 
>> 
>> 
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com