[R] Ordering problem

John Logsdon j.logsdon at quantex-research.com
Fri Nov 25 11:38:55 CET 2005


I have an ordering and factor problem to which there must be a simple
solution! The version is R 2.0.1 (2004-11-15) on a Linux platform.

A data frame H is read in from a .csv file using read.csv with as.is=TRUE.  
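For comparison, here is a minimal sketch of why H$ss arrives as an integer. It assumes a tiny made-up file in place of the real .csv, and uses the text= argument of current R to avoid writing a file. read.csv() converts an all-integer column to integer whatever as.is says. as.is only decides whether character columns stay character or become factors.

```r
# Hypothetical miniature of the .csv file behind H (column "name" is made up)
csv <- "ss,name\n1,a\n10,b\n2,c"

H <- read.csv(text = csv, as.is = TRUE)
class(H$ss)     # "integer": all-integer column converted to integer
class(H$name)   # "character": as.is = TRUE keeps strings as strings

H2 <- read.csv(text = csv, as.is = FALSE)
class(H2$ss)    # still "integer"
class(H2$name)  # "factor": only character columns are affected by as.is
```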

Another data frame, HN, is constructed from data, and I want to compare
the columns named ss in the two (sorted) data frames, which are the same
length.

The problem is that HN$ss is always treated as a factor, whatever I do,
while H$ss is treated as an integer, which is what I want. Somewhere R is
making an implicit conversion, but I can't see where or how to correct it.

The data are all integers in the range 1:13, in fact with no gaps. If I
tabulate H$ss:

> table(H$ss)

   1    2    3    4    5    6    7    8    9   10   11   12   13 
 176  176  176  176  176  176  341 8726 8784 8777 8773 8749 8747 

and for HN:

> table(HN$ss)

   1   10   11   12   13    2    3    4    5    6    7    8    9 
 176 8777 8773 8749 8747  176  176  176  176  176  341 8726 8784 

At one point while constructing HN I have to make it a character matrix
(otherwise gsub doesn't work, for example when removing surplus blanks),
but I turn it back into a data frame at the end.
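A likely place for the implicit conversion, assuming HN is rebuilt with as.data.frame() from the character matrix: converting character columns to factors was the default until R 4.0.0, and factor levels built from strings are sorted as text. The sketch below uses a made-up three-row matrix and passes stringsAsFactors explicitly (that argument belongs to later R versions than 2.0.1) so that both behaviours can be seen in current R.

```r
# Hypothetical stand-in for HN while it is a character matrix
m <- matrix(c(" 1", "10 ", " 2 "), ncol = 1, dimnames = list(NULL, "ss"))
m[] <- gsub(" ", "", m)    # strip surplus blanks; m stays a character matrix

# Old default behaviour: character columns become factors
HN_old <- as.data.frame(m, stringsAsFactors = TRUE)
class(HN_old$ss)    # "factor"
levels(HN_old$ss)   # "1" "10" "2": sorted as text, not as numbers

# Keep the column as character, then convert it explicitly
HN_new <- as.data.frame(m, stringsAsFactors = FALSE)
HN_new$ss <- as.integer(HN_new$ss)
class(HN_new$ss)    # "integer"
```

Sorting the levels as text is what produces the 1, 10, 11, 12, 13, 2, ... column order in table(HN$ss).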

If I check the modes, both data frames are lists and both columns are
numeric; HN$ss is not reported as a factor. Yet it appears to be treated
as a factor, for example:

> table(formatC(H$ss,dig=0,width=2,format="f",flag="0"))

  01   02   03   04   05   06   07   08   09   10   11   12   13 
 176  176  176  176  176  176  341 8726 8784 8777 8773 8749 8747 

yet:

> table(formatC(HN$ss,dig=0,width=2,format="f",flag="0"))

   1   10   11   12   13    2    3    4    5    6    7    8    9 
 176 8777 8773 8749 8747  176  176  176  176  176  341 8726 8784 
Warning messages: 
1: "+" not meaningful for factors in: Ops.factor(x, ifelse(x == 0, 1, 0)) 
2: "<" not meaningful for factors in: Ops.factor(x, 0) 
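The mode check cannot reveal a factor: mode() reports the storage mode, and a factor stores its values as integer level codes, so its mode is "numeric". class() or is.factor() is the check that shows the difference. A minimal sketch with a made-up vector:

```r
f <- factor(c("1", "10", "2"))   # hypothetical factor like HN$ss
mode(f)        # "numeric": storage of the integer level codes
typeof(f)      # "integer"
class(f)       # "factor"
is.factor(f)   # TRUE
```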

I have tried as.numeric, but then I get the factor level codes returned
rather than the level names:

> table(formatC(as.numeric(HN$ss),dig=0,width=2,format="f",flag="0"))

  01   02   03   04   05   06   07   08   09   10   11   12   13 
 176 8777 8773 8749 8747  176  176  176  176  176  341 8726 8784 

which is obviously a tabulation of the internal level codes rather than
the data.
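Going through the level labels rather than the codes recovers the data; R FAQ 7.10 gives as.numeric(levels(f))[f] for this. A minimal sketch with a made-up factor:

```r
f <- factor(c("1", "10", "2", "10"))  # levels "1" "10" "2", codes 1 2 3 2
as.numeric(f)                   # 1 2 3 2: the internal level codes
as.numeric(as.character(f))     # 1 10 2 10: the data
as.numeric(levels(f))[f]        # 1 10 2 10: same, converting each level once
```

Tabulating the converted vector, for example table(as.numeric(as.character(HN$ss))), would then order the columns 1 to 13 as in table(H$ss).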

TIA

John

John Logsdon                               "Try to make things as simple
Quantex Research Ltd, Manchester UK         as possible but not simpler"
j.logsdon at quantex-research.com              a.einstein at relativity.org
+44(0)161 445 4951/G:+44(0)7717758675       www.quantex-research.com
