[R] Ordering problem

John Logsdon j.logsdon at quantex-research.com
Fri Nov 25 13:25:11 CET 2005


Thanks to Florence, but her suggestion needs a little modification.  However,
now that I have discovered the str() command, things are looking up. :))

I have a character matrix, so I() just leaves every column as character,
whereas I want each column to take the type of its contents (integer,
numeric and so on).

To take Florence's example slightly extended:

> v1<-c(1,2,3);v2<-c("a","b","c");v3<-c("1","2","3")

Note that the third vector is a character vector with numeric contents.

> data.frame(v1,v2,v3)
  v1 v2 v3
1  1  a  1
2  2  b  2
3  3  c  3

so it looks OK, but

> str(data.frame(v1,v2,v3))
`data.frame':   3 obs. of  3 variables:
 $ v1: num  1 2 3
 $ v2: Factor w/ 3 levels "a","b","c": 1 2 3
 $ v3: Factor w/ 3 levels "1","2","3": 1 2 3

reveals the nasty truth!

whereas

> str(data.frame(v1,v2,I(v3)))
`data.frame':   3 obs. of  3 variables:
 $ v1: num  1 2 3
 $ v2: Factor w/ 3 levels "a","b","c": 1 2 3
 $ v3:Class 'AsIs'  chr [1:3] "1" "2" "3"

just keeps the character v3 as characters.  I want it to be interpreted as
numeric so:

> str(data.frame(v1,v2,as.numeric(v3)))
`data.frame':   3 obs. of  3 variables:
 $ v1            : num  1 2 3
 $ v2            : Factor w/ 3 levels "a","b","c": 1 2 3
 $ as.numeric.v3.: num  1 2 3

actually gives me what I need.  
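
As an aside, the clumsy column name as.numeric.v3. can be avoided by naming
the argument.  A minimal sketch, reusing the three vectors from the example;
stringsAsFactors is set explicitly only as an assumption, so that the result
does not depend on the default of the R version in use:

```r
v1 <- c(1, 2, 3)
v2 <- c("a", "b", "c")
v3 <- c("1", "2", "3")

# Naming the argument keeps the column called v3 after the conversion
df <- data.frame(v1, v2, v3 = as.numeric(v3), stringsAsFactors = TRUE)
str(df)
```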

The only problem is that I have to do everything column by column, and
there are 15 columns in all.  That makes for particularly ugly code just to
reproduce what reading a .csv file with as.is=TRUE does.
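
One way to avoid the column-by-column work might be type.convert(), the
helper that read.table() uses to decide column types.  A minimal sketch,
assuming the character matrix has column names; the matrix m and its
contents are made up for illustration and are not the real data:

```r
# Hypothetical character matrix standing in for the real 15-column one
m <- cbind(v1 = c("1", "2", "3"),
           v2 = c("a", "b", "c"),
           v3 = c("1.5", "2.5", "3.5"))

# Make a data frame of character columns, then let type.convert() choose
# integer, numeric or character for each column; as.is = TRUE stops
# character columns from being turned into factors
df <- as.data.frame(m, stringsAsFactors = FALSE)
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df)
```

Assigning into df[] rather than df keeps the result a data frame with the
original column names.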

The other solutions from Baz and Carlos would also work of course - but
they are still pretty horrible.  Perhaps another way to do this is to
write it out using cat then read it in again using as.is=TRUE!! ;)
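
For what it is worth, the write-and-reread idea can be sketched as follows.
This is only an illustration under assumptions: it uses write.csv() and a
temporary file rather than cat(), and the matrix m is invented:

```r
# Hypothetical character matrix with numeric-looking and text columns
m <- cbind(id  = c("1", "2", "3"),
           grp = c("a", "b", "c"),
           ss  = c("10", "11", "12"))

# Write the matrix to a temporary .csv file and read it straight back,
# letting read.csv() work out the column types as it would for any file
tmp <- tempfile(fileext = ".csv")
write.csv(m, tmp, row.names = FALSE)
df <- read.csv(tmp, as.is = TRUE)
unlink(tmp)
str(df)
```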

Thanks to one and all

Best wishes

John

John Logsdon                               "Try to make things as simple
Quantex Research Ltd, Manchester UK         as possible but not simpler"
j.logsdon at quantex-research.com              a.einstein at relativity.org
+44(0)161 445 4951/G:+44(0)7717758675       www.quantex-research.com


On Fri, 25 Nov 2005, Florence Combes wrote:

> John,
> 
> at ?factor, you can see :
> 
> " Be careful only to compare factors with the
>   same set of levels (in the same order).  In particular,
>   'as.numeric' applied to a factor is meaningless, and may happen by
>   implicit coercion.  To "revert" a factor 'f' to its original
>   numeric values, 'as.numeric(levels(f))[f]' is recommended and
>   slightly more efficient than 'as.numeric(as.character(f))'. "
> 
> 'as.numeric(levels(f))[f]'  worked well for me in the similar situation i.e.
> to get back numeric values from a factor type.
> But see also the I() "option" of the data.frame() function, which allows you
> not to obtain a factor (from a character vector only) if it is not what you
> want.
> 
> from ?data.frame :
> 
> "Objects passed to 'data.frame' should have the same number of
>      rows, but atomic vectors, factors and character vectors protected
>      by 'I' will be recycled a whole number of times if necessary."
> 
> 
> see this example:
> --------------------------------------------------
> > v1<-c(1,2,3)
> > v2<-c("a","b","c")
> > df.A<-data.frame(v1,v2)
> > str(df.A)
> `data.frame':   3 obs. of  2 variables:
>  $ v1: num  1 2 3
>  $ v2: Factor w/ 3 levels "a","b","c": 1 2 3
> > df.B<-data.frame(v1,I(v2))
> > str(df.B)
> `data.frame':   3 obs. of  2 variables:
>  $ v1: num  1 2 3
>  $ v2:Class 'AsIs'  chr [1:3] "a" "b" "c"
> -------------------------------------------------
> 
> hope this helps,
> 
> Florence.
> 
> 
> 
> 
> 
> On 11/25/05, John Logsdon <j.logsdon at quantex-research.com> wrote:
> >
> > I have an ordering and factor problem to which there must be a simple
> > solution!  The version is R 2.0.1  (2004-11-15) on a Linux platform.
> >
> > A data frame H is read in from a .csv file using read.csv with as.is=TRUE.
> >
> > Another data frame HN is constructed from data and I want to compare two
> > columns both named ss of the (sorted) data frames that are the same
> > length.
> >
> > The problem is that HN$ss is always treated as a factor whatever I do
> > while H$ss is treated as an integer, which is what I want.  Somewhere R is
> > making an implicit transformation but I can't see how to correct it.
> >
> > The data are all integers in the range 1:13 - in fact with no gaps.  If I
> > tabulate from H:
> >
> > > table(H$ss)
> >
> >    1    2    3    4    5    6    7    8    9   10   11   12   13
> > 176  176  176  176  176  176  341 8726 8784 8777 8773 8749 8747
> >
> > and for HN:
> >
> > > table(HN$ss)
> >
> >    1   10   11   12   13    2    3    4    5    6    7    8    9
> > 176 8777 8773 8749 8747  176  176  176  176  176  341 8726 8784
> >
> > At some time while constructing HN, I have to make it a character matrix -
> > otherwise gsub doesn't work when removing surplus blanks for example - but
> > I have turned it back into a data frame in the end.
> >
> > If I check the modes, both data frames are lists and both columns are
> > numeric - HN is not reported as a factor.  Yet it appears to be treated as
> > a factor, for example:
> >
> > > table(formatC(H$ss,dig=0,width=2,format="f",flag="0"))
> >
> >   01   02   03   04   05   06   07   08   09   10   11   12   13
> > 176  176  176  176  176  176  341 8726 8784 8777 8773 8749 8747
> >
> > yet:
> >
> > > table(formatC(HN$ss,dig=0,width=2,format="f",flag="0"))
> >
> >    1   10   11   12   13    2    3    4    5    6    7    8    9
> > 176 8777 8773 8749 8747  176  176  176  176  176  341 8726 8784
> > Warning messages:
> > 1: "+" not meaningful for factors in: Ops.factor(x, ifelse(x == 0, 1, 0))
> > 2: "<" not meaningful for factors in: Ops.factor(x, 0)
> >
> > I have tried as.numeric but then I get the internal factor codes rather
> > than the level labels returned:
> >
> > > table(formatC(as.numeric(HN$ss),dig=0,width=2,format="f",flag="0"))
> >
> >   01   02   03   04   05   06   07   08   09   10   11   12   13
> > 176 8777 8773 8749 8747  176  176  176  176  176  341 8726 8784
> >
> > which obviously is a tabulation of the internal levels rather than the
> > data.
> >
> > TIA
> >
> > John
> >
> > John Logsdon                               "Try to make things as simple
> > Quantex Research Ltd, Manchester UK         as possible but not simpler"
> > j.logsdon at quantex-research.com              a.einstein at relativity.org
> > +44(0)161 445 4951/G:+44(0)7717758675       www.quantex-research.com
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >
>
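
The behaviour discussed in the quoted messages (levels sorted as text, and
as.numeric() returning the internal codes) can be reproduced on a small
scale.  A minimal sketch; the vector ss is invented for illustration and is
not the real HN$ss:

```r
# Numbers stored as text, as they are after passing through a character matrix
ss <- c("1", "2", "10", "13", "2")
f  <- factor(ss)

levels(f)                  # sorted as text: "1" "10" "13" "2"
as.numeric(f)              # internal codes, not the data: 1 4 2 3 4
as.numeric(levels(f))[f]   # the original values: 1 2 10 13 2
```

The last line is the idiom recommended in ?factor: convert the levels once,
then index them by the factor codes.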



