[R] Ordering problem
John Logsdon
j.logsdon at quantex-research.com
Fri Nov 25 13:25:11 CET 2005
Thanks to Florence, but it needs a little modification. However, now that
I have discovered the str() command, things are looking up. :))
I have a character matrix so I() just leaves it as characters whereas I
want the various columns to be integers or whatever they contain.
To take Florence's example slightly extended:
> v1<-c(1,2,3);v2<-c("a","b","c");v3<-c("1","2","3")
Note that the third vector is a character with numerical contents.
> data.frame(v1,v2,v3)
v1 v2 v3
1 1 a 1
2 2 b 2
3 3 c 3
so it looks OK, but
> str(data.frame(v1,v2,v3))
`data.frame': 3 obs. of 3 variables:
$ v1: num 1 2 3
$ v2: Factor w/ 3 levels "a","b","c": 1 2 3
$ v3: Factor w/ 3 levels "1","2","3": 1 2 3
reveals the nasty truth!
whereas
> str(data.frame(v1,v2,I(v3)))
`data.frame': 3 obs. of 3 variables:
$ v1: num 1 2 3
$ v2: Factor w/ 3 levels "a","b","c": 1 2 3
$ v3:Class 'AsIs' chr [1:3] "1" "2" "3"
just keeps the character v3 as characters. I want it to be interpreted as
numeric so:
> str(data.frame(v1,v2,as.numeric(v3)))
`data.frame': 3 obs. of 3 variables:
$ v1 : num 1 2 3
$ v2 : Factor w/ 3 levels "a","b","c": 1 2 3
$ as.numeric.v3.: num 1 2 3
actually gives me what I need.
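
Note that as.numeric() only behaves this way on a character vector. If v3
has already been turned into a factor, as in the first str() output above,
as.numeric() returns the internal codes instead. A minimal sketch of the
difference, using made-up values (not the real data) chosen so that the
level order differs from the numeric order:

```r
# Hypothetical values: as strings, "10" sorts before "2" and "3".
f <- factor(c("2", "10", "3"))

levels(f)                               # "10" "2" "3", sorted as strings
codes  <- as.numeric(f)                 # internal integer codes: 2 1 3
values <- as.numeric(levels(f))[f]      # the numbers the labels spell: 2 10 3
same   <- as.numeric(as.character(f))   # equivalent, slightly less efficient

print(codes)
print(values)
```

The as.numeric(levels(f))[f] form is the one quoted from ?factor further
down in this thread.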
The only problem is that I have to do everything column by column, and
there are 15 columns in all. So it makes for particularly ugly code just to
reproduce an as.is read from a .csv file.
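
One possible way to avoid the column-by-column work is to let
type.convert() choose each column's type, which is the same conversion
read.csv applies. This is only a sketch: the matrix m and its column names
are invented here to stand in for the real 15-column character matrix.

```r
# Hypothetical character matrix standing in for the real one.
m <- cbind(v1 = c("1", "2", "3"),
           v2 = c("a", "b", "c"),
           v3 = c("1.5", "2", "3"))

# Keep every column as character first, so nothing becomes a factor.
df <- as.data.frame(m, stringsAsFactors = FALSE)

# Convert each column in place; as.is = TRUE leaves non-numeric
# columns as character instead of turning them into factors.
df[] <- lapply(df, type.convert, as.is = TRUE)

str(df)
```

Assigning into df[] keeps the data frame structure and column names while
replacing the contents of every column.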
The other solutions from Baz and Carlos would also work of course - but
they are still pretty horrible. Perhaps another way to do this is to
write it out using cat then read it in again using as.is=TRUE!! ;)
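
For what it is worth, the write-and-read-back idea can be sketched with a
temporary file. This assumes write.csv() in place of cat(), and the matrix
m below is invented for illustration:

```r
# Hypothetical character matrix standing in for the real data.
m <- cbind(ss = c("1", "10", "2"), id = c("x", "y", "z"))

tmp <- tempfile(fileext = ".csv")     # scratch file, deleted below
write.csv(m, file = tmp, row.names = FALSE)
HN <- read.csv(tmp, as.is = TRUE)     # same style of call as used for H
unlink(tmp)

str(HN)
```

This costs a trip through the file system, but it guarantees that HN gets
exactly the types read.csv would have given it.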
Thanks to one and all
Best wishes
John
John Logsdon "Try to make things as simple
Quantex Research Ltd, Manchester UK as possible but not simpler"
j.logsdon at quantex-research.com a.einstein at relativity.org
+44(0)161 445 4951/G:+44(0)7717758675 www.quantex-research.com
On Fri, 25 Nov 2005, Florence Combes wrote:
> John,
>
> at ?factor, you can see :
>
> " Be careful only to compare factors with the
> same set of levels (in the same order). In particular,
> 'as.numeric' applied to a factor is meaningless, and may happen by
> implicit coercion. To "revert" a factor 'f' to its original
> numeric values, 'as.numeric(levels(f))[f]' is recommended and
> slightly more efficient than 'as.numeric(as.character(f))'. "
>
> 'as.numeric(levels(f))[f]' worked well for me in the similar situation i.e.
> to get back numeric values from a factor type.
> But see also the I() "option" of the data.frame() function, which allows you
> not to obtain a factor (from a character vector only) if it is not what you
> want.
>
> from ?data.frame :
>
> "Objects passed to 'data.frame' should have the same number of
> rows, but atomic vectors, factors and character vectors protected
> by 'I' will be recycled a whole number of times if necessary."
>
>
> see this example:
> --------------------------------------------------
> > v1<-c(1,2,3)
> > v2<-c("a","b","c")
> > df.A<-data.frame(v1,v2)
> > str(df.A)
> `data.frame': 3 obs. of 2 variables:
> $ v1: num 1 2 3
> $ v2: Factor w/ 3 levels "a","b","c": 1 2 3
> > df.B<-data.frame(v1,I(v2))
> > str(df.B)
> `data.frame': 3 obs. of 2 variables:
> $ v1: num 1 2 3
> $ v2:Class 'AsIs' chr [1:3] "a" "b" "c"
> -------------------------------------------------
>
> hope this helps,
>
> Florence.
>
>
>
>
>
> On 11/25/05, John Logsdon <j.logsdon at quantex-research.com> wrote:
> >
> > I have an ordering and factor problem to which there must be a simple
> > solution! The version is R 2.0.1 (2004-11-15) on a Linux platform.
> >
> > A data frame H is read in from a .csv file using read.csv with as.is=TRUE.
> >
> > Another data frame HN is constructed from data and I want to compare two
> > columns both named ss of the (sorted) data frames that are the same
> > length.
> >
> > The problem is that HN$ss is always treated as a factor whatever I do
> > while H$ss is treated as an integer, which is what I want. Somewhere R is
> > making an implicit transformation but I can't see how to correct it.
> >
> > The data are all integers in the range 1:13 - in fact with no gaps. If I
> > tabulate from H:
> >
> > > table(H$ss)
> >
> > 1 2 3 4 5 6 7 8 9 10 11 12 13
> > 176 176 176 176 176 176 341 8726 8784 8777 8773 8749 8747
> >
> > and for HN:
> >
> > > table(HN$ss)
> >
> > 1 10 11 12 13 2 3 4 5 6 7 8 9
> > 176 8777 8773 8749 8747 176 176 176 176 176 341 8726 8784
> >
> > At some time while constructing HN, I have to make it a character matrix -
> > otherwise gsub doesn't work when removing surplus blanks for example - but
> > I have turned it back into a data frame in the end.
> >
> > If I check the modes, both data frames are lists and both columns are
> > numeric - HN is not reported as a factor. Yet it appears to be treated as
> > a factor, for example:
> >
> > > table(formatC(H$ss,dig=0,width=2,format="f",flag="0"))
> >
> > 01 02 03 04 05 06 07 08 09 10 11 12 13
> > 176 176 176 176 176 176 341 8726 8784 8777 8773 8749 8747
> >
> > yet:
> >
> > > table(formatC(HN$ss,dig=0,width=2,format="f",flag="0"))
> >
> >
> > 1 10 11 12 13 2 3 4 5 6 7 8 9
> > 176 8777 8773 8749 8747 176 176 176 176 176 341 8726 8784
> > Warning messages:
> > 1: "+" not meaningful for factors in: Ops.factor(x, ifelse(x == 0, 1, 0))
> > 2: "<" not meaningful for factors in: Ops.factor(x, 0)
> >
> > I have tried as.numeric but then I get the internal factor code rather
> > than the level label returned:
> >
> > > table(formatC(as.numeric(HN$ss),dig=0,width=2,format="f",flag="0"))
> >
> > 01 02 03 04 05 06 07 08 09 10 11 12 13
> > 176 8777 8773 8749 8747 176 176 176 176 176 341 8726 8784
> >
> > which obviously is a tabulation of the internal codes rather than the
> > data.
> >
> > TIA
> >
> > John
> >
> > John Logsdon "Try to make things as simple
> > Quantex Research Ltd, Manchester UK as possible but not simpler"
> > j.logsdon at quantex-research.com a.einstein at relativity.org
> > +44(0)161 445 4951/G:+44(0)7717758675 www.quantex-research.com
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
> >
>