[R] read columns of quoted numbers as factors
james hirschorn
j_hirschorn at yahoo.com
Wed Oct 6 02:41:18 CEST 2010
Yes, your solution of setting quote="" would read the multi-word strings
incorrectly. A more complicated version of your solution should work: First
check which columns are identified as strings, and then apply your solution to
the remaining columns.
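
A rough sketch of that more complicated version, for whitespace-separated files with double-quoted fields only. The function names quoted_columns() and read_mixed() are made up for illustration, and the tokenising regex is an assumption: it does not handle escaped quotes, comments, or rows of unequal length.

```r
# Hypothetical helper: which columns carry quote characters in the raw file?
# Each line is split into tokens that are either a double-quoted run
# (which may contain spaces) or a run of non-blank characters.
quoted_columns <- function(file, header = TRUE) {
  lines <- readLines(file)
  if (header) lines <- lines[-1]
  lines <- lines[nzchar(lines)]
  tokens <- regmatches(lines, gregexpr('"[^"]*"|[^[:space:]]+', lines))
  stopifnot(length(unique(lengths(tokens))) == 1)  # same field count per row
  fields <- do.call(rbind, tokens)
  apply(fields, 2, function(x) any(startsWith(x, '"')))
}

# Hypothetical wrapper: strings stay character, quoted numbers become
# factors, everything else is left to read.table's usual conversion.
read_mixed <- function(file, header = TRUE) {
  # Step 1: an ordinary read; columns that stay character are real strings.
  first <- read.table(file, header = header, stringsAsFactors = FALSE)
  is_string <- vapply(first, is.character, logical(1))
  # Step 2: among the remaining columns, mark the quoted ones as factors.
  classes <- rep(NA_character_, ncol(first))
  classes[quoted_columns(file, header) & !is_string] <- "factor"
  classes[is_string] <- "character"
  read.table(file, header = header, colClasses = classes)
}

# Small example file with a multi-word string column.
tmp <- tempfile()
writeLines(c('"city" "grp" "n"',
             '"New York" "1" 5',
             '"Los Angeles" "2" 7'), tmp)
d <- read_mixed(tmp)
sapply(d, class)
```

Here the quotes are detected from the raw lines rather than with quote="", so the multi-word strings are not split into extra fields.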
I'm a newbie at R, but it seems to me that there is a "logical inconsistency" in
R: write.table puts quotes around numbers when they form a column of factors,
but does not put quotes for a column of integers. Since read.table is the "dual"
of write.table it seems that it should treat quoted and unquoted columns
differently, analogously to write.table. However, there does not even seem to be
an option to make read.table behave analogously.
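
To make the asymmetry concrete, here is a small round trip. It is only an illustration with made-up data, and it uses the text= argument of read.table, which assumes a reasonably recent version of R.

```r
# A factor column whose levels look like numbers, next to an integer column.
d <- data.frame(f = factor(c(1, 2, 3)), i = 1:3)

# write.table quotes the factor column but not the integer column.
out <- capture.output(write.table(d, row.names = FALSE))
out

# Reading the same text back, the quotes are dropped before type
# conversion, so both columns come back as integer.
back <- read.table(text = out, header = TRUE)
sapply(back, class)
```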
----- Original Message ----
From: peter dalgaard <pdalgd at gmail.com>
To: james hirschorn <j_hirschorn at yahoo.com>
Cc: r-help at r-project.org
Sent: Tue, October 5, 2010 7:25:52 AM
Subject: Re: [R] read columns of quoted numbers as factors
On Oct 4, 2010, at 18:39 , james hirschorn wrote:
> Suppose I have a data file (possibly with a huge number of columns), where the
> columns with factors are coded as "1", "2", "3", etc. The default behavior of
> read.table is to convert these columns to integer vectors.
>
> Is there a way to get read.table to recognize that columns of quoted numbers
> represent factors (while unquoted numbers are interpreted as integers), without
> explicitly setting them with colClasses?
I don't think there's a simple way, because the modus operandi of read.table is
to read everything as character and then see whether it can be converted to
numeric, and at that point any quotes will have been lost.
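
A minimal illustration of that behaviour, with made-up inline data (the text= argument assumes a reasonably recent version of R):

```r
# Column a holds quoted numbers, column b holds unquoted numbers.
txt <- c('a b',
         '"1" 10',
         '"2" 20',
         '"3" 30')
d <- read.table(text = txt, header = TRUE)

# The quotes are stripped while the fields are read as character,
# so the later type conversion cannot tell the two columns apart.
sapply(d, class)
```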
One possibility, somewhat dependent on the exact file format, would be to
temporarily set quote="", see which columns contain quote characters, and, on a
second pass, read those columns as factors, using a computed colClasses
argument. It will break down if you have space-separated columns with quoted
multi-word strings, though.
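
A sketch of the two-pass idea, under the assumption that no quoted field contains the separator. The function name read_quoted_as_factor() is made up for illustration and is not part of R.

```r
# Hypothetical helper implementing the two passes described above.
read_quoted_as_factor <- function(file, header = TRUE) {
  # Pass 1: switch off quote handling so the quote characters survive
  # as part of the data, and keep every column as character.
  raw <- read.table(file, header = header, quote = "",
                    colClasses = "character")
  has_quote <- vapply(raw, function(x) any(grepl('"', x, fixed = TRUE)),
                      logical(1))

  # Pass 2: read normally, forcing the quoted columns to factor and
  # leaving the others (NA) to the usual automatic conversion.
  classes <- rep(NA_character_, ncol(raw))
  classes[has_quote] <- "factor"
  read.table(file, header = header, colClasses = classes)
}

# Small example file: f is quoted, i is not.
tmp <- tempfile()
writeLines(c('"f" "i"', '"1" 10', '"2" 20', '"3" 30'), tmp)
d <- read_quoted_as_factor(tmp)
sapply(d, class)
```

With a quoted multi-word string such as "New York" in a space-separated file, pass 1 would see two fields instead of one, which is the breakdown mentioned above.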
--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com