[R] read columns of quoted numbers as factors
Mike Marchywka
marchywka at hotmail.com
Tue Oct 5 13:46:49 CEST 2010
----------------------------------------
> From: pdalgd at gmail.com
> Date: Tue, 5 Oct 2010 13:25:52 +0200
> To: j_hirschorn at yahoo.com
> CC: r-help at r-project.org
> Subject: Re: [R] read columns of quoted numbers as factors
>
>
> On Oct 4, 2010, at 18:39 , james hirschorn wrote:
>
> > Suppose I have a data file (possibly with a huge number of columns), where the
> > columns with factors are coded as "1", "2", "3", etc ... The default behavior of
> > read.table is to convert these columns to integer vectors.
> >
> > Is there a way to get read.table to recognize that columns of quoted numbers
> > represent factors (while unquoted numbers are interpreted as integers), without
> > explicitly setting them with colClasses ?
>
> I don't think there's a simple way, because the modus operandi of read.table is to read everything as character and then see whether it can be converted to numeric, and at that point any quotes will have been lost.
>
> One possibility, somewhat dependent on the exact file format, would be to temporarily set quote="", see which columns contain quote characters, and, on a second pass, read those columns as factors, using a computed colClasses argument. It will break down if you have space-separated columns with quoted multi-word strings, though.
>
>
While this specific example may or may not lend itself to a solution within R,
I would just mention that it is not a faux pas to modify your data file
with something like sed or awk prior to feeding it to some program like R.
Quotes, spaces, commas, etc. may be something that the target app can handle,
or it may just be easier to change the format with a familiar tool designed
for that.
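
For what it is worth, the two-pass idea quoted above could be sketched in R
roughly as follows. This is only a minimal sketch, not a tested recipe: the
sample data, the column names (grp, n, x) and the variable names are made up
for illustration, and it assumes a whitespace-separated file with a header
line and no quoted multi-word strings.

```r
# Hypothetical sample file: "grp" holds quoted numbers, "n" and "x" unquoted ones.
path <- tempfile(fileext = ".txt")
writeLines(c('grp n x',
             '"1" 10 1.5',
             '"2" 20 2.5',
             '"1" 30 3.5'), path)

# Pass 1: switch off quote handling so the quote characters survive as data,
# and read every column as character.
raw <- read.table(path, header = TRUE, quote = "", colClasses = "character")

# Treat a column as "quoted" if every non-missing entry is wrapped in double quotes.
is_quoted <- vapply(raw,
                    function(col) all(grepl('^".*"$', col[!is.na(col)])),
                    logical(1))

# Pass 2: quoted columns are read as factors; NA lets read.table guess the rest.
classes <- unname(ifelse(is_quoted, "factor", NA_character_))
dat <- read.table(path, header = TRUE, colClasses = classes)
str(dat)
```

In this sketch only the column whose entries are all quoted ends up as a
factor, while the others go through the usual type conversion. As noted in the
quoted reply, the first pass splits fields incorrectly if quoted strings
contain the separator.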