[R] read columns of quoted numbers as factors

David Winsemius dwinsemius at comcast.net
Wed Oct 6 04:22:19 CEST 2010


On Oct 5, 2010, at 8:41 PM, james hirschorn wrote:

> Yes, your solution of setting quote="" would read the multi-word
> strings incorrectly. A more complicated version of your solution
> should work: first check which columns are identified as strings, and
> then apply your solution to the remaining columns.
>
> I'm a newbie at R, but it seems to me that there is a "logical
> inconsistency" in R: write.table puts quotes around numbers when they
> form a column of factors, but does not put quotes for a column of
> integers.

Factors are internally represented as positive integers, but have a  
separate "layer" of their levels and labels. What I suspect you are  
seeing and calling "numbers" are the character-valued labels.

> write.table(data.frame(nums = -1:-5, facs = factor(-1:-5)),
+             file = "", row.names = FALSE)
"nums" "facs"
-1 "-1"
-2 "-2"
-3 "-3"
-4 "-4"
-5 "-5"

That does not seem at all "logically inconsistent" to me.
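
A minimal sketch of the same distinction, using the factor from the example above (the variable names here are only illustrative): the integer codes are what R stores, while the character levels are what write.table quotes.

```r
# A factor stores positive integer codes plus a character vector of levels.
f <- factor(-1:-5)

codes  <- as.integer(f)   # internal codes: positions in levels(f)
labels <- levels(f)       # character labels, sorted from the numeric values

# To get the original numbers back, go through the labels, not the codes.
values <- as.numeric(as.character(f))

print(codes)
print(labels)
print(values)
```

Calling as.numeric(f) directly would return the codes rather than the labelled values, which is a common source of confusion with factors that look numeric.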

-- 
David.

> Since read.table is the "dual" of write.table, it seems that it should
> treat quoted and unquoted columns differently, analogously to
> write.table. However, there does not even seem to be an option to make
> read.table behave analogously.
>
>
> ----- Original Message ----
> From: peter dalgaard <pdalgd at gmail.com>
> To: james hirschorn <j_hirschorn at yahoo.com>
> Cc: r-help at r-project.org
> Sent: Tue, October 5, 2010 7:25:52 AM
> Subject: Re: [R] read columns of quoted numbers as factors
>
>
> On Oct 4, 2010, at 18:39 , james hirschorn wrote:
>
>> Suppose I have a data file (possibly with a huge number of columns),
>> where the columns with factors are coded as "1", "2", "3", etc. The
>> default behavior of read.table is to convert these columns to integer
>> vectors.
>>
>> Is there a way to get read.table to recognize that columns of quoted
>> numbers represent factors (while unquoted numbers are interpreted as
>> integers), without explicitly setting them with colClasses?
>
> I don't think there's a simple way, because the modus operandi of
> read.table is to read everything as character and then see whether it
> can be converted to numeric, and at that point any quotes will have
> been lost.
>
> One possibility, somewhat dependent on the exact file format, would be
> to temporarily set quote="", see which columns contain quote
> characters, and, on a second pass, read those columns as factors, using
> a computed colClasses argument. It will break down if you have
> space-separated columns with quoted multi-word strings, though.
>
>
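
The two-pass idea quoted just above might be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested general solution: read_quoted_as_factor is a hypothetical helper name, the file is assumed to be a path (not a connection) with a header line and no row names, and quoted fields are assumed to contain no whitespace and no missing values.

```r
# Hypothetical helper: columns whose fields are all double-quoted in the
# file are read as factors; everything else is left to the usual guessing.
read_quoted_as_factor <- function(file, sep = "") {
  # Pass 1: switch off quote processing so the quote characters survive
  # as part of the data, and keep every column as character.
  raw <- read.table(file, header = TRUE, sep = sep, quote = "",
                    colClasses = "character")

  # A column counts as "quoted" if every field starts and ends with a
  # double quote.
  quoted <- vapply(raw, function(col) all(grepl('^".*"$', col)), logical(1))

  # Pass 2: normal quoting rules, with a computed colClasses. NA means
  # "let read.table decide"; the vector is unnamed so it applies by position.
  classes <- unname(ifelse(quoted, "factor", NA_character_))
  read.table(file, header = TRUE, sep = sep, colClasses = classes)
}

# Round trip with the data frame from the write.table example.
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(nums = -1:-5, facs = factor(-1:-5)),
            file = tmp, row.names = FALSE)
d <- read_quoted_as_factor(tmp)
print(sapply(d, class))
unlink(tmp)
```

Note that genuine string columns written by write.table are quoted too, so they would also come back as factors under this sketch; telling them apart from quoted numbers would need an extra check on the field contents.
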
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> -- 
> Peter Dalgaard
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
>

David Winsemius, MD
West Hartford, CT


