[R] RODBC and Excel: Wrong Data Type Assumed on Import

Earl F. Glynn efg at stowers-institute.org
Thu Nov 3 22:04:41 CET 2005


"Kevin Wright" <kwright68 at gmail.com> wrote in message
news:adf71a630511030726v10f1dc77sa7c760b96a27480f at mail.gmail.com...
> >From my experience (somewhat of a guess):

> Excel uses the first 16 rows of data to determine if a column is numeric
or
> character. The data type which is most common in the first 16 rows will
then
> be used for the whole column.

I ran some experiments trying to force RODBC to read column 1 of my
worksheet as character data (the data are mostly numbers with two
exceptions, 275a and 275b, as mentioned earlier).



Here's the base code:



> library(RODBC)

> channel <- odbcConnectExcel("U:/efg/lab/R/Krumlauf-Plasmid/construct
list.xls")

> plasmid <- sqlFetch(channel,"Sheet1", as.is=TRUE)

> odbcClose(channel)

> names(plasmid)

[1] "Plasmid Number" "Plasmid"        "Concentration"  "Comments"
"Lost"



When Excel Sheet1 has rows 2:13 as an "X" to attempt to force treatment of
column 1 as character data:



> class(plasmid$"Plasmid Number")

[1] "numeric"

> typeof(plasmid$"Plasmid Number")

[1] "double"

> plasmid$"Plasmid Number"[1:20]

 [1] NA NA NA NA NA NA NA NA NA NA NA NA  2  3  4  5  6  7  8  9



Why would any software with 12 consecutive "X" character strings "assume"
the data are purely numeric?



Add one more "X" so rows 2:14 have an "X" to attempt to force treatment of
column 1 as character data:



> class(plasmid$"Plasmid Number")

[1] "character"

> typeof(plasmid$"Plasmid Number")

[1] "character"

> plasmid$"Plasmid Number"[1:20]

 [1] "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" NA  NA  NA  NA  NA
NA  NA



So RODBC now recognizes "character" Xs in column 1 and then declares all
numbers as invalid?  These are incredibly (bad) assumptions.



I say this is a "bug", but it may be an ODBC problem and not one with "R.
And if this is not an official "bug", then it's a serious design problem.
Minimally, this issue should be described in the R Data Import/Export
document, which everyone is told to read before asking a question.



It's frustrating when packages like this work for "toy" problems, and the
documentation never mentions the pitfalls of real data.


> The gregmisc bundle has a different read.xls function that uses a Perl
> script (xls2csv) and seems to be safer with mixed-type columns.
> Requires a working version of Perl.

Thanks for this suggestion, but I think I'll just convert the Excel
spreadsheet to a .csv and maintain it in that format.

efg




More information about the R-help mailing list