[R] RODBC and Excel: Wrong Data Type Assumed on Import
Earl F. Glynn
efg at stowers-institute.org
Thu Nov 3 22:04:41 CET 2005
"Kevin Wright" <kwright68 at gmail.com> wrote in message
news:adf71a630511030726v10f1dc77sa7c760b96a27480f at mail.gmail.com...
> >From my experience (somewhat of a guess):
> Excel uses the first 16 rows of data to determine if a column is numeric
or
> character. The data type which is most common in the first 16 rows will
then
> be used for the whole column.
I ran some experiments trying to force RODBC to read column 1 of my
worksheet as character data (the data are mostly numbers with two
exceptions, 275a and 275b, as mentioned earlier).
Here's the base code:
> library(RODBC)
> channel <- odbcConnectExcel("U:/efg/lab/R/Krumlauf-Plasmid/construct
list.xls")
> plasmid <- sqlFetch(channel,"Sheet1", as.is=TRUE)
> odbcClose(channel)
> names(plasmid)
[1] "Plasmid Number" "Plasmid" "Concentration" "Comments"
"Lost"
When Excel Sheet1 has rows 2:13 as an "X" to attempt to force treatment of
column 1 as character data:
> class(plasmid$"Plasmid Number")
[1] "numeric"
> typeof(plasmid$"Plasmid Number")
[1] "double"
> plasmid$"Plasmid Number"[1:20]
[1] NA NA NA NA NA NA NA NA NA NA NA NA 2 3 4 5 6 7 8 9
Why would any software with 12 consecutive "X" character strings "assume"
the data are purely numeric?
Add one more "X" so rows 2:14 have an "X" to attempt to force treatment of
column 1 as character data:
> class(plasmid$"Plasmid Number")
[1] "character"
> typeof(plasmid$"Plasmid Number")
[1] "character"
> plasmid$"Plasmid Number"[1:20]
[1] "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" "X" NA NA NA NA NA
NA NA
So RODBC now recognizes "character" Xs in column 1 and then declares all
numbers as invalid? These are incredibly (bad) assumptions.
I say this is a "bug", but it may be an ODBC problem and not one with "R.
And if this is not an official "bug", then it's a serious design problem.
Minimally, this issue should be described in the R Data Import/Export
document, which everyone is told to read before asking a question.
It's frustrating when packages like this work for "toy" problems, and the
documentation never mentions the pitfalls of real data.
> The gregmisc bundle has a different read.xls function that uses a Perl
> script (xls2csv) and seems to be safer with mixed-type columns.
> Requires a working version of Perl.
Thanks for this suggestion, but I think I'll just convert the Excel
spreadsheet to a .csv and maintain it in that format.
efg
More information about the R-help
mailing list