[R] RODBC and Excel: Wrong Data Type Assumed on Import

Gabor Grothendieck ggrothendieck at gmail.com
Fri Nov 4 02:53:04 CET 2005


You could try using the COM interface rather than the ODBC
interface.  Try code such as this:

library(RDCOMClient)
xls <- COMCreate("Excel.Application")
xls[["Workbooks"]]$Open("MySpreadsheet.xls")
sheet <- xls[["ActiveSheet"]]
mydata <- sheet[["UsedRange"]][["value"]]
xls$Quit()

# convert mydata to a character matrix
mydata.char <- matrix(unlist(mydata), nc = length(xx))



On 11/3/05, Kevin Wright <kwright68 at gmail.com> wrote:
> >From my experience (somewhat of a guess):
>
> 1.
>
> Excel uses the first 16 rows of data to determine if a column is numeric or
> character. The data type which is most common in the first 16 rows will then
> be used for the whole column. If you sort the data so that at least the
> first 9 rows have character data, you may find this allows the data to be
> interpreted as character. There is supposedly a registy setting that can
> control how many lines to use (instead of 16), but I have not had success
> with the setting. I suspect that ODBC uses JET4, which may be the real
> source of the problem. See more here:
> http://www.dicks-blog.com/archives/2004/06/03/external-data-mixed-data-types/
>
> 2.
>
> The gregmisc bundle has a different read.xls function that uses a Perl
> script (xls2csv) and seems to be safer with mixed-type columns.
> Requires a working version of Perl.
>
> Best,
>
> Kevin Wright
>
>
>
> The first column in my Excel sheet has mostly numbers but I need to treat it
> as character data:
>
> > library(RODBC)
> <http://tolstoy.newcastle.edu.au/R/help/05/09/11324.html#14938qlink1>
> *> channel <- odbcConnectExcel("U:/efg/lab/R/Plasmid/construct list.xls") *
> *> plasmid <- sqlFetch(channel,"Sheet1", as.is=TRUE) *
> *> odbcClose(channel) *
>
> > names(plasmid)
>
> [1] "Plasmid Number" "Plasmid" "Concentration" "Comments" "Lost"
>
> # How is the type decided? I need a character type.
> > class(plasmid$"Plasmid Number")
>
> [1] "numeric"
> > typeof(plasmid$"Plasmid Number")
>
> [1] "double"
>
> > plasmid$"Plasmid Number"[273:276]
>
> [1] 274 NA NA 276
>
> The two NAs are supposed to be 275a and 275b. I tried the "as.is=TRUE" but
> that didn't help.
>
> I consulted Section 4, Relational databases, in the R Data Import/Export
> document (for Version 2.2.0).
>
> Section 4.2.2, Data types, was not helpful. In particular, this did not seem
> helpful: "The more comprehensive of the R interface packages hide the type
> conversion issues from the user."
>
> Section 4.3.2, Package RODBC, provided a "simple example of using ODBC ..
> with a(sic) Excel spreadsheet" but is silent on how to control the data type
> on import. Could the documentation be expanded to address this issue?
>
> I really need to show "Plasmid 275a" and "Plasmid 275b" instead of "Plasmid
> NA".
>
> Thanks for any help with this.
>
> efg
>
> --
> Earl F. Glynn
> Scientific Programmer
> Bioinformatics Department
>
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>




More information about the R-help mailing list