[R] RODBC and Excel: Wrong Data Type Assumed on Import
Gabor Grothendieck
ggrothendieck at gmail.com
Sat Nov 5 08:04:00 CET 2005
On 11/4/05, Earl F. Glynn <efg at stowers-institute.org> wrote:
> "Gabor Grothendieck" <ggrothendieck at gmail.com> wrote in message
> news:971536df0511031753y6eb4e271o797e862bc2740e1a at mail.gmail.com...
> > You could try using the COM interface rather than the ODBC
> > interface. Try code such as this:
> >
> > library(RDCOMClient)
> > xls <- COMCreate("Excel.Application")
> > xls[["Workbooks"]]$Open("MySpreadsheet.xls")
> > sheet <- xls[["ActiveSheet"]]
> > mydata <- sheet[["UsedRange"]][["value"]]
> > xls$Quit()
> >
> > # convert mydata to a character matrix
> > mydata.char <- matrix(unlist(mydata), nc = length(xx))
>
> Gabor,
>
> Thank you for that suggestion. I try to avoid COM, but it seems to work
> well with this problem.
>
> Because I have empty cells, which are treated as NULLS, the unlist didn't
> quite work.
>
> Here's what I did:
>
> library(RDCOMClient)
> xls <- COMCreate("Excel.Application")
> xls[["Workbooks"]]$Open("U:/efg/lab/R/Plasmid/construct list.xls")
> sheet <- xls[["ActiveSheet"]]
> mydata <- sheet[["UsedRange"]][["value"]]
> xls$Quit()
>
> for (column in 1:length(mydata))
> {
> cat(column, " ", length(mydata[[column]]), " ",
> length(unlist(mydata[[column]])), "\n")
> }
>
> The results show that while mydata is a list of columns, if you unlist each
> column you'll be short by the number of NULL values.
>
> 1 1251 1251
> 2 1251 1198
> 3 1251 870
> 4 1251 327
> 5 1251 1250
>
> This seemed a bit crude to fix that problem (can someone suggest a more
> elegant way?):
>
> mymatrix <- NULL
> for (column in 1:length(mydata))
> {
> # Use lappy to replace NULLs with "" strings, column-by-column
> mymatrix <- cbind(mymatrix, lapply(mydata[[column]], function(cell) {
> ifelse(is.null(cell), "", cell) } ))
> }
> # Fix column names
> colnames(mymatrix) <- mymatrix[1,]
> mymatrix <- mymatrix[-1,]
>
> > mymatrix[273:276,]
> Plasmid Number Plasmid
> Concentration Comments Lost
> [1,] 274 "yxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxy" "1 ug/ul"
> "4 mg" ""
> [2,] "275a" "xyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyx" "1 ug/2
> ul" "" ""
> [3,] "275b" "xyxyxyxyxyxyxyxyxyxyxyxyx" "1 ug/5
> ul" "" ""
> [4,] 276 "xyxyxyxyxyxyxyxyxyxyxyxyxyxy" "1 ug/5
> ul" "" "Assumed Lost"
>
> Thank you for preserving "275a" and "275b" as the names here.
>
> So, I'd recommend RDCOMClient over RODBC with Excel files. "Being lucky"
> shouldn't be part of processing Excel files.
>
You could try something like this which turns the data into
a textConnection which is read using read.table:
con <- textConnection(do.call("paste", mydata))
dd <- read.table(con, header = TRUE, na.strings = "NULL", as.is = TRUE)
You might need to vary the arguments to read.table depending on what
it is you want to get out. Also, I have assumed that none of the
strings contain
spaces though using a sep= arg on paste and read.table could handle that
too.
More information about the R-help
mailing list