[R] RODBC and Excel: Wrong Data Type Assumed on Import
Earl F. Glynn
efg at stowers-institute.org
Fri Nov 4 20:44:12 CET 2005
"Gabor Grothendieck" <ggrothendieck at gmail.com> wrote in message
news:971536df0511031753y6eb4e271o797e862bc2740e1a at mail.gmail.com...
> You could try using the COM interface rather than the ODBC
> interface. Try code such as this:
>
> library(RDCOMClient)
> xls <- COMCreate("Excel.Application")
> xls[["Workbooks"]]$Open("MySpreadsheet.xls")
> sheet <- xls[["ActiveSheet"]]
> mydata <- sheet[["UsedRange"]][["value"]]
> xls$Quit()
>
> # convert mydata to a character matrix
> mydata.char <- matrix(unlist(mydata), nc = length(xx))
Gabor,
Thank you for that suggestion. I try to avoid COM, but it seems to work
well with this problem.
Because I have empty cells, which are treated as NULLS, the unlist didn't
quite work.
Here's what I did:
library(RDCOMClient)
xls <- COMCreate("Excel.Application")
xls[["Workbooks"]]$Open("U:/efg/lab/R/Plasmid/construct list.xls")
sheet <- xls[["ActiveSheet"]]
mydata <- sheet[["UsedRange"]][["value"]]
xls$Quit()
for (column in 1:length(mydata))
{
cat(column, " ", length(mydata[[column]]), " ",
length(unlist(mydata[[column]])), "\n")
}
The results show that while mydata is a list of columns, if you unlist each
column you'll be short by the number of NULL values.
1 1251 1251
2 1251 1198
3 1251 870
4 1251 327
5 1251 1250
This seemed a bit crude to fix that problem (can someone suggest a more
elegant way?):
mymatrix <- NULL
for (column in 1:length(mydata))
{
# Use lappy to replace NULLs with "" strings, column-by-column
mymatrix <- cbind(mymatrix, lapply(mydata[[column]], function(cell) {
ifelse(is.null(cell), "", cell) } ))
}
# Fix column names
colnames(mymatrix) <- mymatrix[1,]
mymatrix <- mymatrix[-1,]
> mymatrix[273:276,]
Plasmid Number Plasmid
Concentration Comments Lost
[1,] 274 "yxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxy" "1 ug/ul"
"4 mg" ""
[2,] "275a" "xyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyxyx" "1 ug/2
ul" "" ""
[3,] "275b" "xyxyxyxyxyxyxyxyxyxyxyxyx" "1 ug/5
ul" "" ""
[4,] 276 "xyxyxyxyxyxyxyxyxyxyxyxyxyxy" "1 ug/5
ul" "" "Assumed Lost"
Thank you for preserving "275a" and "275b" as the names here.
So, I'd recommend RDCOMClient over RODBC with Excel files. "Being lucky"
shouldn't be part of processing Excel files.
efg
More information about the R-help
mailing list