[R] RODBC, missing values, and Excel

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jul 13 05:04:55 CEST 2006


In thinking about this some more I have a better idea.  Use rcom (or
RDCOMClient)
to get a list of the sheet names and then use that to determine which sheet you
need.  Then use read.xls to get it like this assuming that the Excel
file and path are C:\test.xls and that one of the sheets in that spreadsheet
is xyz.  In my version the na.strings had a space at the end so you may
need to change the na.strings= setting:

library(rcom)
xls <- "C:\\test.xls"
oxl <- comCreateObject('Excel.Application')
comSetProperty(oxl, "Visible", TRUE)  # this line optional
owb <- comGetProperty(oxl, "Workbooks")
ob <- comInvoke(owb, "Open", xls)
osheets <- comGetProperty(ob, "Worksheets")
n <- comGetProperty(osheets, "Count")
ithSheetName <- function(i)
	comGetProperty(comGetProperty(osheets, "Item", i), "Name")
sheetNames <- sapply(1:n, ithSheetName)
comInvoke(oxl, "Quit")

library(gdata)
read.xls(xls, match("xyz", sheetNames), na.strings = "na ")


On 7/12/06, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Would it be good enough to just read all the sheets in?
>
> The perl program can do that and although the read.xls R function does not
> interface to that aspect of its functionality its not that difficult to access
> it yourself.  Assume your excel file is in \test.xls .  Just
> switch to that folder.  paste together a command to run the perl
> program, run it, get a list of the file names it produced and read them in:
>
> library(gdata)
> setwd("/")
> cmd <- paste("perl", system.file("perl/xls2csv.pl", package = "gdata"), "test")
> system(cmd)
> ff <- list.files(patt = "test_Sheet.*.csv")
> sapply(ff, read.csv, na.strings = "na ", simplify = FALSE)
>
>
> On 7/12/06, Sundar Dorai-Raj <sundar.dorai-raj at pdf.com> wrote:
> > Hi, Gabor,
> >
> > Thanks for the reply. Perhaps Prof. Ripley will enlighten us as he is
> > the RODBC maintainer.
> >
> > Unfortunately, gdata::read.xls will not work for me (at least I don't
> > think it will) because I need to refer to each worksheet by name and not
> > by number. For example, I need extract data from "Sheet1" and not simply
> > the first sheet.
> >
> > Thanks,
> >
> > --sundar
> >
> > Gabor Grothendieck wrote:
> > > I also got a strange result too (I renamed it sdr.read.xls
> > > to distinguish it from read.xls in gdata and noticed that a
> > > space got into my na's somehow so I used "na " for my
> > > na.strings:
> > >
> > >
> > >>sdr.read.xls("/test.xls", "Sheet2", na.strings = "na ")
> > >
> > >      x
> > > 1 <NA>
> > > 2 <NA>
> > > 3   na
> > > 4   na
> > > 5   na
> > > 6   na
> > > 7 <NA>
> > >
> > > I had more success using read.xls in the gdata package.
> > > Note that we need to install perl first if not already present:
> > >
> > >
> > >>library(gdata)  # for read.xls
> > >>read.xls("/test.xls", 2, na.strings = "na ")
> > >
> > >      x
> > > 1 0.11
> > > 2 0.11
> > > 3   NA
> > > 4   NA
> > > 5   NA
> > > 6   NA
> > > 7 0.11
> > >
> > >
> > >>R.version.string # XP
> > >
> > > [1] "Version 2.3.1 Patched (2006-06-04 r38279)"
> > >
> > >>packageDescription("gdata")$Version
> > >
> > > [1] "2.1.2"
> > >
> > >>packageDescription("RODBC")$Version
> > >
> > > [1] "1.1-7"
> > >
> > >
> > > On 7/12/06, Sundar Dorai-Raj <sundar.dorai-raj at pdf.com> wrote:
> > >
> > >>Hi, all,
> > >>
> > >>I'm trying to use RODBC to read data from Excel. However, I'm having
> > >>trouble converting missing values to NA and rather perplexed by the
> > >>output. Below illustrates my problem:
> > >>
> > >>## DATA - copy to Excel and save as "tmp.xls"
> > >>## tmp.xls!Sheet1
> > >>x
> > >>0.11
> > >>0.11
> > >>na
> > >>na
> > >>na
> > >>0.11
> > >>
> > >>## tmp.xls!Sheet2
> > >>x
> > >>0.11
> > >>0.11
> > >>na
> > >>na
> > >>na
> > >>na
> > >>0.11
> > >>
> > >>## R Code
> > >>read.xls <- function(file, sheet = "Sheet1", ...) {
> > >>  require(RODBC)
> > >>  channel <- odbcConnectExcel(file)
> > >>  sheet <- sprintf("select * from `%s$`", sheet)
> > >>  x <- sqlQuery(channel, sheet, ...)
> > >>  odbcClose(channel)
> > >>  x
> > >>}
> > >>
> > >>read.xls("./tmp.xls", "Sheet1", na.strings = "na")
> > >>## works as expected
> > >>#     x
> > >>#1 0.11
> > >>#2 0.11
> > >>#3   NA
> > >>#4   NA
> > >>#5   NA
> > >>#6 0.11
> > >>
> > >>read.xls("./tmp.xls", "Sheet2", na.strings = "na")
> > >>## Huh? What happened?
> > >>#   x
> > >>#1 NA
> > >>#2 NA
> > >>#3 NA
> > >>#4 NA
> > >>#5 NA
> > >>#6 NA
> > >>#7 NA
> > >>
> > >> > sessionInfo()
> > >>Version 2.3.1 (2006-06-01)
> > >>i386-pc-mingw32
> > >>
> > >>attached base packages:
> > >>[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
> > >>[7] "base"
> > >>
> > >>other attached packages:
> > >>  RODBC
> > >>"1.1-7"
> > >>
> > >>______________________________________________
> > >>R-help at stat.math.ethz.ch mailing list
> > >>https://stat.ethz.ch/mailman/listinfo/r-help
> > >>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> > >>
> > >
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> >
>



More information about the R-help mailing list