[R] sqldf file specification, non-ASCII
Duncan Murdoch
murdoch at stats.uwo.ca
Thu Apr 3 16:58:12 CEST 2008
On 4/3/2008 10:22 AM, Peter Jepsen wrote:
> Dear R-Listers,
>
> I am a Windows user (R 2.6.2) using the development version of sqldf to
> try to read a 3GB file originally stored in .sas7bdat-format. I convert
> it to comma-delimited ASCII format with StatTransfer before trying to
> import just the rows I need into R. The problem is that I get this
> error:
>
>> f <- file("hugedata.csv")
>> DF <- sqldf("select * from f where C_OPR like 'KKA2%'",
> file.format=list(header=T, row.names=F))
> Error in try({ :
> RS-DBI driver: (RS_sqlite_import: hugedata.csv line 1562740 expected
> 52 columns of data but found 19)
> Error in sqliteExecStatement(con, statement, bind.data) :
> RS-DBI driver: (error in statement: no such table: f)
That error message looks pretty clear: there's a problem on line
1562740. Can you look at that line and spot what the problem is? If
you don't have a text editor that can handle big files, you should be
able to do it with something like this:
f <- file("hugedata.csv", "r")
skip <- 1562739                    # number of lines before the problem line
while (skip > 10000) {             # skip ahead in 10000-line chunks
  junk <- readLines(f, 10000)
  skip <- skip - 10000
}
junk <- readLines(f, skip)         # skip the remainder
readLines(f, 1)                    # this prints line 1562740
close(f)
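Another way to hunt for malformed lines, if you don't mind a full pass
over the 3GB file, is to count the fields on each line and compare them
with the header.  This is only a sketch, and it assumes the file really
is plain comma-separated text:

nf <- count.fields("hugedata.csv", sep = ",")   # fields per line
nf[1]                                           # header: should be 52
which(nf != nf[1])                              # lines with a different count

count.fields() is in base R (the utils package), so nothing extra is
needed, but expect it to take a while on a file that size.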
>
> Now, I know that my SAS-using colleagues are able to use this file with
> SAS, so I was wondering whether StatTransfer'ing it to the SAS XPORT
> format which can be read with the 'read.xport' function in the 'foreign'
> package would be a better approach.
R can usually read CSV files without a problem.  You've likely got a
problem in your file on that line; you just need to figure out what it
is and fix it.  (It's possible the sqldf function has a bug, but I'd
suspect the file first.)
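Once you know what's wrong with that line, one crude way to repair it
(a sketch only; it pulls the whole file into memory, so you need plenty
of RAM for 3GB of text) is to patch or drop the bad line and write the
file back out:

lines <- readLines("hugedata.csv")
lines[1562740]                       # inspect the offending line
lines <- lines[-1562740]             # drop it (or assign a corrected line instead)
writeLines(lines, "hugedata_fixed.csv")

Then point sqldf at the repaired copy; "hugedata_fixed.csv" is just a
made-up name for it.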
Duncan Murdoch