[R-sig-DB] [R] SQLite: When reading a table, a "\r" is padded onto the last column. Why?
Seth Falcon
@|@|con @end|ng |rom |hcrc@org
Fri Jan 5 16:34:12 CET 2007
ronggui <ronggui.huang using gmail.com> writes:
> I think there is still one more thins need to do. RSQLite does not
> take care of the "NA" (my case: na.strings is Blank fields in the
> test.txt file ) when import from a file to db table.
You are right that an na.strings argument is missing. You will find
that if you use '\N' in your text files, it will be recognized as NA.
This file import feature is implemented by reading the file in C and
borrows heavily from the SQLite command line tool's .import command.
With this implementation, changes such as adding a flexible na.strings
argument will not be trivial to implement.
Now that dbWriteTable (using data.frame) is more efficient, it can be
used in a straight forward way to load very large text files. I
prefer this approach. And a possibly easier patch is to refactor
dbWriteTable (file path) such that it does something like the code
below (and remove the C code entirely):
(untested, approx code)
con <- file(fname, open="r")
on.exit(close(con))
df <- read.table(con, sep=sep, stringsAsFactors=FALSE, nrows=10,
na.strings=na.strings, header=TRUE)
# use DBI helper function here instead
header <- gsub(".", "_", names(df), fixed=TRUE)
names(df) <- header
dbWriteTable(db, tablename, df)
## Now do the rest in batches
done <- FALSE
while (!done) {
df <- read.table(con, sep=sep, stringsAsFactors=FALSE,
nrows=batch_size, na.strings=na.strings,
header=FALSE)
if (nrow(df) < batch_size) {
done <- TRUE
if (nrow(df) == 0)
break
}
names(df) <- header
dbWriteTable(db, tablename, df, append=TRUE)
}
+ seth
More information about the R-sig-DB
mailing list