[R-sig-DB] [R] SQLite: When reading a table, a "\r" is padded onto the last column. Why?
David James
d@j025 @end|ng |rom gm@||@com
Fri Jan 5 18:05:39 CET 2007
Hi,
A word of caution: relying on read.table() to read very large
files (even piecemeal) can be extremely inefficient. The reason
I implemented the dbWriteTable from a file is exactly to circumvent
this problem. You can upload a huge data file into the db
extremely fast, but going through read.table() could be considerably
slower.
Perhaps we're considering two slightly different situations:
(1) export a data.frame to sqlite (here, the prepared statement/binding
approach is optimal).
(2) load a csv file into R or SQLite (here using the C code from .import
is optimal).
Thoughts?
--
David
On 1/5/07, Seth Falcon <sfalcon using fhcrc.org> wrote:
> ronggui <ronggui.huang using gmail.com> writes:
>
> > I think there is still one more thins need to do. RSQLite does not
> > take care of the "NA" (my case: na.strings is Blank fields in the
> > test.txt file ) when import from a file to db table.
>
> You are right that an na.strings argument is missing. You will find
> that if you use '\N' in your text files, it will be recognized as NA.
>
> This file import feature is implemented by reading the file in C and
> borrows heavily from the SQLite command line tool's .import command.
> With this implementation, changes such as adding a flexible na.strings
> argument will not be trivial to implement.
>
> Now that dbWriteTable (using data.frame) is more efficient, it can be
> used in a straight forward way to load very large text files. I
> prefer this approach. And a possibly easier patch is to refactor
> dbWriteTable (file path) such that it does something like the code
> below (and remove the C code entirely):
>
> (untested, approx code)
>
> con <- file(fname, open="r")
> on.exit(close(con))
>
> df <- read.table(con, sep=sep, stringsAsFactors=FALSE, nrows=10,
> na.strings=na.strings, header=TRUE)
> # use DBI helper function here instead
> header <- gsub(".", "_", names(df), fixed=TRUE)
> names(df) <- header
>
> dbWriteTable(db, tablename, df)
>
> ## Now do the rest in batches
> done <- FALSE
> while (!done) {
> df <- read.table(con, sep=sep, stringsAsFactors=FALSE,
> nrows=batch_size, na.strings=na.strings,
> header=FALSE)
> if (nrow(df) < batch_size) {
> done <- TRUE
> if (nrow(df) == 0)
> break
> }
> names(df) <- header
> dbWriteTable(db, tablename, df, append=TRUE)
> }
>
> + seth
>
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>
--
Regards,
--
David
More information about the R-sig-DB
mailing list