[R-sig-DB] [R] SQLite: When reading a table, a "\r" is padded onto the last column. Why?

David James d@j025 @end|ng |rom gm@||@com
Fri Jan 5 18:05:39 CET 2007


Hi,

A word of caution:  relying on read.table() to read very large
files (even piecemeal) can be extremely inefficient.  The reason
I implemented the dbWriteTable from a file is exactly to circumvent
this problem.  You can upload a huge data file into the db
extremely fast, but going through read.table() could be considerably
slower.

Perhaps we're considering two slightly different situations:
(1) export a data.frame to sqlite (here, the prepared statement/binding
approach is optimal).
(2) load a csv file into R or SQLite (here using the C code from .import
is optimal).


Thoughts?

--
David


On 1/5/07, Seth Falcon <sfalcon using fhcrc.org> wrote:
> ronggui <ronggui.huang using gmail.com> writes:
>
> > I think there is still one more thins need to do. RSQLite does not
> > take care of the "NA" (my case: na.strings is  Blank fields in the
> > test.txt file ) when import from a file to db table.
>
> You are right that an na.strings argument is missing.  You will find
> that if you use '\N' in your text files, it will be recognized as NA.
>
> This file import feature is implemented by reading the file in C and
> borrows heavily from the SQLite command line tool's .import command.
> With this implementation, changes such as adding a flexible na.strings
> argument will not be trivial to implement.
>
> Now that dbWriteTable (using data.frame) is more efficient, it can be
> used in a straight forward way to load very large text files.  I
> prefer this approach.  And a possibly easier patch is to refactor
> dbWriteTable (file path) such that it does something like the code
> below (and remove the C code entirely):
>
> (untested, approx code)
>
>    con <- file(fname, open="r")
>    on.exit(close(con))
>
>    df <- read.table(con, sep=sep, stringsAsFactors=FALSE, nrows=10,
>                     na.strings=na.strings, header=TRUE)
>    # use DBI helper function here instead
>    header <- gsub(".", "_", names(df), fixed=TRUE)
>    names(df) <- header
>
>    dbWriteTable(db, tablename, df)
>
>    ## Now do the rest in batches
>    done <- FALSE
>    while (!done) {
>        df <- read.table(con, sep=sep, stringsAsFactors=FALSE,
>                         nrows=batch_size, na.strings=na.strings,
>                         header=FALSE)
>        if (nrow(df) < batch_size) {
>            done <- TRUE
>            if (nrow(df) == 0)
>              break
>        }
>        names(df) <- header
>        dbWriteTable(db, tablename, df, append=TRUE)
>    }
>
> + seth
>
> _______________________________________________
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>


-- 
Regards,

--
David




More information about the R-sig-DB mailing list