[R-sig-DB] [R] SQLite: When reading a table, a "\r" is padded onto the last column. Why?

Seth Falcon sfalcon at fhcrc.org
Fri Jan 5 16:34:12 CET 2007


ronggui <ronggui.huang using gmail.com> writes:

> I think there is still one more thing that needs to be done. RSQLite does
> not handle "NA" values (in my case, na.strings are blank fields in the
> test.txt file) when importing from a file into a db table.

You are right that an na.strings argument is missing.  You will find
that if you use '\N' in your text files, it will be recognized as NA.

This file import feature is implemented by reading the file in C and
borrows heavily from the SQLite command line tool's .import command.
With this implementation, changes such as adding a flexible na.strings
argument will not be trivial to implement.
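
By contrast, read.table() at the R level already accepts a vector of NA markers. A minimal sketch with made-up file contents, treating both a literal \N and blank fields as NA:

```r
## Made-up sample file: row 2 has a blank name and a \N score,
## row 3 has a blank score.
txt <- c("id,name,score",
         "1,alice,3.5",
         "2,,\\N",
         "3,carol,")
tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)

## "\\N" in R source is the two characters backslash and N;
## "" makes empty fields NA even in character columns.
df <- read.table(tmp, sep = ",", header = TRUE, stringsAsFactors = FALSE,
                 na.strings = c("\\N", ""))
print(df)
```

Because the NA markers are a plain character vector, supporting something like ronggui's blank fields needs no parser changes.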

Now that dbWriteTable (using a data.frame) is more efficient, it can be
used in a straightforward way to load very large text files in batches.
I prefer this approach.  A possibly easier patch, then, is to refactor
dbWriteTable (file path) so that it does something like the code below
(and remove the C code entirely):

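(untested, approximate code)

    library(RSQLite)

    ## Hypothetical replacement for dbWriteTable(db, tablename, <file path>):
    ## load a delimited text file in batches through the data.frame method.
    importFile <- function(db, tablename, fname, sep = ",",
                           na.strings = "NA", batch_size = 10000) {
        con <- file(fname, open = "r")
        on.exit(close(con))

        ## Read the header and a few rows to create the table
        df <- read.table(con, sep = sep, stringsAsFactors = FALSE, nrows = 10,
                         na.strings = na.strings, header = TRUE)
        # use DBI helper function here instead (e.g. make.db.names())
        header <- gsub(".", "_", names(df), fixed = TRUE)
        names(df) <- header

        dbWriteTable(db, tablename, df, row.names = FALSE)

        ## Now do the rest in batches
        repeat {
            ## read.table() signals an error on an exhausted connection
            ## instead of returning zero rows, so peek at the next line first
            nxt <- readLines(con, n = 1)
            if (length(nxt) == 0)
                break
            pushBack(nxt, con)
            df <- read.table(con, sep = sep, stringsAsFactors = FALSE,
                             nrows = batch_size, na.strings = na.strings,
                             header = FALSE)
            names(df) <- header
            dbWriteTable(db, tablename, df, append = TRUE, row.names = FALSE)
        }
        invisible(TRUE)
    }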
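
The batching pattern can be checked without a database. A base-R sketch (read_batches() is a hypothetical name) that collects each batch in a list where the code above would call dbWriteTable(..., append = TRUE). One detail matters for any loop like this: read.table() stops with "no lines available in input" on an exhausted connection rather than returning a zero-row data frame, so the sketch peeks at the next line before each read:

```r
## Hypothetical helper: read one open connection in batches of batch_size
## rows and return the batches as a list of data frames.
read_batches <- function(fname, batch_size, sep = ",", na.strings = "NA") {
    con <- file(fname, open = "r")
    on.exit(close(con))
    first <- read.table(con, sep = sep, header = TRUE, nrows = batch_size,
                        stringsAsFactors = FALSE, na.strings = na.strings)
    header <- names(first)
    batches <- list(first)
    repeat {
        ## Peek one line; stop cleanly at end of file
        nxt <- readLines(con, n = 1)
        if (length(nxt) == 0)
            break
        pushBack(nxt, con)
        df <- read.table(con, sep = sep, header = FALSE, nrows = batch_size,
                         stringsAsFactors = FALSE, na.strings = na.strings)
        names(df) <- header
        batches[[length(batches) + 1]] <- df
    }
    batches
}

## 7 data rows read 3 at a time
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", paste(1:7, letters[1:7], sep = ",")), tmp)
b <- read_batches(tmp, batch_size = 3)
sapply(b, nrow)  # 3 3 1
```

Each read.table() call resumes where the previous one stopped because the connection stays open between calls.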

+ seth