[R-sig-DB] [R] SQLite: When reading a table, a "\r" is padded onto the last column. Why?

ronggui ronggui.huang at gmail.com
Fri Jan 5 16:58:47 CET 2007


On 1/5/07, Seth Falcon <sfalcon at fhcrc.org> wrote:
> ronggui <ronggui.huang at gmail.com> writes:
>
> > I think there is still one more thing that needs to be done. RSQLite
> > does not handle NA values (in my case, na.strings would be the blank
> > fields in the test.txt file) when importing from a file into a db table.
>
> You are right that an na.strings argument is missing.  You will find
> that if you use '\N' in your text files, it will be recognized as NA.

And if NA is used to represent missing values in the text file, it will
be recognized as NA too.
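
For the data.frame route, read.table's own na.strings argument covers the
blank-field case. A minimal sketch with made-up data (the column names and
values are invented for illustration only):

```r
# Hypothetical sample data: tab-separated, with a blank field in rows 2 and 3.
txt <- "id\tname\tscore\n1\talice\t3.5\n2\t\t4.0\n3\tcarol\t\n"

# na.strings = "" asks read.table to treat empty fields as NA.
df <- read.table(text = txt, sep = "\t", header = TRUE,
                 na.strings = "", stringsAsFactors = FALSE)

# Show which cells were read as missing.
print(is.na(df))
```

The resulting data.frame can then be handed to dbWriteTable as usual.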

> This file import feature is implemented by reading the file in C and
> borrows heavily from the SQLite command line tool's .import command.
> With this implementation, changes such as adding a flexible na.strings
> argument will not be trivial to implement.

Oops, so let it be; I should not ask for too much :)
I don't know C, so I can't offer a patch.

> Now that dbWriteTable (using data.frame) is more efficient, it can be
> used in a straightforward way to load very large text files.  I
> prefer this approach.  And a possibly easier patch is to refactor
> dbWriteTable (file path) such that it does something like the code
> below (and remove the C code entirely):

This is the approach I used before dbWriteTable could read a file directly :)

Thanks for your quick reply:)

Ronggui Huang

> (untested, approx code)
>
>     con <- file(fname, open="r")
>     on.exit(close(con))
>
>     df <- read.table(con, sep=sep, stringsAsFactors=FALSE, nrows=10,
>                      na.strings=na.strings, header=TRUE)
>     # use DBI helper function here instead
>     header <- gsub(".", "_", names(df), fixed=TRUE)
>     names(df) <- header
>
>     dbWriteTable(db, tablename, df)
>
>     ## Now do the rest in batches
>     done <- FALSE
>     while (!done) {
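>         # col.names makes read.table return zero rows at end of input
>         # instead of stopping with "no lines available in input"
>         df <- read.table(con, sep=sep, stringsAsFactors=FALSE,
>                          nrows=batch_size, na.strings=na.strings,
>                          header=FALSE, col.names=header)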
>         if (nrow(df) < batch_size) {
>             done <- TRUE
>             if (nrow(df) == 0)
>               break
>         }
>         names(df) <- header
>         dbWriteTable(db, tablename, df, append=TRUE)
>     }
>
> + seth
>
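
The batch-loading idea quoted above can be sketched as a self-contained
function. This is an illustration under stated assumptions, not RSQLite
code: import_in_batches() and write_batch are hypothetical names, and the
writer is passed in as a function so that the example runs in base R
without a database. It supplies col.names on the follow-up reads, because
read.table otherwise stops with an error once the connection is exhausted.

```r
# Sketch: load a delimited text file in batches through a caller-supplied
# writer. import_in_batches() and write_batch are hypothetical names.
import_in_batches <- function(path, write_batch, sep = "\t",
                              na.strings = "NA", batch_size = 1000L) {
    con <- file(path, open = "r")
    on.exit(close(con))

    # First batch: read the header line and the first rows.
    df <- read.table(con, sep = sep, header = TRUE, nrows = batch_size,
                     na.strings = na.strings, stringsAsFactors = FALSE)
    # Replace "." with "_" so the names are usable as SQL column names.
    header <- gsub(".", "_", names(df), fixed = TRUE)
    names(df) <- header
    write_batch(df, append = FALSE)
    total <- nrow(df)

    repeat {
        # With col.names supplied, an exhausted connection yields a
        # zero-row data.frame rather than an error.
        df <- read.table(con, sep = sep, header = FALSE, nrows = batch_size,
                         na.strings = na.strings, col.names = header,
                         stringsAsFactors = FALSE)
        if (nrow(df) == 0L) break
        names(df) <- header
        write_batch(df, append = TRUE)
        total <- total + nrow(df)
        if (nrow(df) < batch_size) break   # short batch: input is finished
    }
    invisible(total)
}

# Demo with an in-memory writer standing in for a database table.
path <- tempfile(fileext = ".txt")
write.table(data.frame(a.b = 1:25,
                       txt = rep(c("x", NA, "z", "w", "v"), times = 5)),
            path, sep = "\t", quote = FALSE, row.names = FALSE)

stored <- NULL
write_batch <- function(df, append) {
    stored <<- if (append) rbind(stored, df) else df
}

n <- import_in_batches(path, write_batch, batch_size = 10L)
print(n)
print(names(stored))
```

With DBI and RSQLite loaded, the writer would presumably be something like
function(df, append) dbWriteTable(db, tablename, df, append=append), with db
and tablename as in the quoted code. Column types are inferred separately for
each batch, so a column that looks like integers in one batch and strings in
the next will arrive with different R types.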


-- 
Ronggui Huang
Department of Sociology
Fudan University, Shanghai, China
黄荣贵
复旦大学社会学系



