[R] Data type problem when extract data from SQLite to R by using RSQLite
Seth Falcon
seth at userprimary.net
Tue Mar 1 07:16:42 CET 2011
Hi Jia,
On Mon, Feb 28, 2011 at 6:57 PM, chen jia <chen_1002 at fisher.osu.edu> wrote:
> The .schema of table annual_data3 is
> sqlite> .schema annual_data3
> CREATE TABLE "annual_data3"(
> PERMNO INT,
> DATE INT,
> CUSIP TEXT,
> EXCHCD INT,
> SICCD INT,
> SHROUT INT,
> PRC REAL,
> RET REAL,
> ...
> pret_var,
> pRET_sd,
> nmret,
> pya_var,
[snip]
Is there a reason that you've told SQLite the expected data type for
only some of the columns?
> Interestingly, I find that the problem I reported does not for columns
> labeled real in the schema info. For example, the type of column RET
> never changes no matter what the first observation is.
Yes, that is expected and I think it is the solution to your problem:
setup your schema so that all columns have a declared type. For some
details on SQLite's type system see
http://www.sqlite.org/datatype3.html.
RSQLite currently maps NA values to NULL in the database. Pulling
data out of a SELECT query, RSQLite uses the sqlite3_column_type
SQLite API to determine the data type and map it to an R type. If
NULL is encountered, then the schema is inspected using
sqlite3_column_decltype to attempt to obtain a type. If that fails,
the data is mapped to a character vector at the R level. The type
selection is done once after the first row has been fetched.
To work around this you can:
- make sure your schema has well defined
types (which will help SQLite perform its operations);
- check whether the returned column has the expected type and convert
if needed at the R level.
- remove NA/NULL values from the db or decide on a different way of
encoding them (e.g you might be able to use -1 in the db in some
situation to indicate missing). Your R code would then need to map
these to proper NA.
Hope that helps.
+ seth
--
Seth Falcon | @sfalcon | http://userprimary.net/
More information about the R-help
mailing list