[R] Data type problem when extract data from SQLite to R by using RSQLite
chen jia
chen_1002 at fisher.osu.edu
Tue Mar 1 03:57:13 CET 2011
Hi Seth,
Thanks for the reply. I provide info from sessionInfo() and about
schema that you ask. Please take a look.
The output from sessionInfo() is
> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UT> sessionInfo()
R version 2.12.2 (2011-02-25)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyr_1.2.1 RSQLite_0.9-2 DBI_0.2-5 filehash_2.1-1F-8
LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] plyr_1.2.1 RSQLite_0.9-2 DBI_0.2-5 filehash_2.1-1
The .schema of table annual_data3 is
sqlite> .schema annual_data3
CREATE TABLE "annual_data3"(
PERMNO INT,
DATE INT,
CUSIP TEXT,
EXCHCD INT,
SICCD INT,
SHROUT INT,
PRC REAL,
RET REAL,
...
pret_var,
pRET_sd,
nmret,
pya_var,
pya_sd,
nya,
pya_var_ebi,
pya_sd_ebi,
pya_var_ebit,
pya_sd_ebit,
pya_var_ebitda,
pya_sd_ebitda,
logage REAL,
logasset REAL,
...
loglead1stdaret, loglead2stdaret)
Table annual_data3 is created by joining table annual_data2 and
ya_vol. The column pya_var is initially in ya_vol.
dbGetQuery(sql.industry,
"create table annual_data3 as
select a.*, b.pya_var, b.pya_sd,
b.nya, b.pya_var_ebi, b.pya_sd_ebi,
b.pya_var_ebit, b.pya_sd_ebit,
b.pya_var_ebitda, b.pya_sd_ebitda
from annual_data2 as a left join
ya_vol as b
on a.permno = b.permno and
a.year = b.year
order by permno, year")
Table ya_vol is created by
dbGetQuery(sql.industry,
"create table ya_vol as
select PERMNO, year,
variance(ya) as pya_var,
stdev(ya) as pya_sd,
count(*) as nya,
variance(ya_ebi) as pya_var_ebi,
stdev(ya_ebi) as pya_sd_ebi,
variance(ya_ebit) as pya_var_ebit,
stdev(ya_ebit) as pya_sd_ebit,
variance(ya_ebitda) as pya_var_ebitda,
stdev(ya_ebitda) as pya_sd_ebitda
from past_ya
where ya is not null
group by PERMNO, year
order by PERMNO, year")
The schema info of ya_vol is
sqlite> .schema ya_vol
CREATE TABLE ya_vol(
PERMNO INT,
year INT,
pya_var,
pya_sd,
nya,
pya_var_ebi,
pya_sd_ebi,
pya_var_ebit,
pya_sd_ebit,
pya_var_ebitda,
pya_sd_ebitda
);
CREATE INDEX ya_vol_permno_year_idx
on ya_vol (permno,year);
Interestingly, I find that the problem I reported does not for columns
labeled real in the schema info. For example, the type of column RET
never changes no matter what the first observation is.
> str(dbGetQuery(sql.industry,
+ "select RET from annual_data3
+ where RET is not null limit 5"))
'data.frame': 5 obs. of 1 variable:
$ RET: num -0.03354 -0.02113 0.03797 0.0013 -0.00678
>
> str(dbGetQuery(sql.industry,
+ "select RET from annual_data3
+ where RET is null limit 5"))
'data.frame': 5 obs. of 1 variable:
$ RET: num NA NA NA NA NA
> sapply(dbGetQuery(sql.industry,
+ "select RET from annual_data3
+ where RET is null limit 5"),
+ typeof)
RET
"double"
> sapply(dbGetQuery(sql.industry,
+ "select RET from annual_data3
+ where RET is not null limit 5"),
+ typeof)
RET
"double"
I still don't know how to solve this problem for variable pya_var,
please help. Thanks.
Best,
Jia
On Mon, Feb 28, 2011 at 6:48 PM, Seth Falcon <seth at userprimary.net> wrote:
> Hi Jia,
>
> On Mon, Feb 28, 2011 at 12:37 PM, chen jia <chen_1002 at fisher.osu.edu> wrote:
>> When I extract data from SQLite to R, the data types (or modes) of the
>> extracted data seems to be determined by the value of the first row.
>> Please see the following example.
>
> It would help to provide the output of sessionInfo() as well as the
> schema definition for the table in SQLite (or at least description of
> how it was created).
>
> Here's an example that works as you'd like:
>
> > library(RSQLite)
> > db = dbConnect(SQLite(), dbname = ":memory:")
> > dbGetQuery(db, "create table t (a int, b real, c text)")
> > df = data.frame(a=c(NA, 1L, 2L), b=c(NA, 1.1, 2.2), c=c(NA, "x",
> "y"),stringsAsFactors=FALSE)
> > df
> a b c
> 1 NA NA <NA>
> 2 1 1.1 x
> 3 2 2.2 y
> > dbGetPreparedQuery(db, "insert into t values (?, ?, ?)", df)
> > dbGetQuery(db, "select * from t")
> a b c
> 1 NA NA <NA>
> 2 1 1.1 x
> 3 2 2.2 y
> > sapply(dbGetQuery(db, "select * from t"), typeof)
> a b c
> "integer" "double" "character"
> > sapply(dbGetQuery(db, "select * from t limit 1"), typeof)
> a b c
> "integer" "double" "character"
> > sapply(dbGetQuery(db, "select a from t limit 1"), typeof)
> a
> "integer"
> > sapply(dbGetQuery(db, "select a from t limit 2"), typeof)
> a
> "integer"
> > sapply(dbGetQuery(db, "select a from t limit 1"), typeof)
> a
> "integer"
>
>
>> sessionInfo()
> R version 2.11.1 (2010-05-31)
> x86_64-apple-darwin9.8.0
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices datasets utils methods base
>
> other attached packages:
> [1] RSQLite_0.9-4 DBI_0.2-5
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.1
>
>
>
>
> --
> Seth Falcon | @sfalcon | http://userprimary.net/
>
--
700 Fisher Hall
2100 Neil Ave.
Columbus, Ohio 43210
http://www.fisher.osu.edu/~chen_1002/
More information about the R-help
mailing list