[R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.

Gabor Grothendieck ggrothendieck at gmail.com
Mon Jan 23 06:46:09 CET 2012


On Sun, Jan 22, 2012 at 10:59 PM, Grant Farnsworth <gvfarns at gmail.com> wrote:
> I've been using sqldf heavily lately but have encountered problems
> with ordering of observations or calculating statistics such as max()
> and min() when the variable used is of class Date.
>
> For example, if I run the following code:
>
> =============== begin code =================
> library(sqldf)
> A<-data.frame(Dates=as.Date(c("1994-02-14","1977-02-23","2001-09-18","2009-08-01")),Ret=rnorm(4))
> OrderedA<-sqldf('select * from A order by Dates')
> MaxA<-sqldf('select max(Dates) as Dates from A')[1,1]
> MinA<-sqldf('select min(Dates) as Dates from A')[1,1]
> =============== end code =================
>
> Then the result is this:
>
>> A
>       Dates        Ret
> 1 1994-02-14  1.2414706
> 2 1977-02-23 -0.7728146
> 3 2001-09-18  1.2551331
> 4 2009-08-01 -0.2538359
>
>> OrderedA
>       Dates        Ret
> 1 2001-09-18  1.2551331
> 2 2009-08-01 -0.2538359
> 3 1977-02-23 -0.7728146
> 4 1994-02-14  1.2414706
>
>> MaxA
> [1] "1994-02-14"
>
>> MinA
> [1] "2001-09-18"
>
> Completely wrong order, no warnings issued, and the summary stats are
> wrong as well (but consistent with the ordering).
>
> According to the sqldf manual found at the following URL
>
> http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables?
>
> this type of query should work correctly.  Any clue why it is not
> doing so?  User error or bug?
>

You are using an old version.  Update to latest version of R and sqldf.

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



More information about the R-help mailing list