[R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.

Grant Farnsworth gvfarns at gmail.com
Mon Jan 23 08:25:52 CET 2012


On Mon, Jan 23, 2012 at 12:46 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> On Sun, Jan 22, 2012 at 10:59 PM, Grant Farnsworth <gvfarns at gmail.com> wrote:
>> I've been using sqldf heavily lately but have encountered problems
>> with ordering of observations or calculating statistics such as max()
>> and min() when the variable used is of class Date.
>>
>> For example, if I run the following code:
>>
>> =============== begin code =================
>> library(sqldf)
>> A<-data.frame(Dates=as.Date(c("1994-02-14","1977-02-23","2001-09-18","2009-08-01")),Ret=rnorm(4))
>> OrderedA<-sqldf('select * from A order by Dates')
>> MaxA<-sqldf('select max(Dates) as Dates from A')[1,1]
>> MinA<-sqldf('select min(Dates) as Dates from A')[1,1]
>> =============== end code =================
>>
>> Then the result is this:
>>
>>> A
>>       Dates        Ret
>> 1 1994-02-14  1.2414706
>> 2 1977-02-23 -0.7728146
>> 3 2001-09-18  1.2551331
>> 4 2009-08-01 -0.2538359
>>
>>> OrderedA
>>       Dates        Ret
>> 1 2001-09-18  1.2551331
>> 2 2009-08-01 -0.2538359
>> 3 1977-02-23 -0.7728146
>> 4 1994-02-14  1.2414706
>>
>>> MaxA
>> [1] "1994-02-14"
>>
>>> MinA
>> [1] "2001-09-18"
>>
>> Completely wrong order, no warnings issued, and the summary stats are
>> wrong as well (but consistent with the ordering).
>>
>> According to the sqldf manual found at the following URL
>>
>> http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables?
>>
>> this type of query should work correctly.  Any clue why it is not
>> doing so?  User error or bug?
>>
>
> You are using an old version.  Update to latest version of R and sqldf.
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com


Thanks, that worked.  Known bug, I guess.



More information about the R-help mailing list