[R-sig-DB] NULL data not mapped to NA with RODBC on 64-bit Mac OS X

Harlan Harris h@r|@n @end|ng |rom h@rr|@@n@me
Tue May 3 21:25:13 CEST 2011


(Missed this until now! Sorry for the delay in responding!)

You could be right about the solution I'm suggesting being wrong, but I
don't know. Does anyone else have any thoughts on the matter? It works for
me....!

I don't know enough to answer your other question, I'm afraid. It seems to
me as if the nature of 64-bit equality is the real problem here...

I'd really like this to get resolved officially. I've done the best I can to
write a patch. Prof. Ripley, would you be able to weigh in on this?

 -Harlan


On Mon, Apr 18, 2011 at 9:20 PM, Tomoaki NISHIYAMA <
tomoakin using kenroku.kanazawa-u.ac.jp> wrote:

> Hi Harlan,
>
> I don't think this is a right way to fix.
> Because
> (int64_t) 0xFFFFFFFF == 4294967295 != -1
> the bug resides on the code setting
> ColData[i].IndPtr[row].
> Changing the way to interpret it is a wrong solution.
> Though, it might work as a workaround for the bug in other parts.
>
> The commented-out warning() in the code was used to confirm that without
> the casts,
>
>
> You did see only the lower 32 bits?
>
> --
>
> Tomoaki NISHIYAMA
>
>
> Advanced Science Research Center,
>
> Kanazawa University,
>
> 13-1 Takara-machi,
>
> Kanazawa, 920-0934, Japan
>
>
> On 2011/04/19, at 1:31, Harlan Harris wrote:
>
> OK, with the help of a rusty C coder at work, I've confirmed that the
> solution the Python crew figured out also seems to work for me. I have no
> idea if it breaks the 32-bit MacOS build, or any other build. Note that this
> seems to be a very subtle effect of the way the compiler and header files
> are interacting, as far as I can tell. The commented-out warning() in the
> code was used to confirm that without the casts, the len variable and
> SQL_NULL_DATA are, in fact, the same, and that despite their being the same,
> the equality test fails.
>
> A patch is attached. All of the len variables are cast to ints. (Note that
> casting to long, which should work, doesn't.) Would someone please apply to
> patch to the source tree and do some regression testing?
>
> -Harlan
>
>
> On Thu, Mar 31, 2011 at 9:35 AM, Harlan Harris <harlan using harris.name> wrote:
>
>> Prof. Ripley? Anyone? Should I cross-post this to R-Help? If someone would
>> be willing to fix this for pay, please email me a proposal and I'll see if
>> my employer would be willing to sponsor a fix.
>>
>>  -Harlan
>>
>>
>> On Tue, Mar 29, 2011 at 9:43 AM, Harlan Harris <harlan using harris.name>wrote:
>>
>>> Following up on this... Prof. Ripley? Is this something you can address?
>>>
>>>  -Harlan
>>>
>>>
>>> On Thu, Mar 24, 2011 at 9:54 AM, Harlan Harris <harlan using harris.name>wrote:
>>>
>>>> Thanks for the reply, Marc!
>>>>
>>>> Ah, it does appear I'm booting in 32-bit mode. From the terminal:
>>>>
>>>> > uname -a
>>>> Darwin Harlan-Harris-MacBook-Pro.local 10.6.0 Darwin Kernel Version
>>>> 10.6.0: Wed Nov 10 18:13:17 PST 2010; root:xnu-1504.9.26~3/RELEASE_I386 i386
>>>>
>>>> I can usually run stuff in 32-bit mode, but sometimes I would like the
>>>> ability to use the extra RAM. Also, certain interfaces to R (such as the new
>>>> RStudio) will run R in 64-bit mode, so for now, I can't use RStudio when I'm
>>>> pulling data via ODBC that might have NULLs. I think Rscript also wants to
>>>> run in 64-bit mode.
>>>>
>>>>  -Harlan
>>>>
>>>>
>>>> On Thu, Mar 24, 2011 at 9:09 AM, Marc Schwartz <marc_schwartz using me.com>wrote:
>>>>
>>>>> Hi Harlan,
>>>>>
>>>>> I suspect that we will need to wait for Prof. Ripley to be able to look
>>>>> into this further. There may be subtle issues at play here and I would defer
>>>>> to his more intimate knowledge of the ODBC tool chain.
>>>>>
>>>>> I do know that the Actual driver is both 32 and 64 bit in a single
>>>>> installation. However, I run R in 32 bit mode (don't need the extra address
>>>>> space, even though I now have 8Gb on my MacBook Pro), so have not
>>>>> encountered the 64 bit issues that you report here.
>>>>>
>>>>> I have the same header file. Note that SQLLEN is a signed declaration
>>>>> below. It is defined as a 'long' generally and as a INT64 for 64 bit
>>>>> Windows. It would be 'unsigned long' or UINT64, respectively for an unsigned
>>>>> value.
>>>>>
>>>>> One question for you however, which is do you know if you are booting
>>>>> the 32 bit or 64 bit OSX kernel on your Mac? Macs prior to mid-2010 (like
>>>>> mine) running Snow Leopard typically default to the 32 bit kernel, though
>>>>> can be 64 bit 'capable'. It depends upon various factors including the
>>>>> presence of a 64 bit EFI. Newer Macs now boot into the 64 bit kernel by
>>>>> default. I have seen some problems reported on older Macs where the default
>>>>> kernel was changed from 32 bit to 64 bit, resulting in subtle and not so
>>>>> subtle problems with some drivers. Booting to the 32 bit kernel still allows
>>>>> you to run 64 bit apps under Snow Leopard, just that the kernel and
>>>>> extensions are 32 bit.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Marc
>>>>>
>>>>>
>>>>> On Mar 23, 2011, at 2:29 PM, Harlan Harris wrote:
>>>>>
>>>>> > Any thoughts? Is there someone else or somewhere else I might ask?
>>>>> >
>>>>> > -Harlan
>>>>> >
>>>>> > On Tue, Mar 22, 2011 at 1:31 PM, Harlan Harris <harlan using harris.name>
>>>>> wrote:
>>>>> >
>>>>> >> Hello,
>>>>> >>
>>>>> >> I seem to have hit a bug in RODBC on 64-bit versions of R on Mac OS
>>>>> X. I'm
>>>>> >> running R 2.12.1, and using R to talk to Oracle and other databases
>>>>> using
>>>>> >> RODBC 1.3-2. In 32-bit mode (R32), everything works. (The ODBC
>>>>> drivers are
>>>>> >> the ones from Actual.) In 64-bit mode (R), it seems as if NULL data
>>>>> is not
>>>>> >> being treated as missing. For example, a NULL value in an integer
>>>>> column is
>>>>> >> being returned as a 0 instead, which is wrong/bad.
>>>>> >>
>>>>> >> Diving into the code and Google a bit, it seems like an issue that's
>>>>> been
>>>>> >> seen before in other contexts. Here's the relevant C code, from
>>>>> >> RODBCFetchRows:
>>>>> >>
>>>>> >>        for(row = thisHandle->rowsUsed;
>>>>> >>        row < thisHandle->rowsFetched && j <= maximum;
>>>>> >>        j++, row++)
>>>>> >>        {
>>>>> >>        thisHandle->rowsUsed++;
>>>>> >>        if(j > blksize) {
>>>>> >>            blksize *= 2;
>>>>> >>            for (i = 0; i < nc; i++)
>>>>> >>            SET_VECTOR_ELT(data, i,
>>>>> >>                       lengthgets(VECTOR_ELT(data, i), blksize));
>>>>> >>        }
>>>>> >>        for (i = 0; i < nc; i++) {
>>>>> >>            SQLLEN len = thisHandle->ColData[i].IndPtr[row];
>>>>> >>            switch(thisHandle->ColData[i].DataType) {
>>>>> >>            case SQL_DOUBLE:
>>>>> >>            REAL(VECTOR_ELT(data, i))[j-1] =
>>>>> >>                len == SQL_NULL_DATA ? NA_REAL :
>>>>> >>                thisHandle->ColData[i].RData[row];
>>>>> >>            break;
>>>>> >>
>>>>> >> etc...
>>>>> >>
>>>>> >> Googling for SQL_NULL_DATA and related terms finds this page, about
>>>>> the
>>>>> >> same issue in a Python driver:
>>>>> >>
>>>>> >> http://code.google.com/p/pyodbc/issues/detail?id=51
>>>>> >>
>>>>> >> It appears as if SQLLEN might be unsigned in 64-bit builds, while
>>>>> >> SQL_NULL_DATA is -1. Casting len to be an int rather than an
>>>>> unsigned int
>>>>> >> seems to solve the problem for them.
>>>>> >>
>>>>> >> On my machine, /usr/include/sqltypes.h starts like this:
>>>>> >>
>>>>> >> /*
>>>>> >> *  sqltypes.h
>>>>> >> *
>>>>> >> *  $Id: sqltypes.h,v 1.23 2007/10/07 13:27:13 source Exp $
>>>>> >> *
>>>>> >> *  ODBC typedefs
>>>>> >> *
>>>>> >> *  The iODBC driver manager.
>>>>> >>
>>>>> >> and defines SQLLEN as follows:
>>>>> >>
>>>>> >> #ifdef _WIN64
>>>>> >> typedef INT64                   SQLLEN;
>>>>> >> typedef UINT64                  SQLULEN;
>>>>> >> typedef UINT64                  SQLSETPOSIROW;
>>>>> >> #elif defined(STRICT_ODBC_TYPES)
>>>>> >> typedef long                    SQLLEN;
>>>>> >> typedef unsigned long           SQLULEN;
>>>>> >> typedef unsigned short          SQLSETPOSIROW;
>>>>> >> #else
>>>>> >> #define SQLLEN                  long
>>>>> >> #define SQLULEN                 unsigned long
>>>>> >> #define SQLSETPOSIROW           unsigned short
>>>>> >> #endif
>>>>> >>
>>>>> >> Seems to me like SQLLEN should be signed, but apparently not?
>>>>> >>
>>>>> >> Can anyone help? Thank you!
>>>>> >>
>>>>> >> -Harlan
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> <RODBC.c.diff>_______________________________________________
>
> R-sig-DB mailing list -- R Special Interest Group
> R-sig-DB using r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-db
>
>
>

	[[alternative HTML version deleted]]




More information about the R-sig-DB mailing list