[R] RODBC and NULL values
Mark Wardle
mark at wardle.org
Tue Oct 17 11:03:30 CEST 2006
Dear All,
Writing sooner than I thought I'd need to.
I'm using R 2.4 on Mac OS X, with RODBC, PostgreSQL 8.1 and Actual's
ODBC driver. I have all my data in Filemaker 8.5, but it is
automatically exported into PostgreSQL for analysis as Filemaker's ODBC
and JDBC access is awful, slow and has a tendency to crash.
I have disability data where for each patient there is a survival time
in years from disease onset to a particular disease stage, namely
unilateral support, bilateral support, wheelchair use, and death. Valid
values may include NULL (patient hasn't reached that stage), 0 (for
example, patient needed support immediately at disease onset), and any
positive integer.
When I query the database manually using psql, it is clear there are
NULL values.
3 | 3 | 18 | | 27 | 1
| | | | 13 | 1
1 | 5 | | | 10 | 0
10 | 13 | 13 | | 22 | 0
However, these are all converted to zeros when I use RODBC's sqlQuery(),
making interpretation impossible. I have tried using the nullstring and
na.strings options, but these don't seem to have any effect. I have
tried various combinations of NULL, NA and "". Forgive my awkward SQL.
> channel = odbcConnect("ataxia", uid="mark")
> disease = sqlQuery(channel, "select calc_survival_unilateral_support
as unlateral, calc_survival_bilateral_support as bilateral,
calc_survival_wheelchair as wheelchair,calc_survival_death as death,
calc_follow_up as followup, has_family_history_ataxia as familial from
clinical, patient where clinical.patient_fk = patient_id and excluded=0
and calc_walking_disability_valid=1")
> disease # and show results
127 3 3 18 0 27 1
128 0 0 0 0 13 1
129 1 5 0 0 10 0
130 10 13 13 0 22 0
It doesn't seem to be the old repeating rows NULL bug talked about <a
href="http://tolstoy.newcastle.edu.au/R/help/04/07/0803.html">here</a>.
Is this because my ODBC driver is not returning the correct values for
RODBC to parse? Is there anyway of debugging this (the intricacies of
ODBC are beyond my skill) and is my only alternative to store a
non-valid number in the database (999?) and use my query or R to remove
those datapoints afterwards?
Looking in the archives, there are lots of people asking about how to
convert NAs to numeric, but I want the NAs passed through unaltered!
Many thanks in advance,
Mark
--
Dr. Mark Wardle
Clinical research fellow and Specialist Registrar in Neurology,
C2-B2 link, Cardiff University, Heath Park, CARDIFF, CF14 4XN. UK
More information about the R-help
mailing list