[R] Finding Source of Error Message of 'Non-Unique Index Entries'

Wed Jan 4 19:31:14 CET 2012

On Wed, 4 Jan 2012, David Winsemius wrote:

> You didn't ask for what was duplicated, but rather what was NOT duplicated
> with that code. In the case of a dataframe it is the entire row that is
> tested.

   My original question was what was duplicated, but ... I changed the
function by dropping the 'not'. There's something seriously wrong here and I
need help from R gurus to tell me why.

   Example:

burns.tds[duplicated(burns.tds), ]
   ...
25760 BC-1.5 1996-09-19      NA
25761 BC-1.5 1996-09-19   0.010
   ...

   But, when I query the database table I see this:

select * from chemistry where site = 'BC-1.5' and sampdate = '1996-09-19' and param = 'TDS';
  site  |  sampdate  | param | quant | units | qual | easting | northing |  stream  | basin
--------+------------+-------+-------+-------+------+---------+----------+----------+--------
 BC-1.5 | 1996-09-19 | TDS   |   935 | mg/L  |      |         |          | BurnsCrk |
(1 row)

   There is only a single row for that site, sampdate, and parameter and the
quantity is different from those in the R data frame.

> I think you need to reduce this problem to a dataframe that you either
> post an access method for or use dput() to include. Then you need to say
> what your goals are and what code is not working on that example.

   I'll gladly do this. Which data frame should I make available: the
original chemdata or the subset burns.tds? I'll start with the latter.
Compressed dput() output attached.

   My goal is to produce time series plots of TDS, by site, on several
streams over the period for which that component was measured. Lattice lets
me superpose multiple lines on the same axis set with different color lines
and a legend.

   What's not working is something in the workflow of subsetting chemdata to
extract all TDS data for a named stream (e.g., burns.tds and winters.tds),
then converting them to zoo objects using read.zoo(). Somewhere along this
process my data are being mangled. It's not in the source data frame,
chemdata:

chemdata[duplicated(chemdata), ]
  [1] site     sampdate param    quant    units    qual     easting  northing
  [9] stream   basin 
<0 rows> (or 0-length row.names)
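
   The check above compares entire rows, so it reports nothing when two rows
agree on the identifying columns but differ anywhere else. As a minimal
sketch, on an invented data frame (the name toy and all of its values are
hypothetical; only the column names are borrowed from chemdata), the same
test can be restricted to the key columns:

```r
# Hypothetical toy data; column names mirror the ones above, values are invented.
toy <- data.frame(
  site     = c("A-1", "A-1", "A-1"),
  sampdate = as.Date(c("2000-01-01", "2000-01-01", "2000-01-02")),
  param    = c("TDS", "TDS", "TDS"),
  quant    = c(10, 12, 10),
  stringsAsFactors = FALSE
)

# Whole-row test: no two rows match in every column, so nothing is flagged.
whole_row_dups <- toy[duplicated(toy), ]

# Key-column test: rows 1 and 2 share site, sampdate and param,
# so the second of them is flagged.
key_dups <- toy[duplicated(toy[, c("site", "sampdate", "param")]), ]
```

   Whether the key-column version finds anything in chemdata itself is not
something this sketch can show; it only illustrates the difference between
the two tests.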

   The command I used to subset burns.tds from chemdata was:

burns.tds <- subset(chemdata, stream == 'BurnsCrk', select = c(site,
sampdate, param == 'TDS', quant), drop = T)
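
   One possible reading of that call, offered as a sketch rather than a
confirmed diagnosis: in subset() the select argument is evaluated with each
column name standing for its column position, so a comparison such as
param == 'TDS' placed there does not filter rows. It evaluates to FALSE,
which becomes index 0 and drops the param column, leaving every parameter's
rows in the result. The data frame below is invented (toy, wrong, right and
dup_keys are hypothetical names, and the pH value is made up) and only
illustrates that behaviour:

```r
# Hypothetical toy data: one site and date, two parameters, no duplicated rows.
toy <- data.frame(
  site     = c("BC-1.5", "BC-1.5"),
  sampdate = as.Date(c("1996-09-19", "1996-09-19")),
  param    = c("TDS", "pH"),
  quant    = c(935, 7.9),
  stream   = c("BurnsCrk", "BurnsCrk"),
  stringsAsFactors = FALSE
)

# Same shape as the call above: the comparison sits inside 'select'.
# It picks columns site, sampdate and quant, and keeps both rows.
wrong <- subset(toy, stream == "BurnsCrk",
                select = c(site, sampdate, param == "TDS", quant))

# Row filter moved into the 'subset' argument; 'select' only names columns.
right <- subset(toy, stream == "BurnsCrk" & param == "TDS",
                select = c(site, sampdate, quant))

# Both parameters survive in 'wrong', so site and sampdate repeat,
# which is the kind of non-unique index a zoo series would complain about.
dup_keys <- duplicated(wrong[, c("site", "sampdate")])
```

   If this is what happened, burns.tds would hold quantities for all
parameters measured on Burns Creek, which would be consistent with values
such as NA and 0.010 appearing for a date whose TDS value is 935.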

Thanks, David,

Rich