[R] problem with duplicated function
Jeff Newmiller
jdnewmil at dcn.davis.CA.us
Mon May 25 00:12:14 CEST 2015
You are going wrong in a few places: posting using HTML format, not using dput to share your data sample, and comparing floating point numbers for equality.
HTML email is stripped to plain text on this list so we don't see what you see. In addition, HTML formatting corrupts code, so we cannot even run it.
The dput function is highly recommended for making reproducible examples. [1]
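For example, something like this (a minimal sketch; head() is used here only to keep the posted sample small) produces a text representation of your data that anyone can paste straight into R:

# dump the first 10 rows of data07 in a copy/paste-able form
dput(head(data07, 10))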
FAQ 7.31 warns against expecting floating point numbers that merely print the same to compare as equal, and duplicated() on numeric lat/long columns runs into exactly that problem. This advice applies to essentially all programming languages, not just R.
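A quick generic illustration (not your data) of what the FAQ is talking about:

x <- 0.1 + 0.2
y <- 0.3
x                # prints 0.3
y                # prints 0.3
x == y           # FALSE: the underlying binary values differ
all.equal(x, y)  # TRUE: comparison within a tolerance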
[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
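In your case, coordinates that print as -111.044 in both data frames can still differ in digits that are not displayed, so duplicated() never finds a match. One workaround is to decide how much precision is actually meaningful and round before comparing. A rough sketch, assuming four decimal places is appropriate for your coordinates (the column names Xcor/Ycor come from your str() output; adjust the digits to suit your data):

# build a character key from the rounded coordinates in each data frame
key07 <- paste(round(data07$Xcor, 4), round(data07$Ycor, 4))
key08 <- paste(round(data08$Xcor, 4), round(data08$Ycor, 4))

# keep only the rows of data08 whose rounded coordinates also occur in data07
test <- data08[key08 %in% key07, ]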
---------------------------------------------------------------------------
Jeff Newmiller                                jdnewmil at dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On May 24, 2015 2:34:13 PM PDT, Curtis Burkhalter <curtisburkhalter at gmail.com> wrote:
>Hello everyone,
>
>I have two very large data frames (~1 million rows x 5 columns), of which
>two of the columns are lat/long coordinates. The names of the data frames
>are 'data07' and 'data08'. Data08 has a few more sampling points than
>data07, so I want to subset data08 so that it has the same number of data
>points as data07, using the unique lat/long coordinates.
>
>Here are the associated data structures:
>
>*str(data07)*
>'data.frame': 969109 obs. of 5 variables:
> $ cell    : int 710228 715545 720690 720824 695611 700490 700626 705371 705507 710363 ...
> $ prN     : int 288 276 286 304 258 257 264 272 286 316 ...
> $ Location: Factor w/ 32 levels " ","Blacks_Fork",..: 24 24 24 24 24 24 24 24 24 24 ...
> $ Xcor    : num -111 -111 -111 -111 -111 ...
> $ Ycor    : num 41.7 41.7 41.7 41.7 41.8 ...
>
>*str(data08)*
>'data.frame': 969810 obs. of 5 variables:
> $ cell    : int 705528 710321 710456 715677 720762 720896 699953 700635 700771 705664 ...
> $ prN     : int 293 281 299 278 276 266 282 255 287 280 ...
> $ Location: Factor w/ 31 levels "Blacks_Fork",..: 23 23 23 23 23 23 23 23 23 23 ...
> $ Xcor    : num -111 -111 -111 -111 -111 ...
> $ Ycor    : num 41.8 41.7 41.7 41.7 41.7 ...
>
>I've tried using the following code to accomplish my problem:
>
>tt <- rbind(data07, data08)
>
>tt.dup <- duplicated(tt[, 4:5])  # marks all duplicate rows in data08 from
>                                 # last 2 cols that correspond to the lat/long
>
>tt.dup <- tt.dup[-seq_len(nrow(data07))]  # remove all data07 entries (first n)
>
>test <- data08[tt.dup, ]  # index only TRUE/duplicated elements from data08
>
>When I run the code, 'tt.dup' is FALSE for all entries, which I know
>isn't true.
>
>Here's a small subset of the data so that you can see exactly where there
>are duplicates:
>
>data07[1:10,]
> cell prN Location Xcor Ycor
>710229 *710228 288 Sage -111.044 41.7403*
>715546 *715545 276 Sage -111.044 41.7245*
>720691 *720690 286 Sage -111.044 41.7131*
>720825 *720824 304 Sage -111.044 41.7109*
>695612 695611 258 Sage -111.043 41.7766
>700491 700490 257 Sage -111.043 41.7653
>700627 700626 264 Sage -111.043 41.7630
>705372 705371 272 Sage -111.043 41.7517
>705508 705507 286 Sage -111.043 41.7495
>710364 710363 316 Sage -111.043 41.7381
>
> data08[1:10,]
> cell prN Location Xcor Ycor
>705529 705528 293 Sage -111.044 41.7517
>710322 *710321 281 Sage -111.044 41.7403*
>710457 710456 299 Sage -111.044 41.7381
>715678 *715677 278 Sage -111.044 41.7245*
>720763 *720762 276 Sage -111.044 41.7131*
>720897 *720896 266 Sage -111.044 41.7109*
>699954 699953 282 Sage -111.043 41.7767
>700636 700635 255 Sage -111.043 41.7653
>700772 700771 287 Sage -111.043 41.7631
>705665 705664 280 Sage -111.043 41.7495
>
>
>If anyone has any suggestions as to where I might be going wrong, I'd
>greatly appreciate it.
>
>Thank you