[R] Extracting unique entries by a column

Vikram Chhatre crypticlineage at gmail.com
Tue Apr 14 22:32:37 CEST 2015


Hi David,

Thanks.  That was enlightening.

Whoop.

V

On Tue, Apr 14, 2015 at 3:53 PM, David L Carlson <dcarlson at tamu.edu> wrote:

> Try all.equal(df[1,3], df[2,3])
>
> This relates to how decimal numbers are stored in computers. It is not an
> R only issue, but it is described in the R-FAQ:
>
> From the R-FAQ - http://cran.r-project.org/doc/FAQ/R-FAQ.html
>
> 7.31 Why doesn't R think these numbers are equal?
>
> The only numbers that can be represented exactly in R's numeric type are
> integers and fractions whose denominator is a power of 2. Other numbers
> have to be rounded to (typically) 53 binary digits accuracy. As a result,
> two floating point numbers will not reliably be equal unless they have been
> computed by the same algorithm, and not always even then. For example
>
> R> a <- sqrt(2)
> R> a * a == 2
> [1] FALSE
> R> a * a - 2
> [1] 4.440892e-16
>
> The function all.equal() compares two objects using a numeric tolerance of
> .Machine$double.eps ^ 0.5. If you want much greater accuracy than this you
> will need to consider error propagation carefully.
>
> For more information, see e.g. David Goldberg (1991), "What Every Computer
> Scientist Should Know About Floating-Point Arithmetic", ACM Computing
> Surveys, 23/1, 5-48, also available via
> http://www.validlab.com/goldberg/paper.pdf.
>
> To quote from "The Elements of Programming Style" by Kernighan and Plauger:
>
>     10.0 times 0.1 is hardly ever 1.0.
>
>
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
>
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Vikram
> Chhatre
> Sent: Tuesday, April 14, 2015 2:40 PM
> To: r-help
> Subject: [R] Extracting unique entries by a column
>
> I have a data frame of dim 600x3.  There are pairs of rows which have the
> exact same value in column 3.
>
> head(df)
>                 POP1         POP2   ABSDIFF
> L0005.01 0.98484848 0.688118812 0.2967297
> L0005.03 0.01515152 0.311881188 0.2967297
> L0008.02 0.97727273 0.004424779 0.9728479
> L0008.04 0.02272727 0.995575221 0.9728479
> L0012.03 0.98684211 0.004385965 0.9824561
> L0012.01 0.01315789 0.995614035 0.9824561
>
> I want to de-duplicate on df$ABSDIFF so that only one row per pair remains
> in the subset.
>
> >df_subset <- df[!duplicated(df$ABSDIFF), ]
>
> This does not work. So I literally checked:
>
> >identical(df[1,3], df[2,3])
> FALSE
>
> How is 0.2967297 different from 0.2967297?  I am puzzled.
>
> Thanks for any insight.
>
> Vikram
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
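
For the archive, a minimal sketch of how the advice above could be applied to the original problem. The toy data frame and the choice of 7 decimal places are assumptions made for illustration (7 matches the digits printed by head(df)); they are not the original data, and rounding is only one possible workaround.

```r
# Hypothetical toy data: rows 1 and 2 print alike but differ in the last bits,
# because 0.1 * 3 is not stored as exactly the same double as 0.3.
df <- data.frame(
  ID      = c("a1", "a2", "b1", "b2"),
  ABSDIFF = c(0.3, 0.1 * 3, 0.7, 0.7)
)

# Exact comparison versus comparison within a numeric tolerance.
exact_same  <- identical(df$ABSDIFF[1], df$ABSDIFF[2])
nearly_same <- isTRUE(all.equal(df$ABSDIFF[1], df$ABSDIFF[2]))

# duplicated() compares values exactly, so build a rounded key first.
# The precision (7 decimal places) is an assumption; pick one that suits the data.
key       <- round(df$ABSDIFF, 7)
df_subset <- df[!duplicated(key), ]
df_subset
```

Rounding is a blunt tool: two values that sit on either side of a rounding boundary would still end up with different keys, so the precision should be coarser than the noise in the data and finer than the smallest real difference.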
