[R] Extracting unique entries by a column

Tue Apr 14 21:53:37 CEST 2015

Try all.equal(df[1,3], df[2,3])
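To see the suggestion in action, here is a minimal sketch with made-up data. The row names and values below are hypothetical, chosen so that each pair's ABSDIFF is equal in exact arithmetic but is computed from different doubles. Rounding the column before calling duplicated() is one common workaround, not the only one. The choice of 7 decimal places is an assumption and should match the precision you actually care about.

```r
# Hypothetical data shaped like the original question: in exact arithmetic
# both rows have ABSDIFF = 0.1. ('df' masks stats::df, as in the question.)
df <- data.frame(POP1 = c(0.9, 0.1),
                 POP2 = c(0.8, 0.2),
                 row.names = c("L0001.01", "L0001.02"))
df$ABSDIFF <- abs(df$POP1 - df$POP2)

print(df)                              # both ABSDIFF values print as 0.1
identical(df[1, 3], df[2, 3])          # FALSE: the stored doubles differ
isTRUE(all.equal(df[1, 3], df[2, 3]))  # TRUE: equal within tolerance
sprintf("%.17g", df$ABSDIFF)           # 17 digits expose the difference

# duplicated() tests exact equality, so no row is dropped here.
nrow(df[!duplicated(df$ABSDIFF), ])    # 2

# Rounding first (7 decimal places is an assumption) makes
# duplicated() treat the pair as one value.
df_subset <- df[!duplicated(round(df$ABSDIFF, 7)), ]
nrow(df_subset)                        # 1
```

Rounding maps values that agree to the printed precision onto the same double, which is what duplicated() needs. all.equal() cannot be plugged in directly because it compares two objects rather than finding duplicates within a vector.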

This relates to how decimal numbers are stored in computers. It is not an R-only issue, but it is described in the R-FAQ:

From the R-FAQ - http://cran.r-project.org/doc/FAQ/R-FAQ.html

7.31 Why doesn't R think these numbers are equal?

The only numbers that can be represented exactly in R's numeric type are integers and fractions whose denominator is a power of 2. Other numbers have to be rounded to (typically) 53 binary digits accuracy. As a result, two floating point numbers will not reliably be equal unless they have been computed by the same algorithm, and not always even then. For example
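A quick way to check this claim in R: sums of fractions whose denominators are powers of 2 are exact, while decimal fractions such as 0.1 are not. The specific numbers are just illustrations.

```r
# Dyadic fractions (denominator a power of 2) are stored exactly,
# so this sum compares equal.
0.5 + 0.25 == 0.75            # TRUE

# 0.1, 0.2 and 0.3 are each rounded to the nearest double,
# and the rounding errors do not cancel.
0.1 + 0.2 == 0.3              # FALSE

# Printing with 17 significant digits shows the stored values.
print(c(0.1 + 0.2, 0.3), digits = 17)
```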

R> a <- sqrt(2)
R> a * a == 2
[1] FALSE
R> a * a - 2
[1] 4.440892e-16

The function all.equal() compares two objects using a numeric tolerance of .Machine$double.eps ^ 0.5. If you want much greater accuracy than this you will need to consider error propagation carefully.
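A short sketch of how all.equal() behaves in practice. When the objects differ, it returns a character string describing the difference rather than FALSE, so inside an if() condition it is usually wrapped in isTRUE(). The tolerance values below are illustrative only.

```r
a <- sqrt(2)

# Default tolerance: sqrt(.Machine$double.eps), about 1.5e-8.
sqrt(.Machine$double.eps)

isTRUE(all.equal(a * a, 2))    # TRUE: the difference is far below tolerance

# When values differ by more than the tolerance, all.equal()
# returns a description of the difference, not FALSE.
res <- all.equal(1, 1.1)
is.character(res)              # TRUE
isTRUE(res)                    # FALSE

# The tolerance argument can be tightened or loosened as needed.
isTRUE(all.equal(1, 1 + 1e-6))                     # FALSE with default tolerance
isTRUE(all.equal(1, 1 + 1e-6, tolerance = 1e-5))   # TRUE
```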

For more information, see e.g. David Goldberg (1991), "What Every Computer Scientist Should Know About Floating-Point Arithmetic", ACM Computing Surveys, 23/1, 5-48, also available via http://www.validlab.com/goldberg/paper.pdf.

To quote from "The Elements of Programming Style" by Kernighan and Plauger:

    10.0 times 0.1 is hardly ever 1.0.
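With the IEEE 754 doubles that R uses, the single product 10 * 0.1 happens to round to exactly 1. The effect in the quote shows up more reliably when rounding errors accumulate, for example when 0.1 is added ten times in a loop. A minimal sketch:

```r
# One multiplication: the rounding error happens to vanish.
10 * 0.1 == 1                  # TRUE with IEEE 754 doubles

# Repeated addition: each step rounds, and the errors accumulate.
x <- 0
for (i in 1:10) x <- x + 0.1
x == 1                         # FALSE
isTRUE(all.equal(x, 1))        # TRUE
```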

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Vikram Chhatre
Sent: Tuesday, April 14, 2015 2:40 PM
To: r-help
Subject: [R] Extracting unique entries by a column

I have a data frame of dim 600x3 (600 rows, 3 columns). There are pairs of
rows which have the exact same value in column 3.

head(df)
                POP1         POP2   ABSDIFF
L0005.01 0.98484848 0.688118812 0.2967297
L0005.03 0.01515152 0.311881188 0.2967297
L0008.02 0.97727273 0.004424779 0.9728479
L0008.04 0.02272727 0.995575221 0.9728479
L0012.03 0.98684211 0.004385965 0.9824561
L0012.01 0.01315789 0.995614035 0.9824561

I want to remove duplicates by df$ABSDIFF so that only one row per pair
remains in the subset.

> df_subset <- df[!duplicated(df$ABSDIFF), ]

This does not work, so I checked directly:

> identical(df[1,3], df[2,3])
[1] FALSE

How is 0.2967297 different from 0.2967297?  I am puzzled.

Thanks for any insight.

Vikram


______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.