[R] Numbers that look equal, should be equal, but if() doesn't see as equal (repost with code included)

Thomas Lumley tlumley at u.washington.edu
Wed May 28 16:16:53 CEST 2003


On Wed, 28 May 2003, Paul Lemmens wrote:

> Hi!
>
> Apologies for sending the mail without any code. Apparently somewhere along
> the way the .R attachments got filtered out. I have included the code below
> as clean as possible. My original mail is below the code.

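I still think you should not be using == (or !=) to compare these means.
The two means come from different sequences of floating-point operations,
so they can differ by rounding error even though they are mathematically
equal.  You want something like

if ( abs(mean.b-mean.orig)/(epsilon+abs(mean.orig)) < epsilon ){

You are effectively using epsilon=0, but epsilon=10e-10 should be
adequate.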
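
A minimal sketch of that comparison, in case it helps. The helper name
nearly.equal and the values 0.1, 0.2 and 0.3 are made up for illustration
and are not from the posted code; base R's all.equal() is shown next to it
because it also compares within a tolerance.

```r
# Hypothetical helper (not from the original post): relative comparison
# with a small epsilon, following the formula suggested above.
nearly.equal <- function(a, b, epsilon = 10e-10) {
    abs(a - b) / (epsilon + abs(b)) < epsilon
}

# Two values that are mathematically equal but differ in the last bits.
x <- 0.1 + 0.2
y <- 0.3

x == y                     # FALSE: exact comparison sees the rounding error
nearly.equal(x, y)         # TRUE
isTRUE(all.equal(x, y))    # TRUE: all.equal() also uses a tolerance
```

all.equal() returns a character string rather than FALSE when the values
differ, which is why it is wrapped in isTRUE() before being used in if().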

	-thomas



> Thank you again for your time.
> regards,
> Paul
>
> vincentize <- function(data, bins)
> {
> 	if ( length(data) < 2 )
> 	{
> 		stop("The data is really short. Is that ok?");
> 	}
>
> 	if ( bins < 2 )
> 	{
> 		stop("A number of bins smaller than 2 just really isn't useful");
> 	}
>
> 	if ( bins > length(data) )
> 	{
> 		stop("This is really unusual, although perhaps possible. If you really know what you're doing, maybe you should disable this check!?.");
> 	}
>
> 	ret <- c();
> 	for ( i in 1:length(data))
> 	{
> 		rt <- data[i];
> 		b <- 0;
> 		while ( b < bins )
> 		{
> 			ret <- c(ret, rt);
> 			b <- b+1;
> 		}
> 	}
>
> 	ret;
> }
>
>
> binify <- function(data, bins, n)
> {
> 	if ( bins < 2 )
> 	{
> 		stop("Number of bins is smaller than 2. Nothing to split, exiting.");
> 	}
>
> 	if ( length(data) < 2 )
> 	{
> 		stop("The length of the data is really short. Is that ok?");
> 	}
>
> 	if ( bins * n != length(data) )
> 	{
> 		stop("Cannot construct bins of equal length.");
> 	}
>
> 	t(array(data, c(n,bins)));
> }
>
> mean.bins <- function(data)
> {
> 	# For the vincentizing procedures in vincentize() and binify(),
>  	# it made sense to check the data array/vector/matrix. Here,
> 	# we now just need to check that data is a matrix.
> 	if ( !is.matrix(data) )
> 	{
> 		stop("The data is not in matrix form.");
> 	}
>
> 	means <- c();
> 	bins <- dim(data)[1];
> 	for (i in 1:bins)
> 	{
> 		means <- c(means, mean(data[i,]));
> 	}
>
> 	# return a vector of means.
> 	means;
> }
>
> bins.factor <- function(data, bins)
> {
> 	if ( !is.data.frame(data) )
> 	{
> 		stop("data is not a data frame.");
> 	}
>
> 	source('Ratcliff.r', local=TRUE);
> 	subject.bin.means <- c();
>
> 	attach(data);
> 	l <- levels(Cond);
> 	for ( i in 1:length(l) )
> 	{
> 		cat("Calculating bins for factor level ", l[i], ".\n", sep="");
> 		flush.console();
>
> 		data <- RT[Cond == l[i]];
> 		data <- sort(data);
>
> 		n <- length(data);
> 		data.vincent <- vincentize(data,bins);
> 		data.vincent.bins <- binify(data.vincent, bins, n);
> 		bin.means <- mean.bins(data.vincent.bins);
>
> 		# FAILING TEST.
> 		mean.orig <- mean(data);
> 		mean.b <- mean(bin.means);
> 		if ( mean.b != mean.orig )
> 		{
> 			#cat("mean.b\n", str(mean.b), "mean.orig\n", str(mean.orig)); flush.console;
> 			detach(data);
> 			stop("Something went wrong calculating the bins: means do not equal.");
> 		}
> 		subject.bin.means <- c(subject.bin.means, bin.means);
> 	}
> 	detach(data);
>
> 	if ( !length(subject.bin.means) == bins*length(l) )
> 	{
> 		stop("Inappropriate number of means calculated.");
> 	}
> 	else
> 	{
> 		subject.bin.means
> 	}
> }
>
> ---------- Forwarded Message ----------
> Date: dinsdag 27 mei 2003 14:53 +0200
> From: Paul Lemmens <P.Lemmens at nici.kun.nl>
> To: r-help at stat.math.ethz.ch
> Subject: [R] Numbers that look equal, should be equal, but if() doesn't see
> as equal
>
> Hi!
>
> After a lot of testing and debugging I'm falling silent in figuring out
> what goes wrong in the following.
>
> I'm implementing the Vincentizing procedure that Ratcliff (1979) described.
> It's about calculating RT bins for any distribution of RT data. It boils
> down to rank ordering your data, replicating each data point as many times
> as you need bins and then splitting up the resulting distribution in equal
> bins.
>
> The code that I've written is attached (and not included because it is
> considerable in length due to many comments). Ratcliff.r contains some
> basic functions and distribution.bins.r contains the problematic function
> bins.factor() (problem area marked with 'FAILING TEST'). The final attached
> file is the mock up distribution I made.
>
> The failing test is the check if the mean of the mean RT's for each bin
> equals the mean of the original distribution. These should/are
> mathematically equivalent. Sometimes, however, the test fails. With the
> attached distribution most notably for 4, 7, 8, 9, and 13 bins. Since the
> means are mathematically equivalent IMHO it should not be an issue of this
> particular distribution. As a matter of fact, I also have tested some
> rnorm() distributions and my function also fails on those (albeit a little
> less often than with foobar.txt).
>
> Problem description: if one calculates the bins or bin means by hand, the
> mean of the bin means is visually the same as the overall mean, even with
> options(digits=20), but *still* the test fails.
>
> IMHO it's not my code and neither the distribution I use to test, but
> still, can you point out an obvious failure of my programming or is it
> indeed something of R that I don't yet grasp?
>
> thank you for your help,
> Paul
>
>
> --
> Paul Lemmens
> NICI, University of Nijmegen              ASCII Ribbon Campaign /"\
> Montessorilaan 3 (B.01.03)                    Against HTML Mail \ /
> NL-6525 HR Nijmegen                                              X
> The Netherlands                                                 / \
> Phonenumber    +31-24-3612648
> Fax            +31-24-3616066
>
>
> ---------- End Forwarded Message ----------
>
>
>
>
>
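
The procedure described in the forwarded message (rank order the data,
replicate each value once per bin, split the result into equal bins) can
also be written without loops. This is only a sketch, assuming it matches
what the quoted vincentize(), binify() and mean.bins() do; the name
vincent.means and the test data are made up here.

```r
# Hypothetical vectorized version of the quoted functions (an assumption,
# not the poster's code): returns one mean per bin.
vincent.means <- function(x, bins) {
    n <- length(x)
    # Sort, then repeat every value 'bins' times in place.
    v <- rep(sort(x), each = bins)
    # Cut into 'bins' consecutive chunks of n values, one chunk per row.
    m <- matrix(v, nrow = bins, ncol = n, byrow = TRUE)
    rowMeans(m)
}

set.seed(1)    # arbitrary seed, for illustration only
rt <- rnorm(50, mean = 500, sd = 80)
bin.means <- vincent.means(rt, bins = 7)

# The exact test may or may not succeed, depending on rounding;
# the tolerance-based test should.
mean(bin.means) == mean(rt)
isTRUE(all.equal(mean(bin.means), mean(rt)))
```

Because every bin holds the same number of values, the mean of the bin
means equals the overall mean on paper; only the rounding differs.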

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
^^^^^^^^^^^^^^^^^^^^^^^^
- NOTE NEW EMAIL ADDRESS



