[R] Numbers that look equal, should be equal, but if() doesn't see as equal (repost with code included)

Paul Lemmens P.Lemmens at nici.kun.nl
Wed May 28 08:33:33 CEST 2003


Hi!

Apologies for sending the mail without any code. Apparently the .R
attachments got filtered out somewhere along the way. I have included the
code below, cleaned up as much as possible. My original mail is below the code.

Thank you again for your time.
regards,
Paul

vincentize <- function(data, bins)
{
	if ( length(data) < 2 )
	{
		stop("The data is really short. Is that ok?");
	}

	if ( bins < 2 )
	{
		stop("A number of bins smaller than 2 just really isn't useful");
	}

	if ( bins > length(data) )
	{
		stop("This is really unusual, although perhaps possible. If you really know what you're doing, maybe you should disable this check!?");
	}
	
	ret <- c();
	for ( i in 1:length(data))
	{
		rt <- data[i];
		b <- 0;
		while ( b < bins )
		{
			ret <- c(ret, rt);
			b <- b+1;
		}
	}

	ret;
}


binify <- function(data, bins, n)
{
	if ( bins < 2 )
	{
		stop("Number of bins is smaller than 2. Nothing to split, exiting.");
	}

	if ( length(data) < 2 )
	{
		stop("The length of the data is really short. Is that ok?");
	}

	if ( bins * n != length(data) )
	{
		stop("Cannot construct bins of equal length.");
	}

	t(array(data, c(n,bins)));
}

mean.bins <- function(data)
{
	# For the vincentizing procedures in vincentize() and binify(),
 	# it made sense to check the data array/vector/matrix. Here,
	# we now just need to check that data is a matrix.
	if ( !is.matrix(data) )
	{
		stop("The data is not in matrix form.");
	}

	means <- c();
	bins <- dim(data)[1];
	for (i in 1:bins)
	{
		means <- c(means, mean(data[i,]));
	}

	# return a vector of means.
	means;
}

bins.factor <- function(data, bins)
{
	if ( !is.data.frame(data) )
	{
		stop("data is not a data frame.");
	}

	source('Ratcliff.r', local=TRUE);
	subject.bin.means <- c();

	attach(data);
	l <- levels(Cond);
	for ( i in 1:length(l) )
	{
		cat("Calculating bins for factor level ", l[i], ".\n", sep="");
		flush.console();

		data <- RT[Cond == l[i]];
		data <- sort(data);

		n <- length(data);
		data.vincent <- vincentize(data,bins);
		data.vincent.bins <- binify(data.vincent, bins, n);
		bin.means <- mean.bins(data.vincent.bins);

		# FAILING TEST.
		mean.orig <- mean(data);
		mean.b <- mean(bin.means);
		if ( mean.b != mean.orig )
		{
			#cat("mean.b\n", str(mean.b), "mean.orig\n", str(mean.orig)); flush.console();
			detach(data);
			stop("Something went wrong calculating the bins: means do not equal.");
		}		
		subject.bin.means <- c(subject.bin.means, bin.means);
	}
	detach(data);

	if ( length(subject.bin.means) != bins*length(l) )
	{
		stop("Inappropriate number of means calculated.");
	}
	else
	{
		subject.bin.means
	}
}

---------- Forwarded Message ----------
Date: dinsdag 27 mei 2003 14:53 +0200
From: Paul Lemmens <P.Lemmens at nici.kun.nl>
To: r-help at stat.math.ethz.ch
Subject: [R] Numbers that look equal, should be equal, but if() doesn't see 
as equal

Hi!

After a lot of testing and debugging I'm at a loss to figure out what goes
wrong in the following.

I'm implementing the Vincentizing procedure that Ratcliff (1979) described.
It's about calculating RT bins for any distribution of RT data. It boils
down to rank ordering your data, replicating each data point as many times
as you need bins, and then splitting up the resulting distribution into
equal bins.
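
The three steps above can be sketched compactly as follows. This is only
an illustration of the procedure as described here, not the code from
Ratcliff.r; the name vincent_bin_means is hypothetical, and it assumes
bins are consecutive runs of n values in the replicated, sorted data.

```r
# Hypothetical helper: Vincentized bin means for a numeric vector `rt`.
vincent_bin_means <- function(rt, bins) {
  n <- length(rt)
  stopifnot(n >= 2, bins >= 2, bins <= n)

  # Steps 1 and 2: rank order the data, then replicate each point `bins` times.
  expanded <- rep(sort(rt), each = bins)

  # Step 3: split the n * bins values into `bins` consecutive groups of n.
  # matrix() fills column-wise, so each column holds one bin.
  m <- matrix(expanded, nrow = n, ncol = bins)
  colMeans(m)
}

rt <- c(4, 2, 1, 3)
bin.means <- vincent_bin_means(rt, 2)
bin.means        # 1.5 3.5
mean(bin.means)  # 2.5, the same as mean(rt) for this small example
```

The values in this toy example are exactly representable as doubles, so
the two means agree exactly; that need not hold for arbitrary data.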

The code that I've written is attached (and not included because it is
considerable in length due to many comments). Ratcliff.r contains some
basic functions and distribution.bins.r contains the problematic function
bins.factor() (problem area marked with 'FAILING TEST'). The final attached
file is the mock up distribution I made.

The failing test checks whether the mean of the per-bin mean RTs equals
the mean of the original distribution. The two are mathematically
equivalent. Sometimes, however, the test fails; with the attached
distribution, most notably for 4, 7, 8, 9, and 13 bins. Since the means
are mathematically equivalent, IMHO it should not be an issue of this
particular distribution. As a matter of fact, I have also tested some
rnorm() distributions and my function fails on those as well (albeit a
little less often than with foobar.txt).
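
For reference, the equivalence can be written out as below. The symbols
are my own notation, not Ratcliff's: x_1..x_n are the sorted data, b is
the number of bins, y_1..y_{nb} is the replicated sequence in which each
x_i appears b times, and m_j is the mean of bin j.

```latex
\begin{aligned}
m_j &= \frac{1}{n}\sum_{k=(j-1)n+1}^{jn} y_k, \qquad j = 1,\dots,b \\
\frac{1}{b}\sum_{j=1}^{b} m_j
  &= \frac{1}{nb}\sum_{k=1}^{nb} y_k
   = \frac{1}{nb}\, b \sum_{i=1}^{n} x_i
   = \frac{1}{n}\sum_{i=1}^{n} x_i
   = \bar{x}
\end{aligned}
```

This holds in exact arithmetic; it says nothing about what happens when
the sums and divisions are carried out in finite precision.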

Problem description: if one calculates the bins or bin means by hand, the
mean of the bin means is visually the same as the overall mean, even with
options(digits=20), but *still* the test fails.
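
One possible explanation, offered as an assumption rather than a
confirmed diagnosis of this code: != compares doubles exactly, and two
mathematically equal results computed in different orders can differ in
the last bits. The sketch below shows the effect on a well-known case
and a tolerance-based comparison using all.equal(); the helper name
means.differ is hypothetical.

```r
# Exact comparison of doubles can fail for mathematically equal values.
x <- 0.1 + 0.2
y <- 0.3
x == y                   # FALSE with IEEE 754 doubles
isTRUE(all.equal(x, y))  # TRUE: equal within the default tolerance

# Hypothetical replacement for the `mean.b != mean.orig` test:
# TRUE only if the two values differ by more than the tolerance.
means.differ <- function(a, b, tol = sqrt(.Machine$double.eps)) {
  !isTRUE(all.equal(a, b, tolerance = tol))
}

means.differ(x, y)    # FALSE
means.differ(1, 1.1)  # TRUE
```

all.equal() returns a character description rather than FALSE when the
values differ, which is why it is wrapped in isTRUE() here.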

IMHO the problem is neither my code nor the distribution I use to test,
but still: can you point out an obvious mistake in my programming, or is
it indeed something about R that I don't yet grasp?

thank you for your help,
Paul


-- 
Paul Lemmens
NICI, University of Nijmegen              ASCII Ribbon Campaign /"\
Montessorilaan 3 (B.01.03)                    Against HTML Mail \ /
NL-6525 HR Nijmegen                                              X
The Netherlands                                                 / \
Phonenumber    +31-24-3612648
Fax            +31-24-3616066


---------- End Forwarded Message ----------





-------------- next part --------------
"RT" "Cond"
"1"  1 "A"
"2"  1 "A"
"3"  1 "A"
"4"  2 "A"
"5"  2 "A"
"6"  3 "A"
"7"  3 "A"
"8"  3 "A"
"9"  3 "A"
"10"  3 "A"
"11"  4 "A"
"12"  4 "A"
"13"  4 "A"
"14"  4 "A"
"15"  5 "A"
"16"  5 "A"
"17"  5 "A"
"18"  5 "A"
"19"  5 "A"
"20"  5 "A"
"21"  5 "A"
"22"  6 "A"
"23"  6 "A"
"24"  6 "A"
"25"  6 "A"
"26"  6 "A"
"27"  6 "A"
"28"  6 "A"
"29"  6 "A"
"30"  6 "A"
"31"  7 "A"
"32"  7 "A"
"33"  7 "A"
"34"  7 "A"
"35"  8 "A"
"36"  8 "A"
"37"  8 "A"
"38"  9 "A"
"39"  9 "A"
"40" 10 "A"
"41"  2 "B"
"42"  2 "B"
"43"  2 "B"
"44"  4 "B"
"45"  4 "B"
"46"  6 "B"
"47"  6 "B"
"48"  6 "B"
"49"  6 "B"
"50"  6 "B"
"51"  8 "B"
"52"  8 "B"
"53"  8 "B"
"54"  8 "B"
"55" 10 "B"
"56" 10 "B"
"57" 10 "B"
"58" 10 "B"
"59" 10 "B"
"60" 10 "B"
"61" 10 "B"
"62" 12 "B"
"63" 12 "B"
"64" 12 "B"
"65" 12 "B"
"66" 12 "B"
"67" 12 "B"
"68" 12 "B"
"69" 12 "B"
"70" 12 "B"
"71" 14 "B"
"72" 14 "B"
"73" 14 "B"
"74" 14 "B"
"75" 16 "B"
"76" 16 "B"
"77" 16 "B"
"78" 18 "B"
"79" 18 "B"
"80" 20 "B"
"81"  3 "C"
"82"  3 "C"
"83"  3 "C"
"84"  6 "C"
"85"  6 "C"
"86"  9 "C"
"87"  9 "C"
"88"  9 "C"
"89"  9 "C"
"90"  9 "C"
"91" 12 "C"
"92" 12 "C"
"93" 12 "C"
"94" 12 "C"
"95" 15 "C"
"96" 15 "C"
"97" 15 "C"
"98" 15 "C"
"99" 15 "C"
"100" 15 "C"
"101" 15 "C"
"102" 18 "C"
"103" 18 "C"
"104" 18 "C"
"105" 18 "C"
"106" 18 "C"
"107" 18 "C"
"108" 18 "C"
"109" 18 "C"
"110" 18 "C"
"111" 21 "C"
"112" 21 "C"
"113" 21 "C"
"114" 21 "C"
"115" 24 "C"
"116" 24 "C"
"117" 24 "C"
"118" 27 "C"
"119" 27 "C"
"120" 30 "C"