[R] Numbers that look equal, should be equal, but if() doesn't see as equal (repost with code included)

Paul Lemmens P.Lemmens at nici.kun.nl
Wed May 28 08:33:33 CEST 2003

```
Hi!

Apologies for sending the mail without any code. Apparently the .R
attachments got filtered out somewhere along the way. I have included the
code below, as clean as possible. My original mail follows the code.

Thank you again for your time.
regards,
Paul

vincentize <- function(data, bins)
{
  if ( length(data) < 2 )
  {
    stop("The data is really short. Is that ok?");
  }

  if ( bins < 2 )
  {
    stop("A number of bins smaller than 2 just really isn't useful");
  }

  if ( bins > length(data) )
  {
    stop("This is really unusual, although perhaps possible. If you really ",
         "know what you're doing, maybe you should disable this check!?");
  }

  # Replicate each data point 'bins' times, keeping the original order.
  ret <- c();
  for ( i in 1:length(data))
  {
    rt <- data[i];
    b <- 0;
    while ( b < bins )
    {
      ret <- c(ret, rt);
      b <- b+1;
    }
  }

  ret;
}

binify <- function(data, bins, n)
{
  if ( bins < 2 )
  {
    stop("Number of bins is smaller than 2. Nothing to split, exiting.");
  }

  if ( length(data) < 2 )
  {
    stop("The length of the data is really short. Is that ok?");
  }

  if ( bins * n != length(data) )
  {
    stop("Cannot construct bins of equal length.");
  }

  t(array(data, c(n,bins)));
}

mean.bins <- function(data)
{
  # For the vincentizing procedures in vincentize() and binify(),
  # it made sense to check the data array/vector/matrix. Here,
  # we now just need to check that data is a matrix.
  if ( !is.matrix(data) )
  {
    stop("The data is not in matrix form.");
  }

  means <- c();
  bins <- dim(data)[1];
  for (i in 1:bins)
  {
    means <- c(means, mean(data[i,]));
  }

  # return a vector of means.
  means;
}

bins.factor <- function(data, bins)
{
  if ( !is.data.frame(data) )
  {
    stop("data is not a data frame.");
  }

  source('Ratcliff.r', local=TRUE);
  subject.bin.means <- c();

  attach(data);
  l <- levels(Cond);
  for ( i in 1:length(l) )
  {
    cat("Calculating bins for factor level ", l[i], ".\n", sep="");
    flush.console();

    data <- RT[Cond == l[i]];
    data <- sort(data);

    n <- length(data);
    data.vincent <- vincentize(data,bins);
    data.vincent.bins <- binify(data.vincent, bins, n);
    bin.means <- mean.bins(data.vincent.bins);

    # FAILING TEST.
    mean.orig <- mean(data);
    mean.b <- mean(bin.means);
    if ( mean.b != mean.orig )
    {
      #cat("mean.b\n", str(mean.b), "mean.orig\n", str(mean.orig));
      flush.console();
      detach(data);
      stop("Something went wrong calculating the bins: means do not equal.");
    }
    subject.bin.means <- c(subject.bin.means, bin.means);
  }
  detach(data);

  if ( length(subject.bin.means) != bins*length(l) )
  {
    stop("Inappropriate number of means calculated.");
  }
  else
  {
    subject.bin.means
  }
}

---------- Forwarded Message ----------
Date: Tuesday 27 May 2003 14:53 +0200
From: Paul Lemmens <P.Lemmens at nici.kun.nl>
To: r-help at stat.math.ethz.ch
Subject: [R] Numbers that look equal, should be equal, but if() doesn't see
as equal

Hi!

After a lot of testing and debugging I'm at a loss to figure out what goes
wrong in the following.

I'm implementing the Vincentizing procedure that Ratcliff (1979) described.
It's about calculating RT bins for any distribution of RT data. It boils
down to rank ordering your data, replicating each data point as many times
as you need bins, and then splitting up the resulting distribution into
bins of equal size.

The code that I've written is attached (and not included because it is
considerable in length due to many comments). Ratcliff.r contains some
basic functions and distribution.bins.r contains the problematic function
bins.factor() (problem area marked with 'FAILING TEST'). The final attached
file is the mock up distribution I made.

The failing test is the check whether the mean of the mean RTs for each bin
equals the mean of the original distribution. These two are mathematically
equivalent. Sometimes, however, the test fails. With the attached
distribution, most notably for 4, 7, 8, 9, and 13 bins. Since the means are
mathematically equivalent, IMHO it should not be an issue of this
particular distribution. As a matter of fact, I have also tested some
rnorm() distributions and my function also fails on those (albeit a little
less often than with foobar.txt).

Problem description: if one calculates the bins or bin means by hand, the
mean of the bin means is visually the same as the overall mean, even with
options(digits=20), but *still* the test fails.

IMHO it's neither my code nor the distribution I use to test, but still,
can you point out an obvious failure in my programming, or is it indeed
something about R that I don't yet grasp?

Paul

--
Paul Lemmens
NICI, University of Nijmegen              ASCII Ribbon Campaign /"\
Montessorilaan 3 (B.01.03)                    Against HTML Mail \ /
NL-6525 HR Nijmegen                                              X
The Netherlands                                                 / \
Phonenumber    +31-24-3612648
Fax            +31-24-3616066

---------- End Forwarded Message ----------

-------------- next part --------------
"RT" "Cond"
"1"  1 "A"
"2"  1 "A"
"3"  1 "A"
"4"  2 "A"
"5"  2 "A"
"6"  3 "A"
"7"  3 "A"
"8"  3 "A"
"9"  3 "A"
"10"  3 "A"
"11"  4 "A"
"12"  4 "A"
"13"  4 "A"
"14"  4 "A"
"15"  5 "A"
"16"  5 "A"
"17"  5 "A"
"18"  5 "A"
"19"  5 "A"
"20"  5 "A"
"21"  5 "A"
"22"  6 "A"
"23"  6 "A"
"24"  6 "A"
"25"  6 "A"
"26"  6 "A"
"27"  6 "A"
"28"  6 "A"
"29"  6 "A"
"30"  6 "A"
"31"  7 "A"
"32"  7 "A"
"33"  7 "A"
"34"  7 "A"
"35"  8 "A"
"36"  8 "A"
"37"  8 "A"
"38"  9 "A"
"39"  9 "A"
"40" 10 "A"
"41"  2 "B"
"42"  2 "B"
"43"  2 "B"
"44"  4 "B"
"45"  4 "B"
"46"  6 "B"
"47"  6 "B"
"48"  6 "B"
"49"  6 "B"
"50"  6 "B"
"51"  8 "B"
"52"  8 "B"
"53"  8 "B"
"54"  8 "B"
"55" 10 "B"
"56" 10 "B"
"57" 10 "B"
"58" 10 "B"
"59" 10 "B"
"60" 10 "B"
"61" 10 "B"
"62" 12 "B"
"63" 12 "B"
"64" 12 "B"
"65" 12 "B"
"66" 12 "B"
"67" 12 "B"
"68" 12 "B"
"69" 12 "B"
"70" 12 "B"
"71" 14 "B"
"72" 14 "B"
"73" 14 "B"
"74" 14 "B"
"75" 16 "B"
"76" 16 "B"
"77" 16 "B"
"78" 18 "B"
"79" 18 "B"
"80" 20 "B"
"81"  3 "C"
"82"  3 "C"
"83"  3 "C"
"84"  6 "C"
"85"  6 "C"
"86"  9 "C"
"87"  9 "C"
"88"  9 "C"
"89"  9 "C"
"90"  9 "C"
"91" 12 "C"
"92" 12 "C"
"93" 12 "C"
"94" 12 "C"
"95" 15 "C"
"96" 15 "C"
"97" 15 "C"
"98" 15 "C"
"99" 15 "C"
"100" 15 "C"
"101" 15 "C"
"102" 18 "C"
"103" 18 "C"
"104" 18 "C"
"105" 18 "C"
"106" 18 "C"
"107" 18 "C"
"108" 18 "C"
"109" 18 "C"
"110" 18 "C"
"111" 21 "C"
"112" 21 "C"
"113" 21 "C"
"114" 21 "C"
"115" 24 "C"
"116" 24 "C"
"117" 24 "C"
"118" 27 "C"
"119" 27 "C"
"120" 30 "C"
```
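
The procedure described in the forwarded message (rank order, replicate each point once per bin, split into equal bins) can be sketched more compactly as below. This is only an illustration under stated assumptions: `vincent_bin_means` is a hypothetical helper name that does not appear in the posted code, and the ten RT values are the first ten rows of condition "A" from the attached data. The last lines contrast an exact `==` comparison with a tolerance-based one via base R's `all.equal()`. Rounding in floating-point arithmetic is one plausible reason a check like the one marked FAILING TEST can trip, although the post itself does not confirm the cause.

```r
# Hypothetical helper (not part of the posted code): Vincentize a numeric
# vector into 'bins' bins and return one mean per bin.
vincent_bin_means <- function(rt, bins) {
  n <- length(rt)
  # Rank order the data, then replicate each data point 'bins' times.
  replicated <- rep(sort(rt), each = bins)
  # Split the n * bins values into 'bins' consecutive groups of n values;
  # byrow = TRUE puts one bin on each row of the matrix.
  rowMeans(matrix(replicated, nrow = bins, ncol = n, byrow = TRUE))
}

# First ten RT values of condition "A" in the attached data.
rt <- c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3)
bin_means <- vincent_bin_means(rt, bins = 4)

# Exact comparison: may or may not hold, depending on rounding.
exact_match <- mean(bin_means) == mean(rt)

# Tolerance-based comparison: all.equal() allows for small numeric error.
close_match <- isTRUE(all.equal(mean(bin_means), mean(rt)))

# A well-known instance of the same effect in double precision:
classic_exact <- (0.1 + 0.2) == 0.3                    # FALSE
classic_close <- isTRUE(all.equal(0.1 + 0.2, 0.3))     # TRUE
```

Wrapping `all.equal()` in `isTRUE()` matters inside `if()`, because `all.equal()` returns a character description of the difference rather than `FALSE` when the values differ.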