[R] Help with possible bug (assigning NA value to data.frame)?

Dan Bolser dmb at mrc-dunn.cam.ac.uk
Tue Jun 7 20:15:22 CEST 2005


This 'strange behaviour' manifests itself within some quite complex
code. When I created a *very* simple example the behaviour disappeared.

Here is the simplest version I have found which still causes the strange
behaviour (it could be quite unrelated to the boot library, however).


library(boot)
 
## boot statistic function
my.mean.s <- function(data,subset){
  mean(data[subset])
}

## dummy data, deliberately no variance
my.test.dat.1 <- rep(4,5)
my.test.dat.2 <- rep(8,5)

## not much can happen here
my.test.boot.1 <- boot( my.test.dat.1, my.mean.s, R=10 )
my.test.boot.2 <- boot( my.test.dat.2, my.mean.s, R=10 )

## returns a null object as ci is meaningless for this data
my.test.boot.ci.1 <- boot.ci(my.test.boot.1,type='normal')
my.test.boot.ci.2 <- boot.ci(my.test.boot.2,type='normal')
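A quick way to see what such a result looks like when it is indexed, sketched in base R only. Here `ci` is a hand-made stand-in for `my.test.boot.ci.1`; it is simply set to NULL, on the assumption (taken from the comment above) that this is what boot.ci returned for the constant data.

```r
## stand-in for my.test.boot.ci.1 (assumed to be NULL, as for the constant data)
ci <- NULL

is.null(ci)              # TRUE

## indexing into NULL gives NULL again, not NA
lower <- ci$normal[2]
is.null(lower)           # TRUE
length(lower)            # 0

## a real missing value has length one
length(NA)               # 1
```

So the value being stored further down is a zero-length NULL rather than an NA.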


## now try to store this data (the problem begins)...

## dummy existing data 
a <- data.frame(matrix(c(1,2,3,4),nrow=2))

## make space for new data
a$X3 <- NA
a$X4 <- NA

## try to store the upper and lower ci (not) calculated above
a[a$X1==1,]$X3 <-  my.test.boot.ci.1$normal[2]
a[a$X1==1,]$X4 <-  my.test.boot.ci.1$normal[3]
a[a$X1==2,]$X3 <-  my.test.boot.ci.1$normal[2]
a[a$X1==2,]$X4 <-  my.test.boot.ci.1$normal[3]

a


What I see is 

> a
  X1 X2 X3 X4
1  1  3 NA  1
2  2  4 NA  2


What I expected to see was

> a
  X1 X2 X3 X4
1  1  3 NA NA
2  2  4 NA NA

Somehow the last assignment of the data from within the null object
assigns the value of the '==x' part of the logical vector subscript.
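One possible explanation, offered as a sketch rather than a confirmed diagnosis: the right-hand side is NULL, and assigning NULL to a data frame column with `$<-` deletes that column. The temporary row subset then has one column fewer than `a` when it is written back. The name `tmp` is hypothetical and only stands for that intermediate subset.

```r
a <- data.frame(matrix(c(1, 2, 3, 4), nrow = 2))
a$X3 <- NA
a$X4 <- NA

## the row subset that the replacement call works on
tmp <- a[a$X1 == 1, ]
names(tmp)            # "X1" "X2" "X3" "X4"

## assigning NULL removes the column instead of storing a missing value
tmp$X3 <- NULL
names(tmp)            # "X1" "X2" "X4"

## three columns now have to be written back into four
ncol(tmp) < ncol(a)   # TRUE
```

If that is what happens, the remaining values would be written back against the wrong columns, which would be consistent with the value of X1 turning up in X4.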

If I make the following (trivial?) adjustment 

a[a$X1==1,]$X4 <-  my.test.boot.ci.1$normal[3]
a[a$X1==1,]$X3 <-  my.test.boot.ci.1$normal[2]
a[a$X1==2,]$X4 <-  my.test.boot.ci.1$normal[3]
a[a$X1==2,]$X3 <-  my.test.boot.ci.1$normal[2]


The output changes to 

> a
  X1 X2 X3 X4
1  1  3  1  1
2  2  4  2  2

Which is even wronger.



Not sure if this is useful without the full context, but here is the
output from the real version of this program (where most of the above code
is within a loop). What is printed out for each cycle of the loop is the
value of the '==x' part of the subscript.


[1] 2
[1] 3
[1] 4
[1] 5
[1] "All values of t are equal to  1 \n Cannot calculate confidence intervals"
[1] 6
[1] 7
[1] "All values of t are equal to  1 \n Cannot calculate confidence intervals"
[1] 8
[1] 10
[1] 11
[1] "All values of t are equal to  1 \n Cannot calculate confidence intervals"
> 


Above you can see that for some values no ci can be calculated (I still
store the result as above), and then...

> dat.5.ho
  CHAINS DOM_PER_CHAIN     lower     upper
1      2      1.416539 1.3626253  1.468387
2      3      1.200000 1.1146014  1.288724
3      4      1.363636 1.2675657  1.462571
4      5      1.000000        NA  5.000000
5      6      1.323529 1.0991974  1.546156
6      7      1.000000        NA  7.000000
7      8      1.100000 0.9037904  1.289210
8     10      1.142857 0.8775104  1.403918
9     11      1.000000        NA 11.000000
> 


Do you spot the same problem? Namely, for each value of the 'CHAINS' column
for which a ci could not be calculated, the second assignment to the data
frame from the 'null' object stored the lookup value of CHAINS in that
column instead! The assignment (within the loop) looks like this...

  dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <-  x.s.ci$normal[2]
  dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <-  x.s.ci$normal[3]

(where chain is the 'loop variable').
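A minimal sketch of a guarded version of that loop body, assuming the missing interval arrives as NULL. The helper `ci_or_na`, the small data frame `dat` and the values in `ci.ok` are all made up for illustration; only the column names and the `$normal[2]` / `$normal[3]` indexing are taken from the code above.

```r
## hypothetical helper: element i of the normal interval, or NA if there is none
ci_or_na <- function(ci, i) {
  if (is.null(ci) || is.null(ci$normal)) NA_real_ else ci$normal[i]
}

## small stand-ins for dat.5.ho and the boot.ci results (values are made up)
dat <- data.frame(CHAINS = c(2, 5), lower = NA_real_, upper = NA_real_)
ci.ok   <- list(normal = matrix(c(0.95, 1.36, 1.47), nrow = 1))
ci.null <- NULL

for (chain in dat$CHAINS) {
  ## pretend the interval could only be computed for chain 2
  x.s.ci <- if (chain == 2) ci.ok else ci.null
  rows <- dat$CHAINS == chain
  ## name the column inside the brackets so only that cell is replaced
  dat[rows, "lower"] <- ci_or_na(x.s.ci, 2)
  dat[rows, "upper"] <- ci_or_na(x.s.ci, 3)
}
dat
```

Writing `dat[rows, "lower"] <- value` instead of `dat[rows, ]$lower <- value` avoids building and writing back a whole row subset, and converting NULL to NA first means a length-one value is always stored.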


As far as I can tell this is a bug. It doesn't happen when I try...
 
  dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <-  NA
  dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <-  NA 


And doing the following (swapping the order) changes the behaviour...

  dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <-  x.s.ci$normal[3]
  dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <-  x.s.ci$normal[2]
  

Giving...

> dat.5.ho
  CHAINS DOM_PER_CHAIN      lower     upper
1      2      1.416539  1.3616070  1.472716
2      3      1.200000  1.1134237  1.287601
3      4      1.363636  1.2587204  1.466037
4      5      1.000000  5.0000000  5.000000
5      6      1.323529  1.1082482  1.547222
6      7      1.000000  7.0000000  7.000000
7      8      1.100000  0.9021282  1.287672
8     10      1.142857  0.8766731  1.403327
9     11      1.000000 11.0000000 11.000000


Which is again incorrect and unexpected (as above).


Please let me know what to do to report this problem better, or if I just
missed something silly.

I am on RH9, with R-2.1.0 (compiled from source) and the latest boot from
CRAN (if that makes a difference).

Cheers,
Dan.



