[R] Help with possible bug (assigning NA value to data.frame) ?

Liaw, Andy andy_liaw at merck.com
Tue Jun 7 21:15:14 CEST 2005


There's something peculiar that I do not understand here.  However, did you
realize that the thing you are assigning into parts of `a' is NULL?  Check
your my.test.boot.ci.1:  it's NULL.
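
A quick way to see this without running boot at all is sketched below.  The
name `ci' is mine, a plain NULL standing in for what boot.ci() returned in
your example:

```r
## Stand-in (assumption) for the boot.ci() result in the example: NULL
ci <- NULL

## `$` on NULL gives NULL, and indexing NULL gives NULL again,
## so both right-hand sides in the original assignments are NULL
lower <- ci$normal[2]
upper <- ci$normal[3]

is.null(lower)   # TRUE
length(upper)    # 0
```

So the right-hand side never was an NA to begin with; it is a zero-length
object.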

Be that as it may, I get:

> a <- data.frame(matrix(1:4, nrow=2), X3=NA, X4=NA)
> a
  X1 X2 X3 X4
1  1  3 NA NA
2  2  4 NA NA
> a[a$X1 == 1,]$X3 <- NULL
> a
  X1 X2 X3 X4
1  1  3 NA  1
2  2  4 NA NA
> a[a$X1 == 1,]$X4 <- NULL
> a
  X1 X2 X3 X4
1  1  3 NA  1
2  2  4 NA NA

which really baffles me...

In any case, that's not how I would assign into part of a data frame.  I
would do either

    a[a$X1 == 1, "X3"] <- something

or

    a$X3[a$X1 == 1] <- something

In either case you'd get an error if `something' is NULL.
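
Here is a minimal sketch of both forms, again with a plain NULL (`ci', my
name) standing in for the boot.ci() result.  The is.null() guard at the end
is only one possible way to store NA when no interval could be computed; it
is not taken from your code:

```r
a <- data.frame(matrix(1:4, nrow = 2), X3 = NA, X4 = NA)
ci <- NULL   # stand-in (assumption) for a boot.ci() result that is NULL

## Both recommended forms signal an error when the value is NULL
res1 <- tryCatch({ a[a$X1 == 1, "X3"] <- ci$normal[2]; "ok" },
                 error = function(e) "error")
res2 <- tryCatch({ a$X3[a$X1 == 1] <- ci$normal[2]; "ok" },
                 error = function(e) "error")

## Guard (my addition): fall back to NA when nothing was computed
lower <- if (is.null(ci)) NA_real_ else ci$normal[2]
upper <- if (is.null(ci)) NA_real_ else ci$normal[3]
a$X3[a$X1 == 1] <- lower
a$X4[a$X1 == 1] <- upper
```

With the guard in place, the row for X1 == 1 keeps NA in both X3 and X4
rather than picking up some other value.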

Andy

> From: Dan Bolser
> 
> 
> This 'strange behaviour' manifests itself within some quite complex
> code. When I created a *very* simple example the behaviour
> disappeared.
> 
> Here is the simplest version I have found which still causes the strange
> behaviour (it could be quite unrelated to the boot library, however).
> 
> 
> library(boot)
>  
> ## boot statistic function
> my.mean.s <- function(data,subset){
>   mean(data[subset])
> }
> 
> ## dummy data, deliberately no variance
> my.test.dat.1 <- rep(4,5)
> my.test.dat.2 <- rep(8,5)
> 
> ## not much can happen here
> my.test.boot.1 <- boot( my.test.dat.1, my.mean.s, R=10 )
> my.test.boot.2 <- boot( my.test.dat.2, my.mean.s, R=10 )
> 
> ## returns a null object as ci is meaningless for this data
> my.test.boot.ci.1 <- boot.ci(my.test.boot.1,type='normal')
> my.test.boot.ci.2 <- boot.ci(my.test.boot.2,type='normal')
> 
> 
> ## now try to store this data (the problem begins)...
> 
> ## dummy existing data 
> a <- data.frame(matrix(c(1,2,3,4),nrow=2))
> 
> ## make space for new data
> a$X3 <- NA
> a$X4 <- NA
> 
> ## try to store the upper and lower ci (not) calculated above
> a[a$X1==1,]$X3 <-  my.test.boot.ci.1$normal[2]
> a[a$X1==1,]$X4 <-  my.test.boot.ci.1$normal[3]
> a[a$X1==2,]$X3 <-  my.test.boot.ci.1$normal[2]
> a[a$X1==2,]$X4 <-  my.test.boot.ci.1$normal[3]
> 
> a
> 
> 
> What I see is 
> 
> > a
>   X1 X2 X3 X4
> 1  1  3 NA  1
> 2  2  4 NA  2
> 
> 
> What I expected to see was
> 
> > a
>   X1 X2 X3 X4
> 1  1  3 NA NA
> 2  2  4 NA NA
> 
> Somehow the last assignment of the data from within the null object
> assigns the value of the '==x' part of the logical vector subscript.
> 
> If I make the following (trivial?) adjustment 
> 
> a[a$X1==1,]$X4 <-  my.test.boot.ci.1$normal[3]
> a[a$X1==1,]$X3 <-  my.test.boot.ci.1$normal[2]
> a[a$X1==2,]$X4 <-  my.test.boot.ci.1$normal[3]
> a[a$X1==2,]$X3 <-  my.test.boot.ci.1$normal[2]
> 
> 
> The output changes to 
> 
> > a
>   X1 X2 X3 X4
> 1  1  3  1  1
> 2  2  4  2  2
> 
> Which is even wronger.
> 
> 
> 
> Not sure if this is useful without the full context, but here is the
> output from the real version of this program (where most of the above
> code is within a loop). What is printed out for each cycle of the loop
> is the value of the '==x' part of the subscript.
> 
> 
> [1] 2
> [1] 3
> [1] 4
> [1] 5
> [1] "All values of t are equal to  1 \n Cannot calculate confidence
> intervals"
> [1] 6
> [1] 7
> [1] "All values of t are equal to  1 \n Cannot calculate confidence
> intervals"
> [1] 8
> [1] 10
> [1] 11
> [1] "All values of t are equal to  1 \n Cannot calculate confidence
> intervals"
> > 
> 
> 
> Above you see that for some values I can't calculate a ci (but I store
> it as above anyway), then...
> 
> > dat.5.ho
>   CHAINS DOM_PER_CHAIN     lower     upper
> 1      2      1.416539 1.3626253  1.468387
> 2      3      1.200000 1.1146014  1.288724
> 3      4      1.363636 1.2675657  1.462571
> 4      5      1.000000        NA  5.000000
> 5      6      1.323529 1.0991974  1.546156
> 6      7      1.000000        NA  7.000000
> 7      8      1.100000 0.9037904  1.289210
> 8     10      1.142857 0.8775104  1.403918
> 9     11      1.000000        NA 11.000000
> > 
> 
> 
> Do you spot the same problem? Namely, for each value of the 'CHAINS'
> column for which a ci could not be calculated, the second assignment to
> the data table from the 'null' object assigned the lookup value of
> CHAINS to that column instead! The assignment (within the loop) looks
> like this...
> 
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <-  x.s.ci$normal[2]
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <-  x.s.ci$normal[3]
> 
> (where chain is the 'loop variable').
> 
> 
> As far as I can tell this is a bug. It doesn't happen when I try...
>  
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <-  NA
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <-  NA 
> 
> 
> And doing the following (swapping the order) changes the behaviour...
> 
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <-  x.s.ci$normal[3]
>   dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <-  x.s.ci$normal[2]
>   
> 
> Giving...
> 
> > dat.5.ho
>   CHAINS DOM_PER_CHAIN      lower     upper
> 1      2      1.416539  1.3616070  1.472716
> 2      3      1.200000  1.1134237  1.287601
> 3      4      1.363636  1.2587204  1.466037
> 4      5      1.000000  5.0000000  5.000000
> 5      6      1.323529  1.1082482  1.547222
> 6      7      1.000000  7.0000000  7.000000
> 7      8      1.100000  0.9021282  1.287672
> 8     10      1.142857  0.8766731  1.403327
> 9     11      1.000000 11.0000000 11.000000
> 
> 
> Which is again incorrect and unpredicted (as above). 
> 
> 
> Please let me know what to do to report this problem better, or if I
> just missed something silly.
> 
> I am on RH9, R-2.1.0 (compiled from source), latest boot from CRAN (if
> that makes a difference).
> 
> Cheers,
> Dan.
> 



