[R] Help with possible bug (assigning NA value to data.frame) ?

James Reilly reilly at stat.auckland.ac.nz
Wed Jun 8 01:36:41 CEST 2005


This seems to have more to do with NULLs than NAs. For instance:
> a <- data.frame(matrix(1:8, nrow=2))
> a
  X1 X2 X3 X4
1  1  3  5  7
2  2  4  6  8
> a[a$X2 == 4,]$X1 <- NULL
> a
  X1 X2 X3 X4
1  1  3  5  7
2  4  6  8  4
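
My reading of what is happening (a sketch, not checked against the R
sources): a[i,]$X1 <- NULL first extracts the one-row data frame a[i,],
then applies $X1 <- NULL to that copy, and only then assigns the result
back into row i of a. Assigning NULL to a column deletes it, so the copy
has three columns left and the write-back has to fit three values into
four columns. The output above looks as if those three values (4, 6, 8)
were recycled, with the fourth slot getting the first value again. The
first two steps can be checked directly; the name `sub' is only for
illustration:

```r
a <- data.frame(matrix(1:8, nrow = 2))

# Step 1: the extraction that a[a$X2 == 4, ]$X1 <- NULL performs first
sub <- a[a$X2 == 4, ]

# Step 2: assigning NULL to a column of a data frame removes that column
sub$X1 <- NULL

names(sub)  # the one-row copy now has only X2, X3 and X4
ncol(a)     # the original still has 4 columns to be filled
```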

James

On 8/06/2005 7:15 a.m., Liaw, Andy wrote:
> There's something peculiar that I do not understand here.  However, did you
> realize that the thing you are assigning into parts of `a' is NULL?  Check
> your my.test.boot.ci.1:  It's NULL.
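
For anyone following along, this can be seen without running boot at
all: indexing into NULL just gives NULL again, so every right-hand side
in the original code is NULL. Here `ci' is a stand-in for
my.test.boot.ci.1, assuming (as Andy says) that boot.ci returned NULL:

```r
ci <- NULL  # stand-in for my.test.boot.ci.1 (assumed to be NULL)

is.null(ci)            # TRUE
is.null(ci$normal)     # TRUE: $ on NULL returns NULL
is.null(ci$normal[2])  # TRUE: indexing NULL returns NULL
length(ci$normal[3])   # 0, so there is nothing to assign
```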
> 
> Be that as it may, I get:
> 
> 
> > a <- data.frame(matrix(1:4, nrow=2), X3=NA, X4=NA)
> > a
>   X1 X2 X3 X4
> 1  1  3 NA NA
> 2  2  4 NA NA
> > a[a$X1 == 1,]$X3 <- NULL
> > a
>   X1 X2 X3 X4
> 1  1  3 NA  1
> 2  2  4 NA NA
> > a[a$X1 == 1,]$X4 <- NULL
> > a
>   X1 X2 X3 X4
> 1  1  3 NA  1
> 2  2  4 NA NA
> 
> which really baffles me...
> 
> In any case, that's not how I would assign into part of a data frame.  I
> would do either
> 
>     a[a$X1 == 1, "X3"] <- something
> 
> or
> 
>     a$X3[a$X1 == 1] <- something
> 
> In either case you'd get an error if `something' is NULL.
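
A sketch of that second form with a guard, so that a missing interval is
stored as NA rather than stopping with an error. The helper `ci_bound'
and the fake interval `ok' are made up for illustration; the only thing
assumed about the boot.ci result is that elements 2 and 3 of its
`normal' component are the lower and upper limits, which is how the
original code indexes it.

```r
# Hypothetical helper: k-th element of ci$normal, or NA if there is no interval
ci_bound <- function(ci, k) {
  if (is.null(ci) || is.null(ci$normal)) NA_real_ else ci$normal[k]
}

a <- data.frame(matrix(c(1, 2, 3, 4), nrow = 2))
a$X3 <- NA_real_
a$X4 <- NA_real_

none <- NULL                                                # no interval available
ok   <- list(normal = matrix(c(0.95, 1.2, 1.8), nrow = 1))  # made-up interval

# Index the column first, then the rows, as in a$X3[a$X1 == 1] <- something
a$X3[a$X1 == 1] <- ci_bound(none, 2)
a$X4[a$X1 == 1] <- ci_bound(none, 3)
a$X3[a$X1 == 2] <- ci_bound(ok, 2)
a$X4[a$X1 == 2] <- ci_bound(ok, 3)
a
```

Because the column is selected before the rows, the data frame itself is
never rebuilt from a shortened copy, so the other columns cannot be
disturbed.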
> 
> Andy
> 
> 
>>From: Dan Bolser
>>
>>
>>This 'strange behaviour' manifests itself within some quite complex
>>code. When I created a *very* simple example the behaviour
>>disappeared.
>>
>>Here is the simplest version I have found which still causes the
>>strange behaviour (it could be quite unrelated to the boot library,
>>however).
>>
>>
>>library(boot)
>> 
>>## boot statistic function
>>my.mean.s <- function(data,subset){
>>  mean(data[subset])
>>}
>>
>>## dummy data, deliberately no variance
>>my.test.dat.1 <- rep(4,5)
>>my.test.dat.2 <- rep(8,5)
>>
>>## not much can happen here
>>my.test.boot.1 <- boot( my.test.dat.1, my.mean.s, R=10 )
>>my.test.boot.2 <- boot( my.test.dat.2, my.mean.s, R=10 )
>>
>>## returns a null object as ci is meaningless for this data
>>my.test.boot.ci.1 <- boot.ci(my.test.boot.1,type='normal')
>>my.test.boot.ci.2 <- boot.ci(my.test.boot.2,type='normal')
>>
>>
>>## now try to store this data (the problem begins)...
>>
>>## dummy existing data 
>>a <- data.frame(matrix(c(1,2,3,4),nrow=2))
>>
>>## make space for new data
>>a$X3 <- NA
>>a$X4 <- NA
>>
>>## try to store the upper and lower ci (not) calculated above
>>a[a$X1==1,]$X3 <-  my.test.boot.ci.1$normal[2]
>>a[a$X1==1,]$X4 <-  my.test.boot.ci.1$normal[3]
>>a[a$X1==2,]$X3 <-  my.test.boot.ci.1$normal[2]
>>a[a$X1==2,]$X4 <-  my.test.boot.ci.1$normal[3]
>>
>>a
>>
>>
>>What I see is 
>>
>>
>>>a
>>
>>  X1 X2 X3 X4
>>1  1  3 NA  1
>>2  2  4 NA  2
>>
>>
>>What I expected to see was
>>
>>
>>>a
>>
>>  X1 X2 X3 X4
>>1  1  3 NA NA
>>2  2  4 NA NA
>>
>>Somehow the last assignment of the data from within the null object
>>assigns the value of the '==x' part of the logical vector subscript.
>>
>>If I make the following (trivial?) adjustment 
>>
>>a[a$X1==1,]$X4 <-  my.test.boot.ci.1$normal[3]
>>a[a$X1==1,]$X3 <-  my.test.boot.ci.1$normal[2]
>>a[a$X1==2,]$X4 <-  my.test.boot.ci.1$normal[3]
>>a[a$X1==2,]$X3 <-  my.test.boot.ci.1$normal[2]
>>
>>
>>The output changes to 
>>
>>
>>>a
>>
>>  X1 X2 X3 X4
>>1  1  3  1  1
>>2  2  4  2  2
>>
>>Which is even wronger.
>>
>>
>>
>>Not sure if this is useful without the full context, but here is the
>>output from the real version of this program (where most of the above
>>code is within a loop). What is printed out for each cycle of the loop
>>is the value of the '==x' part of the subscript.
>>
>>
>>[1] 2
>>[1] 3
>>[1] 4
>>[1] 5
>>[1] "All values of t are equal to  1 \n Cannot calculate confidence intervals"
>>[1] 6
>>[1] 7
>>[1] "All values of t are equal to  1 \n Cannot calculate confidence intervals"
>>[1] 8
>>[1] 10
>>[1] 11
>>[1] "All values of t are equal to  1 \n Cannot calculate confidence intervals"
>>
>>
>>Above you see that for some values I can't calculate a ci (but I still
>>try to store it as above), then...
>>
>>
>>>dat.5.ho
>>
>>  CHAINS DOM_PER_CHAIN     lower     upper
>>1      2      1.416539 1.3626253  1.468387
>>2      3      1.200000 1.1146014  1.288724
>>3      4      1.363636 1.2675657  1.462571
>>4      5      1.000000        NA  5.000000
>>5      6      1.323529 1.0991974  1.546156
>>6      7      1.000000        NA  7.000000
>>7      8      1.100000 0.9037904  1.289210
>>8     10      1.142857 0.8775104  1.403918
>>9     11      1.000000        NA 11.000000
>>
>>
>>Do you spot the same problem? Namely, for each value of the 'CHAINS'
>>column for which a ci could not be calculated, the second assignment to
>>the data table from the 'null' object assigned the lookup value of
>>CHAINS to that column instead! The assignment (within the loop) looks
>>like this...
>>
>>  dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <-  x.s.ci$normal[2]
>>  dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <-  x.s.ci$normal[3]
>>
>>(where chain is the 'loop variable').
>>
>>
>>As far as I can tell this is a bug. It doesn't happen when I try...
>> 
>>  dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <-  NA
>>  dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <-  NA 
>>
>>
>>And doing the following (swapping the order) changes the behaviour...
>>
>>  dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <-  x.s.ci$normal[3]
>>  dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <-  x.s.ci$normal[2]
>>  
>>
>>Giving...
>>
>>
>>>dat.5.ho
>>
>>  CHAINS DOM_PER_CHAIN      lower     upper
>>1      2      1.416539  1.3616070  1.472716
>>2      3      1.200000  1.1134237  1.287601
>>3      4      1.363636  1.2587204  1.466037
>>4      5      1.000000  5.0000000  5.000000
>>5      6      1.323529  1.1082482  1.547222
>>6      7      1.000000  7.0000000  7.000000
>>7      8      1.100000  0.9021282  1.287672
>>8     10      1.142857  0.8766731  1.403327
>>9     11      1.000000 11.0000000 11.000000
>>
>>
>>Which is again incorrect and unexpected (as above).
>>
>>
>>Please let me know what to do to report this problem better, or if I
>>just missed something silly.
>>
>>I am on RH9, R-2.1.0 (compiled from source), latest boot from CRAN (if
>>that makes a difference).
>>
>>Cheers,
>>Dan.
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide! 
>>http://www.R-project.org/posting-guide.html
>>
>>
>>
> 
> 

-- 
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand



