[R] Help with possible bug (assigning NA value to data.frame) ?
James Reilly
reilly at stat.auckland.ac.nz
Wed Jun 8 01:36:41 CEST 2005
This seems to have more to do with NULLs than NAs. For instance:
> a <- data.frame(matrix(1:8, nrow=2))
> a
X1 X2 X3 X4
1 1 3 5 7
2 2 4 6 8
> a[a$X2 == 4,]$X1 <- NULL
> a
X1 X2 X3 X4
1 1 3 5 7
2 4 6 8 4
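A minimal sketch of what seems to be going on, assuming R's usual expansion of a nested replacement: `a[i,]$X1 <- NULL` behaves like `tmp <- a[i,]; tmp$X1 <- NULL; a[i,] <- tmp`. The names `i` and `tmp` are illustrative only. `$<-` with NULL deletes a column rather than storing NA, so the row written back has only three values. `[<-.data.frame` then appears to recycle those three values across the four columns.

```r
# Reproduce James's example one step at a time (i and tmp are illustrative names)
a <- data.frame(matrix(1:8, nrow = 2))
i <- a$X2 == 4            # logical row index: FALSE TRUE

tmp <- a[i, ]             # one-row data frame with columns X1, X2, X3, X4
tmp$X1 <- NULL            # `$<-` with NULL deletes column X1 instead of storing NA
print(names(tmp))         # "X2" "X3" "X4"

a[i, ] <- tmp             # three values written back into four columns
print(a)                  # the R 2.1.0 transcript above shows row 2 as 4 6 8 4

# For contrast, assigning NA keeps the column and stores a missing value
tmp.na <- data.frame(matrix(1:8, nrow = 2))[i, ]
tmp.na$X1 <- NA
print(names(tmp.na))      # "X1" "X2" "X3" "X4"
```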
James
On 8/06/2005 7:15 a.m., Liaw, Andy wrote:
> There's something peculiar that I do not understand here. However, did you
> realize that the thing you are assigning into parts of `a' is NULL? Check
> your my.test.boot.ci.1: it's NULL.
>
> Be that as it may, I get:
>
>
> > a <- data.frame(matrix(1:4, nrow=2), X3=NA, X4=NA)
> > a
>
> X1 X2 X3 X4
> 1 1 3 NA NA
> 2 2 4 NA NA
>
> > a[a$X1 == 1,]$X3 <- NULL
> > a
>
> X1 X2 X3 X4
> 1 1 3 NA 1
> 2 2 4 NA NA
>
> > a[a$X1 == 1,]$X4 <- NULL
> > a
>
> X1 X2 X3 X4
> 1 1 3 NA 1
> 2 2 4 NA NA
>
> which really baffles me...
>
> In any case, that's not how I would assign into part of a data frame. I
> would do either
>
> a[a$X1 == 1, "X3"] <- something
>
> or
>
> a$X3[a$X1 == 1] <- something
>
> In either case you'd get an error if `something' is NULL.
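A quick check of Andy's last point, assuming `something` is the NULL returned by `boot.ci()`: both of his idioms stop with an error rather than silently writing wrong values. `tryCatch()` is used here only so the script keeps running and the messages can be printed. The exact wording of the messages may vary between R versions.

```r
a <- data.frame(matrix(1:4, nrow = 2), X3 = NA, X4 = NA)
something <- NULL  # stands in for the NULL returned by boot.ci()

# Capture the error message (if any) produced by each recommended idiom
msg1 <- tryCatch({ a[a$X1 == 1, "X3"] <- something; "no error" },
                 error = function(e) conditionMessage(e))
msg2 <- tryCatch({ a$X3[a$X1 == 1] <- something; "no error" },
                 error = function(e) conditionMessage(e))
print(msg1)
print(msg2)
print(a)  # the failed assignments leave `a` untouched: X3 and X4 are still NA
```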
>
> Andy
>
>
>>From: Dan Bolser
>>
>>
>>This 'strange behaviour' manifests itself within some quite complex
>>code. When I created a *very* simple example, the behaviour
>>disappeared.
>>
>>Here is the simplest version I have found that still causes the
>>strange behaviour (it could be quite unrelated to the boot library,
>>however).
>>
>>
>>library(boot)
>>
>>## boot statistic function
>>my.mean.s <- function(data,subset){
>> mean(data[subset])
>>}
>>
>>## dummy data, deliberately no variance
>>my.test.dat.1 <- rep(4,5)
>>my.test.dat.2 <- rep(8,5)
>>
>>## not much can happen here
>>my.test.boot.1 <- boot( my.test.dat.1, my.mean.s, R=10 )
>>my.test.boot.2 <- boot( my.test.dat.2, my.mean.s, R=10 )
>>
>>## returns a null object as ci is meaningless for this data
>>my.test.boot.ci.1 <- boot.ci(my.test.boot.1,type='normal')
>>my.test.boot.ci.2 <- boot.ci(my.test.boot.2,type='normal')
>>
>>
>>## now try to store this data (the problem begins)...
>>
>>## dummy existing data
>>a <- data.frame(matrix(c(1,2,3,4),nrow=2))
>>
>>## make space for new data
>>a$X3 <- NA
>>a$X4 <- NA
>>
>>## try to store the upper and lower ci (not) calculated above
>>a[a$X1==1,]$X3 <- my.test.boot.ci.1$normal[2]
>>a[a$X1==1,]$X4 <- my.test.boot.ci.1$normal[3]
>>a[a$X1==2,]$X3 <- my.test.boot.ci.1$normal[2]
>>a[a$X1==2,]$X4 <- my.test.boot.ci.1$normal[3]
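A minimal sketch of what the right-hand sides above evaluate to when `boot.ci()` returns NULL: `$` and `[` applied to NULL give NULL again, not NA, so each line is effectively `a[...]$X3 <- NULL`. The helper `ci.bound()` is hypothetical (it is not part of boot). It simply substitutes `NA_real_` when there is no CI object, and is combined here with Andy's `a$X3[...] <- ...` idiom.

```r
ci <- NULL                 # stands in for the NULL returned by boot.ci() above
lower <- ci$normal[2]      # `$` and `[` on NULL both return NULL, not NA
print(is.null(lower))      # TRUE

# Hypothetical helper (not part of boot): return NA instead of NULL
ci.bound <- function(ci, k) {
  if (is.null(ci)) NA_real_ else ci$normal[k]
}

a <- data.frame(matrix(c(1, 2, 3, 4), nrow = 2))
a$X3 <- NA
a$X4 <- NA
# Index the column vector directly and guard against NULL:
# the cells stay NA and X1 is never copied into them
a$X3[a$X1 == 1] <- ci.bound(ci, 2)
a$X4[a$X1 == 1] <- ci.bound(ci, 3)
print(a)
```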
>>
>>a
>>
>>
>>What I see is
>>
>>
>> > a
>>
>> X1 X2 X3 X4
>>1 1 3 NA 1
>>2 2 4 NA 2
>>
>>
>>What I expected to see was
>>
>>
>> > a
>>
>> X1 X2 X3 X4
>>1 1 3 NA NA
>>2 2 4 NA NA
>>
>>Somehow the last assignment of data from the NULL object stores the
>>value of the '==x' part of the logical vector subscript instead.
>>
>>If I make the following (trivial?) adjustment
>>
>>a[a$X1==1,]$X4 <- my.test.boot.ci.1$normal[3]
>>a[a$X1==1,]$X3 <- my.test.boot.ci.1$normal[2]
>>a[a$X1==2,]$X4 <- my.test.boot.ci.1$normal[3]
>>a[a$X1==2,]$X3 <- my.test.boot.ci.1$normal[2]
>>
>>
>>The output changes to
>>
>>
>> > a
>>
>> X1 X2 X3 X4
>>1 1 3 1 1
>>2 2 4 2 2
>>
>>Which is even more wrong.
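Both outputs can be reproduced by hand. Below is a hypothetical model of what each `a[i,]$col <- NULL` line did to row i. It is not R's internal code, and `null.assign.row` is an illustrative name. The named column is dropped, and the remaining three values are recycled across all four columns when written back, which is why the order of the two assignments matters.

```r
# Hypothetical model of one `a[i, ]$col <- NULL` step on a single row
null.assign.row <- function(row, col) {
  kept <- unname(row[names(row) != col])            # `$<-` with NULL drops `col`
  setNames(rep_len(kept, length(row)), names(row))  # shorter row is recycled
}

start <- c(X1 = 1, X2 = 3, X3 = NA, X4 = NA)        # row 1 before the assignments

# Original order: X3 first, then X4
orig.order <- null.assign.row(null.assign.row(start, "X3"), "X4")
print(orig.order)   # 1 3 NA 1, matching the first output

# Swapped order: X4 first, then X3
swapped <- null.assign.row(null.assign.row(start, "X4"), "X3")
print(swapped)      # 1 3 1 1, matching the second output
```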
>>
>>
>>
>>Not sure if this is useful without the full context, but here is the
>>output from the real version of this program (where most of the above
>>code is within a loop). What is printed for each cycle of the loop is
>>the value of the '==x' part of the subscript.
>>
>>
>>[1] 2
>>[1] 3
>>[1] 4
>>[1] 5
>>[1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
>>[1] 6
>>[1] 7
>>[1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
>>[1] 8
>>[1] 10
>>[1] 11
>>[1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
>>
>>
>>Above you can see that for some values I can't calculate a CI (but I
>>still store it as above); then...
>>
>>
>> > dat.5.ho
>>
>> CHAINS DOM_PER_CHAIN lower upper
>>1 2 1.416539 1.3626253 1.468387
>>2 3 1.200000 1.1146014 1.288724
>>3 4 1.363636 1.2675657 1.462571
>>4 5 1.000000 NA 5.000000
>>5 6 1.323529 1.0991974 1.546156
>>6 7 1.000000 NA 7.000000
>>7 8 1.100000 0.9037904 1.289210
>>8 10 1.142857 0.8775104 1.403918
>>9 11 1.000000 NA 11.000000
>>
>>
>>Do you spot the same problem? Namely, for each value of the 'CHAINS'
>>column for which no CI could be calculated, the second assignment to
>>the data table from the 'null' object stored the lookup value of
>>CHAINS in that column instead! The assignment (within the loop) looks
>>like this...
>>
>> dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- x.s.ci$normal[2]
>> dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- x.s.ci$normal[3]
>>
>>(where chain is the 'loop variable').
>>
>>
>>As far as I can tell this is a bug. It doesn't happen when I try...
>>
>> dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- NA
>> dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- NA
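A sketch of the loop with an explicit NULL guard, which amounts to doing the NA assignment above automatically. The data frame `dat`, the list `cis` and all numbers are made up for illustration. The fake CI objects only mimic the `$normal` matrix (confidence level, then the two endpoints) that the code above indexes with `[2]` and `[3]`.

```r
# Made-up stand-ins: one chain with a CI, one where boot.ci() gave NULL
dat <- data.frame(CHAINS = c(2, 5), lower = NA_real_, upper = NA_real_)
cis <- list("2" = list(normal = matrix(c(0.95, 1.36, 1.47), nrow = 1)),
            "5" = NULL)

for (chain in dat$CHAINS) {
  ci <- cis[[as.character(chain)]]   # NULL when no CI could be computed
  bounds <- if (is.null(ci)) c(NA_real_, NA_real_) else ci$normal[2:3]
  # Andy's idiom: had `bounds` been NULL, these lines would stop with an error
  dat$lower[dat$CHAINS == chain] <- bounds[1]
  dat$upper[dat$CHAINS == chain] <- bounds[2]
}
print(dat)
```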
>>
>>
>>And doing the following (swapping the order) changes the behaviour...
>>
>> dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- x.s.ci$normal[3]
>> dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- x.s.ci$normal[2]
>>
>>
>>Giving...
>>
>>
>> > dat.5.ho
>>
>> CHAINS DOM_PER_CHAIN lower upper
>>1 2 1.416539 1.3616070 1.472716
>>2 3 1.200000 1.1134237 1.287601
>>3 4 1.363636 1.2587204 1.466037
>>4 5 1.000000 5.0000000 5.000000
>>5 6 1.323529 1.1082482 1.547222
>>6 7 1.000000 7.0000000 7.000000
>>7 8 1.100000 0.9021282 1.287672
>>8 10 1.142857 0.8766731 1.403327
>>9 11 1.000000 11.0000000 11.000000
>>
>>
>>Which is again incorrect and unexpected (as above).
>>
>>
>>Please let me know what I can do to report this problem better, or
>>whether I have just missed something silly.
>>
>>I am running RH9, R-2.1.0 (compiled from source) and the latest boot
>>package from CRAN (if that makes a difference).
>>
>>Cheers,
>>Dan.
>>
>>______________________________________________
>>R-help at stat.math.ethz.ch mailing list
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide!
>>http://www.R-project.org/posting-guide.html
>>
>>
>>
--
James Reilly
Department of Statistics, University of Auckland
Private Bag 92019, Auckland, New Zealand