[R] Help with possible bug (assigning NA value to data.frame) ?
Liaw, Andy
andy_liaw at merck.com
Tue Jun 7 21:15:14 CEST 2005
There's something peculiar that I do not understand here. However, did you
realize that the thing you are assigning into parts of `a' is NULL? Check
your my.test.boot.ci.1: it's NULL.
Be that as it may, I get:
> a <- data.frame(matrix(1:4, nrow=2), X3=NA, X4=NA)
> a
  X1 X2 X3 X4
1  1  3 NA NA
2  2  4 NA NA
> a[a$X1 == 1,]$X3 <- NULL
> a
  X1 X2 X3 X4
1  1  3 NA  1
2  2  4 NA NA
> a[a$X1 == 1,]$X4 <- NULL
> a
  X1 X2 X3 X4
1  1  3 NA  1
2  2  4 NA NA
which really baffles me...
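A possible explanation, assuming R evaluates the nested assignment a[a$X1 == 1,]$X3 <- NULL in its usual two steps, roughly a <- "[<-"(a, a$X1 == 1, , value = "$<-"(a[a$X1 == 1,], "X3", NULL)). The inner "$<-" works on a one-row copy, and assigning NULL to a data-frame column drops that column. This sketch reproduces only that inner step:

```r
a <- data.frame(matrix(1:4, nrow = 2), X3 = NA, X4 = NA)

# Inner step of a[a$X1 == 1, ]$X3 <- NULL:
# extract the matching rows, then apply `$<-` with NULL to that copy
sub <- a[a$X1 == 1, ]
sub$X3 <- NULL      # on a data frame, assigning NULL removes the column

names(sub)          # "X1" "X2" "X4"  (3 columns, not 4)
ncol(sub)           # 3
```

The outer "[<-" then receives a 3-column value for a 4-column target. The outputs in this thread are consistent with R 2.1.0 recycling those 3 values across the 4 columns (X3 gets the old X4, X4 gets X1, i.e. the '==x' value). That is inferred from the printed results rather than checked, and newer versions of R may stop with an error instead.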
In any case, that's not how I would assign into part of a data frame. I
would do either
a[a$X1 == 1, "X3"] <- something
or
a$X3[a$X1 == 1] <- something
In either case you'd get an error if `something' is NULL.
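A minimal sketch of the two suggested forms, with a made-up ci <- NULL standing in for my.test.boot.ci.1. The helper null_to_na() is hypothetical (not part of R or boot); it only makes the fallback to NA explicit.

```r
a <- data.frame(matrix(1:4, nrow = 2), X3 = NA, X4 = NA)
ci <- NULL  # plays the role of my.test.boot.ci.1, which is NULL here

# Hypothetical helper (not from R or boot): map a NULL bound to NA
null_to_na <- function(x) if (is.null(x)) NA else x

# With vector indexing, a NULL value stops with an error
# instead of silently changing other columns
res <- tryCatch({
  a$X3[a$X1 == 1] <- ci$normal[2]
  "assigned"
}, error = function(e) "error")
res  # "error"

# Converting NULL to NA first stores what was actually meant
a[a$X1 == 1, "X3"] <- null_to_na(ci$normal[2])
a$X4[a$X1 == 1]    <- null_to_na(ci$normal[3])
a
```

Both forms touch only the named column of the selected rows, so X1 and X2 are left alone whatever the value turns out to be.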
Andy
> From: Dan Bolser
>
>
> This 'strange behaviour' manifests itself within some quite complex
> code. When I created a *very* simple example, the behaviour
> disappeared.
>
> Here is the simplest version I have found which still causes the strange
> behaviour (it could be quite unrelated to the boot library, however).
>
>
> library(boot)
>
> ## boot statistic function
> my.mean.s <- function(data,subset){
> mean(data[subset])
> }
>
> ## dummy data, deliberately no variance
> my.test.dat.1 <- rep(4,5)
> my.test.dat.2 <- rep(8,5)
>
> ## not much can happen here
> my.test.boot.1 <- boot( my.test.dat.1, my.mean.s, R=10 )
> my.test.boot.2 <- boot( my.test.dat.2, my.mean.s, R=10 )
>
> ## returns a null object as ci is meaningless for this data
> my.test.boot.ci.1 <- boot.ci(my.test.boot.1,type='normal')
> my.test.boot.ci.2 <- boot.ci(my.test.boot.2,type='normal')
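Since boot.ci() hands back NULL here, it helps to see what the later right-hand sides evaluate to. A small sketch, with ci <- NULL standing in for my.test.boot.ci.1: extracting from NULL with $ or [ silently gives NULL again, not NA and not an error.

```r
ci <- NULL               # stands in for my.test.boot.ci.1

lower <- ci$normal[2]    # NULL$normal is NULL, and NULL[2] is NULL too
upper <- ci$normal[3]
is.null(lower)           # TRUE
length(upper)            # 0
```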
>
>
> ## now try to store this data (the problem begins)...
>
> ## dummy existing data
> a <- data.frame(matrix(c(1,2,3,4),nrow=2))
>
> ## make space for new data
> a$X3 <- NA
> a$X4 <- NA
>
> ## try to store the upper and lower ci (not) calculated above
> a[a$X1==1,]$X3 <- my.test.boot.ci.1$normal[2]
> a[a$X1==1,]$X4 <- my.test.boot.ci.1$normal[3]
> a[a$X1==2,]$X3 <- my.test.boot.ci.1$normal[2]
> a[a$X1==2,]$X4 <- my.test.boot.ci.1$normal[3]
>
> a
>
>
> What I see is
>
> > a
>   X1 X2 X3 X4
> 1  1  3 NA  1
> 2  2  4 NA  2
>
>
> What I expected to see was
>
> > a
>   X1 X2 X3 X4
> 1  1  3 NA NA
> 2  2  4 NA NA
>
> Somehow the last assignment of the data from within the NULL object
> assigns the value of the '==x' part of the logical vector subscript.
>
> If I make the following (trivial?) adjustment
>
> a[a$X1==1,]$X4 <- my.test.boot.ci.1$normal[3]
> a[a$X1==1,]$X3 <- my.test.boot.ci.1$normal[2]
> a[a$X1==2,]$X4 <- my.test.boot.ci.1$normal[3]
> a[a$X1==2,]$X3 <- my.test.boot.ci.1$normal[2]
>
>
> The output changes to
>
> > a
>   X1 X2 X3 X4
> 1  1  3  1  1
> 2  2  4  2  2
>
> Which is even wronger.
>
>
>
> Not sure if this is useful without the full context, but here is the
> output from the real version of this program (where most of the above code
> is within a loop). What is printed out for each cycle of the loop is the
> value of the '==x' part of the subscript.
>
>
> [1] 2
> [1] 3
> [1] 4
> [1] 5
> [1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
> [1] 6
> [1] 7
> [1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
> [1] 8
> [1] 10
> [1] 11
> [1] "All values of t are equal to 1 \n Cannot calculate confidence intervals"
> >
>
>
> Above you see that for some values I can't calculate a ci (but I still store
> it as above), then...
>
> > dat.5.ho
>   CHAINS DOM_PER_CHAIN     lower     upper
> 1      2      1.416539 1.3626253  1.468387
> 2      3      1.200000 1.1146014  1.288724
> 3      4      1.363636 1.2675657  1.462571
> 4      5      1.000000        NA  5.000000
> 5      6      1.323529 1.0991974  1.546156
> 6      7      1.000000        NA  7.000000
> 7      8      1.100000 0.9037904  1.289210
> 8     10      1.142857 0.8775104  1.403918
> 9     11      1.000000        NA 11.000000
> >
>
>
> Do you spot the same problem? Namely, for each value of the 'CHAINS' column
> for which no ci could be calculated, the second assignment to the data table
> from the 'null' object assigned the lookup value of CHAINS to that column
> instead! The assignment (within the loop) looks like this...
>
> dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- x.s.ci$normal[2]
> dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- x.s.ci$normal[3]
>
> (where chain is the 'loop variable').
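A sketch of how this loop body could be written with row/column indexing, assuming each CI object is either NULL or carries a 1x3 normal matrix (conf, lower, upper) as boot.ci(type = 'normal') gives for a single confidence level. The data frame and the fake_ci list are hypothetical stand-ins, since the real dat.5.ho and bootstrap runs are not in the post.

```r
# Hypothetical stand-in for dat.5.ho (the real data is not in the post)
dat.5.ho <- data.frame(CHAINS = c(2, 5), DOM_PER_CHAIN = c(1.42, 1.00),
                       lower = NA_real_, upper = NA_real_)

# Hypothetical stand-ins for boot.ci() results: a normal-interval matrix, or NULL
fake_ci <- list("2" = list(normal = matrix(c(0.95, 1.36, 1.47), nrow = 1)),
                "5" = NULL)

for (chain in dat.5.ho$CHAINS) {
  x.s.ci <- fake_ci[[as.character(chain)]]
  rows <- dat.5.ho$CHAINS == chain
  if (is.null(x.s.ci)) {
    # No interval could be computed: store NA explicitly
    dat.5.ho[rows, c("lower", "upper")] <- NA
  } else {
    # Only the named column of the matching rows changes
    dat.5.ho[rows, "lower"] <- x.s.ci$normal[2]
    dat.5.ho[rows, "upper"] <- x.s.ci$normal[3]
  }
}
dat.5.ho
```

Testing is.null() up front keeps a missing interval from ever reaching the assignment, so CHAINS cannot leak into lower or upper.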
>
>
> As far as I can tell this is a bug. It doesn't happen when I try...
>
> dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- NA
> dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- NA
>
>
> And doing the following (swapping the order) changes the behaviour...
>
> dat.5.ho[dat.5.ho$CHAINS==chain,]$upper <- x.s.ci$normal[3]
> dat.5.ho[dat.5.ho$CHAINS==chain,]$lower <- x.s.ci$normal[2]
>
>
> Giving...
>
> > dat.5.ho
>   CHAINS DOM_PER_CHAIN      lower     upper
> 1      2      1.416539  1.3616070  1.472716
> 2      3      1.200000  1.1134237  1.287601
> 3      4      1.363636  1.2587204  1.466037
> 4      5      1.000000  5.0000000  5.000000
> 5      6      1.323529  1.1082482  1.547222
> 6      7      1.000000  7.0000000  7.000000
> 7      8      1.100000  0.9021282  1.287672
> 8     10      1.142857  0.8766731  1.403327
> 9     11      1.000000 11.0000000 11.000000
>
>
> Which is again incorrect and unpredicted (as above).
>
>
> Please let me know what to do to report this problem better, or if I just
> missed something silly.
>
> I am on RH9, R-2.1.0 (compiled from source), with the latest boot from
> CRAN (if that makes a difference).
>
> Cheers,
> Dan.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide!
> http://www.R-project.org/posting-guide.html