[R] Strange results : bootrstrp CIs
Ben Bolker
bbo|ker @end|ng |rom gm@||@com
Sat Jan 13 23:51:24 CET 2024
It took me a little while to figure this out, but: the problem is
that if your resampling leaves out any countries (which is very likely),
your model applied to the bootstrapped data will have fewer coefficients
than your original model.
I tried this:
cc <- unique(e$Country)
func <- function(data, idx) {
coef(lm(Score~ Time + factor(Country, levels =cc),data=data[idx,]))
}
but lm() automatically drops coefficients for missing countries (I
didn't think about it too hard, but I thought they might get returned as
NA and that boot() might be able to handle that ...)
If you want to do this I think you'll have to find a way to do a
*stratified* bootstrap, restricting the bootstrap samples so that they
always contain at least one sample from each country ... (I would have
expected "strata = as.numeric(e$Country)" to do this, but it doesn't
work the way I thought ... it tries to compute the statistics for *each*
stratum ...)
====
Debugging attempts:
set.seed(101)
options(error=recover)
B= boot(e, func, R=1000)
Error in t.star[r, ] <- res[[r]] :
number of items to replace is not a multiple of replacement length
Enter a frame number, or 0 to exit
1: boot(e, func, R = 1000)
<enter 1>
Selection: 1
Called from: top level
Browse[1]> print(r)
[1] 2
Browse[1]> t.star[r,]
[1] NA NA NA NA NA NA NA NA NA
i[2,]
[1] 14 15 22 22 21 14 8 2 12 22 10 15 9 7 9 13 12 23 1 20 15 7
5 10
On 2024-01-13 5:22 p.m., varin sacha via R-help wrote:
> Dear Duncan,
> Dear Ivan,
>
> I really thank you a lot for your response.
> So, if I correctly understand your answers the problem is coming from this line:
>
> coef(lm(Score~ Time + factor(Country)),data=data[idx,])
>
> This line should be:
> coef(lm(Score~ Time + factor(Country),data=data[idx,]))
>
> If yes, now I get an error message (code here below)! So, it still does not work.
>
> Error in t.star[r, ] <- res[[r]] :
> number of items to replace is not a multiple of replacement length
>
>
> ##########################################
> Score=c(345,564,467,675,432,346,476,512,567,543,234,435,654,411,356,658,432,345,432,345, 345,456,543,501)
>
> Country=c("Italy", "Italy", "Italy", "Turkey", "Turkey", "Turkey", "USA", "USA", "USA", "Korea", "Korea", "Korea", "Portugal", "Portugal", "Portugal", "UK", "UK", "UK", "Poland", "Poland", "Poland", "Austria", "Austria", "Austria")
>
> Time=c(1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
>
> e=data.frame(Score, Country, Time)
>
>
> library(boot)
> func= function(data, idx) {
> coef(lm(Score~ Time + factor(Country),data=data[idx,]))
> }
> B= boot(e, func, R=1000)
>
> boot.ci(B, index=2, type="perc")
> #############################################################
>
>
>
>
>
>
>
>
> Le samedi 13 janvier 2024 à 21:56:58 UTC+1, Ivan Krylov <ikrylov using disroot.org> a écrit :
>
>
>
>
>
> В Sat, 13 Jan 2024 20:33:47 +0000 (UTC)
>
> varin sacha via R-help <r-help using r-project.org> пишет:
>
>> coef(lm(Score~ Time + factor(Country)),data=data[idx,])
>
>
> Wrong place for the data=... argument. You meant to give it to lm(...),
> but in the end it went to coef(...). Without the data=... argument, the
> formula passed to lm() picks up the global variables inherited by the
> func() closure.
>
> Unfortunately, S3 methods really do have to ignore extra arguments they
> don't understand if the class is to be extended, so coef.lm isn't
> allowed to complain to you about it.
>
More information about the R-help
mailing list