[R] family question
Douglas Bates
bates at stat.wisc.edu
Thu Aug 31 04:29:41 CEST 2000
I couldn't resist trying to create a swish simulation method for this
problem.
To simulate a single family size I start with a length "basesize"
(defaults to 16) random vector of 1's, representing boys, and -1's,
representing girls, each with 0.50 probability.
The reason for -1 and 1 is because checking for more boys than girls
reduces to checking if the cumulative sum is greater than zero.
(Thanks to Richard Johnson for the suggestion.) If you do not get a
family with more boys than girls in that sample then you double the
length of the sample and check again. I impose a limit on the number
of doublings: 13 on a machine with a moderate vsize and 16 on a
machine where I can set vsize = 100M. (These limits could change
after R-1.2.0 is released (probably in December) because of its new
storage allocation and garbage collection mechanism.)
The function to simulate one family size is
fam <-
function(basesize = 16, doublings = 16)
{
## simulate a family size in Troels Ring's problem
samp <- ifelse(runif(basesize) > 0.5, 1, -1)
for (i in 0:doublings) {
moreboys <- cumsum(samp) > 0
if (any(moreboys)) return(min(seq(along = moreboys)[moreboys]))
samp <- c(samp,
ifelse(runif(length(samp)) > 0.5, 1, -1))
}
return(NA)
}
On the machine where I allowed extremely large family sizes (16
doublings), I got some extremely large sizes
> gc() # check the amount of memory allocated
free total (Mb)
Ncells 658984 800000 15.3
Vcells 13062681 13107200 100.0
> famsz <- integer(10000)
> for (i in seq(along = famsz)) famsz[i] <- fam()
> summary(famsz)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.0 1.0 1.0 665.7 9.0 832200.0 12.0
The NA's represent family sizes that would be greater than 2^20. Note
that there is about a 1 in 1000 chance of that occuring!
Here are the big families
> sort(famsz[famsz > 10000])
[1] 10285 10379 10485 11337 11623 11669 11747 12185 12273 12335
[11] 13547 14113 14623 15467 15567 16425 20253 20515 21133 22559
[21] 22989 23015 23387 23771 24051 26353 27701 28297 29577 29579
[31] 30071 32207 35971 37669 38419 39185 40457 47893 51383 52869
[41] 66031 68411 71883 74351 75481 77125 83023 91483 92489 92565
[51] 105233 107009 115961 121769 188675 204793 234175 239793 251181 264895
[61] 335125 337655 353795 464961 832211
Even ignoring the really big families (those that are NA's here), the
proportion of females comes out to be essentially 0.5 in this sample
> sum((famsz - 1)/2, na.rm = TRUE)/sum(famsz, na.rm = TRUE)
[1] 0.499249
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list