[R] Generate groups with random size but given total sample size
Greg Snow
Greg.Snow at imail.org
Tue Jul 13 18:17:22 CEST 2010
For one definition of random:
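ss <- rexp(100)                 # 100 random positive weights
ss <- ss/sum(ss)                # rescale to proportions summing to 1
ss <- 5 + round( ss*9500 )      # at least 5 per group; total is roughly 10000
ss[ ss > 500 ] <- 500           # enforce the upper limit before the first check

cnt <- 0
# rounding (and capping) can leave the total slightly off 10000;
# push the difference onto one randomly chosen group until it matches
while( ( d <- sum(ss) - 10000 ) != 0 ) {
  tmpid <- sample.int(100,1)
  ss[tmpid] <- ss[tmpid] - d
  ss[ ss > 500 ] <- 500         # keep sizes within 5 to 500
  ss[ ss < 5 ] <- 5
  cnt <- cnt + 1
  if (cnt > 100) {
    cat('problems finding a solution, stopping after 100 iterations\n')
    break
  }
}

group <- rep( 1:100, ss )       # group id repeated once per case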
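
If you need this more than once (e.g. once per simulation run), the same steps can be wrapped in a function. This is only a sketch: random_group_sizes() and its argument names are made up for illustration, and unlike the loop above it stops with an error rather than printing a message when no solution is found.

```r
# Hypothetical helper (not from any package): draws n_groups sizes between
# lo and hi that sum to total, using the same exponential-weights idea.
random_group_sizes <- function(n_groups = 100, total = 10000,
                               lo = 5, hi = 500, max_iter = 100) {
  ss <- rexp(n_groups)
  ss <- ss / sum(ss)
  # every group gets lo cases; the remainder is split by the random weights
  ss <- lo + round(ss * (total - lo * n_groups))
  ss <- pmin(pmax(ss, lo), hi)          # clamp to [lo, hi]
  cnt <- 0
  while ((d <- sum(ss) - total) != 0) {
    tmpid <- sample.int(n_groups, 1)    # pick one group at random
    ss[tmpid] <- ss[tmpid] - d          # let it absorb the difference
    ss <- pmin(pmax(ss, lo), hi)
    cnt <- cnt + 1
    if (cnt > max_iter) {
      stop("no solution found after ", max_iter, " iterations")
    }
  }
  ss
}

set.seed(13)                            # only to make the example reproducible
sizes <- random_group_sizes()
group <- rep(seq_along(sizes), sizes)

# quick sanity checks on the result
stopifnot(sum(sizes) == 10000, all(sizes >= 5), all(sizes <= 500))
summary(sizes)
```

The stopifnot() line is the part worth keeping in any version: it confirms the total and the 5 to 500 limits before the sizes are used in a simulation.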
Hope this helps,
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Arne Schulz
> Sent: Tuesday, July 13, 2010 7:10 AM
> To: r-help at r-project.org
> Subject: [R] Generate groups with random size but given total sample
> size
>
> Dear list,
> I am currently doing some simulation studies where I want to compare
> different scenarios.
> In particular, two scenarios should be compared: 10,000 cases in 100
> groups with 100 cases per group and 10,000 cases in 100 groups with
> random group size (ranging from 5 to 500).
>
> The first part is no problem:
> > id <- seq(1,10000)
> > group <- sort(rep(seq(1,100),100))
>
> But I am stuck on the second scenario. Using sample does give
> me 100 groups with random sizes, but generates more than 10,000 cases:
> > set.seed(13)
> > sum(sample(5:500, 100))
> [1] 24583
>
> Another way could be generating one sample at a time and summing the
> cases. But this would end up in trial & error to fit the 10,000 cases.
> Maybe it would break rules of probability, too.
>
> I'm convinced that there should be another (and even better) way to
> handle this problem in R... :-)
>
>
> Best regards,
> Arne Schulz