[R] Generate groups with random size but given total sample size

Greg Snow Greg.Snow at imail.org
Tue Jul 13 18:17:22 CEST 2010


For one definition of random:

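# draw 100 random proportions that sum to 1
ss <- rexp(100)
ss <- ss/sum(ss)

# at least 5 per group; the remaining 9500 cases are split by proportion
ss <- 5 + round( ss*9500 )
ss[ ss > 500 ] <- 500	# enforce the upper limit before the first check

cnt <- 0
# rounding and truncation can leave the total off 10000;
# push the difference onto one random group and truncate again
while( ( d <- sum(ss) - 10000 ) != 0 ) {
	
	tmpid <- sample.int(100,1)
	ss[tmpid] <- ss[tmpid] - d

	ss[ ss > 500 ] <- 500
	ss[ ss < 5 ] <- 5

	cnt <- cnt + 1
	if (cnt > 100) {
		cat('problems finding a solution, stopping after 100 iterations\n')
		break
	}
}

# one group label per case (10000 labels only if the loop converged)
group <- rep( 1:100, ss )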
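
If you need this repeatedly in a simulation, the same idea can be wrapped in a function. The following is only a sketch: the name random_group_sizes and its arguments are made up for illustration, and it stops with an error instead of printing a message when no solution is found.

```r
# Hypothetical helper (not part of base R): exponential proportions,
# truncated to [lo, hi], then adjusted until the sizes sum to `total`.
random_group_sizes <- function(n_groups = 100, total = 10000,
                               lo = 5, hi = 500, max_iter = 100) {
  # the target must be reachable within the limits
  stopifnot(n_groups * lo <= total, total <= n_groups * hi)

  # lo cases per group plus a random share of the remainder
  p  <- rexp(n_groups)
  ss <- lo + round(p / sum(p) * (total - n_groups * lo))
  ss <- pmin(pmax(ss, lo), hi)

  iter <- 0
  while ((d <- sum(ss) - total) != 0) {
    if (iter >= max_iter)
      stop("no solution found within ", max_iter, " iterations")
    # move the whole discrepancy onto one random group, truncate again
    id <- sample.int(n_groups, 1)
    ss[id] <- min(max(ss[id] - d, lo), hi)
    iter <- iter + 1
  }
  as.integer(ss)
}

set.seed(13)
ss <- random_group_sizes()
group <- rep(seq_along(ss), ss)

# check the constraints from the question
stopifnot(sum(ss) == 10000, min(ss) >= 5, max(ss) <= 500)
```

Note that the truncation and the adjustment step both change the distribution of the sizes, so this is still just one definition of random.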


Hope this helps,

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
> project.org] On Behalf Of Arne Schulz
> Sent: Tuesday, July 13, 2010 7:10 AM
> To: r-help at r-project.org
> Subject: [R] Generate groups with random size but given total sample
> size
> 
> Dear list,
> I am currently doing some simulation studies where I want to compare
> different scenarios.
> In particular, two scenarios should be compared: 10.000 cases in 100
> groups with 100 cases per group and 10.000 cases in 100 groups with
> random group size (ranging from 5 to 500).
> 
> The first part is no problem:
> > id <- seq(1,10000)
> > group <- sort(rep(seq(1,100),100))
> 
> But I am stuck on the second scenario. Using sample does give
> me 100 groups with random sizes, but generates more than 10.000 cases:
> > set.seed(13)
> > sum(sample(5:500, 100))
> [1] 24583
> 
> Another way could be generating one sample at a time and summing the
> cases. But this would end up in trial & error to fit the 10.000 cases.
> Maybe it would break rules of probability, too.
> 
> I'm convinced that there should be another (and even better) way to
> handle this problem in R... :-)
> 
> 
> Best regards,
> Arne Schulz
> 


