[R] Generate groups with random size but given total sample size

Arne Schulz arne.schulz at student.uni-kassel.de
Thu Jul 15 10:42:56 CEST 2010


Hi,
thanks a lot! That did it!

Regards,
Arne Schulz

> -----Original Message-----
> From: Greg Snow [mailto:Greg.Snow at imail.org]
> Sent: Tuesday, July 13, 2010 18:17
> To: Arne Schulz; r-help at r-project.org
> Subject: RE: [R] Generate groups with random size but given total sample size
> 
> For one definition of random:
> 
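> # random proportions from an exponential distribution
> ss <- rexp(100)
> ss <- ss/sum(ss)
> 
> # every group gets at least 5; the other 9500 cases are shared out
> ss <- 5 + round( ss*9500 )
> # enforce the 5..500 range before checking the total
> ss <- pmin( pmax( ss, 5 ), 500 )
> 
> # repair rounding/clamping so the sizes sum to exactly 10000
> cnt <- 0
> while( ( d <- sum(ss) - 10000 ) != 0 ) {
> 
> 	# push the whole difference onto one randomly chosen group
> 	tmpid <- sample.int(100,1)
> 	ss[tmpid] <- ss[tmpid] - d
> 
> 	# clamp back into the allowed range
> 	ss[ ss > 500 ] <- 500
> 	ss[ ss < 5 ] <- 5
> 
> 	cnt <- cnt + 1
> 	if (cnt > 100) {
> 		cat('problems finding a solution, stopping after 100 iterations\n')
> 		break
> 	}
> }
> 
> # group label for each of the 10000 cases
> group <- rep( 1:100, ss )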
> 
> 
> Hope this helps,
> 
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
> 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Arne Schulz
> > Sent: Tuesday, July 13, 2010 7:10 AM
> > To: r-help at r-project.org
> > Subject: [R] Generate groups with random size but given total sample size
> >
> > Dear list,
> > I am currently doing some simulation studies where I want to compare
> > different scenarios.
> > In particular, I want to compare two scenarios: 10,000 cases in 100
> > groups of exactly 100 cases each, and 10,000 cases in 100 groups of
> > random size (ranging from 5 to 500).
> >
> > The first part is no problem:
> > > id <- seq(1,10000)
> > > group <- sort(rep(seq(1,100),100))
> >
> > But I can't manage the second scenario. Using sample does give me 100
> > random group sizes, but they add up to far more than 10,000 cases:
> > > set.seed(13)
> > > sum(sample(5:500, 100))
> > [1] 24583
> >
> > Another way would be to generate one group size at a time and sum the
> > cases, but hitting exactly 10,000 cases would then come down to trial
> > and error. It might break the rules of probability, too.
> >
> > I'm convinced that there should be another (and even better) way to
> > handle this problem in R... :-)
> >
> >
> > Best regards,
> > Arne Schulz
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
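The repair loop in the reply above can, in rare cases, give up after 100 iterations. As an alternative sketch (not the method from the thread, and only one of many possible definitions of "random"), the R function below starts every group at the minimum size and shares the remaining cases out with `rmultinom()`, re-sharing any overflow among groups that still have room. The function name `random_group_sizes()` and its arguments are hypothetical. The sizes always sum to the requested total and stay within the bounds, but they are not uniformly distributed over all valid allocations.

```r
# Hypothetical helper: n_groups sizes in [min_size, max_size] summing to total.
# Sketch of a capped multinomial allocation, not the method from the thread.
random_group_sizes <- function(n_groups = 100, total = 10000,
                               min_size = 5, max_size = 500) {
  # a valid allocation must exist at all
  stopifnot(n_groups * min_size <= total, n_groups * max_size >= total)

  sizes <- rep(min_size, n_groups)   # every group starts at the minimum
  weights <- rexp(n_groups)          # random relative "pull" of each group
  remaining <- total - sum(sizes)

  while (remaining > 0) {
    open <- which(sizes < max_size)  # groups that can still take cases
    # share the remaining cases among open groups (rmultinom normalizes prob)
    extra <- as.vector(rmultinom(1, remaining, prob = weights[open]))
    # cap at max_size; any overflow is redistributed in the next pass
    sizes[open] <- pmin(sizes[open] + extra, max_size)
    remaining <- total - sum(sizes)
  }
  sizes
}

set.seed(13)
ss <- random_group_sizes()
group <- rep(1:100, ss)   # group label for each of the 10000 cases
```

Every pass adds at least one case to a group with spare capacity, so the loop always terminates. The exponential weights loosely mimic the `rexp()` proportions used in the reply; equal weights such as `rep(1, n_groups)` would instead give sizes clustered around 100.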


