SV: [R] sample from contingency table
Bill Venables
William.Venables at cmis.CSIRO.AU
Wed Sep 20 10:29:02 CEST 2000
Regin Reinert writes:
> I have had the same problem and I wrote this function
>
> rmulti <- function(n, size, p)
> {
> NrDim <- length(p)
> if(NrDim<2) stop("The simulated variabel has to be at least
> 2-dimensional")
> res <- matrix(data=NA, nrow=n, ncol=NrDim)
> p <- p/sum(p)
> TempSize <- size
> for(i in 1:NrDim)
> {
> TempP <- p[i]/sum(p[i:NrDim])
> TempBin <- rbinom(n=n, size=TempSize, prob=TempP)
> TempSize <- TempSize-TempBin
> res[,i] <- TempBin
> }
> return(res)
> }
>
> # Then you can draw 10 samples like this, whith
> # each row representing a contingency table
>
> x <- as.matrix(1:4, nrow=2, ncol=2)
> rmulti(10, sum(x), x)
>
>
> Regin
Hey, hang on... If I have understood the original question properly
what you have to do is to sample from the cells of a contingency table
with probabilities proportional to the frequencies in those cells.
Here is the original question:
> -----Oprindelig meddelelse-----
> Fra: Dirk F. Raetzel [mailto:raetzel at Mathematik.Uni-Marburg.de]
> Sendt: 19. september 2000 18:48
> Til: R-Help Mailing List
> Emne: [R] sample from contingency table
>
>
> Hello,
>
> I have a multivariate (dim >= 3) discrete distribution
> given by a contingency table from which I want to draw independent
> random samples. The result should be a data.frame (or array) with each
> column representing a dimension.
>
> Before starting to hack some search tree with approbiate
> transformations: Is there any built-in function I
> have overseen or did anybody program such a function already?
>
> Dirk
I can't see why you would need a "search tree" for this problem
either. Here is (what I think is) a very simple solution:
sampct <- function(n, Fr) {
# sample with replacement from a multivariate distribution
# defined by a contingency table
if(!is.null(dfr <- dimnames(Fr)) &&
prod(sapply(dfr, length)) == length(Fr))
dfr <- expand.grid(dfr)
else
dfr <- expand.grid(lapply(dim(Fr), seq))
dfr[sample(1:nrow(dfr), n, prob = Fr, rep = T), ]
}
Here n is the sample size and Fr the array of frequencies in the
table. The result is an n x k data frame of cell indices where k is
the dimension of Fr.
It uses the fact that sample() can sample with replacement and with
probabilities proportional to the entries in a (non-negative) vector.
As usual there are a few little obscurities there to make it
interesting...
--
Bill Venables, Statistician, CMIS Environmetrics Project
CSIRO Marine Labs, PO Box 120, Cleveland, Qld, AUSTRALIA. 4163
Tel: +61 7 3826 7251 Email: Bill.Venables at cmis.csiro.au
Fax: +61 7 3826 7304 http://www.cmis.csiro.au/bill.venables/
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
More information about the R-help
mailing list