[R] generating unordered combinations

Dan dan.halligan at gmail.com
Wed Sep 23 17:18:48 CEST 2009


Thanks Bryan, sorry for the late reply - I only just noticed this
post. I'm not specifically interested in that sum, but something
related to the sum so this may also be very useful.

Dan

On 18 Sep, 18:24, Bryan Keller <bskel... at wisc.edu> wrote:
> The combn solution offered by Bill is great.  It struck me that what you are doing, in fact, is generating the null distribution of the two-sample Wilcoxon test where the first group has size m and the second group has size n.  In general, the length of the array has size choose(n+m-1,m) which gets very big very fast.  For example,
>
> > choose(20+20-1,20)
>
> [1] 68923264410
>
> If, on the off chance that you are interested in *summing* the unordered combinations across columns, there is a very slick way to do this that takes a tiny fraction of the time and memory that generating huge arrays entails.  If not, obviously, you already have your solution.
>
> Just in case, here is the code to generate the distribution of sums.  It is based on an algorithm due to Harding (1984).
>
> f <- function(m,n) {
>
> umax <- (m*n+1)
>
> ifelse (umax%%2==0, umaxp <- (umax/2)-1, umaxp <- (umax-1)/2)
> #umaxp is analagous to “M” from Harding (1984)
>
> p <- min((m+n),umaxp)
> q <- min(m,umaxp)
>
> dis <- c(1,numeric(umaxp))
>
> if ((n+1)>umaxp) {
>
> for (i in 1:q) {                        #steps for denominator of generating function
> for (j in i:umaxp) {
>         dis[j+1] <- (dis[j+1] + dis[(j-i)+1])
>         }
>         }
>
> } else {
>
> for (i in (n+1):(p)) {          #steps for numerator of generating function
> for (j in (umaxp):i) {
>         dis[j+1] <- (dis[j+1] - dis[(j-i)+1])
>         }
>         }
>
> for (i in 1:q) {                        #steps for denominator of generating function
> for (j in i:umaxp) {
>         dis[j+1] <- (dis[j+1] + dis[(j-i)+1])
>         }
>         }
>
> }
>
> ldis <- length(dis)
> ifelse(umax%%2==0,dis <- c(dis,dis[ldis:1]),dis <- c(dis,dis[(ldis-1):1]))
> dispr <- dis/choose((n+m),n)
> ws <- sum(1:m):sum((n+1):(n+m))
> lws <- length(ws)
> mat3 <- cbind(ws,dis,dispr,numeric(lws),numeric(lws))
> mat3[,4] <- cumsum(mat3[,3])
> mat3[,5] <- cumsum(mat3[,3][lws:1])[lws:1]
> colnames(mat3) <- c("W","Freq","Probability","Sum up","Sum down")
>
> print(mat3)
>
> }
> > system.time(f(20,20))
>
>    user  system elapsed
>    0.11    0.00    0.11
>
> Bryan
>
> That's brilliant - thanks.
>
> On 17 Sep 2009, at 23:36, William Dunlap wrote:
>
>
>
>
>
> > There is a 1-1 correspondance between your n-sets
> > consisting of m possible element types (0 through m-1
> > in your example) and the number of n-subsets of a (n+m-1)-set.
> > E.g., your example had m=3 and n=3 and subtracting
> > 1:3 from each column of combn(3+3-1,3) gives your result:
>
> >> t(combn(3+3-1, 3)-(1:3))
> >      [,1] [,2] [,3]
> > [1,]    0    0    0
> > [2,]    0    0    1
> > [3,]    0    0    2
> > [4,]    0    1    1
> > [5,]    0    1    2
> > [6,]    0    2    2
> > [7,]    1    1    1
> > [8,]    1    1    2
> > [9,]    1    2    2
> > [10,]    2    2    2
>
> > Bill Dunlap
> > TIBCO Software Inc - Spotfire Division
> > wdunlap tibco.com
>
> >> -----Original Message-----
> >> From: r-help-boun... at r-project.org
> >> [mailto:r-help-boun... at r-project.org] On Behalf Of DanHalligan
> >> Sent: Thursday, September 17, 2009 1:31 PM
> >> To: r-h... at r-project.org
> >> Subject: [R] generating unordered combinations
>
> >> Hi,
>
> >> I am trying to generate all unordered combinations of a set of
> >> numbers / characters, and I can only find a (very) clumsy way of  
> >> doing
> >> this using expand.grid.  For example, all unordered combinations of
> >> the numbers 0, 1, 2 are:
> >> 0, 0, 0
> >> 0, 0, 1
> >> 0, 0, 2
> >> 0, 1, 1
> >> 0, 1, 2
> >> 0, 2, 2
> >> 1, 1, 1
> >> 1, 1, 2
> >> 1, 2, 2
> >> 2, 2, 2
>
> >> (I have not included, for example, 1, 0, 0, since it is equivalent to
> >> 0, 0, 1).
>
> >> I have found a way to generate this data.frame using expand.grid as
> >> follows:
>
> >> g <- expand.grid(c(0,1,2), c(0,1,2), c(0,1,2))
> >> for(i in 1:nrow(g)) {
> >>        g[i,] <- sort(as.character(g[i,]))
> >> }
> >> o <- order(g$Var1, g$Var2, g$Var3)
> >> unique(g[o,]).
>
> >> This is obviously quite clumsy and hard to generalise to a greater
> >> number of characters, so I'm keen to find any other solutions.  Can
> >> anyone suggest a better (more general, quicker) method?
>
> >> Cheers
>
> >> ______________________________________________
> >> R-h... at r-project.org mailing list
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >>http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
> -------------
> Bryan Keller, Doctoral Student/Project Assistant
> Educational Psychology - Quantitative Methods
> The University of Wisconsin - Madison
>
> ______________________________________________
> R-h... at r-project.org mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list