[R] Generating groupings of ordered observations

Charles C. Berry cberry at tajo.ucsd.edu
Sun Jun 22 02:51:04 CEST 2008


On Sat, 21 Jun 2008, Gavin Simpson wrote:

> On Sat, 2008-06-21 at 17:56 -0400, Gabor Grothendieck wrote:
>> On Sat, Jun 21, 2008 at 12:40 PM, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
>>> Dear List,
>>>
>>> I have a problem I'm finding it difficult to make headway with.
>>>
>>> Say I have 6 ordered observations, and I want to find all combinations
>>> of splitting these 6 ordered observations in g groups, where g = 1, ...,
>>> 6. Groups can only be formed by adjacent observations, so observations 1
>>> and 4 can't be in a group on their own, only if 1,2,3&4 are all in the
>>> group.
>>>
> <snip />
>>>
>>> I'd like to be able to do this automagically, for any (reasonable,
>>> small, say n = 10-20) number of observations, n, and for g = 1, ..., n
>>> groups.
>>>
>>> I can't see the pattern here or a way forward. Can anyone suggest an
>>> approach?
>>>
>>
>> Peter Wolf has APL-style encode/decode functions on his web site that
>> can readily be used for this.  The output of the encode below are the binary
>> digits expansions of the numbers 0:15, one per column, and the remainder
>> transforms that matrix to the required one (but columns are in a different
>> order than yours):
>
> Thanks for this Gabor. Will take a look.
>
> One additional complication that I failed to mention, is that I would
> like to state the number of groups. So in my example, instead of
> splitting the 6 observations into 1,2,...,6 groups, I would like to get
> only the sets that partition the 6 observations into say 4 groups.
>
> For larger problems, where n is > 100 it is not possible to do the
> required calculations on the choose(n-1, 0:(n-1)) possible combinations.
> Instead we would evaluate the combinations of the n observations
> relating to a small number of groups, say g = 1:10 where n = 100 for
> example.
>
> So Chuck and your solutions answer the question I asked, but I can't see
> how to modify them to set the number of groups to be less than the
> number of observations.

So you want all ways to split an ordered set of n objects into k 
contiguous groupings with k << n ??

The approach shown in this thread is one approach:

 	http://finzi.psych.upenn.edu/R/Rhelp02a/archive/76379.html


but with more than N=30, you will need to typedef an object that has 
enough bits to represent any possible split.

Also, apropos the warnings (further down) in that thread, you can easily 
exhaust the memory of a computer. For n=100 and ngroups=10, there are 
choose(99,9) possibilities which I guess is many terabytes.

HTH,

Chuck

>
> All the best,
>
> G
>
>>
>>> source("http://www.wiwi.uni-bielefeld.de/~wolf/software/R-wtools/decodeencode/decodeencode.R")
>>> n <- 6
>>> n1 <- n-1
>>> apply(rbind(1, encode(0:(2^n1-1), rep(2, n1))), 2, cumsum)
>>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
>> [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23]
>> [,24] [,25] [,26]
>> [1,]    1    1    1    1    1    1    1    1    1     1     1     1
>>  1     1     1     1     1     1     1     1     1     1     1     1
>>   1     1
>> [2,]    1    1    1    1    1    1    1    1    1     1     1     1
>>  1     1     1     1     2     2     2     2     2     2     2     2
>>   2     2
>> [3,]    1    1    1    1    1    1    1    1    2     2     2     2
>>  2     2     2     2     2     2     2     2     2     2     2     2
>>   3     3
>> [4,]    1    1    1    1    2    2    2    2    2     2     2     2
>>  3     3     3     3     2     2     2     2     3     3     3     3
>>   3     3
>> [5,]    1    1    2    2    2    2    3    3    2     2     3     3
>>  3     3     4     4     2     2     3     3     3     3     4     4
>>   3     3
>> [6,]    1    2    2    3    2    3    3    4    2     3     3     4
>>  3     4     4     5     2     3     3     4     3     4     4     5
>>   3     4
>>      [,27] [,28] [,29] [,30] [,31] [,32]
>> [1,]     1     1     1     1     1     1
>> [2,]     2     2     2     2     2     2
>> [3,]     3     3     3     3     3     3
>> [4,]     3     3     4     4     4     4
>> [5,]     4     4     4     4     5     5
>> [6,]     4     5     4     5     5     6
> -- 
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



More information about the R-help mailing list