[R] Drawing a sample based on certain condition

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Mon Apr 14 22:16:14 CEST 2025


Just a comment:

You wish to draw subsets of size n with or without replacement -- and I
suspect without replacement is simpler than with -- from a set of positive
integers that sum to a fixed value T. This sounds related to the so-called
subset sum problem in computational complexity theory: Given a set S of
positive integers and positive total, T, is there *any* subset of S (i.e. a
sample without replacement) whose sum is T? This is known to be
NP-complete. So this means that listing all such subsets must be
NP-complete. I don't know whether specifying that, as you have, the size of
the subsets must be a fixed number (obviously?) simplifies the problem
sufficiently to make it computationally tractable; computational wizards or
suitable web searches might tell you this. But for these reasons,  your
query about how to sample the set of all such subsets -- both those drawn
with or without replacement -- seems computationally difficult to me.


Cheers,
Bert



"An educated person is one who can entertain new ideas, entertain others,
and entertain herself."



On Mon, Apr 14, 2025 at 8:37 AM Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> On 2025-04-14 7:26 a.m., Brian Smith wrote:
> > Hi,
> >
> > For my analytical work, I need to draw a sample of certain sample size
> > from a denied population, where population members are marked by
> > non-negative integers, such that sum of sample members if fixed. For
> > example,
> >
> > Population = 0:100
> > Sample_size = 10
> > Sample_Sum = 20
> >
> > Under this setup if my sample members are X1, X2, ..., X10 then I
> > should have X1+X2+...+X10 = 20
> >
> > Sample drawing scheme may be with/without replacement
> >
> > Is there any R function to achieve this? One possibility is to employ
> > naive trial-error approach, but this doesnt seem to be practical as it
> > would take long time to get the final sample with desired properties.
> >
> > Any pointer would be greatly appreciated.
>
> One general way to think of this problem is that you are defining a
> distribution on the space of all possible samples of size 10, such that
> the probability of a sample is X if the sum is 20, and zero otherwise,
> and you want to sample from this distribution.
>
> There's probably a slick method to do that for your example, but if
> you've got a general population instead of that special one, I doubt it.
>
> What I would do is the following:
>
> Define another distribution on samples that has probabilities that
> depend on the sum of the sample, with the highest probabilities attached
> to ones with the correct sum, and probabilities for other sums declining
> with distance from the sum.  For example, maybe
>
>   P(sum) = Y/(1 + abs(sum - 20))
>
> for some constant Y.
>
> You can use MCMC to sample from that distribution and then only keep the
> samples where the sum is exactly equal to the target sum.  If you do
> that, you don't need to care about the value of Y. but you do need to
> think about how proposed moves are made, and you probably need to use a
> different function than the example above for acceptable efficiency.
>
> Duncan Murdoch
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list