[R] Generate random vectors (continuous number) with a fixed sum

Duncan Murdoch murdoch@dunc@n @end|ng |rom gm@||@com
Sat Apr 26 22:11:03 CEST 2025


If a uniform sample is needed, it's fairly easy to turn my solution into 
a rejection sampler that is uniform across the region satisfying the 
constraints. It has an acceptance rate that is bounded below by 
something like 1/factorial(dim-1), e.g. 1/6 in 4 dimensions.

Duncan Murdoch

On 2025-04-26 3:18 p.m., Bert Gunter wrote:
> Yes, thanks. I got my algebra/constraints wrong.
> ... Sigh...
> 
> Unfortunatey, this seems to make things yet more difficult.
> 
> -- Bert
> 
> "An educated person is one who can entertain new ideas, entertain 
> others, and entertain herself."
> 
> 
> 
> On Sat, Apr 26, 2025 at 10:01 AM Duncan Murdoch 
> <murdoch.duncan using gmail.com <mailto:murdoch.duncan using gmail.com>> wrote:
> 
>     On 2025-04-24 3:25 p.m., Bert Gunter wrote:
>      > Folks:
>      >
>      > Unless my wee old brain errs (always a danger), uniform sampling
>     from an
>      > n-vector <xi> for which 0 <= ai <= xi <= bi and SUM(xi) = k,  a
>     constant,
>      > where to ensure that the constraints are consistent (and nontrivial),
>      > SUM(ai)< k and SUM(bi) > k, is a simple linear transformation
>     (details left
>      > to the reader) of uniform sampling from a standard n-1 dim
>     simplex, for
>      > which a web search yielded:
>      >
>     https://cs.stackexchange.com/questions/3227/uniform-sampling-from-a-simplex <https://cs.stackexchange.com/questions/3227/uniform-sampling-from-a-simplex>
> 
>     I think for n > 2 it is sometimes a simplex, and sometimes a different
>     shape.  For example, with n = 3 and all a_i = 0 and all b_i = 1, it's a
>     simplex if k < 1 or k > 2, but not for 1 < k < 2, where it's a hexagon.
> 
>     You can visualize this with the following code:
> 
>     x <- matrix(runif(30000), ncol = 3) # the full cube
>     x0 <- x[1.45 < sums & sums < 1.55,] # points near k = 1.5
>     rgl::plot3d(x0)
> 
> 
>      > The proposed "solutions" in this thread which rejection sample
>     sequentially
>      > from uniforms do *not* produce a uniform distribution on the
>     simplex, as
>      > the link and references therein explain.
> 
>     My solution didn't do rejection sampling, but it won't give a uniform
>     density.  I didn't realize that was requested.
> 
>      >
>      > Apologies if I have misunderstood the problem and proposed solutions,
>      > although there does seem to be some confusion in the thread about
>     exactly
>      > what was sought.
> 
>     Yes, that's true.
> 
>     Duncan Murdoch
> 
>      >
>      > Cheers,
>      > Bert
>      >
>      > "An educated person is one who can entertain new ideas, entertain
>     others,
>      > and entertain herself."
>      >
>      >
>      >
>      > On Thu, Apr 24, 2025 at 10:33 AM Brian Smith
>     <briansmith199312 using gmail.com <mailto:briansmith199312 using gmail.com>>
>      > wrote:
>      >
>      >> Hi Rui,
>      >>
>      >> This code is able to generate absolutely correct random sample
>     vector
>      >> based on the applicable constraints.
>      >>
>      >> I have one question though.
>      >>
>      >> If I understood the R code correctly then, the first element is
>      >> drawing random number from Uniform distribution unconditionally,
>      >> however drawing of sample point for the second element is
>     conditional
>      >> to the first one. Therefore if we have large vector size instead of
>      >> current 2, I guess the feasible region for the last few elements
>     will
>      >> be very small.
>      >>
>      >> Will that be any problem? does there any algorithm exist where all
>      >> (n-1) elements would be drawn unconditionally assuming our
>     vector has
>      >> n elements?
>      >>
>      >> Thanks and regards,
>      >>
>      >>
>      >>
>      >> On Wed, 23 Apr 2025 at 10:57, Rui Barradas <ruipbarradas using sapo.pt
>     <mailto:ruipbarradas using sapo.pt>> wrote:
>      >>>
>      >>> Hello,
>      >>>
>      >>> Here are your tests and the random numbers' histograms.
>      >>>
>      >>>
>      >>> one_vec <- function(a, b, s) {
>      >>>     repeat{
>      >>>       repeat{
>      >>>         u <- runif(1, a[1], b[1])
>      >>>         if(s - u > 0) break
>      >>>       }
>      >>>       v <- s - u
>      >>>       if(a[2] < v && v < b[2]) break
>      >>>     }
>      >>>     c(u, v)
>      >>> }
>      >>> gen_mat <- function(m, a, b, s) {
>      >>>     replicate(m, one_vec(a, b, s))
>      >>> }
>      >>>
>      >>> a <- c(0.015, 0.005)
>      >>> b <- c(0.070, 0.045)
>      >>> s <- 0.05528650577311
>      >>> m <- 10000L
>      >>>
>      >>> set.seed(2025)
>      >>> res <- gen_mat(m, a, b, s)
>      >>> apply(res, 1, min) > a
>      >>> #> [1] TRUE TRUE
>      >>> apply(res, 1, max) < b
>      >>> #> [1] TRUE TRUE
>      >>>
>      >>> # plot histograms of one million 2d vectors
>      >>> system.time(
>      >>>     res1mil <- gen_mat(1e6, a, b, s)
>      >>> )
>      >>> #>    user  system elapsed
>      >>> #>    3.01    0.06    3.86
>      >>>
>      >>> old_par <- par(mfrow = c(1, 2))
>      >>> hist(res1mil[1L,])
>      >>> hist(res1mil[2L,])
>      >>> par(old_par)
>      >>>
>      >>>
>      >>> Hope this helps,
>      >>>
>      >>> Rui Barradas
>      >>>
>      >>> Às 23:31 de 22/04/2025, Rui Barradas escreveu:
>      >>>> Hello,
>      >>>>
>      >>>> Inline.
>      >>>>
>      >>>> Às 17:55 de 22/04/2025, Brian Smith escreveu:
>      >>>>> i.e. we should have
>      >>>>>
>      >>>>> all elements of Reduce("+", res) should be equal to  s =
>      >> 0.05528650577311
>      >>>>>
>      >>>>> My assertion is that it is not happing here.
>      >>>>
>      >>>> You are right, that's not what is happening. The output is n
>     vectors of
>      >>>> 2 elements each. It's each of these vectors that add up to s.
>      >>>> Appparently I misunderstood the problem.
>      >>>>
>      >>>> Maybe this is what you want?
>      >>>> (There is no n argument, the matrix is always 2*m)
>      >>>>
>      >>>>
>      >>>> one_vec <- function(a, b, s) {
>      >>>>     repeat{
>      >>>>       repeat{
>      >>>>         u <- runif(1, a[1], b[1])
>      >>>>         if(s - u > 0) break
>      >>>>       }
>      >>>>       v <- s - u
>      >>>>       if(a[2] < v && v < b[2]) break
>      >>>>     }
>      >>>>     c(u, v)
>      >>>> }
>      >>>> gen_mat <- function(m, a, b, s) {
>      >>>>     replicate(m, one_vec(a, b, s))
>      >>>> }
>      >>>>
>      >>>> set.seed(2025)
>      >>>> res <- gen_mat(10000, a, b, s)
>      >>>> colSums(res)
>      >>>>
>      >>>>
>      >>>> Hope this helps,
>      >>>>
>      >>>> Rui Barradas
>      >>>>
>      >>>>
>      >>>>>
>      >>>>>
>      >>>>> On Tue, 22 Apr 2025 at 22:20, Brian Smith
>     <briansmith199312 using gmail.com <mailto:briansmith199312 using gmail.com>
>      >>>
>      >>>>> wrote:
>      >>>>>>
>      >>>>>> Hi Rui,
>      >>>>>>
>      >>>>>> Thanks for the explanation.
>      >>>>>>
>      >>>>>> But in this case, are we looking at the correct solution at all?
>      >>>>>>
>      >>>>>> My goal is to generate random vector where:
>      >>>>>> 1) the first element is bounded by (a[1], b[1]) and second
>     element is
>      >>>>>> bounded by (a[2], b[2])
>      >>>>>> 2) sum of the element is s
>      >>>>>>
>      >>>>>> According to the outcome,
>      >>>>>> The first matrix values are bounded by c(a[1], b[1]) &
>     second matrix
>      >>>>>> values are bounded by c(a[2], b[2])
>      >>>>>>
>      >>>>>> But,
>      >>>>>> regarding the sum, I think we should have sum (element-wise) sum
>      >>>>>> should be equal to s = 0.05528650577311.
>      >>>>>>
>      >>>>>> How could we achieve that then?
>      >>>>>>
>      >>>>>> On Tue, 22 Apr 2025 at 22:03, Rui Barradas
>     <ruipbarradas using sapo.pt <mailto:ruipbarradas using sapo.pt>>
>      >> wrote:
>      >>>>>>>
>      >>>>>>> Às 12:39 de 22/04/2025, Brian Smith escreveu:
>      >>>>>>>> Hi Rui,
>      >>>>>>>>
>      >>>>>>>> Many thanks for your time and insight.
>      >>>>>>>>
>      >>>>>>>> However, I am not sure if I could understand the code.
>     Below is
>      >> what I
>      >>>>>>>> tried based on your code
>      >>>>>>>>
>      >>>>>>>> library(Surrogate)
>      >>>>>>>> a <- c(0.015, 0.005)
>      >>>>>>>> b <- c(0.070, 0.045)
>      >>>>>>>> set.seed(2025)
>      >>>>>>>> res <- mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>      >>>>>>>>                   MoreArgs = list(s = 0.05528650577311, n
>     = 2, m =
>      >>>>>>>> 10000), a, b)
>      >>>>>>>>
>      >>>>>>>> res1 = res[[1]]
>      >>>>>>>> res2 = res[[2]]
>      >>>>>>>>
>      >>>>>>>> apply(res1, 1, min) > a   ## [1] TRUE TRUE
>      >>>>>>>> apply(res2, 1, min) > a   ## [1] FALSE  TRUE
>      >>>>>>>>
>      >>>>>>>> I could not understand what basically 2 blocks of res signify?
>      >> Which
>      >>>>>>>> one I should take as final simulation of the vector? If I
>     take the
>      >>>>>>>> first block then the lower bound condition is fulfilled,
>     but not
>      >> with
>      >>>>>>>> the second block. However with the both blocks, the total
>     equals s
>      >> is
>      >>>>>>>> satisfying.
>      >>>>>>>>
>      >>>>>>>> I appreciate your further insight.
>      >>>>>>>>
>      >>>>>>>> Thanks and regards,
>      >>>>>>>>
>      >>>>>>>> On Mon, 21 Apr 2025 at 20:43, Rui Barradas
>     <ruipbarradas using sapo.pt <mailto:ruipbarradas using sapo.pt>>
>      >>>>>>>> wrote:
>      >>>>>>>>>
>      >>>>>>>>> Hello,
>      >>>>>>>>>
>      >>>>>>>>> Inline.
>      >>>>>>>>>
>      >>>>>>>>> Às 16:08 de 21/04/2025, Rui Barradas escreveu:
>      >>>>>>>>>> Às 15:27 de 21/04/2025, Brian Smith escreveu:
>      >>>>>>>>>>> Hi,
>      >>>>>>>>>>>
>      >>>>>>>>>>> There is a function called RandVec in the package Surrogate
>      >>>>>>>>>>> which can
>      >>>>>>>>>>> generate andom vectors (continuous number) with a fixed sum
>      >>>>>>>>>>>
>      >>>>>>>>>>> The help page of this function states that:
>      >>>>>>>>>>>
>      >>>>>>>>>>> a
>      >>>>>>>>>>>
>      >>>>>>>>>>> The function RandVec generates an n by m matrix x. Each
>     of the m
>      >>>>>>>>>>> columns contain n random values lying in the interval
>     [a,b]. The
>      >>>>>>>>>>> argument a specifies the lower limit of the interval.
>     Default 0.
>      >>>>>>>>>>>
>      >>>>>>>>>>> b
>      >>>>>>>>>>>
>      >>>>>>>>>>> The argument b specifies the upper limit of the interval.
>      >>>>>>>>>>> Default 1.
>      >>>>>>>>>>>
>      >>>>>>>>>>> However in my case, the lower and upper limits are not
>     same. For
>      >>>>>>>>>>> example, if I need to draw a pair of number x, y, such
>     that x +
>      >>>>>>>>>>> y = 1,
>      >>>>>>>>>>> then the lower and upper limits are different.
>      >>>>>>>>>>>
>      >>>>>>>>>>> I tried with below code
>      >>>>>>>>>>>
>      >>>>>>>>>>> library(Surrogate)
>      >>>>>>>>>>>
>      >>>>>>>>>>> RandVec(a=c(0.1, 0.2), b=c(0.2, 0.8), s=1, n=2,
>      >> m=5)$RandVecOutput
>      >>>>>>>>>>>
>      >>>>>>>>>>> This generates error with message
>      >>>>>>>>>>>
>      >>>>>>>>>>> Error in if (b - a == 0) { : the condition has length > 1
>      >>>>>>>>>>>
>      >>>>>>>>>>> Is there any way to generate the numbers with different
>     lower
>      >> and
>      >>>>>>>>>>> upper limits?
>      >>>>>>>>>>>
>      >>>>>>>>>>> ______________________________________________
>      >>>>>>>>>>> R-help using r-project.org <mailto:R-help using r-project.org>
>     mailing list -- To UNSUBSCRIBE and more,
>      >> see
>      >>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>     <https://stat.ethz.ch/mailman/listinfo/r-help>
>      >>>>>>>>>>> PLEASE do read the posting guide
>      >> https://www.R-project.org/posting-
>     <https://www.R-project.org/posting->
>      >>>>>>>>>>> guide.html
>      >>>>>>>>>>> and provide commented, minimal, self-contained,
>     reproducible
>      >> code.
>      >>>>>>>>>> Hello,
>      >>>>>>>>>>
>      >>>>>>>>>> Use ?mapply to cycle through all values of a and b.
>      >>>>>>>>>> Note that the output matrices are transposed, the random
>     vectors
>      >>>>>>>>>> are the
>      >>>>>>>>>> rows.
>      >>>>>>>>> Sorry, this is not true. The columns are the random
>     vectors, as
>      >>>>>>>>> documented. An example setting the RNG seed, for
>     reproducibility.
>      >>>>>>>>>
>      >>>>>>>>>
>      >>>>>>>>> library(Surrogate)
>      >>>>>>>>>
>      >>>>>>>>> a <- c(0.1, 0.2)
>      >>>>>>>>> b <- c(0.2, 0.8)
>      >>>>>>>>> set.seed(2025)
>      >>>>>>>>> res <- mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>      >>>>>>>>>                   MoreArgs = list(s = 1, n = 2, m = 5), a, b)
>      >>>>>>>>>
>      >>>>>>>>> res
>      >>>>>>>>> #> $RandVecOutput
>      >>>>>>>>> #>          [,1]      [,2]      [,3]     [,4]      [,5]
>      >>>>>>>>> #> [1,] 0.146079 0.1649319 0.1413759 0.257086 0.1715478
>      >>>>>>>>> #> [2,] 0.253921 0.2350681 0.2586241 0.142914 0.2284522
>      >>>>>>>>> #>
>      >>>>>>>>> #> $RandVecOutput
>      >>>>>>>>> #>           [,1]      [,2]      [,3]      [,4]      [,5]
>      >>>>>>>>> #> [1,] 0.5930918 0.2154583 0.6915523 0.7167089 0.3617918
>      >>>>>>>>> #> [2,] 0.4069082 0.7845417 0.3084477 0.2832911 0.6382082
>      >>>>>>>>>
>      >>>>>>>>> lapply(res, colSums)
>      >>>>>>>>> #> $RandVecOutput
>      >>>>>>>>> #> [1] 0.4 0.4 0.4 0.4 0.4
>      >>>>>>>>> #>
>      >>>>>>>>> #> $RandVecOutput
>      >>>>>>>>> #> [1] 1 1 1 1 1
>      >>>>>>>>>
>      >>>>>>>>>
>      >>>>>>>>> Hope this helps,
>      >>>>>>>>>
>      >>>>>>>>> Rui Barradas
>      >>>>>>>>>>
>      >>>>>>>>>>
>      >>>>>>>>>> library(Surrogate)
>      >>>>>>>>>>
>      >>>>>>>>>> a <- c(0.1, 0.2)
>      >>>>>>>>>> b <- c(0.2, 0.8)
>      >>>>>>>>>> mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>      >>>>>>>>>>            MoreArgs = list(s = 1, n = 2, m = 5), a, b)
>      >>>>>>>>>> #> $RandVecOutput
>      >>>>>>>>>> #>           [,1]      [,2]      [,3]      [,4]      [,5]
>      >>>>>>>>>> #> [1,] 0.2004363 0.1552328 0.2391742 0.1744857 0.1949236
>      >>>>>>>>>> #> [2,] 0.1995637 0.2447672 0.1608258 0.2255143 0.2050764
>      >>>>>>>>>> #>
>      >>>>>>>>>> #> $RandVecOutput
>      >>>>>>>>>> #>           [,1]      [,2]      [,3]      [,4]      [,5]
>      >>>>>>>>>> #> [1,] 0.2157416 0.4691191 0.5067447 0.7749258 0.7728955
>      >>>>>>>>>> #> [2,] 0.7842584 0.5308809 0.4932553 0.2250742 0.2271045
>      >>>>>>>>>>
>      >>>>>>>>>>
>      >>>>>>>>>> Hope this helps,
>      >>>>>>>>>>
>      >>>>>>>>>> Rui Barradas
>      >>>>>>>>>>
>      >>>>>>>>>>
>      >>>>>>>>>
>      >>>>>>>>>
>      >>>>>>>>> --
>      >>>>>>>>> Este e-mail foi analisado pelo software antivírus AVG para
>      >>>>>>>>> verificar a presença de vírus.
>      >>>>>>>>> www.avg.com <http://www.avg.com>
>      >>>>>>> Hello,
>      >>>>>>>
>      >>>>>>> The two blocks of res are the two random matrices, one for each
>      >>>>>>> combination of (a,b). mapply passes each of the values in its
>      >> arguments
>      >>>>>>> list (the ellipses in the help page) and computes the anonymous
>      >>>>>>> function
>      >>>>>>> with the pairs (a[1], b[1]), (a[2], b[2]).
>      >>>>>>>
>      >>>>>>> Since a and b are two elements vectors the output res is a two
>      >> members
>      >>>>>>> named list. Your error is to compare the result of
>     apply(res2, 1,
>      >> min)
>      >>>>>>> to a, when you should compare to a[2]. See the code below.
>      >>>>>>>
>      >>>>>>>
>      >>>>>>> library(Surrogate)
>      >>>>>>> a <- c(0.015, 0.005)
>      >>>>>>> b <- c(0.070, 0.045)
>      >>>>>>> set.seed(2025)
>      >>>>>>> res <- mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>      >>>>>>>                  MoreArgs = list(s = 0.05528650577311, n =
>     2, m =
>      >>>>>>> 10000),
>      >>>>>>> a, b)
>      >>>>>>>
>      >>>>>>> res1 = res[[1]]
>      >>>>>>> res2 = res[[2]]
>      >>>>>>>
>      >>>>>>> # first check that the sums are correct
>      >>>>>>> # these sums should be s = 0.05528650577311, up to
>     floating-point
>      >>>>>>> accuracy
>      >>>>>>> lapply(res, \(x) colSums(x[, 1:5]) |> print(digits = 14L))
>      >>>>>>> #> [1] 0.05528650577311 0.05528650577311 0.05528650577311
>      >>>>>>> 0.05528650577311
>      >>>>>>> #> [5] 0.05528650577311
>      >>>>>>> #> [1] 0.05528650577311 0.05528650577311 0.05528650577311
>      >>>>>>> 0.05528650577311
>      >>>>>>> #> [5] 0.05528650577311
>      >>>>>>> #> $RandVecOutput
>      >>>>>>> #> [1] 0.05528651 0.05528651 0.05528651 0.05528651 0.05528651
>      >>>>>>> #>
>      >>>>>>> #> $RandVecOutput
>      >>>>>>> #> [1] 0.05528651 0.05528651 0.05528651 0.05528651 0.05528651
>      >>>>>>>
>      >>>>>>> # now check the min and max
>      >>>>>>> apply(res1, 1, min) > a[1L]   ## [1] TRUE TRUE
>      >>>>>>> #> [1] TRUE TRUE
>      >>>>>>> apply(res2, 1, min) > a[2L]   ## [1] TRUE TRUE
>      >>>>>>> #> [1] TRUE TRUE
>      >>>>>>>
>      >>>>>>> apply(res1, 1, max) < b[1L]   ## [1] TRUE TRUE
>      >>>>>>> #> [1] TRUE TRUE
>      >>>>>>> apply(res2, 1, max) < b[2L]   ## [1] TRUE TRUE
>      >>>>>>> #> [1] TRUE TRUE
>      >>>>>>>
>      >>>>>>>
>      >>>>>>>
>      >>>>>>> Which one should you take as final simulation of the
>     vector? Both.
>      >>>>>>> The first matrix values are bounded by c(a[1], b[1]) with
>     column
>      >> sums
>      >>>>>>> equal to s.
>      >>>>>>> The second matrix values are bounded by c(a[2], b[2]) with
>     column
>      >> sums
>      >>>>>>> also equal to s.
>      >>>>>>>
>      >>>>>>> Hoep this helps,
>      >>>>>>>
>      >>>>>>> Rui Barradas
>      >>>>>>>
>      >>>>
>      >>>> ______________________________________________
>      >>>> R-help using r-project.org <mailto:R-help using r-project.org> mailing
>     list -- To UNSUBSCRIBE and more, see
>      >>>> https://stat.ethz.ch/mailman/listinfo/r-help
>     <https://stat.ethz.ch/mailman/listinfo/r-help>
>      >>>> PLEASE do read the posting guide
>     https://www.R-project.org/posting- <https://www.R-project.org/posting->
>      >>>> guide.html
>      >>>> and provide commented, minimal, self-contained, reproducible code.
>      >>>
>      >>>
>      >>> --
>      >>> Este e-mail foi analisado pelo software antivírus AVG para
>     verificar a
>      >> presença de vírus.
>      >>> www.avg.com <http://www.avg.com>
>      >>
>      >> ______________________________________________
>      >> R-help using r-project.org <mailto:R-help using r-project.org> mailing list
>     -- To UNSUBSCRIBE and more, see
>      >> https://stat.ethz.ch/mailman/listinfo/r-help
>     <https://stat.ethz.ch/mailman/listinfo/r-help>
>      >> PLEASE do read the posting guide
>      >> https://www.R-project.org/posting-guide.html
>     <https://www.R-project.org/posting-guide.html>
>      >> and provide commented, minimal, self-contained, reproducible code.
>      >>
>      >
>      >       [[alternative HTML version deleted]]
>      >
>      > ______________________________________________
>      > R-help using r-project.org <mailto:R-help using r-project.org> mailing list
>     -- To UNSUBSCRIBE and more, see
>      > https://stat.ethz.ch/mailman/listinfo/r-help
>     <https://stat.ethz.ch/mailman/listinfo/r-help>
>      > PLEASE do read the posting guide
>     https://www.R-project.org/posting-guide.html
>     <https://www.R-project.org/posting-guide.html>
>      > and provide commented, minimal, self-contained, reproducible code.
>



More information about the R-help mailing list