[R] Generate random vectors (continuous number) with a fixed sum

Sat Apr 26 19:01:36 CEST 2025

On 2025-04-24 3:25 p.m., Bert Gunter wrote:
> Folks:
> 
> Unless my wee old brain errs (always a danger), uniform sampling from an
> n-vector <xi> for which 0 <= ai <= xi <= bi and SUM(xi) = k,  a constant,
> where to ensure that the constraints are consistent (and nontrivial),
> SUM(ai)< k and SUM(bi) > k, is a simple linear transformation (details left
> to the reader) of uniform sampling from a standard n-1 dim simplex, for
> which a web search yielded:
> https://cs.stackexchange.com/questions/3227/uniform-sampling-from-a-simplex

I think for n > 2 it is sometimes a simplex, and sometimes a different 
shape.  For example, with n = 3 and all a_i = 0 and all b_i = 1, it's a 
simplex if k < 1 or k > 2, but not for 1 < k < 2, where it's a hexagon.

You can visualize this with the following code:

x <- matrix(runif(30000), ncol = 3) # the full cube
x0 <- x[1.45 < sums & sums < 1.55,] # points near k = 1.5
rgl::plot3d(x0)

> The proposed "solutions" in this thread which rejection sample sequentially
> from uniforms do *not* produce a uniform distribution on the simplex, as
> the link and references therein explain.

My solution didn't do rejection sampling, but it won't give a uniform 
density.  I didn't realize that was requested.

> 
> Apologies if I have misunderstood the problem and proposed solutions,
> although there does seem to be some confusion in the thread about exactly
> what was sought.

Yes, that's true.

Duncan Murdoch

> 
> Cheers,
> Bert
> 
> "An educated person is one who can entertain new ideas, entertain others,
> and entertain herself."
> 
> 
> 
> On Thu, Apr 24, 2025 at 10:33 AM Brian Smith <briansmith199312 using gmail.com>
> wrote:
> 
>> Hi Rui,
>>
>> This code is able to generate absolutely correct random sample vector
>> based on the applicable constraints.
>>
>> I have one question though.
>>
>> If I understood the R code correctly then, the first element is
>> drawing random number from Uniform distribution unconditionally,
>> however drawing of sample point for the second element is conditional
>> to the first one. Therefore if we have large vector size instead of
>> current 2, I guess the feasible region for the last few elements will
>> be very small.
>>
>> Will that be any problem? does there any algorithm exist where all
>> (n-1) elements would be drawn unconditionally assuming our vector has
>> n elements?
>>
>> Thanks and regards,
>>
>>
>>
>> On Wed, 23 Apr 2025 at 10:57, Rui Barradas <ruipbarradas using sapo.pt> wrote:
>>>
>>> Hello,
>>>
>>> Here are your tests and the random numbers' histograms.
>>>
>>>
>>> one_vec <- function(a, b, s) {
>>>     repeat{
>>>       repeat{
>>>         u <- runif(1, a[1], b[1])
>>>         if(s - u > 0) break
>>>       }
>>>       v <- s - u
>>>       if(a[2] < v && v < b[2]) break
>>>     }
>>>     c(u, v)
>>> }
>>> gen_mat <- function(m, a, b, s) {
>>>     replicate(m, one_vec(a, b, s))
>>> }
>>>
>>> a <- c(0.015, 0.005)
>>> b <- c(0.070, 0.045)
>>> s <- 0.05528650577311
>>> m <- 10000L
>>>
>>> set.seed(2025)
>>> res <- gen_mat(m, a, b, s)
>>> apply(res, 1, min) > a
>>> #> [1] TRUE TRUE
>>> apply(res, 1, max) < b
>>> #> [1] TRUE TRUE
>>>
>>> # plot histograms of one million 2d vectors
>>> system.time(
>>>     res1mil <- gen_mat(1e6, a, b, s)
>>> )
>>> #>    user  system elapsed
>>> #>    3.01    0.06    3.86
>>>
>>> old_par <- par(mfrow = c(1, 2))
>>> hist(res1mil[1L,])
>>> hist(res1mil[2L,])
>>> par(old_par)
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>> Às 23:31 de 22/04/2025, Rui Barradas escreveu:
>>>> Hello,
>>>>
>>>> Inline.
>>>>
>>>> Às 17:55 de 22/04/2025, Brian Smith escreveu:
>>>>> i.e. we should have
>>>>>
>>>>> all elements of Reduce("+", res) should be equal to  s =
>> 0.05528650577311
>>>>>
>>>>> My assertion is that it is not happing here.
>>>>
>>>> You are right, that's not what is happening. The output is n vectors of
>>>> 2 elements each. It's each of these vectors that add up to s.
>>>> Appparently I misunderstood the problem.
>>>>
>>>> Maybe this is what you want?
>>>> (There is no n argument, the matrix is always 2*m)
>>>>
>>>>
>>>> one_vec <- function(a, b, s) {
>>>>     repeat{
>>>>       repeat{
>>>>         u <- runif(1, a[1], b[1])
>>>>         if(s - u > 0) break
>>>>       }
>>>>       v <- s - u
>>>>       if(a[2] < v && v < b[2]) break
>>>>     }
>>>>     c(u, v)
>>>> }
>>>> gen_mat <- function(m, a, b, s) {
>>>>     replicate(m, one_vec(a, b, s))
>>>> }
>>>>
>>>> set.seed(2025)
>>>> res <- gen_mat(10000, a, b, s)
>>>> colSums(res)
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Rui Barradas
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Tue, 22 Apr 2025 at 22:20, Brian Smith <briansmith199312 using gmail.com
>>>
>>>>> wrote:
>>>>>>
>>>>>> Hi Rui,
>>>>>>
>>>>>> Thanks for the explanation.
>>>>>>
>>>>>> But in this case, are we looking at the correct solution at all?
>>>>>>
>>>>>> My goal is to generate random vector where:
>>>>>> 1) the first element is bounded by (a[1], b[1]) and second element is
>>>>>> bounded by (a[2], b[2])
>>>>>> 2) sum of the element is s
>>>>>>
>>>>>> According to the outcome,
>>>>>> The first matrix values are bounded by c(a[1], b[1]) & second matrix
>>>>>> values are bounded by c(a[2], b[2])
>>>>>>
>>>>>> But,
>>>>>> regarding the sum, I think we should have sum (element-wise) sum
>>>>>> should be equal to s = 0.05528650577311.
>>>>>>
>>>>>> How could we achieve that then?
>>>>>>
>>>>>> On Tue, 22 Apr 2025 at 22:03, Rui Barradas <ruipbarradas using sapo.pt>
>> wrote:
>>>>>>>
>>>>>>> Às 12:39 de 22/04/2025, Brian Smith escreveu:
>>>>>>>> Hi Rui,
>>>>>>>>
>>>>>>>> Many thanks for your time and insight.
>>>>>>>>
>>>>>>>> However, I am not sure if I could understand the code. Below is
>> what I
>>>>>>>> tried based on your code
>>>>>>>>
>>>>>>>> library(Surrogate)
>>>>>>>> a <- c(0.015, 0.005)
>>>>>>>> b <- c(0.070, 0.045)
>>>>>>>> set.seed(2025)
>>>>>>>> res <- mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>>>>>>>>                   MoreArgs = list(s = 0.05528650577311, n = 2, m =
>>>>>>>> 10000), a, b)
>>>>>>>>
>>>>>>>> res1 = res[[1]]
>>>>>>>> res2 = res[[2]]
>>>>>>>>
>>>>>>>> apply(res1, 1, min) > a   ## [1] TRUE TRUE
>>>>>>>> apply(res2, 1, min) > a   ## [1] FALSE  TRUE
>>>>>>>>
>>>>>>>> I could not understand what basically 2 blocks of res signify?
>> Which
>>>>>>>> one I should take as final simulation of the vector? If I take the
>>>>>>>> first block then the lower bound condition is fulfilled, but not
>> with
>>>>>>>> the second block. However with the both blocks, the total equals s
>> is
>>>>>>>> satisfying.
>>>>>>>>
>>>>>>>> I appreciate your further insight.
>>>>>>>>
>>>>>>>> Thanks and regards,
>>>>>>>>
>>>>>>>> On Mon, 21 Apr 2025 at 20:43, Rui Barradas <ruipbarradas using sapo.pt>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> Inline.
>>>>>>>>>
>>>>>>>>> Às 16:08 de 21/04/2025, Rui Barradas escreveu:
>>>>>>>>>> Às 15:27 de 21/04/2025, Brian Smith escreveu:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> There is a function called RandVec in the package Surrogate
>>>>>>>>>>> which can
>>>>>>>>>>> generate andom vectors (continuous number) with a fixed sum
>>>>>>>>>>>
>>>>>>>>>>> The help page of this function states that:
>>>>>>>>>>>
>>>>>>>>>>> a
>>>>>>>>>>>
>>>>>>>>>>> The function RandVec generates an n by m matrix x. Each of the m
>>>>>>>>>>> columns contain n random values lying in the interval [a,b]. The
>>>>>>>>>>> argument a specifies the lower limit of the interval. Default 0.
>>>>>>>>>>>
>>>>>>>>>>> b
>>>>>>>>>>>
>>>>>>>>>>> The argument b specifies the upper limit of the interval.
>>>>>>>>>>> Default 1.
>>>>>>>>>>>
>>>>>>>>>>> However in my case, the lower and upper limits are not same. For
>>>>>>>>>>> example, if I need to draw a pair of number x, y, such that x +
>>>>>>>>>>> y = 1,
>>>>>>>>>>> then the lower and upper limits are different.
>>>>>>>>>>>
>>>>>>>>>>> I tried with below code
>>>>>>>>>>>
>>>>>>>>>>> library(Surrogate)
>>>>>>>>>>>
>>>>>>>>>>> RandVec(a=c(0.1, 0.2), b=c(0.2, 0.8), s=1, n=2,
>> m=5)$RandVecOutput
>>>>>>>>>>>
>>>>>>>>>>> This generates error with message
>>>>>>>>>>>
>>>>>>>>>>> Error in if (b - a == 0) { : the condition has length > 1
>>>>>>>>>>>
>>>>>>>>>>> Is there any way to generate the numbers with different lower
>> and
>>>>>>>>>>> upper limits?
>>>>>>>>>>>
>>>>>>>>>>> ______________________________________________
>>>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
>> see
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>>> PLEASE do read the posting guide
>> https://www.R-project.org/posting-
>>>>>>>>>>> guide.html
>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
>> code.
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Use ?mapply to cycle through all values of a and b.
>>>>>>>>>> Note that the output matrices are transposed, the random vectors
>>>>>>>>>> are the
>>>>>>>>>> rows.
>>>>>>>>> Sorry, this is not true. The columns are the random vectors, as
>>>>>>>>> documented. An example setting the RNG seed, for reproducibility.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> library(Surrogate)
>>>>>>>>>
>>>>>>>>> a <- c(0.1, 0.2)
>>>>>>>>> b <- c(0.2, 0.8)
>>>>>>>>> set.seed(2025)
>>>>>>>>> res <- mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>>>>>>>>>                   MoreArgs = list(s = 1, n = 2, m = 5), a, b)
>>>>>>>>>
>>>>>>>>> res
>>>>>>>>> #> $RandVecOutput
>>>>>>>>> #>          [,1]      [,2]      [,3]     [,4]      [,5]
>>>>>>>>> #> [1,] 0.146079 0.1649319 0.1413759 0.257086 0.1715478
>>>>>>>>> #> [2,] 0.253921 0.2350681 0.2586241 0.142914 0.2284522
>>>>>>>>> #>
>>>>>>>>> #> $RandVecOutput
>>>>>>>>> #>           [,1]      [,2]      [,3]      [,4]      [,5]
>>>>>>>>> #> [1,] 0.5930918 0.2154583 0.6915523 0.7167089 0.3617918
>>>>>>>>> #> [2,] 0.4069082 0.7845417 0.3084477 0.2832911 0.6382082
>>>>>>>>>
>>>>>>>>> lapply(res, colSums)
>>>>>>>>> #> $RandVecOutput
>>>>>>>>> #> [1] 0.4 0.4 0.4 0.4 0.4
>>>>>>>>> #>
>>>>>>>>> #> $RandVecOutput
>>>>>>>>> #> [1] 1 1 1 1 1
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Rui Barradas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> library(Surrogate)
>>>>>>>>>>
>>>>>>>>>> a <- c(0.1, 0.2)
>>>>>>>>>> b <- c(0.2, 0.8)
>>>>>>>>>> mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>>>>>>>>>>            MoreArgs = list(s = 1, n = 2, m = 5), a, b)
>>>>>>>>>> #> $RandVecOutput
>>>>>>>>>> #>           [,1]      [,2]      [,3]      [,4]      [,5]
>>>>>>>>>> #> [1,] 0.2004363 0.1552328 0.2391742 0.1744857 0.1949236
>>>>>>>>>> #> [2,] 0.1995637 0.2447672 0.1608258 0.2255143 0.2050764
>>>>>>>>>> #>
>>>>>>>>>> #> $RandVecOutput
>>>>>>>>>> #>           [,1]      [,2]      [,3]      [,4]      [,5]
>>>>>>>>>> #> [1,] 0.2157416 0.4691191 0.5067447 0.7749258 0.7728955
>>>>>>>>>> #> [2,] 0.7842584 0.5308809 0.4932553 0.2250742 0.2271045
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> Rui Barradas
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Este e-mail foi analisado pelo software antivírus AVG para
>>>>>>>>> verificar a presença de vírus.
>>>>>>>>> www.avg.com
>>>>>>> Hello,
>>>>>>>
>>>>>>> The two blocks of res are the two random matrices, one for each
>>>>>>> combination of (a,b). mapply passes each of the values in its
>> arguments
>>>>>>> list (the ellipses in the help page) and computes the anonymous
>>>>>>> function
>>>>>>> with the pairs (a[1], b[1]), (a[2], b[2]).
>>>>>>>
>>>>>>> Since a and b are two elements vectors the output res is a two
>> members
>>>>>>> named list. Your error is to compare the result of apply(res2, 1,
>> min)
>>>>>>> to a, when you should compare to a[2]. See the code below.
>>>>>>>
>>>>>>>
>>>>>>> library(Surrogate)
>>>>>>> a <- c(0.015, 0.005)
>>>>>>> b <- c(0.070, 0.045)
>>>>>>> set.seed(2025)
>>>>>>> res <- mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>>>>>>>                  MoreArgs = list(s = 0.05528650577311, n = 2, m =
>>>>>>> 10000),
>>>>>>> a, b)
>>>>>>>
>>>>>>> res1 = res[[1]]
>>>>>>> res2 = res[[2]]
>>>>>>>
>>>>>>> # first check that the sums are correct
>>>>>>> # these sums should be s = 0.05528650577311, up to floating-point
>>>>>>> accuracy
>>>>>>> lapply(res, \(x) colSums(x[, 1:5]) |> print(digits = 14L))
>>>>>>> #> [1] 0.05528650577311 0.05528650577311 0.05528650577311
>>>>>>> 0.05528650577311
>>>>>>> #> [5] 0.05528650577311
>>>>>>> #> [1] 0.05528650577311 0.05528650577311 0.05528650577311
>>>>>>> 0.05528650577311
>>>>>>> #> [5] 0.05528650577311
>>>>>>> #> $RandVecOutput
>>>>>>> #> [1] 0.05528651 0.05528651 0.05528651 0.05528651 0.05528651
>>>>>>> #>
>>>>>>> #> $RandVecOutput
>>>>>>> #> [1] 0.05528651 0.05528651 0.05528651 0.05528651 0.05528651
>>>>>>>
>>>>>>> # now check the min and max
>>>>>>> apply(res1, 1, min) > a[1L]   ## [1] TRUE TRUE
>>>>>>> #> [1] TRUE TRUE
>>>>>>> apply(res2, 1, min) > a[2L]   ## [1] TRUE TRUE
>>>>>>> #> [1] TRUE TRUE
>>>>>>>
>>>>>>> apply(res1, 1, max) < b[1L]   ## [1] TRUE TRUE
>>>>>>> #> [1] TRUE TRUE
>>>>>>> apply(res2, 1, max) < b[2L]   ## [1] TRUE TRUE
>>>>>>> #> [1] TRUE TRUE
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Which one should you take as final simulation of the vector? Both.
>>>>>>> The first matrix values are bounded by c(a[1], b[1]) with column
>> sums
>>>>>>> equal to s.
>>>>>>> The second matrix values are bounded by c(a[2], b[2]) with column
>> sums
>>>>>>> also equal to s.
>>>>>>>
>>>>>>> Hoep this helps,
>>>>>>>
>>>>>>> Rui Barradas
>>>>>>>
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide https://www.R-project.org/posting-
>>>> guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>> --
>>> Este e-mail foi analisado pelo software antivírus AVG para verificar a
>> presença de vírus.
>>> www.avg.com
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> https://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.