[R] Generate random vectors (continuous number) with a fixed sum
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Sat Apr 26 19:04:04 CEST 2025
And I missed copying one line:
sums <- rowSums(x)
should come between the calculations of x and x0.
Duncan Murdoch
On 2025-04-26 1:01 p.m., Duncan Murdoch wrote:
> On 2025-04-24 3:25 p.m., Bert Gunter wrote:
>> Folks:
>>
>> Unless my wee old brain errs (always a danger), uniform sampling from an
>> n-vector <xi> for which 0 <= ai <= xi <= bi and SUM(xi) = k, a constant,
>> where to ensure that the constraints are consistent (and nontrivial),
>> SUM(ai)< k and SUM(bi) > k, is a simple linear transformation (details left
>> to the reader) of uniform sampling from a standard n-1 dim simplex, for
>> which a web search yielded:
>> https://cs.stackexchange.com/questions/3227/uniform-sampling-from-a-simplex
>
> I think for n > 2 it is sometimes a simplex, and sometimes a different
> shape. For example, with n = 3 and all a_i = 0 and all b_i = 1, it's a
> simplex if k < 1 or k > 2, but not for 1 < k < 2, where it's a hexagon.
>
> You can visualize this with the following code:
>
> x <- matrix(runif(30000), ncol = 3) # the full cube
> x0 <- x[1.45 < sums & sums < 1.55,] # points near k = 1.5
> rgl::plot3d(x0)
>
>
>> The proposed "solutions" in this thread which rejection sample sequentially
>> from uniforms do *not* produce a uniform distribution on the simplex, as
>> the link and references therein explain.
>
> My solution didn't do rejection sampling, but it won't give a uniform
> density. I didn't realize that was requested.
>
>>
>> Apologies if I have misunderstood the problem and proposed solutions,
>> although there does seem to be some confusion in the thread about exactly
>> what was sought.
>
> Yes, that's true.
>
> Duncan Murdoch
>
>>
>> Cheers,
>> Bert
>>
>> "An educated person is one who can entertain new ideas, entertain others,
>> and entertain herself."
>>
>>
>>
>> On Thu, Apr 24, 2025 at 10:33 AM Brian Smith <briansmith199312 using gmail.com>
>> wrote:
>>
>>> Hi Rui,
>>>
>>> This code is able to generate absolutely correct random sample vector
>>> based on the applicable constraints.
>>>
>>> I have one question though.
>>>
>>> If I understood the R code correctly then, the first element is
>>> drawing random number from Uniform distribution unconditionally,
>>> however drawing of sample point for the second element is conditional
>>> to the first one. Therefore if we have large vector size instead of
>>> current 2, I guess the feasible region for the last few elements will
>>> be very small.
>>>
>>> Will that be any problem? does there any algorithm exist where all
>>> (n-1) elements would be drawn unconditionally assuming our vector has
>>> n elements?
>>>
>>> Thanks and regards,
>>>
>>>
>>>
>>> On Wed, 23 Apr 2025 at 10:57, Rui Barradas <ruipbarradas using sapo.pt> wrote:
>>>>
>>>> Hello,
>>>>
>>>> Here are your tests and the random numbers' histograms.
>>>>
>>>>
>>>> one_vec <- function(a, b, s) {
>>>> repeat{
>>>> repeat{
>>>> u <- runif(1, a[1], b[1])
>>>> if(s - u > 0) break
>>>> }
>>>> v <- s - u
>>>> if(a[2] < v && v < b[2]) break
>>>> }
>>>> c(u, v)
>>>> }
>>>> gen_mat <- function(m, a, b, s) {
>>>> replicate(m, one_vec(a, b, s))
>>>> }
>>>>
>>>> a <- c(0.015, 0.005)
>>>> b <- c(0.070, 0.045)
>>>> s <- 0.05528650577311
>>>> m <- 10000L
>>>>
>>>> set.seed(2025)
>>>> res <- gen_mat(m, a, b, s)
>>>> apply(res, 1, min) > a
>>>> #> [1] TRUE TRUE
>>>> apply(res, 1, max) < b
>>>> #> [1] TRUE TRUE
>>>>
>>>> # plot histograms of one million 2d vectors
>>>> system.time(
>>>> res1mil <- gen_mat(1e6, a, b, s)
>>>> )
>>>> #> user system elapsed
>>>> #> 3.01 0.06 3.86
>>>>
>>>> old_par <- par(mfrow = c(1, 2))
>>>> hist(res1mil[1L,])
>>>> hist(res1mil[2L,])
>>>> par(old_par)
>>>>
>>>>
>>>> Hope this helps,
>>>>
>>>> Rui Barradas
>>>>
>>>> Às 23:31 de 22/04/2025, Rui Barradas escreveu:
>>>>> Hello,
>>>>>
>>>>> Inline.
>>>>>
>>>>> Às 17:55 de 22/04/2025, Brian Smith escreveu:
>>>>>> i.e. we should have
>>>>>>
>>>>>> all elements of Reduce("+", res) should be equal to s =
>>> 0.05528650577311
>>>>>>
>>>>>> My assertion is that it is not happing here.
>>>>>
>>>>> You are right, that's not what is happening. The output is n vectors of
>>>>> 2 elements each. It's each of these vectors that add up to s.
>>>>> Appparently I misunderstood the problem.
>>>>>
>>>>> Maybe this is what you want?
>>>>> (There is no n argument, the matrix is always 2*m)
>>>>>
>>>>>
>>>>> one_vec <- function(a, b, s) {
>>>>> repeat{
>>>>> repeat{
>>>>> u <- runif(1, a[1], b[1])
>>>>> if(s - u > 0) break
>>>>> }
>>>>> v <- s - u
>>>>> if(a[2] < v && v < b[2]) break
>>>>> }
>>>>> c(u, v)
>>>>> }
>>>>> gen_mat <- function(m, a, b, s) {
>>>>> replicate(m, one_vec(a, b, s))
>>>>> }
>>>>>
>>>>> set.seed(2025)
>>>>> res <- gen_mat(10000, a, b, s)
>>>>> colSums(res)
>>>>>
>>>>>
>>>>> Hope this helps,
>>>>>
>>>>> Rui Barradas
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 22 Apr 2025 at 22:20, Brian Smith <briansmith199312 using gmail.com
>>>>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi Rui,
>>>>>>>
>>>>>>> Thanks for the explanation.
>>>>>>>
>>>>>>> But in this case, are we looking at the correct solution at all?
>>>>>>>
>>>>>>> My goal is to generate random vector where:
>>>>>>> 1) the first element is bounded by (a[1], b[1]) and second element is
>>>>>>> bounded by (a[2], b[2])
>>>>>>> 2) sum of the element is s
>>>>>>>
>>>>>>> According to the outcome,
>>>>>>> The first matrix values are bounded by c(a[1], b[1]) & second matrix
>>>>>>> values are bounded by c(a[2], b[2])
>>>>>>>
>>>>>>> But,
>>>>>>> regarding the sum, I think we should have sum (element-wise) sum
>>>>>>> should be equal to s = 0.05528650577311.
>>>>>>>
>>>>>>> How could we achieve that then?
>>>>>>>
>>>>>>> On Tue, 22 Apr 2025 at 22:03, Rui Barradas <ruipbarradas using sapo.pt>
>>> wrote:
>>>>>>>>
>>>>>>>> Às 12:39 de 22/04/2025, Brian Smith escreveu:
>>>>>>>>> Hi Rui,
>>>>>>>>>
>>>>>>>>> Many thanks for your time and insight.
>>>>>>>>>
>>>>>>>>> However, I am not sure if I could understand the code. Below is
>>> what I
>>>>>>>>> tried based on your code
>>>>>>>>>
>>>>>>>>> library(Surrogate)
>>>>>>>>> a <- c(0.015, 0.005)
>>>>>>>>> b <- c(0.070, 0.045)
>>>>>>>>> set.seed(2025)
>>>>>>>>> res <- mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>>>>>>>>> MoreArgs = list(s = 0.05528650577311, n = 2, m =
>>>>>>>>> 10000), a, b)
>>>>>>>>>
>>>>>>>>> res1 = res[[1]]
>>>>>>>>> res2 = res[[2]]
>>>>>>>>>
>>>>>>>>> apply(res1, 1, min) > a ## [1] TRUE TRUE
>>>>>>>>> apply(res2, 1, min) > a ## [1] FALSE TRUE
>>>>>>>>>
>>>>>>>>> I could not understand what basically 2 blocks of res signify?
>>> Which
>>>>>>>>> one I should take as final simulation of the vector? If I take the
>>>>>>>>> first block then the lower bound condition is fulfilled, but not
>>> with
>>>>>>>>> the second block. However with the both blocks, the total equals s
>>> is
>>>>>>>>> satisfying.
>>>>>>>>>
>>>>>>>>> I appreciate your further insight.
>>>>>>>>>
>>>>>>>>> Thanks and regards,
>>>>>>>>>
>>>>>>>>> On Mon, 21 Apr 2025 at 20:43, Rui Barradas <ruipbarradas using sapo.pt>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> Inline.
>>>>>>>>>>
>>>>>>>>>> Às 16:08 de 21/04/2025, Rui Barradas escreveu:
>>>>>>>>>>> Às 15:27 de 21/04/2025, Brian Smith escreveu:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> There is a function called RandVec in the package Surrogate
>>>>>>>>>>>> which can
>>>>>>>>>>>> generate andom vectors (continuous number) with a fixed sum
>>>>>>>>>>>>
>>>>>>>>>>>> The help page of this function states that:
>>>>>>>>>>>>
>>>>>>>>>>>> a
>>>>>>>>>>>>
>>>>>>>>>>>> The function RandVec generates an n by m matrix x. Each of the m
>>>>>>>>>>>> columns contain n random values lying in the interval [a,b]. The
>>>>>>>>>>>> argument a specifies the lower limit of the interval. Default 0.
>>>>>>>>>>>>
>>>>>>>>>>>> b
>>>>>>>>>>>>
>>>>>>>>>>>> The argument b specifies the upper limit of the interval.
>>>>>>>>>>>> Default 1.
>>>>>>>>>>>>
>>>>>>>>>>>> However in my case, the lower and upper limits are not same. For
>>>>>>>>>>>> example, if I need to draw a pair of number x, y, such that x +
>>>>>>>>>>>> y = 1,
>>>>>>>>>>>> then the lower and upper limits are different.
>>>>>>>>>>>>
>>>>>>>>>>>> I tried with below code
>>>>>>>>>>>>
>>>>>>>>>>>> library(Surrogate)
>>>>>>>>>>>>
>>>>>>>>>>>> RandVec(a=c(0.1, 0.2), b=c(0.2, 0.8), s=1, n=2,
>>> m=5)$RandVecOutput
>>>>>>>>>>>>
>>>>>>>>>>>> This generates error with message
>>>>>>>>>>>>
>>>>>>>>>>>> Error in if (b - a == 0) { : the condition has length > 1
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any way to generate the numbers with different lower
>>> and
>>>>>>>>>>>> upper limits?
>>>>>>>>>>>>
>>>>>>>>>>>> ______________________________________________
>>>>>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more,
>>> see
>>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>>>> PLEASE do read the posting guide
>>> https://www.R-project.org/posting-
>>>>>>>>>>>> guide.html
>>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
>>> code.
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> Use ?mapply to cycle through all values of a and b.
>>>>>>>>>>> Note that the output matrices are transposed, the random vectors
>>>>>>>>>>> are the
>>>>>>>>>>> rows.
>>>>>>>>>> Sorry, this is not true. The columns are the random vectors, as
>>>>>>>>>> documented. An example setting the RNG seed, for reproducibility.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> library(Surrogate)
>>>>>>>>>>
>>>>>>>>>> a <- c(0.1, 0.2)
>>>>>>>>>> b <- c(0.2, 0.8)
>>>>>>>>>> set.seed(2025)
>>>>>>>>>> res <- mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>>>>>>>>>> MoreArgs = list(s = 1, n = 2, m = 5), a, b)
>>>>>>>>>>
>>>>>>>>>> res
>>>>>>>>>> #> $RandVecOutput
>>>>>>>>>> #> [,1] [,2] [,3] [,4] [,5]
>>>>>>>>>> #> [1,] 0.146079 0.1649319 0.1413759 0.257086 0.1715478
>>>>>>>>>> #> [2,] 0.253921 0.2350681 0.2586241 0.142914 0.2284522
>>>>>>>>>> #>
>>>>>>>>>> #> $RandVecOutput
>>>>>>>>>> #> [,1] [,2] [,3] [,4] [,5]
>>>>>>>>>> #> [1,] 0.5930918 0.2154583 0.6915523 0.7167089 0.3617918
>>>>>>>>>> #> [2,] 0.4069082 0.7845417 0.3084477 0.2832911 0.6382082
>>>>>>>>>>
>>>>>>>>>> lapply(res, colSums)
>>>>>>>>>> #> $RandVecOutput
>>>>>>>>>> #> [1] 0.4 0.4 0.4 0.4 0.4
>>>>>>>>>> #>
>>>>>>>>>> #> $RandVecOutput
>>>>>>>>>> #> [1] 1 1 1 1 1
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> Rui Barradas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> library(Surrogate)
>>>>>>>>>>>
>>>>>>>>>>> a <- c(0.1, 0.2)
>>>>>>>>>>> b <- c(0.2, 0.8)
>>>>>>>>>>> mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>>>>>>>>>>> MoreArgs = list(s = 1, n = 2, m = 5), a, b)
>>>>>>>>>>> #> $RandVecOutput
>>>>>>>>>>> #> [,1] [,2] [,3] [,4] [,5]
>>>>>>>>>>> #> [1,] 0.2004363 0.1552328 0.2391742 0.1744857 0.1949236
>>>>>>>>>>> #> [2,] 0.1995637 0.2447672 0.1608258 0.2255143 0.2050764
>>>>>>>>>>> #>
>>>>>>>>>>> #> $RandVecOutput
>>>>>>>>>>> #> [,1] [,2] [,3] [,4] [,5]
>>>>>>>>>>> #> [1,] 0.2157416 0.4691191 0.5067447 0.7749258 0.7728955
>>>>>>>>>>> #> [2,] 0.7842584 0.5308809 0.4932553 0.2250742 0.2271045
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hope this helps,
>>>>>>>>>>>
>>>>>>>>>>> Rui Barradas
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Este e-mail foi analisado pelo software antivírus AVG para
>>>>>>>>>> verificar a presença de vírus.
>>>>>>>>>> www.avg.com
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> The two blocks of res are the two random matrices, one for each
>>>>>>>> combination of (a,b). mapply passes each of the values in its
>>> arguments
>>>>>>>> list (the ellipses in the help page) and computes the anonymous
>>>>>>>> function
>>>>>>>> with the pairs (a[1], b[1]), (a[2], b[2]).
>>>>>>>>
>>>>>>>> Since a and b are two elements vectors the output res is a two
>>> members
>>>>>>>> named list. Your error is to compare the result of apply(res2, 1,
>>> min)
>>>>>>>> to a, when you should compare to a[2]. See the code below.
>>>>>>>>
>>>>>>>>
>>>>>>>> library(Surrogate)
>>>>>>>> a <- c(0.015, 0.005)
>>>>>>>> b <- c(0.070, 0.045)
>>>>>>>> set.seed(2025)
>>>>>>>> res <- mapply(\(a, b, s, n, m) RandVec(a, b, s, n, m),
>>>>>>>> MoreArgs = list(s = 0.05528650577311, n = 2, m =
>>>>>>>> 10000),
>>>>>>>> a, b)
>>>>>>>>
>>>>>>>> res1 = res[[1]]
>>>>>>>> res2 = res[[2]]
>>>>>>>>
>>>>>>>> # first check that the sums are correct
>>>>>>>> # these sums should be s = 0.05528650577311, up to floating-point
>>>>>>>> accuracy
>>>>>>>> lapply(res, \(x) colSums(x[, 1:5]) |> print(digits = 14L))
>>>>>>>> #> [1] 0.05528650577311 0.05528650577311 0.05528650577311
>>>>>>>> 0.05528650577311
>>>>>>>> #> [5] 0.05528650577311
>>>>>>>> #> [1] 0.05528650577311 0.05528650577311 0.05528650577311
>>>>>>>> 0.05528650577311
>>>>>>>> #> [5] 0.05528650577311
>>>>>>>> #> $RandVecOutput
>>>>>>>> #> [1] 0.05528651 0.05528651 0.05528651 0.05528651 0.05528651
>>>>>>>> #>
>>>>>>>> #> $RandVecOutput
>>>>>>>> #> [1] 0.05528651 0.05528651 0.05528651 0.05528651 0.05528651
>>>>>>>>
>>>>>>>> # now check the min and max
>>>>>>>> apply(res1, 1, min) > a[1L] ## [1] TRUE TRUE
>>>>>>>> #> [1] TRUE TRUE
>>>>>>>> apply(res2, 1, min) > a[2L] ## [1] TRUE TRUE
>>>>>>>> #> [1] TRUE TRUE
>>>>>>>>
>>>>>>>> apply(res1, 1, max) < b[1L] ## [1] TRUE TRUE
>>>>>>>> #> [1] TRUE TRUE
>>>>>>>> apply(res2, 1, max) < b[2L] ## [1] TRUE TRUE
>>>>>>>> #> [1] TRUE TRUE
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Which one should you take as final simulation of the vector? Both.
>>>>>>>> The first matrix values are bounded by c(a[1], b[1]) with column
>>> sums
>>>>>>>> equal to s.
>>>>>>>> The second matrix values are bounded by c(a[2], b[2]) with column
>>> sums
>>>>>>>> also equal to s.
>>>>>>>>
>>>>>>>> Hoep this helps,
>>>>>>>>
>>>>>>>> Rui Barradas
>>>>>>>>
>>>>>
>>>>> ______________________________________________
>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide https://www.R-project.org/posting-
>>>>> guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>> --
>>>> Este e-mail foi analisado pelo software antivírus AVG para verificar a
>>> presença de vírus.
>>>> www.avg.com
>>>
>>> ______________________________________________
>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> https://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
More information about the R-help
mailing list