# [R] A vector of normal distributed values with a sum-to-zero constraint

peter dalgaard pdalgd at gmail.com
Wed Apr 2 09:54:07 CEST 2014

```On 01 Apr 2014, at 17:22 , Rui Barradas <ruipbarradas at sapo.pt> wrote:

> Hello,
>
> One way is to use ?scale.
>

...except that the sd will be less than 0.5 (not obvious at n=1e6, though). However, if you want

- normal distribution
- symmetry
- constant marginal variance of sigma^2
- fixed sum = 0

I don't see any way more straightforward than generating n normal variates and subtracting the mean. The only snag is that the variance of a residual is sigma^2(1-1/n), so generate the original data with a variance of sigma^2/(1-1/n) = n/(n-1) sigma^2

I.e.

x <- rnorm(n,0,0.5*sqrt(n/(n-1)))
x <- x - mean(x)

All of this applies within the boundaries of numerical precision. You're not going to beat the FPU:

> n <- 1e6
> x <- rnorm(n,0,0.5*sqrt(n/(n-1)))
> x <- x - mean(x)
> sum(x)
[1] -1.625718e-11
> mean(x)
[1] -1.452682e-17

The problem of getting a sum or mean of _exactly_ 0 is just not well-defined, since sums and averages depend on the summation order:

> sum(x)
[1] -1.625718e-11
> sum(sample(x))
[1] -1.624851e-11
> sum(sort(x))
[1] -1.508771e-11
> sum(rev(sort(x)))
[1] -1.599831e-11

> set.seed(4867)
> l <- 1000000
> aux <- rnorm(l, 0, 0.5)
> aux <- scale(aux, scale = FALSE)
> sum(aux)
>
> hist(aux, prob = TRUE)
> curve(dnorm(x, 0, 0.5), from = -2, to = 2, add = TRUE)
>
> Hope this helps,
>
>
> Em 01-04-2014 16:01, JLucke at ria.buffalo.edu escreveu:
>> Then what's wrong with centering your initial values around the mean?
>>
>>
>>
>> Marc Marí Dell'Olmo <marceivissa at gmail.com>
>> 04/01/2014 10:56 AM
>>
>> To
>> Boris Steipe <boris.steipe at utoronto.ca>,
>> cc
>> JLucke at ria.buffalo.edu, "r-help at r-project.org" <r-help at r-project.org>
>> Subject
>> Re: [R] A vector of normal distributed values with a sum-to-zero
>> constraint
>>
>>
>>
>>
>>
>>
>> Boris is right. I need this vector to include as initial values of a
>> MCMC process (with openbugs) and If I use this last approach sum(x)
>> could be a large (or extreme) value and can cause problems.
>>
>> The other approach x <- c(x, -x) has the problem that only vectors
>> with even values are obtained.
>>
>> Thank you!
>>
>>
>> 2014-04-01 16:25 GMT+02:00 Boris Steipe <boris.steipe at utoronto.ca>:
>>> But the result is not Normal. Consider:
>>>
>>> set.seed(112358)
>>> N <- 100
>>> x <- rnorm(N-1)
>>> sum(x)
>>>
>>> [1] 1.759446   !!!
>>>
>>> i.e. you have an outlier at 1.7 sigma, and for larger N...
>>>
>>> set.seed(112358)
>>> N <- 10000
>>> x <- rnorm(N-1)
>>> sum(x)
>>> [1] -91.19731
>>>
>>> B.
>>>
>>>
>>> On 2014-04-01, at 10:14 AM, JLucke at ria.buffalo.edu wrote:
>>>
>>>> The sum-to-zero constraint imposes a loss of one degree of freedom.  Of
>>  N samples, only (N-1) can be random.   Thus the solution is
>>>>> N <- 100
>>>>> x <- rnorm(N-1)
>>>>> x <- c(x, -sum(x))
>>>>> sum(x)
>>>> [1] -7.199102e-17
>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Boris Steipe <boris.steipe at utoronto.ca>
>>>> Sent by: r-help-bounces at r-project.org
>>>> 04/01/2014 09:29 AM
>>>>
>>>> To
>>>> Marc Marí Dell'Olmo <marceivissa at gmail.com>,
>>>> cc
>>>> "r-help at r-project.org" <r-help at r-project.org>
>>>> Subject
>>>> Re: [R] A vector of normal distributed values with a sum-to-zero
>> constraint
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Make a copy with opposite sign. This is Normal, symmetric, but no
>> longer random.
>>>>
>>>>  set.seed(112358)
>>>>  x <- rnorm(5000, 0, 0.5)
>>>>  x <- c(x, -x)
>>>>  sum(x)
>>>>  hist(x)
>>>>
>>>> B.
>>>>
>>>> On 2014-04-01, at 8:56 AM, Marc Marí Dell'Olmo wrote:
>>>>
>>>>> Dear all,
>>>>>
>>>>> Anyone knows how to generate a vector of Normal distributed values
>>>>> (for example N(0,0.5)), but with a sum-to-zero constraint??
>>>>>
>>>>> The sum would be exactly zero, without decimals.
>>>>>
>>>>>
>>>>>> l <- 1000000
>>>>>> aux <- rnorm(l,0,0.5)
>>>>>> s <- sum(aux)/l
>>>>>> aux2 <- aux-s
>>>>>> sum(aux2)
>>>>> [1] -0.000000000006131392
>>>>>>
>>>>>> aux[1]<- -sum(aux[2:l])
>>>>>> sum(aux)
>>>>> [1] -0.00000000000003530422
>>>>>
>>>>>
>>>>> but the sum is not exactly zero and not all parameters are N(0,0.5)
>>>>> distributed...
>>>>>
>>>>> Perhaps is obvious but I can't find the way to do it..
>>>>>
>>>>> Thank you very much!
>>>>>
>>>>> Marc
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> 	[[alternative HTML version deleted]]
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

```