[R] A vector of normal distributed values with a sum-to-zero constraint

peter dalgaard pdalgd at gmail.com
Wed Apr 2 09:54:07 CEST 2014


On 01 Apr 2014, at 17:22 , Rui Barradas <ruipbarradas at sapo.pt> wrote:

> Hello,
> 
> One way is to use ?scale.
> 

...except that the sd will be less than 0.5 (not obvious at n=1e6, though). However, if you want

- normal distribution
- symmetry
- constant marginal variance of sigma^2
- fixed sum = 0 

I don't see any way more straightforward than generating n normal variates and subtracting the mean. The only snag is that the variance of a residual is sigma^2(1-1/n), so generate the original data with a variance of sigma^2/(1-1/n) = n/(n-1) sigma^2

I.e.

x <- rnorm(n,0,0.5*sqrt(n/(n-1)))
x <- x - mean(x) 

All of this applies within the boundaries of numerical precision. You're not going to beat the FPU:

> n <- 1e6
> x <- rnorm(n,0,0.5*sqrt(n/(n-1)))
> x <- x - mean(x) 
> sum(x)
[1] -1.625718e-11
> mean(x)
[1] -1.452682e-17

The problem of getting a sum or mean of _exactly_ 0 is just not well-defined, since sums and averages depend on the summation order:

> sum(x)
[1] -1.625718e-11
> sum(sample(x))
[1] -1.624851e-11
> sum(sort(x))
[1] -1.508771e-11
> sum(rev(sort(x)))
[1] -1.599831e-11


 


> set.seed(4867)
> l <- 1000000
> aux <- rnorm(l, 0, 0.5)
> aux <- scale(aux, scale = FALSE)
> sum(aux)
> 
> hist(aux, prob = TRUE)
> curve(dnorm(x, 0, 0.5), from = -2, to = 2, add = TRUE)
> 
> Hope this helps,
> 
> Rui Barradas
> 
> Em 01-04-2014 16:01, JLucke at ria.buffalo.edu escreveu:
>> Then what's wrong with centering your initial values around the mean?
>> 
>> 
>> 
>> Marc Marí Dell'Olmo <marceivissa at gmail.com>
>> 04/01/2014 10:56 AM
>> 
>> To
>> Boris Steipe <boris.steipe at utoronto.ca>,
>> cc
>> JLucke at ria.buffalo.edu, "r-help at r-project.org" <r-help at r-project.org>
>> Subject
>> Re: [R] A vector of normal distributed values with a sum-to-zero
>> constraint
>> 
>> 
>> 
>> 
>> 
>> 
>> Boris is right. I need this vector to include as initial values of a
>> MCMC process (with openbugs) and If I use this last approach sum(x)
>> could be a large (or extreme) value and can cause problems.
>> 
>> The other approach x <- c(x, -x) has the problem that only vectors
>> with even values are obtained.
>> 
>> Thank you!
>> 
>> 
>> 2014-04-01 16:25 GMT+02:00 Boris Steipe <boris.steipe at utoronto.ca>:
>>> But the result is not Normal. Consider:
>>> 
>>> set.seed(112358)
>>> N <- 100
>>> x <- rnorm(N-1)
>>> sum(x)
>>> 
>>> [1] 1.759446   !!!
>>> 
>>> i.e. you have an outlier at 1.7 sigma, and for larger N...
>>> 
>>> set.seed(112358)
>>> N <- 10000
>>> x <- rnorm(N-1)
>>> sum(x)
>>> [1] -91.19731
>>> 
>>> B.
>>> 
>>> 
>>> On 2014-04-01, at 10:14 AM, JLucke at ria.buffalo.edu wrote:
>>> 
>>>> The sum-to-zero constraint imposes a loss of one degree of freedom.  Of
>>  N samples, only (N-1) can be random.   Thus the solution is
>>>>> N <- 100
>>>>> x <- rnorm(N-1)
>>>>> x <- c(x, -sum(x))
>>>>> sum(x)
>>>> [1] -7.199102e-17
>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Boris Steipe <boris.steipe at utoronto.ca>
>>>> Sent by: r-help-bounces at r-project.org
>>>> 04/01/2014 09:29 AM
>>>> 
>>>> To
>>>> Marc Marí Dell'Olmo <marceivissa at gmail.com>,
>>>> cc
>>>> "r-help at r-project.org" <r-help at r-project.org>
>>>> Subject
>>>> Re: [R] A vector of normal distributed values with a sum-to-zero
>> constraint
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Make a copy with opposite sign. This is Normal, symmetric, but no
>> longer random.
>>>> 
>>>>  set.seed(112358)
>>>>  x <- rnorm(5000, 0, 0.5)
>>>>  x <- c(x, -x)
>>>>  sum(x)
>>>>  hist(x)
>>>> 
>>>> B.
>>>> 
>>>> On 2014-04-01, at 8:56 AM, Marc Marí Dell'Olmo wrote:
>>>> 
>>>>> Dear all,
>>>>> 
>>>>> Anyone knows how to generate a vector of Normal distributed values
>>>>> (for example N(0,0.5)), but with a sum-to-zero constraint??
>>>>> 
>>>>> The sum would be exactly zero, without decimals.
>>>>> 
>>>>> I made some attempts:
>>>>> 
>>>>>> l <- 1000000
>>>>>> aux <- rnorm(l,0,0.5)
>>>>>> s <- sum(aux)/l
>>>>>> aux2 <- aux-s
>>>>>> sum(aux2)
>>>>> [1] -0.000000000006131392
>>>>>> 
>>>>>> aux[1]<- -sum(aux[2:l])
>>>>>> sum(aux)
>>>>> [1] -0.00000000000003530422
>>>>> 
>>>>> 
>>>>> but the sum is not exactly zero and not all parameters are N(0,0.5)
>>>>> distributed...
>>>>> 
>>>>> Perhaps is obvious but I can't find the way to do it..
>>>>> 
>>>>> Thank you very much!
>>>>> 
>>>>> Marc
>>>>> 
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>> 
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> 
>> 	[[alternative HTML version deleted]]
>> 
>> 
>> 
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com




More information about the R-help mailing list