[R] Generating samples from truncated multivariate Student-t distribution
Czarek Kowalski
czarek230800 at gmail.com
Wed May 10 20:02:18 CEST 2017
Previously I had used another language to make calculations based on
theory. I have calculated using R and I have received another results.
My theoretical calculation does not take into account the full
covariance matrix (only 6 elements from diagonal). Code based on
theory:
df = 4; #degrees of freedom
sigmas = c(1, 4, 2, 5, 3, 6) # roots of diagonal elements of covariance matrix
meann = c(55, 40, 50, 35, 45, 30)
alfa1 = 20; # lower truncation
beta1 = 60; # upper truncation
a = (alfa1 - meann)/sigmas;
b = (beta1 - meann)/sigmas;
E = meann + sigmas * ((gamma(df - 1)/2)*((df + a^2)^(-(df-1)/2) - (df
+ b^2)^(-(df-1)/2))*df^(df/2))/(2*(pt(b,df)-pt(a,df))*gamma(df/2)*gamma(1/2))
E
Kind regards
Czarek
On 9 May 2017 at 23:50, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> On May 9, 2017, at 2:33 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>>
>>> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>
>>> I have already posted that in attachement - pdf file.
>>
>> I see that now. I failed to scroll to the 3rd page.
>>
>>> I am posting
>>> plain text here:
>>>
>>>> library(tmvtnorm)
>>>> meann = c(55, 40, 50, 35, 45, 30)
>>>> covv = matrix(c( 1, 1, 0, 2, -1, -1,
>>> 1, 16, -6, -6, -2, 12,
>>> 0, -6, 4, 2, -2, -5,
>>> 2, -6, 2, 25, 0, -17,
>>> -1, -2, -2, 0, 9, -5,
>>> -1, 12, -5, -17, -5, 36), 6, 6)
>>> df = 4
>>> lower = c(20, 20, 20, 20, 20, 20)
>>> upper = c(60, 60, 60, 60, 60, 60)
>>> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper)
>>>
>>>
>>>> sum(X1[,1]) / 100000
>>> [1] 54.98258
>>> sum(X1[,2]) / 100000
>>> [1] 40.36153
>>> sum(X1[,3]) / 100000
>>> [1] 49.83571
>>> sum(X1[,4]) / 100000
>>> [1] 34.69571 # "4th element of mean vector"
>>> sum(X1[,5]) / 100000
>>> [1] 44.81081
>>> sum(X1[,6]) / 100000
>>> [1] 31.10834
>>>
>>> And corresponding results received using equation (3) from pdf file:
>>> [54.97,
>>> 40,
>>> 49.95,
>>> 35.31, # "4th element of mean vector"
>>> 44.94,
>>> 31.32]
>>>
>>
>> I get similar results for the output from your code,
>>
>> My 100-fold run of your calculations were:
>>
>> meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv, df, lower, upper)
>> colMeans(Xbig)} )
>>
>> describe(meansBig[4,]) # describe is from Hmisc package
>>
>> meansBig[4, ]
>> n missing distinct Info Mean Gmd .05 .10 .25
>> 100 0 100 1 34.7 0.01954 34.68 34.68 34.69
>> .50 .75 .90 .95
>> 34.70 34.72 34.72 34.73
>>
>> lowest : 34.65222 34.66675 34.66703 34.66875 34.67566
>> highest: 34.72939 34.73012 34.73051 34.73742 34.74441
>>
>>
>> So agree, 35.31 is outside the plausible range of an RV formed with that package, but I don't have any code relating to your calculations from theory.
>
> Further investigation:
>
> covDiag <- covv*( row(covv)==col(covv) ) # just the diagonal means
>
> Repeat with all zero covariances:
>
>> meansDiag <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covDiag, df, lower, upper)
> + colMeans(Xbig)} )
>> describe(meansDiag[4,])
> meansDiag[4, ]
> n missing distinct Info Mean Gmd .05 .10 .25
> 100 0 100 1 35.23 0.02074 35.21 35.21 35.22
> .50 .75 .90 .95
> 35.23 35.25 35.26 35.26
>
> lowest : 35.18360 35.19756 35.20098 35.20179 35.20622
> highest: 35.26367 35.26635 35.26791 35.27251 35.27302
>
> So failing to account for the covariances in your theoretical calculations mostly explains the apparent discrepancy, although your value of 35.31 would be at the far end of a statistical distribution and I wonder about some sort of error in your theoretical calculation, which didn't appear to take into account the covariance matrix.
>
> Best;
> David.
>
>
>
>>
>> Best;
>> David.
>>
>>
>>> On 9 May 2017 at 22:17, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>
>>>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>>>
>>>>> Of course I have expected the difference between theory and a sample
>>>>> of realizations of RV's and result mean should still be a random
>>>>> variable. But, for example for 4th element of mean vector: 35.31 -
>>>>> 34.69571 = 0.61429. It is quite big difference, nieprawdaż? I have
>>>>> expected that the difference would be smaller because of law of large
>>>>> numbers (for 10mln samples the difference is quite similar).
>>>>
>>>> I for one have no idea what is meant by a "4th element of mean vector". So I have now idea what to consider "big". I have found that my intuitions about multivariate distributions, especially those where the covariate structure is as complex as you have assumed, are often far from simulated results.
>>>>
>>>> I suggest you post some code and results.
>>>>
>>>> --
>>>> David.
>>>>
>>>>
>>>>>
>>>>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>>
>>>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>>>>>
>>>>>>> Dear Members,
>>>>>>> I am working with 6-dimensional Student-t distribution with 4 degrees
>>>>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples
>>>>>>> from truncated multivariate Student-t distribution using rtmvt
>>>>>>> function from package ‘tmvtnorm’. I have also calculated mean vector
>>>>>>> using equation (3) from attached pdf. The problem is, that after
>>>>>>> summing all elements in one column of rtmvt result (and dividing by
>>>>>>> 100 000) I do not receive the same result as using (3) equation. Could
>>>>>>> You tell me, what is incorrect, why there is a difference?
>>>>>>
>>>>>> I guess the question is why you would NOT expect a difference between theory and a sample of realizations of RV's? The result mean should still be a random variable, night wahr?
>>>>>>
>>>>>>
>>>>>>> Yours faithfully
>>>>>>> Czarek Kowalski
>>>>>>> <truncatedT.pdf>______________________________________________
>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>> David Winsemius
>>>>>> Alameda, CA, USA
>>>>>>
>>>>
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>>
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
More information about the R-help
mailing list