[R] Generating samples from truncated multivariate Student-t distribution

Tue May 9 23:50:03 CEST 2017

> On May 9, 2017, at 2:33 PM, David Winsemius <dwinsemius at comcast.net> wrote:
> 
> 
>> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>> 
>> I have already posted that in attachement - pdf file.
> 
> I see that now. I failed to scroll to the 3rd page.
> 
>> I am posting
>> plain text here:
>> 
>>> library(tmvtnorm)
>>> meann = c(55, 40, 50, 35, 45, 30)
>>> covv = matrix(c(  1, 1, 0, 2, -1, -1,
>>                   1, 16, -6, -6, -2, 12,
>>                   0, -6, 4, 2, -2, -5,
>>                   2, -6, 2, 25, 0, -17,
>>                  -1, -2, -2, 0, 9, -5,
>>                  -1, 12, -5, -17, -5, 36), 6, 6)
>> df = 4
>> lower = c(20, 20, 20, 20, 20, 20)
>> upper = c(60, 60, 60, 60, 60, 60)
>> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper)
>> 
>> 
>>> sum(X1[,1]) / 100000
>> [1] 54.98258
>> sum(X1[,2]) / 100000
>> [1] 40.36153
>> sum(X1[,3]) / 100000
>> [1] 49.83571
>> sum(X1[,4]) / 100000
>> [1] 34.69571      # "4th element of mean vector"
>> sum(X1[,5]) / 100000
>> [1] 44.81081
>> sum(X1[,6]) / 100000
>> [1] 31.10834
>> 
>> And corresponding results received using equation (3) from pdf file:
>> [54.97,
>> 40,
>> 49.95,
>> 35.31, #  "4th element of mean vector"
>> 44.94,
>> 31.32]
>> 
> 
> I get similar results for the output from your code, 
> 
> My 100-fold run of your calculations were:
> 
> meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv, df, lower, upper)
> colMeans(Xbig)} )
> 
> describe(meansBig[4,])  # describe is from Hmisc package
> 
> meansBig[4, ] 
>       n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
>     100        0      100        1     34.7  0.01954    34.68    34.68    34.69 
>     .50      .75      .90      .95 
>   34.70    34.72    34.72    34.73 
> 
> lowest : 34.65222 34.66675 34.66703 34.66875 34.67566
> highest: 34.72939 34.73012 34.73051 34.73742 34.74441
> 
> 
> So agree, 35.31 is outside the plausible range of an RV formed with that package, but I don't have any code relating to your calculations from theory.

Further investigation:

covDiag <- covv*( row(covv)==col(covv) )  # just the diagonal means

Repeat with all zero covariances:

> meansDiag <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covDiag, df, lower, upper)
+ colMeans(Xbig)} )
> describe(meansDiag[4,])
meansDiag[4, ] 
       n  missing distinct     Info     Mean      Gmd      .05      .10      .25 
     100        0      100        1    35.23  0.02074    35.21    35.21    35.22 
     .50      .75      .90      .95 
   35.23    35.25    35.26    35.26 

lowest : 35.18360 35.19756 35.20098 35.20179 35.20622
highest: 35.26367 35.26635 35.26791 35.27251 35.27302

So failing to account for the covariances in your theoretical calculations mostly explains the apparent discrepancy, although your value of 35.31 would be at the  far end of a statistical distribution and I wonder about some sort of error in your theoretical calculation, which didn't appear to take into account the covariance matrix.

Best;
David.

> 
> Best;
> David.
> 
> 
>> On 9 May 2017 at 22:17, David Winsemius <dwinsemius at comcast.net> wrote:
>>> 
>>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>> 
>>>> Of course I have expected the difference between theory and a sample
>>>> of realizations of RV's and result mean should still be a random
>>>> variable. But, for example for 4th element of mean vector: 35.31 -
>>>> 34.69571 = 0.61429. It is quite big difference, nieprawdaż? I have
>>>> expected that the difference would be smaller because of law of large
>>>> numbers (for 10mln samples the difference is quite similar).
>>> 
>>> I for one have no idea what is meant by a "4th element of mean vector". So I have now idea what to consider "big". I have found that my intuitions about multivariate distributions, especially those where the covariate structure is as complex as you have assumed, are often far from simulated results.
>>> 
>>> I suggest you post some code and results.
>>> 
>>> --
>>> David.
>>> 
>>> 
>>>> 
>>>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>> 
>>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>>>> 
>>>>>> Dear Members,
>>>>>> I am working with 6-dimensional Student-t distribution with 4 degrees
>>>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples
>>>>>> from truncated multivariate Student-t distribution using rtmvt
>>>>>> function from package ‘tmvtnorm’. I have also calculated  mean vector
>>>>>> using equation (3) from attached pdf. The problem is, that after
>>>>>> summing all elements in one column of rtmvt result (and dividing by
>>>>>> 100 000) I do not receive the same result as using (3) equation. Could
>>>>>> You tell me, what is incorrect, why there is a difference?
>>>>> 
>>>>> I guess the question is why you would NOT expect a difference between theory and a sample of realizations of RV's? The result mean should still be a random variable, night wahr?
>>>>> 
>>>>> 
>>>>>> Yours faithfully
>>>>>> Czarek Kowalski
>>>>>> <truncatedT.pdf>______________________________________________
>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>> 
>>>>> David Winsemius
>>>>> Alameda, CA, USA
>>>>> 
>>> 
>>> David Winsemius
>>> Alameda, CA, USA
>>> 
> 
> David Winsemius
> Alameda, CA, USA
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA