[R] Generating samples from truncated multivariate Student-t distribution

Wed May 10 21:25:42 CEST 2017

> On May 10, 2017, at 11:02 AM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
> 
> Previously I had used another language to make calculations based on
> theory. I have calculated using R and I have received another results.
> My theoretical calculation does not take into account the full
> covariance matrix (only 6 elements from diagonal). Code based on
> theory:
> 
> df = 4;   #degrees of freedom
> sigmas = c(1, 4, 2, 5, 3, 6) # roots of diagonal elements of covariance matrix
> meann = c(55, 40, 50, 35, 45, 30)
> alfa1 = 20; # lower truncation
> beta1 = 60; # upper truncation
> a = (alfa1 - meann)/sigmas;
> b = (beta1 - meann)/sigmas;
> E = meann + sigmas * ((gamma(df - 1)/2)*((df + a^2)^(-(df-1)/2) - (df
> + b^2)^(-(df-1)/2))*df^(df/2))/(2*(pt(b,df)-pt(a,df))*gamma(df/2)*gamma(1/2))

This looks wrong:

(gamma(df - 1)/2)

According to your "theory" (which you have not yet supported with references) in the attached pdf file, that should be:

gamma( (df - 1)/2 )

I'm not a statistician and checking your "theoretical" expression for the mean of a truncated PDF is not really on-topic for R help. When I look at http://www.tonyohagan.co.uk/academic/pdf/trunc_multi_t.PDF I see a lot of normalization factors that seem quite different than yours. Perhaps you should post any further difficulties to stats.stackexchange.com ?

You may also want to consult:

https://www.jstatsoft.org/article/view/v016c02

Authors:	Saralees Nadarajah, Samuel Kotz
Title:	R Programs for Truncated Distributions
Abstract:	Truncated distributions arise naturally in many practical situations. In this note, we provide programs for computing six quantities of interest (probability density function, mean, variance, cumulative distribution function, quantile function and random numbers) for any truncated distribution: whether it is left truncated, right truncated or doubly truncated. The programs are written in R: a freely downloadable statistical software.

Best;
David.

> E
> 
> 
> Kind regards
> Czarek
> 
> On 9 May 2017 at 23:50, David Winsemius <dwinsemius at comcast.net> wrote:
>> 
>>> On May 9, 2017, at 2:33 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>>> 
>>> 
>>>> On May 9, 2017, at 2:05 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>> 
>>>> I have already posted that in attachement - pdf file.
>>> 
>>> I see that now. I failed to scroll to the 3rd page.
>>> 
>>>> I am posting
>>>> plain text here:
>>>> 
>>>>> library(tmvtnorm)
>>>>> meann = c(55, 40, 50, 35, 45, 30)
>>>>> covv = matrix(c(  1, 1, 0, 2, -1, -1,
>>>>                  1, 16, -6, -6, -2, 12,
>>>>                  0, -6, 4, 2, -2, -5,
>>>>                  2, -6, 2, 25, 0, -17,
>>>>                 -1, -2, -2, 0, 9, -5,
>>>>                 -1, 12, -5, -17, -5, 36), 6, 6)
>>>> df = 4
>>>> lower = c(20, 20, 20, 20, 20, 20)
>>>> upper = c(60, 60, 60, 60, 60, 60)
>>>> X1 <- rtmvt(n=100000, meann, covv, df, lower, upper)
>>>> 
>>>> 
>>>>> sum(X1[,1]) / 100000
>>>> [1] 54.98258
>>>> sum(X1[,2]) / 100000
>>>> [1] 40.36153
>>>> sum(X1[,3]) / 100000
>>>> [1] 49.83571
>>>> sum(X1[,4]) / 100000
>>>> [1] 34.69571      # "4th element of mean vector"
>>>> sum(X1[,5]) / 100000
>>>> [1] 44.81081
>>>> sum(X1[,6]) / 100000
>>>> [1] 31.10834
>>>> 
>>>> And corresponding results received using equation (3) from pdf file:
>>>> [54.97,
>>>> 40,
>>>> 49.95,
>>>> 35.31, #  "4th element of mean vector"
>>>> 44.94,
>>>> 31.32]
>>>> 
>>> 
>>> I get similar results for the output from your code,
>>> 
>>> My 100-fold run of your calculations were:
>>> 
>>> meansBig <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covv, df, lower, upper)
>>> colMeans(Xbig)} )
>>> 
>>> describe(meansBig[4,])  # describe is from Hmisc package
>>> 
>>> meansBig[4, ]
>>>      n  missing distinct     Info     Mean      Gmd      .05      .10      .25
>>>    100        0      100        1     34.7  0.01954    34.68    34.68    34.69
>>>    .50      .75      .90      .95
>>>  34.70    34.72    34.72    34.73
>>> 
>>> lowest : 34.65222 34.66675 34.66703 34.66875 34.67566
>>> highest: 34.72939 34.73012 34.73051 34.73742 34.74441
>>> 
>>> 
>>> So agree, 35.31 is outside the plausible range of an RV formed with that package, but I don't have any code relating to your calculations from theory.
>> 
>> Further investigation:
>> 
>> covDiag <- covv*( row(covv)==col(covv) )  # just the diagonal means
>> 
>> Repeat with all zero covariances:
>> 
>>> meansDiag <- replicate(100, {Xbig <- rtmvt(n=100000, meann, covDiag, df, lower, upper)
>> + colMeans(Xbig)} )
>>> describe(meansDiag[4,])
>> meansDiag[4, ]
>>       n  missing distinct     Info     Mean      Gmd      .05      .10      .25
>>     100        0      100        1    35.23  0.02074    35.21    35.21    35.22
>>     .50      .75      .90      .95
>>   35.23    35.25    35.26    35.26
>> 
>> lowest : 35.18360 35.19756 35.20098 35.20179 35.20622
>> highest: 35.26367 35.26635 35.26791 35.27251 35.27302
>> 
>> So failing to account for the covariances in your theoretical calculations mostly explains the apparent discrepancy, although your value of 35.31 would be at the  far end of a statistical distribution and I wonder about some sort of error in your theoretical calculation, which didn't appear to take into account the covariance matrix.
>> 
>> Best;
>> David.
>> 
>> 
>> 
>>> 
>>> Best;
>>> David.
>>> 
>>> 
>>>> On 9 May 2017 at 22:17, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>> 
>>>>>> On May 9, 2017, at 1:11 PM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>>>> 
>>>>>> Of course I have expected the difference between theory and a sample
>>>>>> of realizations of RV's and result mean should still be a random
>>>>>> variable. But, for example for 4th element of mean vector: 35.31 -
>>>>>> 34.69571 = 0.61429. It is quite big difference, nieprawdaż? I have
>>>>>> expected that the difference would be smaller because of law of large
>>>>>> numbers (for 10mln samples the difference is quite similar).
>>>>> 
>>>>> I for one have no idea what is meant by a "4th element of mean vector". So I have now idea what to consider "big". I have found that my intuitions about multivariate distributions, especially those where the covariate structure is as complex as you have assumed, are often far from simulated results.
>>>>> 
>>>>> I suggest you post some code and results.
>>>>> 
>>>>> --
>>>>> David.
>>>>> 
>>>>> 
>>>>>> 
>>>>>> On 9 May 2017 at 21:40, David Winsemius <dwinsemius at comcast.net> wrote:
>>>>>>> 
>>>>>>>> On May 9, 2017, at 10:09 AM, Czarek Kowalski <czarek230800 at gmail.com> wrote:
>>>>>>>> 
>>>>>>>> Dear Members,
>>>>>>>> I am working with 6-dimensional Student-t distribution with 4 degrees
>>>>>>>> of freedom truncated to [20; 60]. I have generated 100 000 samples
>>>>>>>> from truncated multivariate Student-t distribution using rtmvt
>>>>>>>> function from package ‘tmvtnorm’. I have also calculated  mean vector
>>>>>>>> using equation (3) from attached pdf. The problem is, that after
>>>>>>>> summing all elements in one column of rtmvt result (and dividing by
>>>>>>>> 100 000) I do not receive the same result as using (3) equation. Could
>>>>>>>> You tell me, what is incorrect, why there is a difference?
>>>>>>> 
>>>>>>> I guess the question is why you would NOT expect a difference between theory and a sample of realizations of RV's? The result mean should still be a random variable, night wahr?
>>>>>>> 
>>>>>>> 
>>>>>>>> Yours faithfully
>>>>>>>> Czarek Kowalski
>>>>>>>> <truncatedT.pdf>______________________________________________
>>>>>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>> 
>>>>>>> David Winsemius
>>>>>>> Alameda, CA, USA
>>>>>>> 
>>>>> 
>>>>> David Winsemius
>>>>> Alameda, CA, USA
>>>>> 
>>> 
>>> David Winsemius
>>> Alameda, CA, USA
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> David Winsemius
>> Alameda, CA, USA
>> 

David Winsemius
Alameda, CA, USA