[R] accuracy of pt for x close to 0
Charles C. Berry
cberry at tajo.ucsd.edu
Wed Oct 3 21:06:55 CEST 2007
On Wed, 3 Oct 2007, skylab gupta wrote:
> Hello,
>
> I have been playing around with the statistical distributions in R, and
> overall I think the accuracy is very good. However, it seems that for the
> Student's t distribution, the CDF loses accuracy when evaluated at values
> close to zero. For instance, I did the following in R
>
> ----------------------------------
> df<-seq(1,100,by=1)
> z<-c(0.50003183098839799,0.50003535533895194,0.50003675525997071,
> 0.50003749999985481,0.50003796066840744,0.50003827327749706,
> 0.50003849914427922,0.50003866990364754,0.50003880349244212 ,
> 0.50003891083995444,0.50003899897813187,0.50003907263208447,
> 0.50003913510150055,0.50003918874627440,0.50003923531785055,
> 0.50003927612461441,0.50003931217478748,0.50003934425324170,
> 0.50003937297989520,0.50003939886014204, 0.50003942229165621,
> 0.50003944360703978,0.50003946308016112,0.50003948094039441,
> 0.50003949738053710,0.50003951256485324,0.50003952663295181,
> 0.50003953969680248,0.50003955185925653,0.50003956322006460,
> 0.50003957385523301,0.50003958382054481 ,0.50003959318443636,
> 0.50003960200394315,0.50003961032679112,0.50003961818144815,
> 0.50003962562026172,0.50003963266089213,0.50003963934773465,
> 0.50003964569404735,0.50003965173577758,0.50003965749688895,
> 0.50003966298323521, 0.50003966823056478,0.50003967322766096,
> 0.50003967801868676,0.50003968260005904,0.50003968700228751,
> 0.50003969121916547,0.50003969526955183,0.500039699153400
> 63,0.50003970290428668,0.50003970650705731,0.50003970997149927 ,
> 0.50003971332909936,0.50003971654204993,0.50003971964040972,
> 0.50003972264367180,0.50003972553808163,0.50003972835715427,
> 0.50003973106835642,0.50003973370765664,0.50003973624942966,
> 0.50003973868896101,0.50003974107556448, 0.50003974338818691,
> 0.50003974563557085,0.50003974781567961,0.50003974993203681,
> 0.50003975199594708,0.50003975399737965,0.50003975593675354,
> 0.50003975782715593,0.50003975966389691,0.50003976145762119,
> 0.50003976321975063,0.50003976489560775 ,0.50003976655049909,
> 0.50003976818673812,0.50003976975798736,0.50003977127434285,
> 0.50003977277055756,0.50003977423495483,0.50003977566285773,
> 0.50003977705769798,0.50003977841313474,0.50003977975147973,
> 0.50003978102874791, 0.50003978230822732,0.50003978356836509,
> 0.50003978477872879,0.50003978596096421,0.50003978713049724,
> 0.50003978827577344,0.50003978935715154,0.50003979045422919,
> 0.50003979153680134,0.50003979256756137,0.50003979358957851,
> 0.50003979462027492 )
> y<-pt(1e-4,df)
> plot(df,(z-y)/z, type="s")
> ----------------------------------
>
> Basically, I evaluate the CDF of the Student's t distribution at 1e-4, for
> integer degrees of freedom from 1 to 100; the variable y contains the values
> computed by R, and the variable z contains the values obtained by
> Mathematica (with about 17 digits precision). The plot of relative accuracy
> shows that we get only about 10 digits of accuracy with R. It seems there is
> something is going wrong in the computations. Is this correct, or am I
> missing something?
How did you verify the accuracy of the vector z ?
The differences in z are not monotone.
Your worst case is df=88
Looks like pt() is better there , but both have error on the order of
1e-11:
> integrate( function(x) dt(x,88), 0, 1e-4 )
3.978106e-05 with absolute error < 4.4e-19
> z[88] -0.5 - integrate( function(x) dt(x,88), 0, 1e-4 )$value
[1] -2.661810e-11
> pt( 1e-04, 88 ) - 0.5 - integrate( function(x) dt(x,88),0,1e-4)$value
[1] 1.224781e-11
HTH,
Chuck
>
> I am using a precompiled binary version of R on windows xp. Following
> information might be helpful as well
>
> ----------------------------
>> version
> _
> platform i386-pc-mingw32
> arch i386
> os mingw32
> system i386, mingw32
> status
> major 2
> minor 5.1
> year 2007
> month 06
> day 27
> svn rev 42083
> language R
> version.string R version 2.5.1 (2007-06-27)
>
>> sessionInfo()
> R version 2.5.1 (2007-06-27)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
> "base"
>>
> ----------------------------
>
> Any help is appreciated.
>
> Thanks.
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
More information about the R-help
mailing list