[R] Doubt about Student t distribution simulation

John Fox jfox at mcmaster.ca
Fri Aug 4 23:44:29 CEST 2006


Dear Sundar,

Try qq.plot(t, dist="t", df=n-1) from the car package, which include a
95-percent point-wise confidence envelope that helps you judge how extreme
the outliers are relative to expectations.

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: Sundar Dorai-Raj [mailto:sundar.dorai-raj at pdf.com] 
> Sent: Friday, August 04, 2006 3:27 PM
> To: John Fox
> Cc: joseclaudio.faria at terra.com.br; R-help at stat.math.ethz.ch
> Subject: Re: [R] Doubt about Student t distribution simulation
> 
> Hi, Jose/John,
> 
> Here's an example to help Jose and highlights John's advice. 
> Also includes set.seed which should be included in all 
> simulations posted to R-help.
> 
> set.seed(42)
> mu <- 10
> sigma <-  5
> n <- 3
> nsim <- 10000
> m <- matrix(rnorm(n * nsim, mu, sigma), nsim, n) t <- 
> apply(m, 1, function(x) (mean(x) - mu)/(sd(x)/sqrt(n)))
> 
> library(lattice)
> qqmath(t, distribution = function(x) qt(x, n - 1),
>         panel = function(x, ...) {
>           panel.qqmath(x, col = "darkblue", ...)
>           panel.qqmathline(x, col = "darkred", ...)
>         })
> 
> 
> With n = 3, expect a few outliers.
> 
> --sundar
> 
> 
> John Fox wrote:
> > Dear Jose,
> > 
> > The problem is that you're using the population standard deviation 
> > (sigma) rather than the sample SD of each sample [i.e., t[i]  = 
> > (mean(amo.i) - mu) /
> > (sd(amo.i) / sqrt(n)) ], so your values should be normally 
> > distributed, as they appear to be.
> > 
> > A couple of smaller points: (1) Even after this correction, you're 
> > sampling from a discrete population (albeit with 
> replacement) and so 
> > the values won't be exactly t-distributed. You could draw 
> the samples 
> > directly from N(mu,
> > sigma) instead. (2) It would be preferable to make a 
> > quantile-comparison plot against the t-distribution, since 
> you'd get a 
> > better picture of what's going on in the tails.
> > 
> > I hope this helps,
> >  John
> > 
> > --------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> > 
> > 
> >>-----Original Message-----
> >>From: r-help-bounces at stat.math.ethz.ch 
> >>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jose Claudio 
> >>Faria
> >>Sent: Friday, August 04, 2006 3:09 PM
> >>To: R-help at stat.math.ethz.ch
> >>Subject: [R] Doubt about Student t distribution simulation
> >>
> >>Dear R list,
> >>
> >>I would like to illustrate the origin of the Student t distribution 
> >>using R.
> >>
> >>So, if (sample.mean - pop.mean) / standard.error(sample.mean) has t 
> >>distribution with (sample.size - 1) degree free, what is wrong with 
> >>the simulation below? I think that the theoretical curve 
> should agree 
> >>with the relative frequencies of the t values calculated:
> >>
> >>#== begin options=====
> >># parameters
> >>   mu    = 10
> >>   sigma =  5
> >>
> >># size of sample
> >>   n = 3
> >>
> >># repetitions
> >>   nsim = 10000
> >>
> >># histogram parameter
> >>   nchist = 150
> >>#== end options=======
> >>
> >>t   = numeric()
> >>pop = rnorm(10000, mean = mu, sd = sigma)
> >>
> >>for (i in 1:nsim) {
> >>   amo.i = sample(pop, n, replace = TRUE)
> >>   t[i]  = (mean(amo.i) - mu) / (sigma / sqrt(n)) }
> >>
> >>win.graph(w = 5, h = 7)
> >>split.screen(c(2,1))
> >>screen(1)
> >>hist(t,
> >>      main     = "histogram",
> >>      breaks   = nchist,
> >>      col      = "lightgray",
> >>      xlab     = '', ylab = "Fi",
> >>      font.lab = 2, font = 2)
> >>
> >>screen(2)
> >>hist(t,
> >>      probability = T,
> >>      main        = 'f.d.p and histogram',
> >>      breaks      = nchist,
> >>      col         = 'lightgray',
> >>      xlab        = 't', ylab = 'f(t)',
> >>      font.lab    = 2, font = 2)
> >>
> >>x = t
> >>curve(dt(x, df = n-1), add = T, col = "red", lwd = 2)
> >>
> >>Many thanks for any help,
> >>___
> >>Jose Claudio Faria
> >>Brasil/Bahia/Ilheus/UESC/DCET
> >>Estatística Experimental/Prof. Adjunto
> >>mails: joseclaudio.faria at terra.com.br
> >>        jc_faria at uesc.br
> >>        jc_faria at uol.com.br
> >>
> >>______________________________________________
> >>R-help at stat.math.ethz.ch mailing list
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide
> >>http://www.R-project.org/posting-guide.html
> >>and provide commented, minimal, self-contained, reproducible code.
> > 
> > 
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide 
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list