[R] Using R to illustrate the Central Limit Theorem

Fri May 13 11:59:58 CEST 2005

Interesting thread. The graphics are great, the only thing that might be
worth doing for teaching purposes would be to illustrate the original
distribution that is being averaged 1000 times.

Below is one option based on Bill Venables code.  Note that to do this I
had to start with a k of 2.

N <- 10000
 for(k in 2:20) {
    graphics.off()
    par(mfrow = c(2,2), pty = "s")
    hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 1")
    hist(((runif(k))-0.5)*sqrt(12*k),main="Example Distribution 2")
    m <- replicate(N, (mean(runif(k))-0.5)*sqrt(12*k))
    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
            prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
    pu <- par("usr")[1:2]
    x <- seq(pu[1], pu[2], len = 500)
    lines(x, dnorm(x), col = "red")
    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
    abline(0, 1, col = "red")
    Sys.sleep(3)
  }

By the way, I should probably know this but what is the logic of the
"sqrt(12*k)" part of the example?  Obviously as k increases the mean
will approach .5 in a uniform distribution, so runif(k)-.5 will be close
to zero, and sqrt(12*k) increases as k increases.  Why 12, though?

PB

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Kevin E. Thorpe
Sent: Friday, May 13, 2005 12:03 AM
To: Bill.Venables at csiro.au
Cc: ted.harding at nessie.mcc.ac.uk; r-help at stat.math.ethz.ch
Subject: Re: [R] Using R to illustrate the Central Limit Theorem

This thread was very timely for me since I will be teaching an
introductory
biostats course in the fall, including a session on CLT. I have
shamelessly
taken Dr. Venables' slick piece of code and wrapped it in a function so
that
I can vary the maximum sample size at will. I then created functions
based
on the first to sample from the exponential and the
chi-squaredistributions.
Lastly, I created a function to sample from a pdf with a parabolic shape
(confirming in the process just how rusty my calculus is :-) ). My code
is
below for all to do with as they please.

Now, at the risk of asking a really obvious question ...

In my final function, I used the inverse probability integral
transformation
to sample from my parabolic distribution. I create a local function
rparab
shown here:

rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}

It seems that in my version of R (2.0.1) on Linux, that calculating the
cube
root of a negative number using ^(1/3) returns NaN. I looked at the help
in
the arithmetic operators and did help.search("cube root"), 
help.search("root")
and help.search("cube") and recognised no alternatives. So I used an 
ifelse() to
deal with the negatives. Have I missed something really elementary?

Thanks

clt.unif <- function(n=20) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(runif(N*k), N, k)) - 0.5)*sqrt(12*k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Uniform(0,1), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.exp <- function(n=20,theta=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rexp(N*k,1/theta), N, k)) - theta)/sqrt(theta^2/k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("exp(",theta,"), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.chisq <- function(n=20,nu=1) {
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rchisq(N*k,nu), N, k)) - nu)/sqrt(2*nu/k)
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("Chi-Square(",nu,"), n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

clt.parab <- function(n=20) {
rparab <- function(nn) {
u <- 2*runif(nn) - 1
ifelse(u<0,-(abs(u)^(1/3)),u^(1/3))
}
N <- 10000
graphics.off()
par(mfrow = c(1,2), pty = "s")
for(k in 1:n) {
m <- (rowMeans(matrix(rparab(N*k), N, k)))/sqrt(3/(5*k))
hist(m, breaks = "FD", xlim = c(-4,4),
main = paste("n = ",k,sep=""),
prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
pu <- par("usr")[1:2]
x <- seq(pu[1], pu[2], len = 500)
lines(x, dnorm(x), col = "red")
qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
abline(0, 1, col = "red")
Sys.sleep(1)
}
}

Bill.Venables at csiro.au wrote:

>Here's a bit of a refinement on Ted's first suggestion.
>
> 
> N <- 10000
> graphics.off()
> par(mfrow = c(1,2), pty = "s")
> for(k in 1:20) {
>    m <- (rowMeans(matrix(runif(M*k), N, k)) - 0.5)*sqrt(12*k)
>    hist(m, breaks = "FD", xlim = c(-4,4), main = k,
>            prob = TRUE, ylim = c(0,0.5), col = "lemonchiffon")
>    pu <- par("usr")[1:2]
>    x <- seq(pu[1], pu[2], len = 500)
>    lines(x, dnorm(x), col = "red")
>    qqnorm(m, ylim = c(-4,4), xlim = c(-4,4), pch = ".", col = "blue")
>    abline(0, 1, col = "red")
>    Sys.sleep(1)
>  }
>  
>

-- 
Kevin E. Thorpe
Biostatistician/Trialist, Knowledge Translation Program
Assistant Professor, Department of Public Health Sciences
Faculty of Medicine, University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.946.8081  Fax: 416.971.2462

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html