[R] Physical or Statistical Explanation for the "Funnel" Plot?
tlumley at u.washington.edu
Fri Mar 27 08:55:38 CET 2009
On Thu, 26 Mar 2009, Jason Rupert wrote:
> The R code below produces (after running for a few minutes on a decent computer) the plot shown at the following location:
> I'm just taking the mean of a given set of random variables, where the set size
> is increased. There appears to be a quick convergence and then a pretty steady
> variance out to a set size of 10,0000.
Part of the convergence is simply that the standard deviation of a mean of N observations is proportional to 1/sqrt(N). In your case the distributions are all exactly Normal; the same convergence would occur with other distributions, but you would also see the shape change from left to right as the distribution of the mean converged to Normal.
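A minimal sketch of the 1/sqrt(N) scaling, not the original poster's code: the set sizes, replicate count, and seed below are arbitrary choices made for illustration. It compares the empirical standard deviation of simulated means with the theoretical value for standard Normal data.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
sizes <- c(10, 100, 1000, 10000)  # hypothetical set sizes
reps  <- 2000                     # replicate means simulated per set size

# Empirical standard deviation of the mean of n standard Normal draws
emp_sd <- sapply(sizes, function(n) sd(replicate(reps, mean(rnorm(n)))))

# Theoretical value: sigma / sqrt(N), with sigma = 1 here
theo_sd <- 1 / sqrt(sizes)

print(data.frame(N = sizes, empirical = emp_sd, theoretical = theo_sd))
```

Each 100-fold increase in N shrinks the spread only 10-fold, which is why the funnel narrows quickly at first and then appears almost flat.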
There are also some plotting artifacts due to the size of the points. The apparent stabilization at large N (and the wide vertical bar at zero that Marc Schwartz commented on) are due partly to the slow convergence of 1/sqrt(N), but largely to the fact that the plotted width can't be smaller than the width of a point.
When I draw funnel plots like this for whole-genome association data I use the 'hexbin' package, which doesn't have these artifacts, is much faster, and produces smaller graphics files.
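A rough sketch of how a funnel plot like this might be drawn with 'hexbin', assuming the package is installed from CRAN. The data are simulated for illustration only: the set sizes and number of points are made up, and each mean is drawn directly from its exact Normal(0, 1/N) distribution rather than computed from N draws.

```r
library(hexbin)  # CRAN package; install.packages("hexbin") if needed

set.seed(1)                                    # arbitrary seed
n <- sample(1:10000, 50000, replace = TRUE)    # hypothetical set sizes
# The mean of n standard Normal draws is exactly Normal(0, 1/n),
# so it is simulated directly here instead of averaging n values.
m <- rnorm(length(n), mean = 0, sd = 1 / sqrt(n))

hb <- hexbin(n, m, xbins = 60)                 # count points per hexagonal cell
plot(hb, xlab = "Set size N", ylab = "Sample mean")
```

Because the plot shows one shaded hexagon per occupied cell instead of one symbol per observation, overplotting no longer sets a minimum visible width, and the output size depends on the number of cells rather than the number of points.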
Thomas Lumley
Assoc. Professor, Biostatistics
University of Washington, Seattle
tlumley at u.washington.edu