[R] Physical or Statistical Explanation for the "Funnel" Plot?

Thomas Lumley tlumley at u.washington.edu
Fri Mar 27 08:55:38 CET 2009

```On Thu, 26 Mar 2009, Jason Rupert wrote:

>
> The R code below produces (after running for a few minutes on a decent computer) the plot shown at the following location:
>
> http://n2.nabble.com/Is-there-a-physical-and-quantitative-explanation-for-this-plot--td2542321.html
>
> I'm just taking the mean of a given set of random variables, where the set size
> is increased.  There appears to be a quick convergence and then a pretty steady
> variance out to a set size of 10,0000.

Part of the convergence is just that the standard deviation of a mean of N observations is proportional to 1/sqrt(N). In your case the distributions are all exactly Normal; the same convergence would occur with other distributions, but you would also see the shape change from left to right as the distribution of the mean converged to Normal.

There are also some plotting artifacts due to the size of the points.  The apparent stabilization at large N (and the wide vertical bar at zero that Marc Schwartz commented on) are due partly to the slow convergence of 1/sqrt(N) but largely to the fact that the plotted width can't be smaller than the width of a point.

When I draw funnel plots like this for whole-genome association data I use the 'hexbin' package, which doesn't have these artifacts, is much faster, and produces smaller graphics files.

-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

```
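The 1/sqrt(N) scaling described in the reply can be checked with a small simulation. What follows is only a sketch under stated assumptions, not the code from the original question: the set sizes, the number of replicates, and all variable names (`n_sizes`, `n_reps`, `result`) are made up for illustration. The second part assumes the 'hexbin' package is installed and is skipped otherwise; it draws each mean directly from its exact Normal(0, 1/N) distribution rather than averaging N draws, which is a shortcut that only works because the data are exactly Normal.

```r
# Sketch: compare the simulated sd of a mean of N standard Normal
# observations with the theoretical value 1/sqrt(N).
set.seed(1)
n_sizes <- c(10, 100, 1000, 10000)  # hypothetical set sizes
n_reps  <- 2000                     # hypothetical number of means per size

sim_sd <- sapply(n_sizes, function(n) {
  means <- replicate(n_reps, mean(rnorm(n)))  # n_reps independent means
  sd(means)                                   # spread of those means
})

result <- data.frame(N = n_sizes,
                     simulated_sd = sim_sd,
                     theory = 1 / sqrt(n_sizes))
print(result)

# Optional funnel plot with hexagonal binning, which counts points per
# cell instead of overplotting fixed-size symbols.
if (requireNamespace("hexbin", quietly = TRUE)) {
  library(hexbin)
  n <- sample(1:10000, 20000, replace = TRUE)  # random set sizes
  # For exactly Normal data the mean of n draws is Normal(0, 1/n),
  # so it can be drawn directly.
  m <- rnorm(length(n), mean = 0, sd = 1 / sqrt(n))
  plot(hexbin(n, m, xbins = 60), xlab = "Set size N", ylab = "Mean")
}
```

The `simulated_sd` and `theory` columns should agree to within simulation error, and the binned plot should keep narrowing at large N instead of levelling off at the width of a plotting symbol.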