[R] population variance and sample variance
Gabor Grothendieck
ggrothendieck at gmail.com
Thu Feb 4 19:23:10 CET 2010
Checking VAR_SAMP and VAR_POP in the H2 and PostgreSQL databases and
VAR and VARP in Excel we find that in all three cases the sample
variance uses n-1. Here is an R example using H2 and sqldf:
> library(RH2)
> library(sqldf)
> DF <- data.frame(x = 1:3)
> sqldf("select VAR_SAMP(x), VAR_POP(x) from DF")
  VAR_SAMP..x.. VAR_POP..x..
1             1    0.6666667
> sum((DF$x - mean(DF$x))^2)/2
[1] 1
> var(DF$x)
[1] 1
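
Since var() in base R divides by n-1, a divide-by-n version has to be written by hand. Below is a minimal sketch of one way to do it; the name var_pop and its na.rm argument are made up for illustration here and are not part of base R.

```r
# Hypothetical helper (not in base R): variance with divisor n.
var_pop <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]      # optionally drop missing values
  if (length(x) < 1) return(NA_real_)
  mean((x - mean(x))^2)             # sum of squared deviations divided by n
}

x <- 1:3
n <- length(x)
var(x)                  # divisor n-1
var_pop(x)              # divisor n
((n - 1) / n) * var(x)  # same value as var_pop(x), rescaled from var()
```

For x = 1:3 this should agree with the VAR_SAMP and VAR_POP results from H2 above. Whichever label is used, stating the divisor explicitly avoids the ambiguity discussed in the quoted message below.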
On Thu, Feb 4, 2010 at 12:58 PM, Greg Snow <Greg.Snow at imail.org> wrote:
> Probably not a typo, but a different textbook used originally. Statistics is still a relatively young science, so we have not settled on a single set of notation/symbols/jargon yet (look at intro textbooks: is p the population proportion (with p-hat the sample), or is p the sample proportion (with pi as the population)?).
>
> I originally learned that dividing by n gives the 'population' variance since if you have the entire population then mu is known exactly and you do not need to correct for unknown mu. You should only divide by n when you have the entire population. When you have a sample you need to divide by n-1 to adjust for using the sample mean.
>
> So from that I learned: population-divide by n; sample-divide by n-1.
>
> But I have seen others take the approach that dividing a sample sum of squares by n gives the variance of the sample data, while dividing by n-1 gives the estimate of the population variance.
>
> So from that thinking: population-divide by n-1; sample-divide by n.
>
> Both make sense, so to be clear it is best to just state the divisor rather than using terms like population and sample and expecting them to be unambiguous.
>
> I have also seen them referred to as unbiased (n-1) and maximum likelihood (n), but these are not perfect descriptors once you start talking about standard deviations rather than variances.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ista Zahn
>> Sent: Tuesday, February 02, 2010 12:03 PM
>> To: Peng Yu
>> Cc: r-help at stat.math.ethz.ch
>> Subject: Re: [R] population variance and sample variance
>>
>> Probably a simple typo, but just to keep things straight: you want to
>> divide by n when describing the standard deviation of a sample, and
>> divide by n-1 when estimating a population standard deviation (your
>> initial description had it backwards I think).
>>
>> On Tue, Feb 2, 2010 at 5:25 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> > On Mon, Oct 19, 2009 at 12:53 PM, Kingsford Jones
>> > <kingsfordjones at gmail.com> wrote:
>> >>> sum((x-mean(x))^2)/(n)
>> >> [1] 0.4894708
>> >>> ((n-1)/n) * var(x)
>> >> [1] 0.4894708
>> >
>> > But there is no built-in function in R to do so, right?
>> >
>> >> hth,
>> >> Kingsford
>> >>
>> >> On Mon, Oct 19, 2009 at 9:30 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> >>> It seems that var() computes sample variance. It is straightforward
>> >>> to compute population variance from sample variance. However, I feel
>> >>> that it is still convenient to have a function that can compute
>> >>> population variance. Is there a population variance function available
>> >>> in R?
>> >>>
>> >>> $ Rscript var.R
>> >>>> set.seed(0)
>> >>>> n = 4
>> >>>> x = rnorm(n)
>> >>>> var(x)
>> >>> [1] 0.6526278
>> >>>> sum((x-mean(x))^2)/(n-1)
>> >>> [1] 0.6526278
>> >>>>
>> >>>
>> >>> ______________________________________________
>> >>> R-help at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>
>> >
>> >
>>
>>
>>
>> --
>> Ista Zahn
>> Graduate student
>> University of Rochester
>> Department of Clinical and Social Psychology
>> http://yourpsyche.org
>>
>
>