[R] population variance and sample variance

Peng Yu pengyu.ut at gmail.com
Thu Feb 4 19:07:22 CET 2010


On Thu, Feb 4, 2010 at 11:58 AM, Greg Snow <Greg.Snow at imail.org> wrote:
> Probably not a typo, but a different textbook used originally.  Statistics is still a relatively young science, so we have not settled on a single set of notation/symbols/jargon yet (look at intro textbooks: is p the population proportion (with p-hat the sample), or is p the sample proportion (with pi as the population)?).
>
> I originally learned that dividing by n gives the 'population' variance since if you have the entire population then mu is known exactly and you do not need to correct for unknown mu.  You should only divide by n when you have the entire population.  When you have a sample you need to divide by n-1 to adjust for using the sample mean.
>
> So from that I learned: population-divide by n; sample-divide by n-1.
>
> But I have seen others take the approach that dividing a sample sum of squares by n gives the variance of the sample data, while dividing by n-1 gives the estimate of the population variance.
>
> So from that thinking: population-divide by n-1; sample-divide by n.
>
> Both make sense, so to be clear it is best to just state the divisor rather than using terms like population and sample and expecting to be unambiguous.
>
> I have also seen them referred to as unbiased (n-1) and maximum likelihood (n), but these are not perfect descriptors once you start talking about standard deviations rather than variances.
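
The point about standard deviations can be checked with a small
simulation. This is only a rough sketch under assumptions that are not
in the thread: normal data with true sigma = 1, a sample size of 4, and
an arbitrary number of replications. The row names div_n_minus_1 and
div_n are made up for illustration.

```r
# Simulation sketch: compare the two divisors as estimators of the
# variance and, after taking square roots, of the standard deviation.
set.seed(1)
n    <- 4       # sample size (arbitrary choice for illustration)
reps <- 20000   # number of simulated samples (arbitrary)

sims <- replicate(reps, {
  x  <- rnorm(n)                 # true variance = 1, true sigma = 1
  ss <- sum((x - mean(x))^2)     # sum of squared deviations
  c(div_n_minus_1 = ss / (n - 1), div_n = ss / n)
})

rowMeans(sims)        # average variance estimate for each divisor
rowMeans(sqrt(sims))  # average standard deviation estimate for each divisor
```

The n-1 divisor should average close to 1 for the variance, but its
square root should average below 1, so "unbiased" no longer describes
it once you move to standard deviations.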


I'm surprised that even this basic definition does not have a unique
name in the nomenclature, which might cause confusion in certain
contexts. Just a thought: if both definitions are OK, then the wiki
page might be revised:
http://en.wikipedia.org/wiki/Variance#Population_variance_and_sample_variance.
After all, many people who are not pure statisticians rely on the wiki
for easy access to some simple terms.
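
Following the advice above to state the divisor rather than rely on the
words "population" and "sample", the two versions can be written side
by side in R. This is a minimal sketch: var_n is a hypothetical helper
name, not a function that ships with R, and the data are the same
simulated values used in the quoted example below.

```r
# Hypothetical helper (not part of base R): variance with divisor n
var_n <- function(x) {
  n <- length(x)
  sum((x - mean(x))^2) / n
}

set.seed(0)
x <- rnorm(4)

var(x)     # base R var() uses divisor n - 1
var_n(x)   # divisor n

# The two differ only by the factor (n - 1) / n
all.equal(var_n(x), (length(x) - 1) / length(x) * var(x))
```

Naming the function after its divisor avoids having to decide which of
the two conventions "population variance" refers to.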


>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Ista Zahn
>> Sent: Tuesday, February 02, 2010 12:03 PM
>> To: Peng Yu
>> Cc: r-help at stat.math.ethz.ch
>> Subject: Re: [R] population variance and sample variance
>>
>> Probably a simple typo, but just to keep things straight: you want to
>> divide by n when describing the standard deviation of a sample, and
>> divide by n-1 when estimating a population standard deviation (your
>> initial description had it backwards I think).
>>
>> On Tue, Feb 2, 2010 at 5:25 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> > On Mon, Oct 19, 2009 at 12:53 PM, Kingsford Jones
>> > <kingsfordjones at gmail.com> wrote:
>> >>> sum((x-mean(x))^2)/(n)
>> >> [1] 0.4894708
>> >>> ((n-1)/n) * var(x)
>> >> [1] 0.4894708
>> >
>> > But there is no built-in function in R to do this, right?
>> >
>> >> hth,
>> >> Kingsford
>> >>
>> >> On Mon, Oct 19, 2009 at 9:30 AM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> >>> It seems that var() computes the sample variance. It is
>> >>> straightforward to compute the population variance from the
>> >>> sample variance. However, I feel that it would still be
>> >>> convenient to have a function that computes the population
>> >>> variance. Is there a population variance function available
>> >>> in R?
>> >>>
>> >>> $ Rscript var.R
>> >>>> set.seed(0)
>> >>>> n = 4
>> >>>> x = rnorm(n)
>> >>>> var(x)
>> >>> [1] 0.6526278
>> >>>> sum((x-mean(x))^2)/(n-1)
>> >>> [1] 0.6526278
>> >>>>
>> >>>
>> >>> ______________________________________________
>> >>> R-help at r-project.org mailing list
>> >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >>> and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Ista Zahn
>> Graduate student
>> University of Rochester
>> Department of Clinical and Social Psychology
>> http://yourpsyche.org
>>
>


