[Rd] Discourage the weights= option of lm with summarized data

Mon Oct 9 07:58:13 CEST 2017

Yes.  Thank you; I should have quoted it.
I suggest removing this text or adding the word "not" at the beginning.

   Arie

On Sun, Oct 8, 2017 at 4:38 PM, Viechtbauer Wolfgang (SP)
<wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
> Ah, I think you are referring to this part from ?lm:
>
> "(including the case that there are w_i observations equal to y_i and the data have been summarized)"
>
> I see; indeed, I don't think this is what 'weights' should be used for (the other part before that is correct). Sorry, I misunderstood the point you were trying to make.
>
> Best,
> Wolfgang
>
> -----Original Message-----
> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Arie ten Cate
> Sent: Sunday, 08 October, 2017 14:55
> To: r-devel at r-project.org
> Subject: [Rd] Discourage the weights= option of lm with summarized data
>
> Indeed: Using 'weights' is not meant to indicate that the same
> observation is repeated 'n' times.  As I showed, this gives erroneous
> results. Hence I suggested that this use be discouraged rather than
> encouraged in the Details section of lm in the Reference manual.
>
>    Arie
>
> -----Original Message-----
> On Sat, 7 Oct 2017, wolfgang.viechtbauer at maastrichtuniversity.nl wrote:
>
> Using 'weights' is not meant to indicate that the same observation is
> repeated 'n' times. It is meant to indicate different variances (or to
> be precise, that the variance of the last observation in 'x' is
> sigma^2 / n, while the first three observations have variance
> sigma^2).
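
One situation in which an observation has variance sigma^2 / n is when it is the mean of n independent raw observations. The sketch below assumes exactly that; the simulated data and the names n_i, x_raw, y_raw and y_mean are made up for illustration and do not come from the thread. Under this assumption, fitting the group means with weights equal to the group sizes should reproduce the coefficients of the fit to the raw data.

```r
set.seed(1)

# Hypothetical design: three single observations and one group of 10.
n_i   <- c(1, 1, 1, 10)
x     <- c(1, 2, 3, 4)
x_raw <- rep(x, times = n_i)

# Simulated raw responses (assumed model, for illustration only).
y_raw <- 1 + 0.5 * x_raw + rnorm(length(x_raw))

# One mean per distinct x value; a mean of n_i draws has variance sigma^2 / n_i.
y_mean <- as.vector(tapply(y_raw, x_raw, mean))

fit_raw  <- lm(y_raw ~ x_raw)
fit_mean <- lm(y_mean ~ x, weights = n_i)

# The coefficient estimates of the two fits are compared here.
coef_equal <- all.equal(unname(coef(fit_raw)), unname(coef(fit_mean)))
print(coef_equal)
```

Only the coefficients are compared here; the standard errors of the two fits are a separate matter and are the subject of the rest of this thread.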
>
> Best,
> Wolfgang
>
> -----Original Message-----
> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Arie ten Cate
> Sent: Saturday, 07 October, 2017 9:36
> To: r-devel at r-project.org
> Subject: [Rd] Discourage the weights= option of lm with summarized data
>
> In the Details section of lm (linear models) in the Reference manual,
> it is suggested to use the weights= option for summarized data. This
> must be discouraged rather than encouraged. The motivation for this is
> as follows.
>
> With summarized data the standard errors get smaller with increasing
> numbers of observations. However, the standard errors in lm do not get
> smaller when, for instance, all weights are multiplied by the same
> constant larger than one, since the inverse weights are merely
> proportional to the error variances.
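
A minimal sketch of this scale invariance, reusing the toy data from the example further down in this message. The factor 100 and the names se_w and se_scaled are arbitrary choices for illustration.

```r
# Toy data from the example in this message.
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, 4)
w <- c(1, 1, 1, 10)

# Standard errors are one column of the coefficient table.
se_w      <- coef(summary(lm(y ~ x, weights = w)))[, "Std. Error"]
se_scaled <- coef(summary(lm(y ~ x, weights = 100 * w)))[, "Std. Error"]

# Multiplying every weight by the same constant should leave the
# standard errors unchanged.
se_unchanged <- all.equal(se_w, se_scaled)
print(se_unchanged)
```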
>
> Here is an example of the estimated standard errors being too large
> with the weights= option. The p value and the number of degrees of
> freedom are also wrong. The parameter estimates are correct.
>
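>   n  <- 10
>   x  <- c(1, 2, 3, 4)
>   y  <- c(1, 2, 5, 4)
>   w  <- c(1, 1, 1, n)             # last row stands for n identical observations
>   xb <- c(x, rep(x[4], n - 1))    # restore the original data
>   yb <- c(y, rep(y[4], n - 1))
>   print(summary(lm(yb ~ xb)))              # original data
>   print(summary(lm(y ~ x, weights = w)))   # summarized data with weights=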
>
> Compare with PROC REG in SAS, which has both a WEIGHT statement (like
> weights= in R) and a FREQ statement (for summarized data).
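
lm has no frequency argument, but the frequency interpretation can be sketched by hand. The code below is an illustration under the assumption that the weights are integer counts of identical observations; the names f, rss, df_freq and se_freq are hypothetical. It relies on two facts: the weighted residual sum of squares equals the residual sum of squares of the expanded data, and the unscaled covariance matrix is the same for both fits, so only the residual degrees of freedom need to change.

```r
x <- c(1, 2, 3, 4)
y <- c(1, 2, 5, 4)
f <- c(1, 1, 1, 10)   # frequencies: last row stands for 10 identical observations

fit_w <- lm(y ~ x, weights = f)

# Weighted residual sum of squares; residuals() returns unweighted residuals.
rss <- sum(f * residuals(fit_w)^2)

# Degrees of freedom based on the total number of observations,
# not on the number of summarized rows.
df_freq <- sum(f) - length(coef(fit_w))

# Rescale the unscaled covariance matrix with the frequency-based variance.
se_freq <- sqrt(diag(summary(fit_w)$cov.unscaled) * rss / df_freq)

# Check against the fit to the expanded data.
idx      <- rep(seq_along(f), times = f)
fit_full <- lm(y[idx] ~ x[idx])
se_full  <- coef(summary(fit_full))[, "Std. Error"]

se_match <- all.equal(unname(se_freq), unname(se_full))
print(se_match)
```

The t statistics and p values would need the same adjustment, using df_freq as the degrees of freedom.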
>
>     Arie