[R] Increasing number of observations worsens the regression model

Raffa raffamaiden at gmail.com
Sat May 25 14:38:07 CEST 2019


I have the following code:

```

rm(list=ls())
N = 30000
xvar <- runif(N, -10, 10)
e <- rnorm(N, mean=0, sd=1)
yvar <- 1 + 2*xvar + e
plot(xvar,yvar)
lmMod <- lm(yvar~xvar)
print(summary(lmMod))
domain <- seq(min(xvar), max(xvar))    # define a vector of x values to feed into model
lines(domain, predict(lmMod, newdata = data.frame(xvar=domain)))    # add regression line, using `predict` to generate y-values

```
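(For anyone trying to reproduce this: the same simulation with a fixed seed, which I have added here only so that re-runs give identical numbers; the seed value itself is arbitrary.)

```

set.seed(1)                       # arbitrary seed, added only for reproducibility
N <- 30000
xvar <- runif(N, -10, 10)
e <- rnorm(N, mean=0, sd=1)
yvar <- 1 + 2*xvar + e
lmMod <- lm(yvar~xvar)
coef(lmMod)                       # I expect this to be very close to c(1, 2)

```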

I expected the coefficients to be close to [1, 2]. Instead, R keeps giving me estimates that are not statistically significant and don't fit the model, even though I have tens of thousands of observations. For example:

```

Call:
lm(formula = yvar ~ xvar)

Residuals:
     Min      1Q  Median      3Q     Max
-21.384  -8.908   1.016  10.972  23.663

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0007145  0.0670316   0.011    0.991
xvar        0.0168271  0.0116420   1.445    0.148

Residual standard error: 11.61 on 29998 degrees of freedom
Multiple R-squared:  7.038e-05,    Adjusted R-squared: 3.705e-05
F-statistic: 2.112 on 1 and 29998 DF,  p-value: 0.1462

```
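As a rough sanity check on why I expected this (my own back-of-the-envelope calculation, using the usual formula for the slope's standard error under the simulation above):

```

N <- 30000
sd_x <- 20 / sqrt(12)             # sd of Uniform(-10, 10), roughly 5.77
se_slope <- 1 / (sd_x * sqrt(N))  # approximate SE of the slope when the noise sd is 1
se_slope                          # roughly 0.001, much smaller than the 0.0116 printed above

```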


The strange thing is that the code works perfectly for N = 200 or N = 2000. It is only for larger N that this happens (for example, N = 20000). I have already asked on Cross Validated 
<https://stats.stackexchange.com/questions/410050/increasing-number-of-observations-worsen-the-regression-model> 
but the code works fine for them. Any help?
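One more thing I can try on my side (just a sketch; writing to a temp file and calling Rscript is my own idea, and it assumes Rscript is on the PATH, which it normally is on Kubuntu):

```

## Run a minimal version of the simulation in a completely fresh R process,
## so nothing left over in the interactive session can interfere.
script <- tempfile(fileext = ".R")
writeLines(c(
  "N <- 30000",
  "xvar <- runif(N, -10, 10)",
  "yvar <- 1 + 2*xvar + rnorm(N, mean=0, sd=1)",
  "print(coef(lm(yvar ~ xvar)))"
), con = script)
system2("Rscript", script)        # prints the coefficients from a clean session

```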

I am running R 3.6.0 on Kubuntu 19.04.

Best regards

Raffaele




