[R] lm fails on some large input

Berry, Charles ccberry at ucsd.edu
Thu Apr 18 18:39:38 CEST 2019



> On Apr 18, 2019, at 8:24 AM, Michael Dewey <lists using dewey.myzen.co.uk> wrote:
> 
> Perhaps subtract 1506705766 from y?

Good advice. Some further notes follow.
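
Michael's suggestion can be sketched as follows, using the data from the original post. The names `y0` and `m_shift` are made up here for illustration.

```r
# Data copied from the original post.
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Shift y so its values are small offsets rather than numbers near 1.5e9.
y0 <- y - 1506705766

# Fit with the default tol; the shifted predictor is no longer
# numerically indistinguishable from the intercept column.
m_shift <- lm(x ~ y0)
coef(m_shift)
```

The shift leaves the slope unchanged; only the intercept changes meaning, since it now refers to x at y = 1506705766.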

One can also set `tol` to a value smaller than the default, e.g.

  m2 <- lm(x ~ y, tol=1e-12)

which is accurate:

  plot(y,x)
  abline(coef=coef(m2))
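
As a numerical complement to the plot, one could compare the slope from the small-`tol` fit with the slope obtained after shifting y. This is only a sketch: the names `m_shift`, `slope_tol` and `slope_shift`, and the loose comparison tolerance of 1e-4, are choices made here for illustration.

```r
# Data copied from the original post.
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Fit on the raw y with a tighter tolerance, as above.
m2 <- lm(x ~ y, tol = 1e-12)

# Fit on the shifted y with the default tolerance.
m_shift <- lm(x ~ I(y - 1506705766))

slope_tol   <- coef(m2)[["y"]]
slope_shift <- coef(m_shift)[[2]]

# The two slopes should agree up to rounding error.
all.equal(slope_tol, slope_shift, tolerance = 1e-4)
```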
 

Users of numerical procedures need to be mindful of the default settings of the algorithms they use.

As is well known, a default convergence tolerance that is too loose can lead an optimization algorithm to seriously wrong results. One example is described here:

https://science.sciencemag.org/content/296/5575/1945/tab-pdf

One might quibble with the choice of tol=1e-7 (the default in lm.fit), since 64-bit floating point supports much smaller values. However, fitting highly collinear variables usually raises statistical issues of its own.

So, `tol = 1e-07` seems more like a feature than a bug.
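
A rough sketch of why the default rejects y here. It assumes the rule described in the comments of the LINPACK-derived `dqrdc2` routine behind lm.fit: a column counts as negligible once its norm, after the earlier columns have been swept out, falls below `tol` times its original norm. The name `ratio` is made up for illustration.

```r
# y copied from the original post.
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Norm of y after removing its projection on the intercept column
# (i.e. after centering), relative to the norm of the raw y.
ratio <- sqrt(sum((y - mean(y))^2)) / sqrt(sum(y^2))
ratio

ratio < 1e-7    # compared against the default tol
ratio > 1e-12   # compared against the smaller tol used for m2
```

If the ratio falls between the two tolerances, that is consistent with y being dropped at the default and kept at tol=1e-12.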

HTH,

Chuck

> 
> Saying some other software does it well implies you know what the _correct_ answer is here but I would question what that means with this sort of data-set.
> 
> On 17/04/2019 07:26, Dingyuan Wang wrote:
>> Hi,
>> This input doesn't have any interesting properties except y is unix time. Spreadsheets can do this well.
>> Is this a bug that lm can't do x ~ y?
>> R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
>> Copyright (C) 2018 The R Foundation for Statistical Computing
>> Platform: x86_64-pc-linux-gnu (64-bit)
>> > x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001, 101.632, 108.928, 94.08)
>> > y = c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873, 1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307, 1506705747.372)
>> > m = lm(x ~ y)
>> > summary(m)
>> Call:
>> lm(formula = x ~ y)
>> Residuals:
>>      Min       1Q   Median       3Q      Max
>> -27.0222 -14.9902  -0.6542  14.1938  29.1698
>> Coefficients: (1 not defined because of singularities)
>>             Estimate Std. Error t value Pr(>|t|)
>> (Intercept)   94.734      6.511   14.55 4.88e-07 ***
>> y                 NA         NA      NA       NA
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> Residual standard error: 19.53 on 8 degrees of freedom
>> > summary(lm(y ~ x))
>> Call:
>> lm(formula = y ~ x)
>> Residuals:
>>     Min      1Q  Median      3Q     Max
>> -2.1687 -1.3345 -0.9466  1.3826  2.6551
>> Coefficients:
>>              Estimate Std. Error   t value Pr(>|t|)
>> (Intercept) 1.507e+09  3.294e+00 4.574e+08  < 2e-16 ***
>> x           6.136e-01  3.413e-02 1.798e+01 4.07e-07 ***
>> ---
>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>> Residual standard error: 1.885 on 7 degrees of freedom
>> Multiple R-squared:  0.9788,    Adjusted R-squared:  0.9758
>> F-statistic: 323.3 on 1 and 7 DF,  p-value: 4.068e-07
> 
> -- 
> Michael
> http://www.dewey.myzen.co.uk/home.html
> 


