[R] lm fails on some large input

William Dunlap wdunlap at tibco.com
Thu Apr 18 18:32:09 CEST 2019


This sort of data arises quite easily if you deal with times/dates around
now, because POSIXct values are seconds since 1970 (about 1.5e9 in 2017).  E.g.,

> d <- data.frame(
+     when = seq(as.POSIXct("2017-09-29 18:22:01"), by="secs", len=10),
+     measurement = log2(1:10))
> coef(lm(data=d, measurement ~ when))
       (Intercept)               when
2.1791061114716954                 NA
> as.numeric(d$when)[1:2]
[1] 1506734521 1506734522
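The NA comes from lm()'s rank check, not from a wrong computation: lm.fit() uses a pivoted QR decomposition with tol = 1e-7 and drops a column whose norm, after removing what the earlier columns (here the intercept) explain, falls below tol times its original norm. A minimal sketch on the example data above, using qr(), which by default uses the same LINPACK routine and tolerance. The ratio formula and the tol = 1e-12 value are illustration choices, not anything lm() itself reports.

```r
# Rebuild the example: ten times one second apart. POSIXct stores seconds
# since 1970-01-01, so every value is roughly 1.5e9.
d <- data.frame(
    when = seq(as.POSIXct("2017-09-29 18:22:01"), by = "secs", length.out = 10),
    measurement = log2(1:10))
X <- model.matrix(measurement ~ when, data = d)

# Norm of the 'when' column after removing its mean, relative to its
# original norm: roughly 2e-9, far below the default tol of 1e-7
tt <- X[, "when"]
ratio <- sqrt(sum((tt - mean(tt))^2)) / sqrt(sum(tt^2))
print(ratio)

# Default tol = 1e-7: the column is judged dependent on the intercept
# (rank 1), which is why lm() reports NA for its coefficient
rank_default <- qr(X)$rank
# A much smaller (arbitrary) tolerance keeps the column (rank 2)
rank_small_tol <- qr(X, tol = 1e-12)$rank
print(c(rank_default, rank_small_tol))
```

Loosening the tolerance is only a diagnostic here; shifting the times closer to zero is the more robust fix.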

If you subtract off a reference time there are problems with the time units
(seconds vs. hours), because -.POSIXt picks the units of the result from the data:

> coef(lm(data=d, measurement ~ I(when - min(when))))
        (Intercept) I(when - min(when))
0.68327571513124297 0.33240675474232279
> coef(lm(data=d, measurement ~ I(when - as.POSIXct("2017-09-29 00:00:00"))))
                                (Intercept) I(when - as.POSIXct("2017-09-29 00:00:00"))
                       -21978.3837546251634                          1196.6643170736229
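The unit switch can be seen directly: "-" on two POSIXct values calls difftime() with units = "auto", which picks "secs", "mins", "hours" or "days" from the smallest absolute difference. A minimal sketch on the same example data; the name midnight is a hypothetical label for the reference time used above.

```r
d <- data.frame(
    when = seq(as.POSIXct("2017-09-29 18:22:01"), by = "secs", length.out = 10),
    measurement = log2(1:10))
midnight <- as.POSIXct("2017-09-29 00:00:00")  # hypothetical reference time

# Smallest gap is 0 seconds, so "auto" chooses seconds
u1 <- units(d$when - min(d$when))
# Smallest gap is about 18.4 hours, so "auto" chooses hours
u2 <- units(d$when - midnight)
print(c(u1, u2))

# Both fits describe the same line, but the slopes are per second and
# per hour respectively, so they differ by a factor of 3600
b1 <- unname(coef(lm(measurement ~ I(when - min(when)), data = d))[2])
b2 <- unname(coef(lm(measurement ~ I(when - midnight), data = d))[2])
print(b2 / b1)
```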


Hence you have to use difftime() and specify the units explicitly:

> coef(lm(data=d, measurement ~ difftime(when, as.POSIXct("2017-09-29 00:00:00"), units="secs")))
                                                      (Intercept)
                                          -2.1978383754612696e+04
difftime(when, as.POSIXct("2017-09-29 00:00:00"), units = "secs")
                                           3.3240675474248449e-01
> coef(lm(data=d, measurement ~ difftime(when, min(when), units="secs")))
                              (Intercept) difftime(when, min(when), units = "secs")
                      0.68327571513124297                       0.33240675474232279
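A variant of the same idea is to convert the difftime to a plain number with as.numeric() once the units are fixed, so the predictor's scale is set by the call rather than by the data. A minimal sketch; the column names t_secs and t_hours are hypothetical. The per-second slope should match the 0.3324 shown above.

```r
d <- data.frame(
    when = seq(as.POSIXct("2017-09-29 18:22:01"), by = "secs", length.out = 10),
    measurement = log2(1:10))

# Explicit units: the predictor is in the units requested,
# whatever the size of the time differences
d$t_secs  <- as.numeric(difftime(d$when, min(d$when), units = "secs"))
d$t_hours <- as.numeric(difftime(d$when, min(d$when), units = "hours"))

fit_s <- lm(measurement ~ t_secs, data = d)   # slope per second
fit_h <- lm(measurement ~ t_hours, data = d)  # slope per hour
print(coef(fit_s))
print(coef(fit_h))
```

Both fits share the same origin, so the intercepts agree and only the slope is rescaled by 3600.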



Bill Dunlap
TIBCO Software
wdunlap at tibco.com


On Thu, Apr 18, 2019 at 8:24 AM Michael Dewey <lists at dewey.myzen.co.uk> wrote:

> Perhaps subtract 1506705766 from y?
>
> Saying some other software does it well implies you know what the
> _correct_ answer is here, but I would question what that means with this
> sort of data set.
>
> On 17/04/2019 07:26, Dingyuan Wang wrote:
> > Hi,
> >
> > This input doesn't have any interesting properties except that y is Unix
> > time. Spreadsheets can fit this well.
> > Is it a bug that lm can't fit x ~ y?
> >
> > R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
> > Copyright (C) 2018 The R Foundation for Statistical Computing
> > Platform: x86_64-pc-linux-gnu (64-bit)
> >
> >  > x = c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
> >  + 101.632, 108.928, 94.08)
> >  > y = c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
> >  + 1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
> >  + 1506705747.372)
> >  > m = lm(x ~ y)
> >  > summary(m)
> >
> > Call:
> > lm(formula = x ~ y)
> >
> > Residuals:
> >       Min       1Q   Median       3Q      Max
> > -27.0222 -14.9902  -0.6542  14.1938  29.1698
> >
> > Coefficients: (1 not defined because of singularities)
> >              Estimate Std. Error t value Pr(>|t|)
> > (Intercept)   94.734      6.511   14.55 4.88e-07 ***
> > y                 NA         NA      NA       NA
> > ---
> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > Residual standard error: 19.53 on 8 degrees of freedom
> >
> >  > summary(lm(y ~ x))
> >
> > Call:
> > lm(formula = y ~ x)
> >
> > Residuals:
> >      Min      1Q  Median      3Q     Max
> > -2.1687 -1.3345 -0.9466  1.3826  2.6551
> >
> > Coefficients:
> >               Estimate Std. Error   t value Pr(>|t|)
> > (Intercept) 1.507e+09  3.294e+00 4.574e+08  < 2e-16 ***
> > x           6.136e-01  3.413e-02 1.798e+01 4.07e-07 ***
> > ---
> > Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> >
> > Residual standard error: 1.885 on 7 degrees of freedom
> > Multiple R-squared:  0.9788,    Adjusted R-squared:  0.9758
> > F-statistic: 323.3 on 1 and 7 DF,  p-value: 4.068e-07
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> --
> Michael
> http://www.dewey.myzen.co.uk/home.html
>
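Michael Dewey's suggestion of subtracting a constant from y can be sketched with the data from the original post. Shifting or centering y changes only the intercept, not the slope, and removes the near-collinearity with the intercept column that makes lm() drop y. The cross-check at the end is an addition, not from the thread: with a single predictor, the slope of x ~ y times the slope of y ~ x equals R^2, which ties the result back to the lm(y ~ x) fit in the post.

```r
# Data from the original post: x is a measurement, y is Unix time in seconds
x <- c(79.744, 123.904, 87.29601, 116.352, 67.71201, 72.96001,
       101.632, 108.928, 94.08)
y <- c(1506705739.385, 1506705766.895, 1506705746.293, 1506705761.873,
       1506705734.743, 1506705735.351, 1506705756.26, 1506705761.307,
       1506705747.372)

# Subtract a constant close to the data (the value suggested in the thread)
fit_shift <- lm(x ~ I(y - 1506705766))
b_shift <- unname(coef(fit_shift)[2])

# Centering on the mean gives the same slope, only the intercept moves
fit_center <- lm(x ~ I(y - mean(y)))
b_center <- unname(coef(fit_center)[2])

# Cross-check: slope(x ~ y) * slope(y ~ x) equals R^2 for one predictor
fit_yx <- lm(y ~ x)
r2 <- summary(fit_yx)$r.squared
print(c(slope = b_shift, product = b_shift * coef(fit_yx)[[2]], r2 = r2))
```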
