[Rd] Using response variable in interaction as explanatory variable in glm crashes R

Martin Maechler maechler at stat.math.ethz.ch
Mon Oct 9 17:52:43 CEST 2017


>>>>> Jan van der Laan <rhelp at eoos.dds.nl>
>>>>>     on Fri, 6 Oct 2017 12:13:39 +0200 writes:

    > It is actually model.matrix that crashes, not glm. Same
    > crash occurs with e.g. lm.

    > model.matrix(dob_mon ~ dob_day*dob_mon, data = tab)

    > also crashes R.

Yes, segmentation fault.

It only happens when these are *logical* variables, not, e.g., when
transformed to integer.

The C code in src/library/stats/src/model.c  tries to eliminate
occurrences of the LHS of the formula from the RHS when building
the model matrix, and it does work fine in the integer case.

Part of the culprit code may be this (from line 717),
with the  isLogical(.)  test which, in our case, shifts the
pointer by 1  in the call to firstfactor() :

			int adj = isLogical(var_i)?1:0;
			// avoid overflow of jstart * nn PR#15578
			firstfactor(&rx[jstart * nn], n, jnext - jstart,
				    REAL(contrast), nrows(contrast),
				    ncols(contrast), INTEGER(var_i)+adj);

then in firstfactor(), we see the segfault (when running R with
'-d gdb') :

    > model.matrix(dob_mon ~ dob_day*dob_mon, data = tab)

  Program received signal SIGSEGV, Segmentation fault.
  0x00007fffeafa76b5 in firstfactor (ncx=0, v=0x5c3b37c, ncc=1, nrc=2, c=0x5c90008, 
   nrx=8, x=0x5cbf150) at ../../../../../R/src/library/stats/src/model.c:252
    252		    else xj[i] = cj[v[i]-1];
    Missing separate debuginfos, .................
    (gdb) list
    247	    for (int j = 0; j < ncc; j++) {
    248		xj = &x[j * (R_xlen_t)nrx];
    249		cj = &c[j * (R_xlen_t)nrc];
    250		for (int i = 0; i < nrx; i++)
    251		    if(v[i] == NA_INTEGER) xj[i] = NA_REAL;
    252		    else xj[i] = cj[v[i]-1];
    253	    }
    254	}
    255	

and indeed in the debugger,  i=7  and  v[i] is "outside", v[]
being of length 7, hence indexed 0:6.
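
To make the index arithmetic concrete, here is a minimal sketch (not
code from R; the helper name first_out_of_range_index is made up for
illustration). It assumes only what is described above: the callee
receives a pointer shifted by `shift` elements into a buffer of `len`
ints and then loops over i = 0 .. n-1.

```c
#include <assert.h>

/* Hypothetical helper, not part of R's model.c.
 * A buffer holds `len` ints.  The callee is handed `buf + shift` and
 * loops over i = 0 .. n-1, reading v[i], which is buf[i + shift].
 * Returns the first i whose read falls outside buf, or -1 if every
 * read stays inside the buffer. */
int first_out_of_range_index(int len, int n, int shift)
{
    assert(len >= 0 && n >= 0 && shift >= 0);
    for (int i = 0; i < n; i++)
        if (i + shift >= len)   /* buf[i + shift] would be past the end */
            return i;
    return -1;
}
```

For the 8-row data frame in this report, first_out_of_range_index(8, 8, 1)
should return 7, matching the i=7 seen in the debugger, while a shift of 0
(the integer case) gives -1, i.e. no read past the end.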


    > Jan



    > On 06-10-17 12:08, Jan van der Laan wrote:
    >> 
    >> The following code crashes R (I know I shouldn't try to
    >> estimate such a model; this was a bug in some code of
    >> mine). I also tried with R-devel; same result.
    >> 
    >> 
    >> tab <- structure(list(dob_day = c(FALSE, FALSE, FALSE,
    >> FALSE, TRUE, TRUE, TRUE, TRUE), dob_mon = c(FALSE, FALSE,
    >> TRUE, TRUE, FALSE, FALSE, TRUE, TRUE), dob_year =
    >> c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE), n
    >> = c(1489634L, 17491L, 134985L, 1639L, 47892L, 611L,
    >> 4365L, 750L), pred1 = c(1488301, 18187, 135605, 1657,
    >> 48547, 593, 4423, 54)), .Names = c("dob_day", "dob_mon",
    >> "dob_year", "n", "pred1"), row.names = c(NA, -8L), class
    >> = "data.frame")
    >> 
    >> m <- glm(dob_mon ~ dob_day*dob_mon, data = tab, family =
    >> binomial())
    >> 
    >> 
    >> The crash doesn't occur when the variables are added just
    >> as main effects (dob_day+dob_mon): this results in a
    >> warning and the removal of dob_mon from the formula.
    >> 
    >> -- 
    >> 
    >> Jan
    >> 
    >> 
    >> 
    >> > R.version
    >>                _
    >> platform       x86_64-pc-linux-gnu
    >> arch           x86_64
    >> os             linux-gnu
    >> system         x86_64, linux-gnu
    >> status
    >> major          3
    >> minor          4.1
    >> year           2017
    >> month          06
    >> day            30
    >> svn rev        72865
    >> language       R
    >> version.string R version 3.4.1 (2017-06-30)
    >> nickname       Single Candle
    >> 
    >> ______________________________________________
    >> R-devel at r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel

