[R] Using lm on data.frame with categorical data as character column results in error in plot.lm

Gerhard Burger g@@@burger @ending from l@cdr@leidenuniv@nl
Tue Nov 13 18:24:32 CET 2018


Hi all,

I'm not sure whether the following should be considered a bug or just a user error, but here goes:

We're teaching our students to use the tidyverse for most of their R work, and the following code results in an error in plot.lm (adapted and shortened to pinpoint the problem):

```
# Long format via tidyr: 'variable' ends up as a character column
iris_long = tidyr::gather(iris, key = "variable", value = "value", -Species)
iris_lm = lm(value ~ Species + variable, data = iris_long)
stats:::plot.lm(iris_lm, which = 5)  # this call fails with an error
```

whereas if we use reshape2::melt instead of tidyr::gather, it works fine:

```
# Long format via reshape2: 'variable' ends up as a factor column
iris_long = reshape2::melt(iris)
iris_lm = lm(value ~ Species + variable, data = iris_long)
stats:::plot.lm(iris_lm, which = 5)  # this call works
```

Now, the only difference between the output of melt and that of gather is that the resulting "variable" column is a factor in melt's output but a character column in gather's output:

```
# Fails: 'variable' is a factor in the first result, character in the second
testthat::expect_identical(
  reshape2::melt(iris),
  tidyr::gather(iris, key = "variable", value = "value", -Species)
)
```
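To reproduce the difference without tidyr or reshape2, here is a minimal base R sketch. It assumes that stacking the four measurement columns one after another matches the row order of the gather() output; the names `vars`, `long_chr` and `long_fct` are hypothetical. It only shows that the two long data frames hold the same values and differ in the class of `variable`.

```r
# Build the long ("gathered") iris data with base R only.
# Assumption: this mirrors gather(iris, ..., -Species), i.e. one
# measurement column stacked after the other.
vars <- names(iris)[1:4]  # the four numeric measurement columns

long_chr <- data.frame(
  Species  = rep(iris$Species, times = length(vars)),
  variable = rep(vars, each = nrow(iris)),        # character, like gather()
  value    = unlist(iris[vars], use.names = FALSE),
  stringsAsFactors = FALSE                         # keep 'variable' as character
)

# Same data, but 'variable' stored as a factor (like melt() or factor_key = TRUE)
long_fct <- long_chr
long_fct$variable <- factor(long_fct$variable, levels = vars)

class(long_chr$variable)  # "character"
class(long_fct$variable)  # "factor"
```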

This can be fixed by adding `factor_key = TRUE` to the gather call, after which everything works fine. Are categorical variables required to be stored in a factor column? `lm` seems to handle a character column fine, but `plot.lm` fails... Is this something that might need a fix in plot.lm?
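A possible explanation, offered as a hypothesis rather than a confirmed diagnosis: when all leverages are equal, `plot.lm(which = 5)` plots residuals against factor-level combinations, choosing the factor columns from the `dataClasses` attribute of the model terms but counting levels from `fit$xlevels`. `lm` records levels for a character predictor in `xlevels` while classing it as `"character"` in `dataClasses`, so the two no longer line up. The base R sketch below (the names `long_chr`, `long_fct`, `fit_chr` and `fit_fct` are hypothetical) checks those ingredients and applies the convert-to-factor workaround.

```r
# Hypothesis: plot.lm(which = 5) mixes two views of the predictors that
# disagree for character columns. Rebuild the long iris data (base R only).
vars <- names(iris)[1:4]
long_chr <- data.frame(
  Species  = rep(iris$Species, times = length(vars)),
  variable = rep(vars, each = nrow(iris)),
  value    = unlist(iris[vars], use.names = FALSE),
  stringsAsFactors = FALSE
)

fit_chr <- lm(value ~ Species + variable, data = long_chr)
attr(terms(fit_chr), "dataClasses")  # 'variable' is classed "character"
names(fit_chr$xlevels)               # ...yet its levels are still stored

# Balanced design: every observation has the same leverage, which sends
# plot.lm() into its "Residuals vs Factor Levels" branch.
range(hatvalues(fit_chr))

# Workaround: store categorical predictors as factors before fitting.
# factor() sorts levels alphabetically, whereas factor_key = TRUE keeps
# the original column order.
long_fct <- long_chr
long_fct[] <- lapply(long_fct, function(x) if (is.character(x)) factor(x) else x)

fit_fct <- lm(value ~ Species + variable, data = long_fct)
attr(terms(fit_fct), "dataClasses")  # 'variable' is now "factor"
plot(fit_fct, which = 5)
```

The different level order only changes the baseline level and thus the coefficient labels; the fitted values and residuals are the same.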

Any insight appreciated!

Kind regards,
Gerhard

For completeness, my sessionInfo:

```
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=nl_NL.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=nl_NL.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18     tidyr_0.8.1      crayon_1.3.4     R6_2.2.2         plyr_1.8.4       magrittr_1.5     pillar_1.3.0     rlang_0.2.2
 [9] stringi_1.2.4    reshape2_1.4.3   rstudioapi_0.7   testthat_2.0.0   tools_3.5.1      stringr_1.3.1    glue_1.3.0       purrr_0.2.5
[17] compiler_3.5.1   tidyselect_0.2.4 tibble_1.4.2
```




More information about the R-help mailing list