[R] Strange variable names in factor regression
Duncan Murdoch
murdoch@dunc@n @end|ng |rom gm@||@com
Thu May 9 14:49:39 CEST 2024
On 09/05/2024 8:09 a.m., Naresh Gurbuxani wrote:
>
> On converting character variables to ordered factors, regression result
> has strange names. Is it possible to obtain same variable names with
> and without intercept?
You are getting polynomial contrasts with the ordered factor, because
you have the default setting for options("contrasts"), i.e.
            unordered           ordered
    "contr.treatment"      "contr.poly"
If you run
options(contrasts = c("contr.treatment", "contr.treatment"))
you will get the same coefficient names in both cases.
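As a minimal sketch of that suggestion (the data frame `d`, the seed, and the object names below are made up for illustration and are not from the original post), the global option can be changed temporarily, or the contrasts can be set for a single model through the `contrasts` argument of `lm()`. Both routes should give treatment-style names for the ordered factor:

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical stand-in for the poster's data: an ordered weekday factor
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
d <- data.frame(wday = factor(rep(days, times = 10), levels = days,
                              ordered = TRUE))
d$volume <- rnorm(nrow(d), mean = 20, sd = 5)

# Route 1: change the global option, fit, then restore the previous setting
old <- options(contrasts = c("contr.treatment", "contr.treatment"))
fit_global <- lm(volume ~ wday, data = d)
options(old)

# Route 2: leave the global option alone and set contrasts for this fit only
fit_local <- lm(volume ~ wday, data = d,
                contrasts = list(wday = "contr.treatment"))

names(coef(fit_global))
names(coef(fit_local))
```

The per-model route avoids side effects on later fits in the same session, which may matter if other code relies on the default polynomial contrasts for ordered factors.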
By the way, the coefficients have different meanings under the two sets
of contrasts, so it makes sense that they have different names. If
anything, it is more of a problem that you *don't* get different
variable names depending on whether an intercept is included, because
those coefficients also have different meanings.
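To illustrate the point about meanings, here is a small sketch with a made-up three-level unordered factor `g` (the data and names are hypothetical, and the default treatment contrasts for unordered factors are assumed). The coefficient named `gb` appears in both fits but measures different things:

```r
set.seed(2)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical data: three groups with different means
d <- data.frame(g = factor(rep(c("a", "b", "c"), each = 5)))
d$y <- rnorm(nrow(d), mean = c(10, 12, 15)[d$g])

with_int <- coef(lm(y ~ g, data = d))      # (Intercept), gb, gc
no_int   <- coef(lm(y ~ g - 1, data = d))  # ga, gb, gc
means    <- tapply(d$y, d$g, mean)

# Without an intercept, each coefficient is the mean of its group
all.equal(unname(no_int), as.vector(means))

# With an intercept, "gb" is the difference between group b and baseline a
all.equal(unname(with_int["gb"]), unname(means["b"] - means["a"]))
```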
It may also be a little bit of a surprise that you go back to treatment
contrasts when you leave out the intercept with the ordered factor, but
then it almost never makes sense to leave out the intercept in a
polynomial fit.
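The switch in coding can be seen directly in the design matrix. The sketch below uses made-up data (the data frame `d` and the seed are assumptions, and the default `contr.poly` setting for ordered factors is assumed). Strictly speaking, without an intercept the factor is coded with one indicator column per level rather than with treatment contrasts, which is why the names look the same as in the unordered case. With the intercept and polynomial contrasts, the intercept should equal the unweighted average of the level means:

```r
set.seed(3)  # assumption: fixed seed so the sketch is reproducible

days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
d <- data.frame(wday = factor(rep(days, times = 4), levels = days,
                              ordered = TRUE))
d$volume <- rnorm(nrow(d), mean = 20, sd = 5)

# Column names of the design matrix with and without an intercept
X_int   <- model.matrix(~ wday, data = d)
X_noint <- model.matrix(~ wday - 1, data = d)
colnames(X_int)    # polynomial terms: wday.L, wday.Q, wday.C, wday^4, ...
colnames(X_noint)  # one indicator column per level: wdayMon, wdayTue, ...

# With polynomial contrasts the intercept is the average of the level means
fit <- lm(volume ~ wday, data = d)
level_means <- tapply(d$volume, d$wday, mean)
all.equal(unname(coef(fit)[1]), mean(level_means))
```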
Duncan Murdoch
>
> Thanks,
> Naresh
>
> mydf <- data.frame(date = seq.Date(as.Date("2024-01-01"),
> as.Date("2024-03-31"), by = 1))
> mydf[, "wday"] <- weekdays(mydf$date, abbreviate = TRUE)
> mydf.work <- subset(mydf, !(wday %in% c("Sat", "Sun")))
> mydf.weekend <- subset(mydf, wday %in% c("Sat", "Sun"))
> mydf.work[, "volume"] <- round(rnorm(nrow(mydf.work), mean = 20, sd = 5))
> mydf.weekend[, "volume"] <- round(rnorm(nrow(mydf.weekend), mean = 10,
>                                         sd = 5))
> mydf <- rbind(mydf.work, mydf.weekend)
>
> reg <- lm(volume ~ wday, data = mydf)
> ## Variable names as expected
> coef(reg)
> (Intercept)     wdayMon     wdaySat     wdaySun     wdayThu     wdayTue
>  21.3846154   1.3076923 -12.0000000 -12.9230769  -1.9230769  -0.6923077
>     wdayWed
>  -1.6153846
>
> reg <- lm(volume ~ wday - 1, data = mydf)
> # Variable names as expected
> coef(reg)
>   wdayFri   wdayMon   wdaySat   wdaySun   wdayThu   wdayTue   wdayWed
> 21.384615 22.692308  9.384615  8.461538 19.461538 20.692308 19.769231
>
> # Ordered factors for weekday sequence
> mydf$wday <- factor(mydf$wday, levels = c("Mon", "Tue", "Wed", "Thu",
> "Fri", "Sat", "Sun"), ordered = TRUE)
>
> reg <- lm(volume ~ wday - 1, data = mydf)
> # Variable names as expected
> coef(reg)
>   wdayMon   wdayTue   wdayWed   wdayThu   wdayFri   wdaySat   wdaySun
> 22.692308 20.692308 19.769231 19.461538 21.384615  9.384615  8.461538
>
> reg <- lm(volume ~ wday, data = mydf)
> # Strange variable names
> coef(reg)
> (Intercept)      wday.L      wday.Q      wday.C      wday^4      wday^5
>   17.406593  -12.036715   -4.968654   -1.852819    3.291477    4.263642
>      wday^6
>    2.591317
>