[R] (gam) formula: Why different results for terms being factor vs. numeric?

Marius Hofert marius.hofert at math.ethz.ch
Tue Oct 29 21:16:51 CET 2013


Dear expeRts,

If I specify group = as.factor(rep(1:2, each=n)) in the definition of
dat below, I get the behavior I expect. I wonder why I don't get it if
group is *not* a factor... My guess was that, internally, factors are
treated as natural numbers (and this indeed seems to be true if you
convert the latter to factors [essentially meaning changing the
levels]), but replacing factors by numeric values (as below) does not
give the same answer.

Cheers,
Marius


require(mgcv)

n <- 10
yrs <- 2000+seq_len(n)
set.seed(271)
dat <- data.frame(year  = rep(yrs, 2),
                  ## *not* a factor (as.factor() provides the expected behavior)
                  group = rep(1:2, each=n),
                  resp  = c(seq_len(n)+runif(n), 5+seq_len(n)+runif(n)))
fit3 <- gam(resp ~ year + group - 1, data=dat)
## fit group A; mean over all responses in this group
plot(yrs, fit3$fitted.values[seq_len(n)], type="l", ylim=range(dat$resp),
     xlab="Year", ylab="Response")
## fit group B; mean over all responses in this group
lines(yrs, fit3$fitted.values[n+seq_len(n)], col="blue")
points(yrs, dat$resp[seq_len(n)]) # actual response group A
points(yrs, dat$resp[n+seq_len(n)], col="blue") # actual response group B
## => hmmm... because it is not a factor (?), this does not give the
##    expected answer, but gam() still correctly figures out that there
##    are two groups
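
One way to see what changes between the two cases is to compare the
design matrices that the same formula produces. This is only a sketch
using base R's model.matrix() rather than gam() itself; the names
dat_num, dat_fac, X_num and X_fac are made up for illustration. With a
numeric group the formula yields a single "group" column (one slope,
and with "- 1" no intercept), whereas with a factor it yields one
indicator column per level (a separate intercept per group).

```r
## Same data layout as above, without the response (not needed here)
n <- 10
yrs <- 2000 + seq_len(n)
dat_num <- data.frame(year = rep(yrs, 2), group = rep(1:2, each = n)) # numeric group
dat_fac <- transform(dat_num, group = as.factor(group))              # factor group

## Design matrices implied by the right-hand side of the model formula
X_num <- model.matrix(~ year + group - 1, data = dat_num)
X_fac <- model.matrix(~ year + group - 1, data = dat_fac)

colnames(X_num) # "year" "group": group enters as a single numeric covariate
colnames(X_fac) # "year" "group1" "group2": one indicator column per level
```

So the numeric version has two coefficients and no free intercept, while
the factor version has three coefficients, two of which act as
group-specific intercepts.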


