[R] (gam) formula: Why different results for terms being factor vs. numeric?
Marius Hofert
marius.hofert at math.ethz.ch
Tue Oct 29 21:16:51 CET 2013
Dear expeRts,
If I specify group = as.factor(rep(1:2, each=n)) in the definition of
dat below, I get the behavior I expect. I wonder why I don't get it if
group is *not* a factor. My guess was that, internally, factors are
treated as natural numbers (and this indeed seems to be true if you
convert natural numbers to factors, which essentially just changes the
levels), but replacing the factor by numeric values (as below) does
not give the same answer.
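One way to look at the difference is to compare the design matrices
that the same formula produces for a numeric and a factor group. The
following is only a sketch using base R's model.matrix(); the names
d_num, d_fac, X_num and X_fac are made up for illustration, and it
makes no claim about what gam() does internally beyond building a
model matrix from the formula.

```r
n <- 10
# Same year/group layout as in the post; group is numeric here
d_num <- data.frame(year  = rep(2000 + seq_len(n), 2),
                    group = rep(1:2, each = n))
# Identical data, but with group converted to a factor
d_fac <- transform(d_num, group = as.factor(group))

# Design matrices for the formula used in the post (no intercept)
X_num <- model.matrix(~ year + group - 1, data = d_num)
X_fac <- model.matrix(~ year + group - 1, data = d_fac)

colnames(X_num)  # "year" "group"            -> a single slope for group
colnames(X_fac)  # "year" "group1" "group2"  -> one indicator per level
```

With a numeric group, the model has one coefficient b for group, so the
two group offsets are forced to be b and 2*b. With a factor, each level
gets its own indicator column and hence its own offset.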
Cheers,
Marius
library(mgcv) # provides gam()
n <- 10
yrs <- 2000+seq_len(n)
set.seed(271)
## group is *not* a factor here (as.factor() provides the expected behavior)
dat <- data.frame(year  = rep(yrs, 2),
                  group = rep(1:2, each=n),
                  resp  = c(seq_len(n)+runif(n), 5+seq_len(n)+runif(n)))
fit3 <- gam(resp ~ year + group - 1, data=dat)
## fit group A; mean over all responses in this group
plot(yrs, fit3$fitted.values[seq_len(n)], type="l", ylim=range(dat$resp),
     xlab="Year", ylab="Response")
## fit group B; mean over all responses in this group
lines(yrs, fit3$fitted.values[n+seq_len(n)], col="blue")
points(yrs, dat$resp[seq_len(n)]) # actual response group A
points(yrs, dat$resp[n+seq_len(n)], col="blue") # actual response group B
## => hmmm... because group is not a factor (?), this does not give the
##    expected answer, but gam() still correctly figures out that there
##    are two groups
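
A possible way to check the numeric-vs-factor difference numerically is
sketched below. It assumes that lm() can stand in for gam() here, since
the formula contains no smooth terms and both then fit an ordinary
Gaussian linear model; the object names fit_num, fit_fac and
fit_num_int are made up for this illustration.

```r
# Rebuild the data from the post (same seed, same construction)
n <- 10
yrs <- 2000 + seq_len(n)
set.seed(271)
dat <- data.frame(year  = rep(yrs, 2),
                  group = rep(1:2, each = n),
                  resp  = c(seq_len(n) + runif(n), 5 + seq_len(n) + runif(n)))

# ASSUMPTION: lm() is used instead of gam(); with purely parametric
# terms both fit the same linear model by least squares.
fit_num     <- lm(resp ~ year + group - 1, data = dat)         # numeric group, no intercept
fit_fac     <- lm(resp ~ year + factor(group) - 1, data = dat) # one offset per group
fit_num_int <- lm(resp ~ year + group, data = dat)             # numeric group, with intercept

# Numeric group plus an intercept spans the same columns as the two
# factor indicators, so these two fits should agree
all.equal(fitted(fit_fac), fitted(fit_num_int))

# Without the intercept, the numeric fit is a constrained submodel and
# has a larger residual sum of squares than the factor fit
c(numeric = deviance(fit_num), factor = deviance(fit_fac))
```

If this reasoning holds, dropping the "- 1" from the numeric formula
should reproduce the fitted lines obtained with as.factor().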