[R] Modifying a design matrix in mgcv
David Winsemius
dwinsemius at comcast.net
Wed Aug 14 21:32:30 CEST 2013
On Aug 14, 2013, at 11:08 AM, Andrew Crane-Droesch wrote:
> Hello,
>
> I am trying to make a slight modification to the way gam (mgcv) works.
> I want to modify the G$X matrix, which is the design matrix, to
> accommodate an estimator that I am trying to program. I am working with
> panel data, and I want to take all continuous variables, including basis
> functions evaluated at covariate variables, subtract their groupwise
> means, and then add in their overall mean. (This is an alternative to
> having a dummy variable for every single group, and I'm working in a
> context where random effects aren't appropriate). I'd then do the same
> thing with G$y. Afterwards I want to pass the modified G$X and G$Y
> matrices to estimate.gam, which takes G as an argument.
>
> As far as I can tell, G$X is organized to have the parametric terms
> (including factors) at the front with column names, and nonparametric
> terms at the end without column names. So identifying which terms to
> apply the transformation to should be straightforward.
>
> I've had some luck tweaking the internals of mgcv before, changing the
> default plotting behavior of vis.gam. But I am having trouble doing so
> with gam itself. To test things out, I did the following:
>
> * type "gam" into the terminal, to get the printout of the function's
> code.
> * add a line below line 70: G$X = G$X^2 (just to see if I can make an
> arbitrary transformation)
> * change the function name from gam to mod.gam, run it, and then run
> mod.gam on an arbitrary dataset.
>
> When I do so, I get the following error:
> Error in mod.gam(y ~ s(x)) : could not find function "variable.summary"
>
> It turns out that variable.summary, along with gam.setup, estimate.gam,
> and perhaps other constituents of gam do NOT have help pages, nor does
> typing them into the terminal reveal their code. Am I encountering
> functions compiled from some non-R language ©?
You are encountering functions that are not exported from mgcv. So the environment where the interpreter looks for them is not the same as when `gam` is called. You could get them to "work" by prepending `mgcv::` to each instance of the "not-found" functions. You might also try just this:
environment(mod.gam) <- environment(gam)
I seem to get success with that strategy but I am not qualified to assess your coding chnages
> environment(mod.gam) <- environment(gam)
> modm = mod.gam(y~s(x))
>
> modm
Family: gaussian
Link function: identity
Formula:
y ~ s(x)
Estimated degrees of freedom:
8.2 total = 9.2
GCV score: 1.375422
--
David.
> gam.setup, estimate.gam
>
> Regardless, I would greatly appreciate any help with my problem of
> transforming the model matrix before fitting. I'd very much prefer to
> use mgcv, with all of the work that has obviously gone into it, rather
> than reinventing the wheel.
>
> My code for reproducing the error is below.
>
> Thanks in advance for any help.
>
> Best,
> Andrew
>
> library(mgcv)
> gam
> mod.gam = function (formula, family = gaussian(), data = list(), weights
> = NULL,
> subset = NULL, na.action, offset = NULL, method = "GCV.Cp",
> optimizer = c("outer", "newton"), control = list(), scale = 0,
> select = FALSE, knots = NULL, sp = NULL, min.sp = NULL, H = NULL,
> gamma = 1, fit = TRUE, paraPen = NULL, G = NULL, in.out = NULL,
> ...)
> {
> control <- do.call("gam.control", control)
> if (is.null(G)) {
> gp <- interpret.gam(formula)
> cl <- match.call()
> mf <- match.call(expand.dots = FALSE)
> mf$formula <- gp$fake.formula
> mf$family <- mf$control <- mf$scale <- mf$knots <- mf$sp <-
> mf$min.sp <- mf$H <- mf$select <- mf$gamma <- mf$method <- mf$fit <-
> mf$paraPen <- mf$G <- mf$optimizer <- mf$in.out <- mf$... <- NULL
> mf$drop.unused.levels <- TRUE
> mf[[1]] <- as.name("model.frame")
> pmf <- mf
> mf <- eval(mf, parent.frame())
> if (nrow(mf) < 2)
> stop("Not enough (non-NA) data to do anything meaningful")
> terms <- attr(mf, "terms")
> vars <- all.vars(gp$fake.formula[-2])
> inp <- parse(text = paste("list(", paste(vars, collapse = ","),
> ")"))
> if (!is.list(data) && !is.data.frame(data))
> data <- as.data.frame(data)
> dl <- eval(inp, data, parent.frame())
> names(dl) <- vars
> var.summary <- variable.summary(gp$pf, dl, nrow(mf))
> rm(dl)
> pmf$formula <- gp$pf
> pmf <- eval(pmf, parent.frame())
> pterms <- attr(pmf, "terms")
> if (is.character(family))
> family <- eval(parse(text = family))
> if (is.function(family))
> family <- family()
> if (is.null(family$family))
> stop("family not recognized")
> if (family$family[1] == "gaussian" && family$link ==
> "identity")
> am <- TRUE
> else am <- FALSE
> if (!control$keepData)
> rm(data)
> G <- gam.setup(gp, pterms = pterms, data = mf, knots = knots,
> sp = sp, min.sp = min.sp, H = H, absorb.cons = TRUE,
> sparse.cons = 0, select = select, idLinksBases =
> control$idLinksBases,
> scale.penalty = control$scalePenalty, paraPen = paraPen)
> G$var.summary <- var.summary
> G$family <- family
> if (ncol(G$X) > nrow(G$X) + nrow(G$C))
> stop("Model has more coefficients than data")
> G$terms <- terms
> G$pterms <- pterms
> G$mf <- mf
> G$cl <- cl
> G$am <- am
> if (is.null(G$offset))
> G$offset <- rep(0, G$n)
> G$min.edf <- G$nsdf - dim(G$C)[1]
> if (G$m)
> for (i in 1:G$m) G$min.edf <- G$min.edf +
> G$smooth[[i]]$null.space.dim
> G$formula <- formula
> environment(G$formula) <- environment(formula)
> }
> if (!fit)
> return(G)
> G$conv.tol <- control$mgcv.tol
> G$max.half <- control$mgcv.half
> G$X = G$X^2
> object <- estimate.gam(G, method, optimizer, control, in.out,
> scale, gamma, ...)
> if (!is.null(G$L)) {
> object$full.sp <- as.numeric(exp(G$L %*% log(object$sp) +
> G$lsp0))
> names(object$full.sp) <- names(G$lsp0)
> }
> names(object$sp) <- names(G$sp)
> object$paraPen <- G$pP
> object$formula <- G$formula
> object$var.summary <- G$var.summary
> object$cmX <- G$cmX
> object$model <- G$mf
> object$na.action <- attr(G$mf, "na.action")
> object$control <- control
> object$terms <- G$terms
> object$pterms <- G$pterms
> object$assign <- G$assign
> object$contrasts <- G$contrasts
> object$xlevels <- G$xlevels
> object$offset <- G$offset
> if (!is.null(G$Xcentre))
> object$Xcentre <- G$Xcentre
> if (control$keepData)
> object$data <- data
> object$df.residual <- nrow(G$X) - sum(object$edf)
> object$min.edf <- G$min.edf
> if (G$am && !(method %in% c("REML", "ML", "P-ML", "P-REML")))
> object$optimizer <- "magic"
> else object$optimizer <- optimizer
> object$call <- G$cl
> class(object) <- c("gam", "glm", "lm")
> object
> }
> x = rnorm(100)
> y = exp(y)+rnorm(100)
> m = gam(y~s(x))
> modm = mod.gam(y~s(x))
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA
More information about the R-help
mailing list