[R] How to improve, at all, a simple GLM code
Ben Bolker
bbolker at gmail.com
Fri Mar 30 04:58:04 CEST 2012
Abigail Clifton <abigailclifton <at> me.com> writes:
> I am wanting to find a good predictive model, yes. It's part of a
> project so if I have time after finding the model I may want to find
> some patterns but it's not a priority. I just want the model for now
> (I need the coefficients above all).
> It's all categorical data, I categorised any continuous data before
> I started trying to fit the glm.
That's not necessarily a good idea (categorising often loses
power relative to fitting something like an additive model),
but OK.
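To illustrate the point about categorising, here is a minimal sketch on simulated data (not the poster's data). The variable names, the four equal-width bins, and the sine-shaped true effect are all assumptions made for illustration. It fits the same binary response once with the predictor cut into categories and once with a smooth term from mgcv, so the two fits can be compared.

```r
## Simulated example only: x, y, xcat and the bin boundaries are hypothetical
library(mgcv)   ## recommended package, provides gam() and s()
set.seed(101)
n <- 500
x <- runif(n)
## binary response whose log-odds vary smoothly with x
y <- rbinom(n, size = 1, prob = plogis(3 * sin(2 * pi * x)))
d <- data.frame(x = x, y = y,
                xcat = cut(x, breaks = seq(0, 1, by = 0.25),
                           include.lowest = TRUE))
## categorised predictor: one parameter per bin
fit_cat <- glm(y ~ xcat, family = binomial, data = d)
## additive model: smooth function of the original continuous predictor
fit_gam <- gam(y ~ s(x), family = binomial, data = d)
## compare the two fits (lower AIC is better)
AIC(fit_cat, fit_gam)
```

How much is lost by binning depends on the shape of the true relationship and on where the bin boundaries fall, so this comparison is only indicative.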
> I was unsure of how to get the csv file to you; however, I have
> uploaded it and it should be available for download from here:
> http://www.filedropper.com/prepareddata
Here's how far I got:
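Prepared_Data <- na.omit(read.csv("Prepared_Data.csv", header=TRUE))
pd <- Prepared_Data[,-3] ## data minus response variable (C3, column 3)
## how many levels per variable?
lev <- sapply(pd,function(x) length(unique(x)))
## cumulative product of level counts: number of parameters in a
## saturated (all-interactions) model of the first n variables
par(las=1,bty="l")
plot(cumprod(lev),log="y") ## log-scale y axis
library(Matrix)
## all main effects plus two-way interactions, stored sparsely
m <- sparse.model.matrix(~.^2,data=pd) ## slower than model.matrix
ncol(m) ## 8352 columns (!!)
library(glmnet)
## penalized logistic regression (lasso penalty by default)
g1 <- glmnet(m,Prepared_Data$C3, family="binomial")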
This doesn't appear to work properly yet (I get funny values),
but it's the direction I would go ...
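As a rough sketch of where that direction could lead, the example below runs the same kind of pipeline on a small simulated data set (three made-up factors f1, f2, f3 and a made-up binary response; none of these come from the posted file). Two things here are assumptions beyond the code above: the intercept column is dropped from the model matrix because glmnet fits its own intercept, and cv.glmnet is used to pick a penalty by cross-validation before extracting coefficients. It is not a diagnosis of the funny values.

```r
## Simulated stand-in for the real data; all names here are hypothetical
library(Matrix)
library(glmnet)
set.seed(101)
n <- 400
dd <- data.frame(f1 = factor(sample(letters[1:4], n, replace = TRUE)),
                 f2 = factor(sample(letters[1:3], n, replace = TRUE)),
                 f3 = factor(sample(letters[1:5], n, replace = TRUE)))
## true log-odds depend on only two of the main effects
eta <- -0.5 + 1.5 * (dd$f1 == "b") - 1 * (dd$f2 == "c")
y <- rbinom(n, size = 1, prob = plogis(eta))
## main effects + two-way interactions; drop the intercept column,
## since glmnet adds an (unpenalized) intercept itself
X <- sparse.model.matrix(~ .^2, data = dd)[, -1]
## choose the penalty strength by 10-fold cross-validation (the default)
cvfit <- cv.glmnet(X, y, family = "binomial")
## coefficients at the more conservative "lambda.1se" penalty
cc <- coef(cvfit, s = "lambda.1se")
## show only the coefficients that were not shrunk to zero
cc[cc[, 1] != 0, , drop = FALSE]
```

Since the coefficients are what is wanted, note that they are shrunken estimates on the log-odds scale, and that most interaction terms will typically be set exactly to zero.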