[R] How to improve, at all, a simple GLM code

Fri Mar 30 04:58:04 CEST 2012

Abigail Clifton <abigailclifton <at> me.com> writes:

> I am wanting to find a good predictive model, yes. It's part of a
> project so if I have time after finding the model I may want to find
> some patterns but it's not a priority. I just want the model for now
> (I need the coefficients above all).

> It's all categorical data, I categorised any continuous data before
>  I started trying to fit the glm.

  That's not necessarily a good idea (categorising often loses
power relative to fitting something like an additive model),
but OK.

> I was unsure of how to get the csv file to you, however, I have
> uploaded it and it should be available for download from here:
> http://www.filedropper.com/prepareddata

  Here's how far I got:

Prepared_Data <-  na.omit(read.csv("Prepared_Data.csv", header=TRUE))
pd <- Prepared_Data[,-3]  ## data minus response variable

## how many levels per variable?
lev <- sapply(pd,function(x) length(unique(x)))

## cells in the full crossing of the first n variables (cumulative product, log scale)
par(las=1,bty="l")
plot(cumprod(lev),log="y")

library(Matrix)
## main effects plus all two-way interactions, stored as a sparse matrix
m <- sparse.model.matrix(~.^2,data=pd)  ## slower than model.matrix
ncol(m)  ## 8352 columns (!!)
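
The column count follows from the level counts alone: with the default
treatment contrasts, a factor with k levels contributes k-1 dummy columns,
and each pair of factors contributes the product of their dummy counts.
A minimal sketch on made-up toy factors (`toy`, `lev_toy` and the other
names here are illustrative, not taken from the real data), checked against
base `model.matrix`:

```r
## toy data: three factors with 2, 3 and 4 levels (made up, not the real data)
toy <- expand.grid(a = factor(1:2), b = factor(1:3), c = factor(1:4))
lev_toy <- sapply(toy, nlevels)

## expected columns: intercept + main effects + all two-way interactions
k <- lev_toy - 1                        # dummy columns per factor
n_main <- sum(k)                        # main-effect columns
n_int  <- (sum(k)^2 - sum(k^2)) / 2     # sum over pairs i<j of k_i * k_j
n_cols <- 1 + n_main + n_int

## check the arithmetic against the actual design matrix
mm <- model.matrix(~ .^2, data = toy)
stopifnot(ncol(mm) == n_cols)
```

Applying the same arithmetic to `lev` from the real data should reproduce
the number reported by `ncol(m)`, provided every column is treated as a
factor and no factor carries unused levels.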

library(glmnet)
g1 <- glmnet(m,Prepared_Data$C3, family="binomial")

  This doesn't appear to work properly yet (I get funny values),
but it's the direction I would go ...
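
A possible next step in that direction, sketched on simulated data because
the real file is not reproduced here: let `cv.glmnet` pick the penalty by
cross-validation and read the coefficients off at the selected value. The
data frame `sim`, the response `y` and the effect sizes are all invented for
illustration. Dropping the intercept column from the design matrix is an
assumption made here (glmnet fits its own intercept), not something the
code above does, and this is not claimed to explain the funny values.

```r
library(Matrix)
library(glmnet)

set.seed(1)
## simulated stand-in for the real data: 3 categorical predictors, binary response
n <- 500
sim <- data.frame(x1 = factor(sample(letters[1:3], n, replace = TRUE)),
                  x2 = factor(sample(letters[1:4], n, replace = TRUE)),
                  x3 = factor(sample(letters[1:2], n, replace = TRUE)))
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * (sim$x1 == "b")))

## main effects + two-way interactions; drop the constant intercept column
X <- sparse.model.matrix(~ .^2, data = sim)[, -1]

## choose the penalty strength by (default 10-fold) cross-validation
cvfit <- cv.glmnet(X, y, family = "binomial")

## coefficients at the CV-selected lambda; the lasso sets many to exactly zero
cf <- coef(cvfit, s = "lambda.min")
nz <- cf[cf[, 1] != 0, , drop = FALSE]   # keep only the non-zero rows
print(nz)
```

`coef()` returns one row per column of `X` plus one for the intercept, so
the non-zero rows in `nz` are the terms the penalised fit actually retains.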