[R] How to improve, at all, a simple GLM code

Fri Mar 30 22:26:56 CEST 2012

On 12-03-30 12:40 PM, Clifton, Abigail J. wrote:
> Hi again!
> 

> Thanks very much for the code, it appears to work! Finally, I want
> to extract the coefficients and tried coef(g1), which works.
> However, there only appear to be intercepts/coefficients for 'V22N'
> out of thousands of possibilities, which are all displayed as
> dots/NaN. Is there a way of getting more coefficients - perhaps by
> changing lambda or something like that? Is it also possible to
> print the final 'model'?

  I'm afraid I'm out of time right now -- cc'ing to r-help in case
someone else has the time and energy to help.  All I can suggest is
that you spend some time reading through all of the documentation for
the package (start with help(package="glmnet"), browse through all
the help pages, run the examples, etc.).  Unfortunately there is no
general-purpose vignette for that package ... an entire book on the
subject (The Elements of Statistical Learning) is available online at
http://www-stat.stanford.edu/~tibs/ElemStatLearn/ , but that won't
provide quick answers ...

  Ben Bolker
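On the coefficient question, a minimal sketch follows. It assumes the glmnet package is installed and uses simulated data in place of Prepared_Data (the sizes, the names V1..V50 and the effect sizes are all made up). It shows that coef() on a glmnet fit returns a sparse matrix with one column per lambda on the regularization path, where the printed dots are exact zeros, and that a single set of coefficients comes from choosing one lambda, here by cross-validation with cv.glmnet(). print() on the fit shows the path summary (Df, %Dev, Lambda).

```r
## Sketch only: simulated data stand in for Prepared_Data (assumption)
library(glmnet)

set.seed(1)
n <- 200; p <- 50
x <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("V", seq_len(p))))
## binary response driven by the first three columns only
eta <- 1.5 * x[, 1] - x[, 2] + 0.5 * x[, 3]
y <- rbinom(n, 1, plogis(eta))

g1 <- glmnet(x, y, family = "binomial")
print(g1)          # one row per lambda: Df, %Dev, Lambda

## with no 's', coef() returns one column per lambda on the path;
## the dots in the printed sparse matrix are exact zeros
dim(coef(g1))

## choose a single lambda by 10-fold cross-validation
cvfit <- cv.glmnet(x, y, family = "binomial")
cf <- as.matrix(coef(cvfit, s = "lambda.min"))

## keep only the non-zero coefficients (intercept included)
nz <- cf[cf[, 1] != 0, , drop = FALSE]
print(nz)
```

cv.glmnet() also reports lambda.1se, the largest lambda whose cross-validated error is within one standard error of the minimum; passing s = "lambda.1se" usually gives a sparser model.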

> Kind regards,
> 
> Abigail
> 
> 
> -----Original Message-----
> From: Ben Bolker <bbolker at gmail.com>
> Sender: r-help-bounces at r-project.org
> Date: Fri, 30 Mar 2012 02:58:04
> To: <r-help at stat.math.ethz.ch>
> Subject: Re: [R] How to improve, at all, a simple GLM code
> 
> Abigail Clifton <abigailclifton <at> me.com> writes:
> 
>> I want to find a good predictive model, yes. It's part of a
>> project, so if I have time after finding the model I may want to
>> look for some patterns, but that's not a priority. I just want the
>> model for now (above all, I need the coefficients).
> 
>> It's all categorical data, I categorised any continuous data 
>> before I started trying to fit the glm.
> 
> That's not necessarily a good idea (categorising often loses power 
> relative to fitting something like an additive model), but OK.
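To illustrate the point about categorising, here is a small sketch on hypothetical simulated data: one continuous predictor with a smooth effect on a binary outcome. It compares a binomial GLM on the binned predictor with an additive model fitted by mgcv::gam() (mgcv is a recommended package distributed with R). No particular result is claimed; the AIC table is just one convenient way to compare the two fits.

```r
## Hypothetical data: a smooth (non-monotone) effect of x on a binary outcome
library(mgcv)   # recommended package, ships with R

set.seed(42)
n <- 500
x <- runif(n, 0, 10)
y <- rbinom(n, 1, plogis(sin(x)))

## (a) categorise first, then fit a GLM on the four bins
xcat <- cut(x, breaks = 4)
fit_cat <- glm(y ~ xcat, family = binomial)

## (b) keep x continuous and let a penalised smooth estimate its shape
fit_gam <- gam(y ~ s(x), family = binomial)

## compare the two fits on the same data
AIC(fit_cat, fit_gam)
```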
> 
> 
>> I was unsure how to get the csv file to you; however, I have
>> uploaded it and it should be available for download from here:
>> http://www.filedropper.com/prepareddata
> 
> Here's how far I got:
> 
> Prepared_Data <- na.omit(read.csv("Prepared_Data.csv", header=TRUE))
> pd <- Prepared_Data[,-3]  ## data minus response variable
>
> ## how many levels per variable?
> lev <- sapply(pd,function(x) length(unique(x)))
>
> ## total parameters for a full interaction of the first n variables
> par(las=1,bty="l")
> plot(cumprod(lev),log="y")
>
> library(Matrix)
> m <- sparse.model.matrix(~.^2,data=pd)  ## slower than model.matrix
> ncol(m)  ## 8352 columns (!!)
>
> ## lasso-penalised logistic regression over a path of lambda values
> library(glmnet)
> g1 <- glmnet(m,Prepared_Data$C3,
>              family="binomial")
> 
> This doesn't appear to work properly, yet (I get funny values),
> but it's the direction I would go ...
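The column count reported by ncol(m) can be anticipated from the per-variable level counts alone, assuming every column is a factor whose levels all occur and that the default treatment contrasts are used. A factor with k_i levels contributes k_i - 1 main-effect columns, each pair contributes (k_i - 1)(k_j - 1) interaction columns, and there is one intercept column. The sketch below checks this on a hypothetical toy data frame with base R's model.matrix(); sparse.model.matrix() should produce the same column layout, only stored sparsely.

```r
## Predicted number of columns of a ~ .^2 model matrix for all-factor data:
## 1 intercept + sum(k_i - 1) main effects + sum over pairs i < j of (k_i - 1)(k_j - 1)
n_cols_2way <- function(k) {
  d <- k - 1                                   # columns per main effect
  1 + sum(d) + (sum(d)^2 - sum(d^2)) / 2       # pairwise products d_i * d_j, i < j
}

## Hypothetical toy data: three factors with 3, 4 and 5 levels
set.seed(3)
toy <- data.frame(
  a = factor(sample(letters[1:3], 60, TRUE), levels = letters[1:3]),
  b = factor(sample(letters[1:4], 60, TRUE), levels = letters[1:4]),
  c = factor(sample(letters[1:5], 60, TRUE), levels = letters[1:5])
)
k <- sapply(toy, nlevels)

n_cols_2way(k)                          # 36
ncol(model.matrix(~ .^2, data = toy))   # 36
```

The pairwise term is what grows quickly: with many factors of moderate size it dominates the count, which is why the two-way model above already needs thousands of columns.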
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.