[R] A troubled state of freedom: generalized linear models where number of parameters > number of samples

Liaw, Andy andy_liaw at merck.com
Mon Aug 23 02:16:24 CEST 2004


Check out the gpls package on CRAN. 

HTH,
Andy
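The gpls package suggested above fits generalized partial least squares. The minimal NumPy sketch below is a rough illustration of why partial least squares copes with more genes than samples. It is ordinary (linear) PLS1 via the NIPALS algorithm and is not gpls's code or API. As far as we understand, gpls embeds this idea in the iteratively reweighted least squares used to fit GLMs so that it can handle a two-group response; the sketch assumes a continuous response. The function name `pls1_fit` and the simulated data are hypothetical. PLS does not estimate 100 free coefficients. It builds a few components, each a weighted combination of all genes, so every gene ends up with a finite coefficient and none is NA.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Ordinary PLS1 (NIPALS) for one continuous response.

    Returns (coef, x_mean, y_mean); predictions are
    y_mean + (X_new - x_mean) @ coef.  Plain linear PLS only,
    not the generalized (GLM) variant implemented in gpls.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean          # centre; deflated each iteration
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                        # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                           # component scores, one per sample
        tt = t @ t
        p = Xk.T @ t / tt                    # X loadings
        c = (yk @ t) / tt                    # y loading
        Xk = Xk - np.outer(t, p)             # remove this component from X
        yk = yk - c * t                      # and from y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # coefficients on the original genes
    return coef, x_mean, y_mean


# Usage on simulated data: 50 samples x 100 genes, only 2 genes informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 100))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=50)
coef, x_mean, y_mean = pls1_fit(X, y, n_components=3)
print(coef.shape, np.isnan(coef).any())     # (100,) False
```

The number of components acts as a tuning parameter. In practice it would be chosen by cross-validation rather than fixed at 3.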

> From: Min-Han Tan
> 
> Good morning,
> 
> Thank you all for your help so far. I really appreciate it.
> 
> The crux of my problem is that I am fitting a generalized linear
> model with 1 dependent variable, approximately 50 training samples and
> 100 parameters (gene expression levels).
> 
> Essentially, if I have 100 genes and 50 samples, the fit returns
> coefficients for only the first 49 genes and NAs for the rest, with an
> ultra-low residual deviance (usually about 10^-27). This seems to have
> something to do with the number of degrees of freedom: as the number
> of genes increases to 49, the number of residual degrees of freedom
> drops to 0.
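This is what happens whenever the design matrix (an intercept column plus one column per gene) has more columns than rows. Its rank can be at most 50, so an intercept plus 49 genes already reproduces the 50 responses exactly, and the residual degrees of freedom, 50 minus the rank, fall to 0. The sketch below assumes a continuous response (a Gaussian-family GLM, whose residual deviance is the residual sum of squares) and uses simulated data. The helper `fit_ols` is hypothetical. One difference from R is that `numpy.linalg.lstsq` returns a minimum-norm solution spread over all columns, whereas R's `glm()` drops the linearly dependent (aliased) columns and reports their coefficients as NA.

```python
import numpy as np

# Simulated data (hypothetical): 50 samples, 100 gene-expression columns.
rng = np.random.default_rng(0)
n_samples, n_genes = 50, 100
genes = rng.normal(size=(n_samples, n_genes))
y = rng.normal(size=n_samples)  # arbitrary continuous response


def fit_ols(n_used):
    """Least-squares fit of y ~ intercept + first n_used genes.

    Returns (rank of design matrix, residual df, residual sum of squares).
    """
    X = np.column_stack([np.ones(n_samples), genes[:, :n_used]])
    rank = int(np.linalg.matrix_rank(X))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))  # Gaussian residual deviance
    return rank, n_samples - rank, rss


for k in (10, 30, 49, 100):
    rank, resid_df, rss = fit_ols(k)
    print(f"genes={k:3d}  rank={rank:2d}  residual df={resid_df:2d}  RSS={rss:.2e}")
```

From 49 genes onward the rank stays at 50, the residual df stays at 0, and the RSS is only floating-point noise. Genes beyond the 49th add no information the fit can use.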
> 
> What kind of methods can I use to make sense of this? 
> 
> I have a subsequent set of samples to work on to validate the results
> of this glm, so I am not sure if overfitting is really a problem.
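A separate validation set will reveal overfitting, but it does not prevent it. With more genes than samples, the training fit is exact whatever the data contain. The sketch below uses simulated data in which, by assumption, only 2 of the 100 genes affect the response. It compares a least-squares fit on all 100 genes with one on just the 2 informative genes. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 50, 100
true_coef = np.zeros(p)
true_coef[:2] = [1.0, -1.0]        # hypothetical: only genes 0 and 1 matter


def simulate(n_samples):
    X = rng.normal(size=(n_samples, p))
    y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)
    return X, y


X_train, y_train = simulate(n)     # training set
X_valid, y_valid = simulate(n)     # independent validation set


def add_intercept(X):
    return np.column_stack([np.ones(len(X)), X])


def fit_and_score(cols):
    """Fit on the training set using the given gene columns.

    Returns (training RSS, validation mean squared error).
    """
    A_train = add_intercept(X_train[:, cols])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    train_rss = float(np.sum((y_train - A_train @ coef) ** 2))
    pred_valid = add_intercept(X_valid[:, cols]) @ coef
    valid_mse = float(np.mean((y_valid - pred_valid) ** 2))
    return train_rss, valid_mse


full = fit_and_score(np.arange(p))   # all 100 genes: p > n, exact training fit
small = fit_and_score([0, 1])        # only the 2 informative genes
print("all genes: train RSS %.2e, validation MSE %.2f" % full)
print("two genes: train RSS %.2e, validation MSE %.2f" % small)
```

The all-gene model has essentially zero training error yet predicts the validation samples worse than the two-gene model.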
> 
> Background: this is a microarray study in which I have divided the
> training samples into 2 groups and identified a number of genes that
> differentiate between the groups. I am going to use the GLM in a
> subsequent regression analysis to predict survival. For this purpose,
> I need to generate a score for each individual case: the sum, over
> genes, of each gene's coefficient multiplied by its expression level.
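The score described here is the model's linear predictor: the intercept plus the sum of coefficient times expression over the genes. The following minimal sketch uses hypothetical gene names, coefficients and expression values. It also shows why the NA coefficients from the over-parameterised fit matter. If the score is computed by hand, a single missing coefficient makes the score missing, as NA times a number is NA in R and `nan` times a number is `nan` in Python.

```python
import math


def gene_score(coefficients, expression, intercept=0.0):
    """Linear predictor: intercept + sum over genes of coefficient * expression."""
    return intercept + sum(coef * expression[gene] for gene, coef in coefficients.items())


coefs = {"GENE_A": 0.8, "GENE_B": -1.2, "GENE_C": 0.3}    # hypothetical fitted coefficients
patient = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 4.0}   # hypothetical expression levels
print(gene_score(coefs, patient))   # 0.8*2.0 + (-1.2)*1.0 + 0.3*4.0, about 1.6

# An aliased gene whose coefficient came back as NA (represented here as nan).
with_na = dict(coefs, GENE_D=math.nan)
patient["GENE_D"] = 5.0
print(math.isnan(gene_score(with_na, patient)))   # True: one NA makes the whole score NA
```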
> 
> I am not a statistician (I am a clinician), so many apologies if I am
> not expressing myself clearly!
> 
> Thanks. 
> 
> Min-Han Tan
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! 
> http://www.R-project.org/posting-guide.html