[R] A troubled state of freedom: generalized linear models where number of parameters > number of samples
Liaw, Andy
andy_liaw at merck.com
Mon Aug 23 02:16:24 CEST 2004
Check out the gpls package on CRAN.
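
The pattern you describe (coefficients for only the first 49 genes, NA for
the rest, residual deviance that is numerically zero) is what glm() gives
whenever the model matrix has more columns than rows. With 50 samples the
matrix can have rank at most 50, so only the intercept plus 49 genes can be
estimated. Here is a minimal sketch on simulated noise, not on your data.
The names n, p, x and y are made up for illustration, and the family is
left at glm()'s gaussian default because your post does not say which one
you used; the rank argument is the same for any family.

```r
# Simulated stand-in for the microarray data: every value here is an
# assumption for illustration, none of it comes from the original study.
set.seed(1)                        # arbitrary seed, for reproducibility only
n <- 50                            # number of training samples
p <- 100                           # number of gene-level predictors
x <- matrix(rnorm(n * p), n, p)    # "expression levels" that are pure noise
y <- rep(0:1, each = n / 2)        # two arbitrary groups of 25 samples

# Default gaussian family; the model matrix is 50 x 101 (intercept + genes).
fit <- glm(y ~ x)

n_est <- sum(!is.na(coef(fit)))    # coefficients that could be estimated
n_na  <- sum(is.na(coef(fit)))     # aliased coefficients, reported as NA

print(n_est)               # 50: the intercept plus the first 49 genes
print(n_na)                # 51: genes 50 to 100
print(df.residual(fit))    # 0: no degrees of freedom left for error
print(deviance(fit))       # numerically zero: the fit interpolates y
```

Note that x carries no information about y here, yet the fit is perfect.
A near-zero deviance in this setting therefore says nothing about whether
the genes are predictive, which is why some form of penalization or
dimension reduction is needed.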
HTH,
Andy
> From: Min-Han Tan
>
> Good morning,
>
> Thank you all for your help so far. I really appreciate it.
>
> The crux of my problem is that I am generating a generalized linear
> model with 1 dependent variable, approximately 50 training samples and
> 100 parameters (gene levels).
>
> Essentially, if I have 100 genes and 50 samples, this results in
> coefficients for the first 49 genes, and NAs for the rest, with an
> ultra-low residual deviance (usually approx. 10^-27). This seems to
> have something to do with the number of degrees of freedom (since as
> the number of genes increases up to 49, the number of residual degrees
> of freedom drops to 0).
>
> What kind of methods can I use to make sense of this?
>
> I have a subsequent set of samples to work on to validate the results
> of this glm, so I am not sure if overfitting is really a problem.
>
> Background: this is a microarray study, where I have divided the
> samples in the training set into 2 groups, and generated a number of
> genes to differentiate between both groups. I am going to use the GLM
> in a subsequent regression analysis to determine survival. For this
> purpose, I need to generate some kind of score for each individual
> case using the coefficients of each gene level * gene expression
> level.
>
> I am not a statistician (but a clinician) - many apologies if I am not
> expressing myself very clearly here!
>
> Thanks.
>
> Min-Han Tan