[R] R Newbie Question/Data Model

Rubén Roa-Ureta rroa at udec.cl
Fri Apr 25 15:33:48 CEST 2008


guox at ucalgary.ca wrote:
> Given a data set and a set of predictors and a response in the data,
> we would like to find a model that fits the data set best.
> Suppose that we do not know what kind of model (linear, polynomial
> regression, ...) might be good. We are wondering if there is an R
> package that can automatically do this.
> Otherwise, can you direct me to, or point out reference(s) on, the
> basic steps to do this? Thanks.
> 
> -james

The best-fitting model for any data set is one with a lot of parameters, 
so you might think the best-fitting model of all is one with an infinite 
number of parameters. However, any model with more parameters than data 
points has a negative number of degrees of freedom, and you do not want 
that. Subject to the constraint that the number of degrees of freedom 
is non-negative, the best-fitting model for any data set is the data 
itself, with zero degrees of freedom.
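A minimal R sketch of the zero-degrees-of-freedom case, assuming simulated 
noise as the data (the names x, y and fit and the choice of a polynomial 
are illustrative only): a polynomial with as many coefficients as 
observations reproduces the sample exactly, yet says nothing about the 
population.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 6
x <- seq_len(n)
y <- rnorm(n)                    # pure noise: no real structure to find

# A polynomial of degree n - 1 has n coefficients, one per observation
fit <- lm(y ~ poly(x, n - 1))

df.residual(fit)                 # residual degrees of freedom of the fit
max(abs(resid(fit)))             # largest residual, zero up to rounding error
```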
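The AIC tells you this too. With AIC = 2k - 2 log(L), where k is the 
number of parameters, the model formed by the data itself has k = n and, 
if a perfect fit is taken to have likelihood 1, an AIC of 2n, whereas 
the AIC for any model with negative degrees of freedom is > 2n.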
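As a rough illustration of how the AIC weighs fit against the number of 
parameters, the sketch below compares polynomial fits of increasing degree 
on simulated data. The data, the quadratic truth and the names are 
assumptions made for the example. Note that AIC() on an lm fit uses the 
Gaussian likelihood and counts the error variance as a parameter, so this 
is not the saturated case discussed above.

```r
set.seed(2)                      # arbitrary seed
n <- 50
x <- runif(n, -2, 2)
y <- 1 + x - 0.5 * x^2 + rnorm(n, sd = 0.5)   # simulated quadratic truth

degrees <- 1:5
fits <- lapply(degrees, function(d) lm(y ~ poly(x, d)))

comparison <- data.frame(
  degree = degrees,
  rss    = sapply(fits, deviance),   # residual sum of squares, never increases
  aic    = sapply(fits, AIC)         # fit penalised by number of parameters
)
print(comparison)
```

The residual sum of squares can only go down as the degree goes up, which 
is why raw fit is a poor guide; the AIC column need not keep falling.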
But I guess you want to make inference from sample to population. If 
that is indeed the case, then you should consider changing your focus 
from finding "a model that fits the data set best" to a model that best 
summarizes the information contained in your sample about the population 
the sample comes from. To do that, start by defining the nature of your 
response variable. What kind of natural process generates it? Is it 
continuous or discrete? Is it univariate 
or multivariate? Can it take negative and positive values? Can it take 
values of zero? After you have clarified the probabilistic model for the 
response variable, then you can start thinking about the mathematical 
relation between the response variable and the predictors. Is it linear 
or nonlinear? Are the predictors categorical or continuous?
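The questions above can be sketched in R roughly as follows, assuming a 
generalized linear model is adequate and using simulated data. The 
variable names, the three response types and the families chosen are 
illustrative only, not a recipe: the answer about the response picks the 
family, and the answer about the predictors shapes the formula.

```r
set.seed(42)                     # arbitrary seed
n <- 100
x <- runif(n)                                         # continuous predictor
g <- factor(sample(c("a", "b"), n, replace = TRUE))   # categorical predictor

# Three simulated responses of different nature
y_cont  <- 1 + 2 * x + rnorm(n)                 # continuous, any sign
y_count <- rpois(n, lambda = exp(0.5 + x))      # discrete, non-negative counts
y_bin   <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x))   # 0/1 outcome

# The nature of the response determines the probability model (family)
fit_cont  <- glm(y_cont  ~ x + g, family = gaussian())
fit_count <- glm(y_count ~ x + g, family = poisson())
fit_bin   <- glm(y_bin   ~ x + g, family = binomial())

summary(fit_count)
```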
Read the posting guide, formulate a clear question, and maybe you will 
be given more specific help.
Rubén


