# [R] R Newbie Question/Data Model

Rubén Roa-Ureta rroa at udec.cl
Fri Apr 25 15:33:48 CEST 2008

```
guox at ucalgary.ca wrote:
> Given a data set and a set of predictors and a response in the data,
> we would like to find a model that fits the data set best.
> Suppose that we do not know what kind of model (linear, polynomial
> regression, ...) might be good. We are wondering if there are R package(s)
> that can automatically do this.
> Otherwise, can you direct me to, or point out reference(s) on, the
> basic steps to do this? Thanks.
>
> -james

The best-fitting model for any data is a model with a lot of parameters,
so maybe the best fitting model for any data is a model with an infinite
number of parameters. However, any model with more parameters than data
will have a negative number of degrees of freedom, and you do not want
that. The best-fitting model for any data, subject to the constraint that
the number of degrees of freedom is non-negative, is the data itself,
with zero degrees of freedom.
The AIC tells you this too. The AIC for the model formed by the data
itself is 2n, whereas the AIC for any model with negative degrees of
freedom is > 2n.
But I guess you want to make inference from sample to population. If
that is indeed the case, then you should consider changing your focus
from finding "a model that fits the data set best" to finding a model that
best describes the population the sample comes from. To do that, start by
defining the nature of your
response variable. What is the nature of the natural process generating
this response variable? Is it continuous or discrete? Is it univariate
or multivariate? Can it take negative and positive values? Can it take
values of zero? After you have clarified the probabilistic model for the
response variable, then you can start thinking about the mathematical
relation between the response variable and the predictors. Is it linear
or nonlinear? Are the predictors categorical or continuous?
Read the posting guide, formulate a clear question, and maybe you will
be given more specific help.
Rubén

```
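The two points in the reply above can be sketched in R. This is only a rough illustration under stated assumptions, not a recipe: the data are simulated, and all variable names (`x`, `y`, `counts`, `z`, `fits`, and so on) are made up for the example. The first part shows that the residual sum of squares keeps shrinking as polynomial terms are added, so "best fit" alone always favours the biggest model, while `AIC()` adds a penalty for each extra parameter. The second part shows one way the nature of the response could guide the choice of probability model, via the `family` argument of `glm()`.

```r
# Simulated data. ASSUMPTION: the true relation is linear with Gaussian
# noise; this is chosen only to make the illustration concrete.
set.seed(1)
n <- 20
x <- seq(0, 1, length.out = n)
y <- 1 + 2 * x + rnorm(n, sd = 0.3)

# Part 1: fit nested polynomials of increasing degree.
# The residual sum of squares (RSS) cannot increase as terms are added,
# so goodness of fit alone never tells you when to stop.
degrees <- 1:8
fits <- lapply(degrees, function(d) lm(y ~ poly(x, d)))
rss  <- sapply(fits, function(f) sum(residuals(f)^2))
aic  <- sapply(fits, AIC)   # AIC penalises each additional parameter

print(data.frame(degree = degrees, rss = rss, aic = aic))

# Part 2: let the nature of the response suggest the probability model.
# ASSUMPTION: these simulated responses stand in for real data.

# Non-negative integer counts: a Poisson GLM is one candidate.
counts   <- rpois(n, lambda = exp(0.5 + 1.5 * x))
fit_pois <- glm(counts ~ x, family = poisson(link = "log"))

# Binary (0/1) outcomes: a binomial GLM is one candidate.
z         <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x))
fit_binom <- glm(z ~ x, family = binomial(link = "logit"))

# Continuous response that can be negative or positive: a Gaussian model.
fit_gauss <- glm(y ~ x, family = gaussian())

print(coef(fit_pois))
print(coef(fit_binom))
print(coef(fit_gauss))
```

Comparing the `rss` and `aic` columns side by side is the point of the first part: the former always rewards more parameters, the latter does not. The families in the second part are only examples; which one is appropriate depends on the process that generated the response.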