[RsR] Is it acceptable to develope a model using a fitting procedure and then use a different fitting procedure to predict from it?

Tue Oct 1 04:15:05 CEST 2013

Dear list, I have the following problem.

I want to predict the value of a variable in nature, call it "ecological
success(ES)".

For doing that, I made some virtual simulations with which I reached a
"best model" explaining ecological success. In the simulation, ES is a
categorical variable so I use a multinomial model with the gl.multi
function to iteratively find the best combination of factors that explain
that variable in the simulations.

The model appears in R like:  ES~var1+var2:var3, so I have main terms alone
and also interactions.

I can now fit this best model using the function "multinom" from nnet
package and get the coefficients for each term in the model. Something like:

M=multinom(ES~var1+var2:var3,data)

Now, in order to predict the values in nature I would naturally use the
function predict from the same package and real data to feed the model,
like:

predict.nnet(M,realdata)

However, this gives me categorical values. Would it be there a
statistically valid way to obtain a continuous output? This is important
because gives me more power to discriminate differences in ES among
species. However simulate continuous ES is nearly impossible for my system.

For example, would it be valid to use a fitting function that assumes that
ES is continuous at  some point in the process (i.e. during the obtention
of the best model, or during the obtention of the coefficients?)

There goes some reproducible example:

ES =as.factor( sample( c("0","1","2"), 100, replace=TRUE, prob=c(0.1, 0.2,
0.65) ))
var1=  dnorm(1:100, mean = 30, sd = 20, log = FALSE)
var2=  as.numeric(ES)-var1
var3= (as.numeric(ES)-var1)/var2
simulation=data.frame(cbind(ES,var1,var2,var3))

require(glmulti)
require(nnet)
multi.multi=function(formula, data){
  multinom(paste(deparse(formula)), data = data)# to compare models with
different factors use true ML not REML
}
# find best model for ES in the simulation (may take days or not converge)
M=glmulti(
  ES~var1*var2*var3,
  data=simulation, name = "glmulti.analysis",
  intercept = TRUE, marginality = FALSE,
  level = 2, minsize = 0, maxsize = -1, minK = -1, maxK = -1,
  fitfunction=multi.multi,
  method = "g", crit = "aic", confsetsize = 100,includeobjects=TRUE
)

# determine the coefficients for the best model
M=multinom(ES~var1*var2*var3, data=simulation)
summary(M)

#"real data"
var1=  dnorm(1:3, mean = 30, sd = 20, log = FALSE)
var2= dnorm(1:3, mean = 10, sd = 20, log = FALSE)
var3= dnorm(1:3, mean = 250, sd = 20, log = FALSE)
realdata=data.frame(cbind(var1,var2,var3))

d=predict (M, realdata)# gives a lot of 1s, but  want to discriminate ES
finer.

# Would it be correct to use a permutation fitting in glmulti like:

require(lmPerm)

multi.multi=function(formula, data){
  lmp(paste(deparse(formula)), data = data)#
}

# if not, would it be correct to use a permutation procedure for fitting
the best obtained model?

M=lmp(ES~var1*var2*var3, data=simulation)
d=predict (M, realdata)# this gives a continuous ES output, but with
warning.

Thanks in advance!!

-- 
Agust�n Camacho Guerrero.
Doutor em Zoologia.
Laborat�rio de Herpetologia, Departamento de Zoologia, Instituto de
Bioci�ncias, USP.
Rua do Mat�o, trav. 14, n� 321, Cidade Universit�ria,
S�o Paulo - SP, CEP: 05508-090, Brasil.

	[[alternative HTML version deleted]]