[R] error in "predict.gam" used with "bam"

julian.bothe at elitepartner.de julian.bothe at elitepartner.de
Wed Jul 17 15:11:51 CEST 2013


Solved it!! ;)

The problem was that the test data contained factor levels that the
training data did not.
So when predicting for these new factor levels with a model that had never
seen them, the error occurred.

When the rows with factor levels not used in fitting are excluded, or when
all levels are represented in the data used for model fitting, everything
is fine.
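
For anyone hitting the same error, here is a minimal base-R sketch of the
check described above. It is not the code from my analysis: the data frames
train and test and the column grp are made-up toy names, and it assumes the
factors are ordinary columns of the data frames.

```r
# Hypothetical toy data; train, test and grp are illustrative names only.
train <- data.frame(grp = factor(c("a", "b", "a", "b")), y = c(0, 1, 0, 1))
test  <- data.frame(grp = factor(c("a", "b", "c")))

# Factor columns of the training data
factor_cols <- names(train)[vapply(train, is.factor, logical(1))]

# For each factor, list the values in test that never occur in train
unseen <- lapply(setNames(factor_cols, factor_cols), function(v) {
  setdiff(unique(as.character(test[[v]])), unique(as.character(train[[v]])))
})

# Keep only the test rows whose factor values were all seen during fitting
ok <- rep(TRUE, nrow(test))
for (v in factor_cols) {
  ok <- ok & as.character(test[[v]]) %in% as.character(train[[v]])
}
test_ok <- droplevels(test[ok, , drop = FALSE])
```

Passing something like test_ok as newdata should then avoid the unseen
levels. The alternative is to draw the training subset so that every level
occurs in it at least once.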

All the best

Julian

-----Original Message-----
From: Simon Wood [mailto:s.wood at bath.ac.uk]
Sent: Tuesday, 9 July 2013 09:07
To: julian.bothe at elitepartner.de
Cc: r-help at r-project.org
Subject: Re: [R] error in "predict.gam" used with "bam"

Hi Julian,

Any chance you could send me (offline) a short version of your data, which
reproduces the problem? I can't reproduce it in a quick attempt (but it is
quite puzzling, given that bam calls predict.gam internally in pretty much
the same way that you are doing here).

btw (and nothing to do with the error) given that you are using R 3.0.1
it's a good idea to upgrade to mgcv_1.7-23 or above, for the following
reason (taken from the mgcv changeLog)

1.7-23
------

*** Fix of severe bug introduced with R 2.15.2 LAPACK change. The shipped
version of dsyevr can fail to produce orthogonal eigenvectors when
uplo='U' (upper triangle of symmetric matrix used), as opposed to 'L'.
This led to a substantial number of gam smoothing parameter estimation
convergence failures, as the key stabilizing re-parameterization was
substantially degraded. The issue did not affect gaussian additive models
with GCV model selection. Other models could fail to converge any further
as soon as any smoothing parameter became `large', as happens when a
smooth is estimated as a straight line.
check.gam reported the lack of full convergence, but the issue could also
generate complete fit failures. Picked up late as full test suite had only
been run on R > 2.15.1 with an external LAPACK.
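
A quick way to check whether an installed mgcv predates that fix is to
compare version numbers. The following is only a sketch: the helper name
needs_upgrade is made up for illustration, while packageVersion() and
package_version() are base R functions.

```r
# Hypothetical helper: TRUE if the given version is older than the minimum.
needs_upgrade <- function(installed, minimum = "1.7-23") {
  package_version(as.character(installed)) < package_version(minimum)
}

# Only query the installed version if mgcv is actually available
if (requireNamespace("mgcv", quietly = TRUE)) {
  if (needs_upgrade(packageVersion("mgcv"))) {
    message("mgcv is older than 1.7-23; consider update.packages()")
  }
}
```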

best,
Simon


On 08/07/13 10:02, julian.bothe at elitepartner.de wrote:
> Hello everyone.
>
>
>
> I am fitting a logistic gam (package mgcv) on a pretty large data frame
> (130,000 cases with 100 variables).
>
> Because of that, the gam is fitted on a random subset of 10,000 cases.
> Now when I want to predict the values for the rest of the data, I get
> the following error:
>
>
>
>
>
> > gam.basis_alleakti.1.pr=predict(gam.basis_alleakti.1,
> + newdata=activisale_join[gam.basis_alleakti.1.complete_cases,all.vars(gam.basis_alleakti.1.formula)],type="response")
> Error in predict.gam(gam.basis_alleakti.1, newdata = activisale_join[gam.basis_alleakti.1.complete_cases,  :
>    number of items to replace is not a multiple of replacement length
>
>
>
>
>
> The following is the code:
>
> # formula with some factors and a lot of variables to be fitted
> gam.basis_alleakti.1.formula = as.formula(paste("verlängerung ~",
>     paste(names(activisale_join)[c(2:10)], collapse = "+"),  # factors
>     "+",
>     paste("s(", names(activisale_join)[c(17, 19:29, 31:42, 44)], ")",
>           collapse = "+")))  # numeric variables, all count data
>
>
>
> # complete cases
> gam.basis_alleakti.1.complete_cases =
>     complete.cases(activisale_join[, all.vars(gam.basis_alleakti.1.formula)])
>
>
>
> # model fitting works on a random subset
> gam.basis_alleakti.1 = bam(gam.basis_alleakti.1.formula,
>                           data = activisale_join[subset.10000, ],
>                           family = "binomial")
>
>
>
> # error, no idea why
> gam.basis_alleakti.1.pr = predict(gam.basis_alleakti.1,
>     newdata = activisale_join[gam.basis_alleakti.1.complete_cases, ],
>     type = "response")
>
>
>
>
>
> The prediction on the same subset (subset.10000) works.
>
>
>
>
>
> It could be that this error is somewhat similar to the one described as
> a side question in
> http://r.789695.n4.nabble.com/gamm-tensor-product-and-interaction-td4526188.html,
> where Simon answered the following:
>
>
>
> “>  Here is the error message I obtain:
> > > vis.gam(gm1$gam,plot.type="contour",n.grid=200,color="heat",zlim=c(0,4))
> >   Error in predict.gam(x, newdata = newd, se.fit = TRUE, type = type) :
> >   number of items to replace is not a multiple of replacement length
> - hmm, possibly a bug. I'll look into it.
>
> best,
> Simon“
>
>
>
> All the best
>
>
>
> Julian
>
>
>
> Ps.: > version
>                 _
> platform       x86_64-w64-mingw32
> arch           x86_64
> os             mingw32
> system         x86_64, mingw32
> status
> major          3
> minor          0.1
> year           2013
> month          05
> day            16
> svn rev        62743
> language       R
> version.string R version 3.0.1 (2013-05-16)
> nickname       Good Sport
>
>
>
> package mgcv version 1.7-22
>
>
>
>
>
>
>


--
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283


