[R] debug biglm response error on bigglm model

Mike Harwood harwood262 at gmail.com
Wed Jan 12 15:41:02 CET 2011


Thank you, Greg.  The issue was in the simulation logic, where one of
the values was not changing correctly for some iterations...
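
For anyone hitting the same message: the mechanism Greg describes can be reproduced in base R. The sketch below is an illustration only. It uses lm() and model.matrix() on made-up data rather than bigglm, and the names train, new_bad, new_ok and all_levels are hypothetical. It shows that a factor rebuilt with as.factor() carries only the levels present in that data, so the design matrix can end up with fewer columns than the fitted coefficient vector, and the matrix product then fails.

```r
# Made-up training data with a three-level factor (illustrative only).
set.seed(1)
train <- data.frame(
  y   = rnorm(9),
  bin = factor(rep(c("low", "mid", "high"), 3))
)
fit <- lm(y ~ bin, data = train)   # intercept + 2 dummy columns = 3 coefficients

# New data binned from scratch: "high" never occurs, so as.factor()
# produces a factor with only two levels.
new_bad <- data.frame(bin = as.factor(c("low", "mid", "low")))
x_bad   <- model.matrix(~ bin, data = new_bad)   # only 2 columns

# Multiplying a 3 x 2 matrix by 3 coefficients is not conformable.
res <- try(x_bad %*% coef(fit), silent = TRUE)
cat(res)

# Possible fix: declare the full level set explicitly when building the factor.
all_levels <- levels(train$bin)
new_ok <- data.frame(bin = factor(c("low", "mid", "low"), levels = all_levels))
x_ok   <- model.matrix(~ bin, data = new_ok)     # 3 columns again
pred   <- drop(x_ok %*% coef(fit))
```

In the quoted output, the bin_varname1 table for iteration 7 has no 'GE 800' column while iteration 6 does, which fits this pattern. Declaring the levels with factor(x, levels = ...) after recoding keeps the columns aligned with the fitted model even when a bin is empty in one iteration.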

On Jan 10, 3:20 pm, Greg Snow <Greg.S... at imail.org> wrote:
> Not sure, but one candidate is that in your simulations one iteration ended up with fewer levels of a factor than the overall dataset, and that caused the error.
>
> There is no recode function in the default packages, and there are at least 6 recode functions in other packages, so we cannot tell which one you were using from the code below.
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.s... at imail.org
> 801.408.8111
>
> > -----Original Message-----
> > From: r-help-boun... at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Mike Harwood
> > Sent: Monday, January 10, 2011 6:29 AM
> > To: r-h... at r-project.org
> > Subject: [R] debug biglm response error on bigglm model
>
> > G'morning
>
> > What does the error message "Error in x %*% coef(object) :
> > non-conformable arguments" indicate when calculating the response
> > values for newdata with a model from bigglm (in package biglm), and
> > how can I debug it?  I am attempting to do Monte Carlo simulations,
> > which may explain the loop in the code that follows.  After the code
> > I have included the output, which shows that the simulations are
> > changing the response and input values, and that there are not any
> > atypical values for the factors in the seventh iteration.  At the end
> > of the output is the aforementioned error message.  Finally, I have
> > included the model from biglm.
>
> > Thanks in advance!
>
> > Code:
> > =======
> > iter <- nrow(nov.2010)
> > predict.nov.2011 <- vector(mode='numeric', length=iter)
> > for (i in 1:iter) {
> >     iter.df <- nov.2010
> >     ##---------- Update values of dynamic variables ------------------
> >     iter.df$age <- iter.df$age + 12
> >     iter.df$pct_utilize <-
> >         iter.df$pct_utilize + mc.util.delta[i]
>
> >     iter.df$updated_varname1 <-
> >         ceiling(iter.df$updated_varname1 + mc.varname1.delta[i])
>
> >     if(iter.df$state=="WI")
> >         iter.df$varname3 <- iter.df$varname3 + mc.wi.varname3.delta[i]
> >     if(iter.df$state=="MN")
> >         iter.df$varname3 <- iter.df$varname3 + mc.mn.varname3.delta[i]
> >     if(iter.df$state=="IL")
> >         iter.df$varname3 <- iter.df$varname3 + mc.il.varname3.delta[i]
> >     if(iter.df$state=="US")
> >         iter.df$varname3 <- iter.df$varname3 + mc.us.varname3.delta[i]
>
> >     ##--- Bin Variables ------------------
> >     iter.df$bin_varname1 <- as.factor(recode(iter.df$updated_varname1,
> >         "300:499 = '300 - 499';
> >          500:549 = '500 - 549';
> >          550:599 = '550 - 599';
> >          600:649 = '600 - 649';
> >          650:699 = '650 - 699';
> >          700:749 = '700 - 749';
> >          750:799 = '750 - 799';
> >          800:849 = 'GE 800';
> >          else    = 'missing';
> >          "))
> >     iter.df$bin_age <- as.factor(recode(iter.df$age,
> >         "0:23   = ' < 24mo.';
> >          24:72  = '24 - 72mo.';
> >          72:300 = '72 - 300mo'; else   = 'missing';
> >          "))
> >     iter.df$bin_util <- as.factor(recode(iter.df$pct_utilize,
> >         "0.0:0.2 = '  0 - 20%';
> >          0.2:0.4 = '  20 - 40%';
> >          0.4:0.6 = '  40 - 60%';
> >          0.6:0.8 = '  60 - 80%';
> >          0.8:1.0 = ' 80 - 100%';
> >          1.0:1.2 = '100 - 120%'; else    = 'missing';
> >          "))
> >     iter.df$bin_varname2 <- as.factor(recode(iter.df$varname2_prop,
> >         "0:70 = '    < 70%';
> >          70:85 = ' 70 - 85%';
> >          85:95 = ' 85 - 95%';
> >          95:110 = '95 - 110%'; else  =  'missing';
> >          "))
> >     iter.df$bin_varname1 <- relevel(iter.df$bin_varname1, 'missing')
> >     iter.df$bin_age <- relevel(iter.df$bin_age, 'missing')
> >     iter.df$bin_util <- relevel(iter.df$bin_util, 'missing')
> >     iter.df$bin_varname2 <- relevel(iter.df$bin_varname2, 'missing')
>
> > #~     print(head(iter.df))
> >     if (i>=6 & i<=8){
> >          print('---------------------------------')
> >          browser()
> >          print(i)
> >          print(table(iter.df$bin_varname1))
> >          print(table(iter.df$bin_age))
> >          print(table(iter.df$bin_util))
> >          print(table(iter.df$bin_varname2))
> > #~         debug(predict.nov.2011[i] <-
> > #~              sum(predict(logModel.1, newdata=iter.df,
> > #~                  type='response')))
> >      }
>
> >     predict.nov.2011[i] <-
> >          sum(predict(logModel.1, newdata=iter.df, type='response'))
>
> >     print(predict.nov.2011[i])
>
> >   }
>
> > Output
> > ==========
> > [1] 36.56073
> > [1] 561.4516
> > [1] 4.83483
> > [1] 5.01398
> > [1] 7.984146
> > [1] "---------------------------------"
> > Called from: top level
> > Browse[1]>
> > [1] 6
>
> >   missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799    GE 800
> >       842       283       690      1094      1695      3404      6659     18374     21562
>
> >    missing    < 24mo. 24 - 72mo. 72 - 300mo
> >         16       2997      19709      31881
>
> >    missing    0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100 - 120%
> >      17906       4832       4599       5154       7205      14865         42
>
> >   missing     < 70%  70 - 85%  85 - 95% 95 - 110%
> >     10423     19429     10568      8350      5833
> > [1] 11.04090
> > [1] "---------------------------------"
> > Called from: top level
> > Browse[1]>
> > [1] 7
>
> >   missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799
> >       847       909      1059      1586      3214      6304     16349     24335
>
> >    missing    < 24mo. 24 - 72mo. 72 - 300mo
> >         16       2997      19709      31881
>
> >    missing    0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100 - 120%
> >      17145       4972       4617       5020       6634      16139         76
>
> >   missing     < 70%  70 - 85%  85 - 95% 95 - 110%
> >     10423     19429     10568      8350      5833
> > Error in x %*% coef(object) : non-conformable arguments
>
> > Model
> > =======
> > Large data regression model: bigglm(outcome ~ bin_varname1 + bin_varname2 +
> >     bin_age + bin_util + state + varname3 + varname3:state,
> >     family = binomial(link = "logit"),
> >     data = dev.data, maxit = 75, sandwich = FALSE)
> > Sample size =  1372250
>
> > ______________________________________________
> > R-h... at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
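
A side note on the quoted loop: `if (iter.df$state == "WI")` tests a whole column, but `if` expects a single TRUE or FALSE. R versions of that time used only the first element and gave a warning; R 4.2.0 and later stop with an error. Whether this was the fault found in the simulation logic is not stated in the thread. A minimal sketch of a vectorized per-row update follows, assuming made-up data and a hypothetical named vector `delta` that holds one iteration's per-state adjustment (standing in for mc.wi.varname3.delta[i] and its siblings).

```r
# Made-up rows; only the column names follow the quoted code.
iter.df <- data.frame(
  state    = c("WI", "MN", "IL", "US", "WI"),
  varname3 = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# Hypothetical per-state deltas for a single iteration.
delta <- c(WI = 0.1, MN = 0.2, IL = 0.3, US = 0.4)

# Look up each row's delta by state name. as.character() guards against
# state being a factor, which would otherwise index by integer code.
iter.df$varname3 <- iter.df$varname3 +
  unname(delta[as.character(iter.df$state)])

print(iter.df)
```

Each row receives the adjustment for its own state, instead of every row being driven by the state of the first row.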


