[R] debug biglm response error on bigglm model
Greg Snow
Greg.Snow at imail.org
Mon Jan 10 22:20:51 CET 2011
I am not sure, but one likely candidate is that in one iteration of your simulation a factor ended up with fewer levels than it has in the overall dataset, and that caused the error.
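As a rough illustration of that guess (toy data, base R only, not your model): when a level disappears from the new data, the design matrix built from it has one column fewer than there are coefficients, and the matrix product fails. The names fit.df, new.df and beta are made up for this sketch, and I am assuming that predict for bigglm does something similar internally, since the error mentions x %*% coef(object). Note that the iteration 7 table for bin_varname1 quoted below has no 'GE 800' column, while iteration 6 does.

```r
# Toy fitting data: a factor with three levels.
fit.df <- data.frame(bin = factor(c("missing", "700 - 749", "GE 800", "GE 800")))

# One simulated iteration in which no row falls into 'GE 800',
# so as.factor() creates only two levels.
new.df <- data.frame(bin = as.factor(c("missing", "700 - 749", "700 - 749")))

X.fit <- model.matrix(~ bin, fit.df)    # intercept + 2 dummy columns
X.new <- model.matrix(~ bin, new.df)    # intercept + 1 dummy column

# Stand-in for coef(object): one coefficient per column of the fitted design.
beta <- matrix(0.1, nrow = ncol(X.fit), ncol = 1)

ncol(X.fit)
ncol(X.new)

# The column counts differ, so the product cannot be formed.
res <- try(X.new %*% beta, silent = TRUE)
inherits(res, "try-error")
```

Comparing nlevels() or levels() of each binned factor against the data used to fit the model, inside the loop, should show quickly whether this is what happens in iteration 7.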
There is no recode function in the default packages, and there are at least 6 recode functions in other packages, so we cannot tell from the code below which one you were using.
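One way to sidestep both issues is to do the binning with cut() from base R and to fix the set of levels explicitly, so every iteration produces the same levels even when a bin is empty. This is only a sketch: bin.varname1 is a hypothetical helper, and the breaks and labels are my reading of the bin_varname1 recode in the code below (integer values, 850 and above treated as 'missing').

```r
# Hypothetical helper: bin a numeric vector into a fixed set of levels.
bin.varname1 <- function(x) {
  labs <- c("300 - 499", "500 - 549", "550 - 599", "600 - 649",
            "650 - 699", "700 - 749", "750 - 799", "GE 800")
  # right = FALSE gives intervals [300,500), [500,550), ..., [800,850)
  b <- cut(x, breaks = c(300, 500, 550, 600, 650, 700, 750, 800, 850),
           labels = labs, right = FALSE)
  b <- as.character(b)
  b[is.na(b)] <- "missing"               # out of range or NA
  # Explicit levels: empty bins are kept, and 'missing' is the reference level.
  factor(b, levels = c("missing", labs))
}

x <- c(310, 720, 900, NA)                # no value falls in 'GE 800'
f <- bin.varname1(x)
nlevels(f)
table(f)
```

Because 'missing' is listed first in levels, the separate relevel() call is not needed with this approach.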
--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Mike Harwood
> Sent: Monday, January 10, 2011 6:29 AM
> To: r-help at r-project.org
> Subject: [R] debug biglm response error on bigglm model
>
> G'morning
>
> What does the error message "Error in x %*% coef(object) :
> non-conformable arguments" indicate when calculating the response values
> for newdata with a model from bigglm (in package biglm), and how can I
> debug it? I am attempting to do Monte Carlo simulations, which may
> explain the loop in the code that follows. After the code I have
> included the output, which shows that the simulations are changing the
> response and input values, and that there are not any atypical values
> for the factors in the seventh iteration. At the end of the output is
> the aforementioned error message. Finally, I have included the model
> from biglm.
>
> Thanks in advance!
>
> Code:
> =======
> iter <- nrow(nov.2010)
> predict.nov.2011 <- vector(mode='numeric', length=iter)
> for (i in 1:iter) {
> iter.df <- nov.2010
> ##---------- Update values of dynamic variables ------------------
> iter.df$age <- iter.df$age + 12
> iter.df$pct_utilize <-
> iter.df$pct_utilize + mc.util.delta[i]
>
> iter.df$updated_varname1 <-
> ceiling(iter.df$updated_varname1 + mc.varname1.delta[i])
>
> if(iter.df$state=="WI")
> iter.df$varname3 <- iter.df$varname3 + mc.wi.varname3.delta[i]
> if(iter.df$state=="MN")
> iter.df$varname3 <- iter.df$varname3 + mc.mn.varname3.delta[i]
> if(iter.df$state=="IL")
> iter.df$varname3 <- iter.df$varname3 + mc.il.varname3.delta[i]
> if(iter.df$state=="US")
> iter.df$varname3 <- iter.df$varname3 + mc.us.varname3.delta[i]
>
> ##--- Bin Variables ------------------
> iter.df$bin_varname1 <- as.factor(recode(iter.df$updated_varname1,
> "300:499 = '300 - 499';
> 500:549 = '500 - 549';
> 550:599 = '550 - 599';
> 600:649 = '600 - 649';
> 650:699 = '650 - 699';
> 700:749 = '700 - 749';
> 750:799 = '750 - 799';
> 800:849 = 'GE 800'; else = 'missing';
> "))
> iter.df$bin_age <- as.factor(recode(iter.df$age,
> "0:23 = ' < 24mo.';
> 24:72 = '24 - 72mo.';
> 72:300 = '72 - 300mo'; else = 'missing';
> "))
> iter.df$bin_util <- as.factor(recode(iter.df$pct_utilize,
> "0.0:0.2 = ' 0 - 20%';
> 0.2:0.4 = ' 20 - 40%';
> 0.4:0.6 = ' 40 - 60%';
> 0.6:0.8 = ' 60 - 80%';
> 0.8:1.0 = ' 80 - 100%';
> 1.0:1.2 = '100 - 120%'; else = 'missing';
> "))
> iter.df$bin_varname2 <- as.factor(recode(iter.df$varname2_prop,
> "0:70 = ' < 70%';
> 70:85 = ' 70 - 85%';
> 85:95 = ' 85 - 95%';
> 95:110 = '95 - 110%'; else = 'missing';
> "))
> iter.df$bin_varname1 <- relevel(iter.df$bin_varname1, 'missing')
> iter.df$bin_age <- relevel(iter.df$bin_age, 'missing')
> iter.df$bin_util <- relevel(iter.df$bin_util, 'missing')
> iter.df$bin_varname2 <- relevel(iter.df$bin_varname2, 'missing')
>
> #~ print(head(iter.df))
> if (i>=6 & i<=8){
> print('---------------------------------')
> browser()
> print(i)
> print(table(iter.df$bin_varname1))
> print(table(iter.df$bin_age))
> print(table(iter.df$bin_util))
> print(table(iter.df$bin_varname2))
> #~ debug(predict.nov.2011[i] <-
> #~ sum(predict(logModel.1, newdata=iter.df, type='response')))
> }
>
> predict.nov.2011[i] <-
> sum(predict(logModel.1, newdata=iter.df, type='response'))
>
> print(predict.nov.2011[i])
>
> }
>
> Output
> ==========
> [1] 36.56073
> [1] 561.4516
> [1] 4.83483
> [1] 5.01398
> [1] 7.984146
> [1] "---------------------------------"
> Called from: top level
> Browse[1]>
> [1] 6
>
>   missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799    GE 800
>       842       283       690      1094      1695      3404      6659     18374     21562
>
> missing < 24mo. 24 - 72mo. 72 - 300mo
> 16 2997 19709 31881
>
>    missing    0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100 - 120%
>      17906       4832       4599       5154       7205      14865         42
>
> missing < 70% 70 - 85% 85 - 95% 95 - 110%
> 10423 19429 10568 8350 5833
> [1] 11.04090
> [1] "---------------------------------"
> Called from: top level
> Browse[1]>
> [1] 7
>
>   missing 300 - 499 500 - 549 550 - 599 600 - 649 650 - 699 700 - 749 750 - 799
>       847       909      1059      1586      3214      6304     16349     24335
>
> missing < 24mo. 24 - 72mo. 72 - 300mo
> 16 2997 19709 31881
>
>    missing    0 - 20%   20 - 40%   40 - 60%   60 - 80%  80 - 100% 100 - 120%
>      17145       4972       4617       5020       6634      16139         76
>
> missing < 70% 70 - 85% 85 - 95% 95 - 110%
> 10423 19429 10568 8350 5833
> Error in x %*% coef(object) : non-conformable arguments
>
> Model
> =======
> Large data regression model: bigglm(outcome ~ bin_varname1 + bin_varname2 +
>     bin_age + bin_util + state + varname3 + varname3:state,
>     family = binomial(link = "logit"), data = dev.data, maxit = 75,
>     sandwich = FALSE)
> Sample size = 1372250
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.