[R] Prediction from a rank deficient fit may be misleading
Michael Artz
michaeleartz at gmail.com
Thu Mar 10 23:21:31 CET 2016
Here are the results of the logistic regression model. Is it because of the
NA values?

Call:
glm(formula = TARGET_A ~ Contract + Dependents + DeviceProtection +
    gender + InternetService + MonthlyCharges + MultipleLines +
    OnlineBackup + OnlineSecurity + PaperlessBilling + Partner +
    PaymentMethod + PhoneService + SeniorCitizen + StreamingMovies +
    StreamingTV + TechSupport + tenure + TotalCharges,
    family = binomial(link = "logit"), data = churn_training)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.8943  -0.6867  -0.2863   0.7378   3.4259

Coefficients: (7 not defined because of singularities)
                                       Estimate Std. Error z value  Pr(>|z|)
(Intercept)                           1.0664928  1.7195494   0.620    0.5351
ContractOne year                     -0.6874005  0.1314227  -5.230  1.69e-07 ***
ContractTwo year                     -1.2775385  0.2101193  -6.080  1.20e-09 ***
DependentsYes                        -0.1485301  0.1095348  -1.356    0.1751
DeviceProtectionNo internet service  -1.5547306  0.9661837  -1.609    0.1076
DeviceProtectionYes                   0.0459115  0.2114253   0.217    0.8281
genderMale                           -0.0350970  0.0776896  -0.452    0.6514
InternetServiceFiber optic            1.4800374  0.9545398   1.551    0.1210
InternetServiceNo                            NA         NA      NA        NA
MonthlyCharges                       -0.0324614  0.0379646  -0.855    0.3925
MultipleLinesNo phone service         0.0808745  0.7736359   0.105    0.9167
MultipleLinesYes                      0.3990450  0.2131343   1.872    0.0612 .
OnlineBackupNo internet service              NA         NA      NA        NA
OnlineBackupYes                      -0.0328892  0.2081145  -0.158    0.8744
OnlineSecurityNo internet service            NA         NA      NA        NA
OnlineSecurityYes                    -0.2760602  0.2132917  -1.294    0.1956
PaperlessBillingYes                   0.3509944  0.0890884   3.940  8.15e-05 ***
PartnerYes                            0.0306815  0.0940650   0.326    0.7443
PaymentMethodCredit card (automatic) -0.0710923  0.1377252  -0.516    0.6057
PaymentMethodElectronic check         0.3074078  0.1137939   2.701    0.0069 **
PaymentMethodMailed check            -0.0201076  0.1377539  -0.146    0.8839
PhoneServiceYes                              NA         NA      NA        NA
SeniorCitizen                         0.1856454  0.1023527   1.814    0.0697 .
StreamingMoviesNo internet service           NA         NA      NA        NA
StreamingMoviesYes                    0.5260087  0.3899615   1.349    0.1774
StreamingTVNo internet service               NA         NA      NA        NA
StreamingTVYes                        0.4781321  0.3905777   1.224    0.2209
TechSupportNo internet service               NA         NA      NA        NA
TechSupportYes                       -0.2511197  0.2181612  -1.151    0.2497
tenure                               -0.0702813  0.0077113  -9.114   < 2e-16 ***
TotalCharges                          0.0004276  0.0000874   4.892  9.97e-07 ***
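
A minimal sketch, on made-up data, of how NA coefficients like the ones in the table above can arise. It assumes (a guess about churn_training, not something checked against it) that the "No internet service" level of an add-on factor occurs exactly when InternetService is "No", so two dummy columns in the design matrix are identical. The data frame toy and its columns y, internet and backup are hypothetical.

```r
# Toy data (hypothetical): two factors that share a "no internet" category,
# mimicking the structure of the churn predictors.
set.seed(1)
n <- 200
internet <- factor(sample(c("DSL", "Fiber optic", "No"), n, replace = TRUE))

# The add-on can only be "Yes"/"No" when there is internet service at all.
backup <- factor(ifelse(internet == "No",
                        "No internet service",
                        sample(c("No", "Yes"), n, replace = TRUE)))

y <- rbinom(n, 1, 0.3)   # random 0/1 outcome, unrelated to the predictors
toy <- data.frame(y, internet, backup)

fit <- glm(y ~ internet + backup, family = binomial(link = "logit"), data = toy)

# Coefficients that could not be estimated are reported as NA.
coef(fit)[is.na(coef(fit))]

# alias() reports which combination of the other columns each NA term duplicates.
alias(fit)$Complete

# The design matrix has more columns than linearly independent columns.
X <- model.matrix(fit)
c(columns = ncol(X), rank = qr(X)$rank)
```

If the same pattern holds in churn_training, alias(logitregressmodel) should list the seven undefined terms together with the columns they duplicate.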

On Thu, Mar 10, 2016 at 4:05 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> > On Mar 10, 2016, at 8:08 AM, Michael Artz <michaeleartz at gmail.com> wrote:
> >
> > Hi all,
> > I have the following error -
> >> resultVector <- predict(logitregressmodel, dataset1, type='response')
> > Warning message:
> > In predict.lm(object, newdata, se.fit, scale = 1, type = ifelse(type == :
> >   prediction from a rank-deficient fit may be misleading
>
> It wasn't an R error. It was an R warning. Was the `summary` output on
> logitregressmodel informative? Does the resultVector look sensible given
> its inputs?
>
>
> > I have seen on the internet that there may be some collinearity in the
> > data and that this is causing the warning. How can I be sure?
>
> Do some diagnostics. Look carefully at the output of
> summary(logitregressmodel), and perhaps summary(dataset1) if it was the
> original input to the modeling functions. Then you could move on to
> looking at cross-correlations on things you think are continuous,
> crosstabs on factor variables, and the condition number on the full data
> matrix.
>
> Lots of stuff turns up on a search for "detecting collinearity condition
> number in r"
>
> >
> > Thanks
> >
>
> David Winsemius
> Alameda, CA, USA
>
>
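
The diagnostics suggested in the quoted reply (correlations among continuous predictors, crosstabs on factors, and the condition number of the design matrix) could look roughly like this. The data are simulated. The column names, and the way total is built from tenure and monthly, are assumptions made purely for illustration and are not taken from churn_training.

```r
# Simulated stand-in for the real data (all names hypothetical).
set.seed(2)
n <- 300
tenure  <- sample(1:72, n, replace = TRUE)
monthly <- round(runif(n, 20, 110), 2)
total   <- tenure * monthly + rnorm(n, sd = 50)  # assumed: roughly tenure * monthly
internet  <- factor(sample(c("DSL", "Fiber optic", "No"), n, replace = TRUE))
streaming <- factor(ifelse(internet == "No",
                           "No internet service",
                           sample(c("No", "Yes"), n, replace = TRUE)))
toy <- data.frame(tenure, monthly, total, internet, streaming)

# 1. Pairwise correlations among the continuous predictors.
round(cor(toy[, c("tenure", "monthly", "total")]), 2)

# 2. Crosstab of two factors. A level that only ever occurs together with
#    one level of the other factor carries no extra information.
xt <- table(toy$internet, toy$streaming)
xt

# 3. Condition number of the design matrix: ratio of the largest to the
#    smallest singular value, after dropping the intercept and scaling the
#    columns. Base R's kappa() is an alternative way to get this quantity.
X  <- model.matrix(~ tenure + monthly + total + internet + streaming, data = toy)
sv <- svd(scale(X[, -1]))$d
max(sv) / min(sv)

# A rank below the number of columns means exact linear dependence.
c(columns = ncol(X), rank = qr(X)$rank)
```

In this toy setup the crosstab has zeros everywhere in the "No internet service" column except the row for internet == "No", which is the pattern to look for in the real factors.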