[R] Help : glm p-values for a factor predictor

Benoît PELE benoit.pele at acoss.fr
Thu Jun 29 15:00:18 CEST 2017


Thank you for your answer.

The code I used is the following:

champ_model <- c("y", "categ_juridique",
                 "Indic_CTRLAUTRE_RPOS", "Indic_CTRLAUTRE_RNEG",
                 "Indic_CTRLCCA_RPOS", "Indic_CTRLCCA_RNEG",
                 "Indic_CTRLCPAP_RPOS", "Indic_CTRLCPAP_RNEG",
                 "Indic_CTRLLCTI_RPOS",
                 "Indic_Changement_NomLogiciel", "Indic_Changement_NomEditeur",
                 "Changt_NomEditeurPaie", "Changt_NomLogicielPaie",
                 "Infoabs_NomEditeurPaie", "Infoabs_NomLogicielPaie",
                 "Indic_Decla_comple", "Indic_Decla_AnnuRempl",
                 "class_ape", "class_Logiciel", "class_Editeur",
                 "moda_delai_soldeN_1", "moda_delai_soldeN_2",
                 "moda_delai_soldeN_3", "moda_delai_soldeN_4",
                 "moda_delai_soldeN_5",
                 "moda_anciennete_debitN_1", "moda_anciennete_debitN_2",
                 "moda_anciennete_debitN_3", "moda_anciennete_debitN_4",
                 "moda_anciennete_debitN_5",
                 "moda_moy_anciennete_debit", "moda_std_anciennete_debit",
                 "moda_moy_delai_solde", "moda_std_delai_solde",
                 var_cluster_Arome, var_cluster_BRC, var_cluster_Cedre,
                 var_cluster_cntx2, var_cluster_ctrl,
                 var_cluster_DADS_assiette2, var_cluster_DADS_avantage2,
                 var_cluster_DADS_contrat2, var_cluster_DADS_salarie2,
                 var_cluster_Sequoia)

--> The predictors in quotes (except y) are qualitative; the unquoted 
names are groups of continuous predictors.

# Build the model formula from every name except the response "y"
Var_model <- paste0("y ~ ", paste(champ_model[-1], collapse = " + "))
Logit_appr <- glm(formula = as.formula(Var_model),
                  family = binomial(link = "logit"),
                  data = pop_ctrl_siren_cca2017_appr)

--> The results of this glm do not give an overall p-value for each 
qualitative predictor, only one p-value per level. To select among the 
qualitative predictors, I need the overall p-value that SAS, for example, 
provides with PROC LOGISTIC.
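
One possible way to get a single p-value per predictor in base R is drop1() 
with test = "Chisq", which reports one likelihood-ratio test per model term, 
however many levels a factor has. The sketch below runs on simulated data; 
the names d, f, x and the helper backward_lrt() are hypothetical and are not 
taken from the original code. The helper is only a rough illustration of a 
"homemade" backward elimination, not a replacement for stepAIC(). Note that 
each elimination step refits the model once per remaining term, so it may 
still be slow with many predictors.

```r
# Simulated data (hypothetical): binary response, 4-level factor, numeric x.
set.seed(1)
n <- 500
d <- data.frame(
  f = factor(sample(c("a", "b", "c", "d"), n, replace = TRUE)),
  x = rnorm(n)
)
d$y <- rbinom(n, 1, plogis(-0.5 + 0.8 * (d$f == "c") + 0.5 * d$x))

fit <- glm(y ~ f + x, family = binomial(link = "logit"), data = d)

# summary(fit) shows one Wald p-value per non-reference level of f.
# drop1() removes each term in turn and gives one likelihood-ratio
# test per term, so f gets a single 3-df p-value.
lrt <- drop1(fit, test = "Chisq")
print(lrt)

# Overall p-value per term; the first row ("<none>") carries no test.
p_overall <- setNames(lrt[["Pr(>Chi)"]], rownames(lrt))[-1]
print(p_overall)

# Hypothetical helper: drop the term with the largest overall p-value
# until only `keep` terms remain.
backward_lrt <- function(fit, keep = 1) {
  repeat {
    tab <- drop1(fit, test = "Chisq")
    p <- setNames(tab[["Pr(>Chi)"]], rownames(tab))[-1]
    if (length(p) <= keep) break
    worst <- names(p)[which.max(p)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

reduced_fit <- backward_lrt(fit, keep = 1)
print(formula(reduced_fit))
```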

Benoit Pelé.




From:    "Bob O'Hara" <rni.boh at gmail.com>
To:      Benoît PELE <benoit.pele at acoss.fr>, 
Cc:      r-help <r-help at r-project.org>
Date:    29/06/2017 11:46
Subject: Re: [R] Help : glm p-values for a factor predictor



It might help if you provided the code you used. It's possible that
you didn't use direction="backward" in stepAIC(). Or if you did, it
was still running, so whatever else you try will still be slow. The
statement "R provides only the pvalues for each level" is wrong: look
at the anova() function.
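
As a minimal sketch of what anova() can do here: when given two nested glm 
fits, one with and one without the factor, it returns a single chi-squared 
likelihood-ratio test for that factor. The data and the names d, g and z 
below are simulated and hypothetical. Calling anova(full, test = "Chisq") on 
a single fit gives sequential tests instead, one per term in the order 
entered.

```r
# Simulated data (hypothetical): binary response, 3-level factor g, numeric z.
set.seed(2)
n <- 400
d <- data.frame(
  g = factor(sample(c("low", "mid", "high"), n, replace = TRUE)),
  z = rnorm(n)
)
d$y <- rbinom(n, 1, plogis(0.3 * d$z + 0.7 * (d$g == "high")))

full    <- glm(y ~ g + z, family = binomial, data = d)
reduced <- glm(y ~ z,     family = binomial, data = d)

# One overall test for g: difference in deviance on 2 degrees of freedom.
cmp <- anova(reduced, full, test = "Chisq")
print(cmp)

# The same test computed by hand from the two deviances.
lr_stat <- deviance(reduced) - deviance(full)
df_diff <- df.residual(reduced) - df.residual(full)
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
print(p_value)
```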

Bob

On 29 June 2017 at 11:13, Benoît PELE <benoit.pele at acoss.fr> wrote:
> Hello,
>
> I am a newcomer to R and I am trying to run a backward selection on a
> binomial-logit glm on a large dataset (69000 rows and 145 predictors).
>
> After 3 days of running, the stepAIC function had not terminated. I do not
> know whether that is normal, but I would like to try computing a "homemade"
> backward selection with repeated glm fits: at each step, the predictor with
> the largest pvalue would be excluded, until reaching a set of, for example,
> 20 predictors.
>
> My question is about factor predictors with several levels. R provides
> only the pvalues for each level whereas I need an overall pvalue for
> testing the predictor.
>
> On the internet, the only solution I found suggests computing a chi-squared
> likelihood-ratio test between the complete model and the model without the
> factor predictor to assess its relevance.
>
> Do you know of other ways? Another R package that handles this kind of issue?
>
> Thank you and best regards, Benoit.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Bob O'Hara
NOTE NEW ADDRESS!!!
Institutt for matematiske fag
NTNU
7491 Trondheim
Norway

Mobile: +49 1515 888 5440
Journal of Negative Results - EEB: www.jnr-eeb.org