[R] Error with text analysis data

Bill Dunlap w||||@mwdun|@p @end|ng |rom gm@||@com
Wed Apr 13 18:23:19 CEST 2022


Constant columns can end up in the model when you subset the data or are
exploring a new dataset.  My objection is that constant numeric and logical
columns are accepted, but constant character and factor columns are not.

-Bill
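
A minimal sketch of one possible workaround, assuming it is acceptable to drop the offending columns before fitting. The helper names is_constant_categorical() and drop_constant_categorical() are hypothetical and are not part of base R or of any package mentioned in this thread. The toy data frame mirrors the one in the quoted message below.

```r
# Toy data: every row has the same sex, as can happen after subsetting.
d <- data.frame(y = 1:5, sex = rep("Female", 5))
d$sexFactor <- factor(d$sex, levels = c("Male", "Female"))
d$sexCode <- as.integer(d$sexFactor)

# Hypothetical helper: TRUE for a character or factor column with fewer than
# two distinct non-missing values (the case that triggers the contrasts error).
is_constant_categorical <- function(x) {
  (is.character(x) || is.factor(x)) && length(unique(x[!is.na(x)])) < 2
}

# Hypothetical helper: drop such columns and keep everything else.
drop_constant_categorical <- function(data) {
  constant <- vapply(data, is_constant_categorical, logical(1))
  data[, !constant, drop = FALSE]
}

# Names of the columns that would be dropped, then the cleaned data and fit.
dropped <- names(d)[vapply(d, is_constant_categorical, logical(1))]
d_clean <- drop_constant_categorical(d)
fit <- lm(y ~ ., data = d_clean)
```

The constant numeric column sexCode is kept on purpose: lm() reports NA for its coefficient instead of stopping, which matches the behaviour shown in the quoted transcript.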

On Wed, Apr 13, 2022 at 9:15 AM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:

> What is the goal of having a constant in the model? To me that seems
> pointless. Also there is no variability in sexCode regardless of whether
> you call it integer or factor. So the model y ~ sexCode is just a strange
> way to look at the variability in y and it would be better to do something
> like summarize(y) or mean(y) if that was the goal.
>
> Tim
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Bill Dunlap
> Sent: Wednesday, April 13, 2022 9:56 AM
> To: Neha gupta <neha.bologna90 using gmail.com>
> Cc: r-help mailing list <r-help using r-project.org>
> Subject: Re: [R] Error with text analysis data
>
> This sounds like what I think is a bug in stats::model.matrix.default(): a
> numeric column with all identical entries is fine but a constant character
> or factor column is not.
>
> > d <- data.frame(y=1:5, sex=rep("Female",5))
> > d$sexFactor <- factor(d$sex, levels=c("Male","Female"))
> > d$sexCode <- as.integer(d$sexFactor)
> > d
>   y    sex sexFactor sexCode
> 1 1 Female    Female       2
> 2 2 Female    Female       2
> 3 3 Female    Female       2
> 4 4 Female    Female       2
> 5 5 Female    Female       2
> > lm(y~sex, data=d)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>   contrasts can be applied only to factors with 2 or more levels
> > lm(y~sexFactor, data=d)
> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>   contrasts can be applied only to factors with 2 or more levels
> > lm(y~sexCode, data=d)
>
> Call:
> lm(formula = y ~ sexCode, data = d)
>
> Coefficients:
> (Intercept)      sexCode
>           3           NA
>
> Calling traceback() after the error would clarify this.
>
> -Bill
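
A short sketch, under the assumption that the asymmetry can be reproduced by calling model.matrix() directly rather than through lm(). The wrapper try_model_matrix() is a hypothetical name introduced only for this illustration; it returns the design matrix on success and the error message as a string on failure.

```r
# Toy data with one constant character column and one constant integer column.
d <- data.frame(y = 1:5, sex = rep("Female", 5))
d$sexCode <- 2L

# Hypothetical wrapper: return the design matrix, or the error message
# if model.matrix() stops.
try_model_matrix <- function(formula, data) {
  tryCatch(
    model.matrix(formula, data = data),
    error = function(e) conditionMessage(e)
  )
}

mm_numeric <- try_model_matrix(~ sexCode, d)  # constant numeric column
mm_char    <- try_model_matrix(~ sex, d)      # constant character column

is.matrix(mm_numeric)   # design matrix was built
is.character(mm_char)   # an error message was captured instead
```

Capturing the condition this way makes it possible to check many columns in a loop without the first failure aborting the whole run.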
>
>
> On Tue, Apr 12, 2022 at 3:12 PM Neha gupta <neha.bologna90 using gmail.com>
> wrote:
>
> > Hello everyone, I have text data whose output variable has three
> > subgroups. I am using the following code but get an error message
> > (shown after the code).
> >
> > d=read.csv("SONAR_RULES.csv", stringsAsFactors = FALSE)
> > d$REMEDIATION_FUNCTION=NULL
> > d$DEF_REMEDIATION_GAP_MULT=NULL
> > d$REMEDIATION_BASE_EFFORT=NULL
> >
> > index <- createDataPartition(d$TYPE, p = .70,list = FALSE)
> > tr <- d[index, ]
> > ts <- d[-index, ]
> >
> > ctrl <- trainControl(method = "cv",number=3, index = index,
> >                      classProbs = TRUE,
> >                      summaryFunction = multiClassSummary)
> >
> > ran <- train(TYPE ~ ., data = tr,
> >                     method = "rpart",
> >                     ## Will create 48 parameter combinations
> >                     tuneLength = 3,
> >                     na.action= na.pass,
> >                     metric = "Accuracy",
> >                     preProc = c("center", "scale", "nzv"),
> >                     trControl = ctrl)
> > getTrainPerf(ran)
> >
> > It gives me this error:
> >
> > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
> >   contrasts can be applied only to factors with 2 or more levels
> >
> > My data is as follows:
> >
> > Rows: 1,819
> > Columns: 14
> > $ PLUGIN_RULE_KEY             <chr> "InsufficientBranchCoverage", "InsufficientLin~
> > $ PLUGIN_CONFIG_KEY           <chr> "", "", "", "", "", "", "", "", "", "", "S1120~
> > $ PLUGIN_NAME                 <chr> "common-java", "common-java", "common-java", "~
> > $ DESCRIPTION                 <chr> "An issue is created on a file as soon as the ~
> > $ SEVERITY                    <chr> "MAJOR", "MAJOR", "MAJOR", "MAJOR", "MAJOR", "~
> > $ NAME                        <chr> "Branches should have sufficient coverage by t~
> > $ DEF_REMEDIATION_FUNCTION    <chr> "LINEAR", "LINEAR", "LINEAR", "LINEAR_OFFSET",~
> > $ REMEDIATION_GAP_MULT        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
> > $ DEF_REMEDIATION_BASE_EFFORT <chr> "", "", "", "10min", "", "", "5min", "5min", "~
> > $ GAP_DESCRIPTION             <chr> "number of uncovered conditions", "number of l~
> > $ SYSTEM_TAGS                 <chr> "bad-practice", "bad-practice", "convention", ~
> > $ IS_TEMPLATE                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
> > $ DESCRIPTION_FORMAT          <chr> "HTML", "HTML", "HTML", "HTML", "HTML", "HTML"~
> > $ TYPE                        <chr> "CODE_SMELL", "CODE_SMELL", "CODE_SMELL", "COD~
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
