[R] Error with text analysis data

Jim Lemon drjimlemon using gmail.com
Wed Apr 13 11:24:45 CEST 2022


Hi Neha,
The suggestion I made was to try stringsAsFactors=TRUE, although I
will be surprised if it solves your problem.
CSV stands for "Comma-Separated Values". The following examples are in
valid CSV format:

Date,Temperature,Humidity
13/04/2022,18,87

Country,PrimeMinister,Party
Australia,Morrison,Liberal

You could read the text columns in the second example as either
character or factor type, depending on the setting of stringsAsFactors.
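A minimal sketch of that difference, feeding the second example to base R's read.csv() through its text= argument (the object names csv_text, d_chr and d_fac are illustrative only). Since R 4.0.0 the default is stringsAsFactors = FALSE. Note that in this one-row example each factor ends up with a single level:

```r
# The second CSV example above, supplied inline instead of from a file
csv_text <- "Country,PrimeMinister,Party
Australia,Morrison,Liberal"

# Text columns stay as character vectors
d_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Text columns become factors (one level each, since there is only one row)
d_fac <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(d_chr)
str(d_fac)
```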

Jim

On Wed, Apr 13, 2022 at 7:05 PM Neha gupta <neha.bologna90 using gmail.com> wrote:
>
> Thank you Jim
>
> So what solution do you suggest? The features are text, so it doesn't look like a CSV format.
>
> Best regards
>
> On Wednesday, April 13, 2022, Jim Lemon <drjimlemon using gmail.com> wrote:
>>
>> Hi Neha,
>> The error message is about not having _factors_ with two or more
>> levels. Apart from using stringsAsFactors=FALSE (meaning that you
>> probably won't get any factors in "d"), your sample data doesn't look
>> like CSV format. Perhaps the lines have been truncated. You may get
>> something with stringsAsFactors=TRUE, but I don't know whether it will
>> be any more sensible.
>>
>> Jim
>>
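To illustrate the mechanism Jim describes, here is a minimal base-R reproduction on a made-up four-row data frame that borrows some column names from the thread (all values are invented). The call in Neha's error message, `contrasts<-`(..., value = contr.funs[1 + isOF[nn]]), matches the step in base R's model.matrix() that converts character predictors to factors and sets their contrasts, which suggests the error arises when the formula TYPE ~ . is expanded into a design matrix. One plausible cause, assumed here because the full data are not shown, is a character column with only one distinct value:

```r
# Toy data: hypothetical values, column names modelled on the thread
d <- data.frame(
  SEVERITY           = c("MAJOR", "MINOR", "MAJOR", "CRITICAL"),
  DESCRIPTION_FORMAT = c("HTML", "HTML", "HTML", "HTML"),  # only one distinct value
  TYPE               = c("CODE_SMELL", "BUG", "CODE_SMELL", "VULNERABILITY"),
  stringsAsFactors   = FALSE
)

# Count distinct non-missing values per column;
# DESCRIPTION_FORMAT has only 1
n_distinct <- vapply(d, function(x) length(unique(x[!is.na(x)])), integer(1))
print(n_distinct)

# model.matrix() turns the character predictors into factors and then sets
# contrasts on them; the one-level factor triggers the error from the thread
msg <- tryCatch(
  {
    model.matrix(TYPE ~ ., data = d)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```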
>> On Wed, Apr 13, 2022 at 8:12 AM Neha gupta <neha.bologna90 using gmail.com> wrote:
>> >
>> > Hello everyone, I have text data in which the output variable has three
>> > subgroups. I am using the following code but get an error message (shown
>> > after the code).
>> >
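>> > library(caret)  # createDataPartition(), trainControl(), train()
>> >
>> > d <- read.csv("SONAR_RULES.csv", stringsAsFactors = FALSE)
>> > # drop columns that are not used as predictors
>> > d$REMEDIATION_FUNCTION <- NULL
>> > d$DEF_REMEDIATION_GAP_MULT <- NULL
>> > d$REMEDIATION_BASE_EFFORT <- NULL
>> >
>> > # stratified 70/30 split on the outcome
>> > index <- createDataPartition(d$TYPE, p = 0.70, list = FALSE)
>> > tr <- d[index, ]
>> > ts <- d[-index, ]
>> >
>> > # note: trainControl(index = ...) expects a list with one vector of
>> > # training-row indices per resample; 'index' here is the split of d
>> > ctrl <- trainControl(method = "cv", number = 3, index = index,
>> >                      classProbs = TRUE, summaryFunction = multiClassSummary)
>> >
>> > ran <- train(TYPE ~ ., data = tr,
>> >              method = "rpart",
>> >              ## tuneLength = 3 evaluates 3 candidate values of cp
>> >              tuneLength = 3,
>> >              na.action = na.pass,
>> >              metric = "Accuracy",
>> >              preProcess = c("center", "scale", "nzv"),
>> >              trControl = ctrl)
>> > getTrainPerf(ran)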
>> >
>> > It gives me this error:
>> >
>> > Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
>> >   contrasts can be applied only to factors with 2 or more levels
>> >
>> > My data is as follows:
>> >
>> > Rows: 1,819
>> > Columns: 14
>> > $ PLUGIN_RULE_KEY             <chr> "InsufficientBranchCoverage", "InsufficientLin~
>> > $ PLUGIN_CONFIG_KEY           <chr> "", "", "", "", "", "", "", "", "", "", "S1120~
>> > $ PLUGIN_NAME                 <chr> "common-java", "common-java", "common-java", "~
>> > $ DESCRIPTION                 <chr> "An issue is created on a file as soon as the ~
>> > $ SEVERITY                    <chr> "MAJOR", "MAJOR", "MAJOR", "MAJOR", "MAJOR", "~
>> > $ NAME                        <chr> "Branches should have sufficient coverage by t~
>> > $ DEF_REMEDIATION_FUNCTION    <chr> "LINEAR", "LINEAR", "LINEAR", "LINEAR_OFFSET",~
>> > $ REMEDIATION_GAP_MULT        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA~
>> > $ DEF_REMEDIATION_BASE_EFFORT <chr> "", "", "", "10min", "", "", "5min", "5min", "~
>> > $ GAP_DESCRIPTION             <chr> "number of uncovered conditions", "number of l~
>> > $ SYSTEM_TAGS                 <chr> "bad-practice", "bad-practice", "convention", ~
>> > $ IS_TEMPLATE                 <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
>> > $ DESCRIPTION_FORMAT          <chr> "HTML", "HTML", "HTML", "HTML", "HTML", "HTML"~
>> > $ TYPE                        <chr> "CODE_SMELL", "CODE_SMELL", "CODE_SMELL", "COD~
>> >
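One way to act on this, sketched under the assumption that the culprit is a character column with fewer than two distinct values: in the visible part of the listing above DESCRIPTION_FORMAT shows only "HTML", but the output is truncated (the "~" marks), so this is not certain. The idea is to drop such columns from the training subset before calling train(). The check belongs on tr rather than d, because a column that varies in the full data can still end up constant after createDataPartition(). The helper name drop_single_level() and the toy values are hypothetical:

```r
# Hypothetical helper: drop every column with fewer than two distinct
# non-missing values, but always keep the outcome column
drop_single_level <- function(df, outcome) {
  n_distinct <- vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1))
  keep <- n_distinct >= 2L | names(df) == outcome
  df[, keep, drop = FALSE]
}

# Toy training subset (invented values, column names from the thread)
tr <- data.frame(
  SEVERITY           = c("MAJOR", "MINOR", "MAJOR", "CRITICAL"),
  DESCRIPTION_FORMAT = c("HTML", "HTML", "HTML", "HTML"),
  IS_TEMPLATE        = c(0L, 0L, 1L, 0L),
  TYPE               = c("CODE_SMELL", "BUG", "CODE_SMELL", "VULNERABILITY"),
  stringsAsFactors   = FALSE
)

# A classification outcome is normally stored as a factor
tr$TYPE <- factor(tr$TYPE)

# DESCRIPTION_FORMAT is dropped; SEVERITY, IS_TEMPLATE and TYPE remain
tr_clean <- drop_single_level(tr, outcome = "TYPE")
print(names(tr_clean))

# The design matrix can now be built without the contrasts error
X <- model.matrix(TYPE ~ ., data = tr_clean)
print(dim(X))
```

Free-text columns such as DESCRIPTION or NAME are a separate concern: used as predictors in TYPE ~ ., each would be expanded into one dummy column per distinct string (minus a baseline), so it may be better to drop them or turn them into text features first.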
>> >
>> > ______________________________________________
>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.


