[R] Error in Rose Method (class balancing)

David Winsemius dwinsemius using comcast.net
Sat Jul 25 08:48:52 CEST 2020


On 7/24/20 3:08 AM, Neha gupta wrote:
> Ohhhh, I am very sorry for that, I have now included
>
> output of dput is: structure(list(unique_id = c("L116", "L117", 
> "L496", "L9719",
> "L9720", "L9721", "L9722", "L9723", "L10200", "L10201", "L10202",
> "L10203", "L10204", "L10205", "L10206", "L10705", "L10706", "L10707",
> "L10708", "L10709", "L10710", "L10711", "L10712", "L10713", "L10714",
> "L10715", "L10716", "L10717", "L10718", "L13486"), McCC = c(6,
> 40, 115, 12, 14, 1, 56, 17, 1, 22, 24, 3, 59, 67, 11, 30, 1,
> 16, 1, 18, 4, 4, 1, 44, 1, 18, 40, 54, 1, 23), CLOC = c(0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0), LLOC = c(52, 276, 663, 73, 82, 28, 318,
> 167, 50, 110, 98, 22, 374, 532, 39, 266, 67, 198, 37, 84, 63,
> 68, 4, 372, 58, 97, 290, 318, 8, 90), `Number of previous fixes` = c(1,
> 2, 6, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 1,
> 0, 1, 0, 0, 0, 1, 0, 0), `Number of previous modifications` = c(19,
> 58, 195, 50, 22, 11, 43, 47, 25, 14, 24, 10, 53, 97, 13, 58,
> 22, 94, 23, 51, 34, 18, 19, 75, 47, 28, 79, 96, 4, 10),
> `Number of committers` = c(3,
> 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2, 2, 1, 3, 2, 2,
> 1, 2, 2, 2, 2, 3, 1, 1), `Number of developer commits` = c(1843,
> 1843, 1843, 1300, 1843, 1843, 1843, 1843, 1843, 1843, 1843, 1843,
> 1843, 1843, 1843, 1843, 1843, 1843, 1843, 1843, 1843, 1843, 1843,
> 1843, 1843, 1843, 1843, 1843, 1843, 1843), `Bug class` = structure(c(2L,
> 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
> 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("true",
> "false"), class = "factor")), row.names = c(NA, 30L), class = 
> "data.frame")
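The reason the list asks for the output of dput() rather than the command: the printed structure(...) text is itself R code, so a reader can paste it after `d <-` and rebuild the data without the original file. A minimal sketch of that round trip, using a made-up two-row stand-in (`d_small`, `txt` and `d_back` are illustrative names, and the values are not meant to represent bughunter.arff):

```r
# Hypothetical stand-in data; the column names mimic those in the post.
d_small <- data.frame(
  McCC = c(6, 40),
  `Bug class` = factor(c("false", "true")),
  check.names = FALSE  # keep the space in the column name
)

# dput() writes R code that recreates the object; capture it as text.
txt <- capture.output(dput(d_small))

# Evaluating that text (what a reader does by pasting it) rebuilds the data.
d_back <- eval(parse(text = txt))

# The rebuilt object should match the original.
stopifnot(isTRUE(all.equal(d_small, d_back)))
```

Posting the text held in `txt` is what lets others run the rest of the script.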


I suggest this pre-processing step:


names(d) <- gsub("\\s", "", names(d) )

# then add `library(ROSE)`


# and rerun. Some packages are not adept at handling non-standard column
# names.
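A small sketch of what that renaming does, using a made-up three-row stand-in rather than the posted data (the data frame and the formula object `f` are illustrative assumptions). One consequence worth noting: once the whitespace is stripped, the outcome column is `Bugclass`, so later references such as d$`Bug class` and the model formula would presumably need the new name.

```r
# Hypothetical stand-in; only the column names matter for this illustration.
d <- data.frame(
  McCC = c(6, 40, 115),
  `Number of previous fixes` = c(1, 2, 6),
  `Bug class` = factor(c("false", "false", "true")),
  check.names = FALSE  # keep the spaces in the names, as in the posted data
)

# Strip all whitespace from the column names (the step suggested above).
names(d) <- gsub("\\s", "", names(d))
print(names(d))

# The outcome is now `Bugclass`; a formula for the later train() call
# would refer to it without backticks.
f <- Bugclass ~ .
```

The stripped names are syntactically valid R names, which is the property that packages doing their own formula handling tend to rely on.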

-- 

David

>
> library(caret)
> library(farff)
> library(DMwR)
> library(pROC)
> library(pls)
>
> setwd("C:/Users/PC/Documents")
> d=readARFF("bughunter.arff")
> dput( head( d, 30 ) )
>
> index <- createDataPartition(d$`Bug class`, p = .70,list = FALSE)
>
> tr <- d[index, ]
>
> ts <- d[-index, ]
>
> boot3 <- trainControl(method = "repeatedcv", number=10, 
> repeats=10,classProbs = TRUE,verboseIter = FALSE,
>
> summaryFunction = twoClassSummary, sampling = "rose")
>
> set.seed(30218)
>
> ct <- train(`Bug class` ~ ., data = tr, method = "pls", metric = 
> "AUC", preProc = c("center", "scale", "nzv"), trControl = boot3)
>
> getTrainPerf(ct)
>
>
> On Thu, Jul 23, 2020 at 11:50 PM Jeff Newmiller
> <jdnewmil using dcn.davis.ca.us> wrote:
>
>     All you did was include the dput command in your example. We need
>     the output of dput, not the command itself.
>
>     On July 23, 2020 2:43:31 PM PDT, Neha gupta
>     <neha.bologna90 using gmail.com> wrote:
>     >David, I understand that the file will not be in your directory but I
>     >have provided the data using dput? Didn't I? Previously members of this
>     >group have used dput to provide the detail about their data. Seriously, I
>     >have no idea how else I can provide a reproducible example.
>     >
>     >
>     >On Thu, Jul 23, 2020 at 10:47 PM David Winsemius
>     ><dwinsemius using comcast.net> wrote:
>     >
>     >>
>     >> On 7/23/20 9:34 AM, Neha gupta wrote:
>     >>
>     >>
>     >> Hello David, file not found should be the path problem I guess. I just
>     >> forgot the pROC library, which I included here. These are all the
>     >> libraries I am using.
>     >>
>     >> library(caret)
>     >> library(farff)
>     >> library(DMwR)
>     >> library(pROC)
>     >> library(pls)
>     >>
>     >> setwd("C:/Users/PC/Documents")
>     >> d=readARFF("bughunter.arff")
>     >>
>     >>
>     >> I suppose *you* might have such a file in that directory, but do you
>     >> assume that *we* will????
>     >>
>     >> A reproducible example will allow others to run your code. Seems fairly
>     >> clear that we are not there yet.
>     >>
>     >> --
>     >>
>     >> David.
>     >>
>     >> dput( head( d, 30 ) )
>     >>
>     >> index <- createDataPartition(d$`Bug class`, p = .70,list = FALSE)
>     >>
>     >> tr <- d[index, ]
>     >>
>     >> ts <- d[-index, ]
>     >>
>     >> boot3 <- trainControl(method = "repeatedcv", number=10,
>     >> repeats=10,classProbs = TRUE,verboseIter = FALSE,
>     >>
>     >> summaryFunction = twoClassSummary, sampling = "rose")
>     >>
>     >> set.seed(30218)
>     >>
>     >> ct <- train(`Bug class` ~ ., data = tr, method = "pls", metric = "AUC",
>     >> preProc = c("center", "scale", "nzv"), trControl = boot3)
>     >>
>     >> getTrainPerf(ct)
>     >>
>     >>
>     >> On Thu, Jul 23, 2020 at 4:01 PM Neha gupta
>     >> <neha.bologna90 using gmail.com> wrote:
>     >>
>     >>>
>     >>> Hello David, thanks for your reply. I have added the information.
>     >>>
>     >>> library(caret)
>     >>> library(farff)
>     >>> library(DMwR)
>     >>>
>     >>> d=readARFF("bughunter.arff")
>     >>> dput( head( d, 30 ) )
>     >>>
>     >>> index <- createDataPartition(d$`Bug class`, p = .70,list = FALSE)
>     >>>
>     >>> tr <- d[index, ]
>     >>>
>     >>> ts <- d[-index, ]
>     >>>
>     >>> boot3 <- trainControl(method = "repeatedcv", number=10,
>     >>> repeats=10,classProbs = TRUE,verboseIter = FALSE,
>     >>>
>     >>> summaryFunction = twoClassSummary, sampling = "rose")
>     >>>
>     >>> set.seed(30218)
>     >>>
>     >>> ct <- train(`Bug class` ~ ., data = tr, method = "pls", metric = "AUC",
>     >>> preProc = c("center", "scale", "nzv"), trControl = boot3)
>     >>>
>     >>> getTrainPerf(ct)
>     >>>
>     >>> On Thu, Jul 23, 2020 at 1:08 AM David Winsemius
>     >>> <dwinsemius using comcast.net> wrote:
>     >>>
>     >>>>
>     >>>> On 7/22/20 3:43 PM, Neha gupta wrote:
>     >>>> > Hello,
>     >>>> >
>     >>>> >
>     >>>> > I get the following error when I use the ROSE class balancing method
>     >>>> > but when I use other methods like SMOTE, up, down, I do not get any
>     >>>> > error message.
>     >>>> >
>     >>>> >
>     >>>> > Something is wrong; all the ROC metric values are missing:
>     >>>> >
>     >>>> > ROC Sens Spec
>     >>>> >
>     >>>> > Min. : NA Min. : NA Min. : NA
>     >>>> >
>     >>>> > 1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
>     >>>> >
>     >>>> > Median : NA Median : NA Median : NA
>     >>>> >
>     >>>> > Mean :NaN Mean :NaN Mean :NaN
>     >>>> >
>     >>>> > 3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
>     >>>> >
>     >>>> > Max. : NA Max. : NA Max. : NA
>     >>>> >
>     >>>> >
>     >>>> >
>     >>>> > library(DMwR)
>     >>>> >
>     >>>> > d=readARFF("bughunter.arff")
>     >>>>
>     >>>> After installing that package and loading pkg:DMwR I get:
>     >>>>
>     >>>>
>     >>>> Error in readARFF("bughunter.arff") : could not find function "readARFF"
>     >>>>
>     >>>>
>     >>>> Since you also posted in HTML, I suggest you read the Posting Guide,
>     >>>> restart an R session and post a reproducible example that loads all
>     >>>> needed packages and data.
>     >>>>
>     >>>> --
>     >>>>
>     >>>> David.
>     >>>>
>     >>>> >
>     >>>> > index <- createDataPartition(d$`Bug class`, p = .70,list = FALSE)
>     >>>> >
>     >>>> > tr <- d[index, ]
>     >>>> >
>     >>>> > ts <- d[-index, ]
>     >>>> >
>     >>>> > boot3 <- trainControl(method = "repeatedcv", number=10,
>     >>>> > repeats=10,classProbs = TRUE,verboseIter = FALSE,
>     >>>> >
>     >>>> > summaryFunction = twoClassSummary, sampling = "rose")
>     >>>> >
>     >>>> > set.seed(30218)
>     >>>> >
>     >>>> > ct <- train(`Bug class` ~ ., data = tr,
>     >>>> >
>     >>>> > method = "pls",
>     >>>> >
>     >>>> > metric = "AUC",
>     >>>> >
>     >>>> > preProc = c("center", "scale", "nzv"),
>     >>>> >
>     >>>> > trControl = boot3)
>     >>>> >
>     >>>> > getTrainPerf(ct)
>     >>>> >
>     >>>> >       [[alternative HTML version deleted]]
>     >>>> >
>     >>>> > ______________________________________________
>     >>>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>     >>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>     >>>> > PLEASE do read the posting guide
>     >>>> > http://www.R-project.org/posting-guide.html
>     >>>> > and provide commented, minimal, self-contained, reproducible code.
>     >>>>
>     >>>
>     >
>
>     -- 
>     Sent from my phone. Please excuse my brevity.
>
