[R] Saving misclassified records into dataframe within a loop

William Dunlap wdunlap at tibco.com
Fri May 13 01:47:10 CEST 2011


Your question concerned how to return data from a function.
It looks like you are using the following idiom
to save the data a function generates:
  f <- function() {
     result <- ... some calculations ...
     save(result, file="result.Rdata")
  }
  f()
  load("result.Rdata")
  ... now you will find a dataset called "result" ...
The save call stores f's local dataset called 'result' in
a file and the load call loads the data from the file into
a dataset also called result but in a different frame
(the frame of the caller of f, not f's frame).
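To see the two frames at work, here is a small sketch. The function g, the temporary file, and the values are made up for illustration and are not from the original code. Assigning to 'results' inside g creates a local variable and leaves the caller's 'results' alone; only load() changes it.

```r
results <- 1                        # 'results' in the caller's frame

g <- function(path) {
  results <- c("a", "b")            # a *local* 'results' in g's frame
  save(results, file = path)        # writes the local copy to disk
  invisible(NULL)
}

path <- tempfile(fileext = ".Rdata")  # hypothetical file location
g(path)
before <- results                   # the caller's copy is unchanged by g
load(path)                          # recreates 'results' in the caller's frame
after <- results                    # now the value that g saved
```

This is why typing 'results' after calling the function shows the original value until the file is loaded.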

Don't use save() and load() for this sort of thing.
It will mystify people reading your code and make the
code difficult to reuse.

Instead return the value of f's result from f and
use the assignment operator when calling f to store
that return value in the caller's frame:
  f <- function() {
     fResult <- ... some calculations ...
     fResult # the return value of f
  }
  result <- f()
When f is finished all variables in it disappear and its
return value is passed back to its caller, who can name it or
use it directly in another function call.
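A minimal runnable sketch of that pattern follows. The function collectEvens and its contents are hypothetical stand-ins for "some calculations".

```r
collectEvens <- function(n) {
  x <- seq_len(n)
  evens <- x[x %% 2 == 0]   # local variable, discarded when the call ends
  evens                      # the last expression is the return value
}

result <- collectEvens(10)       # the caller names the returned value
total <- sum(collectEvens(10))   # or passes it straight to another function
```

No file is involved, and the caller decides what the value is called.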

You didn't ask about the following, but the code
  results <- as.data.frame(1)
  j <- 0
  for (i in 1:length(kyphosis$Kyphosis)) {
    if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
      j <- j+1
      results[j,] <- row.names(kyphosis[c(i),])
    }
  }
may be written without the for loop as
  isMisclassified <- ((kyphosis$Kyphosis=="absent") ==
                      (prediction[,1]==1)) == 0
  results <- data.frame("1" = rownames(kyphosis)[isMisclassified],
                        check.names=FALSE, stringsAsFactors=FALSE)
Note that the isMisclassified<- line is your line with
the subscripts 'i' taken out, as we want to evaluate the condition for
all i.
I find the intent of that easier to understand than that
of the code in the for loop.
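As a rough check that the two forms pick out the same rows, here is a sketch on made-up data. The objects toy and toyPrediction are small hypothetical stand-ins for kyphosis and prediction, so it runs without fitting a tree.

```r
# Made-up stand-ins for kyphosis and prediction
toy <- data.frame(Kyphosis = c("absent", "present", "absent", "present"),
                  row.names = c("r1", "r2", "r3", "r4"),
                  stringsAsFactors = FALSE)
toyPrediction <- cbind(absent  = c(1, 0, 0.4, 1),
                       present = c(0, 1, 0.6, 0))

# Loop version: grow a character vector one element at a time
loopIds <- character(0)
for (i in seq_len(nrow(toy))) {
  if (((toy$Kyphosis[i] == "absent") == (toyPrediction[i, 1] == 1)) == 0) {
    loopIds <- c(loopIds, rownames(toy)[i])
  }
}

# Vectorized version: same condition with the subscript 'i' removed
isMisclassified <- ((toy$Kyphosis == "absent") ==
                    (toyPrediction[, 1] == 1)) == 0
vecIds <- rownames(toy)[isMisclassified]
```

Both loopIds and vecIds hold the row names "r3" and "r4".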

I don't know why you want 'results' to be a data.frame instead
of a simple character vector; the expression
  rownames(kyphosis)[isMisclassified]
would give you that.

Also, since 'i' is an integer,
  c(i)
is just a long-winded way of saying
  i

The test
  logicalValue == 0
really ought to have the same type of data on both sides
of the ==, as in
  logicalValue == FALSE
or, even better in this case,
  !logicalValue # bang means not
or, since logicalValue is x==y you could replace !(x==y) with
  x != y
so the following is equivalent to what you wrote
  isMisclassified <- (kyphosis$Kyphosis=="absent") !=
                     (prediction[,1]==1)
(and, in my opinion, the latter is easier to understand).
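The equivalence of these forms can be checked on all four combinations of two logical values. The vectors x and y below are illustrative only.

```r
x <- c(TRUE, TRUE, FALSE, FALSE)
y <- c(TRUE, FALSE, TRUE, FALSE)

v1 <- (x == y) == 0       # compares a logical with a number
v2 <- (x == y) == FALSE   # same type on both sides
v3 <- !(x == y)           # negation
v4 <- x != y              # the most direct form
```

All four give the same logical vector, TRUE exactly where x and y differ.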

Finally, you defined a function of one argument, x, and didn't
use the argument.  Functions don't need arguments,
   f <- function() {
      ....
   }
would do just as well.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com  

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of John Dennison
> Sent: Thursday, May 12, 2011 2:41 PM
> To: r-help at r-project.org
> Subject: Re: [R] Saving misclassified records into dataframe 
> within a loop
> 
> Having poked the problem a couple more times it appears my 
> issue is that the
> object i save within the loop is not available after the 
> function ends. I
> have no idea why it is acting in this manner.
> 
> 
> library(rpart)
> 
> # grow tree
> fit <- rpart(Kyphosis ~ Age + Number + Start,
>  method="class", data=kyphosis)
> #predict
> prediction<-predict(fit, kyphosis)
> 
> #misclassification index function
> 
> results<-as.data.frame(1)
> 
> predict.function <- function(x){
>   j<-0
> for (i in 1:length(kyphosis$Kyphosis)) {
> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
> 
>  j<-j+1
> results[j,]<-row.names(testing[c(i),])
> print( row.names(kyphosis[c(i),]))
> } }
> {
> print(results)
> save(results, file="results") } }
> 
> 
> i can load results from file and my out put is there. how 
> ever if i just
> type results i get the original 1. what is in the lords name 
> is occurring.
> 
> Thanks
> 
> John
> 
> 
> 
> On Thu, May 12, 2011 at 1:50 PM, Phil Spector 
> <spector at stat.berkeley.edu>wrote:
> 
> > John -
> >   In your example, the misclassified observations (as defined by
> > your predict.function) will be
> >
> >  kyphosis[kyphosis$Kyphosis == 'absent' & prediction[,1] != 1,]
> >
> > so you could start from there.
> >                                        - Phil Spector
> >                                         Statistical 
> Computing Facility
> >                                         Department of Statistics
> >                                         UC Berkeley
> >                                         spector at stat.berkeley.edu
> >
> >
> >
> > On Thu, 12 May 2011, John Dennison wrote:
> >
> >  Greetings R world,
> >>
> >> I know some version of the this question has been asked 
> before, but i need
> >> to save the output of a loop into a data frame to 
> eventually be written to
> >> a
> >> postgres data base with dbWriteTable. Some background. I 
> have developed
> >> classifications models to help identify problem accounts. 
> The logic is
> >> this,
> >> if the model classifies the record as including variable X 
> and it turns
> >> out
> >> that record does not have X then it should be reviewed(ie 
> i need the row
> >> number/ID saved to a database). Generally i want to look at the
> >> misclassified records. This is a little hack i know, 
> anyone got a better
> >> idea please let me know. Here is an example
> >>
> >> library(rpart)
> >>
> >> # grow tree
> >> fit <- rpart(Kyphosis ~ Age + Number + Start,
> >>  method="class", data=kyphosis)
> >> #predict
> >> prediction<-predict(fit, kyphosis)
> >>
> >> #misclassification index function
> >>
> >> predict.function <- function(x){
> >> for (i in 1:length(kyphosis$Kyphosis)) {
> >> #the idea is that if the record is "absent" but the prediction is
> >> otherwise
> >> then show me that record
> >> if 
> (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
> >>  #THIS WORKS
> >> print( row.names(kyphosis[c(i),]))
> >> }
> >> } }
> >>
> >> predict.function(x)
> >>
> >> Now my issue is that i want to save these id to a 
> data.frame so i can
> >> later
> >> save them to a database. This this an incorrect approach. 
> Can I save each
> >> id
> >> to the postgres instance as it is found. i have a ignorant 
> fear of lapply,
> >> but it seems it may hold the key.
> >>
> >>
> >> Ive tried
> >>
> >> predict.function <- function(x){
> >> results<-as.data.frame(1)
> >> for (i in 1:length(kyphosis$Kyphosis)) {
> >> #the idea is that if the record is "absent" but the prediction is
> >> otherwise
> >> then show me that record
> >> if 
> (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
> >>  #THIS WORKS
> >> results[i,]<- as.data.frame(row.names(kyphosis[c(i),]))
> >> }
> >> } }
> >>
> >> this does not work. results object does not get saved. Any 
> Help would be
> >> greatly appreciated.
> >>
> >>
> >> Thanks
> >>
> >> John Dennison
> >>
> >>        [[alternative HTML version deleted]]
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> 
> 	[[alternative HTML version deleted]]
> 
> 


