[R] Saving misclassified records into dataframe within a loop

David Winsemius dwinsemius at comcast.net
Fri May 13 03:18:34 CEST 2011


On May 12, 2011, at 6:49 PM, John Dennison wrote:

> It is a little ugly, I agree, but it is acting as it should. I am trying
> to capture the cases where the model produced a false positive, but
> only for one of the variables, i.e. where the model predicts "present"
> but the case is "absent". I know this is only half of the
> misclassifications,

23 cases out of 81

> but the inverse is not interesting to me. I just imported the logic
> from my own application to a general case, my apologies. Take that
> part as correct. How would we save the rows it does return?
>

It will run now. It just won't populate a data frame because you
initialized it with only one column. Try instead:

results<-data.frame(Kyphosis=NA, Age=NA, Number=NA, Start=NA)
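A possible variation on that initialization, offered only as a sketch and not something proposed in this thread: take a zero-row slice of the source data frame, so the result columns start with the same names, types and factor levels as the source rather than as logical NA. The `toy` data frame below is a made-up stand-in for kyphosis.

```r
# Toy stand-in for the kyphosis data (hypothetical values, illustration only)
toy <- data.frame(
  Kyphosis = factor(c("absent", "present", "absent"),
                    levels = c("absent", "present")),
  Age = c(71, 158, 128)
)

# Zero-row slice: same column names, types and factor levels, but no rows
results <- toy[0, ]

# Rows can then be filled by index, as in the loop
results[1, ] <- toy[2, ]
str(results)
```

With this start there is no placeholder NA row to overwrite, since the first assignment creates row 1.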

You never reference 'x' so just leave it out.

The place where you use kyphosis[ c(i), ] is a bit ugly. You can just  
use kyphosis[ i, ]

And don't put the row.names in results... put the whole row if that is  
what you want.

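#create output data.frame
results<-data.frame(Kyphosis=NA, Age=NA, Number=NA, Start=NA)

#misclassification index function
predict.function <- function(){
  j <- 0
  for (i in seq_len(nrow(kyphosis))) {
    #flag rows where "is absent" and "prediction[i,1] equals 1" disagree
    if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
      j <- j+1
      results[j,] <- kyphosis[i,]
      print(kyphosis[i,])
    }
  }
  print(results)
  save(results, file="results")
  #results was changed only as a local copy, so hand it back to the caller
  invisible(results)
}

#assign the return value to keep the populated data frame
results <- predict.function()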
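On the earlier question of why there were two different 'results': an assignment made with `<-` inside a function changes a local copy, so the global object stays as it was while save() writes out the local one. A minimal loop-free sketch follows, built on the logical index Phil Spector posted earlier in the thread. The helper name `flag_rows` and its arguments are hypothetical, and it assumes the rpart package is installed.

```r
library(rpart)  # provides rpart() and the kyphosis data used in the thread

# Fit and predict as in the original post
fit <- rpart(Kyphosis ~ Age + Number + Start, method = "class", data = kyphosis)
prediction <- predict(fit, kyphosis)  # matrix of class probabilities

# Hypothetical helper (not from the thread): returns the flagged rows
# instead of modifying a data frame defined outside the function.
flag_rows <- function(data, prob_absent) {
  # Phil Spector's condition: actual "absent", predicted P(absent) not 1
  keep <- data$Kyphosis == "absent" & prob_absent != 1
  data[keep, ]
}

misclassified <- flag_rows(kyphosis, prediction[, "absent"])
nrow(misclassified)
```

Because the function returns its result, the caller decides where it is stored, and the returned data frame could then be passed on to something like dbWriteTable.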

>
> Thanks,
>
> John
>
> On Thu, May 12, 2011 at 6:37 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On May 12, 2011, at 6:26 PM, John Dennison wrote:
>
> My apologies. I have transgressed the first law of posting: test
> your code. Here is an updated set that should run:
>
> library(rpart)
>
> # grow tree
> fit <- rpart(Kyphosis ~ Age + Number + Start,
>  method="class", data=kyphosis)
> #predict
> prediction<-predict(fit, kyphosis)
>
> #create output data.frame
> results<-as.data.frame(1)
>
>
> #misclassification index function
>
> predict.function <- function(x){
>  j<-0
>
> for (i in 1:length(kyphosis$Kyphosis)) {
> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
>
> I think your next task is figuring out if this expression, which
> you have not explained at all, is really doing what you intend:
>
>
> ((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0
>
> I would have guessed that you might be intending:
>
>
> kyphosis$Kyphosis[i]=="absent" & prediction[i,1]==1
>
> Since it will hold about half the time:
>
> > sum(kyphosis$Kyphosis[1:81]=="absent" & prediction[1:81,1]==1)
> [1] 41
>
>
>
>
>  j<-j+1
> results[j,]<-row.names(kyphosis[c(i),])
>
> print( row.names(kyphosis[c(i),]))
> } }
> {
> print(results)
> save(results, file="results") } }
>
>
> predict.function(x)
>
>
> results
>
> output: results
>      1
>    1 1
>
>
> load("results")
>
> results
> > results
>    1
> 1   1
> 2   2
> 3   4
> 4  13
> 5  18
> 6  24
> 7  27
> 8  28
> 9  32
> 10 33
> 11 35
> 12 43
> 13 44
> 14 48
> 15 50
> 16 51
> 17 60
> 18 63
> 19 68
> 20 71
> 21 72
> 22 74
> 23 79
>
> why the two different 'results'??
>
> Thanks
>
> John Dennison
>
> On Thu, May 12, 2011 at 6:06 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On May 12, 2011, at 5:41 PM, John Dennison wrote:
>
> Having poked the problem a couple more times, it appears my issue is
> that the object I save within the loop is not available after the
> function ends. I have no idea why it is acting in this manner.
>
>
> library(rpart)
>
> # grow tree
> fit <- rpart(Kyphosis ~ Age + Number + Start,
> method="class", data=kyphosis)
> #predict
> prediction<-predict(fit, kyphosis)
>
> #misclassification index function
>
> results<-as.data.frame(1)
>
> predict.function <- function(x){
>  j<-0
> for (i in 1:length(kyphosis$Kyphosis)) {
> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
>
> j<-j+1
> results[j,]<-row.names(testing[c(i),])
>
> Are we supposed to know where to find 'testing' (and if we cannot
> find it, how is the R interpreter going to find it)?
>
>
>
> print( row.names(kyphosis[c(i),]))
> } }
> {
> print(results)
> save(results, file="results") } }
>
>
> I can load results from file and my output is there. However, if I just
> type results I get the original 1. What in the lord's name is
> occurring?
>
> Thanks
>
> John
>
>
>
> On Thu, May 12, 2011 at 1:50 PM, Phil Spector <spector at stat.berkeley.edu> wrote:
>
> John -
>  In your example, the misclassified observations (as defined by
> your predict.function) will be
>
> kyphosis[kyphosis$Kyphosis == 'absent' & prediction[,1] != 1,]
>
> so you could start from there.
>                                     - Phil Spector
>                                      Statistical Computing Facility
>                                      Department of Statistics
>                                      UC Berkeley
>                                      spector at stat.berkeley.edu
>
>
>
> On Thu, 12 May 2011, John Dennison wrote:
>
> Greetings R world,
>
> I know some version of this question has been asked before, but I need
> to save the output of a loop into a data frame, to eventually be written
> to a postgres database with dbWriteTable. Some background: I have
> developed classification models to help identify problem accounts. The
> logic is this: if the model classifies the record as including variable X
> and it turns out that record does not have X, then it should be reviewed
> (i.e. I need the row number/ID saved to a database). Generally I want to
> look at the misclassified records. This is a little hack, I know; if
> anyone has a better idea, please let me know. Here is an example:
>
> library(rpart)
>
> # grow tree
> fit <- rpart(Kyphosis ~ Age + Number + Start,
> method="class", data=kyphosis)
> #predict
> prediction<-predict(fit, kyphosis)
>
> #misclassification index function
>
> predict.function <- function(x){
> for (i in 1:length(kyphosis$Kyphosis)) {
> #the idea is that if the record is "absent" but the prediction is
> #otherwise then show me that record
> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
> #THIS WORKS
> print( row.names(kyphosis[c(i),]))
> }
> } }
>
> predict.function(x)
>
> Now my issue is that I want to save these IDs to a data.frame so I can
> later save them to a database. Is this an incorrect approach? Can I save
> each ID to the postgres instance as it is found? I have an ignorant fear
> of lapply, but it seems it may hold the key.
>
>
> I've tried
>
> predict.function <- function(x){
> results<-as.data.frame(1)
> for (i in 1:length(kyphosis$Kyphosis)) {
> #the idea is that if the record is "absent" but the prediction is
> #otherwise then show me that record
> if (((kyphosis$Kyphosis[i]=="absent")==(prediction[i,1]==1)) == 0 ){
> #THIS WORKS
> results[i,]<- as.data.frame(row.names(kyphosis[c(i),]))
> }
> } }
>
> This does not work. The results object does not get saved. Any help
> would be greatly appreciated.
>
>
> Thanks
>
> John Dennison
>
>     [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT

David Winsemius, MD
West Hartford, CT


