[R] Removing Outliers Function

kirtau kirtau at live.com
Wed Feb 9 03:11:36 CET 2011


I am working on a function that removes outliers before a regression analysis.
I treat a data point as an outlier if its studentized residual is greater than
or equal to 3 or less than or equal to -3. The code below is what I have so
far for the function:

x = c(1:20)
y = c(1,3,4,2,5,6,18,8,10,8,11,13,14,14,15,85,17,19,19,20)
data1 = data.frame(x,y)

 
rm.outliers = function(dataset,dependent,independent){
    dataset$predicted = predict(lm(dependent~independent))
    dataset$stdres = rstudent(lm(dependent~independent))
    m = 1
    for(i in 1:length(dataset$stdres)){
      dataset$outlier_counter[i] = if(dataset$stdres[i] >= 3 | dataset$stdres[i] <= -3) {m} else{0}
    }
    j = length(which(dataset$outlier_counter >= 1))
    while(j>=1){
      print(dataset[which(dataset$outlier_counter >= 1),])
      dataset = dataset[which(dataset$outlier_counter == 0),]
      dataset$predicted = predict(lm(dependent~independent))
      dataset$stdres = rstudent(lm(dependent~independent))
      m = m+1
      for(k in 1:length(dataset$stdres)){
        dataset$outlier_counter[k] = if(dataset$stdres[k] >= 3 | dataset$stdres[k] <= -3) {m} else{0}
      }
      j = length(which(dataset$outlier_counter >= 1))
    }
    return(dataset)
}

The problem is that I receive the error shown below when I run:

rm.outliers(data1,data1$y,data1$x)

"    x  y predicted   stdres outlier_counter
16 16 85  22.98647 24.04862               1
Error in `$<-.data.frame`(`*tmp*`, "predicted", value = c(0.114285714285714,  :
  replacement has 20 rows, data has 19"

Note: the outlier_counter variable records the "round" of the loop in which
a data point was marked as an outlier.
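
A likely cause of the error: dependent and independent are the full-length
vectors data1$y and data1$x, so lm(dependent~independent) inside the while
loop still fits all 20 points after dataset has been cut down to 19 rows.
The sketch below is one possible rewrite, not a definitive implementation.
It assumes the columns are passed by name as strings, so the model can be
refit on the current rows via the data argument. The names rm_outliers,
cutoff, kept and removed are hypothetical, and the function returns a list
instead of printing each round.

```r
# Hypothetical rewrite: column names are given as strings, and the model is
# refit on the rows that remain in `dataset` at the start of every round.
rm_outliers <- function(dataset, dependent, independent, cutoff = 3) {
  form <- reformulate(independent, response = dependent)  # e.g. y ~ x
  removed <- NULL   # rows dropped so far, tagged with the round number
  round <- 1
  repeat {
    fit <- lm(form, data = dataset)        # uses only the current rows
    dataset$predicted <- predict(fit)
    dataset$stdres <- rstudent(fit)
    # Flag rows whose studentized residual is >= cutoff or <= -cutoff
    flagged <- !is.na(dataset$stdres) & abs(dataset$stdres) >= cutoff
    if (!any(flagged)) break
    out <- dataset[flagged, ]
    out$outlier_counter <- round           # round in which the row was flagged
    removed <- rbind(removed, out)
    dataset <- dataset[!flagged, ]
    round <- round + 1
  }
  list(kept = dataset, removed = removed)
}

x <- 1:20
y <- c(1,3,4,2,5,6,18,8,10,8,11,13,14,14,15,85,17,19,19,20)
data1 <- data.frame(x, y)

res <- rm_outliers(data1, "y", "x")
print(res$removed)   # flagged rows with the round they were removed in
print(res$kept)      # rows that survive every round
```

Note that the call changes from rm.outliers(data1,data1$y,data1$x) to
rm_outliers(data1, "y", "x"). Refitting after each removal can flag further
points in later rounds, because the residual scale shrinks once a large
outlier is gone.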

This would be a HUGE help to me and a few buddies who run a lot of different
regression tests.

Thanks, and if the question is still confusing, please ask.

 

-----
- AK


