[R] How to detect and exclude outliers in R?
guygreen at netvigator.com
Thu Feb 18 10:58:25 CET 2010
I had a similar problem. In my case, I had a large table of data and wanted
to find and exclude a single huge value in one column (i.e. remove the
entire row). There were thousands of rows of data, and this single value
was more than 3x the next value, and at least 30x the typical value. I
wanted to see what the effect of removing that one datapoint was, without
having to change the underlying data.
This finds & removes that one value. I assume it could be repeated to get
rid of more values based on pre-defined criteria:
First, load the "outliers" package:
library(outliers)
outlier_tf = outlier(data_full$target_column,logical=TRUE)
#This gives a logical vector with all values FALSE, except for the outlier (as
#defined in the package documentation: "Finds value with largest difference
#between it and sample mean, which can be an outlier"). That value is
#returned as TRUE.
find_outlier = which(outlier_tf)
#This finds the row number of the outlier, i.e. the position of the TRUE value
#within the "outlier_tf" vector.
data_new = data_full[-find_outlier,]
#This creates a new dataset based on the old data, removing the one row that
#contains the outlier. data_full itself is left unchanged.
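
For anyone who wants to try the steps above without installing anything, here is a rough, self-contained sketch in base R. The toy data, the column name "target_column", and the helper names farthest_from_mean() and drop_extremes() are all made up for illustration. The helper only approximates outlier(..., logical=TRUE) by flagging the value farthest from the sample mean. The "more than k standard deviations" rule in the loop is just one possible pre-defined criterion, not something taken from the package.

```r
# Toy data standing in for data_full (hypothetical): 19 ordinary values plus one huge one.
data_full <- data.frame(id = 1:20,
                        target_column = c(seq(5, 14, by = 0.5), 300))

# Base-R approximation of outliers::outlier(x, logical = TRUE):
# TRUE for the value(s) with the largest absolute difference from the sample mean.
farthest_from_mean <- function(x) {
  dev <- abs(x - mean(x, na.rm = TRUE))
  !is.na(dev) & dev == max(dev, na.rm = TRUE)
}

# One-off removal, mirroring the three steps in this post.
outlier_tf   <- farthest_from_mean(data_full$target_column)
find_outlier <- which(outlier_tf)
# Logical indexing avoids the trap that data_full[-integer(0), ] returns zero rows.
data_new     <- data_full[!outlier_tf, , drop = FALSE]

# Repeated removal under an assumed criterion: keep dropping the most extreme
# value while it lies more than k sample standard deviations from the mean.
drop_extremes <- function(df, col, k = 3) {
  while (nrow(df) >= 3) {
    x    <- df[[col]]
    flag <- farthest_from_mean(x)
    z    <- max(abs(x - mean(x, na.rm = TRUE)), na.rm = TRUE) / sd(x, na.rm = TRUE)
    if (!is.finite(z) || z <= k) break   # nothing extreme enough is left
    df <- df[!flag, , drop = FALSE]
  }
  df
}

data_clean <- drop_extremes(data_full, "target_column", k = 3)
```

Note that the mean and standard deviation are themselves pulled around by the outlier, so a fixed k can miss outliers in small samples. Treat the threshold as something to tune for your own data.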
> Suppose I am reading data from a file and the data contains some outliers.
> I want to know if it is possible in R to automatically detect outliers in
> a dataset and remove them