[R] grubbs test to detect all outliers
Rui Barradas
Sat Apr 29 14:05:48 CEST 2023
At 14:09 on 28/04/2023, AbouEl-Makarim Aboueissa wrote:
> R: Grubbs test to detect all outliers per group for all columns in a data
> frame
>
>
>
> Dear All: good morning
>
> I have a dataset (as an example) with two factor columns (factor1 and
> factor2) and 5 numerical columns (X, Y, Z, U, V). The X and Y columns have
> the same length as factor1, and Z, U, and V have the same length as factor2.
> The dataset is copied below. Please note that all dataset columns contain NA
> values.
>
> *Need help on this:*
>
>
> Can we use the grubbs.test() function to detect all outliers and replace
> them with NA in the X and Y columns per group in factor1, and in the Z, U,
> and V columns per group in factor2? The columns in the data frame have
> different lengths, but when I read the .csv file, R added NA values to the
> shorter columns.
>
> If you need the .csv data file, please let me know.
>
>
> Thank you very much for your help in advance.
>
>
>
>
> install.packages("outliers")
> library(outliers)
>
> datafortest<-read.csv("G:/data_for_test.csv", header=TRUE)
> datafortest
>
> datafortest<-data.frame(datafortest)
>
> datafortest$factor1<-as.factor(datafortest$factor1)
> datafortest$factor2<-as.factor(datafortest$factor2)
>
> str(datafortest)
>
> ##### tried to use grubbs.test() on a single column of the dataframe, but
> ##### still not working
> tests.for.outliers.X<- grubbs.test(datafortest$X, na.rm = TRUE, type=11)
>
>
> ####################################
>
> *grubbs.test() on a single dataset: but this can only detect if the min and
> the max are outliers.*
>
>
> xx999<-c(0.088,1,2,3,4,5,6,7,8,9,88,98,99)
> grubbs.test(xx999, type=11)
>
>
>
>
> With many thanks
>
> Abou
>
>
>
> factor1          X     Y  factor2     Z    U      V
>       1   4455.077   888        1   999   NA    999
>       1   4348.031   333        1   475   NA    240
>       1   9999.789   618        1   507  252    394
>       1   3813.139   417        1   603  332    265
>       1    7512.65   344        1   442  216     NA
>       1   5642.667    NA        1   486  217    275
>       1   6684.386   341        1   927  698    479
>       2   5165.731   999        1   971  311    562
>       2         NA   265        1   388  999    512
>       2   3259.241   557        2   888  444    777
>       2   3288.383   234        2   514   NA    322
>       2   1997.878   383        2   409  311     NA
>       2   99990.61    NA        2   546  327    728
>       2   2655.977    NA        2   523  228    653
>       3    3189.49  7777        2   313  456    450
>       3   1826.851   287        2   296  412    576
>       3   4386.002   352        2   320  251     NA
>       3   3295.091   308        2   388  888  396.5
>       3   2120.902   526        3  9999  398    888
>       3         NA   489        3   677  438    307
>       3   2056.123   291        3   555  428    219
>       3   1995.088   444        3    NA  319     NA
>       3         NA   349        3   479   NA    321
>       3   2539.873   333        3   257  406    417
>                                 3   313  334    409
>                                 3   296  465    546
>                                 3   320  180    523
>                                 3   388  999    313
>
>
>
> ______________________
>
>
> *AbouEl-Makarim Aboueissa, PhD*
>
> *Professor, Mathematics and Statistics*
> *Graduate Coordinator*
>
> *Department of Mathematics and Statistics*
> *University of Southern Maine*
>
Hello,
With the data file you attached, I cannot reproduce any errors; everything
ran at the first try.
library(outliers)
fl <- "~/data_for_test.csv"
datafortest <- read.csv(fl)
# these are not needed to run the test
datafortest$factor1 <- as.factor(datafortest$factor1)
datafortest$factor2 <- as.factor(datafortest$factor2)
str(datafortest)
#> 'data.frame': 28 obs. of 7 variables:
#> $ factor1: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 2 2 2 ...
#> $ X : num 4455 4348 10000 3813 7513 ...
#> $ Y : int 888 333 618 417 344 NA 341 999 265 557 ...
#> $ factor2: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 2 ...
#> $ Z : int 999 475 507 603 442 486 927 971 388 888 ...
#> $ U : int NA NA 252 332 216 217 698 311 999 444 ...
#> $ V : num 999 240 394 265 NA 275 479 562 512 777 ...
head(datafortest)
#> factor1 X Y factor2 Z U V
#> 1 1 4455.077 888 1 999 NA 999
#> 2 1 4348.031 333 1 475 NA 240
#> 3 1 9999.789 618 1 507 252 394
#> 4 1 3813.139 417 1 603 332 265
#> 5 1 7512.650 344 1 442 216 NA
#> 6 1 5642.667 NA 1 486 217 275
##### tried to use grubbs.test() on a single column of the dataframe, but
##### still not working
grubbs.test(datafortest$X, type = 11)
#>
#> Grubbs test for two opposite outliers
#>
#> data: datafortest$X
#> G = 4.6640014, U = 0.0091756, p-value = 0.02867
#> alternative hypothesis: 1826.851 and 99990.608 are outliers
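The call above differs from the one in the original post only in dropping `na.rm = TRUE`; the output shows that the NA values in X are handled without it. For the per-group part of the question, here is a minimal sketch, not a definitive method. It assumes that "all outliers" can be approximated by repeating the one-outlier test (`type = 10`) and blanking the flagged value until the test is no longer significant. The names `grubbs_to_na` and `clean_by_group`, the `alpha = 0.05` cutoff, and the `min_n = 6` floor are my own assumptions, not part of the outliers package.

```r
library(outliers)

# Hypothetical helper: repeat the one-outlier Grubbs test (type = 10) and
# set the flagged value to NA until the test is no longer significant.
# alpha and min_n are illustrative defaults, not package recommendations.
grubbs_to_na <- function(x, alpha = 0.05, min_n = 6) {
  repeat {
    ok <- !is.na(x)
    # Stop when too few values remain or the remaining values are constant
    if (sum(ok) < min_n || sd(x[ok]) == 0) break
    p <- grubbs.test(x[ok], type = 10)$p.value
    if (is.na(p) || p >= alpha) break
    # type = 10 tests the value furthest from the mean; which.max() skips NAs
    x[which.max(abs(x - mean(x[ok])))] <- NA
  }
  x
}

# Hypothetical wrapper: apply the helper to several columns, separately
# within each level of one grouping column. Rows whose group is NA are
# left unchanged by ave().
clean_by_group <- function(df, cols, group) {
  for (v in cols) {
    df[[v]] <- ave(df[[v]], df[[group]], FUN = grubbs_to_na)
  }
  df
}
```

With the data frame from the post this could be used as `datafortest <- clean_by_group(datafortest, c("X", "Y"), "factor1")` followed by `datafortest <- clean_by_group(datafortest, c("Z", "U", "V"), "factor2")`. Note that the groups here are small and that repeated testing inflates the false-positive rate, so the flagged values are best treated as candidates to inspect.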
Hope this helps,
Rui Barradas