[R] grubbs test to detect all outliers
AbouEl-Makarim Aboueissa
@boue|m@k@r|m1962 @end|ng |rom gm@||@com
Sat Apr 29 10:26:29 CEST 2023
Hi Rui: good morning
I forgot to cc my previous email to the R mailing list.
Please find below the output of *dput(datafortest)*.
Also, please see below the printed dataset.
Thank you very much for your help
abou
> dput(datafortest)
structure(list(factor1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, NA, NA, NA, NA), levels = c("1", "2", "3"), class = "factor"),
X = c(4455.077, 4348.031, 9999.789, 3813.139, 7512.65, 5642.667,
6684.386, 5165.731, NA, 3259.241, 3288.383, 1997.878, 99990.608,
2655.977, 3189.49, 1826.851, 4386.002, 3295.091, 2120.902,
NA, 2056.123, 1995.088, NA, 2539.873, NA, NA, NA, NA), Y = c(888L,
333L, 618L, 417L, 344L, NA, 341L, 999L, 265L, 557L, 234L,
383L, NA, NA, 7777L, 287L, 352L, 308L, 526L, 489L, 291L,
444L, 349L, 333L, NA, NA, NA, NA), factor2 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("1",
"2", "3"), class = "factor"), Z = c(999L, 475L, 507L, 603L,
442L, 486L, 927L, 971L, 388L, 888L, 514L, 409L, 546L, 523L,
313L, 296L, 320L, 388L, 9999L, 677L, 555L, NA, 479L, 257L,
313L, 296L, 320L, 388L), U = c(NA, NA, 252L, 332L, 216L,
217L, 698L, 311L, 999L, 444L, NA, 311L, 327L, 228L, 456L,
412L, 251L, 888L, 398L, 438L, 428L, 319L, NA, 406L, 334L,
465L, 180L, 999L), V = c(999, 240, 394, 265, NA, 275, 479,
562, 512, 777, 322, NA, 728, 653, 450, 576, NA, 396.5, 888,
307, 219, NA, 321, 417, 409, 546, 523, 313)), row.names = c(NA,
-28L), class = "data.frame")
>
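A note on the single-column call quoted further down: as far as I can tell, grubbs.test() in the outliers package accepts x, type, opposite and two.sided, and has no na.rm argument, so passing na.rm = TRUE would stop with an "unused argument" error. A minimal sketch, assuming the outliers package is installed (the test is skipped otherwise), drops the NAs by hand and tests the single most extreme value with type = 10. The name x_clean is only illustrative.

```r
# X column copied from the dput() output above
X <- c(4455.077, 4348.031, 9999.789, 3813.139, 7512.65, 5642.667,
       6684.386, 5165.731, NA, 3259.241, 3288.383, 1997.878, 99990.608,
       2655.977, 3189.49, 1826.851, 4386.002, 3295.091, 2120.902,
       NA, 2056.123, 1995.088, NA, 2539.873, NA, NA, NA, NA)

# Remove the NAs explicitly instead of relying on an na.rm argument
x_clean <- X[!is.na(X)]

# Assumption: the 'outliers' package is available; otherwise nothing is run
if (requireNamespace("outliers", quietly = TRUE)) {
  # type = 10 tests whether the value farthest from the mean is an outlier
  res <- outliers::grubbs.test(x_clean, type = 10)
  print(res)
}
```

This ignores the grouping in factor1, so it only checks the pooled column.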
> datafortest<-read.csv("G:/data_for_test.csv", header=TRUE)
> datafortest
factor1 X Y factor2 Z U V
1 1 4455.077 888 1 999 NA 999.0
2 1 4348.031 333 1 475 NA 240.0
3 1 9999.789 618 1 507 252 394.0
4 1 3813.139 417 1 603 332 265.0
5 1 7512.650 344 1 442 216 NA
6 1 5642.667 NA 1 486 217 275.0
7 1 6684.386 341 1 927 698 479.0
8 2 5165.731 999 1 971 311 562.0
9 2 NA 265 1 388 999 512.0
10 2 3259.241 557 2 888 444 777.0
11 2 3288.383 234 2 514 NA 322.0
12 2 1997.878 383 2 409 311 NA
13 2 99990.608 NA 2 546 327 728.0
14 2 2655.977 NA 2 523 228 653.0
15 3 3189.490 7777 2 313 456 450.0
16 3 1826.851 287 2 296 412 576.0
17 3 4386.002 352 2 320 251 NA
18 3 3295.091 308 2 388 888 396.5
19 3 2120.902 526 3 9999 398 888.0
20 3 NA 489 3 677 438 307.0
21 3 2056.123 291 3 555 428 219.0
22 3 1995.088 444 3 NA 319 NA
23 3 NA 349 3 479 NA 321.0
24 3 2539.873 333 3 257 406 417.0
25 NA NA NA 3 313 334 409.0
26 NA NA NA 3 296 465 546.0
27 NA NA NA 3 320 180 523.0
28 NA NA NA 3 388 999 313.0
>
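One way the per-group request could be approached is sketched here in base R only, without the outliers package. The helper grubbs_to_na() is a hypothetical name, not an existing function: it computes the two-sided Grubbs statistic G = max|x - mean| / sd, compares it with the usual critical value derived from the t distribution, sets the most extreme value to NA when G exceeds it, and repeats until nothing more is flagged. The alpha = 0.05 default is an assumption. The example uses X and factor1 from rows 1 to 24 of the data above.

```r
# Hypothetical helper (not from any package): repeated two-sided Grubbs test.
# Each flagged value is replaced by NA; existing NAs are left untouched.
grubbs_to_na <- function(x, alpha = 0.05) {
  repeat {
    ok <- which(!is.na(x))
    n <- length(ok)
    if (n < 3) break                      # the test needs at least 3 values
    s <- sd(x[ok])
    if (s == 0) break                     # all remaining values are identical
    dev <- abs(x[ok] - mean(x[ok]))
    G <- max(dev) / s
    # Two-sided critical value based on the t distribution with n - 2 df
    t2 <- qt(alpha / (2 * n), df = n - 2)^2
    G_crit <- (n - 1) / sqrt(n) * sqrt(t2 / (n - 2 + t2))
    if (G <= G_crit) break                # nothing (more) to flag
    x[ok[which.max(dev)]] <- NA           # drop the most extreme value
  }
  x
}

# Excerpt of the posted data: X with its grouping factor (rows 1 to 24)
factor1 <- factor(rep(1:3, times = c(7, 7, 10)))
X <- c(4455.077, 4348.031, 9999.789, 3813.139, 7512.65, 5642.667, 6684.386,
       5165.731, NA, 3259.241, 3288.383, 1997.878, 99990.608, 2655.977,
       3189.49, 1826.851, 4386.002, 3295.091, 2120.902, NA, 2056.123,
       1995.088, NA, 2539.873)

# ave() applies the helper within each level of factor1 and keeps row order
X_clean <- ave(X, factor1, FUN = grubbs_to_na)

# Rows that were newly set to NA
which(is.na(X_clean) & !is.na(X))
```

On this excerpt the sketch flags only the 99990.608 entry in group 2; the 9999.789 value in group 1 is not flagged at alpha = 0.05. The same helper could be applied to Y with factor1 and to Z, U and V with factor2. Repeating Grubbs' test in this way is a pragmatic shortcut rather than a formally justified multiple-outlier procedure, so the flagged values are best treated as candidates to inspect.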
______________________
*AbouEl-Makarim Aboueissa, PhD*
*Professor, Mathematics and Statistics*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*
On Fri, Apr 28, 2023 at 11:35 AM Rui Barradas <ruipbarradas using sapo.pt> wrote:
> At 14:09 on 28/04/2023, AbouEl-Makarim Aboueissa wrote:
> > *R: *Grubbs Test to detect all outliers per group for all columns in a
> > data frame
> >
> >
> >
> > Dear All: good morning
> >
> > I have a dataset (as an example) with two factor columns (factor1 and
> > factor2) and 5 numerical columns (X, Y, Z, U, V). The X and Y columns have
> > the same length as factor1; Z, U, and V have the same length as factor2.
> > The dataset is copied below. Please note that all dataset columns have NA
> > values.
> >
> > *Need help on this:*
> >
> >
> > Can we use the grubbs.test() function to detect all outliers and replace
> > them with NA in the X and Y columns per group in factor1, and in the Z, U,
> > and V columns per group in factor2? Columns in the data frame have
> > different lengths, but when I read the .csv file, R added NA values for
> > the shorter columns.
> >
> > If you need the .csv data file, please let me know.
> >
> >
> > Thank you very much for your help in advance.
> >
> >
> >
> >
> > install.packages("outliers")
> > library(outliers)
> >
> > datafortest<-read.csv("G:/data_for_test.csv", header=TRUE)
> > datafortest
> >
> > datafortest<-data.frame(datafortest)
> >
> > datafortest$factor1<-as.factor(datafortest$factor1)
> > datafortest$factor2<-as.factor(datafortest$factor2)
> >
> > str(datafortest)
> >
> > ##### tried to use grubbs.test() on a single column of the data frame, but still not working
> > tests.for.outliers.X<- grubbs.test(datafortest$X, na.rm = TRUE, type=11)
> >
> >
> > ####################################
> >
> > *grubbs.test() on a single dataset: but this can only detect if the min
> > and the max are outliers.*
> >
> >
> > xx999<-c(0.088,1,2,3,4,5,6,7,8,9,88,98,99)
> > grubbs.test(xx999, type=11)
> >
> >
> >
> >
> > With many thanks
> >
> > Abou
> >
> >
> >
> > factor1         X     Y factor2     Z    U      V
> >       1  4455.077   888       1   999   NA    999
> >       1  4348.031   333       1   475   NA    240
> >       1  9999.789   618       1   507  252    394
> >       1  3813.139   417       1   603  332    265
> >       1   7512.65   344       1   442  216     NA
> >       1  5642.667    NA       1   486  217    275
> >       1  6684.386   341       1   927  698    479
> >       2  5165.731   999       1   971  311    562
> >       2        NA   265       1   388  999    512
> >       2  3259.241   557       2   888  444    777
> >       2  3288.383   234       2   514   NA    322
> >       2  1997.878   383       2   409  311     NA
> >       2  99990.61    NA       2   546  327    728
> >       2  2655.977    NA       2   523  228    653
> >       3   3189.49  7777       2   313  456    450
> >       3  1826.851   287       2   296  412    576
> >       3  4386.002   352       2   320  251     NA
> >       3  3295.091   308       2   388  888  396.5
> >       3  2120.902   526       3  9999  398    888
> >       3        NA   489       3   677  438    307
> >       3  2056.123   291       3   555  428    219
> >       3  1995.088   444       3    NA  319     NA
> >       3        NA   349       3   479   NA    321
> >       3  2539.873   333       3   257  406    417
> >                                 3   313  334    409
> >                                 3   296  465    546
> >                                 3   320  180    523
> >                                 3   388  999    313
> >
> Hello,
>
> Please post the output of
>
> dput(datafortest)
>
> Your data is difficult to read into an R session.
>
>
> Hope this helps,
>
> Rui Barradas
>
>