[R] Need some suggestions for outlier detection in a matrix

arun smartpink111 at yahoo.com
Wed Jan 15 18:09:26 CET 2014

If you need the 'position' of the outlier in each row:

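## TRUE where a value equals that row's outlier; NA if the outlier value occurs more than once in the row
Position <- apply(!(mat1 - ctest_mat1[,1]), 1, function(x) if(length(which(x))>1) NA else which(x))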
mat3 <- cbind(mat2,Position)
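
To see what the Position line is doing, here is a small base-R sketch on the two rows shown in this thread. The outlier values are typed in by hand from the output quoted below rather than computed, and `hit` is just an illustrative name. It also returns NA when no value matches, which is a slight variation on the line above.

```r
# Two rows copied from the sample data in this thread
mat1 <- matrix(c(626, 3516, 1277, 770,
                  82,  342,  185,  72),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("XLOC_000001", "XLOC_000002"),
                               c("Sample_118z.0", "Sample_132z.0",
                                 "Sample_141z.0", "Sample_183z.0")))

# Assumption: outlier values typed in from the output quoted in the thread
outLier <- c(3516, 342)

# mat1 - outLier recycles outLier down each column, so every row is
# compared with its own outlier value; a zero difference marks a match
hit <- (mat1 - outLier) == 0

# Column index of the match, or NA when there is not exactly one match
Position <- apply(hit, 1, function(x) {
  idx <- which(x)
  if (length(idx) != 1) NA_integer_ else idx
})
Position
```

Both rows have their outlier in the second column (Sample_132z.0), so that is the index to expect here.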

On Wednesday, January 15, 2014 11:33 AM, arun <smartpink111 at yahoo.com> wrote:
library(outliers) ## provides chisq.out.test()
dat1 <- read.table("ZvsPGRT_frag_0filt.txt",sep="\t",header=TRUE,row.names=1)
dat_Z <- dat1[,1:4] ## unnecessary to do cbind() here
mat1 <- as.matrix(dat_Z)
#            Sample_118z.0 Sample_132z.0 Sample_141z.0 Sample_183z.0
#XLOC_000001           626          3516          1277           770
#XLOC_000002            82           342           185            72
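ctest_mat1 <- t(apply(mat1, 1, function(x) {
  test <- chisq.out.test(as.numeric(x))
  ## test$alternative looks like "highest value 3516 is an outlier"; drop the letters to keep the number
  c(outLier = as.numeric(gsub("[[:alpha:]]", "", test$alternative)),
    Pval = test$p.value)
}))
mat2 <- cbind(mat1, ctest_mat1)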
#            Sample_118z.0 Sample_132z.0 Sample_141z.0 Sample_183z.0 outLier
#XLOC_000001           626          3516          1277           770    3516
#XLOC_000002            82           342           185            72     342
#                 Pval
#XLOC_000001 0.1423296
#XLOC_000002 0.1707215
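
If parsing the text in test$alternative feels fragile, the same numbers can be computed directly in base R. This is only a sketch of what chisq.out.test() appears to compute with its default arguments (variance = var(x), opposite = FALSE): the candidate is whichever extreme lies farther from the row mean, and the p-value comes from a chi-squared distribution with 1 degree of freedom. The helper name chisq_outlier_row is made up for illustration, and chisq.out.test() itself remains the reference.

```r
# Hypothetical helper: base-R re-computation of the one-outlier
# chi-squared test, assuming the package defaults
chisq_outlier_row <- function(x) {
  x <- as.numeric(x)
  m <- mean(x)
  lo <- which.min(x)
  hi <- which.max(x)
  # The candidate is the extreme value farther from the mean
  pos <- if ((x[hi] - m) < (m - x[lo])) lo else hi
  chi <- (x[pos] - m)^2 / var(x)
  c(outLier = x[pos], Position = pos, Pval = 1 - pchisq(chi, df = 1))
}

# Two rows copied from the sample data in this thread
mat1 <- matrix(c(626, 3516, 1277, 770,
                  82,  342,  185,  72),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("XLOC_000001", "XLOC_000002"),
                               c("Sample_118z.0", "Sample_132z.0",
                                 "Sample_141z.0", "Sample_183z.0")))

# One result row per input row, bound onto the original matrix
res <- cbind(mat1, t(apply(mat1, 1, chisq_outlier_row)))
res
```

For these two rows the Pval column should agree with the values printed above, and Position gives the column index without a second pass over the matrix.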


On Wednesday, January 15, 2014 7:12 AM, Vivek Das <vd4mmind at gmail.com> wrote:

Hi Arun,

I was wondering how to use the package ‘outliers’, which can help me identify the outlier in each row. I have a matrix with row names in the first column and values in the next 4 columns. For each row I want to find the outlier and also its test statistic. The package has a test, chisq.out.test, that performs a chi-squared test for detection of one outlier in a vector. I want to apply this to my matrix and find out, for each row, which value is the outlier and what p-value is associated with it. I was using the code below:

data_Z <- read.table("my_file.txt", sep='\t', header=TRUE)
## Selecting only the centers
mat1 <- as.matrix(data_Z[,2:5])
row.names(mat1) <- data_Z[,1]

            Sample_118z.0 Sample_132z.0 Sample_141z.0 Sample_183z.0
XLOC_000001           626          3516          1277           770
XLOC_000002            82           342           185            72
XLOC_000003           361          2000           867           438
XLOC_000004            30           143            67            37
XLOC_000010             1             7             5             3
XLOC_000011            10            63            19            15


for (i in 1:length(mat1[,1]))


But this does not give me the outlier for each row. Ideally it should, but when I try to combine the result with the matrix mat1 using the command below, I get this warning:

res <-cbind(mat1,ctest_mat1)
Warning message:
In .Method(..., deparse.level = deparse.level) :
  number of rows of result is not a multiple of vector length (arg 2)
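
For anyone hitting the same warning: the code that built ctest_mat1 is not shown here, so the following is only one plausible cause. cbind() gives this warning when its second argument is a plain vector whose length does not divide the number of rows of the matrix. The vector `not_a_matrix` and the summary columns are made up for illustration.

```r
# The six sample rows posted in this message
mat1 <- matrix(c(626, 3516, 1277, 770,
                  82,  342,  185,  72,
                 361, 2000,  867, 438,
                  30,  143,   67,  37,
                   1,    7,    5,   3,
                  10,   63,   19,  15),
               nrow = 6, byrow = TRUE)

# Hypothetical: a result that ended up as a plain vector of length 4
# instead of a matrix with one row per row of mat1
not_a_matrix <- c(3516, 0.1423296, 342, 0.1707215)

# cbind() recycles the vector down 6 rows; 6 is not a multiple of 4,
# so R warns. tryCatch() captures the warning text.
msg <- tryCatch(cbind(mat1, not_a_matrix),
                warning = function(w) conditionMessage(w))
msg

# A matrix with exactly one row per row of mat1 binds without complaint
per_row <- cbind(rowMax = apply(mat1, 1, max), rowMean = rowMeans(mat1))
res <- cbind(mat1, per_row)
dim(res)
```

The point is that whatever is bound to mat1 needs one row per row of mat1, which is what a t(apply(mat1, 1, ...)) construction produces.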

I want a matrix containing mat1 plus columns that say, for each row, which value is the outlier and the p-value associated with it. I mean when I 


[1] "highest value 3516 is an outlier"

[1] 0.1423296

[1] "chi-squared test for outlier"

[1] "as.numeric(mat1[i, ])"


I get only the following for the first row. I want it as a matrix for all the rows, combined with my mat1, so that I can then evaluate it. Can you help me with that? I am also attaching the matrix. I hope you understand my point.


Vivek Das
