[R] Need some suggestions for outlier detection in a matrix
arun
smartpink111 at yahoo.com
Wed Jan 15 17:33:27 CET 2014
Hi,
Try:
dat1 <- read.table("ZvsPGRT_frag_0filt.txt",sep="\t",header=TRUE,row.names=1)
dat_Z <- dat1[,1:4] ## unnecessary to do cbind() here
mat1 <- as.matrix(dat_Z)
head(mat1,2)
#            Sample_118z.0 Sample_132z.0 Sample_141z.0 Sample_183z.0
#XLOC_000001           626          3516          1277           770
#XLOC_000002            82           342           185            72
library(outliers)
## one chi-squared outlier test per row; keep the flagged value and the p-value
ctest_mat1 <- t(apply(mat1, 1, function(x) {
  test <- chisq.out.test(as.numeric(x))
  c(outLier = as.numeric(gsub("[[:alpha:]]", "", test$alternative)),
    Pval = test$p.value)
}))
mat2 <- cbind(mat1,ctest_mat1)
head(mat2,2)
#            Sample_118z.0 Sample_132z.0 Sample_141z.0 Sample_183z.0 outLier
#XLOC_000001           626          3516          1277           770    3516
#XLOC_000002            82           342           185            72     342
#                 Pval
#XLOC_000001 0.1423296
#XLOC_000002 0.1707215
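If you would rather not parse the number out of the text in test$alternative (stripping letters would also remove the "e" from a value printed in scientific notation), here is a minimal base-R sketch of the same calculation. It assumes the default settings of chisq.out.test, i.e. the statistic is (suspect - mean)^2 / var(x) on 1 degree of freedom, which reproduces the X-squared of 2.152591 for the first row. The helper name chisq_out_row is hypothetical and not part of the 'outliers' package, and ties between the lowest and highest value may be resolved differently.

```r
## Hypothetical helper, not from the 'outliers' package: base-R sketch of the
## chi-squared outlier statistic, assuming the default variance = var(x).
chisq_out_row <- function(x) {
  x <- as.numeric(x)
  m <- mean(x)
  ## the suspect is the value farthest from the row mean
  suspect <- x[which.max(abs(x - m))]
  stat <- (suspect - m)^2 / var(x)
  c(outLier = suspect, Pval = 1 - pchisq(stat, df = 1))
}

## first two rows of the data shown above
mat1 <- rbind(XLOC_000001 = c(626, 3516, 1277, 770),
              XLOC_000002 = c(82, 342, 185, 72))

## apply() returns one column per row of mat1, hence the t()
res <- cbind(mat1, t(apply(mat1, 1, chisq_out_row)))
res
```

The result has one row per XLOC id, with the numeric outLier and Pval columns appended directly.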
A.K.
On Wednesday, January 15, 2014 7:12 AM, Vivek Das <vd4mmind at gmail.com> wrote:
Hi Arun,
I was wondering how to use the package 'outliers', which can help me identify outliers in each row. I have a matrix with row names in the first column and values in the next 4 columns. For each row I want to find the outlier and also its test statistic. The package has a test, chisq.out.test, that performs a chi-squared test for detection of one outlier in a vector. I want to apply this to my matrix and find out, for each row, which value is the outlier and what p-value is associated with it. I was using the code below:
data <- read.table("my_file.txt", sep='\t', header=TRUE)
## Selecting only the centers
data_Z<-cbind(data[,1:5])
mat1<- as.matrix(data_Z[,2:5])
row.names(mat1)<- data_Z[,1]
head(mat1)
            Sample_118z.0 Sample_132z.0 Sample_141z.0 Sample_183z.0
XLOC_000001           626          3516          1277           770
XLOC_000002            82           342           185            72
XLOC_000003           361          2000           867           438
XLOC_000004            30           143            67            37
XLOC_000010             1             7             5             3
XLOC_000011            10            63            19            15
ctest_mat1<-c()
for (i in 1:length(mat1[,1]))
{
ctest_mat1<-c(ctest_mat1,chisq.out.test(as.numeric(mat1[i,])))
}
But this does not give me the outlier for each row. Ideally it should, but when I try to combine it with the matrix mat1 using the command below, I get a warning:
res <-cbind(mat1,ctest_mat1)
Warning message:
In .Method(..., deparse.level = deparse.level) :
number of rows of result is not a multiple of vector length (arg 2)
I want a matrix containing mat1 plus columns that give, for each row, the outlier and the p-value associated with it. When I run
head(ctest_mat1)
$statistic
X-squared
2.152591
$alternative
[1] "highest value 3516 is an outlier"
$p.value
[1] 0.1423296
$method
[1] "chi-squared test for outlier"
$data.name
[1] "as.numeric(mat1[i, ])"
$statistic
X-squared
1.876596
I get only the output above, which covers just the first row. I want a matrix with the results for all rows, combined with mat1, so that I can then evaluate it. Can you help me with that? I am also attaching the matrix. I hope my point is clear.
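For reference, the likely reason the loop misbehaves is that each test returns an "htest" list, and c() on lists concatenates their components instead of keeping one result per row, so the vector passed to cbind() no longer has one entry per row of mat1. The sketch below illustrates this under one loud assumption: t.test() is used as a stand-in, only because it ships with base R and also returns an "htest" list. With the 'outliers' package loaded, chisq.out.test() would take its place.

```r
## first two rows of the data shown above
mat1 <- rbind(XLOC_000001 = c(626, 3516, 1277, 770),
              XLOC_000002 = c(82, 342, 185, 72))

## Stand-in test (assumption): t.test() instead of chisq.out.test(),
## because both return a list of class "htest".
flat <- c()
for (i in seq_len(nrow(mat1))) {
  flat <- c(flat, t.test(mat1[i, ]))   # components get strung together
}
length(flat)   # number of components times nrow(mat1), not nrow(mat1)

## Keep one list element per row, then extract the field of interest.
tests <- lapply(seq_len(nrow(mat1)), function(i) t.test(mat1[i, ]))
pvals <- vapply(tests, function(tt) tt$p.value, numeric(1))
res <- cbind(mat1, Pval = pvals)
res
```

Because pvals has exactly one entry per row, cbind() lines it up with mat1 without recycling.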
----------------------------------------------------------
Vivek Das