[BioC] outlier removal from gene chip
Weiwei Shi
helprhelp at gmail.com
Tue Sep 19 21:21:54 CEST 2006
Hi Sean,
Some additional information:
I ran a pathway analysis and compared the results with and without
those "outliers". Since this is unsupervised learning, I validated the
results against domain knowledge, and the analysis that keeps the
outliers does better, which agrees with your suggestion. But I still do
not think the -14k value, and some of the other numbers in the summary
from my first email, make sense.
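For reference, the 1.5*IQR rule Fangxin mentioned can be sketched in a
few lines of R. This is only a minimal sketch under stated assumptions:
the function name flag_iqr_outliers and the toy vector fc are made up
for illustration (the first ten values are the V57 column from the
quoted message below, with the -14086.750 minimum from the summary
appended), and k = 1.5 is the conventional multiplier, not something
tuned to this data.

```r
# Toy fold-change vector (hypothetical): ten V57 values from the thread,
# plus the extreme minimum reported by summary().
fc <- c(1.935484, -1.831978, 3.921053, -6.807692, -1.033708,
        -9.382979, 4.878788, -6.020833, -1.178947, -1.172058,
        -14086.750)

# Flag values outside [Q1 - k*IQR, Q3 + k*IQR].
# Returns a logical vector of the same length as x.
flag_iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]                 # interquartile range
  x < q[1] - k * spread | x > q[2] + k * spread
}

is_out <- flag_iqr_outliers(fc)
fc[is_out]    # values flagged by the rule
fc[!is_out]   # values kept
```

On this toy vector only -14086.750 is flagged; values such as -9.38
stay inside the fences. Whether a flagged value should actually be
dropped is a separate question, as the pathway comparison above shows.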
weiwei
On 9/19/06, Weiwei Shi <helprhelp at gmail.com> wrote:
> My current approach is to use mahalanobis() distance.
>
> To Sean:
> Do you think the -14k example is OK?
>
>
> On 9/19/06, fhong at salk.edu <fhong at salk.edu> wrote:
> > Dear Weiwei,
> > The definition of an outlier is not clear-cut, and no data point should
> > be treated as an outlier unless there is a reason to believe it is one.
> > A simple way to detect outliers is the 1.5*IQR criterion, for which you
> > can write your own code (one or two lines). Let me know if you find any
> > other methods for detecting outliers.
> >
> > Fangxin
> >
> >
> > > dear listers:
> > >
> > > I have a question on whether bioconductor has some tool-kit to detect
> > > outliers and remove them.
> > >
> > > my original dataset looks like this:
> > >              V1        V51        V53         V55        V57
> > > 1    -493249600   1.459459  -3.069444   -1.300000   1.935484
> > > 2   -1613096495  -1.139269  -5.525281  -16.592593  -1.831978
> > > 3    1626196571  -3.500000  -1.011662    2.223881   3.921053
> > > 4   -1397009217  -3.571429   1.685714   -1.180297  -6.807692
> > > 5    1428659728  -1.405405  -1.469004   -4.779754  -1.033708
> > > 6     459853658  -2.158879  -7.510823   -1.085581  -9.382979
> > > 7     530182506  -1.431677  -1.336343   -3.126437   4.878788
> > > 8    1173842263   1.215385   1.856410   -2.059794  -6.020833
> > > 9         28847   2.407895  -2.048889   -1.730337  -1.178947
> > > 10  -1961875610   2.864159  -2.301234   -4.733264  -1.172058
> > >
> > > V1 is the internal probe id; the remaining columns are different
> > > samples. Each cell is the disease/normal fold change.
> > >
> > > summary() of the sample columns (V51, ... V57) gives the following:
> > >               V51        V53       V55         V57
> > > Min.     -482.000   -55.7342  -122.074  -14086.750
> > > 1st Qu.    -2.159    -1.7312    -2.125      -1.831
> > > Median     -1.199    -1.0416    -1.200      -1.080
> > > Mean       -0.918     0.1662    -1.027      -1.874
> > > 3rd Qu.     1.441     1.5721     1.419       1.521
> > > Max.      198.434  1478.1639    95.768     683.519
> > >
> > >
> > > My question is: is there any package that can detect those outliers
> > > (like -14086.750), remove them, and compute an "average" for each
> > > gene (instead of each probe)?
> > >
> > > Thank you.
> > >
> > > Weiwei
> > >
> > > --
> > > Weiwei Shi, Ph.D
> > > Research Scientist
> > > GeneGO, Inc.
> > >
> > > "Did you always know?"
> > > "No, I did not. But I believed..."
> > > ---Matrix III
> > >
> >
> >
> > --------------------
> > Fangxin Hong Ph.D.
> > Plant Biology Laboratory
> > The Salk Institute
> > 10010 N. Torrey Pines Rd.
> > La Jolla, CA 92037
> > E-mail: fhong at salk.edu
> > (Phone): 858-453-4100 ext 1105
> >
> >
>
>
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III