[BioC] question about outlier removal

Weiwei Shi helprhelp at gmail.com
Thu Sep 28 20:46:06 CEST 2006

On 9/27/06, James Anderson <janderson_net at yahoo.com> wrote:
> Hi,
>   This may be a generic question, not necessarily related to the usage of R and Bioconductor. The question is: for a microarray experiment, suppose I have 50 normal and 50 cancer samples. I want to find sample outliers, which may arise from different sources:
> 1. Mislabelling, i.e., labelling a cancer sample as normal or a normal sample as cancer

I think this is not an outlier problem but a class noise problem.
Search for "class noise correction" or "class noise removal" instead;
there is some work on this topic.
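
To make the idea concrete, here is a minimal sketch of one simple class
noise filter: classify each sample with a nearest-centroid rule built
from all the other samples, and flag it when the prediction disagrees
with the recorded label. Everything in it is an assumption for
illustration (the function names, the nearest-centroid rule, and the
simulated data); it is not a method taken from a specific paper or
package, and a flagged sample is only a candidate for manual review.

```python
import random

def centroid(rows):
    """Column-wise mean of a list of equal-length expression vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def suspect_labels(X, y):
    """Leave-one-out nearest-centroid filter (hypothetical helper).

    Returns the indices of samples whose recorded label differs from the
    label of the nearest class centroid computed WITHOUT that sample.
    Assumes every class keeps at least one sample after leaving one out.
    """
    suspects = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        by_class = {}
        for j, (xj, yj) in enumerate(zip(X, y)):
            if j != i:
                by_class.setdefault(yj, []).append(xj)
        centroids = {label: centroid(rows) for label, rows in by_class.items()}
        predicted = min(centroids, key=lambda label: sq_dist(xi, centroids[label]))
        if predicted != yi:
            suspects.append(i)
    return suspects

# Simulated toy data, NOT real microarray values:
# 10 "normal" and 10 "cancer" samples measured on 5 genes.
rng = random.Random(0)
normal = [[rng.gauss(0.0, 0.5) for _ in range(5)] for _ in range(10)]
cancer = [[rng.gauss(3.0, 0.5) for _ in range(5)] for _ in range(10)]
X = normal + cancer
y = ["normal"] * 10 + ["cancer"] * 10
y[3] = "cancer"  # deliberately mislabel one normal sample

flagged = suspect_labels(X, y)
print("samples with suspicious labels:", flagged)
```

On real data the classes overlap far more than in this toy example, so
the filter will also flag borderline samples that are labelled
correctly.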

> 2. Misbehavior, i.e., some normal samples are actually sick or have had a heart attack, although they don't have cancer.

If your problem is a cancer vs. non-cancer one, then these samples are
still valid non-cancer samples and should not be removed either, IMHO.

> Should I do gene selection before outlier removal or not? Sometimes some samples are identified as outliers using 200 genes, while other samples are identified as outliers if I use 50 or 20 genes. In a typical microarray experiment, what percentage of genes is affected by treatment or abnormal conditions?
>   Thanks,
>   James
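
One way to see how much the outlier calls depend on the gene set is to
repeat the same outlier rule for several gene-set sizes and keep track
of which samples are flagged every time. The sketch below is only an
illustration under stated assumptions: genes are ranked by variance
across samples (an unsupervised filter that ignores the class labels),
a sample is flagged when its distance to the median profile exceeds the
median distance plus 3 MADs, and the data are simulated. The function
names and the threshold are hypothetical choices, not an established
recommendation.

```python
import random
import statistics

def top_variance_genes(X, k):
    """Indices of the k genes (columns) with the largest variance across samples."""
    variances = [statistics.pvariance(col) for col in zip(*X)]
    order = sorted(range(len(variances)), key=lambda g: variances[g], reverse=True)
    return order[:k]

def flag_outliers(X, k, n_mads=3.0):
    """Flag samples far from the median profile, using only the top-k variance genes."""
    genes = top_variance_genes(X, k)
    columns = list(zip(*X))
    median_profile = {g: statistics.median(columns[g]) for g in genes}
    # Euclidean distance of each sample to the per-gene median profile
    dists = [
        sum((row[g] - median_profile[g]) ** 2 for g in genes) ** 0.5
        for row in X
    ]
    med = statistics.median(dists)
    mad = statistics.median([abs(d - med) for d in dists])
    return {i for i, d in enumerate(dists) if d > med + n_mads * mad}

# Simulated toy data, NOT real microarray values: 20 samples x 50 genes,
# with sample 0 shifted upward on every gene to act as a planted outlier.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]
X[0] = [v + 6.0 for v in X[0]]

calls = {k: flag_outliers(X, k) for k in (5, 20, 50)}
stable = set.intersection(*calls.values())
for k, flagged_k in calls.items():
    print(k, "genes ->", sorted(flagged_k))
print("flagged at every size:", sorted(stable))
```

Samples that are flagged at every size are the more convincing
candidates; samples that appear only for one particular gene set
deserve a closer look before anything is removed.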

Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
