missing value zero imputation (was Re: [BioC] Installing bioconductor from behind a firewall)

Adaikalavan Ramasamy ramasamy at cancer.org.uk
Tue Aug 17 16:20:34 CEST 2004


Please use an appropriate subject line rather than simply replying to
an unrelated thread. See the posting guide in the footer.

Please give more information about where the missing values come from
(flagging, failed spot criterion, computation) and what type of arrays
you are using (affy, cDNA).

You do not necessarily need to impute zeros, for the following reasons:


1) There are better ways of imputing. In the following paper, the
authors showed that k-NN imputation performs better than row-mean and
SVD imputation.

 Missing value estimation methods for DNA microarrays.  
 Troyanskaya O., Cantor M., Sherlock G., Brown P., Hastie T., Tibshirani
R., Botstein D., Altman R.B.  
 Bioinformatics 2001; 17(6):520-5
 PMID:11395428

There are more recent papers on missing value imputation for
microarrays. A Google or PubMed search will turn up many more.
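
To make the k-NN idea concrete, here is a minimal base-R sketch. It is
not the published KNNimpute algorithm: the function name
knn_impute_sketch is made up for illustration, it uses a plain
(unweighted) mean of the neighbours, and it assumes a genes x arrays
matrix with at least one gene that has no missing values. For real
analyses, the Bioconductor "impute" package provides impute.knn.

```r
# Hypothetical helper (not from the paper or any package): fill each
# missing entry of a genes x arrays matrix from the k most similar genes.
knn_impute_sketch <- function(x, k = 2) {
  out <- x
  # Candidate neighbours: genes with no missing values at all
  cand <- which(rowSums(is.na(x)) == 0)
  for (i in which(rowSums(is.na(x)) > 0)) {
    obs <- !is.na(x[i, ])                  # arrays observed for gene i
    # Euclidean distance to gene i, using only its observed arrays
    ref <- matrix(x[i, obs], nrow = length(cand), ncol = sum(obs),
                  byrow = TRUE)
    d <- sqrt(rowSums((x[cand, obs, drop = FALSE] - ref)^2))
    nn <- cand[order(d)[seq_len(min(k, length(cand)))]]
    # Unweighted mean of the neighbours' values in the missing arrays
    out[i, !obs] <- colMeans(x[nn, !obs, drop = FALSE])
  }
  out
}

x <- rbind(g1 = c(10, 11, 12, 13),
           g2 = c(10, 11, 12, 14),
           g3 = c(50, 52, 51, 49),
           g4 = c(10, 11, NA, 13))
imputed <- knn_impute_sketch(x, k = 2)
```

Here g4 is closest to g1 and g2, so its missing value is filled from
those two genes rather than from the unrelated g3 or from zero.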


2) Many functions in R can deal with missing values if you set the
argument na.rm = TRUE. Some drop them by default (t.test, below), and
where neither applies you can easily write a small wrapper.

 t.test( c(500,502,501, NA, NA), c(400,380,410, NA) )$p.value
  [1] 0.006880871

 t.test( c(500,502,501, 0, 0), c(400,380,410, 0) )$p.value
  [1] 0.9848868

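This also shows a case where imputation is inappropriate: replacing
the missing values with zeros drags the two group means together and
inflates the variance, so the p-value goes from about 0.007 to about
0.98.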
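
A short illustration of the na.rm argument and of a small wrapper, as
a sketch only. The helper name safe_t_p and the rule of returning NA
when a group has fewer than two observed values are my own assumptions
for illustration, not a standard function.

```r
x <- c(500, 502, 501, NA, NA)

m_default <- mean(x)                # NA: missing values propagate
m_narm    <- mean(x, na.rm = TRUE)  # missing values dropped first
s_narm    <- sd(x, na.rm = TRUE)

# Row-wise means on a genes x arrays matrix, ignoring missing values
expr <- rbind(geneA = c(500, 502, 501, NA, NA),
              geneB = c(400, 380, 410, NA, 395))
row_means <- rowMeans(expr, na.rm = TRUE)

# Hypothetical wrapper: t-test p-value, or NA when either group has
# fewer than two observed values (t.test would stop with an error).
safe_t_p <- function(a, b) {
  if (sum(!is.na(a)) < 2 || sum(!is.na(b)) < 2) return(NA_real_)
  t.test(a, b)$p.value
}

p_ok  <- safe_t_p(c(500, 502, 501, NA, NA), c(400, 380, 410, NA))
p_few <- safe_t_p(c(500, NA, NA), c(400, 380, 410))
```

Used with apply() over the rows of an expression matrix, a wrapper
like this lets a per-gene test run through without any imputation.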


3) Missing values can be informative, but this depends on how they
were generated. I commonly filter out genes with more than 70% missing
values (across arrays) to avoid spurious results. Sometimes an array
with more than, say, 50% missing values can indicate a problem with
that array.
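
The filtering described above could look roughly like this. The toy
matrix and the variable names are made up for illustration; the 70%
and 50% cut-offs are the ones mentioned above and are rules of thumb,
not fixed standards.

```r
set.seed(1)
# Toy genes x arrays matrix (made-up data)
expr <- matrix(rnorm(40), nrow = 5, ncol = 8,
               dimnames = list(paste0("gene", 1:5),
                               paste0("array", 1:8)))
expr["gene2", 1:6] <- NA     # gene2: 6 of 8 arrays missing (75%)
expr[3:5, "array8"] <- NA    # array8: 3 of 5 genes missing (60%)

gene_missing  <- rowMeans(is.na(expr))  # fraction missing per gene
array_missing <- colMeans(is.na(expr))  # fraction missing per array

# Keep genes with at most 70% missing values across arrays
filtered <- expr[gene_missing <= 0.70, , drop = FALSE]

# Flag arrays with more than 50% missing values for inspection
suspect_arrays <- names(array_missing)[array_missing > 0.50]
```

Here gene2 is dropped and array8 is flagged. Flagged arrays are worth
inspecting by hand before deciding whether to exclude them.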


In short, this depends on how many missing values you have, whether
they are informative, and what you want to use them for. I tend to
impute the data only if I plan on using a classification method that
cannot handle missing values.
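
If, after weighing the above, you still want zeros, the mechanics are
simple. This sketch assumes the values are stored as the literal text
"N/A" in a whitespace-delimited file; the inline data and object names
are made up for illustration.

```r
# Made-up example data; in practice this would be read from a file
raw <- "gene array1 array2
g1 7.2 N/A
g2 N/A 6.9"

# Tell read.table that the string "N/A" means a missing value
tab <- read.table(text = raw, header = TRUE, na.strings = "N/A",
                  row.names = 1)
expr <- as.matrix(tab)

# Replace every NA with 0 in a copy, keeping the original intact
zeroed <- expr
zeroed[is.na(zeroed)] <- 0
```

Keeping the original matrix with its NA values means you can still
tell measured zeros from substituted ones later.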


Regards, Adai.


On Tue, 2004-08-17 at 13:48, S Peri wrote:
> Dear Group,
>   In the expression values, if there is N/A do we have
> to convert them to '0' before processing it?
> If so, how can I convert N/A to '0'?
> 
> Thank you. 
> PS
> 
> 
> 
> --- Michael Hoffman <hoffman at ebi.ac.uk> wrote:
> 
> > On Mon, 16 Aug 2004, James MacDonald wrote:
> > 
> > > You can download all the packages you are
> > interested in at
> > > www.bioconductor.org, and then install using R CMD
> > install.
> > 
> > So I have to install them one by one this way?
> > There's no distribution
> > of them all?
> > 
> > Thank you,
> > -- 
> > Michael Hoffman <hoffman at ebi.ac.uk>
> > European Bioinformatics Institute
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> >
> 


