[BioC] imputing missing data for 70mer array platform, need advice

Thu Jan 18 12:20:00 CET 2007

On Wednesday 17 January 2007 19:34, Betty Gilbert wrote:
> Hello,
> If this has been discussed in the archives, my apologies but I
> couldn't find it. I am comparing two array CGH datasets, one
> generated by Nimblegen which is very complete and one generated by
> myself on a 70mer array with over 10,000 elements which has 3-4
> replicates for three species I have Nimblegen data for. I have
> calculated corrected p-values for the Nimblegen set using multtest and
> would like to do so for the 70mer set but have issues with missing
> data. For the 70mer set I have already used the program ACUITY to
> calculate p-values with t-tests (testing for variance) that filter
> out or disregard the missing data.
>
> I wanted to compare the corrected p-values after using a method to
> impute the missing data to see how different the results are from
> the filtered dataset.
>
> My question: For a 70mer array with one oligo per open reading frame,
> which method of data imputation is statistically best? I looked over
> the knn method in the package impute (mostly recommended for
> expression data) and impute.lowess in the package aCGH, which may be
> optimized for high-density arrays from what I can tell; my
> apologies if that is not the case.
>
> Does anyone have any recommendations about which method for imputing
> data I should try for a 70mer platform? Thank you for your time.

A couple of questions:

1)  Why are the data "missing"?  Is it due to quality of the spot or due to 
low intensity?  These are two related but different situations.  

2)  Why not use a package like limma, or some other package that can account 
for missing data and/or downweight questionable values?  I don't know about 
ACUITY, but it sounds like it may be doing something like that.
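
To make point 2 a bit more concrete, here is a rough sketch of what
"account for missing data and/or downweight questionable values" can mean
for a single probe. It is written in Python purely for illustration. It is
not limma's moderated t-statistic and not whatever ACUITY computes, and the
function name weighted_t and the toy numbers are made up. It assumes each
probe has a handful of replicate log-ratios, with None for a missing spot
and an optional quality weight per spot (0 drops the spot entirely).

```python
import math

def weighted_t(values, weights=None):
    """One-sample t-statistic for one probe's replicate log-ratios.

    None marks a missing replicate; a weight of 0 also drops a value.
    Returns (t, df), or (None, df) when the statistic is undefined.
    """
    if weights is None:
        weights = [1.0] * len(values)
    # Keep only the replicates that are present and carry positive weight.
    pairs = [(x, w) for x, w in zip(values, weights)
             if x is not None and w > 0]
    n = len(pairs)
    if n < 2:
        return None, 0
    sw = sum(w for _, w in pairs)
    mean = sum(w * x for x, w in pairs) / sw
    # Weighted least-squares residual variance on n - 1 degrees of freedom.
    s2 = sum(w * (x - mean) ** 2 for x, w in pairs) / (n - 1)
    if s2 == 0:
        return None, n - 1
    return mean / math.sqrt(s2 / sw), n - 1

# Toy probe: four replicates, one spot missing.
t, df = weighted_t([1.0, None, 3.0, 2.0])
```

With all weights equal to 1 this reduces to the ordinary one-sample t-test
on whatever replicates are present, so nothing has to be imputed; the
price is fewer degrees of freedom for probes with missing spots.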

Sean