[BioC] imputing missing data for 70mer array platform, need advice

J.delasHeras at ed.ac.uk
Thu Jan 18 15:23:25 CET 2007


Quoting Sean Davis <sdavis2 at mail.nih.gov>:

> On Wednesday 17 January 2007 19:34, Betty Gilbert wrote:
>> Hello,
>> If this has been discussed in the archives, my apologies, but I
>> couldn't find it. I am comparing two array CGH datasets: one
>> generated by Nimblegen, which is very complete, and one I generated
>> myself on a 70mer array with over 10,000 elements, with 3-4
>> replicates for each of the three species I have Nimblegen data for.
>> I have calculated corrected p-values for the Nimblegen set using
>> multtest and would like to do the same for the 70mer set, but I have
>> problems with missing data. For the 70mer set I have already
>> calculated p-values in the program ACUITY, using t-tests (testing
>> for variance) that filter out or disregard the missing data.
>>
>> I want to compare the corrected p-values obtained after imputing the
>> missing data with those from the filtered dataset, to see how
>> different the results are.
>>
>> My question: for a 70mer array with one oligo per open reading frame,
>> which method of data imputation is statistically best? I looked at
>> the knn method in the package impute (mostly recommended for
>> expression data) and at impute.lowess in the package aCGH, which,
>> from what I can tell, may be optimized for high-density arrays (my
>> apologies if that is not the case).
>>
>> Does anyone have any recommendations about which imputation method
>> I should try for a 70mer platform? Thank you for your time.
>
> A couple of questions:
>
> 1)  Why are the data "missing"?  Is it due to quality of the spot or due to
> low intensity?  These are two related but different situations.
>
> 2)  Why not use a package like limma, or some other package that can account
> for missing data and/or downweight questionable values?  I don't know about
> ACUITY, but it sounds like it may be doing something like that.
>
> Sean

I would second that second point.
We have Acuity here, and while it has proven useful for getting some
information very quickly, it is irritatingly inflexible. Quite frankly,
we've all gone off it. It appears to be a GeneSpring wannabe, but
doesn't quite make it, and it's far too expensive for what it is. Using
R (and in particular limma) seems like harder work at first, but it is
well worth it. And free!
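As a rough illustration of testing without imputing, the sketch below computes a one-sample t statistic per probe from whichever replicate log-ratios are actually observed. It is written in Python rather than R, and the function name `probe_t_stats` and the convention of marking missing spots with None (or NaN) are assumptions for illustration only. It is a plain t-test, not limma's method: as far as I know, limma's lmFit accepts missing values and fits each probe from its observed values, and its moderated t (eBayes) additionally shrinks the per-probe variances across probes, which helps when there are only 3-4 replicates.

```python
import math

def probe_t_stats(log_ratios, min_obs=2):
    """One-sample t statistic per probe, using only observed replicates.

    Hypothetical helper. log_ratios is a list of rows (one per probe),
    each a list of replicate log-ratios, with None or NaN marking a
    missing spot. Returns a list with one (t, df) tuple per probe, or
    None where the statistic cannot be computed.
    """
    results = []
    for row in log_ratios:
        # Drop missing replicates instead of imputing them.
        obs = [v for v in row if v is not None and not math.isnan(v)]
        n = len(obs)
        if n < max(min_obs, 2):
            results.append(None)  # too few replicates to estimate a variance
            continue
        mean = sum(obs) / n
        var = sum((v - mean) ** 2 for v in obs) / (n - 1)
        if var == 0:
            results.append(None)  # t is undefined when all replicates agree exactly
            continue
        t = mean / math.sqrt(var / n)  # H0: mean log-ratio is 0 (no copy-number change)
        results.append((t, n - 1))
    return results
```

Each (t, df) pair gives a p-value from a t distribution with df degrees of freedom (for example via a statistics library), and those p-values can then be corrected for multiple testing as was done for the Nimblegen set.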

Jose

-- 
Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
Swann Building, Mayfield Road
University of Edinburgh
Edinburgh EH9 3JR
UK
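For comparison with the testing-only approach, a simplified k-nearest-neighbour imputation in the spirit of the knn method from the impute package might look like the sketch below. It is not a port of impute.knn; the function name `knn_impute`, the None-for-missing convention and the default k are assumptions for illustration. Each probe with a missing value borrows the average of its k most similar probes on that array, with similarity measured only over arrays both probes observed.

```python
def knn_impute(data, k=3):
    """Fill missing values (None) in a probes-by-arrays matrix.

    Hypothetical helper. Distance between two probes is the mean squared
    difference over the arrays both have observed. For each missing entry,
    the average of the k nearest probes that observed that array is used.
    Entries with no usable neighbour stay None. The input is not modified.
    """
    n_cols = len(data[0])
    result = [row[:] for row in data]
    for i, row in enumerate(data):
        missing = [j for j, v in enumerate(row) if v is None]
        if not missing:
            continue
        # Distance from probe i to every other probe, over shared arrays only.
        candidates = []
        for m, other in enumerate(data):
            if m == i:
                continue
            shared = [j for j in range(n_cols)
                      if row[j] is not None and other[j] is not None]
            if not shared:
                continue
            dist = sum((row[j] - other[j]) ** 2 for j in shared) / len(shared)
            candidates.append((dist, m))
        candidates.sort()  # nearest first; ties broken by probe index
        for j in missing:
            # The k nearest neighbours that actually observed array j.
            vals = [data[m][j] for _, m in candidates if data[m][j] is not None][:k]
            if vals:
                result[i][j] = sum(vals) / len(vals)
    return result
```

One design difference worth noting: impute.knn picks the k nearest neighbours once per probe and averages whichever of them are observed, whereas this sketch takes the k nearest that observed the particular array being filled.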


