[BioC] RMA and justRMA error
aedin
aedin at jimmy.harvard.edu
Wed Aug 16 02:25:48 CEST 2006
Hi Ben
I traced my problem down a bit more. I FTP the CEL files as a .ZIP
archive. If I uncompress them using WinZip on Windows, the files are
OK. However, I was using unzip on Linux, and this seems to do some weird
and wonderful things. Although the 1st quartile, median and 3rd
quartile appear to be consistent (from the files I have checked), the
min value and the max value seem to be different. So unzip is
extracting the files without error (gzip or gunzip don't appear to be
winzip .ZIP archive friendly), but it is clearly doing some character
re-shuffling.
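The pattern described above (quartiles agree, extremes do not) is what a single corrupted value would produce. A minimal sketch with simulated intensities, not real CEL data: the 65535 bound assumes raw intensities come from a 16-bit scanner, which is an assumption and not something stated in this thread.

```r
set.seed(1)
# Simulated raw probe intensities (hypothetical data, not from a real CEL file).
clean <- round(2^rnorm(1000, mean = 8, sd = 1.5))
corrupt <- clean
corrupt[1] <- 5.6e14   # the extreme value quoted in this thread

# Quartiles barely move, but the max and the mean change drastically.
print(summary(clean))
print(summary(corrupt))

# Assumed sanity bound: 16-bit scanner intensities should not exceed 65535.
suspect <- which(corrupt > 65535)
print(suspect)
```

On real data the same check could be run on each column of the intensity matrix, so that a single bad probe is flagged even when hist and boxplot look normal.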
Sorry, this is not a BioC problem. But do you know if this is a known
problem, or if there is a parameter that I should specify?
Thanks so much for all of your help
Regards
Aedin
***unzip details. I am using FC4***
UnZip 5.51 of 22 May 2004, by Info-ZIP. Maintained by C. Spieler.
Compiled with gcc 4.0.2 20051125 (Red Hat 4.0.2-8) for Unix (Linux ELF)
on Feb 6 2006.
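One way to confirm that two extractions of the same archive really differ is to compare checksums from within R. This is only a sketch: the two temporary files below stand in for the WinZip and unzip copies of a CEL file, and `same_bytes` is a hypothetical helper name.

```r
library(tools)   # md5sum() ships with base R in the tools package

# Hypothetical helper: TRUE when both files have the same MD5 checksum.
same_bytes <- function(file_a, file_b) {
  sums <- md5sum(c(file_a, file_b))
  unname(sums[1] == sums[2])
}

# Stand-ins for the two extracted copies; the second differs by one byte.
winzip_copy <- tempfile(fileext = ".CEL")
unzip_copy  <- tempfile(fileext = ".CEL")
writeBin(as.raw(c(1, 2, 3, 4)), winzip_copy)
writeBin(as.raw(c(1, 2, 3, 5)), unzip_copy)

identical_copies <- same_bytes(winzip_copy, winzip_copy)
differing_copies <- same_bytes(winzip_copy, unzip_copy)
print(c(identical = identical_copies, differing = differing_copies))
```

A mismatch only shows that the bytes differ. It does not say which extraction is the faithful one, so the comparison is most useful alongside a checksum taken at the source.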
Ben Bolstad wrote:
>If you can send me the original CEL file, I can take a look to see if it
>is something that I consider should be a detectable parsing error.
>
>Ben
>
>
>On Tue, 2006-08-15 at 19:48 -0400, aedin wrote:
>
>
>>Thanks Ben
>>Sorry I thought the same parser would apply to each method. I found the
>>culprit file using the approach you list below.
>>
>>It was not obvious in any of the normal plots (hist, boxplot etc) as
>>only one probeset had a ridiculous value (it was 5.6 x 10^14). This
>>would completely skew a mean but not a median.
>>
>>Should I be wary of this cel file and dump it, or if it looks ok in the
>>hist, boxplot should I try to keep it? Do you know what would cause
>>this? How frequently does this occur?
>>
>>Thanks for your help
>>Aedin
>>
>>
>>Ben Bolstad wrote:
>>
>>
>>
>>>The parsing code does not necessarily detect all potential corruptions.
>>>And you will find that gcrma() will quite happily process the "corrupt"
>>>data I show below.
>>>
>>>The error itself is from the density() function. If you could isolate
>>>the array that is causing trouble using say something like this:
>>>
>>>for (i in 1:4){
>>>cat(i,"\n")
>>>blah <- bg.correct.rma(Dilution.Corrupted[,i])
>>>}
>>>
>>>Then perhaps we could look at it a little closer.
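The loop above stops at the first array that fails. A variant that keeps going and records every failing array can be sketched with tryCatch. To stay runnable without affy, `process_array` below is a hypothetical stand-in for bg.correct.rma() that only reproduces the density() call from the error message, and the toy `arrays` list is invented for illustration.

```r
# Hypothetical stand-in for bg.correct.rma(): it calls density() with the
# arguments shown in the error message and fails on a one-point input.
process_array <- function(x) {
  density(x, kernel = "epanechnikov", n = 2^14)
  invisible(TRUE)
}

# Toy "arrays": the third has a single value and triggers the error.
arrays <- list(a = c(100, 220, 340), b = c(90, 150, 400),
               c = 5, d = c(80, 120, 300))

failed <- character(0)
for (name in names(arrays)) {
  ok <- tryCatch(process_array(arrays[[name]]), error = function(e) {
    message(name, ": ", conditionMessage(e))   # report, then carry on
    FALSE
  })
  if (!ok) failed <- c(failed, name)
}
print(failed)
```

With a real AffyBatch, the call inside tryCatch would be the bg.correct.rma() call from the loop above, indexed by column.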
>>>
>>>best,
>>>
>>>Ben
>>>
>>>
>>>
>>>On Tue, 2006-08-15 at 18:13 -0400, aedin wrote:
>>>
>>>
>>>
>>>
>>>>Dear Ben
>>>>Thanks for your reply. However if the data were corrupted, surely they
>>>>would not be read by ReadAffy and gcrma?
>>>>Aedin
>>>>
>>>>Ben Bolstad wrote:
>>>>
>>>>
>>>>
>>>>
>>>>>Typically, when I have encountered others who have had this error occur
>>>>>it is because they have corrupted data. For instance this piece of
>>>>>demonstration code will generate the same error:
>>>>>
>>>>>
>>>>>library(affy);library(affydata)
>>>>>data(Dilution)
>>>>>Dilution.Corrupted <- Dilution
>>>>>pm(Dilution.Corrupted)[1,1] <- 30000000
>>>>># that is an extreme value outside the
>>>>># range of normal raw probe intensities
>>>>>
>>>>>eset <- rma(Dilution.Corrupted)
>>>>>
>>>>>
>>>>>My suggestion would be to examine things along those lines.
>>>>>
>>>>>Best,
>>>>>
>>>>>Ben
>>>>>
>>>>>On Tue, 2006-08-15 at 15:01 -0400, aedin wrote:
>>>>>
>>>>>>Dear BioC
>>>>>>I know that this error is reported a few times on the Bioc mailing list,
>>>>>>however no resolution to it is available in the archives (or at least
>>>>>>none that google and I could find). I get the same error whether I use
>>>>>>R 2.3.1 or the devel version. I enclose the devel version error.
>>>>>>
>>>>>>The CEL files are read in by ReadAffy and are processed OK by gcrma,
>>>>>>but they fall over when I try to run rma or justRMA.
>>>>>>
>>>>>>Thanks for your help
>>>>>>Aedin
>>>>>>
>>>>>>
>>>>>>
>>>>>>>df = justRMA(filenames=filenam[125:130])
>>>>>>>
>>>>>>>
>>>>>>Background correcting
>>>>>>Error in density.default(x, kernel = "epanechnikov", n = 2^14) :
>>>>>> need at least 2 points to select a bandwidth automatically
>>>>>>
>>>>>>
>>>>>>
>>>>>>>df = ReadAffy(filenames=filenam[125:130])
>>>>>>>df
>>>>>>>
>>>>>>>
>>>>>>AffyBatch object
>>>>>>size of arrays=1164x1164 features (63518 kb)
>>>>>>cdf=HG-U133_Plus_2 (54675 affyids)
>>>>>>number of samples=6
>>>>>>number of genes=54675
>>>>>>annotation=hgu133plus2
>>>>>>
>>>>>>
>>>>>>
>>>>>>>df.rma= rma(df)
>>>>>>>
>>>>>>>
>>>>>>Background correcting
>>>>>>Error in density.default(x, kernel = "epanechnikov", n = 2^14) :
>>>>>> need at least 2 points to select a bandwidth automatically
>>>>>>
>>>>>>
>>>>>>
>>>>>>>library(gcrma)
>>>>>>>df.gcrma= gcrma(df)
>>>>>>>
>>>>>>>
>>>>>>Adjusting for optical effect......Done.
>>>>>>Computing affinities.Done.
>>>>>>Adjusting for non-specific binding......Done.
>>>>>>Normalizing
>>>>>>Calculating Expression
>>>>>>
>>>>>>
>>>>>>
>>>>>>>sessionInfo()
>>>>>>>
>>>>>>>
>>>>>>R version 2.4.0 Under development (unstable) (2006-08-06 r38809)
>>>>>>i686-pc-linux-gnu
>>>>>>
>>>>>>locale:
>>>>>>LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>>>>>>
>>>>>>attached base packages:
>>>>>>[1] "splines" "tools" "methods" "stats" "graphics" "grDevices"
>>>>>>[7] "utils" "datasets" "base"
>>>>>>
>>>>>>other attached packages:
>>>>>>hgu133plus2probe hgu133plus2cdf gcrma matchprobes
>>>>>> "1.12.0" "1.12.0" "2.5.1" "1.5.0"
>>>>>> affy affyio Biobase made4
>>>>>> "1.11.6" "1.1.5" "1.11.24" "1.7.1"
>>>>>> scatterplot3d ade4
>>>>>> "0.3-24" "1.4-1"
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>:-)
>>>>
>>>>--
>>>>Aedín Culhane
>>>>Research Associate in Prof. J Quackenbush Lab
>>>>Harvard School of Public Health, Dana-Farber Cancer Institute
>>>>
>>>>
>>>>44 Binney Street, Mayer 232
>>>>Department of Biostatistics
>>>>Dana-Farber Cancer Institute
>>>>Boston, MA 02115
>>>>USA
>>>>
>>>>Phone: +1 (617) 632 2468
>>>>Fax: +1 (617) 632 5444
>>>>Email: aedin at jimmy.harvard.edu
>>>>Web URL: http://www.hsph.harvard.edu/researchers/aculhane.html
>>>>
>>
>>