[BioC] problem read.maimage("Agilent") -limma

Gordon Smyth smyth at wehi.edu.au
Wed Jul 27 02:29:30 CEST 2005


This is caused by an R bug introduced in R 2.1.1, which persists in R 2.1.1 
patched. The function read.table() is now interpreting backslashes as 
C-style special characters. This change was supposed to affect scan() only, 
but apparently has spilled over into read.table() as well.

The gene names in the AgilentFE export files contain strings such as \0, 
which are being interpreted as the null character. This not only causes the 
file read to terminate prematurely, it also crashes R itself 
when the string is printed.
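
To check whether a given export file is likely to be affected, one can look for lines containing a backslash before handing the file to read.table(). The following is only a minimal sketch: the helper name find_backslash_lines and the small demo file are made up for illustration, and it assumes that readLines() itself returns the raw text without interpreting escapes.

```r
# Hypothetical helper (not part of limma or base R): return the line
# numbers of a text file that contain at least one backslash.
find_backslash_lines <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # fixed = TRUE matches a literal backslash rather than a regex escape
  which(grepl("\\", lines, fixed = TRUE))
}

# Invented demo file standing in for an Agilent export; the third line
# contains the literal two-character sequence backslash-zero.
demo <- tempfile(fileext = ".txt")
writeLines(c("Row\tGeneName",
             "1\tAT1G01010",
             "2\tfoo\\0bar",
             "3\tAT1G01030"), demo)

hits <- find_backslash_lines(demo)
print(hits)
```

If the first reported line number is close to the row at which the read stops, the backslash is the likely culprit.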

At this moment, I can see no good workaround apart from going back to an 
earlier version of R. I will take up the problem with R core for a fix. Martin?
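
A possible stopgap, in the spirit of cleaning the files before reading them, is to write a copy of each file with the backslashes replaced and read the copy instead. This is a sketch only and has not been tested against R 2.1.1: the helper name clean_backslashes, the replacement character "/" and the demo file are all assumptions made for illustration, and it assumes readLines() and writeLines() are not themselves affected. Note that it alters the gene names.

```r
# Hypothetical helper: write a copy of `infile` to `outfile` with every
# backslash replaced by `replacement`, and return the output path.
clean_backslashes <- function(infile, outfile, replacement = "/") {
  lines <- readLines(infile, warn = FALSE)
  writeLines(gsub("\\", replacement, lines, fixed = TRUE), outfile)
  invisible(outfile)
}

# Invented demo file standing in for an Agilent export.
raw <- tempfile(fileext = ".txt")
writeLines(c("Row\tGeneName",
             "1\tAT1G01010",
             "2\tfoo\\0bar",
             "3\tAT1G01030"), raw)

cleaned <- clean_backslashes(raw, tempfile(fileext = ".txt"))

# Read the cleaned copy; quoting and comment characters are switched off
# so that quote marks and # symbols in annotations are read literally.
tab <- read.table(cleaned, header = TRUE, sep = "\t", quote = "",
                  comment.char = "", stringsAsFactors = FALSE)
print(dim(tab))
```

The cleaned copies could then be passed to read.maimages() in place of the original files.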

Gordon

At 01:14 AM 27/07/2005, Henrik Bengtsson wrote:
>Recently we detected some problems with the internal regular expression 
>libraries in R v2.1.1. One of the symptoms was that R would crash on 
>Windows, but also that regular expressions became corrupt in memory. This 
>was partly fixed in R v2.1.1 patched (2005-07-20). Note that this was 
>introduced when they went from R v2.1.0 to v2.1.1, so this might be related 
>to your problem.
>
>Cheers
>
>Henrik
>
>Naomi Altman wrote:
>>There are "\" and "#" before the offending line.  I could not find any 
>>other unusual characters in the offending line.
>>--Naomi
>>At 09:59 AM 7/26/2005, Sean Davis wrote:
>>
>>>On Jul 26, 2005, at 8:13 AM, Gordon K Smyth wrote:
>>>
>>>
>>>>>Date: Mon, 25 Jul 2005 12:22:22 -0400
>>>>>From: Naomi Altman <naomi at stat.psu.edu>
>>>>>Subject: [BioC] problem read.maimage("Agilent") -limma
>>>>>To: bioconductor at stat.math.ethz.ch
>>>>>
>>>>>I am having trouble reading the Agilent arabidopsis 22575 gene array
>>>>>using read.maimages in limma under R 2.1.1. (I don't know the limma
>>>>>version, but I just downloaded it using the R packages interface, and
>>>>>also used the update, so I presume it is the most recent.)
>>>>
>>>>You should have limma 2.0.2.
>>>>
>>>>
>>>>>Under R 2.0.1, there was no problem reading all the data in the
>>>>>arrays using:
>>>>>
>>>>>RGf=read.maimages(c("2792.txt","2793.txt","2796.txt","4507.txt",
>>>>>"4508.txt","4509.txt"),source="agilent")
>>>>>
>>>>>dim(RGf$R)
>>>>>22575     6
>>>>>
>>>>>
>>>>>But under R 2.1.1, I get:
>>>>>
>>>>>RGf=read.maimages(c("2792.txt","2793.txt","2796.txt","4507.txt",
>>>>>"4508.txt","4509.txt"),source="agilent")
>>>>>
>>>>>dim(RGf$R)
>>>>>12956    6
>>>>>
>>>>>The last line of RGf$R is all NA.
>>>>>
>>>>>The problem might be in RGf$genes.  When I try to print any row up to
>>>>>the last one, everything looks normal.  Trying to print the last row
>>>>>kills R.  The annotation for this gene appears to be exceptionally long.
>>>
>>>I have had problems with Agilent annotation files containing "special"
>>>characters that cause similar "termination" of file reading.  I would
>>>look at the annotation for quotation marks, single quotes, # symbols
>>>(no idea why this seems to affect things), and backslashes.  I
>>>typically write a little perl script to "clean" the files.  I'm not
>>>sure why this should vary from one version to the next, though.
>>>
>>>Sean
>>
>>Naomi S. Altman                                814-865-3791 (voice)
>>Associate Professor
>>Bioinformatics Consulting Center
>>Dept. of Statistics                              814-863-7114 (fax)
>>Penn State University                         814-865-1348 (Statistics)
>>University Park, PA 16802-2111
>>_______________________________________________
>>Bioconductor mailing list
>>Bioconductor at stat.math.ethz.ch
>>https://stat.ethz.ch/mailman/listinfo/bioconductor
>


