[Bioc-devel] problem with matchprobes:getProbeDataAffy() -

Francesco Ferrari ferrari.francesco at unimore.it
Mon Aug 25 16:30:45 CEST 2008


A few days ago I received a couple of bug reports about an error
message that occurs when using the gcrma preprocessing procedure with
custom probeset definitions (the gahgu133acdf and gahgu133aprobe
packages).

The source of the problem is a single probe sequence missing from the
"gahgu133aprobe" environment.
I also verified that the same problem occurs in the other "probe"
packages with custom probeset definitions that I currently maintain,
i.e. gahgu133bprobe, gahgu133plus2probe, etc.


After carefully debugging the package generation procedure, I found
the likely source of this problem in the function "getProbeDataAffy"
from the matchprobes package. This function reads the probe table from
a TXT file to generate the "probetable" object, which is subsequently
used to create the "probe" package.


Within the function code, the following lines change the "datafile"
argument from a character string, i.e. the path to the file, to a
"connection" to the file itself:

    if (missing(datafile)) {
        datafile <- paste(arraytype, "_probe_tab", sep = "")
    } else {
        if (is(datafile, "character")) {
            datafile <- file(datafile, "r")
            on.exit(close(datafile))
        }
    }

Then, a few lines below, the connection to the file is used first to
read the header line and then to read the remaining part of the data:

    head <- scan(datafile, sep = "\t", quiet = TRUE, multi.line = FALSE,
        nlines = 1, what = "character")
    dat <- scan(datafile, sep = "\t", quiet = TRUE, multi.line = FALSE,
        what = what, skip = 1)


The second call to scan() misses one of the lines of data, so the
resulting object is one line short.
The problem can be solved simply by using the file name instead of a
connection to access the file. I temporarily worked around it by
commenting out the initial part of the function code as follows:

    if (missing(datafile)) {
        datafile <- paste(arraytype, "_probe_tab", sep = "")
#    } else {
#        if (is(datafile, "character")) {
#            datafile <- file(datafile, "r")
#            on.exit(close(datafile))
#        }
    }
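
To illustrate why the workaround behaves as intended, here is a minimal
sketch on a toy tab-delimited file. The file contents and the names
"tmp", "hdr" and "what" are made up for the example and are not taken
from matchprobes. When scan() is given a file path, each call opens the
file afresh, so skip = 1 drops only the header line.

```r
# Hypothetical toy file: one header line followed by three data lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tseq", "p1\tACGT", "p2\tTTGA", "p3\tGGCC"), tmp)

# Column template for the data lines (assumed layout, for illustration).
what <- list(id = "", seq = "")

# With a file path, every scan() call starts again from line 1.
hdr <- scan(tmp, sep = "\t", quiet = TRUE, multi.line = FALSE,
            nlines = 1, what = "character")
dat <- scan(tmp, sep = "\t", quiet = TRUE, multi.line = FALSE,
            what = what, skip = 1)

# All three data lines should be present.
print(length(dat$id))

unlink(tmp)
```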


I think the problem arises because, when a connection is used instead
of the file path, the connection itself "remembers" the last line that
was read, so the second call to scan() skips an additional line
containing meaningful data.
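
The suspected behaviour can be reproduced with a small sketch on a toy
file. The file contents and the names "tmp", "con", "dat_skip" and
"dat_noskip" are hypothetical and only serve the illustration. The
second part shows one possible alternative, under the assumption that
the connection stays open between the two calls: omit skip = 1, since
the header has already been consumed.

```r
# Hypothetical toy file: one header line followed by three data lines.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tseq", "p1\tACGT", "p2\tTTGA", "p3\tGGCC"), tmp)
what <- list(id = "", seq = "")

# Case 1: open connection, header read first, then skip = 1.
# The connection is already positioned after the header, so skip = 1
# discards the first data line as well.
con <- file(tmp, "r")
hdr <- scan(con, sep = "\t", quiet = TRUE, multi.line = FALSE,
            nlines = 1, what = "character")
dat_skip <- scan(con, sep = "\t", quiet = TRUE, multi.line = FALSE,
                 what = what, skip = 1)
close(con)

# Case 2: same connection-based approach, but without skip.
con <- file(tmp, "r")
hdr <- scan(con, sep = "\t", quiet = TRUE, multi.line = FALSE,
            nlines = 1, what = "character")
dat_noskip <- scan(con, sep = "\t", quiet = TRUE, multi.line = FALSE,
                   what = what)
close(con)

print(dat_skip$id)    # first data line "p1" is lost
print(dat_noskip$id)  # all three data lines are kept

unlink(tmp)
```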

What do you think about this problem and the proposed "diagnosis" and solution?

All the best,
Francesco Ferrari




> sessionInfo()
R version 2.7.1 (2008-06-23)
i686-pc-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=it_IT.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=it_IT.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] matchprobes_1.12.0   affy_1.18.2          preprocessCore_1.2.0
[4] affyio_1.8.0         Biobase_2.0.1


