[BioC] limma reading genpix files

Tue Apr 13 19:53:52 CEST 2004

Hello all,

I've just started plowing thru some two-color arrays and ran across the 
same bug.

The problem I found is due to an efficiency hack that doesn't always 
work: read.maimages assumes that all the genepix files it reads have the 
same number of header records. That is, rather than taking the number of 
header records directly from each .gpr file, it counts the number of 
lines until the header line in the *first file only*, and then assumes 
that all the rest of the files have the same number of records.

If not, then the skip= parameter is incorrect, read.table() starts in 
the wrong place, and results are, as they say, unpredictable.

The "right" way to do this is to read the first lines of each .gpr file, 
get the number of header records, and then use read.table with the right 
number of header records. A quick hack is just to search for the header 
in each file.

I've attached a modification to read.maimages() that does exactly that, 
at least the three times I've tried it :-)

Cheers,

Dave Nelson

Gordon K Smyth wrote:

>>hello all limma exprts,
>>
>>I have started using limma to analyse some spotted microarray data  but
>>have run into problems trying to read the data files in.
>> From the limma guide, having specified the "files" ,  then using the
>>command:
>>  RG<-read.maimages(files, source="genepix")
>>
>>I get:
>>
>>Read 7863scan3.gpr
>>Error in read.table(fullname, skip = skip, header = TRUE, sep = sep,
>>as.is = TRUE,  :
>>	more columns than column names
> 
> 
> Something is wrong with your second data file.  Have you looked at it to
> check?
> 
> 
>>I have 4 .gpr files that were generated from genepix version 3 with
>>7863scan3 being the first
>>
>>I also tried specifying columns as mentioned in the limma guide:
>>
>>RG<-read.maimages(files, columns=list(Rf="F635 Median", Gf="F532
>>Median", Rb="B635 Median", Gb ="532 Median"))
>>
>>and got
>>
>>Error in "[.data.frame"(obj, , columns$Gb) :
>>	undefined columns selected.
> 
> 
> The error message seems to me to be not impossible to interpret.  It tells
> you that you've tried to specify a column that doesn't exist and that the
> offending column is Gb.  If you look at your own code you'll see an
> obvious typo.
> 
> Gordon
> 
> 
>>I have no idea how to interpret these error messages and have to say
>>that my forays into BioConductor have been a frequent exercise in
>>frustration because of constant unintelligible error messages. Could
>>some one please help me in solving these issues.
>>
>>I'm running R 1.8.0 on MacOS X and recently updated limma (1.3?)
>>
>>thanks
>>Bryce
>>	[[alternative HTML version deleted]]
> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> 

-------------- next part --------------
###
### A hack to read genepix .gpr files with a differ number of header lines in each file.
###
### Does not work with other formats. SHOULD rewrite to take the header count from the second
### line of the file.
###

my.read.maimages <- function (files, source = "spot", path = NULL, ext = NULL, names = NULL, 
                           columns = NULL, wt.fun = NULL, verbose = TRUE, sep = "\t", 
                           quote = "\"", ...) 
{
    if (missing(files)) {
        if (missing(ext)) 
            stop("Must specify input files")
        else {
            extregex <- paste("\\.", ext, "$", sep = "")
            files <- dir(path = ifelse(is.null(path), ".", path), 
                pattern = extregex)
            files <- sub(extregex, "", files)
        }
    }
    if (!missing(source) && !missing(columns)) 
        stop("Cannot specify both source and columns")
    source <- match.arg(source, c("arrayvision", "genepix", "imagene", 
        "quantarray", "smd", "spot", "spot.close.open"))
    if (source == "imagene") 
        return(read.imagene(files = files, path = path, ext = ext, 
            names = names, columns = columns, wt.fun = wt.fun, 
            verbose = verbose, sep = sep, quote = quote, ...))
    slides <- as.vector(as.character(files))
    if (!is.null(ext)) 
        slides <- paste(slides, ext, sep = ".")
    nslides <- length(slides)
    if (is.null(names)) 
        names <- removeExt(files)
    if (is.null(columns)) 
        columns <- switch(source, smd = list(Gf = "CH1I_MEAN", 
            Gb = "CH1B_MEDIAN", Rf = "CH2I_MEAN", Rb = "CH2B_MEDIAN"), 
            spot = list(Rf = "Rmean", Gf = "Gmean", Rb = "morphR", 
                Gb = "morphG"), spot.close.open = list(Rf = "Rmean", 
                Gf = "Gmean", Rb = "morphR.close.open", Gb = "morphG.close.open"), 
            genepix = list(Rf = "F635 Mean", Gf = "F532 Mean", 
                Rb = "B635 Median", Gb = "B532 Median"), quantarray = list(Rf = "ch2 Intensity", 
                Gf = "ch1 Intensity", Rb = "ch2 Background", 
                Gb = "ch1 Background"))
    fullname <- slides[1]
    if (!is.null(path)) 
        fullname <- file.path(path, fullname)
    if (source == "quantarray") {
        firstfield <- scan(fullname, what = "", sep = "\t", flush = TRUE, 
            quiet = TRUE, blank.lines.skip = FALSE, multi.line = FALSE)
        skip <- grep("Begin Data", firstfield)
        if (length(skip) == 0) 
            stop("Cannot find \"Begin Data\" in image output file")
        nspots <- grep("End Data", firstfield) - skip - 2
        obj <- read.table(fullname, skip = skip, header = TRUE, 
            sep = sep, quote = quote, as.is = TRUE, check.names = FALSE, 
            comment.char = "", nrows = nspots, ...)
    }
    else if (source == "arrayvision") {
        skip <- 1
        cn <- scan(fullname, what = "", sep = sep, quote = quote, 
            skip = 1, nlines = 1, quiet = TRUE)
        fg <- grep("^Median Dens - RFU", cn)
        if (length(fg) != 2) 
            stop(paste("Cannot find foreground columns in", fullname))
        bg <- grep("Bkgd", cn)
        if (length(fg) != 2) 
            stop(paste("Cannot find background columns in", fullname))
        columns <- list(Rf = fg[1], Rb = bg[1], Gf = fg[2], Gb = bg[2])
        obj <- read.table(fullname, skip = skip, header = TRUE, 
            sep = sep, quote = quote, as.is = TRUE, check.names = FALSE, 
            comment.char = "", ...)
        nspots <- nrow(obj)
    }
    else {
        skip <- grep(columns$Rf, readLines(fullname, n = 80)) - 1
        if (length(skip) == 0) 
            stop(paste("Cannot find column heading in image output file", fullname))
        else skip <- skip[1]
        if (verbose)
                cat("Reading", fullname, "after skipping", skip, "...")

        obj <- read.table(fullname, skip = skip, header = TRUE, 
            sep = sep, quote = quote, as.is = TRUE, check.names = FALSE, 
            comment.char = "", ...)
        if (verbose)
            cat("\tDone.\n")
        nspots <- nrow(obj)
    }
    Y <- matrix(0, nspots, nslides)
    colnames(Y) <- names
    RG <- list(R = Y, G = Y, Rb = Y, Gb = Y)
    if (source == "smd") {
        anncol <- grep(columns$Gf, colnames(obj)) - 1
        if (anncol > 0) 
            RG$genes <- data.frame(obj[, 1:anncol])
    }
    if (!is.null(wt.fun)) 
        RG$weights <- Y
    for (i in 1:nslides) {
        if (i > 1) {
            fullname <- slides[i]
            if (!is.null(path)) 
                fullname <- file.path(path, fullname)

            ##
            ## HACK HACK. Works for genepix files, but others???
            ##
            skip <- grep(columns$Rf, readLines(fullname, n = 80)) - 1
            if (length(skip) == 0) 
                stop(paste("Cannot find column heading in image output file", fullname))
            else skip <- skip[1]
            if (verbose)
                cat("Reading", fullname, "after skipping", skip, "...")

            obj <- read.table(fullname, skip = skip, header = TRUE, 
                sep = sep, as.is = TRUE, quote = quote, check.names = FALSE, 
                comment.char = "", nrows = nspots, ...)
            cat("\tDone.\n")
        }
        RG$R[, i] <- obj[, columns$Rf]
        RG$G[, i] <- obj[, columns$Gf]
        RG$Rb[, i] <- obj[, columns$Rb]
        RG$Gb[, i] <- obj[, columns$Gb]
        if (!is.null(wt.fun)) 
            RG$weights[, i] <- wt.fun(obj)
#        if (verbose) 
#            cat(paste("Read", fullname, "\n"))
    }
    new("RGList", RG)
}