[BioC] [Fwd: batch info for cellHTS]

Florian Hahne fhahne at fhcrc.org
Thu Oct 23 00:09:08 CEST 2008


Hi Yan,
the idea was that the user constructs the batch array manually and 
assigns it to the batch slot using the accessor method, e.g.
bt <- array(rep(1:2, each=5, dim=c(5,2,1))
batch(x) <- bt

I agree that it would be useful to have that functionality directly in 
the import functions. The only natural place where the batch information 
could go is in the platelist file. In the plateconf file, we don't have 
the notion of  plate replicates or samples any more.
Attached you find a slightly modified readPlateList function which 
evaluates an (optional) "Batch" column in the platelist file.
Hope that is the functionality you where looking for. Please let me know 
if you have further input.
Thanks,
Florian

PS: I forwarded this conversation to the mailing list. Others might 
benefit from it...

readPlateList <- function(filename,
                          path=dirname(filename),
                          name,
                          importFun,
                          verbose=interactive())
{
    file <- basename(filename)
    dfiles <- dir(path)
 
    if(!(is.character(path)&&length(path)==1))
        stop("'path' must be character of length 1")
 
    pd <- read.table(file.path(path, file), sep="\t", header=TRUE, 
as.is=TRUE)
 
    checkColumns(pd, file, mandatory=c("Filename", "Plate", "Replicate"),
                 numeric=c("Plate", "Replicate", "Channel", "Batch"))
 
    ## consistency check for "importFun"
    if (!missing(importFun)) {
        if (!is(importFun, "function"))
            stop("'importFun' should be a function to use to read the 
raw data files")
    } else {
        ## default function (compatible with the file format of the 
plate reader)
        importFun <- function(f) {
            txt <- readLines(f, warn=FALSE)
            sp <- strsplit(txt, "\t")
            well <- sapply(sp, "[", 2)
            val <- sapply(sp, "[", 3)
            out <- list(data.frame(well=I(well), val=as.numeric(val)), 
txt=I(txt))
            return(out)
        }
    }
 
    ## check if the data files are in the given directory
    a <- unlist(sapply(pd$Filename, function(z) grep(z, dfiles, 
ignore.case=TRUE)))
    if (length(a)==0)
        stop(sprintf("None of the files were found in the given 'path': 
%s", path))
 
    f <- file.path(path, dfiles[a])

    ## check if 'importFun' gives the output in the desired form
    aux <- importFun(f[1])
    if (which(unlist(lapply(aux, is, "data.frame"))) != 1 |
        !all(c("val", "well") %in% names(aux[[1]])) | length(aux)!=2)
        stop("The output of 'importFun' must be a list with 2 
components;\n",
             "the first component should be a 'data.frame' with slots 
'well' and 'val'.")

    ## auto-determine the plate format
    well <- as.character(importFun(f[1])[[1]]$well)
    let <- substr(well, 1, 1)
    num <- substr(well, 2, 3)
    let <- match(let, LETTERS)
    num <- as.integer(num)
    if(any(is.na(let))||any(is.na(num)))
        stop(sprintf("Malformated column 'well' in input file %s", f[1]))

    dimPlate <- c(nrow=max(let), ncol=max(num))
    nrWell <- prod(dimPlate)

    if(verbose)
        cat(sprintf("%s: found data in %d x %d (%d well) format.\n", name,
                    dimPlate[1], dimPlate[2], nrWell))
   
    ## Should we check whether these are true?
    ##     "96"  = c(nrow=8, ncol=12),
    ##     "384" = c(nrow=16, ncol=24),

    nrRep <- max(pd$Replicate)
    nrPlate <- max(pd$Plate)

    combo <- paste(pd$Plate, pd$Replicate)

    ## Channel: if not given, this implies that there is just one
    if("Channel" %in% colnames(pd)) {
        nrChannel <- max(pd$Channel)
        channel <- pd$Channel
        combo <- paste(combo, pd$Channel)
    } else {
        nrChannel <- 1L
        channel <- rep(1L, nrow(pd))
        pd$Channel <- channel   
    }

    whDup <- which(duplicated(combo))
    if(length(whDup)>0L) {
        idx <- whDup[1:min(5L, length(whDup))]
        msg <- paste("The following rows are duplicated in the plateList 
table:\n",
                     "\tPlate Replicate Channel\n", "\t",
                     paste(idx, combo[idx], sep="\t", collapse="\n\t"),
                     if(length(whDup)>5) sprintf("\n\t...and %d more.\n",
                                                 length(whDup)-5), "\n", 
sep="")
        stop(msg)
    }

    xraw <- array(NA_real_, dim=c(nrWell, nrPlate, nrRep, nrChannel))
    intensityFiles <- vector(mode="list", length=nrow(pd))
    names(intensityFiles) <- pd[, "Filename"]

    status <- character(nrow(pd))

    for(i in seq_len(nrow(pd))) {
        if(verbose)
            cat("\rReading ", i, ": ", pd$Filename[i], sep="")

        ff <- grep(pd[i, "Filename"], dfiles, ignore.case=TRUE)

        if (length(ff)!=1) {
            f <- file.path(path, pd[i, "Filename"])
            status[i] <- sprintf("File not found: %s", f)
        } else {
            f <- file.path(path, dfiles[ff])
            names(intensityFiles)[i] <- dfiles[ff]
            status[i] <- tryCatch({
                out <- importFun(f)
                pos <- convertWellCoordinates(out[[1]]$well, dimPlate)$num
                intensityFiles[[i]] <- out[[2]]
                xraw[pos, pd$Plate[i], pd$Replicate[i], channel[i]] <- 
out[[1]]$val
                "OK"
            }, warning=function(e) paste(class(e)[1], e$message, sep=": "),
                                  error=function(e)   paste(class(e)[1], 
e$message, sep=": ")
                                  ) ## tryCatch
           
        } ## else
    } ## for

    if(verbose)
        cat("\rRead", nrow(pd), "plates.             \n\n")

    ## ----  Store the data as a "cellHTS" object ----
    ## arrange the assayData slot:
    dat <- lapply(seq_len(nrChannel), function(ch)
                  matrix(xraw[,,,ch], ncol=nrRep, nrow=nrWell*nrPlate))
    names(dat) <- paste("ch", seq_len(nrChannel), sep="")
   
    adata <- do.call("assayDataNew", c(storage.mode="lockedEnvironment", 
dat))
   
    ## arrange the phenoData slot:
    pdata <- new("AnnotatedDataFrame",
                 data <- data.frame(replicate=seq_len(nrRep),
                                    assay=rep(name, nrRep),
                                    stringsAsFactors=FALSE),
                 varMetadata=data.frame(labelDescription=c("Replicate 
number",
                                                           "Biological 
assay"),
                                        channel=factor(rep("_ALL_", 2L),
                                                       
levels=c(names(dat), "_ALL_")),
                                        row.names=c("replicate", "assay"),
                                        stringsAsFactors=FALSE))

    ## arrange the featureData slot:
    well <- convertWellCoordinates(seq_len(nrWell), pdim=dimPlate)$letnum
    fdata <- new("AnnotatedDataFrame",
                 data <- data.frame(plate=rep(seq_len(nrPlate), 
each=nrWell),
                                    well=rep(well, nrPlate),
                                    controlStatus=factor(rep("unknown", 
nrWell*nrPlate)),
                                    stringsAsFactors=FALSE),
                 varMetadata=data.frame(labelDescription=c("Plate 
number", "Well ID",
                                                           "Well 
annotation"),
                                        row.names=c("plate", "well", 
"controlStatus"),
                                        stringsAsFactors=FALSE))
    res <- new("cellHTS",
               assayData=adata,
               phenoData=pdata,
               featureData=fdata,
               plateList=cbind(pd[,1L,drop=FALSE], status=I(status), 
pd[,-1L,drop=FALSE]),
               intensityFiles=intensityFiles)

    ## if there is a batch column in the platelist file we want to import it
    if("Batch" %in% colnames(pd)){
        bat <- pd$Batch[order(pd$Replicate, pd$Channel)]
        dim(bat) <- c(max(plate(res)), ncol(res), length(channelNames(res)))
        res at batch <- bat
    }
   
    ## output the possible errors that were encountered along the way:
    whHadProbs <- which(status!="OK")
    if(length(whHadProbs)>0 & verbose) {
        idx <- whHadProbs[1:min(5, length(whHadProbs))]
        msg <- paste("Please check the following problems encountered 
while reading the data:\n",
                     "\tFilename \t Error\n", "\t",
                     paste(plateList(res)$Filename[idx], status[idx], 
sep="\t", collapse="\n\t"),
                     if(length(whHadProbs)>5) sprintf("\n\t...and %d 
more.\n",
                                                      
length(whHadProbs)-5), "\n", sep="")
        stop(msg)
    }

    return(res)
}



Yan Zhou wrote:
> Dear Florian,
>
> I understand the different meaning of batch from the 2 cellHTS 
> packages. I just don't know how to add the "batch" slot.(we have a 
> large screen with screen done on different day,we want to use the 
> variance adjustment by batch function). I tried to add a "batch" 
> column in the plate configuration file, but doesn't seem to be taken 
> into cellHTS2 object. then I tried to add "batch" column in the plate 
> list file , still didn't do anything. In another word, I couldn't 
> figure out how to add the batch slot to the cellHTS object. please 
> give more detail. Thank you so much!
>
> yan
>
> Florian Hahne wrote:
>
>> Hi Yan,
>> Ligia forwarded your mail to me.
>>
>> The batch concept is a little bit different in cellHTS2. Basically, 
>> we separated the ability to included several plate configurations and 
>> the batch-specific parameter estimation (e.g. in the normalizePlate 
>> function). For the former, you can use a regular expression syntax in 
>> your plate configuration file, while the latter has to be added 
>> manually into the new 'batch' slot. All of this is explained in much 
>> more detail in the package vignette: cellHTS2 - Main vignette 
>> (complete version): End-to-end analysis of cell-based screens
>>
>> Let me know if this doesn't get you anywhere.
>>
>> Florian
>>
>> ligia at ebi.ac.uk wrote:
>>
>>> Hi Florian,
>>>
>>> Hope you're doing fine.
>>> Could you take care of this for me?
>>> Cheers,
>>> Ligia
>>>
>>>
>>> ------------------------- Original Message ----------------------------
>>> Subject: batch info for cellHTS
>>> From:    "Yan Zhou" <Yan.Zhou at fccc.edu>
>>> Date:    Fri, October 17, 2008 19:14
>>> To:      ligia at ebi.ac.uk
>>> -------------------------------------------------------------------------- 
>>>
>>>
>>> Dear Ligia,
>>>
>>> I'm using the cellHTS2 package for HTS data analysis. For the old
>>> cellHTS package, I knew how to incorporate the batch information. But
>>> for the new cellHTS2 package, I couldn't have it done right. I attached
>>> my plate configuration file and also plate list file with this email. I
>>> was wondering whether you would have time to help me take a look, and
>>> point me to the right directions. Thanks a lot for your time and 
>>> kind help.
>>>
>>> yan
>>>   
>>
>>
>>
>


-- 
Florian Hahne, PhD
Computational Biology Program
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-3148
fhahne at fhcrc.org



More information about the Bioconductor mailing list