[BioC] continued problems with ReadAffy

Tue Feb 7 17:22:19 CET 2006

Hi Mark,

I recently fixed read.affybatch() to conform to the new object checking, 
but the fix hasn't propagated to the download server yet.

As a temporary fix, you can load affy and then paste the following 
function into your R session (or if you are set up to build from source, 
just replace this function in read.affybatch.R in the source package and 
build the package).

read.affybatch <- function(..., filenames=character(0),
                            phenoData=new("phenoData"),
                            description=NULL,
                            notes="",
                            compress = getOption("BioC")$affy$compress.cel,
                            rm.mask = FALSE, rm.outliers=FALSE, 
rm.extra=FALSE,
                            verbose = FALSE,sd=FALSE, cdfname = NULL) {

   auxnames <- as.list(substitute(list(...)))[-1]
   filenames <- .Primitive("c")(filenames, auxnames)

   n <- length(filenames)

   ## error if no file name !
   if (n == 0)
     stop("No file name given !")

   pdata <- pData(phenoData)
   ##try to read sample names form phenoData. if not there use CEL filenames
   if(dim(pdata)[1] != n) {
     ##if empty pdata filename are samplenames
     warning("Incompatible phenoData object. Created a new one.\n")

     samplenames <- sub("^/?([^/]*/)*", "", unlist(filenames), 
extended=TRUE)
     pdata <- data.frame(sample=1:n, row.names=samplenames)
     phenoData <- 
new("phenoData",pData=pdata,varLabels=list(sample="arbitrary numbering"))
   }
   else samplenames <- rownames(pdata)

   if (is.null(description))
     {
       description <- new("MIAME")
       description at preprocessing$filenames <- filenames
       description at preprocessing$affyversion <- 
library(help=affy)$info[[2]][[2]][2]
     }
   ## read the first file to see what we have
   if (verbose) cat(1, "reading",filenames[[1]],"...")

   headdetails <- .Call("ReadHeader",filenames[[1]],compress, 
PACKAGE="affy")

   #print(headdetails)

   ##now we use the length
   dim.intensity <- headdetails[[2]]   ##dim(intensity(cel))
   ##and the cdfname as ref
   ref.cdfName <- headdetails[[1]]   #cel at cdfName

   ## allow for non-standard cdfs
   if(is.null(cdfname))
     cdfname <- ref.cdfName

   if (verbose)
     cat(paste("instantiating an AffyBatch (intensity a ", 
prod(dim.intensity), "x", length(filenames), " matrix)...", sep=""))

   if (verbose)
     cat("done.\n")

   ## Change sampleNames to be consistent with row.names of phenoData object

   exprs <- .Call("read_abatch",filenames,compress, rm.mask,
                rm.outliers, rm.extra, ref.cdfName,
                dim.intensity,verbose, PACKAGE="affy")
   colnames(exprs) <- samplenames

   #### this is where the code changes from the original read.affybatch.
   #### what we will do here is read in from the 1st to the nth CEL file
   if (!sd){
     return(new("AffyBatch",
                exprs  = exprs,
                ##se.exprs = array(NaN, dim=dim.sd),
                cdfName    = cdfname,   ##cel at cdfName,
                phenoData  = phenoData,
                nrow       = dim.intensity[1],
                ncol       = dim.intensity[2],
                annotation = cleancdfname(cdfname, addcdf=FALSE),
                description= description,
                notes      = notes))
   } else {
     return(new("AffyBatch",
                exprs  = exprs,
                se.exprs = 
.Call("read_abatch_stddev",filenames,compress, rm.mask,
                rm.outliers, rm.extra, ref.cdfName,
                dim.intensity,verbose, PACKAGE="affy"),
                cdfName    = cdfname,   ##cel at cdfName,
                phenoData  = phenoData,
                nrow       = dim.intensity[1],
                ncol       = dim.intensity[2],
                annotation = cleancdfname(cdfname, addcdf=FALSE),
                description= description,
                notes      = notes))
   }

}

HTH,

Jim

Kimpel, Mark William wrote:
> After thinking that I had the problem solved, I am continuing to have
> problems with "ReadAffy". The default options of the function are still
> leading to the error " sampleNames different from names of phenoData
> rows" and I am unable to figure out its cause. See below for output and
> sessionInfo(). It appears to me that the samplenames and rownames of
> phenodata are identical and I have used setdiff() to verify this (output
> is character(0) ).
> 
> Ideas?  Thanks, Mark
> 
> 
>>ReadAffy()
> 
> Error in validObject(.Object) : invalid class "AffyBatch" object:
> sampleNames different from names of phenoData rows
> 
> Enter a frame number, or 0 to exit   
> 
> 1: ReadAffy()
> 2: read.affybatch(filenames = l$filenames, phenoData = l$phenoData,
> description = l$description, notes = notes, compress = compress, r
> 3: new("AffyBatch", exprs = .Call("read_abatch", filenames, compress,
> rm.mask, rm.outliers, rm.extra, ref.cdfName, dim.intensity, verb
> 4: initialize(value, ...)
> 5: initialize(value, ...)
> 6: validObject(.Object)
> 
> Selection: 2
> Called from: eval(expr, envir, enclos)
> Browse[1]> ls()
>  [1] "auxnames"      "cdfname"       "compress"      "description"
> "dim.intensity" "filenames"     "headdetails"  
>  [8] "n"             "notes"         "pdata"         "phenoData"
> "ref.cdfName"   "rm.extra"      "rm.mask"      
> [15] "rm.outliers"   "samplenames"   "sd"            "verbose"      
> Browse[1]> samplenames
>  [1] "BB01R018.CEL" "BB01R019.CEL" "BB01R020.CEL" "BB01R021.CEL"
> "BB01R022.CEL" "BB01R023.CEL" "BB01R024.CEL" "BB01R025.CEL"
>  [9] "BB01R026.CEL" "BB01R027.CEL" "BB01R029.CEL" "BB01R030.CEL"
> "BB01R031.CEL" "BB01R032.CEL" "BB01R033.CEL" "BB01R034.CEL"
> Browse[1]> pdata
>              sample
> BB01R018.CEL      1
> BB01R019.CEL      2
> BB01R020.CEL      3
> BB01R021.CEL      4
> BB01R022.CEL      5
> BB01R023.CEL      6
> BB01R024.CEL      7
> BB01R025.CEL      8
> BB01R026.CEL      9
> BB01R027.CEL     10
> BB01R029.CEL     11
> BB01R030.CEL     12
> BB01R031.CEL     13
> BB01R032.CEL     14
> BB01R033.CEL     15
> BB01R034.CEL     16
> Browse[1]> c
> 
> Enter a frame number, or 0 to exit   
> 
> 1: ReadAffy()
> 2: read.affybatch(filenames = l$filenames, phenoData = l$phenoData,
> description = l$description, notes = notes, compress = compress, r
> 3: new("AffyBatch", exprs = .Call("read_abatch", filenames, compress,
> rm.mask, rm.outliers, rm.extra, ref.cdfName, dim.intensity, verb
> 4: initialize(value, ...)
> 5: initialize(value, ...)
> 6: validObject(.Object)
> 
> Selection: 0
> 
>>sessionInfo()
> 
> Version 2.3.0 Under development (unstable) (2006-01-01 r36947) 
> i386-pc-mingw32 
> 
> attached base packages:
> [1] "tools"     "methods"   "stats"     "graphics"  "grDevices" "utils"
> "datasets"  "base"     
> 
> other attached packages:
>    affy Biobase RWinEdt 
> "1.9.7" "1.9.6" "1.7-4"
> 
> Mark W. Kimpel MD 
> 
>  
> 
> Official Business Address:
> 
>  
> 
> Department of Psychiatry
> 
> Indiana University School of Medicine
> 
> Biotechnology, Research, & Training Center
> 
> 1345 W. 16th Street
> 
> Indianapolis, IN  46202
> 
>  
> 
> Preferred Mailing Address:
> 
>  
> 
> 15032 Hunter Court
> 
> Westfield, IN  46074
> 
>  
> 
> (317) 490-5129 Home, Work, & Mobile
> 
> 1-(317)-536-2730 FAX
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

-- 
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623