[BioC] beadarray readIllumina suggestions

Mon Jun 25 15:07:57 CEST 2007

I am reading single channel bead level data using the readIllumina function. 
The docs indicate that there is a path parameter:

path 	character string specifying the location of files to be read by the 
function

Calling the function with a path argument results in an error:

readIllumina(path = "/path/to/data", txtType = ".txt")
Error in strtrim(x, width) : invalid 'width' argument
> traceback()
4: strtrim(xyFiles, nchar(xyFiles) - 4)
3: as.vector(y)
2: intersect(strtrim(GImages, nchar(GImages) - 8), strtrim(xyFiles, 
       nchar(xyFiles) - 4))
1: readIllumina(path = "/path/to/data", txtType = ".txt")

Another post mentions that readIllumina expects the files to be in the 
working directory. This appears to be the case, because this line in the 
function relies on the default path of dir():

GImages = dir(pattern = "_Grn.tif")
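
To illustrate the difference, here is a minimal sketch that uses throwaway 
directories. The directory and file names are made up for the example and 
are not taken from beadarray:

```r
# Hypothetical data directory holding one green-channel image.
data_dir <- file.path(tempdir(), "bead_demo_data")
dir.create(data_dir, showWarnings = FALSE)
invisible(file.create(file.path(data_dir, "1234_A_Grn.tif")))

# An empty directory standing in for a working directory without data.
work_dir <- file.path(tempdir(), "bead_demo_work")
dir.create(work_dir, showWarnings = FALSE)
old_wd <- setwd(work_dir)

in_wd   <- dir(pattern = "_Grn\\.tif$")                   # searches getwd()
in_path <- dir(path = data_dir, pattern = "_Grn\\.tif$")  # searches data_dir

setwd(old_wd)  # restore the original working directory
```

Only the second call finds the image. Note also that the pattern is a 
regular expression, so escaping the dot and anchoring the end avoids 
accidental matches.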

At first I took this to be a documentation bug, but in fact the path argument 
is honoured for loading the csv files:

    file = csv_files[i]
    if (!is.null(path)) 
        file = file.path(path, file)

and the annotation (.opa file):

    if (!is.null(path)) 
        annoFile = file.path(path, annoFile)

but apparently not for loading the metrics file:

 if (metrics) {
         metrics = dir(pattern = metricsFile)

My suggestion is to make the behaviour consistent across all the data, i.e. to 
honour a path argument for tif and metrics files.
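
As a rough sketch of what I mean, something along these lines would resolve 
every file type against the same directory. The helper names 
(find_bead_files, has_suffix) and the default metrics file name are my own 
assumptions, not the package's code:

```r
# TRUE where a file name ends with the given literal suffix.
has_suffix <- function(x, suffix) {
  n <- nchar(x)
  k <- nchar(suffix)
  n >= k & substr(x, n - k + 1, n) == suffix
}

# Hypothetical helper: look up images, text files and the metrics file
# in one directory, falling back to the working directory if path is NULL.
find_bead_files <- function(path = NULL, txtType = ".txt",
                            metricsFile = "Metrics.txt") {
  base <- if (is.null(path)) "." else path
  found <- list.files(base)
  resolve <- function(f) {
    if (length(f) > 0) file.path(base, f) else character(0)
  }
  list(
    images  = resolve(found[has_suffix(found, "_Grn.tif")]),
    text    = resolve(found[has_suffix(found, txtType)]),
    metrics = resolve(found[found == metricsFile])
  )
}

# Usage with a throwaway directory (file names are made up).
demo_dir <- file.path(tempdir(), "bead_demo_paths")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  demo_dir, c("1234_A_Grn.tif", "1234_A.txt", "Metrics.txt")
)))
bead_files <- find_bead_files(path = demo_dir)
```

Unlike the existing dir(pattern = metricsFile) call, this sketch matches the 
metrics file name exactly rather than as a regular expression. The text 
entry still picks up every file ending in txtType, including the metrics 
file.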

It is probably worth noting in the docs the assumptions the function makes 
about the files it expects in the data directory, i.e. that *all* .tif images 
and *all* .txt files will be loaded. My data directory contained other .tif 
and .txt files (e.g. "targets.txt", "notes.txt"), which caused the function to 
choke. I think it is optimistic to assume that users will have no other 
such files present.
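
One possible way to tolerate stray files, sketched here with made-up file 
names, is to keep only names that carry the expected suffix and then pair 
images with text files by their common stem. This is an assumption about 
how it could be done, not a description of readIllumina:

```r
# TRUE where a file name ends with the given literal suffix.
has_suffix <- function(x, suffix) {
  n <- nchar(x)
  k <- nchar(suffix)
  n >= k & substr(x, n - k + 1, n) == suffix
}

# Drop a known suffix from names that are already known to end with it.
strip_suffix <- function(x, suffix) substr(x, 1, nchar(x) - nchar(suffix))

# Made-up directory listing containing stray .txt and .tif files.
files <- c("1234_A_Grn.tif", "1234_A.txt",
           "1234_B_Grn.tif", "1234_B.txt",
           "targets.txt", "notes.txt", "scan.tif")

images <- files[has_suffix(files, "_Grn.tif")]
texts  <- files[has_suffix(files, ".txt")]

# An array is kept only if both its image and its text file are present.
arrays <- intersect(strip_suffix(images, "_Grn.tif"),
                    strip_suffix(texts, ".txt"))
```

Here "scan.tif" never reaches the suffix-stripping step, and "targets.txt" 
and "notes.txt" drop out because they have no matching image.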

In addition, I wonder whether the column names in the csv/txt files vary with 
the version of the scanner or scanner software. Instead of

ProbeID G Gb GrnX GrnY

or similar, we have

Code    Grn     GrnX    GrnY

So, 4 columns rather than 3 or 5.
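
A quick check of the header before parsing would make such differences 
visible early. The sketch below assumes a tab-separated file and uses a 
stand-in file with made-up values; only the column names come from our data:

```r
# Write a tiny stand-in file with the header we see (values are made up).
bead_file <- tempfile(fileext = ".txt")
writeLines(c("Code\tGrn\tGrnX\tGrnY",
             "10008\t523\t101.5\t20.25"), bead_file)

# Column layout quoted above as the expected one.
expected <- c("ProbeID", "G", "Gb", "GrnX", "GrnY")

# Read only the first line and split it on tabs (assumed separator).
header <- strsplit(readLines(bead_file, n = 1), "\t", fixed = TRUE)[[1]]

absent     <- setdiff(expected, header)  # expected columns not in the file
unexpected <- setdiff(header, expected)  # file columns not expected

if (length(absent) > 0) {
  message("Missing columns: ", paste(absent, collapse = ", "))
}
```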

Finally, we appreciate all the work you've done in enabling us to work with 
raw Illumina data. Many thanks.

> sessionInfo()
R version 2.5.0 (2007-04-23) 
i686-pc-linux-gnu 

locale:
LC_CTYPE=en_GB;LC_NUMERIC=C;LC_TIME=en_GB;LC_COLLATE=en_GB;LC_MONETARY=en_GB;LC_MESSAGES=en_GB;LC_PAPER=en_GB;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB;LC_IDENTIFICATION=C

attached base packages:
[1] "grid"      "tools"     "stats"     "graphics"  "grDevices" "utils"    
[7] "datasets"  "methods"   "base"     

other attached packages:
   beadarray beadarraySNP  quantsmooth      lodplot     quantreg      SparseM 
     "1.4.0"      "1.2.0"      "1.2.0"        "1.1"       "4.06"       "0.73" 
        affy       affyio  geneplotter      lattice     annotate      Biobase 
    "1.14.0"      "1.4.0"     "1.14.0"     "0.15-4"     "1.14.1"     "1.14.0" 
       limma 
    "2.10.0" 

-- 

- Keith James <kdj at sanger.ac.uk> Microarray Informatics Group -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -
