[BioC] beadarray readIllumina suggestions
Keith James
kdj at sanger.ac.uk
Mon Jun 25 15:07:57 CEST 2007
I am reading single-channel bead-level data with the readIllumina function.
The docs indicate that there is a path parameter:
    path    character string specifying the location of files to be read by the function
Calling the function with a path argument results in an error:
    > readIllumina(path = "/path/to/data", txtType = ".txt")
    Error in strtrim(x, width) : invalid 'width' argument
    > traceback()
    4: strtrim(xyFiles, nchar(xyFiles) - 4)
    3: as.vector(y)
    2: intersect(strtrim(GImages, nchar(GImages) - 8), strtrim(xyFiles, nchar(xyFiles) - 4))
    1: readIllumina(path = "/path/to/data", txtType = ".txt")
I've seen in another post that readIllumina expects the files to be in the
working directory. That is indeed the case, because this line in the function
calls dir() without a path argument, so it only searches the working directory:
    GImages = dir(pattern = "_Grn.tif")
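As far as I can tell, the error in the traceback follows from this: with the
data elsewhere, dir() in the working directory returns character(0), nchar()
of that is a zero-length width, and strtrim() rejects it. A defensive version
could stop early with a clearer message and strip suffixes with sub(), which
copes with any name length. This is only a sketch; stripSuffix is a
hypothetical helper, not something in beadarray:

```r
# Hypothetical helper (not part of beadarray): strip a file-name suffix,
# given as a regular expression, from a vector of file names.
stripSuffix <- function(files, suffixRegex) {
  # An empty listing usually means dir() looked in the wrong directory,
  # so say so instead of failing later inside strtrim()
  if (length(files) == 0)
    stop("no files matching '", suffixRegex,
         "' found; check the path argument and working directory")
  # sub() removes the anchored suffix regardless of the name's length
  sub(suffixRegex, "", files)
}

stripSuffix(c("1234_A_Grn.tif", "1234_B_Grn.tif"), "_Grn\\.tif$")
# [1] "1234_A" "1234_B"
```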
At first I took this to be a documentation bug, but in fact the path argument
is honoured for loading the csv files:
    file = csv_files[i]
    if (!is.null(path))
        file = file.path(path, file)
and the annotation (.opa file):
    if (!is.null(path))
        annoFile = file.path(path, annoFile)
but apparently not for loading the metrics file:
    if (metrics) {
        metrics = dir(pattern = metricsFile)
My suggestion is to make the behaviour consistent across all the data, i.e. to
honour the path argument for the tif and metrics files as well.
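A minimal sketch of what I mean, assuming readIllumina keeps bare file names
and joins them with path later, as it already does for the csv files. The
helper name listDataFiles is hypothetical, not part of beadarray:

```r
# Hypothetical helper (not part of beadarray): list files matching 'pattern'
# in 'path', or in the working directory when path is NULL. It returns bare
# file names, so the existing file.path(path, file) step still applies.
listDataFiles <- function(pattern, path = NULL) {
  dir(path = if (is.null(path)) "." else path, pattern = pattern)
}

# e.g. inside readIllumina:
#   GImages <- listDataFiles("_Grn\\.tif$", path)
#   metrics <- listDataFiles(metricsFile, path)
```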
It is probably worth noting in the docs the assumptions the function makes
about the files in the data directory, i.e. that *all* .tif images and *all*
.txt files there will be loaded. My data directory contained other .tif and
.txt files ("targets.txt", "notes.txt"), which caused the function to choke.
I think it is optimistic to assume that users will have no other such files
present.
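One way to make the function tolerant of unrelated files would be to load only
arrays that have both an image and a text file with the same stem, using
anchored patterns (the current pattern "_Grn.tif" is unanchored, and its "."
matches any character). A sketch under that assumption; pairedArrays is a
hypothetical name:

```r
# Hypothetical helper: keep only arrays that have both a green image
# (<stem>_Grn.tif) and a bead-level text file (<stem>.txt).
pairedArrays <- function(files) {
  tifs <- grep("_Grn\\.tif$", files, value = TRUE)
  txts <- grep("\\.txt$", files, value = TRUE)
  # Files such as targets.txt or notes.txt have no matching image and drop out
  intersect(sub("_Grn\\.tif$", "", tifs), sub("\\.txt$", "", txts))
}

pairedArrays(c("1234_A_Grn.tif", "1234_A.txt", "targets.txt",
               "notes.txt", "scan_overview.tif"))
# [1] "1234_A"
```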
In addition, I wonder whether the column names in the csv/txt files vary with
the version of the scanner or the scanner software. Instead of
    ProbeID G Gb GrnX GrnY
or similar, we have
    Code Grn GrnX GrnY
i.e. 4 columns rather than 3 or 5.
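If the layout does vary between versions, the reader could look columns up by
name rather than by position. A minimal sketch; beadColumns is a hypothetical
helper, and the candidate names are only the two layouts quoted here, so other
versions may well use names this does not know about:

```r
# Hypothetical helper: locate the probe-ID, green-intensity and coordinate
# columns in a header, whichever of the two known naming conventions is used.
beadColumns <- function(header) {
  pick <- function(candidates) {
    hit <- intersect(candidates, header)
    if (length(hit) == 0)
      stop("none of ", paste(candidates, collapse = ", "), " found in header")
    match(hit[1], header)  # position of the first candidate present
  }
  c(probeID = pick(c("ProbeID", "Code")),
    green   = pick(c("G", "Grn")),
    x       = pick("GrnX"),
    y       = pick("GrnY"))
}

beadColumns(c("Code", "Grn", "GrnX", "GrnY"))
# returns c(probeID = 1, green = 2, x = 3, y = 4)
```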
Finally, we appreciate all the work you've done in enabling us to work with
raw Illumina data. Many thanks.
    > sessionInfo()
    R version 2.5.0 (2007-04-23)
    i686-pc-linux-gnu
    locale:
    LC_CTYPE=en_GB;LC_NUMERIC=C;LC_TIME=en_GB;LC_COLLATE=en_GB;LC_MONETARY=en_GB;LC_MESSAGES=en_GB;LC_PAPER=en_GB;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB;LC_IDENTIFICATION=C
    attached base packages:
    [1] "grid"      "tools"     "stats"     "graphics"  "grDevices" "utils"
    [7] "datasets"  "methods"   "base"
    other attached packages:
    beadarray beadarraySNP quantsmooth lodplot quantreg SparseM
      "1.4.0"      "1.2.0"     "1.2.0"   "1.1"   "4.06"  "0.73"
        affy  affyio geneplotter  lattice annotate  Biobase
    "1.14.0" "1.4.0"    "1.14.0" "0.15-4" "1.14.1" "1.14.0"
       limma
    "2.10.0"
--
- Keith James <kdj at sanger.ac.uk> Microarray Informatics Group -
- The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK -
--