[BioC] bead-level data from Infinium methylation arrays
Tim Triche, Jr.
ttriche at usc.edu
Thu Jul 9 20:07:08 CEST 2009
On Wed, Jul 8, 2009 at 2:16 AM, Mark Dunning <mark.dunning at gmail.com> wrote:
> Hi Tim,
>
> Do you know what scanning software was used to create these bead-level
> data? BeadScan or the newer iScan system? I'm wondering if the format
> of the files has changed since we wrote readIllumina. When the object
> 'dat1' is created in readIllumina it assumes a set number of columns
> in the bead-level text files (4,6 or 7) so if the number of columns is
> something different then this dat1 object will not be created causing
> the function to error.
I confirmed with the staff of the data production facility that my
files are from BeadScan. I don't yet have a copy of the settings.xml
file in use, or changes to it, but I'll get one. I have attached
other files suggested by you and Dr. Carey, along with a feeble patch
I wrote.
The files I have are chipnumber_array_color.(idat|xml|locs|tif),
chipnumber_array.txt, and chipnumber.sdf for each chip, along with a
Metrics.txt file, a manifest file (Excel, but I converted it to CSV in
hopes of turning it into an annotation package), and a targets.txt
file which I wrote in the format shown by the example bead-level-data
in the vignette.
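For anyone trying to take stock of a similar directory, here is a minimal sketch (in Python, purely for illustration; it is not part of beadarray) of grouping the files by chip and array according to the naming scheme described above. The regular expressions and the function name group_by_array are my own assumptions, inferred only from the file names in the attached listing.

```python
import re
from collections import defaultdict

# Patterns inferred from the naming scheme in this post (assumption):
#   chipnumber_array_color.(idat|xml|locs|tif)  and  chipnumber_array.txt
CHANNEL_FILE = re.compile(
    r"^(?P<chip>\d+)_(?P<array>[A-Za-z0-9]+)_(?P<color>Grn|Red)"
    r"\.(?P<ext>idat|xml|locs|tif)$"
)
ARRAY_TXT = re.compile(r"^(?P<chip>\d+)_(?P<array>[A-Za-z0-9]+)\.txt$")


def group_by_array(filenames):
    """Group file names by (chip, array); hypothetical helper.

    Files that match neither pattern (e.g. Metrics.txt, the .sdf file)
    are ignored.
    """
    groups = defaultdict(lambda: {"txt": None, "Grn": {}, "Red": {}})
    for name in filenames:
        m = CHANNEL_FILE.match(name)
        if m:
            # Per-channel file: store it under its color and extension.
            groups[(m["chip"], m["array"])][m["color"]][m["ext"]] = name
            continue
        m = ARRAY_TXT.match(name)
        if m:
            # Per-array bead-level text file.
            groups[(m["chip"], m["array"])]["txt"] = name
    return dict(groups)
```

A grouping like this makes it easy to spot an array that is missing its .txt, .locs, or .tif file before attempting to read anything.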
The .txt files with which I am provided have only the columns 'Code',
'Grn', and 'Red' (all with integer-valued contents). If I'm not hosed
-- if the .txt and .tif files are enough -- could anyone provide a bit
of guidance in terms of where I should start hacking? I'm not averse
to monkeying around in the C code but I don't know where I should look
first.
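To make the three-column layout concrete, here is a minimal sketch (again in Python, for illustration only, and not a replacement for readIllumina) that reads a file with the columns 'Code', 'Grn', and 'Red' and summarizes the beads per code. It assumes whitespace-separated integer fields and a single header line, as in the attached excerpt; the function names read_bead_table and summarize_beads are hypothetical. It uses no X,Y locations, so it cannot support any spatial or image-based processing.

```python
from collections import defaultdict
from statistics import mean

EXPECTED_HEADER = ["Code", "Grn", "Red"]  # columns as described in this post


def read_bead_table(lines):
    """Collect per-bead intensities by bead code (hypothetical helper).

    `lines` is any iterable of text lines, e.g. an open file handle.
    Assumes whitespace-separated fields with integer values.
    """
    rows = iter(lines)
    header = next(rows, "").split()
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected columns: {header!r}")
    beads = defaultdict(lambda: {"Grn": [], "Red": []})
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        code, grn, red = (int(f) for f in fields)
        beads[code]["Grn"].append(grn)
        beads[code]["Red"].append(red)
    return dict(beads)


def summarize_beads(beads):
    """Return bead count and mean intensity per channel for each code."""
    return {
        code: {
            "n": len(values["Grn"]),
            "Grn": mean(values["Grn"]),
            "Red": mean(values["Red"]),
        }
        for code, values in beads.items()
    }
```

For example, `summarize_beads(read_bead_table(open("4321207025_A.txt")))` would give one entry per bead code for that array, assuming the file follows the layout above.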
I did write a simple kludge to read in Infinium two-channel data. It
is not clever, just a small patch to readIllumina to deal with the
3-column format I have. Nonetheless it causes the package to inspect
the .tif files, putting quite a strain on my pokey laptop. Then an
error (and not the one I added as a checkpoint) is thrown:
Error in data[, 2] = bgCorrectSingleArray(fg = greenIntensities[[5]], :
replacement has length zero
I didn't request background correction, for what that's worth.
The lack of useful X,Y location information seems to be the culprit
here. I am not sure how best to fix this. Files with the extension
.locs are provided, but I could not find useful specs on this file
format. Am I stymied with regard to accessing the bead-level data?
(A presentation by Matt Ritchie at Cambridge hinted that this may be
the case. Dr. Carey's reply suggested that perhaps the oft-changing
Illumina file formats might also be involved.)
I could request that the core facility not default to these
proprietary formats, if that is an insurmountable obstacle. Have
others found themselves in this situation before?
Thanks for any suggestions,
--tim
-------------- next part --------------
Code Grn Red
10008 106 1847
10008 139 1680
10008 135 1675
10008 52 1315
10008 59 1832
10008 96 1250
10008 65 1314
10008 66 1457
10008 85 1560
-------------- next part --------------
4321207025_A_Grn.idat
4321207025_A_Grn.locs
4321207025_A_Grn.tif
4321207025_A_Grn.xml
4321207025_A_Red.idat
4321207025_A_Red.locs
4321207025_A_Red.tif
4321207025_A_Red.xml
4321207025_A.txt
4321207025_B_Grn.idat
4321207025_B_Grn.locs
4321207025_B_Grn.tif
4321207025_B_Grn.xml
4321207025_B_Red.idat
4321207025_B_Red.locs
4321207025_B_Red.tif
4321207025_B_Red.xml
4321207025_B.txt
4321207025.sdf
files.txt
Metrics.txt
probe_sequences.csv
readIllumina.diff
readIllumina.orig.R
readIllumina.patched.R
targets.txt
-------------- next part --------------
R version 2.10.0 Under development (unstable) (2009-06-25 r48836)
i686-pc-linux-gnu
locale:
[1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
[3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
[5] LC_MONETARY=C LC_MESSAGES=en_US.utf8
[7] LC_PAPER=en_US.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] beadarray_1.13.4 Biobase_2.5.4 sandwich_2.2-1 zoo_1.5-6
[5] Design_2.2-0 survival_2.35-4 Hmisc_3.6-0
loaded via a namespace (and not attached):
[1] cluster_1.12.0 grid_2.10.0 hwriter_1.1 lattice_0.17-25
[5] limma_2.19.2 tools_2.10.0