[BioC] Encrypting IDAT files using IDATreader

Mike Smith grimbough at gmail.com
Fri Aug 31 12:03:13 CEST 2012


Hi Seungyeul,

First off, I should say that as far as I know Illumina don't actually
document the idat file format, since they don't support decryption
and extraction other than through GenomeStudio, so these answers are
educated guesses.

The CodesBinData and IllumicodeBinData fields are equivalent to the
ProbeID that you encounter in the text version of the summary data.
I've no idea why it appears twice, but in the files I've looked at the
two columns have not differed.
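
If you want to confirm that for your own files, a quick check along
these lines should do it.  This is only a sketch: the small data.frame
below is a made-up stand-in for the readIDAT() output (IDs copied from
the rows you posted), so swap in your real idat object.

```r
# Toy stand-in for the data.frame returned by readIDAT();
# the IDs are copied from the first rows posted in this thread.
idat <- data.frame(
  CodesBinData      = c(10008, 10010, 10014),
  IllumicodeBinData = c(10008, 10010, 10014)
)

# TRUE only if the two ID columns agree for every bead type
ids_match <- all(idat$CodesBinData == idat$IllumicodeBinData)
print(ids_match)
```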

You can use either of the MeanBinData columns as your measure of
expression.  I believe the TrimmedMeanBinData is the result of the
outlier removal that Illumina perform as standard, so if you want
consistency with GenomeStudio then go for that one.  You could also
use the MedianBinData if you prefer.

The data here are what we call "bead summary" data.  That means the
foreground and background intensities have been calculated for every
bead of a particular type, and each bead gets a final intensity of
foreground minus background.  Any beads that fall more than three median
absolute deviations from the median are then discarded, which is the
outlier removal step mentioned above.  Those that remain are averaged
to get the final intensity for that bead type.  The values have not
been normalized relative to other arrays, nor background corrected in
the sense of using negative controls to find a baseline.
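
As a rough illustration of that summarisation step, here is a minimal
sketch in base R.  The function name summarise_bead_type and the bead
intensities are made up for the example, and the details are my
assumptions rather than Illumina's documented method: in particular I
measure distance from the median of the bead intensities, and I use R's
default scaling in mad().

```r
# Hypothetical helper: summarise the background-subtracted intensities
# of all beads of one type, dropping outliers first.
summarise_bead_type <- function(intensity, n_mads = 3) {
  centre <- median(intensity)
  # mad() multiplies by 1.4826 by default; whether Illumina apply the
  # same scaling is an assumption here.
  spread <- mad(intensity)
  keep <- abs(intensity - centre) <= n_mads * spread
  c(mean    = mean(intensity[keep]),   # average of the retained beads
    n_beads = length(intensity),       # cf. NumBeadsBinData
    n_good  = sum(keep))               # cf. NumGoodBeadsBinData
}

# Made-up bead intensities with one obvious outlier
beads <- c(98, 101, 103, 99, 102, 100, 97, 450)
res <- summarise_bead_type(beads)
print(res)
```

The n_beads and n_good values are meant to mirror the NumBeadsBinData
and NumGoodBeadsBinData columns, which is my reading of what those
fields hold.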

If you jump to the "Summary Data Analysis" section of this paper:

http://www.ncbi.nlm.nih.gov/pubmed/22144879

this type of data is covered in more detail, along with
some suggestions for which software packages you could use.  They
don't specifically reference the use of idat files, but you can
transform the data.frame you've got into the inputs for them.  You
probably want to look at lumi and limma in particular.
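
To give an idea of the sort of transformation I mean, here is a sketch
that combines several arrays into one probes-by-samples matrix using
only base R.  The two data.frames are toy stand-ins for readIDAT()
output (the second array's values are invented), and the object names
are just for illustration.

```r
# Toy stand-ins for two arrays read with readIDAT(); the first uses
# values posted in this thread, the second is made up.
array1 <- data.frame(IllumicodeBinData  = c(10008, 10010, 10014),
                     TrimmedMeanBinData = c(586.94641, 107.54086, 106.90141))
array2 <- data.frame(IllumicodeBinData  = c(10014, 10008, 10010),
                     TrimmedMeanBinData = c(110.2, 601.3, 99.8))
idats <- list(sample1 = array1, sample2 = array2)

# Use the probe IDs of the first array as the row order, then look up
# each array's intensities by ID rather than by position.
probe_ids <- idats[[1]]$IllumicodeBinData
exprs_mat <- sapply(idats, function(d) {
  d$TrimmedMeanBinData[match(probe_ids, d$IllumicodeBinData)]
})
rownames(exprs_mat) <- probe_ids

# The intensities are on the raw scale, so log2-transform them
log_exprs <- log2(exprs_mat)
print(log_exprs)
```

I believe a plain numeric matrix like this, with probes in rows and
samples in columns, is the kind of input that functions such as limma's
normalizeBetweenArrays() accept, but do check the package documentation
for what each one expects.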

Hope that helps,

To everyone else,

As far as incorporating the import routines into a standard package
goes, I'm happy to work with anyone that's interested.  I don't know
how much work it will be to distribute the relevant parts of
openssl, or whether we should even go down that route, but if
anyone more knowledgeable than me has some suggestions I'm all ears.

Mike

On 27 August 2012 23:49, Yoo, Seungyeul <seungyeul.yoo at mssm.edu> wrote:
> Hi Mike,
>
> Thank you so much for your advice. I was able to install it properly and
> succeeded in reading an idat file.
>
> After reading an idat file, I get a data.frame that looks something like
> the following.
>
>> idat[1:10,]
>    MeanBinData TrimmedMeanBinData DevBinData MedianBinData BackgroundBinData
> 1    577.37201          586.94641  102.94595     563.49585                 0
> 2    105.32843          107.54086   17.36357     102.84825                 0
> 3    105.47916          106.90141   23.27995      98.20106                 0
> 4     96.03654           97.22727   12.01887      93.85848                 0
> 5    101.15079          102.46756   14.26707     102.40277                 0
> 6    110.97807          111.94042   15.54425     108.54636                 0
> 7     93.36533           94.80687   12.88733      91.90852                 0
> 8    103.91557          105.35213   19.37169     102.13950                 0
> 9    105.39189          106.81569   12.94761     107.72005                 0
> 10   100.90147          102.46033   18.38206      93.95204                 0
>    BackgroundDevBinData CodesBinData NumBeadsBinData
> 1                     0        10008              21
> 2                     0        10010              18
> 3                     0        10014              26
> 4                     0        10017              18
> 5                     0        10019              21
> 6                     0        10020              25
> 7                     0        10021              17
> 8                     0        10025              22
> 9                     0        10026              19
> 10                    0        10035              16
>    NumGoodBeadsBinData IllumicodeBinData
> 1                   20             10008
> 2                   18             10010
> 3                   26             10014
> 4                   17             10017
> 5                   20             10019
> 6                   23             10020
> 7                   16             10021
> 8                   21             10025
> 9                   19             10026
> 10                  16             10035
>
> This is my first time analyzing gene expression data from idat files, so
> please let me ask a few naive questions.
>
> 1) Is CodesBinData (which is identical to IllumicodeBinData here) the same
> as the probe ID?
> 2) Which value should be compared with other samples as the gene expression
> level? MeanBinData or TrimmedMeanBinData?
> 3) Is this data processed or pre-processed (i.e. background corrected or
> normalized)?
> 4) Is there any other known package I can use from this point to proceed
> with the analysis?
>
> Thank you so much for your help.
>
> Best regards,
>
> Seungyeul
>
>
> On Aug 25, 2012, at 3:12 PM, Mike Smith wrote:
>
> Hi Seungyeul,
>
> IDATreader's a pretty experimental package, but it's worked on the few
> systems I've tested it on.  The reliance on third party libraries is
> why I've not submitted it to Bioconductor, so they may be causing
> problems.
>
> I'm not sure why you're using source(), rather than loading the
> package via library().  Have you installed the package, or just
> unzipped it and grabbed the .R file?  It needs to be installed in order
> to build the DLL and link against the decryption routines.  To do that
> you need to download the tar.gz from the website you linked to and
> then run
>
> R CMD INSTALL IDATreader_0.1.1.tar.gz
>
> in a terminal to install the package.  You then need to load it using
>
> library(IDATreader)
>
> in your R session, rather than using source().
>
> Hopefully that's of some help,
>
> Mike
>
>
>
> On 22 August 2012 23:35, Yoo, Seungyeul <seungyeul.yoo at mssm.edu> wrote:
>
> Hi all,
>
> I'm trying to analyze Illumina gene expression arrays. The file format is
> .idat, which is encrypted.
>
> I downloaded the IDATreader package, as Mike suggested in a previous post,
> and also installed openssl on my Mac OS X machine.
> http://www.compbio.group.cam.ac.uk/software/idatreader
>
> But when I try to read an idat file, I get an error message.
>
> source("/Library/R/readIDAT.R")
> filenames<-dir(pattern="idat")
> idat<-readIDAT(file=filenames[1])
>
> idat<-readIDAT(file=filenames[1])
> Decrypting to XML
> Error in .C("decryptSSL", as.character(file), as.character(tempFile),  :
>  C symbol name "decryptSSL" not in DLL for package "IDATreader"
>
> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] BiocInstaller_1.4.7
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.0
>
> Please let me know if you have any advice on solving this problem.
>
> Seungyeul Yoo
> Postdoctoral Fellow
> Jun Zhu's Laboratory
> Institute of Genomics and Multiscale Biology
> Department of Genetics and Genomic Sciences
> Mount Sinai School of Medicine
>
> --
> Mike Smith
> PhD Student
> Computational Biology Group
> Cambridge University
>
>



-- 
Mike Smith
PhD Student
Computational Biology Group
Cambridge University


