[BioC] Encrypting IDAT files using IDATreader
Mike Smith
grimbough at gmail.com
Fri Aug 31 12:03:13 CEST 2012
Hi Seungyeul,
First off I should just say that Illumina don't actually document the
idat file as far as I know, since they don't support the decryption
and extraction other than by using GenomeStudio, so these answers are
educated guesses.
The CodesBinData and IllumicodeBinData fields are equivalent to the
ProbeID that you encounter in the text version of the summary data.
I've no idea why it appears twice, but in the files I've looked at the
two columns have not differed.
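If you want to check that on your own file, something like the following should do it. This is only a sketch: `idat` here is a toy data.frame holding the first three IDs from your output, standing in for whatever readIDAT() returns.

```r
# Toy stand-in for the data.frame returned by readIDAT(); only the two
# ID columns are needed for this check.
idat <- data.frame(
  CodesBinData      = c(10008L, 10010L, 10014L),
  IllumicodeBinData = c(10008L, 10010L, 10014L)
)

# TRUE if the two ID columns agree on every row.
ids_match <- all(idat$CodesBinData == idat$IllumicodeBinData)

# Number of rows where they disagree (0 if the columns are equivalent).
n_mismatch <- sum(idat$CodesBinData != idat$IllumicodeBinData)
```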
You can use either of the MeanBinData columns as your measure of
expression. I believe the TrimmedMeanBinData is the result of the
outlier removal that Illumina perform as standard, so if you want
consistency with GenomeStudio then go for that one. You could also
use the MedianBinData if you prefer.
The data here is what we call "bead summary" data. That means the
foreground and background intensities for every bead of a particular
type have been calculated, and each bead gets a final intensity of
foreground minus background. Any beads that fall more than three median
absolute deviations from the median are then discarded, which is the
outlier removal step mentioned above. Those that remain are averaged
to get the final intensity for that bead type. The values have not
been normalized relative to other arrays, nor background corrected in
the sense of using negative controls to find a baseline.
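To make those steps concrete, here is a rough R sketch for a single bead type. It is not Illumina's code. The function name, the toy intensities, centring the cut-off on the median, and using R's default mad() scaling are all my assumptions.

```r
# Hypothetical helper: summarise the beads of one bead type.
# foreground / background are per-bead intensities (toy inputs here).
summarise_bead_type <- function(foreground, background, n_mads = 3) {
  # Per-bead intensity is foreground minus background.
  intensity <- foreground - background

  # Assumption: the cut-off is centred on the median, and mad() keeps its
  # default scaling constant (1.4826).
  centre <- median(intensity)
  spread <- mad(intensity)

  # Keep beads within n_mads MADs of the centre; discard the rest.
  keep <- abs(intensity - centre) <= n_mads * spread
  good <- intensity[keep]

  list(
    mean         = mean(intensity),  # average over all beads
    trimmed_mean = mean(good),       # average after outlier removal
    n_beads      = length(intensity),
    n_good       = length(good)
  )
}

# Toy example: six beads, one of which is an obvious outlier.
res <- summarise_bead_type(
  foreground = c(110, 112, 108, 111, 109, 500),
  background = rep(10, 6)
)
```

The two averages are only loosely analogous to the MeanBinData and TrimmedMeanBinData columns. I haven't confirmed that this reproduces the numbers in an idat file.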
If you jump to the "Summary Data Analysis" section of this paper:
http://www.ncbi.nlm.nih.gov/pubmed/22144879
this type of data is covered in more detail, along with
some suggestions for which software packages you could use. The paper
doesn't specifically mention idat files, but you can
transform the data.frame you've got into the inputs those packages
expect. You probably want to look at lumi and limma in particular.
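As a rough illustration of that transformation, the sketch below stacks one column from several arrays into a probes-by-samples matrix, which is the shape limma's matrix-based functions work with. The helper name and the two toy arrays are made up, and the normalization call at the end is just one option, not a recommendation.

```r
# Hypothetical helper: combine per-array data.frames (as returned by
# readIDAT()) into one probes x samples matrix.
build_expression_matrix <- function(idat_list, column = "TrimmedMeanBinData") {
  # Keep only probe IDs present on every array.
  probe_ids <- Reduce(intersect,
                      lapply(idat_list, function(d) d$IllumicodeBinData))

  # Pull the chosen column from each array, reordered to the common IDs.
  columns <- lapply(idat_list, function(d) {
    d[[column]][match(probe_ids, d$IllumicodeBinData)]
  })

  expr <- do.call(cbind, columns)
  rownames(expr) <- as.character(probe_ids)
  expr
}

# Toy input: two made-up arrays with the probes in a different order.
arrays <- list(
  s1 = data.frame(IllumicodeBinData  = c(10008L, 10010L, 10014L),
                  TrimmedMeanBinData = c(586.9, 107.5, 106.9)),
  s2 = data.frame(IllumicodeBinData  = c(10014L, 10008L, 10010L),
                  TrimmedMeanBinData = c(100.0, 600.0, 110.0))
)

expr <- build_expression_matrix(arrays)
log_expr <- log2(expr)

# Optional: quantile-normalise between arrays if limma is installed.
if (requireNamespace("limma", quietly = TRUE)) {
  norm_expr <- limma::normalizeBetweenArrays(log_expr, method = "quantile")
}
```

Matching on the probe ID rather than on row position is deliberate, since I can't promise every idat file lists its bead types in the same order.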
Hope that helps,
To everyone else,
As far as incorporating the import routines into a standard package
goes, I'm happy to work with anyone who's interested. I don't know
how much work it will be to distribute the relevant parts of
openssl, or whether we should even go down that route, but if
anyone more knowledgeable than me has some suggestions I'm all ears.
Mike
On 27 August 2012 23:49, Yoo, Seungyeul <seungyeul.yoo at mssm.edu> wrote:
> Hi Mike,
>
> Thank you so much for your advice. I can now install it properly and
> have succeeded in reading an idat file.
>
> After reading an idat file, I get a data.frame that looks something like
> the following.
>
>> idat[1:10,]
>    MeanBinData TrimmedMeanBinData DevBinData MedianBinData BackgroundBinData
> 1    577.37201          586.94641  102.94595     563.49585                 0
> 2    105.32843          107.54086   17.36357     102.84825                 0
> 3    105.47916          106.90141   23.27995      98.20106                 0
> 4     96.03654           97.22727   12.01887      93.85848                 0
> 5    101.15079          102.46756   14.26707     102.40277                 0
> 6    110.97807          111.94042   15.54425     108.54636                 0
> 7     93.36533           94.80687   12.88733      91.90852                 0
> 8    103.91557          105.35213   19.37169     102.13950                 0
> 9    105.39189          106.81569   12.94761     107.72005                 0
> 10   100.90147          102.46033   18.38206      93.95204                 0
>    BackgroundDevBinData CodesBinData NumBeadsBinData NumGoodBeadsBinData IllumicodeBinData
> 1                     0        10008              21                  20             10008
> 2                     0        10010              18                  18             10010
> 3                     0        10014              26                  26             10014
> 4                     0        10017              18                  17             10017
> 5                     0        10019              21                  20             10019
> 6                     0        10020              25                  23             10020
> 7                     0        10021              17                  16             10021
> 8                     0        10025              22                  21             10025
> 9                     0        10026              19                  19             10026
> 10                    0        10035              16                  16             10035
>
> This is my first time analyzing gene expression from an idat file, so please
> let me ask a few naive questions.
>
> 1) Is CodesBinData (which is identical to IllumicodeBinData here) the same as
> the probe ID?
> 2) Which value should be compared with other samples as the gene expression
> level? MeanBinData or TrimmedMeanBinData?
> 3) Is this data processed or pre-processed (i.e. background corrected or
> normalized)?
> 4) Are there any other known packages I can use from this point to proceed
> with the analysis?
>
> Thank you so much for your help.
>
> Best regards,
>
> Seungyeul
>
>
> On Aug 25, 2012, at 3:12 PM, Mike Smith wrote:
>
> Hi Seungyeul,
>
> IDATreader's a pretty experimental package, but it's worked on the few
> systems I've tested it on. The reliance on third party libraries is
> why I've not submitted it to Bioconductor, so they may be causing
> problems.
>
> I'm not sure why you're using source() rather than loading the
> package via library(). Have you installed the package, or just
> unzipped it and grabbed the .R file? It needs to be installed in order
> to build the DLL and link against the decryption routines. To do that
> you need to download the tar.gz from the website you linked to and
> then run
>
> R CMD INSTALL IDATreader_0.1.1.tar.gz
>
> in a terminal to install the package. You then need to load it using
>
> library(IDATreader)
>
> in your R session, rather than using source().
>
> Hopefully that's of some help,
>
> Mike
>
>
>
> On 22 August 2012 23:35, Yoo, Seungyeul <seungyeul.yoo at mssm.edu> wrote:
>
> Hi all,
>
> I'm trying to analyze Illumina gene expression arrays. The file format is
> .idat, which is encrypted.
>
> I downloaded the IDATreader package, as Mike suggested in a previous post,
> and also installed openssl on my Mac OS X machine.
> http://www.compbio.group.cam.ac.uk/software/idatreader
>
> But when I try to read an idat file, I get an error message.
>
> source("/Library/R/readIDAT.R")
> filenames<-dir(patter="idat")
> idat<-readIDAT(file=filenames[1])
>
> idat<-readIDAT(file=filenames[1])
> Decrypting to XML
> Error in .C("decryptSSL", as.character(file), as.character(tempFile), :
> C symbol name "decryptSSL" not in DLL for package "IDATreader"
>
> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: i386-apple-darwin9.8.0/i386 (32-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] BiocInstaller_1.4.7
>
> loaded via a namespace (and not attached):
> [1] tools_2.15.0
>
> Please let me know if you have any advice on how to solve this problem.
>
> Seungyeul Yoo
>
> Postdoctoral Fellow
> Jun Zhu's Laboratory
> Institute of Genomics and Multiscale Biology
> Department of Genetics and Genomic Sciences
> Mount Sinai School of Medicine
>
> --
> Mike Smith
> PhD Student
> Computational Biology Group
> Cambridge University
>
>
--
Mike Smith
PhD Student
Computational Biology Group
Cambridge University