[BioC] Missing probesets when creating Affymetrix GeneChip miRNA 4.0 CDF package using makecdfenv package
Isaac Neuhaus
isaac.neuhaus at bms.com
Fri Jan 24 20:34:05 CET 2014
Lei Huang [guest] <guest at ...> writes:
>
>
> Dear all,
>
> I am working on a set of Affymetrix GeneChip miRNA 4.0 microarray data and
> would like to perform differential expression analysis using Bioconductor
> packages. Since this is a fairly new platform, no CDF and annotation
> packages are available in the Bioconductor repository at the moment. The
> Affymetrix folks kindly provided me with the miRNA 4.0 CDF file as well as
> sample CEL data, so I decided to create a CDF package on my own using
> make.cdf.package() from the makecdfenv package. I was able to make the
> package and install it without trouble. However, after I read the raw CEL
> files and normalized the affybatch with vsnrma()/rma(), I found that the
> number of probesets is only 25065, while the number is 36249 in the
> original Affymetrix miRNA 4.0 CDF file. I am aware that from version 4,
> Affymetrix changed their naming convention for the probeset IDs, but this
> shouldn't cause the problem of missing probesets. What did I do wrong? I
> would really appreciate it if anyone could give me some hints or advice on
> solving this problem.
>
> -Lei
>
> --
> Lei Huang
> Center for Research Informatics
> Biological Science Division
> University of Chicago
> http://cri.uchicago.edu
> --
>
> P.S. The following are the code and output from my R session:
>
> > setwd("~/Documents/Project/mirna/GeneChip 4-0 Array Sample Data")
> > library(affy)
> > library(makecdfenv)
> Loading required package: affyio
> > pkgpath <- tempdir()
> > pname <- cleancdfname(whatcdf("20131118_Human-Brain-AM7962-130ng_rep1_(miRNA-4_0).CEL"))
> > make.cdf.package("miRNA-4_0-st-v1.cdf",
> +   cdf.path="~/Documents/Project/mirna/miRNA-4_0-st-v1_CDF",
> +   compress=FALSE, species = "", packagename=pname,
> +   package.path = pkgpath)
> Reading CDF file.
> Creating CDF environment
> Wait for about 251
dots........................................................................
............................................................................
............................................................................
.............................
> Creating package in
/var/folders/rh/rrlg3bcs6kgcj89zm4mgjjxh0000gq/T//RtmpRos3Be/mirna40cdf
>
> README PLEASE:
> A source package has now been produced in
> /var/folders/rh/rrlg3bcs6kgcj89zm4mgjjxh0000gq/T//RtmpRos3Be/mirna40cdf.
> Before using this package it must be installed via 'R CMD INSTALL'
> at a terminal prompt (or DOS command shell).
> If you are using Windows, you will need to get set up to install packages.
> See the 'R Installation and Administration' manual, specifically
> Section 6 'Add-on Packages' as well as 'Appendix E: The Windows Toolset'
> for more information.
>
> Alternatively, you could use make.cdf.env(), which will not require you to
> install a package.
> However, this environment will only persist for the current R session
> unless you save() it.
>
> ## install the cdf package from shell
> ## cd to mirna40cdf location
> ## R CMD INSTALL mirna40cdf
>
> > library(limma)
> > library(vsn)
> > library(mirna40cdf)
> >
> > affybatch <- ReadAffy(filenames=list.files())
> > affybatch@cdfName
> [1] "miRNA-4_0"
>
> ## normalization
> > eset.norm <- vsnrma(affybatch)
> vsn2: 292681 x 8 matrix (1 stratum).
> Please use 'meanSdPlot' to verify the fit.
> Calculating Expression
>
> ## only 25,065 probesets, the original Affymetrix cdf file contains 36,249
> ## probesets
> > dim(eset.norm)
> Features Samples
> 25065 8
>
> -- output of sessionInfo():
>
> > sessionInfo()
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] mirna40cdf_1.38.0 AnnotationDbi_1.24.0 vsn_3.30.0
> [4] limma_3.18.9 makecdfenv_1.38.0 affyio_1.30.0
> [7] affy_1.40.0 Biobase_2.22.0 BiocGenerics_0.8.0
>
> loaded via a namespace (and not attached):
> [1] BiocInstaller_1.12.0 compiler_3.0.2 DBI_0.2-7
> [4] grid_3.0.2 IRanges_1.20.6 lattice_0.20-24
> [7] preprocessCore_1.24.0 RSQLite_0.11.4 stats4_3.0.2
> [10] tools_3.0.2 zlibbioc_1.8.0
>
I came across a similar problem with a brainCDF, where makecdfenv was
producing a package with fewer probesets. I believe the problem is in the C
code that parses ASCII CDF files, since I was able to correct the problem by
converting the text CDF into binary with affxparser and then reading it with
the makecdfenv package:
library(affxparser)
library(makecdfenv)
# Convert the ASCII (text) CDF into a binary CDF
convertCdf("HGU133PLUS2_HS_REFSEQ.CDF", "hgu133plus2hsrefseqcdf",
           version = 4, verbose = TRUE)
# Build the CDF package from the converted binary file
make.cdf.package("hgu133plus2hsrefseqcdf",
                 version = packageDescription("makecdfenv", fields = "Version"),
                 species = "H. sapiens", unlink = TRUE)
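For the miRNA 4.0 file from the original post, the same workaround might look
roughly like the sketch below. I have not run it against that file. The
directory and input file name are taken from the original post; the output
file name miRNA-4_0-st-v1.binary.cdf is made up for illustration. The calls
to readCdfUnitNames() and make.cdf.env() are only there as a sanity check on
the probeset counts and are not part of the fix itself.

```r
library(affxparser)
library(makecdfenv)

# Directory and file name taken from the original post
cdf_dir   <- path.expand("~/Documents/Project/mirna/miRNA-4_0-st-v1_CDF")
ascii_cdf <- file.path(cdf_dir, "miRNA-4_0-st-v1.cdf")

# Hypothetical name for the converted file (an assumption, not from the thread)
binary_name <- "miRNA-4_0-st-v1.binary.cdf"
binary_cdf  <- file.path(cdf_dir, binary_name)

# Convert the text CDF into a binary (version 4) CDF
convertCdf(ascii_cdf, binary_cdf, version = "4", verbose = TRUE)

# Unit (probeset) names as stored in the converted file
unit_names <- readCdfUnitNames(binary_cdf)
length(unit_names)          # number of units in the CDF
length(unique(unit_names))  # an environment holds only one entry per name

# Build the CDF environment from the binary file and count its probesets
cdf_env <- make.cdf.env(binary_name, cdf.path = cdf_dir)
length(ls(cdf_env))
```

If the last count agrees with the number of units in the CDF,
make.cdf.package() can be pointed at the binary file in the same way as in
my example above. If the number of unique names is already smaller than the
number of units, the difference may come from repeated probeset names rather
than from the parser, but that is only a guess.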
I hope this helps.
Isaac