[BioC] CDF for GeneChip miRNA 2 array - Is there a miRNA 3 CDF?

Fri Nov 9 00:01:28 CET 2012

Hi Stephen,

On 11/8/2012 5:25 PM, Stephen Turner wrote:
> Thanks much. I used read.celfiles() and rma() worked perfectly at this
> point. I will definitely take you up on help getting this to gel with
> the rest of my workflow.
>
> My next step with gene ST arrays is to annotate the expressionset
> object with fData, such that when I use topTable() later on, all my
> results are annotated. E.g.:
>
> ## Which annotation package are you using?
> eset@annotation
> annodb<- "hugene10sttranscriptcluster.db"
>
> ## Annotate the features
> ls(paste("package:", annodb, sep=""))
> ID<- featureNames(eset)
> Symbol<- as.character(lookUp(ID, annodb, "SYMBOL"))
> Name<- as.character(lookUp(ID, annodb, "GENENAME"))
> Entrez<- as.character(lookUp(ID, annodb, "ENTREZID"))
> tmp<- data.frame(ID=ID, Entrez=Entrez, Symbol=Symbol, Name=Name,
> stringsAsFactors=FALSE)
> tmp[tmp=="NA"]<- NA
> fData(eset)<- tmp
>
> But I'm not sure what to do here because ls("package:pd.mirna.3.0")
> doesn't return what the typical hu/mogene10sttranscriptcluster.db DBs
> return.

Right. Note that something like the MoGene ST chip measures mRNA, 
whereas the mirna 3.0 measures miRNA, which is a completely different 
class of RNA. While some miRNAs have Entrez Gene IDs, they don't have 
symbols or names that I know of.
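
If all you want is something in fData so that topTable() output is labeled, one option is to build the table from the probeset IDs themselves. A minimal sketch in base R, assuming (and this is only an assumption, so check featureNames() on your own object) that the IDs start with a species code such as hsa and that hairpin probesets carry an hp_ prefix. The IDs below are made up:

```r
## Made-up probeset IDs standing in for featureNames(eset); the
## "hp_" and species-prefix conventions are assumptions.
ids <- c("hsa-miR-21_st", "hp_hsa-mir-21_st", "mmu-let-7a_st")

## Flag hairpin probesets, then strip that prefix to get at the species code
hairpin <- grepl("^hp_", ids)
species <- sub("-.*$", "", sub("^hp_", "", ids))

## One row per probeset, keyed by ID
ann <- data.frame(ID = ids, Species = species, Hairpin = hairpin,
                  row.names = ids, stringsAsFactors = FALSE)
ann
```

With a real ExpressionSet you could then assign the table with fData(eset) <- ann, the same way you do in your gene ST code.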

miRNAs target various mRNA species for either silencing (by binding to 
the mRNA transcript, making it double stranded in a particular region, 
thereby eliminating translation to protein) or for premature degradation.

To make things more complicated, the mRNAs thought to be targeted by a
given miRNA are predicted from one or more of sequence homology,
conservation, thermodynamic properties and something else that escapes
me right now. In other words, the targeting of mRNA by miRNA is almost
always computationally derived. So depending on which algorithm and
what cutoffs you use, you can get from zero to thousands of mRNAs
targeted by a given miRNA.

As an example, go here:

http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0003205

This is just some random miRNA I searched for. Now scroll down to the
'Mature sequence' section, and click on some of the links for Predicted 
targets. Fun, huh?

Also note that the miRNA 3.0 chip has probes for miRNAs from lots of
different species, as well as for the hairpin precursors (which AFAICT
are all garbage, but YMMV). So you may or may not want to filter out
miRNAs from uninteresting species, depending on whether or not you (or
your PI) think a particular miRNA from, say, M. nemestrina is also
expressed in the species you are working with.
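
A minimal sketch of that kind of filtering in base R, using a toy matrix in place of exprs(eset). The row names are invented, and the conventions they follow (a species prefix such as hsa-, and hp_ for hairpins) are an assumption you should verify against your own feature names:

```r
## Toy expression matrix; row names are hypothetical probeset IDs.
ids <- c("hsa-miR-21_st", "hp_hsa-mir-21_st", "mmu-miR-21_st",
         "mne-miR-21_st", "hsa-let-7a_st")
exprs_mat <- matrix(seq_len(length(ids) * 2), nrow = length(ids),
                    dimnames = list(ids, c("sample1", "sample2")))

## Keep human probesets only. Anchoring on "^hsa-" also drops the
## hairpin probesets, since those start with "hp_".
keep <- grepl("^hsa-", rownames(exprs_mat))
human_mat <- exprs_mat[keep, , drop = FALSE]
rownames(human_mat)
```

The same logical vector can subset an ExpressionSet directly, e.g. eset[keep, ], if you compute it from featureNames(eset).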

Also note that RMA is sort of silly for these arrays anyway. A mature
miRNA is 21-23 bases long, and the Affy chip uses 25-mers. So the
replicate probes in a probeset are usually just the same sequence in a
different place on the chip. You could make the argument that the
algorithm used in the miRNA QC tool that Affy will give you for free
does a better job.

So is the goal here to just find differentially expressed miRNAs?

Best,

Jim

>
> Many thanks,
>
> Stephen
>
> On Thu, Nov 8, 2012 at 10:32 AM, Benilton Carvalho
> <beniltoncarvalho at gmail.com>  wrote:
>> The problem is that you have both affy and oligo loaded simultaneously (I'll
>> add this to my todo list, so in the future users do not need to worry about
>> it).
>>
>> Option 1)  (don't load oligo)
>>
>> By using ReadAffy(), you're importing the data via affy package, which does
>> not know how to handle miRNA-3.0 arrays.
>>
>> If you'd rather stick to your original workflow, you'd need to follow the
>> "unrecommended" path of converting a PGF to a CDF (I'd rather not say much
>> about this), and then build the required annotation packages yourself.
>>
>>
>> Option 2) (don't load affy)  (disclaimer: I'm the author of oligo)
>>
>> If you don't load affy and use read.celfiles (from oligo), you'll get the
>> rma() part done easily. At this point, I'd be happy to work with you to
>> incorporate tools to simplify the use of the other packages that you have in
>> your workflow.
>>
>>
>> best,
>> benilton
>>
>>
>> On 8 November 2012 15:12, Stephen Turner<vustephen at gmail.com>  wrote:
>>> Just wanted to resurrect this issue. I routinely analyze gene 1.0 ST
>>> chips in my core, but this is the first time I'm looking at the miRNA
>>> 3.0 chip (or any Affy miRNA chip for that matter).
>>>
>>> I understand that there's no 3.0 CDF environment available. How might
>>> I go about building one and incorporating that into my workflow?
>>>
>>> My typical [Hu/Mo]Gene 1.0 ST workflow goes something like this:
>>>
>>> ############################################
>>> ## Load data
>>> affybatch<- ReadAffy(filenames)
>>> eset<- rma(affybatch)
>>>
>>> ## Annotate
>>> ID<- featureNames(eset)
>>> Symbol<- as.character(lookUp(ID, "hugene10sttranscriptcluster.db",
>>> "SYMBOL"))
>>> Name<- as.character(lookUp(ID, "hugene10sttranscriptcluster.db",
>>> "GENENAME"))
>>> fData(eset)<- data.frame(ID=ID, Symbol=Symbol, Name=Name)
>>>
>>> ## Typical QC with arrayQualityMetrics and analysis with limma
>>> ############################################
>>>
>>> I'm getting this error when using rma() on the affybatch object:
>>>
>>>> rma(affybatch)
>>> Error in function (classes, fdef, mtable)  :
>>>    unable to find an inherited method for function "rma", for signature
>>> "AffyBatch"
>>>
>>> And additionally when I try to view the affybatch:
>>>
>>> AffyBatch object
>>> size of arrays=541x541 features (19 kb)
>>> cdf=miRNA-3_0 (??? affyids)
>>> number of samples=6
>>> Error in getCdfInfo(object) :
>>>    Could not obtain CDF environment, problems encountered:
>>> Specified environment does not contain miRNA-3_0
>>> Library - package mirna30cdf not installed
>>> Bioconductor - mirna30cdf not available
>>>
>>> Thanks.
>>>
>>>
>>>> sessionInfo()
>>> R version 2.15.0 (2012-03-30)
>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>
>>> locale:
>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>
>>> attached base packages:
>>> [1] grid      stats     graphics  grDevices utils     datasets
>>> methods   base
>>>
>>> other attached packages:
>>>   [1] pd.mirna.3.0_3.6.0         oligo_1.22.0
>>> oligoClasses_1.20.0
>>>   [4] RSQLite_0.11.2             DBI_0.2-5
>>> biomaRt_2.14.0
>>>   [7] VennDiagram_1.5.1          SPIA_2.8.0
>>> pvclust_1.2-2
>>> [10] genefilter_1.40.0          gplots_2.11.0              MASS_7.3-22
>>> [13] KernSmooth_2.23-8          caTools_1.13
>>> bitops_1.0-4.1
>>> [16] gdata_2.12.0               gtools_2.7.0
>>> limma_3.14.1
>>> [19] arrayQualityMetrics_3.14.0 annotate_1.36.0
>>> AnnotationDbi_1.20.2
>>> [22] affy_1.36.0                Biobase_2.18.0
>>> BiocGenerics_0.4.0
>>> [25] BiocInstaller_1.8.3
>>>
>>> loaded via a namespace (and not attached):
>>>   [1] affxparser_1.30.0     affyio_1.26.0         affyPLM_1.34.0
>>> beadarray_2.8.1
>>>   [5] BeadDataPackR_1.10.0  Biostrings_2.26.2     bit_1.1-9
>>> Cairo_1.5-1
>>>   [9] cluster_1.14.3        codetools_0.2-8       colorspace_1.2-0
>>> ff_2.2-9
>>> [13] foreach_1.4.0         gcrma_2.30.0          GenomicRanges_1.10.3
>>> Hmisc_3.10-1
>>> [17] hwriter_1.3           IRanges_1.16.4        iterators_1.0.6
>>> lattice_0.20-10
>>> [21] latticeExtra_0.6-24   parallel_2.15.0       plyr_1.7.1
>>> preprocessCore_1.20.0
>>> [25] RColorBrewer_1.0-5    RCurl_1.95-1.1        reshape2_1.2.1
>>> setRNG_2011.11-2
>>> [29] splines_2.15.0        stats4_2.15.0         stringr_0.6.1
>>> survival_2.36-14
>>> [33] SVGAnnotation_0.93-1  tools_2.15.0          vsn_3.26.0
>>> XML_3.95-0.1
>>> [37] xtable_1.7-0          zlibbioc_1.4.0
>>>
>>>
>>> On Sat, Oct 13, 2012 at 12:56 AM, Dana Most<danamost at gmail.com>  wrote:
>>>> Hi All,
>>>>
>>>> Have you managed to find a cdf for the miRNA 3.0?
>>>> I keep getting the error : "...cdf=miRNA-3_0 (??? affyids)..."
>>>>
>>>> When I spoke to Affymetrix they said that the 3.0 version doesn't have a
>>>> .cdf and that a .cdf format wouldn't be compatible...
>>>> They said I should use the 'xps' package on the bioconductor website
>>>> together with a .pgf from their website.
>>>> 'xps' doesn't work with Windows 7, which unfortunately is what I have.
>>>>
>>>> Can anyone help me?
>>>>
>>>> Thanks,
>>>>
>>>> Dana
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099