[BioC] how to extract probes for a probeset from PdInfo database?

Hooiveld, Guido guido.hooiveld at wur.nl
Tue Jan 21 22:38:46 CET 2014

Hi Di,
Thanks a lot for your feedback regarding this query.
I am indeed aware of the annotation file you mentioned, from which I already had extracted the IDs of the relevant *probe sets*.
The problem I am (was?) facing now is how to get a list of the corresponding *probes* that comprise those sets. A CDF is unfortunately not provided for this array, so I have to get used to working with the PdInfo packages.


-----Original Message-----
From: Wu, Di [mailto:dwu at fas.harvard.edu] 
Sent: Tuesday, January 21, 2014 18:49
To: Hooiveld, Guido; bioconductor at r-project.org
Subject: RE: how to extract probes for a probeset from PdInfo database?

Hi Guido,

See if the following annotation file is what can help you. 

(Additional Support)
miRNA 3.1 Annotations, Unsupported, CSV format

Di Wu
Postdoctoral fellow
Harvard University, Statistics Department Harvard Medical School Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA

From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Hooiveld, Guido [guido.hooiveld at wur.nl]
Sent: Tuesday, January 21, 2014 11:50 AM
To: bioconductor at r-project.org
Subject: [BioC] how to extract probes for a probeset from PdInfo database?

I would like to extract the probes that belong to a set of probesets from a PdInfo database, but despite searching the archives I got stuck... I would appreciate some hints.

To be specific: I am working with an Affymetrix miRNA 3.1 dataset. I would like to extract all probes that belong to a set of affy control probesets, e.g. AFFX-BkGr17-GC10_st and AFFX-BkGr17-GC11_st.
This is my approach:
> library(pd.mirna.3.1)
> con <- db(pd.mirna.3.1)

> affy.probesets <- c("AFFX-BkGr17-GC10_st","AFFX-BkGr17-GC11_st")
> affy.probesets
[1] "AFFX-BkGr17-GC10_st" "AFFX-BkGr17-GC11_st"

> #check available tables/information
> dbGetQuery(con, "select name, sql from sqlite_master where type='table'")
        name                                                                                                                    sql
1  type_dict                                                        CREATE TABLE type_dict (type INTEGER PRIMARY KEY, type_id TEXT)
2 featureSet         CREATE TABLE featureSet (fsetid INTEGER PRIMARY KEY, man_fsetid TEXT, type INTEGER REFERENCES type_dict(type))
3  pmfeature CREATE TABLE pmfeature (fid INTEGER, fsetid INTEGER REFERENCES featureSet(fsetid), atom INTEGER, x INTEGER, y INTEGER)
4  mmfeature CREATE TABLE mmfeature (fid INTEGER, fsetid INTEGER REFERENCES featureSet(fsetid), atom INTEGER, x INTEGER, y INTEGER)
5 table_info                                                          CREATE TABLE table_info \n( tbl TEXT,\n\trow_count INTEGER \n)

So far so good.
However, how do I continue from here?
For arrays for which a CDF is available, e.g. the miRNA 1.0 array, I would do something like this (although this extracts only the probes for the 1st probeset in affy.probesets, which is not the main question here):
> get(affy.probesets, mirna10cdf)
         pm mm
[1,] 34705 NA
[2,] 46085 NA
[3,] 20445 NA
[4,] 26368 NA
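
As a side note on the CDF-based call above: since a CDF package is an environment, mget() can look up several probesets in one go, whereas get() handles a single name. The lines below are only a minimal sketch of that idea; 'toy.cdf' and its contents are made up for illustration and stand in for a real CDF environment such as mirna10cdf.

```r
# Hypothetical stand-in for a CDF environment; the indices are invented.
toy.cdf <- new.env()
assign("setA", cbind(pm = c(11L, 12L), mm = NA_integer_), envir = toy.cdf)
assign("setB", cbind(pm = c(21L, 22L, 23L), mm = NA_integer_), envir = toy.cdf)

sets <- c("setA", "setB")

# mget() returns a named list with one pm/mm matrix per probeset
probes.list <- mget(sets, envir = toy.cdf)

# All PM indices of the requested probesets as a single vector
pm.idx <- unlist(lapply(probes.list, function(m) m[, "pm"]), use.names = FALSE)
```

With a real CDF package the same call would presumably read mget(affy.probesets, envir = mirna10cdf).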

Main question: how could I achieve this when using a PdInfo object?
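
One possible route, given the table layout listed above, is to join featureSet and pmfeature on fsetid and to filter on man_fsetid. The sketch below is not run against pd.mirna.3.1 itself: it builds a small in-memory SQLite database with the same column names (the rows are invented for illustration, and the reference to type_dict is left out), so that the query can be tried without the annotation package. It assumes the DBI and RSQLite packages are installed.

```r
library(DBI)
library(RSQLite)

# Toy stand-in for db(pd.mirna.3.1); schema copied from the listing above,
# all rows are made up.
con <- dbConnect(SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE featureSet (fsetid INTEGER PRIMARY KEY, man_fsetid TEXT, type INTEGER)")
dbExecute(con, "CREATE TABLE pmfeature (fid INTEGER, fsetid INTEGER REFERENCES featureSet(fsetid), atom INTEGER, x INTEGER, y INTEGER)")

dbWriteTable(con, "featureSet",
             data.frame(fsetid = 1:3,
                        man_fsetid = c("AFFX-BkGr17-GC10_st",
                                       "AFFX-BkGr17-GC11_st",
                                       "some-other-probeset_st"),
                        type = c(1L, 1L, 2L)),
             append = TRUE)
dbWriteTable(con, "pmfeature",
             data.frame(fid = c(101L, 102L, 201L, 301L),
                        fsetid = c(1L, 1L, 2L, 3L),
                        atom = 1:4,
                        x = c(5L, 6L, 7L, 8L),
                        y = c(1L, 1L, 2L, 3L)),
             append = TRUE)

affy.probesets <- c("AFFX-BkGr17-GC10_st", "AFFX-BkGr17-GC11_st")

# Build the IN (...) list from the probeset names and join on fsetid
in.list <- paste(sprintf("'%s'", affy.probesets), collapse = ", ")
sql <- paste0("SELECT man_fsetid, fid, x, y ",
              "FROM featureSet INNER JOIN pmfeature USING (fsetid) ",
              "WHERE man_fsetid IN (", in.list, ") ",
              "ORDER BY man_fsetid, fid")
probes <- dbGetQuery(con, sql)

# One vector of feature ids per probeset
probes.by.set <- split(probes$fid, probes$man_fsetid)

dbDisconnect(con)
```

With the real package, replacing the toy connection by con <- db(pd.mirna.3.1) should allow the same query to be used, assuming the control probes are stored in pmfeature. Whether the returned fid values can be used directly as row indices into the raw intensity matrix is an assumption that is worth verifying on the actual data.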

Related to this, how can I get more info on what the various keys represent? E.g. what does 'man_fsetid' represent?
[From the mailing list I meanwhile know that 'man_fsetid' represents the Affymetrix "probeset_name", and 'fsetid' the Affymetrix "probeset_id"].

-->> The reason I am asking all this is that I would like to analyze (normalize) my miRNA 3.1 dataset using the normexp-by-control background correction (nec function in limma), essentially as described in:


Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Wageningen University
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
the Netherlands
tel: (+)31 317 485788
fax: (+)31 317 483342
email:      guido.hooiveld at wur.nl
internet:   http://nutrigene.4t.com

