[BioC] how to extract probes for a probeset from PdInfo database?

Tue Jan 21 22:38:46 CET 2014

Hi Di,
Thanks a lot for your feedback regarding this query.
I am indeed aware of the annotation file you mentioned, from which I had already extracted the IDs of the relevant *probe sets*.
The problem I am (was?) facing now is how to get a list of the corresponding *probes* that comprise those sets. A CDF is unfortunately not provided for this array, so I have to get used to working with the PdInfo packages.

Regards,
Guido

-----Original Message-----
From: Wu, Di [mailto:dwu at fas.harvard.edu] 
Sent: Tuesday, January 21, 2014 18:49
To: Hooiveld, Guido; bioconductor at r-project.org
Subject: RE: how to extract probes for a probeset from PdInfo database?

Hi Guido,

See if the following annotation file helps.

http://www.affymetrix.com/support/technical/byproduct.affx?product=mirna_array_strip
(Additional Support)
miRNA 3.1 Annotations, Unsupported, CSV format

Di
----
Di Wu
Postdoctoral fellow
Harvard University, Statistics Department Harvard Medical School Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA

________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Hooiveld, Guido [guido.hooiveld at wur.nl]
Sent: Tuesday, January 21, 2014 11:50 AM
To: bioconductor at r-project.org
Subject: [BioC] how to extract probes for a probeset from PdInfo database?

Hello,
I would like to extract the probes that belong to a set of probesets from a PdInfo database, but despite searching the archives I got stuck... I would appreciate some hints.

To be specific: I am working with an Affymetrix miRNA 3.1 dataset. I would like to extract all probes that belong to a set of Affymetrix control probesets, e.g. AFFX-BkGr17-GC10_st and AFFX-BkGr17-GC11_st.
This is my approach:
> library(pd.mirna.3.1)
> con <- db(pd.mirna.3.1)

> affy.probesets <- c("AFFX-BkGr17-GC10_st","AFFX-BkGr17-GC11_st")
> affy.probesets
[1] "AFFX-BkGr17-GC10_st" "AFFX-BkGr17-GC11_st"
>

> #check available tables/information
> dbGetQuery(con, "select name, sql from sqlite_master where type='table'")
        name                                                                                                                    sql
1  type_dict                                                        CREATE TABLE type_dict (type INTEGER PRIMARY KEY, type_id TEXT)
2 featureSet         CREATE TABLE featureSet (fsetid INTEGER PRIMARY KEY, man_fsetid TEXT, type INTEGER REFERENCES type_dict(type))
3  pmfeature CREATE TABLE pmfeature (fid INTEGER, fsetid INTEGER REFERENCES featureSet(fsetid), atom INTEGER, x INTEGER, y INTEGER)
4  mmfeature CREATE TABLE mmfeature (fid INTEGER, fsetid INTEGER REFERENCES featureSet(fsetid), atom INTEGER, x INTEGER, y INTEGER)
5 table_info                                                          CREATE TABLE table_info \n( tbl TEXT,\n\trow_count INTEGER \n)
>

So far so good.
However, how do I continue from here?
For arrays for which a CDF is available, e.g. the miRNA 1.0 array, I would do something like this (only the probes for the first probeset in affy.probesets are extracted here, but that is not the main question):
> get(affy.probesets[1], mirna10cdf)
         pm mm
[1,] 34705 NA
[2,] 46085 NA
[3,] 20445 NA
[4,] 26368 NA
<<snip>>

Main question: how could I achieve this when using a PdInfo object?
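
Judging only from the table definitions listed above, one possible route is to join featureSet and pmfeature on fsetid and filter on man_fsetid. The following is a minimal sketch under stated assumptions, not a confirmed recipe: it builds a toy in-memory SQLite database with the same table layout (all rows are invented), so that it runs without the pd.mirna.3.1 package. With the real package, the connection from db(pd.mirna.3.1) would take the place of the toy one. Whether fid plays the same role as the 'pm' index returned by a CDF environment is also an assumption here.

```r
library(DBI)
library(RSQLite)

# Toy stand-in for db(pd.mirna.3.1): same columns as in the schema above,
# but every row below is invented for illustration only.
con <- dbConnect(SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE featureSet (fsetid INTEGER PRIMARY KEY, man_fsetid TEXT, type INTEGER)")
dbExecute(con, "CREATE TABLE pmfeature (fid INTEGER, fsetid INTEGER, atom INTEGER, x INTEGER, y INTEGER)")
dbWriteTable(con, "featureSet",
             data.frame(fsetid = 1:3,
                        man_fsetid = c("AFFX-BkGr17-GC10_st", "AFFX-BkGr17-GC11_st", "other_st"),
                        type = c(1L, 1L, 2L)),
             append = TRUE)
dbWriteTable(con, "pmfeature",
             data.frame(fid = c(101L, 102L, 103L, 104L, 105L),
                        fsetid = c(1L, 1L, 2L, 3L, 3L),
                        atom = 1:5,
                        x = c(0L, 1L, 2L, 3L, 4L),
                        y = c(5L, 5L, 5L, 5L, 5L)),
             append = TRUE)

affy.probesets <- c("AFFX-BkGr17-GC10_st", "AFFX-BkGr17-GC11_st")

# Quote each probeset name and build the list for the SQL IN clause
in.list <- paste(sprintf("'%s'", affy.probesets), collapse = ", ")

# Join probesets to their PM probes via the shared fsetid key
sql <- paste("SELECT fs.man_fsetid, pm.fid, pm.x, pm.y",
             "FROM featureSet fs",
             "INNER JOIN pmfeature pm ON fs.fsetid = pm.fsetid",
             "WHERE fs.man_fsetid IN (", in.list, ")",
             "ORDER BY fs.man_fsetid, pm.fid")
probes <- dbGetQuery(con, sql)

# One vector of probe ids (fid) per probeset name
probes.by.set <- split(probes$fid, probes$man_fsetid)

dbDisconnect(con)
```

Unlike the get() call on the CDF environment, this returns the probes for all requested probesets at once, as a named list. The mmfeature table could be queried the same way if mismatch probes are needed.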

Related to this, how can I get more info on what the various keys represent? E.g. what does 'man_fsetid' represent?
[From the mailing list I meanwhile know that 'man_fsetid' represents the Affymetrix "probeset_name", and 'fsetid' the Affymetrix "probeset_id".]

-->> The reason I am asking all this is that I would like to analyze (normalize) my miRNA 3.1 dataset using the normexp-by-control background correction (nec function in limma), essentially as described in:
http://www.pubmed.org/23709276.

Thanks,
Guido

---------------------------------------------------------
Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Wageningen University
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
the Netherlands
tel: (+)31 317 485788
fax: (+)31 317 483342
email:      guido.hooiveld at wur.nl
internet:   http://nutrigene.4t.com
http://scholar.google.com/citations?user=qFHaMnoAAAAJ
http://www.researcherid.com/rid/F-4912-2010


_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor