[BioC] how to extract probes for a probeset from PdInfo database?
guido.hooiveld at wur.nl
Tue Jan 21 22:38:46 CET 2014
Thanks a lot for your feedback regarding this query.
I am indeed aware of the annotation file you mentioned, from which I already had extracted the IDs of the relevant *probe sets*.
The problem I am (was?) facing now is how to get a list of the corresponding *probes* that comprise those sets. A CDF is unfortunately not provided for this array, so I have to get used to working with the PdInfo packages.
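A minimal sketch of one way to do this, assuming only the featureSet and pmfeature tables listed in the original message below: join the two tables on fsetid and filter on man_fsetid. The helper name probes_for_probesets and the toy in-memory rows are made up for illustration (they are not taken from the real pd.mirna.3.1 database), and the sketch assumes the DBI and RSQLite packages are installed.

```r
library(DBI)

# Hypothetical helper (not part of oligo or any PdInfo package): returns one
# row per PM probe for the requested probe set names.
probes_for_probesets <- function(con, probesets) {
  # One "?" placeholder per probe set name, so the names are bound as
  # parameters instead of being pasted into the SQL string.
  placeholders <- paste(rep("?", length(probesets)), collapse = ", ")
  sql <- paste0(
    "SELECT fs.man_fsetid, pm.fid, pm.x, pm.y ",
    "FROM featureSet AS fs ",
    "INNER JOIN pmfeature AS pm ON pm.fsetid = fs.fsetid ",
    "WHERE fs.man_fsetid IN (", placeholders, ") ",
    "ORDER BY fs.man_fsetid, pm.fid"
  )
  dbGetQuery(con, sql, params = unname(as.list(probesets)))
}

# Toy in-memory database with the same table layout as in the original
# message; all rows are invented placeholders.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbExecute(con, "CREATE TABLE featureSet (fsetid INTEGER PRIMARY KEY,
                man_fsetid TEXT, type INTEGER)")
dbExecute(con, "CREATE TABLE pmfeature (fid INTEGER,
                fsetid INTEGER REFERENCES featureSet(fsetid),
                atom INTEGER, x INTEGER, y INTEGER)")
dbExecute(con, "INSERT INTO featureSet VALUES
                (1, 'AFFX-BkGr17-GC10_st', 1),
                (2, 'AFFX-BkGr17-GC11_st', 1),
                (3, 'some_other_probeset_st', 1)")
dbExecute(con, "INSERT INTO pmfeature VALUES
                (101, 1, 1, 0, 0),
                (102, 1, 2, 1, 0),
                (201, 2, 3, 0, 1),
                (301, 3, 4, 1, 1)")

affy.probesets <- c("AFFX-BkGr17-GC10_st", "AFFX-BkGr17-GC11_st")
probes <- probes_for_probesets(con, affy.probesets)
print(probes)

dbDisconnect(con)
```

With the annotation package loaded, the same helper would presumably be called as probes_for_probesets(db(pd.mirna.3.1), affy.probesets). Whether the returned fid values correspond to the indices that get() returns from a CDF environment is not verified here.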
From: Wu, Di [mailto:dwu at fas.harvard.edu]
Sent: Tuesday, January 21, 2014 18:49
To: Hooiveld, Guido; bioconductor at r-project.org
Subject: RE: how to extract probes for a probeset from PdInfo database?
See whether the following annotation file helps:
miRNA 3.1 Annotations, Unsupported, CSV format
Harvard University, Statistics Department Harvard Medical School Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Hooiveld, Guido [guido.hooiveld at wur.nl]
Sent: Tuesday, January 21, 2014 11:50 AM
To: bioconductor at r-project.org
Subject: [BioC] how to extract probes for a probeset from PdInfo database?
I would like to extract the probes that belong to a set of probesets from a PdInfo database, but despite searching the archives I got stuck... I would appreciate some hints.
To be specific: I am working with an Affymetrix miRNA 3.1 dataset. I would like to extract all probes that belong to a set of Affymetrix control probesets, e.g. AFFX-BkGr17-GC10_st and AFFX-BkGr17-GC11_st.
This is my approach:
> con <- db(pd.mirna.3.1)
> affy.probesets <- c("AFFX-BkGr17-GC10_st","AFFX-BkGr17-GC11_st")
> affy.probesets
[1] "AFFX-BkGr17-GC10_st" "AFFX-BkGr17-GC11_st"
> #check available tables/information
> dbGetQuery(con, "select name, sql from sqlite_master where
1 type_dict CREATE TABLE type_dict (type INTEGER PRIMARY KEY, type_id TEXT)
2 featureSet CREATE TABLE featureSet (fsetid INTEGER PRIMARY KEY, man_fsetid TEXT, type INTEGER REFERENCES type_dict(type))
3 pmfeature CREATE TABLE pmfeature (fid INTEGER, fsetid INTEGER REFERENCES featureSet(fsetid), atom INTEGER, x INTEGER, y INTEGER)
4 mmfeature CREATE TABLE mmfeature (fid INTEGER, fsetid INTEGER REFERENCES featureSet(fsetid), atom INTEGER, x INTEGER, y INTEGER)
5 table_info CREATE TABLE table_info \n( tbl TEXT,\n\trow_count INTEGER \n)
So far so good.
However, how now to continue?
For arrays for which a CDF is available, e.g. the miRNA 1.0 array, I would do something like this (only the probes for the first probeset in affy.probesets would be extracted, but that is not the main question here):
> get(affy.probesets, mirna10cdf)
[1,] 34705 NA
[2,] 46085 NA
[3,] 20445 NA
[4,] 26368 NA
Main question: how could I achieve this when using a PdInfo object?
Related to this, how can I get more info on what the various keys represent? E.g. what does 'man_fsetid' represent?
[From the mailing list I have meanwhile learned that 'man_fsetid' represents the Affymetrix "probeset_name", and 'fsetid' the Affymetrix "probeset_id".]
-->> Reason I am asking all this is because I would like to analyze (normalize) my miRNA 3.1 dataset using the normexp-by-control background correction (nec function in limma), essentially as described in:
Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
tel: (+)31 317 485788
fax: (+)31 317 483342
email: guido.hooiveld at wur.nl
Bioconductor mailing list
Bioconductor at r-project.org
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor