[BioC] biomaRt and Ensembl probe set filter....
Jesper Ryge
Jesper.Ryge at ki.se
Mon Jan 26 11:20:26 CET 2009
Hi everybody
q1. I have been using biomaRt to filter Affymetrix probe sets prior to statistical testing with
tools such as limma or Cyber-T. That is, I only include probe sets that are annotated in Ensembl.
This way I get rid of probe sets that do not align correctly to the intended genes, or at least
that was my intention. I know this has been debated before (custom CDF files and filtering of
misaligned probe sets), and I find this to be the easiest way to exclude probes that might
hybridize to the wrong transcripts.
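The filtering step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the helper `keep_annotated()`, the toy matrix and the probe set IDs are made up, and the Ensembl attribute name `affy_rat230_2` mentioned in the comment is assumed rather than taken from this post.

```r
# Hypothetical helper: keep only the rows (probe sets) of an expression
# matrix whose IDs appear in a vector of Ensembl-annotated probe set IDs.
keep_annotated <- function(exprs_mat, annotated_ids) {
  keep <- rownames(exprs_mat) %in% annotated_ids
  exprs_mat[keep, , drop = FALSE]
}

# Toy data standing in for real expression values (IDs are placeholders).
exprs_mat <- matrix(
  c(7.1, 7.3, 5.2, 5.0, 9.8, 9.9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("probeset_A", "probeset_B", "probeset_C"),
                  c("sample1", "sample2"))
)

# In a real analysis this vector would come from an Ensembl query, e.g.
# (assumption: the attribute for this array is called "affy_rat230_2"):
#   ann <- getBM(attributes = c("affy_rat230_2", "ensembl_gene_id"), mart = mart)
#   ids <- ann$affy_rat230_2
#   annotated_ids <- unique(ids[!is.na(ids) & ids != ""])
annotated_ids <- c("probeset_A", "probeset_C")

# Only probe sets A and C survive the filter.
filtered <- keep_annotated(exprs_mat, annotated_ids)
```

The filtered matrix would then be passed on to limma or a similar test in place of the full one.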
I now find that since 2007 the number of annotated probe sets on the Affymetrix Rat 230_2
array has decreased from 17931 to 12919 out of 31099 (I was redoing some analysis and found
this discrepancy between the analysis I did in 2007 and the one run against the current
Ensembl database). I find that to be a rather drastic decrease, but perhaps it is not? In
essence I "lose" a lot of probe sets, but if those that are filtered out are "false positives" it
is of course worth it! That was my logic so far, at least. So, first I would like to know whether
anybody considers this strategy wise or unwise. It just seems a bit surprising to me that the
probe sets on the Affy chips mismatch to such a large extent that only roughly 40% of the
probe sets remain in the analysis.
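To put the counts above in proportion, here is a small calculation using only the three numbers quoted in this post; the variable names are arbitrary.

```r
# Counts quoted above for the Affymetrix Rat 230_2 array.
total    <- 31099  # probe sets on the array
ann_2007 <- 17931  # annotated in Ensembl in the 2007 analysis
ann_now  <- 12919  # annotated in the current Ensembl database

frac_2007 <- ann_2007 / total   # fraction annotated in 2007
frac_now  <- ann_now / total    # fraction annotated now
lost      <- ann_2007 - ann_now # probe sets that lost their annotation
rel_drop  <- lost / ann_2007    # drop relative to the 2007 count

# Percentages, rounded to one decimal place.
round(100 * c(frac_2007 = frac_2007, frac_now = frac_now, rel_drop = rel_drop), 1)
```

That is about 57.7% of the array annotated in 2007 against about 41.5% now, with 5012 probe sets (roughly 28% of the 2007 set) no longer annotated.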
I then wanted to check this decrease in annotated Affy probe sets, which leads me to question
2, a pure biomaRt issue:
q2. I wish to access earlier Ensembl versions to check, and possibly plot, the decrease in
annotated probe sets for the Rat 230_2 chip over time, but I run into a problem:
> mart <- useMart("ensembl_mart_46",dataset="rnorvegicus_gene_ensembl",archive=T)
Error in useMart("ensembl_mart_46", dataset = "rnorvegicus_gene_ensembl", :
Incorrect BioMart name, use the listMarts function to see which BioMart databases are
available
even though the archived marts are listed:
> listMarts(archive=T)
    biomart                      version
1   ensembl_mart_47              ENSEMBL GENES 47 (SANGER)
2   genomic_features_mart_47     Genomic Features
3   snp_mart_47                  SNP
4   vega_mart_47                 Vega
5   compara_mart_homology_47     Compara homology
6   compara_mart_multiple_ga_47  Compara multiple alignments
7   compara_mart_pairwise_ga_47  Compara pairwise alignments
8   ensembl_mart_46              ENSEMBL GENES 46 (SANGER)
9   genomic_features_mart_46     Genomic Features
10  snp_mart_46                  SNP
11  vega_mart_46                 Vega
12  compara_mart_homology_46     Compara homology
13  compara_mart_multiple_ga_46  Compara multiple alignments
14  compara_mart_pairwise_ga_46  Compara pairwise alignments
15  ensembl_mart_45              ENSEMBL GENES 45 (SANGER)
16  snp_mart_45                  SNP
17  vega_mart_45                 Vega
18  compara_mart_homology_45     Compara homology
19  compara_mart_multiple_ga_45  Compara multiple alignments
20  compara_mart_pairwise_ga_45  Compara pairwise alignments
21  ensembl_mart_44              ENSEMBL GENES 44 (SANGER)
22  snp_mart_44                  SNP
23  vega_mart_44                 Vega
24  compara_mart_homology_44     Compara homology
25  compara_mart_pairwise_ga_44  Compara pairwise alignments
26  ensembl_mart_43              ENSEMBL GENES 43 (SANGER)
27  snp_mart_43                  SNP
28  vega_mart_43                 Vega
29  compara_mart_homology_43     Compara homology
30  compara_mart_pairwise_ga_43  Compara pairwise alignments
> sessionInfo()
R version 2.8.0 (2008-10-20)
i386-apple-darwin9.5.0
locale:
C
attached base packages:
[1] tools stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] rat2302cdf_2.3.0 biomaRt_1.99.2 affy_1.20.0 Biobase_2.2.1
loaded via a namespace (and not attached):
[1] RCurl_0.94-0 XML_1.98-1 affyio_1.10.1
[4] preprocessCore_1.4.0
>
cheers,
Jesper Ryge, PhD student
Karolinska Institutet
Dep. of Neuroscience