[BioC] biomaRt and Ensembl probe set filter....

Jesper Ryge Jesper.Ryge at ki.se
Mon Jan 26 18:22:16 CET 2009


hi jim

Thanks for the fast answer :-) There are still small bumps on the way, but it is getting better.
I installed biomaRt_1.16.0 and now I can run:

 mart <- useMart("ensembl_mart_47", dataset = "rnorvegicus_gene_ensembl", archive = TRUE)

 This also works for ensembl_mart_46, but not for ensembl_mart_45 and older, even though
they are listed by listMarts(archive=T) down to ensembl_mart_43. So the functionality seems
a little reduced?
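
One way to see which archived marts actually connect, rather than trusting the listing, would be to try each one and record the failures. The sketch below is only an illustration under stated assumptions: `probe_marts` and `fake_connect` are hypothetical names, not biomaRt functions, and the stub stands in for the real `useMart` call so the example runs without a network connection.

```r
# Try to connect to each mart and record whether the attempt succeeded.
# 'connect' is any function that takes a mart name and throws an error
# on failure (hypothetical helper, not part of biomaRt).
probe_marts <- function(mart_names, connect) {
  reachable <- vapply(
    mart_names,
    function(nm) tryCatch({ connect(nm); TRUE }, error = function(e) FALSE),
    logical(1)
  )
  data.frame(biomart = mart_names,
             reachable = unname(reachable),
             stringsAsFactors = FALSE)
}

# Stand-in for useMart(): mimics the behaviour described above, where only
# ensembl_mart_47 and ensembl_mart_46 connect and older marts raise an error.
fake_connect <- function(nm) {
  if (!nm %in% c("ensembl_mart_47", "ensembl_mart_46")) {
    stop("Incorrect BioMart name")
  }
  invisible(nm)
}

res <- probe_marts(paste0("ensembl_mart_", 47:43), fake_connect)
print(res)
```

With a live connection, `connect` would be something like `function(nm) useMart(nm, dataset = "rnorvegicus_gene_ensembl", archive = TRUE)`, i.e. the same call as above. This applies to biomaRt 1.16.0 as used in this thread; as far as I know, later biomaRt releases dropped the `archive` argument in favour of `useEnsembl()` with a `version` argument.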

cheers,
jesper

jesper Ryge
karolinska Institutet
tlf: +46 707 146 879

----- Original Message -----
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Date: Monday, January 26, 2009 3:16 pm
Subject: Re: [BioC] biomaRt and Ensembl probe set filter....
To: Jesper.Ryge at ki.se
Cc: bioC <bioconductor at stat.math.ethz.ch>

> HI Jesper,
> 
> Jesper Ryge wrote:
> > Hi everybody
> > 
> > q1. I have been using biomaRt to filter Affymetrix probe sets prior to
> > statistical testing such as limma or cyberT. That is, I only include
> > probe sets that are annotated in Ensembl. In this way I get rid of probe
> > sets that do not align correctly to the intended genes, or at least that
> > was my intention. I know this has been debated before (i.e. CDF files and
> > filtering of misaligned probe sets), and I find this to be the easiest way
> > to exclude probes that might hybridize to the wrong transcripts.
> > I now find that since 2007 the number of annotated probe sets on the
> > Affymetrix Rat 230_2 has decreased from 17931 to 12919 out of 31099 (I was
> > redoing some analysis and found this discrepancy between the analysis I
> > did in 2007 and the one conducted on the new Ensembl database). I find
> > that to be a rather drastic decrease, but perhaps that's not so? In
> > essence I "lose" a lot of probes, but if those that are filtered out are
> > "false positives" it is of course worth it! That was my logic so far, at
> > least. So, first I would like to know if anybody considers this strategy
> > wise or unwise? It just seems a bit surprising to me that the probe sets
> > on the Affy chips mismatch to such a large extent that only roughly 40%
> > of the probe sets (12919 of 31099) remain in the analysis.
> 
> I think you are making a pretty strong assumption here. Do you know how
> Ensembl is annotating Affy Probe IDs to transcript? It seems to me that
> you are assuming that Ensembl is somehow checking to see what transcript
> the probes are complementary to, whereas they may in fact be simply
> taking data from Affy and accepting them verbatim. I personally have no
> idea, but would want to know that before I filtered data in this way.
> 
> 
> > 
> > I then wanted to check this decrease in Affy annotated probe sets, which
> > leads me to question 2, a pure biomaRt issue:
> > 
> > q2. I wish to access earlier Ensembl versions to check, and possibly make
> > a graph of, the decrease of the annotated probe sets for the Rat 230_2
> > chip over time. But I run into a problem:
> > 
> >> mart <- useMart("ensembl_mart_46",dataset="rnorvegicus_gene_ensembl",archive=T)
> > Error in useMart("ensembl_mart_46", dataset = "rnorvegicus_gene_ensembl",  : 
> >   Incorrect BioMart name, use the listMarts function to see which BioMart databases are available
> > 
> > though they are listed in the archive:
> 
> I don't know if this is the problem, but you have mixed a devel version
> of biomaRt in your release version of R. This works for me with a
> release version of biomaRt:
> 
> mart <- useMart("ensembl_mart_46",dataset="rnorvegicus_gene_ensembl",archive=T)
> Checking attributes and filters ... ok
> >
> > sessionInfo()
> R version 2.8.0 (2008-10-20)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
> States.1252;LC_MONETARY=English_United 
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] biomaRt_1.16.0                     fortunes_1.3-6
> [3] RMySQL_0.6-1                       DBI_0.2-4
> [5] BSgenome.Hsapiens.UCSC.hg18_1.3.11 BSgenome_1.10.1
> [7] Biostrings_2.10.1                  IRanges_1.0.2
> 
> loaded via a namespace (and not attached):
> [1] grid_2.8.0         lattice_0.17-15    Matrix_0.999375-16 RCurl_0.92-0
> [5] tools_2.8.0        XML_1.94-0.1
> 
> Best,
> 
> Jim
> 
> 
> > 
> >> listMarts(archive=T)
> >                        biomart                     version
> > 1              ensembl_mart_47   ENSEMBL GENES 47 (SANGER)
> > 2     genomic_features_mart_47            Genomic Features
> > 3                  snp_mart_47                         SNP
> > 4                 vega_mart_47                        Vega
> > 5     compara_mart_homology_47            Compara homology
> > 6  compara_mart_multiple_ga_47 Compara multiple alignments
> > 7  compara_mart_pairwise_ga_47 Compara pairwise alignments
> > 8              ensembl_mart_46   ENSEMBL GENES 46 (SANGER)
> > 9     genomic_features_mart_46            Genomic Features
> > 10                 snp_mart_46                         SNP
> > 11                vega_mart_46                        Vega
> > 12    compara_mart_homology_46            Compara homology
> > 13 compara_mart_multiple_ga_46 Compara multiple alignments
> > 14 compara_mart_pairwise_ga_46 Compara pairwise alignments
> > 15             ensembl_mart_45   ENSEMBL GENES 45 (SANGER)
> > 16                 snp_mart_45                         SNP
> > 17                vega_mart_45                        Vega
> > 18    compara_mart_homology_45            Compara homology
> > 19 compara_mart_multiple_ga_45 Compara multiple alignments
> > 20 compara_mart_pairwise_ga_45 Compara pairwise alignments
> > 21             ensembl_mart_44   ENSEMBL GENES 44 (SANGER)
> > 22                 snp_mart_44                         SNP
> > 23                vega_mart_44                        Vega
> > 24    compara_mart_homology_44            Compara homology
> > 25 compara_mart_pairwise_ga_44 Compara pairwise alignments
> > 26             ensembl_mart_43   ENSEMBL GENES 43 (SANGER)
> > 27                 snp_mart_43                         SNP
> > 28                vega_mart_43                        Vega
> > 29    compara_mart_homology_43            Compara homology
> > 30 compara_mart_pairwise_ga_43 Compara pairwise alignments
> > 
> >> sessionInfo()
> > R version 2.8.0 (2008-10-20) 
> > i386-apple-darwin9.5.0 
> > 
> > locale:
> > C
> > 
> > attached base packages:
> > [1] tools     stats     graphics  grDevices utils     datasets  methods  
> > [8] base     
> > 
> > other attached packages:
> > [1] rat2302cdf_2.3.0 biomaRt_1.99.2   affy_1.20.0      Biobase_2.2.1   
> > 
> > loaded via a namespace (and not attached):
> > [1] RCurl_0.94-0         XML_1.98-1           affyio_1.10.1       
> > [4] preprocessCore_1.4.0
> > 
> > cheers,
> > Jesper Ryge, PhD student
> > karolinska Institutet
> > Dep. of Neuroscience
> > 
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-0646
> 734-936-8662
>


