[BioC] biomaRt and Ensembl probe set filter....
Jesper Ryge
Jesper.Ryge at ki.se
Mon Jan 26 18:22:16 CET 2009
Hi Jim,
Thanks for the fast answer :-) There are still small bumps on the way, but it is getting better.
I installed biomaRt_1.16.0 and now I can run:
mart <- useMart("ensembl_mart_47", dataset="rnorvegicus_gene_ensembl", archive=TRUE)
This also works for ensembl_mart_46, but not for ensembl_mart_45 and older, even though they are
listed by listMarts(archive=TRUE) all the way down to ensembl_mart_43. So the functionality seems
a little reduced?
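To see at a glance which archived marts can actually be opened, something like the sketch below might help. It assumes a biomaRt version in which useMart() still accepts archive=TRUE, as in this thread. The names probe_archive_marts and connect are made up for illustration; connect is only there so the loop can be tried offline with a stub.

```r
# Hypothetical helper: try to open each archived Ensembl mart and record
# whether the connection succeeded. Not part of biomaRt.
probe_archive_marts <- function(versions,
                                dataset = "rnorvegicus_gene_ensembl",
                                connect = NULL) {
  if (is.null(connect)) {
    # Assumption: useMart() accepts archive = TRUE, as in biomaRt 1.16.0.
    connect <- function(name, dataset) {
      biomaRt::useMart(name, dataset = dataset, archive = TRUE)
    }
  }
  mart_names <- paste("ensembl_mart_", versions, sep = "")
  # vapply() keeps the mart names as names of the result vector.
  vapply(mart_names, function(name) {
    tryCatch({
      connect(name, dataset)
      TRUE
    }, error = function(e) FALSE)
  }, logical(1))
}
```

Calling probe_archive_marts(43:47) would then return a named logical vector with one entry per mart, TRUE where useMart() succeeded and FALSE where it raised an error.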
cheers,
jesper
jesper Ryge
karolinska Institutet
tlf: +46 707 146 879
----- Original Message -----
From: "James W. MacDonald" <jmacdon at med.umich.edu>
Date: Monday, January 26, 2009 3:16 pm
Subject: Re: [BioC] biomaRt and Ensembl probe set filter....
To: Jesper.Ryge at ki.se
Cc: bioC <bioconductor at stat.math.ethz.ch>
> Hi Jesper,
>
> Jesper Ryge wrote:
> > Hi everybody
> >
> > q1. I have been using biomaRt to filter Affymetrix probe sets prior to
> > statistical testing with tools such as limma or cyberT. That is, I only
> > include probe sets that are annotated in Ensembl. In this way I get rid of
> > probe sets that do not align correctly to the intended genes; at least that
> > was my intention. I know this has been debated before (i.e. CDF files and
> > filtering of misaligned probe sets), and I find this to be the easiest way
> > to exclude probes that might hybridize to the wrong transcripts.
> > I now find that since 2007 the number of annotated probe sets on the
> > Affymetrix Rat 230_2 has decreased from 17931 to 12919 out of 31099 (I was
> > redoing some analysis and found this discrepancy between the analysis I did
> > in 2007 and the one conducted on the new Ensembl database). I find that to
> > be a rather drastic decrease, but perhaps that's not so? In essence I
> > "lose" a lot of probe sets, but if those that are filtered out are "false
> > positives" it is of course worth it! That was my logic so far, at least.
> > So, first I would like to know if anybody considers this strategy
> > wise/unwise? It just seems a bit surprising to me that the probe sets on
> > the Affy chips mismatch to such a large extent that only roughly 40% of
> > them remain in the analysis.
>
> I think you are making a pretty strong assumption here. Do you know how
> Ensembl is annotating Affy Probe IDs to transcripts? It seems to me that
> you are assuming that Ensembl is somehow checking to see what transcript
> the probes are complementary to, whereas they may in fact be simply
> taking data from Affy and accepting them verbatim. I personally have no
> idea, but would want to know that before I filtered data in this way.
>
>
> >
> > I then wanted to check this decrease in Affy annotated probe sets, which
> > leads me to question 2, a pure biomaRt issue:
> >
> > q2. I wish to access earlier Ensembl versions to check, and possibly make
> > a graph of, the decrease of the annotated probe sets for the Rat 230_2
> > chip over time. But I run into a problem:
> >
> >> mart <- useMart("ensembl_mart_46",dataset="rnorvegicus_gene_ensembl",archive=T)
> > Error in useMart("ensembl_mart_46", dataset = "rnorvegicus_gene_ensembl", :
> >   Incorrect BioMart name, use the listMarts function to see which
> >   BioMart databases are available
> >
> > though they are listed in the archive:
>
> I don't know if this is the problem, but you have mixed a devel version
> of biomaRt in your release version of R. This works for me with a
> release version of biomaRt:
>
> mart <- useMart("ensembl_mart_46",dataset="rnorvegicus_gene_ensembl",archive=T)
> Checking attributes and filters ... ok
> >
> > sessionInfo()
> R version 2.8.0 (2008-10-20)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_1.16.0 fortunes_1.3-6
> [3] RMySQL_0.6-1 DBI_0.2-4
> [5] BSgenome.Hsapiens.UCSC.hg18_1.3.11 BSgenome_1.10.1
> [7] Biostrings_2.10.1 IRanges_1.0.2
>
> loaded via a namespace (and not attached):
> [1] grid_2.8.0 lattice_0.17-15 Matrix_0.999375-16 RCurl_0.92-0
> [5] tools_2.8.0 XML_1.94-0.1
>
> Best,
>
> Jim
>
>
> >
> >> listMarts(archive=T)
> >    biomart                     version
> > 1  ensembl_mart_47             ENSEMBL GENES 47 (SANGER)
> > 2  genomic_features_mart_47    Genomic Features
> > 3  snp_mart_47                 SNP
> > 4  vega_mart_47                Vega
> > 5  compara_mart_homology_47    Compara homology
> > 6  compara_mart_multiple_ga_47 Compara multiple alignments
> > 7  compara_mart_pairwise_ga_47 Compara pairwise alignments
> > 8  ensembl_mart_46             ENSEMBL GENES 46 (SANGER)
> > 9  genomic_features_mart_46    Genomic Features
> > 10 snp_mart_46                 SNP
> > 11 vega_mart_46                Vega
> > 12 compara_mart_homology_46    Compara homology
> > 13 compara_mart_multiple_ga_46 Compara multiple alignments
> > 14 compara_mart_pairwise_ga_46 Compara pairwise alignments
> > 15 ensembl_mart_45             ENSEMBL GENES 45 (SANGER)
> > 16 snp_mart_45                 SNP
> > 17 vega_mart_45                Vega
> > 18 compara_mart_homology_45    Compara homology
> > 19 compara_mart_multiple_ga_45 Compara multiple alignments
> > 20 compara_mart_pairwise_ga_45 Compara pairwise alignments
> > 21 ensembl_mart_44             ENSEMBL GENES 44 (SANGER)
> > 22 snp_mart_44                 SNP
> > 23 vega_mart_44                Vega
> > 24 compara_mart_homology_44    Compara homology
> > 25 compara_mart_pairwise_ga_44 Compara pairwise alignments
> > 26 ensembl_mart_43             ENSEMBL GENES 43 (SANGER)
> > 27 snp_mart_43                 SNP
> > 28 vega_mart_43                Vega
> > 29 compara_mart_homology_43    Compara homology
> > 30 compara_mart_pairwise_ga_43 Compara pairwise alignments
> >
> >> sessionInfo()
> > R version 2.8.0 (2008-10-20)
> > i386-apple-darwin9.5.0
> >
> > locale:
> > C
> >
> > attached base packages:
> > [1] tools stats graphics grDevices utils datasets methods
> > [8] base
> >
> > other attached packages:
> > [1] rat2302cdf_2.3.0 biomaRt_1.99.2 affy_1.20.0 Biobase_2.2.1
> >
> > loaded via a namespace (and not attached):
> > [1] RCurl_0.94-0 XML_1.98-1 affyio_1.10.1
> > [4] preprocessCore_1.4.0
> >
> > cheers,
> > Jesper Ryge, PhD student
> > karolinska Institutet
> > Dep. of Neuroscience
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> --
> James W. MacDonald, M.S.
> Biostatistician
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-0646
> 734-936-8662
>