[BioC] Biomart query in Web interface Vs. biomaRt package?
Steffen
sdurinck at lbl.gov
Mon Oct 8 18:13:42 CEST 2007
Hi Jeremie,
Below is the answer from the Ensembl helpdesk. In short, the 'go' filter
retrieves all genes associated with a particular GO identifier, whereas
the 'biol_process' filter retrieves all genes associated with a
particular GO identifier and all of its children. This explains why
one gets more genes with 'biol_process' than with 'go' as the
filter. (The Ensembl BioMart web interface uses 'biol_process', and you
used 'go' in your biomaRt query.)
Cheers,
Steffen
-----
When you query BioMart filtering on a specific GO term (GO:0006996, or a
list), you retrieve all the entries associated with that GO term or
those GO terms. But if you filter using a 'Biological process' and then
add an ID, you get all the entries matching that ID and all of its
children:
organelle organization and biogenesis [GO:0006996]
autophagic vacuole formation [GO:0000045]
chromosome organization and biogenesis [GO:0051276]
chromosome condensation [GO:0030261]
chromosome decondensation [GO:0051312]
chromosome organization and biogenesis (sensu Bacteria) [GO:0051277]
chromosome organization and biogenesis (sensu Eukaryota) [GO:0007001]
chromosome breakage [GO:0031052]
establishment and/or maintenance of chromatin architecture [GO:0006325]
karyosome formation [GO:0030717]
....
As seen here:
http://www.ensembl.org/Homo_sapiens/goview?depth=2;query=organelle+organization+and+biogenesis
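The difference between the two filters can be sketched offline in base R. This is only an illustration, not how BioMart is implemented: the parent/child edges are an assumption inferred from the order of the listing above (its indentation did not survive), the gene names and annotations are made-up placeholders, and `with_descendants` is a hypothetical helper.

```r
# Toy GO fragment: parent term -> direct children.
# ASSUMPTION: edges inferred from the order of the listing above.
children <- list(
  "GO:0006996" = c("GO:0000045", "GO:0051276"),
  "GO:0051276" = c("GO:0030261", "GO:0051312")
)

# HYPOTHETICAL gene-to-term annotations (placeholder gene names).
annotation <- data.frame(
  gene = c("geneA", "geneB", "geneB", "geneC", "geneD", "geneE"),
  go   = c("GO:0006996", "GO:0006996", "GO:0000045",
           "GO:0051276", "GO:0030261", "GO:0051312"),
  stringsAsFactors = FALSE
)

# Collect a term together with all of its descendants (depth-first).
with_descendants <- function(term, children) {
  kids <- children[[term]]          # NULL when the term has no children
  if (is.null(kids)) return(term)
  unique(c(term, unlist(lapply(kids, with_descendants, children = children))))
}

# 'go'-style filter: genes annotated with exactly this term.
genes_exact <- unique(annotation$gene[annotation$go == "GO:0006996"])

# 'biol_process'-style filter: genes annotated with the term or any descendant.
terms <- with_descendants("GO:0006996", children)
genes_expanded <- unique(annotation$gene[annotation$go %in% terms])

length(genes_exact)     # 2 in this toy data
length(genes_expanded)  # 5 in this toy data
```

In this toy data the exact match returns fewer genes than the expanded match, which is the same effect as the 9 versus 1141 genes reported in the original question.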
I hope this explains,
-- Xose M Fernandez (Ensembl User Support)
J.J.P.Lebrec at lumc.nl wrote:
> Hi,
>
> Using the web based Biomart tool (
> http://www.ensembl.org/biomart/martview/ ) in database=Ensembl 46,
> dataset=Homo sapiens Genes (NCBI 36), I have manually extracted all
> unique genes' 'External Gene ID' using GO pathway GO:0006996 as a
> filter. I obtained 1141 unique genes.
>
> I tried to automate the process using the biomaRt package with the query
> below, which yielded only 9 unique genes!
>
>
> > human = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
> Checking attributes and filters ... ok
>
> > getBM(attributes = "external_gene_id", filters = "go", values = "GO:0006996", mart = human)
> external_gene_id
> 1 KIF3A
> 2 HPS3
> 3 HPS3
> 4 DTNBP1
> 5 DTNBP1
> 6 KIF5C
> 7 KIF4A
> 8 HPS1
> 9 HPS6
> 10 HPS6
> 11 HPS6
> 12 KIF25
> 13 HPS4
>
> > sessionInfo()
> R version 2.5.1 (2007-06-27)
> i386-pc-mingw32
>
> locale:
> LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252
>
> attached base packages:
> [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
> [7] "base"
>
> other attached packages:
> biomaRt RCurl XML
> "1.10.1" "0.8-0" "1.9-0"
>
>
> I thought the two queries were equivalent. Could you please tell me
> what I am doing wrong here?
>
> Many thanks in advance,
>
> Jeremie
>