[BioC] cannot link the queried genomic positions to their specific annotations when annotating genomic variants with the biomaRt package
Steve Lianoglou
mailinglist.honeypot at gmail.com
Tue Feb 8 18:02:25 CET 2011
Hi,
On Tue, Feb 8, 2011 at 11:47 AM, Mao Jianfeng <jianfeng.mao at gmail.com> wrote:
> Dear listers, Sean and Steve,
>
> I have posted a similar question to this list, but I am still
> confused, so I will try to describe my question in more detail to
> make it clearer. PLEASE read all 6 sections that follow.
>
> Thanks a lot. My question is not a student's homework. This list is
> the only place where I can get help with R and Bioconductor; I
> learned both by myself, in a somewhat isolated environment. So any
> help you can give is very valuable to me.
>
> Jian-Feng,
>
>
>
> (1) The genomic variant data I need to annotate:
> # SNPs,chromosome,start,end
> SNP_1,1,43,43
> SNP_2,2,56,56
>
> (2) What I want to get (the annotation): there may be multiple terms
> for a given annotation column. They can either be combined in one
> cell, or be placed in separate rows of the same column. Either way,
> each genomic position should stay linked to its specific annotations.
>
> # SNPs,chromosome,start,end,annotation_term
> SNP_1,1,43,43,go_1:go_3
> SNP_2,2,56,56,go_100:go_1000
>
> or
>
> # SNPs,chromosome,start,end,go_term
> SNP_1,1,43,43,go_1
> SNP_1,1,43,43,go_3
> SNP_2,2,56,56,go_100
> SNP_2,2,56,56,go_1000
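The two layouts in (2) carry the same information, so either one can be produced from the other. A minimal base-R sketch, using only the placeholder terms (go_1, go_3, ...) from the example above: aggregate() with paste(collapse = ":") turns the long layout into the combined-cell layout, and strsplit() turns it back.

```r
# Long layout from (2): one row per SNP/GO-term pair
# (go_1, go_3, ... are the placeholder terms from the example, not real GO IDs)
long <- data.frame(
  SNPs       = c("SNP_1", "SNP_1", "SNP_2", "SNP_2"),
  chromosome = c(1, 1, 2, 2),
  start      = c(43, 43, 56, 56),
  end        = c(43, 43, 56, 56),
  go_term    = c("go_1", "go_3", "go_100", "go_1000"),
  stringsAsFactors = FALSE
)

# Long -> combined: paste each SNP's terms into one colon-separated cell
wide <- aggregate(go_term ~ SNPs + chromosome + start + end, data = long,
                  FUN = function(x) paste(unique(x), collapse = ":"))
names(wide)[names(wide) == "go_term"] <- "annotation_term"
print(wide)

# Combined -> long: split each cell on ":" and repeat that row's position columns
terms <- strsplit(wide$annotation_term, ":", fixed = TRUE)
long2 <- data.frame(
  wide[rep(seq_len(nrow(wide)), lengths(terms)), c("SNPs", "chromosome", "start", "end")],
  go_term = unlist(terms),
  stringsAsFactors = FALSE
)
rownames(long2) <- NULL
print(long2)
```

Keeping the position columns in the grouping formula is what keeps each position attached to its own terms.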
>
> (3) I was told that the biomaRt package has this functionality.
>
> (4) What I have tried with the biomaRt package:
> library(biomaRt)
> listMarts()
> plant <- useMart("plant_mart_7")
> alyr <- useDataset("alyrata_eg_gene", mart = plant)
> atha <- useDataset("athaliana_eg_gene", mart = plant)
>
> listAttributes(alyr)
> listFilters(alyr)
>
> chr <- rep(1, 10)
> start <- c(33, 999, 3000, 7000, 9000, 10000, 12000, 19000, 80000, 100000)
> end <- c(33, 999, 3000, 7000, 9000, 10000, 12000, 19000, 80000, 100000)
>
> getBM(attributes = c("chromosome_name", "start_position", "ensembl_gene_id",
>                      "go_biological_process_linkage_type"),
>       filters = c("chromosome_name", "start", "end"),
>       values = list(chr, start, end), mart = alyr, uniqueRows = TRUE)
>
> (5) What I got:
>
>    chromosome_name start_position end_position                ensembl_gene_id go_biological_process_linkage_type
> 1                1          48875        49123            Al_scaffold_0001_16
> 2                1          72255        72617            Al_scaffold_0001_21
> 3                1          10652        11944             Al_scaffold_0001_4                                IEA
> 4                1          82573        83367 fgenesh1_pg.C_scaffold_1000018
> 5                1          87206        90301 fgenesh1_pg.C_scaffold_1000020
> 6                1          29681        31614 fgenesh1_pm.C_scaffold_1000009
> 7                1          51526        52636 fgenesh1_pm.C_scaffold_1000016
> 8                1          78367        80505 fgenesh1_pm.C_scaffold_1000020
> 9                1          35461        39593 fgenesh2_kg.1__12__AT1G02120.1
> 10               1          39949        42531 fgenesh2_kg.1__13__AT1G02110.1
> 11               1          46396        48761 fgenesh2_kg.1__19__AT1G02090.1
> 12               1          55814        56468 fgenesh2_kg.1__20__AT1G02070.1
> 13               1          74785        76652 fgenesh2_kg.1__23__AT1G02065.1
> 14               1          80941        82330 fgenesh2_kg.1__25__AT1G02050.1                                IEA
> 15               1          80941        82330 fgenesh2_kg.1__25__AT1G02050.1                                IEA
> 16               1          90714       113497 fgenesh2_kg.1__28__AT1G02010.1                                IEA
> 17               1          90714       113497 fgenesh2_kg.1__28__AT1G02010.1                                IEA
> 18               1           3311         6198  fgenesh2_kg.1__2__AT1G02190.2                                IEA
> 19               1           3311         6198  fgenesh2_kg.1__2__AT1G02190.2                                IEA
> 20               1           9512        10567  fgenesh2_kg.1__3__AT1G02180.1
> 21               1          12552        13416  fgenesh2_kg.1__5__AT1G02160.2
> 22               1             47         2523              scaffold_100001.1                                IEA
> 23               1             47         2523              scaffold_100001.1                                IEA
> 24               1           7429         7630              scaffold_100003.1
> 25               1          13702        15386              scaffold_100007.1
> 26               1          15665        19464              scaffold_100008.1                                IEA
> 27               1          19692        20609              scaffold_100009.1
> 28               1          24515        27497              scaffold_100010.1
> 29               1          33055        34772              scaffold_100013.1                                IEA
> 30               1          33055        34772              scaffold_100013.1                                IEA
> 31               1          33055        34772              scaffold_100013.1                                IEA
> 32               1          33055        34772              scaffold_100013.1                                IEA
> 33               1          33055        34772              scaffold_100013.1                                IEA
> 34               1          33055        34772              scaffold_100013.1                                IEA
> 35               1          43130        46178              scaffold_100016.1
> 36               1          49553        51020              scaffold_100018.1                                IEA
> 37               1          49553        51020              scaffold_100018.1                                IEA
> 38               1          57579        57871              scaffold_100022.1
> 39               1          58865        72177              scaffold_100023.1
>
> (6) My problem is that I cannot link the genomic positions I queried
> to their specific annotations.
I don't understand what you're asking. But, as I pointed out in my
original email, your getBM call doesn't return any annotations; it
only returns the type of annotation evidence each gene has. "IEA"
(Inferred from Electronic Annotation) tells you the source of the
annotation you would have received had you included a column for that
annotation.
You'll note that in my first email I changed your getBM call slightly:
result <- getBM(attributes = c("chromosome_name", "start_position", "ensembl_gene_id",
                               "go_biological_process_linkage_type", "go_biological_process_id"),
                filters = c("chromosome_name", "start", "end"),
                values = list(chr, start, end), mart = alyr, uniqueRows = TRUE)
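getBM() returns only the requested attributes, so nothing in the result says which of the queried positions a given gene row came from. One way to recover that link is to overlap the queried positions with the returned gene coordinates locally. A minimal base-R sketch, assuming "end_position" is also added to the attributes (the call above requests only "start_position"); the `result` frame below is a hypothetical stand-in for getBM() output, and its gene IDs and GO terms are placeholders, not real annotations.

```r
# Hypothetical stand-in for getBM() output: one row per gene/GO-term pair.
# Assumes "end_position" was requested too; IDs and terms are placeholders.
result <- data.frame(
  chromosome_name          = c(1, 1, 1, 2),
  start_position           = c(30, 30, 900, 50),
  end_position             = c(60, 60, 1200, 80),
  ensembl_gene_id          = c("gene_A", "gene_A", "gene_B", "gene_C"),
  go_biological_process_id = c("go_1", "go_3", "go_7", "go_100"),
  stringsAsFactors = FALSE
)

# The queried positions, in the layout of section (1) of the question
snps <- data.frame(
  SNPs       = c("SNP_1", "SNP_2", "SNP_3"),
  chromosome = c(1, 2, 1),
  start      = c(43, 56, 5000),
  end        = c(43, 56, 5000),
  stringsAsFactors = FALSE
)

# Pair every SNP with every gene on the same chromosome ...
pairs <- merge(snps, result, by.x = "chromosome", by.y = "chromosome_name")
# ... and keep only the pairs where the SNP interval overlaps the gene interval
hits <- pairs[pairs$start <= pairs$end_position & pairs$end >= pairs$start_position, ]

# One row per SNP/GO term, with the queried position carried along
snp_go <- unique(hits[, c("SNPs", "chromosome", "start", "end",
                          "ensembl_gene_id", "go_biological_process_id")])
snp_go <- snp_go[order(snp_go$SNPs, snp_go$go_biological_process_id), ]
rownames(snp_go) <- NULL
print(snp_go)
```

Because merge() is an inner join here, SNP_3 (which falls inside no gene) is dropped. The GO terms can then be collapsed per SNP with aggregate() and paste(collapse = ":") if one cell per SNP is preferred. For many positions, findOverlaps() from the GenomicRanges package performs the same overlap test much more efficiently than pairing every SNP with every gene.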
See how I added "go_biological_process_id" as one of the
`attributes` to return? You should, too.
-steve
--
Steve Lianoglou
Graduate Student: Computational Systems Biology
| Memorial Sloan-Kettering Cancer Center
| Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact