[BioC] can not link the genomic positions queried and their specific annotation, when getting genomic variants annotated by biomaRt package
Mao Jianfeng
jianfeng.mao at gmail.com
Tue Feb 8 17:47:59 CET 2011
Dear list members, Sean and Steve,
I posted a similar question to this list earlier, but I am still
confused. I will describe my question in more detail here to make it
clearer. Please read all six sections below.
Thanks a lot. This is not a student's homework. This list is my only
way to get help with R and Bioconductor; I taught myself both, working
in a somewhat isolated environment, so any help is very valuable to
me.
Jian-Feng,
(1) The genomic variant data I need to annotate:
# SNPs,chromosome,start,end
SNP_1,1,43,43
SNP_2,2,56,56
(2) What I want to get (the annotation). There may be multiple terms
for a given annotation column; they should either be combined in one
cell or placed in separate rows of the same column. Either way, each
genomic position should stay together with its own annotations.
# SNPs,chromosome,start,end,annotation_term
SNP_1,1,43,43,go_1:go_3
SNP_2,2,56,56,go_100:go_1000
or
# SNPs,chromosome,start,end,go_term
SNP_1,1,43,43,go_1
SNP_1,1,43,43,go_3
SNP_2,2,56,56,go_100
SNP_2,2,56,56,go_1000
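
The two layouts above carry the same information, and base R can convert one into the other. A minimal sketch, using the placeholder SNPs and GO terms from this section (the object names `long`, `wide` and `long_again` are only illustrative):

```r
# Long layout: one row per SNP/term pair (the second layout above).
long <- data.frame(
  SNPs       = c("SNP_1", "SNP_1", "SNP_2", "SNP_2"),
  chromosome = c(1, 1, 2, 2),
  start      = c(43, 43, 56, 56),
  end        = c(43, 43, 56, 56),
  go_term    = c("go_1", "go_3", "go_100", "go_1000"),
  stringsAsFactors = FALSE
)

# Wide layout: collapse all terms of one SNP into a single ":"-separated cell.
wide <- aggregate(go_term ~ SNPs + chromosome + start + end, data = long,
                  FUN = function(x) paste(unique(x), collapse = ":"))
wide <- wide[order(wide$SNPs), ]
names(wide)[names(wide) == "go_term"] <- "annotation_term"

# And back again: split each cell and repeat the SNP row once per term.
terms <- strsplit(wide$annotation_term, ":", fixed = TRUE)
long_again <- data.frame(
  wide[rep(seq_len(nrow(wide)), lengths(terms)),
       c("SNPs", "chromosome", "start", "end")],
  go_term = unlist(terms),
  row.names = NULL, stringsAsFactors = FALSE
)
```

Either layout keeps the SNP coordinates on the same row as the terms, which is the property asked for here.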
(3) I have been told that the biomaRt package provides this
functionality.
(4) What I have tried with the biomaRt package:
library(biomaRt)
listMarts()
plant <- useMart("plant_mart_7")
alyr <- useDataset("alyrata_eg_gene", mart = plant)
atha <- useDataset("athaliana_eg_gene", mart = plant)
listAttributes(alyr)
listFilters(alyr)
# ten single-base positions on chromosome 1 (start == end)
chr <- rep(1, 10)
start <- c(33, 999, 3000, 7000, 9000, 10000, 12000, 19000, 80000, 100000)
end <- c(33, 999, 3000, 7000, 9000, 10000, 12000, 19000, 80000, 100000)
# "end_position" is requested as well, since it appears in the output in (5)
getBM(attributes = c("chromosome_name", "start_position", "end_position",
                     "ensembl_gene_id",
                     "go_biological_process_linkage_type"),
      filters = c("chromosome_name", "start", "end"),
      values = list(chr, start, end),
      mart = alyr, uniqueRows = TRUE)
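
One way to keep track of which position produced which row is to send one query per SNP and attach the SNP's own columns to whatever comes back. The sketch below is only an illustration of that idea: `annotate_snps`, `fetch_bm` and `fake_fetch` are hypothetical helper names, not part of biomaRt, and it assumes the `start`/`end` filters select genes overlapping the given region. `fetch_bm` is defined but not called, because calling it needs network access and a valid mart.

```r
# Hypothetical helper (not part of biomaRt): query one SNP at a time and
# tag every returned row with the SNP that produced it.
annotate_snps <- function(snps, fetch) {
  per_snp <- lapply(seq_len(nrow(snps)), function(i) {
    hits <- fetch(snps$chromosome[i], snps$start[i], snps$end[i])
    if (is.null(hits) || nrow(hits) == 0) {
      return(NULL)  # nothing annotated at this position
    }
    data.frame(snps[rep(i, nrow(hits)), , drop = FALSE], hits,
               row.names = NULL, stringsAsFactors = FALSE)
  })
  do.call(rbind, per_snp)  # NULL if no SNP had any hit
}

# biomaRt-backed fetch for a single region; `mart` is a dataset object
# such as `alyr` from the code above.
fetch_bm <- function(chr, start, end, mart) {
  biomaRt::getBM(
    attributes = c("chromosome_name", "start_position", "end_position",
                   "ensembl_gene_id", "go_biological_process_linkage_type"),
    filters = c("chromosome_name", "start", "end"),
    values  = list(chr, start, end),
    mart    = mart
  )
}

# Offline stand-in for `fetch`, so the tagging logic can be tried without
# a mart: it pretends that only position 999 lies inside a gene.
fake_fetch <- function(chr, start, end) {
  if (start == 999) {
    data.frame(ensembl_gene_id = "scaffold_100001.1", stringsAsFactors = FALSE)
  } else {
    data.frame(ensembl_gene_id = character(0), stringsAsFactors = FALSE)
  }
}

snps <- data.frame(SNPs = c("SNP_1", "SNP_2"), chromosome = c(1, 1),
                   start = c(33, 999), end = c(33, 999),
                   stringsAsFactors = FALSE)
tagged <- annotate_snps(snps, fake_fetch)
```

With a real mart the call would look like `annotate_snps(snps, function(chr, s, e) fetch_bm(chr, s, e, mart = alyr))`. One query per SNP is slow for many positions, so this suits small sets only.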
(5) What I got:
   chromosome_name start_position end_position ensembl_gene_id                go_biological_process_linkage_type
 1               1          48875        49123 Al_scaffold_0001_16
 2               1          72255        72617 Al_scaffold_0001_21
 3               1          10652        11944 Al_scaffold_0001_4             IEA
 4               1          82573        83367 fgenesh1_pg.C_scaffold_1000018
 5               1          87206        90301 fgenesh1_pg.C_scaffold_1000020
 6               1          29681        31614 fgenesh1_pm.C_scaffold_1000009
 7               1          51526        52636 fgenesh1_pm.C_scaffold_1000016
 8               1          78367        80505 fgenesh1_pm.C_scaffold_1000020
 9               1          35461        39593 fgenesh2_kg.1__12__AT1G02120.1
10               1          39949        42531 fgenesh2_kg.1__13__AT1G02110.1
11               1          46396        48761 fgenesh2_kg.1__19__AT1G02090.1
12               1          55814        56468 fgenesh2_kg.1__20__AT1G02070.1
13               1          74785        76652 fgenesh2_kg.1__23__AT1G02065.1
14               1          80941        82330 fgenesh2_kg.1__25__AT1G02050.1 IEA
15               1          80941        82330 fgenesh2_kg.1__25__AT1G02050.1 IEA
16               1          90714       113497 fgenesh2_kg.1__28__AT1G02010.1 IEA
17               1          90714       113497 fgenesh2_kg.1__28__AT1G02010.1 IEA
18               1           3311         6198 fgenesh2_kg.1__2__AT1G02190.2  IEA
19               1           3311         6198 fgenesh2_kg.1__2__AT1G02190.2  IEA
20               1           9512        10567 fgenesh2_kg.1__3__AT1G02180.1
21               1          12552        13416 fgenesh2_kg.1__5__AT1G02160.2
22               1             47         2523 scaffold_100001.1              IEA
23               1             47         2523 scaffold_100001.1              IEA
24               1           7429         7630 scaffold_100003.1
25               1          13702        15386 scaffold_100007.1
26               1          15665        19464 scaffold_100008.1              IEA
27               1          19692        20609 scaffold_100009.1
28               1          24515        27497 scaffold_100010.1
29               1          33055        34772 scaffold_100013.1              IEA
30               1          33055        34772 scaffold_100013.1              IEA
31               1          33055        34772 scaffold_100013.1              IEA
32               1          33055        34772 scaffold_100013.1              IEA
33               1          33055        34772 scaffold_100013.1              IEA
34               1          33055        34772 scaffold_100013.1              IEA
35               1          43130        46178 scaffold_100016.1
36               1          49553        51020 scaffold_100018.1              IEA
37               1          49553        51020 scaffold_100018.1              IEA
38               1          57579        57871 scaffold_100022.1
39               1          58865        72177 scaffold_100023.1
(6) My problem is that I cannot link the genomic positions I queried
to their specific annotations: the result does not show which queried
position each returned row belongs to.
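
The missing link can also be rebuilt locally from a result like the one in (5), by matching each queried position against the returned gene ranges. A minimal base R sketch, using four of the queried positions and four of the gene rows from (5); the SNP names and the object names `snps`, `genes`, `pairs`, `hits` and `linked` are only illustrative:

```r
# Four of the queried positions from (4); SNP names are made up here.
snps <- data.frame(
  SNPs       = paste0("SNP_", 1:4),
  chromosome = 1,
  start      = c(33, 999, 19000, 100000),
  end        = c(33, 999, 19000, 100000),
  stringsAsFactors = FALSE
)

# Four of the gene rows returned in (5).
genes <- data.frame(
  chromosome_name = 1,
  start_position  = c(47, 10652, 15665, 90714),
  end_position    = c(2523, 11944, 19464, 113497),
  ensembl_gene_id = c("scaffold_100001.1", "Al_scaffold_0001_4",
                      "scaffold_100008.1", "fgenesh2_kg.1__28__AT1G02010.1"),
  stringsAsFactors = FALSE
)

# Cartesian product of SNPs and genes, then keep the overlapping pairs:
# same chromosome, and the SNP interval intersects the gene interval.
pairs <- merge(snps, genes, by = NULL)
hits <- pairs[pairs$chromosome == pairs$chromosome_name &
              pairs$start <= pairs$end_position &
              pairs$end   >= pairs$start_position, ]

# Left join back onto the SNP table so positions without a gene are kept (NA).
linked <- merge(snps,
                hits[, c("SNPs", "ensembl_gene_id",
                         "start_position", "end_position")],
                by = "SNPs", all.x = TRUE)
```

Here position 33 falls outside every listed gene and gets NA, while 999, 19000 and 100000 each match one gene. The Cartesian product grows as SNPs times genes, so for large data an interval-overlap tool such as `findOverlaps` from the GenomicRanges package is likely a better fit.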
--
Jian-Feng, Mao
the Institute of Botany,
Chinese Academy of Botany,