[BioC] from rat codelink to human locuslink
Steffen Durinck
durincks at mail.nih.gov
Fri Nov 3 22:26:36 CET 2006
Hi Weiwei,
By default biomaRt runs in webservice mode. Running queries in a large
loop in webservice mode tends to crash, and in this case it is better to
use the package in MySQL mode. In webservice mode you could build your
look-up table with just the two queries that I suggested in the
first solution.
However, there is an easier way to get what you want: the output of
getHomolog, when using biomaRt in MySQL mode, contains both the query
ids (rat unigene ids) and the result (human entrezgene ids), so there is
no need for time-consuming big loops.
Try the following:
human = useMart("ensembl", dataset="hsapiens_gene_ensembl", mysql=TRUE)
rat = useMart("ensembl", dataset="rnorvegicus_gene_ensembl", mysql=TRUE)
ratUnigene = c("Rn.32316","Rn.171821")
getHomolog(id = ratUnigene, from.type="unigene",
to.type="entrezgene",from.mart=rat, to.mart=human)
It should give:
         id MappedID
1  Rn.32316    10402
2 Rn.171821     7058
Note that Ensembl maps everything to the transcript level, which
explains why you might find redundant information in the output.
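
If the redundant rows get in the way, something like the following might
work for collapsing the result into a local look-up table. This is only a
sketch: it assumes the output is a data frame with the columns id and
MappedID as shown above, and the toy data frame below (including its
duplicated and NA rows) is made up for illustration so that the example
runs without a database connection.

```r
# Toy stand-in for getHomolog() output. The duplicated and NA rows are
# invented here to mimic transcript-level redundancy.
res <- data.frame(
  id       = c("Rn.32316", "Rn.32316", "Rn.171821", "Rn.171821"),
  MappedID = c(10402, NA, 7058, 7058),
  stringsAsFactors = FALSE
)

# Drop rows without a human entrezgene id, then remove exact duplicates.
res <- res[!is.na(res$MappedID), ]
res <- unique(res)

# Named vector used as a look-up table: rat unigene id -> human entrezgene id.
lookup <- setNames(res$MappedID, res$id)
lookup[c("Rn.32316", "Rn.171821")]
```

If a rat id maps to more than one human id, the names in lookup will
repeat and indexing by name returns only the first match, so in that case
it is probably safer to keep the de-duplicated data frame instead.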
Cheers,
Steffen
Weiwei Shi wrote:
> Hi, there:
>
> I like the getHomolog solution (since the first one seems not workable
> for me) but i need to do some modification since there is an issue
> like this
>> getHomolog(id=ratUnigene[5], from.type="unigene", to.type="entrezgene",
> + from.mart=rat, to.mart=human)
> V1 V2 V3
> 1 ENSG00000095397 ENST00000362057 25861
> 2 ENSG00000095397 ENST00000265134 NA
> 3 ENSG00000095397 ENST00000361938 25861
> 4 ENSG00000095397 ENST00000374059 NA
> 5 ENSG00000095397 ENST00000374057 NA
>
> For one ratUnigene, there are five $V3.
> t1 <- sapply(ratUnigene, function(i) unique(getHomolog(id=i,
> from.type="unigene", to.type="entrezgene",
> from.mart=rat, to.mart=human)$V3)[1])
>
>> as.character(t1)
> [1] "NULL" "10402" "NULL" "NULL" "25861" "8706" "195827"
> [8] "NULL" "NULL" "NULL" "NULL" "NULL" "55884" "NULL"
> [15] "NULL" "3898" "23324" "NULL" "NULL" "NULL"
>
> Of course, I assume, there are only the same id and NA for $V3.
>
> However, since I have ~7400 unigenes, it is supposed to end after 78
> min. However, I run into a connection issue:
>
>> system.time(t1 <- sapply(ratUnigene, function(i)
>> unique(getHomolog(id=i, from.type="unigene",
> to.type="entrezgene", from.mart=rat, to.mart=human)$V3)[1]))
> Error in postForm(paste(to.mart@host, "?", sep = ""), query = xmlQuery) :
> couldn't connect to host
> In addition: There were 50 or more warnings (use warnings() to see the
> first 50)
> Timing stopped at: 1.641 0.22 444.603 0 0
>
> So, I am wondering if there is a way to download a lookup table and do
> it locally. By the way, 78 minutes to do 7400 times' conversions.
>
>
>
> Weiwei
--
Steffen Durinck, Ph.D.
Oncogenomics Section
Pediatric Oncology Branch
National Cancer Institute, National Institutes of Health
URL: http://home.ccr.cancer.gov/oncology/oncogenomics/
Phone: 301-402-8103
Address:
Advanced Technology Center,
8717 Grovemont Circle
Gaithersburg, MD 20877