[BioC] Is retrieving exon sequences with biomaRt a random process?
Wolfgang Huber
huber at ebi.ac.uk
Mon Apr 13 11:30:14 CEST 2009
Dear Jürg
thank you for the feedback! Can you send us a reproducible example?
This would help us figure out what is going on. In the example you
posted, what is the object "ensembl" and how did you generate it?
I tried the following example, which is as close to yours as I could
make it. I could not reproduce your problem: I got consistent
(i.e. non-random) results, as shown below:
library("biomaRt")
ensembl = useMart("ensembl", dataset="hsapiens_gene_ensembl")
res = lapply(sequence(50), function(i)
    getSequence(id=21419, type="entrezgene",
                seqType="gene_exon", mart=ensembl))
R 2.8.1
==========
> res
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
.... (46 more times NULL)
[[50]]
NULL
> sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32
locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
Kingdom.1252;LC_MONETARY=English_United
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] biomaRt_1.16.0
loaded via a namespace (and not attached):
[1] RCurl_0.92-0 XML_1.99-0
Today's R + bioC devel
======================
> res
[[1]]
[1] gene_exon entrezgene
<0 rows> (or 0-length row.names)
[[2]]
[1] gene_exon entrezgene
<0 rows> (or 0-length row.names)
[[3]]
[1] gene_exon entrezgene
<0 rows> (or 0-length row.names)
.... (46 times the same)
[[50]]
[1] gene_exon entrezgene
<0 rows> (or 0-length row.names)
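A small note on the two outputs above: biomaRt 1.16.0 returned NULL for an
empty result, whereas biomaRt 1.99.9 returned a data frame with zero rows,
so a test based on is.null() alone will not behave the same under both.
Below is a minimal sketch of a check that covers both cases. The name
is_empty_result is made up for illustration (it is not a biomaRt function),
and the three objects are hand-built stand-ins, not real query results.

```r
# Hypothetical helper (not part of biomaRt): TRUE when a query returned
# nothing, whether signalled by NULL or by a zero-row data frame.
is_empty_result <- function(x) {
  is.null(x) || (is.data.frame(x) && nrow(x) == 0)
}

# Hand-built stand-ins for the two "no result" shapes, plus one hit.
old_style <- NULL
new_style <- data.frame(gene_exon = character(0), entrezgene = integer(0))
hit       <- data.frame(gene_exon = "GTGG", entrezgene = 1499L)

checks <- sapply(list(old_style, new_style, hit), is_empty_result)
print(checks)
```

The first two stand-ins count as empty and the third does not.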
Also, when using a different Entrez Gene ID, I get a non-trivial result,
e.g. with
> g = getSequence(id=1499, type="entrezgene", seqType="gene_exon", mart=ensembl)
> str(g)
'data.frame': 23 obs. of 2 variables:
$ gene_exon : chr "GTGGTGGTTAATAAGGCTGCAGTTATGGTCCATCAGCTTTCTAAAAAGGAAGCTTCCAGACACGCTATCATGCGTTCTCCTCAGATGGTGTCTGCTATTGTACGTACCATGCAGAATACAAATGATG"| __truncated__
    "GCCGGTGGCGGCAGGATACAGCGGCTTCTGCGCGACTTATAAGAGCTCCTTGTGCGGCGCCATTTTAAGCCTCTCGGTCTGTGGCAGCAGCGTTGGCCCGGCCCCGGGAGCGGAGAGCGAGGGGAGG"| __truncated__
    "GGTATTTGAAGTATACCATACAACTGTTTTGAAAATCCAGCGTGGACAATGGCTACTCAAG"
    "CTGCTTTATTCTCCCATTGAAAACATCCAAAGAGTAGCTGCAGGGGTCCTCTGTGAACTTGCTCAGGACAAGGAAGCTGCAGAAGCTATTGAAGCTGAGGGAGCCACAGCTCCTCTGACAGAGTTAC"| __truncated__ ...
$ entrezgene: int 1499 1499 1499 1499 1499 1499 1499 1499 1499 1499 ...
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-04-12 r48319)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=it_IT;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] biomaRt_1.99.9
loaded via a namespace (and not attached):
[1] RCurl_0.94-1 XML_2.3-0 tools_2.10.0
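For a reproducible report it may also help to tabulate what each of the
repeated calls returned, including any errors, instead of printing all 50
results. The following is only a sketch under assumptions: tally_outcomes
and stub_fetch are hypothetical names, and the stub replaces the real query
so that the code runs without a network connection.

```r
# Hypothetical harness: call `fetch` n times and tabulate the outcomes.
tally_outcomes <- function(fetch, n = 50) {
  outcome <- vapply(seq_len(n), function(i) {
    x <- tryCatch(fetch(), error = function(e) e)
    if (inherits(x, "error")) "error"
    else if (is.null(x)) "NULL"
    else if (is.data.frame(x) && nrow(x) == 0) "0 rows"
    else "rows"
  }, character(1))
  table(outcome)
}

# Offline stub standing in for the query; returns NULL on every third call.
calls <- 0
stub_fetch <- function() {
  calls <<- calls + 1
  if (calls %% 3 == 0) NULL
  else data.frame(gene_exon = "ACGT", entrezgene = 1499L)
}

tally <- tally_outcomes(stub_fetch, n = 9)
print(tally)

# With a live connection, the stub would be replaced by something like:
# tally_outcomes(function() getSequence(id=21419, type="entrezgene",
#                                       seqType="gene_exon", mart=ensembl))
```

A table with more than one category from the live query would confirm the
inconsistent behaviour; a single category would match what I saw.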
Straubhaar, Juerg wrote:
> I am using the following code to retrieve the exon sequences of gene Tcfap2b with GeneID:21419. There are 8 exons for this gene.
>
>
> for (i in sequence(50)) {
> + x <- getSequence(id=21419,type="entrezgene",seqType="gene_exon",mart=ensembl)
> + if (is.null(x)) print('NULL result')
> + if (!is.null(x)) print("Correct result")
> + }
>
> This gives 44 NULL results and 6 correct results. 'correct' means getSequence() outputs the sequences of the exons.
>
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> x86_64-pc-linux-gnu
>
> locale:
> C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_1.16.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_0.94-0 XML_1.99-0 tools_2.8.1
>
> Thank you,
>
> Juerg Straubhaar, Umass Med School
>
--
----------------------------------------------------
Wolfgang Huber EMBL-EBI http://www.ebi.ac.uk/huber
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: biomaRt-Straubhaar.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20090413/8db958c1/attachment.txt>