[BioC] Is retrieving exon sequences with biomaRt a random process?

Wolfgang Huber huber at ebi.ac.uk
Mon Apr 13 11:30:14 CEST 2009


Dear Jürg

thank you for the feedback! Can you send us a reproducible example? 
That would help us figure out what is going on. In the example you 
posted, what is the object "ensembl" and how did you generate it?

I tried the following example, which is as close to yours as I could 
make it. I could not reproduce your problem: I got consistent 
(i.e. non-random) results, as shown below:

   library("biomaRt")
   ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
   res = lapply(sequence(50), function(i)
         getSequence(id=21419,type="entrezgene",
                seqType="gene_exon",mart=ensembl))

R 2.8.1
==========
 > res
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

  .... (46 more times NULL)

[[50]]
NULL

 > sessionInfo()
R version 2.8.1 (2008-12-22)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United Kingdom.1252;LC_MONETARY=English_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] biomaRt_1.16.0

loaded via a namespace (and not attached):
[1] RCurl_0.92-0 XML_1.99-0


Today's R + bioC devel
======================

 > res
[[1]]
[1] gene_exon  entrezgene
<0 rows> (or 0-length row.names)

[[2]]
[1] gene_exon  entrezgene
<0 rows> (or 0-length row.names)

[[3]]
[1] gene_exon  entrezgene
<0 rows> (or 0-length row.names)

  .... (46 times the same)


[[50]]
[1] gene_exon  entrezgene
<0 rows> (or 0-length row.names)
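
To see at a glance whether repeated calls agree, the list of results can 
be tabulated instead of printed in full. The following is only a sketch: 
summarise_results() is a hypothetical helper, not part of biomaRt, and 
the "fake" list is an offline stand-in that mimics the reported 44 NULL / 
6 non-NULL split so that the code runs without a network connection. With 
a live mart, one would pass the list "res" from the lapply() call above.

```r
# Hypothetical helper (not part of biomaRt): label each element of a list
# of getSequence() results as "NULL", "empty" (0-row data.frame) or
# "<n> rows", then count how often each label occurs.
summarise_results <- function(res) {
  labels <- vapply(res, function(x) {
    if (is.null(x)) {
      "NULL"
    } else if (nrow(x) == 0L) {
      "empty"
    } else {
      paste(nrow(x), "rows")
    }
  }, character(1))
  table(labels)
}

# Offline stand-in for 50 repeated queries (made-up data, assumption only):
# 44 NULL results and 6 one-row data.frames.
fake <- c(rep(list(NULL), 44),
          rep(list(data.frame(gene_exon = "ACGT", entrezgene = 21419L)), 6))
print(summarise_results(fake))
```

If every call returns the same thing, the table has a single entry; more 
than one entry means the results differ between calls.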



Also, when using a different Entrez Gene ID, I get a non-trivial result, 
e.g. with

 > g = getSequence(id=1499,type="entrezgene",seqType="gene_exon",mart=ensembl)

 > str(g)
'data.frame':   23 obs. of  2 variables:
  $ gene_exon : chr 
"GTGGTGGTTAATAAGGCTGCAGTTATGGTCCATCAGCTTTCTAAAAAGGAAGCTTCCAGACACGCTATCATGCGTTCTCCTCAGATGGTGTCTGCTATTGTACGTACCATGCAGAATACAAATGATG"| 
__truncated__ 
"GCCGGTGGCGGCAGGATACAGCGGCTTCTGCGCGACTTATAAGAGCTCCTTGTGCGGCGCCATTTTAAGCCTCTCGGTCTGTGGCAGCAGCGTTGGCCCGGCCCCGGGAGCGGAGAGCGAGGGGAGG"| 
__truncated__ 
"GGTATTTGAAGTATACCATACAACTGTTTTGAAAATCCAGCGTGGACAATGGCTACTCAAG" 
"CTGCTTTATTCTCCCATTGAAAACATCCAAAGAGTAGCTGCAGGGGTCCTCTGTGAACTTGCTCAGGACAAGGAAGCTGCAGAAGCTATTGAAGCTGAGGGAGCCACAGCTCCTCTGACAGAGTTAC"| 
__truncated__ ...
  $ entrezgene: int  1499 1499 1499 1499 1499 1499 1499 1499 1499 1499 ...


 > sessionInfo()
R version 2.10.0 Under development (unstable) (2009-04-12 r48319)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=C;LC_NUMERIC=C;LC_TIME=C;LC_COLLATE=C;LC_MONETARY=C;LC_MESSAGES=it_IT;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] biomaRt_1.99.9

loaded via a namespace (and not attached):
[1] RCurl_0.94-1 XML_2.3-0    tools_2.10.0

 


Straubhaar, Juerg wrote:
> I am using the following code to retrieve the exon sequences of gene Tcfap2b with GeneID:21419. There are 8 exons for this gene.
> 
> 
> for (i in sequence(50)) {
> + x <- getSequence(id=21419,type="entrezgene",seqType="gene_exon",mart=ensembl)
> + if (is.null(x)) print('NULL result')
> + if (!is.null(x)) print("Correct result")
> + }
> 
> This gives 44 NULL results and 6 correct results. 'correct' means getSequence() outputs the sequences of the exons.
> 
>> sessionInfo()
> R version 2.8.1 (2008-12-22) 
> x86_64-pc-linux-gnu 
> 
> locale:
> C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] biomaRt_1.16.0
> 
> loaded via a namespace (and not attached):
> [1] RCurl_0.94-0 XML_1.99-0   tools_2.8.1
> 
> Thank you,
> 
> Juerg Straubhaar, Umass Med School


-- 
----------------------------------------------------
Wolfgang Huber  EMBL-EBI  http://www.ebi.ac.uk/huber
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: biomaRt-Straubhaar.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20090413/8db958c1/attachment.txt>

