[BioC] biomaRt 3'UTR coordinates

Iain Gallagher iaingallagher at btopenworld.com
Sat Dec 6 19:18:38 CET 2008


Hi Wolfgang.

Sorry. I should have enclosed a portion of the file, or at least a clearer explanation of what it contains. It is simply a list of ENST ids, the first few of which are listed below.

ENST00000000233
ENST00000000412
ENST00000000442
ENST00000001008
ENST00000002125
ENST00000002165
ENST00000002501
ENST00000002829
ENST00000003100
ENST00000003302
ENST00000003583
ENST00000003607
ENST00000003912
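
To make the script reproducible without the file, the same ids can be defined directly in the script instead of being read from cph_present_probes.txt. A minimal sketch, using only the ids listed above (the sanity check at the end is an addition for illustration, not part of the original script):

```r
# Transcript ids defined inline, so the example does not depend on a local file.
present <- c("ENST00000000233", "ENST00000000412", "ENST00000000442",
             "ENST00000001008", "ENST00000002125", "ENST00000002165",
             "ENST00000002501", "ENST00000002829", "ENST00000003100",
             "ENST00000003302", "ENST00000003583", "ENST00000003607",
             "ENST00000003912")

# Sanity check: every entry should look like an Ensembl transcript id
# ("ENST" followed by 11 digits).
stopifnot(all(grepl("^ENST[0-9]{11}$", present)))
```

This vector can then be passed as the `values` argument of the getBM() call in place of the one read from the file.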

For your information, here is the reply I received from the ENSEMBL help desk regarding this problem.

''Currently these attributes are not available from BioMart. They have
been dropped when we moved to an automated Mart building process a few
months ago. However, as many people have asked for these attributes,
they have been added again to our v52 release which, if everything goes
according to plan, should go live coming week. ''

So hopefully at some point next week I'll be able to carry out the query.
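
Once the coordinates can be retrieved, the 3'UTR lengths I am after should follow from them directly. A rough sketch, assuming the query result is a data frame with one row per 3'UTR segment and 1-based inclusive start/end coordinates. The column names `utr3_start` and `utr3_end`, the helper `utr3_length`, and the toy values are all made up for illustration; they are not real BioMart attribute names or real data:

```r
# Hypothetical helper: total 3'UTR length per transcript.
# Assumes inclusive coordinates, so a segment's length is end - start + 1.
utr3_length <- function(coords) {
  # Drop rows without an annotated 3'UTR (NA coordinates).
  keep <- !is.na(coords$utr3_start) & !is.na(coords$utr3_end)
  coords <- coords[keep, ]
  seg_len <- coords$utr3_end - coords$utr3_start + 1
  # A 3'UTR can span several exons, so sum the segments per transcript.
  tapply(seg_len, coords$ensembl_transcript_id, sum)
}

# Toy input standing in for a query result (values are invented).
toy <- data.frame(
  ensembl_transcript_id = c("T1", "T1", "T2", "T3"),
  utr3_start = c(100, 300, 50, NA),
  utr3_end   = c(149, 399, 59, NA),
  stringsAsFactors = FALSE
)
lengths_by_tx <- utr3_length(toy)
```

Summing per transcript means that several rows for one transcript are handled without special casing, and transcripts with no 3'UTR simply drop out of the result.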

Thanks

Iain


--- On Sat, 6/12/08, Wolfgang Huber <huber at ebi.ac.uk> wrote:

> From: Wolfgang Huber <huber at ebi.ac.uk>
> Subject: Re: [BioC] biomaRt 3'UTR coordinates
> To: iaingallagher at btopenworld.com
> Cc: Bioconductor at stat.math.ethz.ch
> Date: Saturday, 6 December, 2008, 2:11 PM
> Dear Iain
> 
> Thank you for providing this feedback! In order to do something about
> it, can you provide us with a reproducible example?
> 
> You could do this, for example, by defining the content of your vector
> "present" in the script, rather than reading a file from your file
> system that nobody else can see, or by putting it on a webserver and
> using a file connection to its URL in your call to read.table.
> 
> Best wishes
>       Wolfgang
> 
> ----------------------------------------------------
> Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
> 
> Iain Gallagher wrote:
> > Hello list.
> > 
> > I'm using the following script to try and retrieve the 3'UTR start and end coordinates from Ensembl.
> > 
> > rm(list=ls())
> > library(biomaRt)
> > 
> > #read in probes called present on affy array (CPH in this script)
> > 
> > present <- read.table('cph_present_probes.txt', header=F, sep='\t')
> > present<-as.character(present[,1])
> > 
> > #present is a set of transcript ids
> > 
> > #get DB connection to retrieve required info
> > 
> > ensmart=useMart("ensembl", dataset="hsapiens_gene_ensembl")
> > 
> > #get 3'utr coords
> > 
> > utr_coords<-getBM(attributes=c('ensembl_gene_id', 'sequence_3utr_start', 'sequence_3utr_end'), filters='ensembl_transcript_id', values=present, mart=ensmart)
> > 
> > Running the script gives the following error.
> > 
> >                                                                              V1
> > 1 Query ERROR: caught BioMart::Exception::Usage: Attribute 3utr_start NOT FOUND
> > Error in getBM(attributes = c("ensembl_gene_id", "sequence_3utr_start",  : 
> >   Number of columns in the query result doesn't equal number of attributes in query.  This is probably an internal error, please report.
> > 
> > Presumably some transcripts have more than 1 3'UTR (hence the number of columns difference described above)
> > 
> > Can anyone suggest a solution? Either a way to retrieve the start and end coords of the 3'UTRs or the length of the 3'UTRs (my real objective).
> > 
> > I have a separate script which will download the 3'UTR sequences and then count the characters but the datasets are large and that process seems somewhat laborious if the information is directly available.
> > 
> > Thanks
> > 
> > Iain
> > 
> >> sessionInfo()
> > R version 2.8.0 (2008-10-20) 
> > x86_64-pc-linux-gnu 
> > 
> > locale:
> > LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
> > 
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base     
> > 
> > other attached packages:
> > [1] biomaRt_1.16.0
> > 
> > loaded via a namespace (and not attached):
> > [1] RCurl_0.91-0 XML_1.95-3  
> >


