[BioC] biomaRt 3'UTR coordinates
Wolfgang Huber
huber at ebi.ac.uk
Sat Dec 6 15:11:15 CET 2008
Dear Iain,
thank you for this feedback! To do something about it, we need a
reproducible example; can you provide one?
You could do this, for example, by defining the contents of your vector
"present" in the script itself, rather than reading a file from your file
system that nobody else can see, or by putting the file on a web server and
passing a connection to its URL in your call to read.table.
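A minimal sketch of what the start of such a self-contained script might
look like. The transcript IDs and the URL are placeholders invented for
illustration (they are not taken from the original file), so substitute a
few real IDs that trigger the problem:

```r
# Option 1: define the IDs inline so that anyone can run the script.
# NOTE: these IDs are hypothetical placeholders, not real data.
present <- c("ENST00000000001", "ENST00000000002", "ENST00000000003")

# Option 2: read the same file through a URL connection instead.
# NOTE: the URL below is hypothetical; put the file somewhere public first.
# present <- read.table(url("http://example.org/cph_present_probes.txt"),
#                       header = FALSE, sep = "\t")
# present <- as.character(present[, 1])

# Either way, 'present' ends up as a character vector of transcript IDs,
# and the rest of the script needs nothing from the local file system.
str(present)
```

A handful of IDs is usually enough, as long as the error still occurs with
them.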
Best wishes
Wolfgang
----------------------------------------------------
Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
Iain Gallagher wrote:
> Hello list.
>
> I'm using the following script to try and retrieve the 3'UTR start and end coordinates from Ensembl.
>
> rm(list=ls())
> library(biomaRt)
>
> #read in probes called present on affy array (CPH in this script)
>
> present <- read.table('cph_present_probes.txt', header=F, sep='\t')
> present<-as.character(present[,1])
>
> #present is a set of transcript ids
>
> #get DB connection to retrieve required info
>
> ensmart=useMart("ensembl", dataset="hsapiens_gene_ensembl")
>
> #get 3'utr coords
>
> utr_coords<-getBM(attributes=c('ensembl_gene_id', 'sequence_3utr_start', 'sequence_3utr_end'), filters='ensembl_transcript_id', values=present, mart=ensmart)
>
> Running the script gives the following error.
>
> V1
> 1 Query ERROR: caught BioMart::Exception::Usage: Attribute 3utr_start NOT FOUND
> Error in getBM(attributes = c("ensembl_gene_id", "sequence_3utr_start", :
> Number of columns in the query result doesn't equal number of attributes in query. This is probably an internal error, please report.
>
> Presumably some transcripts have more than one 3'UTR (hence the difference in the number of columns described above).
>
> Can anyone suggest a solution? Either a way to retrieve the start and end coords of the 3'UTRs or the length of the 3'UTRs (my real objective).
>
> I have a separate script which will download the 3'UTR sequences and then count the characters but the datasets are large and that process seems somewhat laborious if the information is directly available.
>
> Thanks
>
> Iain
>
>> sessionInfo()
> R version 2.8.0 (2008-10-20)
> x86_64-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_GB.UTF-8;LC_NUMERIC=C;LC_TIME=en_GB.UTF-8;LC_COLLATE=en_GB.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_GB.UTF-8;LC_PAPER=en_GB.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_GB.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] biomaRt_1.16.0
>
> loaded via a namespace (and not attached):
> [1] RCurl_0.91-0 XML_1.95-3
>