[BioC] retrieving mRNA sequences via biomaRt
Simon
simon212 at gmx.de
Thu Aug 6 19:02:08 CEST 2009
Thanks for the recommendation.
So far, I have only read Steffen's and your biomaRt user’s guide and had a
look at the BioMart 0.7 documentation, since I needed quick results.
I'm going to have a look at the recommended book and paper now.
In the meantime, I arrived at a solution, though not a very satisfying one:
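library(biomaRt)
# connect to the Ensembl mart, then select the human gene dataset
ensembl = useMart("ensembl")
ensembl = useDataset("hsapiens_gene_ensembl", mart=ensembl)
myAttributes = c("embl", "cdna", "5utr", "coding", "3utr", "5_utr_end",
                 "3_utr_start", "sequence_cdna_length", "cds_length")
...
qresult = getBM(attributes=myAttributes,
                filters=...,
                values=...,
                mart=ensembl)
# mySeqCdsLengthFilter is my own helper: cDNA length 3000-5000, CDS length 2000-3000
finalResult = mySeqCdsLengthFilter(qresult, c(3000, 5000), c(2000, 3000))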
For now, I parse my query results manually, using
the values for "sequence_cdna_length" and "cds_length" as limits.
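A minimal sketch of what such a manual filter might look like, assuming the getBM result is a data frame whose columns are named after the requested attributes. The function name filterByLengths and the toy data are hypothetical; this is not the actual mySeqCdsLengthFilter, which is not shown here.

```r
# Hypothetical stand-in for mySeqCdsLengthFilter (the real helper is not shown).
# Keeps rows whose cDNA length and CDS length both fall inside the given
# inclusive ranges; rows with missing lengths are dropped.
filterByLengths <- function(df, cdnaRange, cdsRange) {
  cdnaLen <- as.numeric(df$sequence_cdna_length)
  cdsLen  <- as.numeric(df$cds_length)
  keep <- !is.na(cdnaLen) & !is.na(cdsLen) &
    cdnaLen >= cdnaRange[1] & cdnaLen <= cdnaRange[2] &
    cdsLen  >= cdsRange[1]  & cdsLen  <= cdsRange[2]
  df[keep, , drop = FALSE]
}

# Toy data standing in for a getBM() result (values invented for illustration).
toy <- data.frame(
  embl                 = c("A1", "B2", "C3", "D4"),
  sequence_cdna_length = c(3500, 2500, 4800, 5200),
  cds_length           = c(2100, 1500, 3200, 2500),
  stringsAsFactors     = FALSE
)

hits <- filterByLengths(toy, c(3000, 5000), c(2000, 3000))
first200 <- head(hits, 200)  # at most the first 200 hits
print(first200)
```

Filtering on the client side like this means every candidate record is downloaded first, which is why server-side filters would be preferable.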
I wish these attributes were available as filters, or that there were a
BioMart database I could use in a linked query via getLDS.
I'm still curious about a smarter solution.
Best regards,
Simon
Wolfgang Huber wrote:
>
> Hi Simon,
>
> with all respect, for a first contact with the Bioconductor project I'd
> also recommend studying some of the documentation.
>
> A (slightly biased) set of starting points is the "Bioconductor
> Case Studies" book by Hahne, Huber, Gentleman, Falcon and the paper
> "Mapping identifiers for the integration of genomic datasets with the
> R/Bioconductor package biomaRt." by Durinck et al. in Nature Protocols
> 2009;4(8):1184-91.
>
> Best wishes
> Wolfgang
>
>
>
>
> Simon wrote:
>> Hello everybody,
>>
>> I am trying to solve the following tasks as a first contact with the
>> Bioconductor project:
>>
>> # Task 1:
>> # find:
>> # * mRNA sequence (5'UTR, Coding region, 3'UTR)
>> # * position of start codon in sequence
>> # * position of stop codon in sequence
>> # * ID (Which ID(s) would I choose to reference my
>> # sequence hits? Embl, ensembl transcript id,
>> # Entrez Gene id, RefSeq, etc.?)
>> # * name of associated protein product
>> #
>> # where:
>> # * origin is human
>> #     Entrez Search would be: human[ORGN]
>> # * sequence is mRNA transcript
>> #     Entrez Search for Molecule Type: biomol_mRNA[PROP]?
>> # * mRNA sequence length is 3000 to 5000 nts
>> #     Entrez Search for Sequence Length: 3000:5000[SLEN]
>> # * coding region of mRNA length is 2000 to 3000 nts
>> #     Entrez Search Field for stop and start of
>> #     coding region: start:stop[CDS]
>> #
>> #
>> # Task 2:
>> # store the retrieved information to file for the first 200 hits
>> # (Which would be a suitable file format?)
>>
>> I started by using and playing around with the biomaRt package for R,
>> but I got overwhelmed by its many possibilities.
>>
>> I would be glad to get any feedback on how to start on, or even solve,
>> my tasks.
>>
>> Best regards,
>> Simon
>>
>