[BioC] retrieving mRNA sequences via biomaRt
Simon
simon212 at gmx.de
Fri Aug 7 07:58:36 CEST 2009
Hi Steffen,
Thanks for the information.
Best regards,
Simon
Steffen at stat.Berkeley.EDU wrote:
> Hi Simon,
>
> The cdna attribute is the combination of 5utr + coding + 3utr so you can
> remove 5utr, coding and 3utr from your list of attributes to retrieve. I
> would take ensembl_transcript_id instead of embl.
>
> Cheers,
> Steffen
>
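
A minimal sketch of the advice above, in base R only (no BioMart query is run). It applies the two suggested changes to the attribute list from the quoted message below, then uses made-up toy sequences to show what "cdna = 5utr + coding + 3utr" implies for the start and stop codon positions. The toy sequences and the variable names oldAttributes, newAttributes, startPos and stopPos are assumptions for illustration, not real Ensembl data.

```r
# Attribute list from the quoted message, with the suggested changes applied:
# drop the separate sequence parts and use ensembl_transcript_id instead of embl.
oldAttributes <- c("embl", "cdna", "5utr", "coding", "3utr", "5_utr_end",
                   "3_utr_start", "sequence_cdna_length", "cds_length")
newAttributes <- c("ensembl_transcript_id",
                   setdiff(oldAttributes, c("embl", "5utr", "coding", "3utr")))

# Toy (hypothetical) sequence parts, only to illustrate the concatenation.
utr5   <- "GGCACC"
coding <- "ATGGCTTAA"
utr3   <- "CCTGA"
cdna   <- paste0(utr5, coding, utr3)

# If the parts are known, codon positions in the cDNA follow from their lengths.
startPos <- nchar(utr5) + 1                  # first base of the start codon
stopPos  <- nchar(utr5) + nchar(coding) - 2  # first base of the stop codon

substr(cdna, startPos, startPos + 2)  # start codon
substr(cdna, stopPos, stopPos + 2)    # stop codon
```

If the separate parts are dropped from the query, the same positions would have to be derived from length or coordinate attributes instead; which attributes are suitable for that is not confirmed in this thread.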
>> Thanks for the recommendation.
>>
>> So far, I have only read Steffen's and your biomaRt user’s guide and had a
>> look at the BioMart 0.7 documentation, since I needed quick results.
>> I'm going to have a look at the recommended book and paper now.
>>
>>
>> In the meantime, I arrived at a solution, though not a very satisfying one:
>>
>> library(biomaRt)
>> ensembl = useMart("ensembl")  # connect to the Ensembl BioMart first
>> ensembl = useDataset("hsapiens_gene_ensembl", mart=ensembl)
>>
>> myAttributes = c("embl", "cdna", "5utr", "coding", "3utr", "5_utr_end",
>>                  "3_utr_start", "sequence_cdna_length", "cds_length")
>>
>> ...
>>
>> qresult = getBM(attributes=myAttributes,
>>                 filters=...,
>>                 values=...,
>>                 mart=ensembl)
>>
>> finalResult = mySeqCdsLengthFilter(qresult, c(3000, 5000), c(2000, 3000))
>>
>> For now, I filter my query results manually, using the values of
>> "sequence_cdna_length" and "cds_length" as limits.
>> I wish these attributes were available as filters, or that there were
>> a BioMart database I could use in a linked query via getLDS.
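
The body of mySeqCdsLengthFilter is not shown in the thread, so the following is only a guess at what such a post-query filter could look like in base R. It assumes the query result is a data frame with columns named "sequence_cdna_length" and "cds_length" (the attribute names used above) and keeps rows whose values fall inside both ranges, bounds included. The toy data frame and its transcript IDs are made up.

```r
# Hypothetical implementation: keep rows whose cDNA length and CDS length
# both lie within the given inclusive ranges. Rows with missing lengths are dropped.
mySeqCdsLengthFilter <- function(qresult, seqRange, cdsRange) {
  seqLen <- as.numeric(qresult[["sequence_cdna_length"]])
  cdsLen <- as.numeric(qresult[["cds_length"]])
  keep <- !is.na(seqLen) & !is.na(cdsLen) &
    seqLen >= seqRange[1] & seqLen <= seqRange[2] &
    cdsLen >= cdsRange[1] & cdsLen <= cdsRange[2]
  qresult[keep, , drop = FALSE]
}

# Toy stand-in for a getBM() result (values are invented).
toy <- data.frame(
  ensembl_transcript_id = c("T1", "T2", "T3", "T4"),
  sequence_cdna_length  = c(3500, 2900, 4800, 5200),
  cds_length            = c(2100, 2500, 3100, NA),
  stringsAsFactors = FALSE
)

finalResult <- mySeqCdsLengthFilter(toy, c(3000, 5000), c(2000, 3000))
print(finalResult)
```

Filtering after the query means every candidate transcript is downloaded first, which is the cost being complained about above; the sketch does not avoid that.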
>>
>> I'm still curious about a smarter solution.
>>
>>
>> Best regards,
>> Simon
>>
>>
>> Wolfgang Huber wrote:
>>> Hi Simon,
>>>
>>> With all due respect, for a first contact with the Bioconductor project
>>> I'd also recommend studying some of the documentation.
>>>
>>> A (slightly biased) set of starting points is the "Bioconductor
>>> Case Studies" book by Hahne, Huber, Gentleman, Falcon and the paper
>>> "Mapping identifiers for the integration of genomic datasets with the
>>> R/Bioconductor package biomaRt." by Durinck et al. in Nature Protocols
>>> 2009;4(8):1184-91.
>>>
>>> Best wishes
>>> Wolfgang
>>>
>>>
>>>
>>>
>>> Simon wrote:
>>>> Hello everybody,
>>>>
>>>> I am trying to solve the following tasks as a first contact with the
>>>> Bioconductor project:
>>>>
>>>> # Task 1:
>>>> # find:
>>>> # * mRNA sequence (5'UTR, Coding region, 3'UTR)
>>>> # * position of start codon in sequence
>>>> # * position of stop codon in sequence
>>>> # * ID (Which ID(s) would I choose to reference my
>>>> # sequence hits? Embl, ensembl transcript id,
>>>> # Entrez Gene id, RefSeq, etc.?)
>>>> # * name of associated protein product
>>>> #
>>>> # where:
>>>> # * origin is human
>>>> # Entrez Search would be: human[ORGN]
>>>> # * sequence is mRNA transcript
>>>> # Entrez Search for Molecule Type: biomol_mRNA[PROP]?
>>>> # * mRNA sequence length is 3000 to 5000 nts
>>>> # Entrez Search for Sequence Length: 3000:5000[SLEN]
>>>> # * coding region of mRNA length is 2000 to 3000 nts
>>>> # Entrez Search Field for stop and start of
>>>> # coding region: start:stop[CDS]
>>>> #
>>>> #
>>>> # Task 2:
>>>> # store the retrieved information to file for the first 200 hits
>>>> # (Which would be a suitable file format?)
>>>>
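
For Task 2, one possible approach (a sketch, not an answer given in the thread) is to keep the first 200 rows and write them both as a tab-delimited table and as a FASTA file, using base R only. The data frame hits, its column names and its contents are made-up stand-ins for a real query result, and the files go to temporary paths.

```r
# Toy stand-in for the filtered query result (IDs and sequences are invented).
hits <- data.frame(
  ensembl_transcript_id = c("T1", "T2", "T3"),
  cdna = c("GGCACCATGGCTTAACCTGA", "ATGAAATGA", "CCATGTTTTAGG"),
  stringsAsFactors = FALSE
)

# Keep at most the first 200 hits.
firstHits <- head(hits, 200)

# Tab-delimited text keeps all columns and is easy to read back into R.
tabFile <- tempfile(fileext = ".tsv")
write.table(firstHits, file = tabFile, sep = "\t",
            quote = FALSE, row.names = FALSE)

# FASTA keeps only ID and sequence: a ">" header line followed by the sequence.
fastaFile <- tempfile(fileext = ".fasta")
fastaLines <- as.vector(rbind(paste0(">", firstHits$ensembl_transcript_id),
                              firstHits$cdna))
writeLines(fastaLines, fastaFile)
```

The table suits the annotation columns (lengths, IDs, protein names), while FASTA is the usual choice when the sequences are passed on to other sequence tools.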
>>>> I started by using and playing around with the biomaRt package for R,
>>>> but I got overwhelmed by its many possibilities.
>>>>
>>>> I would be glad to get any feedback on how to start on, or even solve,
>>>> my tasks.
>>>>
>>>> Best regards,
>>>> Simon
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor