[BioC] retrieving mRNA sequences via biomaRt
Simon
simon212 at gmx.de
Thu Aug 6 16:35:46 CEST 2009
Hello everybody,
I am trying to solve the following tasks as my first contact with the
Bioconductor project:
# Task 1:
# find:
# * mRNA sequence (5' UTR, coding region, 3' UTR)
# * position of start codon in sequence
# * position of stop codon in sequence
# * ID (which ID(s) should I use to reference my
#   sequence hits? EMBL, Ensembl transcript ID,
#   Entrez Gene ID, RefSeq, etc.?)
# * name of the associated protein product
#
# where:
# * origin is human
#   Entrez search: human[ORGN]
# * sequence is an mRNA transcript
#   Entrez search for molecule type: biomol_mRNA[PROP]?
# * mRNA sequence length is 3000 to 5000 nt
#   Entrez search for sequence length: 3000:5000[SLEN]
# * coding region length is 2000 to 3000 nt
#   Entrez search field for start and stop of the
#   coding region: start:stop[CDS]
#
#
# Task 2:
# store the retrieved information in a file for the first 200 hits
# (which file format would be suitable?)
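The Task 1 criteria can be combined into a single Entrez query, and the start/stop codon positions follow directly from the CDS coordinates of each hit. The sketch below is in Python rather than R and is only illustrative: it builds a query string and an NCBI E-utilities esearch URL (database `nuccore`, returning up to 200 IDs) without sending any request. It assumes the field tags listed above are valid as written (I have not verified `biomol_mRNA[PROP]` or whether `[CDS]` accepts a length range), so the 2000 to 3000 nt coding-region filter is applied locally instead. It also assumes 1-based, inclusive CDS coordinates that include the stop codon (the usual GenBank convention); the helper names `build_query`, `esearch_url` and `cds_info` are hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; this sketch only builds the URL.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def build_query(min_len=3000, max_len=5000):
    """Combine the Task 1 criteria (tags as given in the task) into one term."""
    return " AND ".join([
        "human[ORGN]",
        "biomol_mRNA[PROP]",
        f"{min_len}:{max_len}[SLEN]",
    ])


def esearch_url(term, retmax=200):
    """URL for an esearch request on nuccore returning up to `retmax` IDs."""
    return ESEARCH + "?" + urlencode({"db": "nuccore", "term": term, "retmax": retmax})


def cds_info(seq, cds_start, cds_end, min_cds=2000, max_cds=3000):
    """Locate start/stop codons from 1-based, inclusive CDS coordinates.

    Assumes the CDS range includes the stop codon. Returns None if the CDS
    fails the length filter, is not a whole number of codons, or does not
    begin with ATG and end with a stop codon.
    """
    seq = seq.upper()
    length = cds_end - cds_start + 1
    if not min_cds <= length <= max_cds or length % 3 != 0:
        return None
    start_codon = seq[cds_start - 1:cds_start + 2]
    stop_codon = seq[cds_end - 3:cds_end]
    if start_codon != "ATG" or stop_codon not in STOP_CODONS:
        return None
    return {
        "start_codon": (cds_start, cds_start + 2),  # 1-based positions
        "stop_codon": (cds_end - 2, cds_end),
        "utr5_length": cds_start - 1,
        "utr3_length": len(seq) - cds_end,
    }
```

Note that biomaRt queries BioMart databases such as Ensembl rather than Entrez, so this is an alternative route to the same filters, not a biomaRt call.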
I started by experimenting with the biomaRt package for R,
but I was overwhelmed by its many options.
I would be glad of any feedback on how to get started, or even on how to solve these tasks.
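For Task 2, one simple option is a tab-separated table (one row per hit, with IDs, protein name and CDS coordinates) plus a FASTA file holding the sequences, since both are plain text and widely readable. The Python sketch below uses only the standard library; the record keys (`accession`, `protein_name`, `cds_start`, `cds_end`, `sequence`) and the `limit=200` default are hypothetical placeholders, not biomaRt attribute names.

```python
import csv
import io

# Hypothetical column layout, for illustration only.
COLUMNS = ["accession", "protein_name", "cds_start", "cds_end", "mrna_length"]


def write_tsv(records, handle, limit=200):
    """Write a header plus one tab-separated row for each of the first `limit` hits."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for rec in records[:limit]:
        writer.writerow([rec["accession"], rec["protein_name"],
                         rec["cds_start"], rec["cds_end"], len(rec["sequence"])])


def write_fasta(records, handle, limit=200, width=60):
    """Write the first `limit` sequences as FASTA, wrapped at `width` characters."""
    for rec in records[:limit]:
        handle.write(f">{rec['accession']} {rec['protein_name']}\n")
        seq = rec["sequence"]
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Toy record with made-up values, written to in-memory buffers.
    demo = [{"accession": "EXAMPLE_1", "protein_name": "example protein",
             "cds_start": 3, "cds_end": 8, "sequence": "GGATGTAACC"}]
    tsv, fasta = io.StringIO(), io.StringIO()
    write_tsv(demo, tsv)
    write_fasta(demo, fasta, width=4)
    print(tsv.getvalue(), end="")
    print(fasta.getvalue(), end="")
```

In practice you would pass open file handles (e.g. from `open("hits.tsv", "w")`) instead of `io.StringIO` buffers.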
Best regards,
Simon