[BioC] Human genomic sequences
Vincent Carey
stvjc at channing.harvard.edu
Mon Oct 18 23:26:51 CEST 2010
On Fri, Oct 15, 2010 at 2:48 PM, Sheena Scroggins
<sheena.scroggins at gmail.com> wrote:
> Hi,
>
>
> I'm working on a project that is reviewing genes in the human genome
> project. I'm hoping you can point me to some BioConductor packages that will
> help me in my quest. The goal is:
>
>
> - Search through hg19 at UCSC (or other genome database) for particular
> genes of interest.
org.Hs.eg.db can be used. If you are interested in, say, FLT1, you can
determine aspects of its location via
> library(org.Hs.eg.db)
> get("FLT1", revmap(org.Hs.egSYMBOL))
[1] "2321"
> get("2321", org.Hs.egCHRLOC)
       13        13        13        13
-28973181 -28959687 -28942233 -28874482
> get("2321", org.Hs.egCHRLOCEND)
       13        13        13        13
-29069265 -29069265 -29069265 -29069265
Multiple addresses for a single gene are common, and you will need rules to
resolve them.
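
One possible resolution rule, sketched in base R: take the union of all
reported addresses as a single span per gene. The helper name collapse_locus
and the union rule itself are assumptions for illustration, not part of
org.Hs.eg.db; the numbers are copied from the output above. In these maps a
leading minus sign marks a location on the antisense strand, so the sketch
works on absolute values and reports the strand separately.

```r
# Values copied from the org.Hs.egCHRLOC / org.Hs.egCHRLOCEND output above.
starts <- c("13" = -28973181, "13" = -28959687, "13" = -28942233, "13" = -28874482)
ends   <- c("13" = -29069265, "13" = -29069265, "13" = -29069265, "13" = -29069265)

# Hypothetical helper: merge all reported addresses into one span per gene.
# Assumes every address lies on the same chromosome.
collapse_locus <- function(starts, ends) {
  stopifnot(length(unique(names(starts))) == 1)
  pos <- abs(c(starts, ends))            # drop the strand sign
  data.frame(
    chrom  = paste0("chr", names(starts)[1]),
    start  = min(pos),                   # leftmost reported coordinate
    end    = max(pos),                   # rightmost reported coordinate
    strand = if (all(starts < 0)) "-" else if (all(starts > 0)) "+" else "*",
    stringsAsFactors = FALSE
  )
}

flt1 <- collapse_locus(starts, ends)
flt1
```

Other rules (longest transcript only, or one row per address) may suit your
question better; the union is simply the most inclusive choice.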
If you wish to work at the transcript level, the GenomicFeatures package is
relevant; see its makeTranscriptDbFromUCSC function.
> - Increase the search by looking both upstream and downstream until
> another gene is hit
This is a programming task.
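
A minimal sketch of one way to approach it in base R, assuming you already
have one non-overlapping span per gene on a single chromosome. The gene table,
its coordinates, and the helper name flanking_gaps are all hypothetical.

```r
# Hypothetical gene table: one span per gene, same chromosome, no overlaps.
genes <- data.frame(
  symbol = c("geneA", "geneB", "geneC"),
  start  = c(1000, 5000, 12000),
  end    = c(2000, 6000, 13000),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the gene-free interval on each side of `target`,
# extended until the neighbouring gene is hit (NA if there is no neighbour).
flanking_gaps <- function(genes, target) {
  i <- match(target, genes$symbol)
  stopifnot(!is.na(i))
  others <- genes[-i, ]
  left_ends    <- others$end[others$end < genes$start[i]]
  right_starts <- others$start[others$start > genes$end[i]]
  c(
    left_start  = if (length(left_ends)) max(left_ends) + 1 else NA_real_,
    left_end    = genes$start[i] - 1,
    right_start = genes$end[i] + 1,
    right_end   = if (length(right_starts)) min(right_starts) - 1 else NA_real_
  )
}

gaps <- flanking_gaps(genes, "geneB")
gaps
```

"Left" and "right" here are genomic coordinates; for a gene on the antisense
strand, such as FLT1 above, the upstream region is the right-hand gap.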
> - Be able to sort by the conservation of the upstream and downstream
> data.
You can use rtracklayer to import conservation scores into R. Example:
> library(rtracklayer)
> s2 = browserSession("UCSC")
> ct = track(s2, "cons46way")
> ct
UCSC track 'Primate Cons'
UCSCData with 9974 rows and 1 value column across 1 space
           space               ranges |     score
     <character>            <IRanges> | <numeric>
1          chr21 [33031597, 33031597] |  0.560087
2          chr21 [33031598, 33031598] |  0.560087
3          chr21 [33031599, 33031599] |  0.435717
4          chr21 [33031600, 33031600] |  0.435717
5          chr21 [33031601, 33031601] |  0.560087
6          chr21 [33031602, 33031602] |  0.560087
7          chr21 [33031603, 33031603] |  0.435717
8          chr21 [33031604, 33031604] |  0.435717
9          chr21 [33031605, 33031605] |  0.560087
...          ...                  ... ...       ...
9966       chr21 [33041562, 33041562] | -0.266457
9967       chr21 [33041563, 33041563] |  0.433850
9968       chr21 [33041564, 33041564] |  0.507567
9969       chr21 [33041565, 33041565] |  0.655000
9970       chr21 [33041566, 33041566] | -0.155882
9971       chr21 [33041567, 33041567] | -0.340173
9972       chr21 [33041568, 33041568] |  0.507567
9973       chr21 [33041569, 33041569] | -1.740790
9974       chr21 [33041570, 33041570] | -1.925080
> length(ct$score)
[1] 9974
> summary(ct$score)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-4.62200 -0.33040  0.36470  0.03753  0.50520  0.65500
What is returned at this point depends on the state of the browser, which you
can set manually or programmatically; see the rtracklayer vignette.
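
Once per-base scores like these are in hand, sorting candidate regions by
conservation could look roughly like the base R sketch below. The scores,
region names, and coordinates are invented for illustration, and using the
mean score as the ranking statistic is an assumption; a plain data.frame with
a pos column stands in for the imported track.

```r
# Hypothetical per-base scores (stand-in for positions and scores from `ct`).
scores <- data.frame(
  pos   = 101:110,
  score = c(0.5, 0.6, 0.4, 0.5, -1.0, -2.0, -1.5, 0.6, 0.6, 0.6)
)

# Hypothetical candidate regions, e.g. upstream/downstream flanks.
regions <- data.frame(
  name  = c("up", "down", "far"),
  start = c(101, 105, 108),
  end   = c(104, 107, 110),
  stringsAsFactors = FALSE
)

# Mean score over the bases falling inside each region (NA if none do).
regions$mean_score <- vapply(seq_len(nrow(regions)), function(i) {
  inside <- scores$pos >= regions$start[i] & scores$pos <= regions$end[i]
  if (any(inside)) mean(scores$score[inside]) else NA_real_
}, numeric(1))

# Most conserved region first.
ranked <- regions[order(regions$mean_score, decreasing = TRUE, na.last = TRUE), ]
ranked
```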
The biomaRt package is also relevant.
>
> We are essentially looking for unknown promoters and other key pieces of the
> DNA that is not in the gene itself, but is conserved through the different
> Mammal genomes that are available at the different browsers. I found a
> program that does this (kind of ) with BLAST data, so I'm hoping you can
> point me to any useful package or other places I can search for a way to do
> this.
>
> Thanks for your time,
>
> Sheena
>