[BioC] Distance from TSS and CPG
Tim Triche, Jr.
tim.triche at gmail.com
Wed Dec 14 20:51:50 CET 2011
I wrote this up as an example in the IlluminaHumanMethylation450kprobe
package... which seemingly disappeared into thin air after I uploaded it!
Oh well. Use IlluminaHumanMethylation450kprobe for this and several other
common use cases; otherwise here's the man page and data.frame... hopefully
it makes sense. (There is a similar object in the .db package, but without
any sequences.)
For what you want, you could just do the following (even with the crufty
old 1.4.6 .db package):
> library(IlluminaHumanMethylation450k.db)
> sites <- toTable(IlluminaHumanMethylation450kCPG37) # or CPG36 if using hg18
> chrs <- toTable(IlluminaHumanMethylation450kCHR37) # or CHR36 if using hg18
> coords <- merge(sites, chrs, by='Probe_ID')
> names(coords) <- c('probe','site','chr')
> head(coords)
probe site chr
1 cg00000029 53468112 16
2 cg00000108 37459206 3
3 cg00000109 171916037 3
4 cg00000165 91194674 1
5 cg00000236 42263294 8
6 cg00000289 69341139 14
> library(GenomicFeatures)
> CpGs.unstranded <- with(coords,
+     GRanges(paste('chr', chr, sep=''),
+             IRanges(start=site, width=1, names=probe)))
> # called makeTxDbFromUCSC() in later GenomicFeatures releases
> refgene.TxDb = makeTranscriptDbFromUCSC('refGene', genome='hg19')
> # later releases take filter= in place of vals=
> TSS.forward = transcripts(refgene.TxDb,
+     vals=list(tx_strand='+'),
+     columns='gene_id')
> # precede() gives, for each CpG, the index of the closest TSS downstream of it
> nearest.fwd = precede(CpGs.unstranded, TSS.forward)
> nearest.fwd.eg = nearest.fwd # to keep dimensions right
> # logical index: x[-which(...)] would select nothing if there were no NAs
> found = !is.na(nearest.fwd) # CpGs with no downstream TSS stay NA
> nearest.fwd.eg[found] =
+     as.character(elementMetadata(TSS.forward)$gene_id[nearest.fwd[found]])
> TSSs.fwd = start(TSS.forward[nearest.fwd[found]])
> distToTSS.fwd = nearest.fwd # to keep dimensions right
> distToTSS.fwd[found] = start(CpGs.unstranded)[found] - TSSs.fwd
And likewise with vals=list(tx_strand='-') for the reverse strand, taking
end() rather than start() of each transcript as the TSS.
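To make the reverse-strand bookkeeping concrete, here is a toy sketch in
base R, so it runs without downloading any annotation. The coordinates and
the names tss.rev, cpg and distToTSS.rev are made up for illustration, and
everything is assumed to sit on one chromosome. On the minus strand a CpG
upstream of a gene lies at a higher coordinate than its TSS, so the lookup
and the subtraction are both mirrored relative to the forward case.

```r
# Made-up minus-strand TSS coordinates (i.e. transcript ends), sorted
tss.rev <- c(100, 500, 900)
cpg <- c(cgA = 50, cgB = 150, cgC = 500, cgD = 950)

# Index of the closest TSS strictly below each CpG; 0 means there is none
idx <- findInterval(cpg, tss.rev, left.open = TRUE)
idx[idx == 0] <- NA

# Negative values: the CpG is upstream of the TSS in transcript orientation
distToTSS.rev <- tss.rev[idx] - cpg
distToTSS.rev
```

A CpG sitting exactly on a TSS (cgC here) is matched to the next TSS down
rather than to itself, mirroring how precede() skips overlapping ranges.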
For CpG island distance you will need to decide which CpG island definition
to use. Personally I like Irizarry's. Once you have constructed a GRanges
object with the start and end coordinates of the CpG islands, most of it
will be equally straightforward.
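As a rough sketch of that last step, the following uses made-up island and
probe coordinates (the objects islands, cpgs and distToIsland are
illustrative, not from any package) and assumes a GenomicRanges release
recent enough to provide distanceToNearest(). Real islands would come from
whichever definition you settle on.

```r
library(GenomicRanges)

# Toy coordinates, invented for illustration only
islands <- GRanges(c("chr1", "chr1", "chr2"),
                   IRanges(start = c(200, 1000, 500),
                           end   = c(300, 1200, 800)))
cpgs <- GRanges(c("chr1", "chr1", "chr2", "chr3"),
                IRanges(start = c(150, 1100, 900, 10), width = 1,
                        names = c("cgA", "cgB", "cgC", "cgD")))

# One hit per CpG that has an island on the same chromosome
hits <- distanceToNearest(cpgs, islands)

# Fill a full-length vector so probes without any island stay NA
distToIsland <- rep(NA_integer_, length(cpgs))
names(distToIsland) <- names(cpgs)
distToIsland[queryHits(hits)] <- mcols(hits)$distance
distToIsland
```

A CpG inside an island (cgB) gets distance 0, and one on a chromosome with
no islands (cgD) stays NA. The distances are unsigned; if you need to know
which side of the island a probe falls on, compare start(cpgs) with the
start of the matched island.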
On Wed, Dec 7, 2011 at 2:25 AM, Khadeeja Ismail <hajjja at yahoo.com> wrote:
> Hi,
>
> I have a list of probes from IlluminaHumanMethylation450k array, and I need to
> find the distance from TSS and also the distance from CpG island for each. Is
> there a simple way to do this?
>
> Thanks in advance,
> Khadeeja
--
If people do not believe that mathematics is simple,
it is only because they do not realize how complicated life is.
John von Neumann <http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Von_Neumann.html>