[BioC] ChIPpeakAnno results
Marc Noguera
mnoguera at imppc.org
Tue Aug 10 16:26:37 CEST 2010
My explanation wasn't quite clear, excuse me.
I have a RangedData object, as obtained from ChipPeakAnno, with lots of
peaks which are annotated like this:
>table <- read.table("H3K4me3vsIgG_peaks.bed")
>bed <- BED2RangedData(bed,header=T)
>annotatedPeaks<-annotatePeakInBatch(bed,AnnotationData=TSS.mouse.NCBIM37)
> annotatedPeaks[1:2,]
RangedData with 2 rows and 9 value columns across 21 spaces
space ranges |
<character> <IRanges> |
MACS_peak_10 ENSMUSG00000025907 1 [ 6204139, 6204741] |
MACS_peak_102 ENSMUSG00000061518 1 [36748352, 36749344] |
peak strand feature
<character> <character> <character>
MACS_peak_10 ENSMUSG00000025907 MACS_peak_10 1
ENSMUSG00000025907
MACS_peak_102 ENSMUSG00000061518 MACS_peak_102 1
ENSMUSG00000061518
start_position end_position insideFeature
<numeric> <numeric> <character>
MACS_peak_10 ENSMUSG00000025907 6204743 6265656 upstream
MACS_peak_102 ENSMUSG00000061518 36748425 36750230 overlapStart
distancetoFeature shortestDistance
<numeric> <numeric>
MACS_peak_10 ENSMUSG00000025907 -604 2
MACS_peak_102 ENSMUSG00000061518 -73 73
fromOverlappingOrNearest
<character>
MACS_peak_10 ENSMUSG00000025907 NearestStart
MACS_peak_102 ENSMUSG00000061518 NearestStart
If I do like this:
> org.Mm.egENSEMBL2EG$`ENSMUSG00000025907`
[1] "12421"
> org.Mm.egSYMBOL$`12421`
[1] "Rb1cc1"
Which is the symbol I am itnerested in. I would like to add the symbol
corresponding to each row in the RangedData Object as a new column.
I have tried this, with mget:
>
annotatedPeaks$entrez<-mget(annotatedPeaks$feature,org.Mm.egENSEMBL2EG,ifnotfound=NA)
but:
> annotatedPeaks[1:2,]
RangedData with 2 rows and 10 value columns across 21 spaces
space ranges |
<character> <IRanges> |
MACS_peak_10 ENSMUSG00000025907 1 [ 6204139, 6204741] |
MACS_peak_102 ENSMUSG00000061518 1 [36748352, 36749344] |
peak strand
feature
<character> <character>
<character>
MACS_peak_10 ENSMUSG00000025907 MACS_peak_10 1
ENSMUSG00000025907
MACS_peak_102 ENSMUSG00000061518 MACS_peak_102 1
ENSMUSG00000061518
start_position end_position insideFeature
<numeric> <numeric> <character>
MACS_peak_10 ENSMUSG00000025907 6204743 6265656 upstream
MACS_peak_102 ENSMUSG00000061518 36748425 36750230 overlapStart
distancetoFeature shortestDistance
<numeric> <numeric>
MACS_peak_10 ENSMUSG00000025907 -604 2
MACS_peak_102 ENSMUSG00000061518 -73 73
fromOverlappingOrNearest entrez
<character> <list>
MACS_peak_10 ENSMUSG00000025907 NearestStart ########
MACS_peak_102 ENSMUSG00000061518 NearestStart ########
as mget returns a list. Note that some ensemble IDs map to more than one
gene ID.
Also, using the convert2EntrezID from the same ChIPpeakAnno package:
>
annotatedPeaks$EntrezID<-convert2EntrezID(IDs=annotatedPeaks$feature,orgAnn="org.Mm.eg.db",ID_type="ensembl_gene_id")
Error in `[[<-`(`*tmp*`, name, value = c("12421", "12859", "67387",
"623661", :
9633 elements in value to replace 13721 elements
>
Which returns a matrix (dim 9633,1) as some ensemblID map to same gene ID.
As far as I understand I need to get use the geneID to map ensemblID to
SYMBOL. So i cannot get the Symbols.
So, I am stuck here.
Thanks again,
Marc
> sessionInfo()
R version 2.11.0 (2010-04-22)
x86_64-unknown-linux-gnu
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] org.Mm.eg.db_2.4.1 ChIPpeakAnno_1.4.1
[3] limma_3.4.0 org.Hs.eg.db_2.4.1
[5] GO.db_2.4.1 RSQLite_0.9-0
[7] DBI_0.2-5 AnnotationDbi_1.10.1
[9] BSgenome.Ecoli.NCBI.20080805_1.3.16 BSgenome_1.16.1
[11] GenomicRanges_1.0.1 Biostrings_2.16.0
[13] IRanges_1.6.1 multtest_2.4.0
[15] Biobase_2.8.0 biomaRt_2.4.0
loaded via a namespace (and not attached):
[1] MASS_7.3-5 RCurl_1.4-2 splines_2.11.0 survival_2.35-8
[5] tools_2.11.0 XML_3.1-0
>
James W. MacDonald wrote:
> Hi Marc,
>
> On 8/10/2010 6:26 AM, Marc Noguera wrote:
>
>> Hi all,
>> this may be a very naive question by I have been trying to solve it
>> myself and i can't get through it.
>>
>> I have this RangedData object obtained from ChIPpeakAnno Package, which
>> correspond toa Chipseq experiment with annotated peaks, with ENSEMBL
>> identificators.
>> I can use this output already but like to transform the ENSEMBLID to a
>> gene symbol id
>>
>> for instance: ENSMUSG00000025907 to "Rb1cc1" symbol. It also would be
>> useful to add a field linking to a entrez gene web url.
>>
>> I have been looking at the org.Mm.eg.db package and although I can
>> retrieve the symbol for a particular ENSEMBLID can't get it for all the
>> elements in the object.
>>
>
> What have you tried so far? Unless you give an example of what you have
> done and how it didn't perform as you expect, it is very difficult for
> anybody to help.
>
> As a shot in the dark, have you looked at the help page for mget()?
>
> I don't really understand how the field linking to Entrez Gene would
> work, considering a RangedData object isn't an HTML page. However,
> building a URL to Entrez Gene isn't that difficult. You can hijack some
> internal code from the annotate package:
>
> > suppressMessages(library(annotate))
> > egids <- 1:5
> > annotate:::.repositories[["en"]](egids)
> [1]
> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=1"
> [2]
> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=2"
> [3]
> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=3"
> [4]
> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=4"
> [5]
> "http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=5"
>
> But doing it by hand wouldn't be that much more difficult. If you strip
> out the error checking from the above function, all it really consists of is
>
> thefunction <- function(ids){
> paste("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=",
>
> ids, sep = "")
> }
>
> Best,
>
> Jim
>
>
>
>> Many thanks
>> Marc
>>
>>
>
>
--
-----------------------------------------------------
Marc Noguera i Julian, PhD
Genomics unit / Bioinformatics
Institut de Medicina Predictiva i Personalitzada
del Càncer (IMPPC)
B-10 Office
Carretera de Can Ruti
Camí de les Escoles s/n
08916 Badalona, Barcelona
More information about the Bioconductor
mailing list