[BioC] ChIPpeakAnno to find peaks nearest to miRNA

Fri Jul 27 16:52:04 CEST 2012

Hi Paolo,

Because the org database does not contain an entry for ENSMUSG00000089245, addGeneIDs throws an error.
In this case, it is better to use biomaRt to get the annotation. Please try:

library(ChIPpeakAnno)
library(biomaRt)
# Collect the unique, non-empty Ensembl gene IDs from the annotated peaks
feature_ids <- unique(annotatedPeak$feature)
feature_ids <- feature_ids[!is.na(feature_ids)]
feature_ids <- feature_ids[feature_ids != ""]
# Query Ensembl for the miRNA annotation of those gene IDs
mart <- useMart(biomart="ensembl", dataset="mmusculus_gene_ensembl")
IDs2Add <- getBM(attributes=c("ensembl_gene_id", "mirbase_transcript_name", "mirbase_id",
                              "mirbase_accession", "external_gene_id"),
                 filters="ensembl_gene_id", values=feature_ids, mart=mart)
# Condense genes that come back as several rows into one row per gene
duplicated_ids <- IDs2Add[duplicated(IDs2Add[,"ensembl_gene_id"]), "ensembl_gene_id"]
if(length(duplicated_ids) > 0){
	IDs2Add.duplicated <- IDs2Add[IDs2Add[,"ensembl_gene_id"] %in% duplicated_ids,]
	IDs2Add.duplicated <- condenseMatrixByColnames(as.matrix(IDs2Add.duplicated), "ensembl_gene_id")
	IDs2Add <- IDs2Add[!(IDs2Add[,"ensembl_gene_id"] %in% duplicated_ids),]
	IDs2Add <- rbind(IDs2Add, IDs2Add.duplicated)
}

Then merge the columns you need from IDs2Add into annotatedPeak, matching on the Ensembl gene ID.
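
A minimal sketch of that merge step, using toy data frames in place of as.data.frame(annotatedPeak) and IDs2Add. The object names peaks_df, ids_df and merged and all the values in them are made up for illustration; only the column names feature, ensembl_gene_id and mirbase_id come from the code above.

```r
# Toy stand-in for as.data.frame(annotatedPeak); values are illustrative only
peaks_df <- data.frame(
  peak    = c("MACS_peak_109", "MACS_peak_110"),
  feature = c("ENSMUSG00000089245", NA),
  stringsAsFactors = FALSE
)

# Toy stand-in for the IDs2Add table returned by getBM()
ids_df <- data.frame(
  ensembl_gene_id = "ENSMUSG00000089245",
  mirbase_id      = "hypothetical-mir-id",  # placeholder, not a real miRBase ID
  stringsAsFactors = FALSE
)

# Left join: keep every peak and add the annotation columns
# wherever the Ensembl gene ID matches
merged <- merge(peaks_df, ids_df,
                by.x = "feature", by.y = "ensembl_gene_id",
                all.x = TRUE)
```

With all.x = TRUE, peaks whose feature has no match in the annotation table are kept and get NA in the added columns.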

If you have any questions, please let me know.

Yours sincerely,

Jianhong Ou

jianhong.ou at umassmed.edu

On Jul 27, 2012, at 9:57 AM, Zhu, Lihua (Julie) wrote:

> Paolo,
> 
> Could you please send us a few rows of miRNAs in annotatedPeaks? Thanks!
> 
> Best regards,
> 
> Julie 
> ________________________________________
> From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Paolo Kunderfranco [paolo.kunderfranco at gmail.com]
> Sent: Friday, July 27, 2012 5:50 AM
> To: bioconductor at r-project.org
> Subject: [BioC] ChIPpeakAnno to find peaks nearest to miRNA
> 
> Dear All,
> I would like to use ChIPpeakAnno to find peaks nearest to miRNA.
> 
> I loaded my BED file and created a RangedData object, loaded the
> mmusculus_gene_ensembl dataset through mart, and annotated my peaks.
> That seems to work:
> 
> test.rangedData = BED2RangedData(test.bed)
> mart<-useMart(biomart="ensembl",dataset="mmusculus_gene_ensembl")
> Annotation = getAnnotation(mart, featureType="miRNA")
> annotatedPeak = annotatePeakInBatch(test.rangedData, AnnotationData=Annotation)
> as.data.frame(annotatedPeak)
> 
> <factor>            <IRanges> |   <character> <character> <character>      <numeric>    <numeric>   <character>
> MACS_peak_109 ENSMUSG00000089245        1 [54494876, 54496209] | MACS_peak_109           + ENSMUSG00000089245       54826062 54826166      upstream
> <numeric>        <numeric>              <character>
> -331186           329853             NearestStart
> 
> 
> Now I would like to add the miRNA ID, as I did when I annotated for
> TSS, but something goes wrong. Any ideas how to solve it?
> 
> library("org.Mm.eg.db")
> b<- addGeneIDs(annotatedPeak,"org.Mm.eg.db",c("symbol"))
> Error: No entrez identifier can be mapped by input data based on the
> feature_id_type. Please consider to use correct feature_id_type,
> orgAnn or annotatedPeak
> 
> 
> Thanks,
> 
> Paolo
> 
> 
>> traceback()
> 2: stop("No entrez identifier can be mapped by input data based on the
> feature_id_type.\nPlease consider to use correct feature_id_type,
> orgAnn or annotatedPeak\n",
>       call. = FALSE)
> 1: addGeneIDs(annotatedPeak, "org.Mm.eg.db", c("symbol"))
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: i386-pc-mingw32/i386 (32-bit)
> 
> locale:
> [1] LC_COLLATE=Italian_Italy.1252  LC_CTYPE=Italian_Italy.1252
> LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
> [5] LC_TIME=Italian_Italy.1252
> 
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets
> methods   base
> 
> other attached packages:
> [1] targetscan.Mm.eg.db_0.5.0           BiocInstaller_1.4.7
>      org.Mm.eg.db_2.7.1                  ChIPpeakAnno_2.4.0
> [5] limma_3.12.1                        org.Hs.eg.db_2.7.1
>      GO.db_2.7.1                         RSQLite_0.11.1
> [9] DBI_0.2-5                           AnnotationDbi_1.18.1
>      BSgenome.Ecoli.NCBI.20080805_1.3.17 BSgenome_1.24.0
> [13] GenomicRanges_1.8.7                 Biostrings_2.24.1
>      IRanges_1.14.4                      multtest_2.12.0
> [17] Biobase_2.16.0                      biomaRt_2.12.0
>      BiocGenerics_0.2.0                  gplots_2.11.0
> [21] MASS_7.3-19                         KernSmooth_2.23-8
>      caTools_1.13                        bitops_1.0-4.1
> [25] gdata_2.11.0                        gtools_2.7.0
> 
> loaded via a namespace (and not attached):
> [1] RCurl_1.91-1.1   splines_2.15.0   stats4_2.15.0
> survival_2.36-14 tools_2.15.0     XML_3.9-4.1
> 