[BioC] ChIPpeakAnno annotatePeakInBatch error message

Fri May 28 04:36:45 CEST 2010

Oh, thanks for this fix. I forgot to remove the chr*_random rows when I loaded the CpG Island BED file into R.
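
As a minimal sketch of that clean-up step (the table below is a made-up stand-in for the CpG island BED file, and the names cpgToy and cpgClean are only for illustration), the chr*_random rows could be dropped before building the RangedData object:

```r
# Toy stand-in for the CpG island table; these rows are invented for illustration.
cpgToy <- data.frame(chr = c("chr1", "chr1_random", "chr2", "chrX_random"),
                     start = c(100, 200, 300, 400),
                     end = c(150, 250, 350, 450),
                     stringsAsFactors = FALSE)

# Keep only the rows whose chromosome name does not end in "_random".
cpgClean <- cpgToy[!grepl("_random$", cpgToy$chr), ]
```

The same filter would apply to the real table once its first column has been named "chr".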

Just one more point, though. In the spreadsheet you sent me, the annotated peaks and features are on different chromosomes for everything after chromosome 1. I suppose this is because the CpG islands file is ordered chr1, chr2, chr3, ..., whereas the genes file is in ASCII order (i.e. chr1, chr10, chr11, ...), and you merge the overlaps by list position. It would be good to state this requirement clearly in the documentation (annotatePeakInBatch.Rd), or alternatively to make the function not depend on the two tables having the same chromosome ordering.
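
The ordering problem can be sketched in base R. This is only an illustration of the suspected cause, not the package's actual internals, which I have not checked. The lists peaksByChr and featuresByChr and their contents are hypothetical:

```r
# Hypothetical per-chromosome results: one list in natural chromosome order,
# the other in ASCII order.
peaksByChr    <- list(chr1 = "peakA", chr2 = "peakB", chr10 = "peakC")
featuresByChr <- list(chr1 = "gene1", chr10 = "gene10", chr2 = "gene2")

# Pairing by list position mixes chromosomes after the first element.
byPosition <- data.frame(peakChr = names(peaksByChr),
                         featureChr = names(featuresByChr),
                         stringsAsFactors = FALSE)

# Pairing by chromosome name does not depend on the order of either list.
shared <- intersect(names(peaksByChr), names(featuresByChr))
byName <- data.frame(chr = shared,
                     peak = unlist(peaksByChr[shared], use.names = FALSE),
                     feature = unlist(featuresByChr[shared], use.names = FALSE),
                     stringsAsFactors = FALSE)
```

Matching on names rather than positions is the order-independent alternative; documenting a required ordering would be the other way to address it.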

- Dario.

---- Original message ----
>Date: Thu, 27 May 2010 14:26:12 -0400
>From: "Zhu, Julie" <Julie.Zhu at umassmed.edu>  
>Subject: Re: [BioC] ChIPpeakAnno annotatePeakInBatch error message  
>To: "D.Strbenac at garvan.org.au" <D.Strbenac at garvan.org.au>, "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>
>   Hi Dario,
>
>   Thanks for the vigorous test of the new feature!
>
>   The peak dataset contains chrX_random, which is not in
>   the feature dataset. I added an is.na check on the
>   strand, which should fix the problem. I also attached
>   the annotated dataset. Please let me know if you
>   encounter any problems.
>
>   Best regards,
>
>   Julie
>
>   On 5/26/10 11:00 PM, "Dario Strbenac"
>   <D.Strbenac at garvan.org.au> wrote:
>
>     Hello,
>
>     Yes, I encountered the same problem again. This
>     time I tried the code on my full table of data.
>     This is my script. All the files it refers to are
>     web accessible, so that you can replicate it too.
>     I am definitely using version 1.5.3 of the
>     package.
>
>     CpGIslandsTable <- read.table("http://129.94.136.7/file_dump/dario/hg18_CpG_Islands.bed",
>                                   sep = '\t', stringsAsFactors = FALSE)
>     genesTable <- read.csv("http://129.94.136.7/file_dump/dario/humanGenomeAnnotation.csv",
>                            stringsAsFactors = FALSE)
>     colnames(CpGIslandsTable) <- c("chr", "start", "end", "name")
>
>     peaksRangedData <- RangedData(space = CpGIslandsTable$chr,
>                                   ranges = IRanges(start = CpGIslandsTable$start,
>                                                    end = CpGIslandsTable$end))
>     featuresRangedData <- RangedData(name = genesTable$name,
>                                      space = genesTable$chr,
>                                      strand = genesTable$strand,
>                                      ranges = IRanges(start = genesTable$start,
>                                                       end = genesTable$end))
>     featureLoc <- "TSS"
>
>     annotatePeakInBatch(peaksRangedData,
>                         AnnotationData = featuresRangedData,
>                         PeakLocForDistance = "middle")
>
>     > sessionInfo()
>     R version 2.11.0 (2010-04-22)
>     x86_64-pc-mingw32
>
>     locale:
>     [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
>         LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
>         LC_TIME=English_Australia.1252
>
>     attached base packages:
>     [1] stats     graphics  grDevices utils     datasets  methods   base
>
>     other attached packages:
>      [1] ChIPpeakAnno_1.5.3  limma_3.4.0  org.Hs.eg.db_2.4.1  GO.db_2.4.1  RSQLite_0.9-0
>      [6] DBI_0.2-5  AnnotationDbi_1.10.1  BSgenome.Ecoli.NCBI.20080805_1.3.16  BSgenome_1.16.0  GenomicRanges_1.0.1
>     [11] Biostrings_2.16.0  IRanges_1.6.0  multtest_2.4.0  Biobase_2.8.0  biomaRt_2.4.0
>
>     loaded via a namespace (and not attached):
>     [1] MASS_7.3-5      RCurl_1.3-1     splines_2.11.0  survival_2.35-8 XML_2.8-1
>
>     ---- Original message ----
>     >Date: Mon, 24 May 2010 22:57:47 -0400
>     >From: "Zhu, Julie" <Julie.Zhu at umassmed.edu>
>     >Subject: Re: [BioC] ChIPpeakAnno annotatePeakInBatch error message
>     >To: "D.Strbenac at garvan.org.au" <D.Strbenac at garvan.org.au>, "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>     >
>     >   Hi Dario,
>     >
>     >   Please download the dev 1.5.3 version of ChIPpeakAnno
>     >   and let me know if you encounter any problems.
>     >   Thanks!
>     >
>     >   Best regards,
>     >
>     >   Julie
>     >
>     >   annotatePeakInBatch(peaksRangedData, AnnotationData = featuresRangedData,
>     >                       PeakLocForDistance = "middle")
>     >   RangedData with 6 rows and 9 value columns across 2 spaces
>     >             space               ranges |        peak      strand     feature start_position end_position insideFeature distancetoFeature
>     >       <character>            <IRanges> | <character> <character> <character>      <numeric>    <numeric>   <character>         <numeric>
>     >   1 1        chr1 [ 2000010,  2000310] |           1           +           1          1e+06      2.0e+06    downstream           1000160
>     >   2 2        chr1 [19000000, 19000300] |           2           -           2          1e+07      2.0e+07        inside            999850
>     >   3 2        chr1 [30000000, 30000300] |           3           -           2          1e+07      2.0e+07      upstream         -10000150
>     >   4 4        chr2 [     300,      600] |           4           +           4          1e+03      5.0e+03      upstream              -550
>     >   6 6        chr2 [  100000,   100300] |           6           +           6          1e+04      1.5e+04    downstream             90150
>     >   5 5        chr2 [    5500,     5800] |           5           -           5          6e+03      7.0e+03    downstream              1350
>     >       shortestDistance fromOverlappingOrNearest
>     >              <numeric>              <character>
>     >   1 1               10             NearestStart
>     >   2 2           999700             NearestStart
>     >   3 2         10000000             NearestStart
>     >   4 4              400             NearestStart
>     >   6 6            85000             NearestStart
>     >   5 5              200             NearestStart
>     >
>     >   > sessionInfo()
>     >   R version 2.11.0 (2010-04-22)
>     >   i386-apple-darwin9.8.0
>     >
>     >   locale:
>     >   [1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>     >
>     >   attached base packages:
>     >   [1] stats     graphics  grDevices utils     datasets  methods   base
>     >
>     >   other attached packages:
>     >    [1] ChIPpeakAnno_1.5.3    limma_3.4.0                          org.Hs.eg.db_2.4.1
>     >    [4] GO.db_2.4.1           RSQLite_0.9-0                        DBI_0.2-5
>     >    [7] AnnotationDbi_1.10.1  BSgenome.Ecoli.NCBI.20080805_1.3.16  BSgenome_1.16.1
>     >   [10] GenomicRanges_1.0.1   Biostrings_2.16.0                    IRanges_1.6.1
>     >   [13] multtest_2.4.0        Biobase_2.8.0                        biomaRt_2.4.0
>     >
>     >
>     >   On 5/24/10 5:10 AM, "Dario Strbenac"
>     >   <D.Strbenac at garvan.org.au> wrote:
>     >
>     >     Hello,
>     >
>     >     I made another small example of using
>     >     annotatePeakInBatch to demonstrate to a friend, but it
>     >     has crashed. It's similar to the other example but
>     >     with different data. I'm not sure why it is
>     >     happening.
>     >
>     >     Here is my small example:
>     >
>     >     peaksT <- data.frame(chr = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
>     >                          start = c(2000010, 19000000, 30000000, 300, 5500, 100000),
>     >                          end = c(2000310, 19000300, 30000300, 600, 5800, 100300))
>     >     featuresT <- data.frame(name = c("gene1", "gene2", "gene3", "gene4", "gene5", "gene6"),
>     >                             chr = c("chr1", "chr1", "chr1", "chr2", "chr2", "chr2"),
>     >                             start = c(1000000, 10000000, 15000000, 1000, 6000, 10000),
>     >                             end = c(2000000, 20000000, 22000000, 5000, 7000, 15000),
>     >                             strand = c('+', '-', '+', '+', '-', '+'))
>     >
>     >     require(ChIPpeakAnno)
>     >
>     >     peaksRangedData <- RangedData(space = peaksT$chr,
>     >                                   ranges = IRanges(start = peaksT$start,
>     >                                                    end = peaksT$end))
>     >     featuresRangedData <- RangedData(name = featuresT$name,
>     >                                      space = featuresT$chr,
>     >                                      strand = featuresT$strand,
>     >                                      ranges = IRanges(start = featuresT$start,
>     >                                                       end = featuresT$end))
>     >     featureLoc <- "TSS"
>     >
>     >     annotatePeakInBatch(peaksRangedData,
>     >                         AnnotationData = featuresRangedData,
>     >                         PeakLocForDistance = "middle")
>     >
>     >     Error in if (as.character(r.n$strand[i]) == "1" || as.character(r.n$strand[i]) ==  :
>     >       missing value where TRUE/FALSE needed
>     >
>     >     My sessionInfo is :
>     >
>     >     R version 2.11.0 (2010-04-22)
>     >     x86_64-unknown-linux-gnu
>     >
>     >     locale:
>     >      [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
>     >      [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
>     >      [5] LC_MONETARY=C              LC_MESSAGES=en_AU.UTF-8
>     >      [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
>     >      [9] LC_ADDRESS=C               LC_TELEPHONE=C
>     >     [11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C
>     >
>     >     attached base packages:
>     >     [1] stats     graphics  grDevices utils     datasets  methods   base
>     >
>     >     other attached packages:
>     >      [1] ChIPpeakAnno_1.5.2    limma_3.4.0
>     >      [3] org.Hs.eg.db_2.4.1    GO.db_2.4.1
>     >      [5] RSQLite_0.9-0         DBI_0.2-5
>     >      [7] AnnotationDbi_1.10.0  BSgenome.Ecoli.NCBI.20080805_1.3.16
>     >      [9] BSgenome_1.16.1       GenomicRanges_1.0.1
>     >     [11] Biostrings_2.16.0     IRanges_1.6.2
>     >     [13] multtest_2.4.0        Biobase_2.8.0
>     >     [15] biomaRt_2.4.0
>     >
>     >     loaded via a namespace (and not attached):
>     >     [1] MASS_7.3-6      RCurl_1.4-2     splines_2.11.0  survival_2.35-8
>     >     [5] XML_3.1-0
>     >
>     >     Thanks,
>     >            Dario.
>     >
>     >     --------------------------------------
>     >     Dario Strbenac
>     >     Research Assistant
>     >     Cancer Epigenetics
>     >     Garvan Institute of Medical Research
>     >     Darlinghurst NSW 2010
>     >     Australia
>     >
>
>________________
>ForDarioStrbenac.xls (4489k bytes)

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia