[Bioc-devel] the character to collapse the geneNames when using the disjointExons function with aggregateGenes=TRUE

Wed Jul 31 15:29:09 CEST 2013

Hej Val, I believe that one is for you :-)

When using the aggregateGenes=TRUE parameter of the disjointExons function, the gene names are separated by a "+" character. Is there a particular reason for that? I'm asking because in the "transcripts" column the transcript IDs are separated by a semicolon, and I was wondering whether the separator could be unified, i.e. using a semicolon for both the geneNames and transcripts columns. Here is a visual example of what I mean:

GRanges with 1 range and 4 metadata columns:
      seqnames             ranges strand |
         <Rle>          <IRanges>  <Rle> |
  [1]    Chr03 [4541747, 4541782]      - |
                                               geneNames
                                             <character>
  [1] Potri.003G035500+Potri.003G035600+Potri.003G035700
                                                           transcripts
                                                           <character>
  [1] PAC:26999771;PAC:26999331;PAC:26999330;PAC:26999332;PAC:26999333
      exonic_part_number      exonID
               <integer> <character>
  [1]                  1        E001
  ---
  seqlengths:
           Chr01         Chr02         Chr03 ...   scaffold_99  scaffold_991
              NA            NA            NA ...            NA            NA
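
In the meantime, the separator can be harmonised after the fact. The following is only a minimal sketch in base R, assuming the gene IDs themselves never contain a "+"; the variable names are hypothetical and the input string is copied from the geneNames column above. On the real object the same gsub() call would be applied to the geneNames metadata column.

```r
# Example value copied from the geneNames column shown above
gene_names <- "Potri.003G035500+Potri.003G035600+Potri.003G035700"

# fixed = TRUE makes gsub() treat "+" literally instead of as a regex quantifier
unified <- gsub("+", ";", gene_names, fixed = TRUE)

# With a single separator, gene and transcript IDs can be split the same way
ids <- strsplit(unified, ";", fixed = TRUE)[[1]]
```

This only papers over the difference on the user side; it does not change what disjointExons itself returns.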

What do you say?

Cheers,

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------

My sessionInfo():

R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] Rsamtools_1.13.26       Biostrings_2.29.14      DEXSeq_1.7.6           
 [4] GenomicFeatures_1.13.21 AnnotationDbi_1.23.18   Biobase_2.21.6         
 [7] GenomicRanges_1.13.35   XVector_0.1.0           IRanges_1.19.19        
[10] BiocGenerics_0.7.3      BiocInstaller_1.11.4   

loaded via a namespace (and not attached):
 [1] biomaRt_2.17.2     bitops_1.0-5       BSgenome_1.29.1    DBI_0.2-7         
 [5] hwriter_1.3        RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.21.9
 [9] statmod_1.4.17     stats4_3.0.1       stringr_0.6.2      tools_3.0.1       
[13] XML_3.98-1.1       zlibbioc_1.7.0