[Bioc-devel] the character to collapse the geneNames when using the disjointExons function with aggregateGenes=TRUE

Fri Aug 2 07:29:56 CEST 2013

These changes are implemented in GenomicFeatures 1.13.26.
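
For anyone adapting downstream code, here is a minimal sketch of moving from the old delimited strings to list-style columns. It uses only base R and the example values from Nico's message quoted below. The variable names are illustrative, and this is not the GenomicFeatures implementation. Wrapping the results with IRanges::CharacterList() to mirror the new column type is an assumption, shown only as a comment.

```r
# Example values copied from the GRanges printout quoted in this thread.
gene_names  <- "Potri.003G035500+Potri.003G035600+Potri.003G035700"
transcripts <- "PAC:26999771;PAC:26999331;PAC:26999330;PAC:26999332;PAC:26999333"

# fixed = TRUE makes strsplit() treat "+" as a literal character
# rather than as a regular-expression quantifier.
gene_id <- strsplit(gene_names, "+", fixed = TRUE)
tx_name <- strsplit(transcripts, ";", fixed = TRUE)

# Each result is a plain list holding one character vector per range.
# Assumption: IRanges::CharacterList(gene_id) would turn it into the
# list-like type proposed for the new 'gene_id' column.
lengths(gene_id)
lengths(tx_name)

# Code that still expects the old single-string form can collapse
# each element back with the separator of its choice.
old_style <- vapply(gene_id, paste, character(1), collapse = "+")
```

Keeping the IDs as separate elements means no separator has to be agreed on at all; each consumer picks one only when it needs a flat string.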

Valerie

On 08/01/2013 08:45 AM, Nicolas Delhomme wrote:
> Fantastic!
>
> Cheers,
>
> Nico
>
> ---------------------------------------------------------------
> Nicolas Delhomme
>
> Genome Biology Computational Support
>
> European Molecular Biology Laboratory
>
> Tel: +49 6221 387 8310
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> ---------------------------------------------------------------
>
> On Jul 31, 2013, at 10:41 PM, Alejandro Reyes wrote:
>
>> Dear all,
>>
>> No problem from my side, I can adapt DEXSeq to those changes.
>>
>> Best regards,
>> Alejandro Reyes
>>
>>> Mike, Alejandro,
>>>
>>> I also wonder about getting rid of the 'exonID' metadata column. This is redundant with 'exonic_part_number'. Do you have a preference for keeping one or the other?
>>>
>>> Valerie
>>>
>>>
>>> On 07/31/2013 10:04 AM, Valerie Obenchain wrote:
>>>> Hi Nico,
>>>>
>>>> (Adding Mike and Alejandro.)
>>>>
>>>> Because disjointExons() came from DEXSeq I wanted to preserve the
>>>> behavior for backwards compatibility and familiarity to DEXSeq users.
>>>> There are a couple of changes I'd like to make so disjointExons() is
>>>> consistent with the other extractors in GenomicFeatures.
>>>>
>>>> (1) Change metadata column names from 'geneNames' and 'transcripts' to
>>>> 'gene_id' and 'tx_name'.
>>>>
>>>> (2) Instead of '+' or ';' to separate gene IDs or transcript names,
>>>> these columns would each be a CharacterList.
>>>>
>>>> If Mike and Alejandro are ok with these I'll go ahead and implement them.
>>>>
>>>> Valerie
>>>>
>>>>
>>>>
>>>> On 07/31/2013 06:29 AM, Nicolas Delhomme wrote:
>>>>> Hej Val, I believe that one is for you :-)
>>>>>
>>>>> When using the aggregateGenes=TRUE parameter of the disjointExons
>>>>> function, the gene names are separated by a "+" character. Is there a
>>>>> particular reason for that? The reason I'm asking is that in the
>>>>> "transcripts" column the transcript IDs are separated by a semicolon,
>>>>> and I was wondering whether the separator could be unified, i.e.
>>>>> using a semicolon for both the geneNames and transcripts columns. Here
>>>>> is a visual example of what I mean:
>>>>>
>>>>> GRanges with 1 range and 4 metadata columns:
>>>>>        seqnames             ranges strand |
>>>>>           <Rle>          <IRanges>  <Rle> |
>>>>>    [1]    Chr03 [4541747, 4541782]      - |
>>>>>                                                 geneNames
>>>>>                                               <character>
>>>>>    [1] Potri.003G035500+Potri.003G035600+Potri.003G035700
>>>>>                                                             transcripts
>>>>>                                                             <character>
>>>>>    [1] PAC:26999771;PAC:26999331;PAC:26999330;PAC:26999332;PAC:26999333
>>>>>        exonic_part_number      exonID
>>>>>                 <integer> <character>
>>>>>    [1]                  1        E001
>>>>>    ---
>>>>>    seqlengths:
>>>>>             Chr01         Chr02         Chr03 ...   scaffold_99  scaffold_991
>>>>>                NA            NA            NA ...            NA            NA
>>>>>
>>>>>
>>>>> What do you say?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Nico
>>>>>
>>>>>
>>>>> My sessionInfo():
>>>>> R version 3.0.1 (2013-05-16)
>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>>   [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>>>>>   [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>>>>>   [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>>>>>   [7] LC_PAPER=C                 LC_NAME=C
>>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>>>>> [8] base
>>>>>
>>>>> other attached packages:
>>>>>   [1] Rsamtools_1.13.26       Biostrings_2.29.14    DEXSeq_1.7.6
>>>>>   [4] GenomicFeatures_1.13.21 AnnotationDbi_1.23.18 Biobase_2.21.6
>>>>>   [7] GenomicRanges_1.13.35   XVector_0.1.0         IRanges_1.19.19
>>>>> [10] BiocGenerics_0.7.3      BiocInstaller_1.11.4
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>   [1] biomaRt_2.17.2     bitops_1.0-5       BSgenome_1.29.1    DBI_0.2-7
>>>>>   [5] hwriter_1.3        RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.21.9
>>>>>   [9] statmod_1.4.17     stats4_3.0.1       stringr_0.6.2      tools_3.0.1
>>>>> [13] XML_3.98-1.1       zlibbioc_1.7.0
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>
>>>
>>
>