[Bioc-devel] the character to collapse the geneNames when using the disjointExons function with aggregateGenes=TRUE

Nicolas Delhomme delhomme at embl.de
Fri Aug 2 12:47:34 CEST 2013


Thanks Val!

I've been playing some more with that function and I was wondering whether it would make sense to introduce an argument to filter out synthetic exons smaller than a given size. E.g., in my data I often get 1 bp exons, which are obviously of no interest for downstream analysis. I'm currently doing a post-filtering step, but as that could benefit others, it may be better as a function argument. The only issue is that I can't think of a sensible default value: it depends heavily on the aligner used and on the kind of sequencing data, so it might have to default to NULL.
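A minimal sketch of the proposed size filter follows. It is written in Python only to keep it self-contained and is not the GenomicFeatures API. `ExonicPart`, `filter_exonic_parts()` and `min_width` are hypothetical names, and the `None` default plays the role of the `NULL` default suggested above. `gene_ids` is a list rather than a "+"-collapsed string, in the spirit of the CharacterList change discussed later in the thread. In R, the post-filtering itself can be done by subsetting the GRanges on `width()`, e.g. `ex[width(ex) >= 10]`, where the threshold 10 is arbitrary.

```python
from typing import List, NamedTuple, Optional


class ExonicPart(NamedTuple):
    """Hypothetical stand-in for one row of disjointExons() output."""
    seqname: str
    start: int  # 1-based, inclusive, as in IRanges
    end: int    # 1-based, inclusive
    gene_ids: List[str]

    @property
    def width(self) -> int:
        # Inclusive coordinates: [start, end] covers end - start + 1 bp.
        return self.end - self.start + 1


def filter_exonic_parts(
    parts: List[ExonicPart], min_width: Optional[int] = None
) -> List[ExonicPart]:
    """Keep parts at least min_width bp wide; None (like R's NULL) disables the filter."""
    if min_width is None:
        return list(parts)
    return [p for p in parts if p.width >= min_width]


# The first row mirrors the GRanges shown further down the thread;
# the 1 bp second row is made up for illustration.
parts = [
    ExonicPart("Chr03", 4541747, 4541782,
               ["Potri.003G035500", "Potri.003G035600", "Potri.003G035700"]),
    ExonicPart("Chr03", 4541783, 4541783, ["Potri.003G035500"]),
]
print([p.width for p in filter_exonic_parts(parts)])               # [36, 1]
print([p.width for p in filter_exonic_parts(parts, min_width=2)])  # [36]
```

Defaulting to `None` means existing callers see no change in behavior, which sidesteps the problem of picking a threshold that suits every aligner and data type.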

Alejandro, I've seen that in DEXSeq you keep only the exons you can test, what you call testable exons. Does that include a size filter?

What's your take?

Nico

---------------------------------------------------------------
Nicolas Delhomme

Genome Biology Computational Support

European Molecular Biology Laboratory

Tel: +49 6221 387 8310
Email: nicolas.delhomme at embl.de
Meyerhofstrasse 1 - Postfach 10.2209
69102 Heidelberg, Germany
---------------------------------------------------------------





On Aug 2, 2013, at 7:29 AM, Valerie Obenchain wrote:

> These changes are implemented in GenomicFeatures 1.13.26.
> 
> Valerie
> 
> 
> On 08/01/2013 08:45 AM, Nicolas Delhomme wrote:
>> Fantastic!
>> 
>> Cheers,
>> 
>> Nico
>> 
>> 
>> On Jul 31, 2013, at 10:41 PM, Alejandro Reyes wrote:
>> 
>>> Dear all,
>>> 
>>> No problem from my side, I can adapt DEXSeq to those changes.
>>> 
>>> Best regards,
>>> Alejandro Reyes
>>> 
>>>> Mike, Alejandro,
>>>> 
>>>> I also wonder about getting rid of the 'exonID' metadata column. This is redundant with 'exonic_part_number'. Do you have a preference for keeping one or the other?
>>>> 
>>>> Valerie
>>>> 
>>>> 
>>>> On 07/31/2013 10:04 AM, Valerie Obenchain wrote:
>>>>> Hi Nico,
>>>>> 
>>>>> (Adding Mike and Alejandro.)
>>>>> 
>>>>> Because disjointExons() came from DEXSeq I wanted to preserve the
>>>>> behavior for backwards compatibility and familiarity to DEXSeq users.
>>>>> There are a couple of changes I'd like to make so disjointExons() is
>>>>> consistent with the other extractors in GenomicFeatures.
>>>>> 
>>>>> (1) Change metadata column names from 'geneNames' and 'transcripts' to
>>>>> 'gene_id' and 'tx_name'.
>>>>> 
>>>>> (2) Instead of '+' or ';' to separate gene IDs or transcript names,
>>>>> these columns would each be a CharacterList.
>>>>> 
>>>>> If Mike and Alejandro are ok with these I'll go ahead and implement them.
>>>>> 
>>>>> Valerie
>>>>> 
>>>>> 
>>>>> 
>>>>> On 07/31/2013 06:29 AM, Nicolas Delhomme wrote:
>>>>>> Hi Val, I believe this one is for you :-)
>>>>>> 
>>>>>> When using the aggregateGenes=TRUE parameter of the disjointExons
>>>>>> function, the gene names are separated by a "+" character. Is there a
>>>>>> particular reason for that? I'm asking because in the "transcripts"
>>>>>> column the transcript IDs are separated by a semicolon, and I was
>>>>>> wondering whether the separator could be unified, i.e. using a
>>>>>> semicolon for both the geneNames and transcripts columns. Here is a
>>>>>> visual example of what I mean:
>>>>>> 
>>>>>> GRanges with 1 range and 4 metadata columns:
>>>>>>       seqnames             ranges strand |
>>>>>>          <Rle>          <IRanges>  <Rle> |
>>>>>>   [1]    Chr03 [4541747, 4541782]      - |
>>>>>>                                                geneNames
>>>>>>                                              <character>
>>>>>>   [1] Potri.003G035500+Potri.003G035600+Potri.003G035700
>>>>>>                                                             transcripts
>>>>>>                                                             <character>
>>>>>>   [1] PAC:26999771;PAC:26999331;PAC:26999330;PAC:26999332;PAC:26999333
>>>>>>       exonic_part_number      exonID
>>>>>>                <integer> <character>
>>>>>>   [1]                  1        E001
>>>>>>   ---
>>>>>>   seqlengths:
>>>>>>            Chr01         Chr02         Chr03 ...  scaffold_99 scaffold_991
>>>>>>               NA            NA            NA ...           NA           NA
>>>>>> 
>>>>>> 
>>>>>> What do you say?
>>>>>> 
>>>>>> Cheers,
>>>>>> 
>>>>>> Nico
>>>>>> 
>>>>>> My sessionInfo():
>>>>>> R version 3.0.1 (2013-05-16)
>>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>> 
>>>>>> locale:
>>>>>>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>>>>>>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>>>>>>  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
>>>>>>  [7] LC_PAPER=C                 LC_NAME=C
>>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>>>>>> 
>>>>>> attached base packages:
>>>>>> [1] parallel  stats     graphics  grDevices utils datasets  methods
>>>>>> [8] base
>>>>>> 
>>>>>> other attached packages:
>>>>>>  [1] Rsamtools_1.13.26       Biostrings_2.29.14      DEXSeq_1.7.6
>>>>>>  [4] GenomicFeatures_1.13.21 AnnotationDbi_1.23.18   Biobase_2.21.6
>>>>>>  [7] GenomicRanges_1.13.35   XVector_0.1.0           IRanges_1.19.19
>>>>>> [10] BiocGenerics_0.7.3      BiocInstaller_1.11.4
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>>  [1] biomaRt_2.17.2     bitops_1.0-5       BSgenome_1.29.1    DBI_0.2-7
>>>>>>  [5] hwriter_1.3        RCurl_1.95-4.1     RSQLite_0.11.4     rtracklayer_1.21.9
>>>>>>  [9] statmod_1.4.17     stats4_3.0.1       stringr_0.6.2      tools_3.0.1
>>>>>> [13] XML_3.98-1.1       zlibbioc_1.7.0
>>>>>> 
>>>>>> _______________________________________________
>>>>>> Bioc-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>> 
>>>> 
>>> 
>> 
> 


