[BioC] append on DNAStringSet produces an empty DNAString as last element

Hervé Pagès hpages at fhcrc.org
Wed Aug 11 20:53:03 CEST 2010


Hi Philip,

On 08/11/2010 04:25 AM, Philip Kensche wrote:
> Dear Martin,
>
>> On 08/10/2010 03:01 AM, Philip Kensche wrote:
>>> Hi,
>>>
>>> I noticed the following:
>>>
>>>> append(DNAStringSet(), list(DNAString("aaaa"), DNAString("catc")))
>>>
>>> [[1]]
>>>    4-letter "DNAString" instance
>>> seq: AAAA
>>>
>>> [[3A2]]
>>>    4-letter "DNAString" instance
>>> seq: CATC
>>>
>>> [[3]]
>>>    A DNAStringSet instance of length 0
>>>
>>> I guess the last element shouldn't be there, or should it?
>
>> this has to do with what base::append does when the first argument is
>> zero length,
>
>>> base::append
>> function (x, values, after = length(x))
>> {
>>      lengx <- length(x)
>>      if (!after)
>>          c(values, x)
>>      else if (after >= lengx)
>>          c(x, values)
>>      else c(x[1L:after], values, x[(after + 1L):lengx])
>> }
>> <environment: namespace:base>
>
>> which leads to some inconsistent behavior, e.g., dropping zero-length
>> atomic vectors but not other data structures
>
>>> append(numeric(), list(1))
>> [[1]]
>> [1] 1
>
>>> append(new.env(), list(1))
>> [[1]]
>> [1] 1
>
>> [[2]]
>> <environment: 0x461a508>
>
>> I'm not sure what the reason for this behavior is; I might have expected
>> list(numeric(), 1) in the first case, list(new.env(), 1) in the second.
>
> If I understand correctly, this is a problem with the append function from package base, i.e., a core R package.
>
> Actually, I noticed that base::append called on arguments of classes c("DNAStringSet", "list") returns a list. I would expect it to return an extended DNAStringSet.

Combining objects of mixed types will most of the time lead to
surprises. Things are much more predictable when the objects to
combine have the same type. For example, with 2 DNAStringSet objects:

   > append(DNAStringSet(), DNAStringSet(c("AA", "TGGG")))
     A DNAStringSet instance of length 2
       width seq
   [1]     2 AA
   [2]     4 TGGG

   > append(DNAStringSet(c("AA", "TGGG")), DNAStringSet())
     A DNAStringSet instance of length 2
       width seq
   [1]     2 AA
   [2]     4 TGGG

But there is no obvious/natural thing to do when combining a
DNAStringSet object with a list. I would argue that in that case
append() should raise an error, but that's not how R tends to handle
things in general.
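A minimal base-R sketch of why Philip's call returned three elements. It uses only base R, and `strict_append()` is a hypothetical helper, not part of base or Biostrings. With a zero-length `x`, `after = length(x)` is 0, so `base::append()` takes the `c(values, x)` branch. Because `values` is a list, `c()` builds a list: a zero-length atomic vector contributes no elements, while a non-vector object such as an environment (or, in Philip's case, the S4 DNAStringSet) is kept as one extra element. The helper sketches, under that assumption, the stricter behavior argued for above by refusing to combine objects of different classes.

```r
# With length(x) == 0, 'after' defaults to 0, so base::append() evaluates
# c(values, x). Since 'values' is a list, c() returns a list.
res_num <- append(numeric(), list(1))  # numeric() adds no elements
res_env <- append(new.env(), list(1))  # the environment becomes element 2

length(res_num)               # 1
length(res_env)               # 2
is.environment(res_env[[2]])  # TRUE

# Hypothetical stricter variant (not a base or Biostrings function):
# refuse to combine objects whose classes differ instead of silently
# falling back to a list.
strict_append <- function(x, values, after = length(x)) {
    if (!identical(class(x), class(values)))
        stop("cannot append an object of class '", class(values)[1L],
             "' to an object of class '", class(x)[1L], "'")
    append(x, values, after = after)
}

strict_append(c(1, 2), c(3, 4))         # same class: returns 1 2 3 4
try(strict_append(numeric(), list(1)))  # classes differ: raises an error
```

Checking `class()` with `identical()` is deliberately blunt: it would also reject combinations that are arguably safe (for example an integer and a double vector), which is the trade-off of failing loudly instead of guessing.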

Cheers,
H.

>
> Thanks, Martin!
>
> 	Philip
>
> P.S.:
>
>> is that '[[3A2]]' in your output correct? It suggests some kind of
>> memory corruption (in R?) but I can't reproduce it.
>
> It's not caused by R. It must have happened in the editor, so nothing to worry about :-)
>
>
>> Martin
>
>>>
>>>
>>> Regards,
>>>
>>> 	Philip
>>>
>>> P.S.:
>>>
>>>
>>>> sessionInfo()
>>> R version 2.11.1 (2010-05-31)
>>> x86_64-pc-linux-gnu
>>>
>>> locale:
>>>   [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
>>>   [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
>>>   [5] LC_MONETARY=C              LC_MESSAGES=de_DE.UTF-8
>>>   [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C
>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] GenomicRanges_1.0.7 Biostrings_2.16.9   IRanges_1.6.6
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.8.0   BSgenome_1.16.2
>>>
>>>
>
>
>> --
>> Martin Morgan
>> Computational Biology / Fred Hutchinson Cancer Research Center
>> 1100 Fairview Ave. N.
>> PO Box 19024 Seattle, WA 98109
>
>> Location: Arnold Building M1 B861
>> Phone: (206) 667-2793
>
>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319
