[Bioc-sig-seq] complement and reverseComplement from a character vector

Hervé Pagès hpages at fhcrc.org
Tue Jun 2 20:36:14 CEST 2009


Hi Nicolas,

Nicolas Delhomme wrote:
> Hi all,
> 
> As described in the doc for reverse, complement and reverseComplement, x 
> can be a character vector.
> 
> When I tried, it turned out that no character methods are implemented for 
> complement and reverseComplement.
> 
> They are of some use to me, so I just wrote them up:
> 
> setMethod("complement", "character",
>     function(x, ...) {
>         if (length(x) == 1) x <- DNAString(x)
>         else x <- DNAStringSet(x)
>         complement(x)
>     }
> )
> 
> setMethod("reverseComplement", "character",
>     function(x, ...) {
>         if (length(x) == 1) x <- DNAString(x)
>         else x <- DNAStringSet(x)
>         reverseComplement(x)
>     }
> )
> 
> I am just posting them in case they are of use to someone else. I 
> recognize that this does not save much compared to first converting the 
> character vector into a DNAString or DNAStringSet. At least, for me, it 
> allows me to skip some "input" evaluation test checking whether I start 
> with a character vector or a DNAString.

Thanks for the feedback!

The reason these methods were not defined is that when the input is character
it's not clear whether it should be treated as DNA or RNA input. However,
choosing to treat it as DNA is probably what the user will want 99% of the
time, so they could indeed be implemented as in your code above. This means
that if the input contains the letter "U", they will simply fail (without
trying to be smart).
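
To illustrate that failure mode, here is a minimal sketch, assuming only that
the Biostrings package is installed. It uses the DNAString constructor
directly instead of defining any new methods, and the variable names are
just for illustration:

```r
library(Biostrings)

# A plain DNA string is accepted by the DNAString constructor.
dna <- DNAString("ACGT")
reverseComplement(dna)

# A string containing "U" is not valid DNA: the constructor raises an
# error instead of guessing that the input might be RNA.
err <- tryCatch(DNAString("ACGU"),
                error = function(e) conditionMessage(e))
is.character(err)  # TRUE when the constructor raised an error
```

A user who really has RNA input can still convert it explicitly with
RNAString() or RNAStringSet() before calling complement().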

Note that there is no method for BString/BStringSet objects for exactly the
same reason.

One question remains, though: should these methods return a
DNAString/DNAStringSet object, or should the result be turned back into an
ordinary character vector before it's returned to the user? The latter would
make these methods "endomorphisms" (i.e. the output has the same type as the
input), which is more consistent with what the other methods do. That said,
I'm not against making an exception when the input is character (not a big
deal as long as this is clearly documented). If the user really wants the
result to be character, s/he can always apply as.character() to it.
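
The two options can be compared with a small sketch. Note that revcomp_chr()
and its as_character argument are hypothetical names used only for this
example; they are not part of Biostrings, and the helper always goes through
DNAStringSet for simplicity:

```r
library(Biostrings)

# Hypothetical helper (not part of Biostrings): treats character input
# as DNA and lets the caller choose the return type.
revcomp_chr <- function(x, as_character = FALSE) {
    stopifnot(is.character(x))
    ans <- reverseComplement(DNAStringSet(x))
    # as_character = TRUE gives the "endomorphism" behaviour
    # (character in, character out).
    if (as_character) as.character(ans) else ans
}

revcomp_chr(c("AACG", "TTGA"))                       # a DNAStringSet
revcomp_chr(c("AACG", "TTGA"), as_character = TRUE)  # "CGTT" "TCAA"
```

Returning the DNAStringSet by default keeps the result usable by other
Biostrings functions, at the cost of one extra as.character() call for users
who want plain strings.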

If there are no objections, I will add these methods to the Biostrings package.

Cheers,
H.

> 
> Best,
> 
> ---------------------------------------------------------------
> Nicolas Delhomme
> 
> High Throughput Functional Genomics Center
> 
> European Molecular Biology Laboratory
> 
> Tel: +49 6221 387 8426
> Email: nicolas.delhomme at embl.de
> Meyerhofstrasse 1 - Postfach 10.2209
> 69102 Heidelberg, Germany
> 
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


