[Bioc-devel] AAString - Amino acid code enforced?

Hervé Pagès hpages at fredhutch.org
Sat Apr 14 08:33:32 CEST 2018


Hi Felix,

Please see my answer in the issue you opened on GitHub:

   https://github.com/Bioconductor/Biostrings/issues/10

Cheers,
H.


On 04/02/2018 06:07 AM, Felix Ernst wrote:
> Dear all,
> 
> This is probably a question for Hervé Pagès:
> 
> I tried the following code, which according to ?AAString should not work, since Ü, Ö and Ä are not part of any amino acid code:
> 
>> AAString("ÜÄÖ")
>    3-letter "AAString" instance
> seq: ÜÄÖ
>> sessionInfo()
> R version 3.4.4 (2018-03-15)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows >= 8 x64 (build 9200)
> 
> Matrix products: default
> 
> locale:
> [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
> 
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] Biostrings_2.46.0   XVector_0.18.0      IRanges_2.12.0      S4Vectors_0.16.0    BiocGenerics_0.24.0
> 
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.24.0 compiler_3.4.4  tools_3.4.4     yaml_2.1.18
> 
> 
> I don't have access to the devel version of Biostrings right now, but I checked out the current code in the GitHub repo and looked at its recent changes. I am fairly sure that this behavior is also present in the current devel branch. Can someone confirm this?
> 
> My current interest is in using the XString classes and methods for an additional biological string representation. My initial question was: how can I restrict it to a certain character set if the characters are not stored byte-encoded? Byte encoding is not an option for me, since characters like '«' or '=' result in a two-byte code with charToRaw(). This trips up the construction of the internal lookup table, which is passed down to the C backend.
> 
> Therefore I looked into how an AAString differs from a BString in this respect, and discovered that it currently doesn't: no restriction is applied. I also looked at the current 2.47.12 code in the repo, which, as far as I can tell, does not use the AMINO_ACID_CODE constant when creating an AAString object.
> 
> So my questions are:
> - What is the best practice for extending XString with a class that has a restricted character set that is not byte-encoded?
> - Is there a way to use byte encoding for characters that take two or more bytes?
> 
>   Thanks in advance for any help and suggestions.
> 
> Best regards,
> Felix
> 
> PS: Regarding the second question: one could change as.integer(charToRaw(paste(letters, collapse=""))) to lapply(lapply(letters, charToRaw), as.integer) in .letterAsByteVal, but then the result would no longer be an atomic vector, which I think the C backend requires. I didn't test it.
> 
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
> 
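A minimal base-R sketch of the two ideas raised in the quoted message, under stated assumptions: split_chars(), check_alphabet(), encode_alphabet() and decode_alphabet() are hypothetical helpers, not part of the Biostrings API, and nothing here shows how such data would be wired into the XString C backend. The sketch compares Unicode code points (via utf8ToInt()) rather than the bytes returned by charToRaw(), and encodes each permitted symbol as its position in the alphabet, so the encoded result stays an atomic raw vector even when a symbol takes two bytes in UTF-8.

```r
# Hypothetical helpers (NOT Biostrings API): validate a string against a
# restricted alphabet and encode it as one byte per character, even when
# some alphabet symbols take two or more bytes in UTF-8.

# Split a string into characters (not bytes) using Unicode code points.
split_chars <- function(x) {
  intToUtf8(utf8ToInt(enc2utf8(x)), multiple = TRUE)
}

# Stop with an error if any character of 'x' is not in 'alphabet'.
check_alphabet <- function(x, alphabet) {
  chars <- split_chars(x)
  bad <- setdiff(chars, enc2utf8(alphabet))
  if (length(bad) > 0L)
    stop("invalid character(s): ", paste(bad, collapse = " "))
  invisible(TRUE)
}

# Encode each character as its 1-based position in 'alphabet', stored as raw.
# One byte per character, so up to 255 symbols fit regardless of how many
# bytes each symbol occupies in UTF-8.
encode_alphabet <- function(x, alphabet) {
  stopifnot(length(alphabet) <= 255L)
  check_alphabet(x, alphabet)
  as.raw(match(split_chars(x), enc2utf8(alphabet)))
}

# Inverse of encode_alphabet(): map byte codes back to alphabet symbols.
decode_alphabet <- function(codes, alphabet) {
  paste(alphabet[as.integer(codes)], collapse = "")
}

# Usage: the 20 standard one-letter amino acid codes.
aa20 <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
check_alphabet("MKV", aa20)                          # passes silently
try(check_alphabet("\u00dc\u00c4\u00d6", aa20))      # error: Ü Ä Ö not allowed

# A toy alphabet containing a two-byte UTF-8 symbol ("\u00ab" is «).
sym <- c("A", "\u00ab", "=")
encode_alphabet("A\u00abA", sym)                     # [1] 01 02 01
```

The choice of position-in-alphabet codes is just one way to keep a single byte per symbol; any fixed symbol-to-byte table would serve the same purpose.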

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


