[Bioc-devel] AAString - Amino acid code enforced?
hpages at fredhutch.org
Sat Apr 14 08:33:32 CEST 2018
Please see my answer in the issue you opened on GitHub:
On 04/02/2018 06:07 AM, Felix Ernst wrote:
> Dear all,
> probably this is for Hervé Pagès:
> I tried the following code, which according to ?AAString should not work, since ÜÖÄ are not part of any amino acid code.
> 3-letter "AAString" instance
> seq: ÜÄÖ
> R version 3.4.4 (2018-03-15)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows >= 8 x64 (build 9200)
> Matrix products: default
>  LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
>  LC_TIME=German_Germany.1252
> attached base packages:
>  stats4 parallel stats graphics grDevices utils datasets methods base
> other attached packages:
>  Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0 S4Vectors_0.16.0 BiocGenerics_0.24.0
> loaded via a namespace (and not attached):
>  zlibbioc_1.24.0 compiler_3.4.4 tools_3.4.4 yaml_2.1.18
> I don't have access to the devel version of Biostrings right now, but I checked out the current code in the GitHub repo and its recent changes. I am pretty sure that this behavior is also present in the current devel branch. Can someone confirm this?
> My current interest is in using the XString classes and methods for an additional biological string representation. The initial question was: how can I restrict this to a certain character set if the characters are not stored byte encoded? Byte encoding is not available to me, since characters like '«' or '=' result in a two-byte code when passed to the charToRaw function. This trips up the build of the internal lookup table, which is passed down to the C backend.
> I therefore looked into how an AAString differs from a BString in this respect and discovered that it currently doesn't. I also looked into the current 2.47.12 repo, which, as far as I can tell, does not use the AMINO_ACID_CODE constant when creating an AAString object.
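The byte-count issue described above can be reproduced in base R alone. The following is only a minimal sketch: it assumes the string is stored as UTF-8 (forced here with enc2utf8(), since in a Latin-1 locale the same character may occupy a single byte), and the helper name is_single_byte is hypothetical, not part of Biostrings.

```r
# Base R only; no Biostrings functions are called here.
ascii_char <- "A"
utf8_char  <- enc2utf8("\u00ab")  # left-pointing guillemet, forced to UTF-8

# charToRaw() returns one raw value per byte, not per character
bytes_ascii <- as.integer(charToRaw(ascii_char))
bytes_utf8  <- as.integer(charToRaw(utf8_char))

print(bytes_ascii)  # one byte value
print(bytes_utf8)   # two byte values under UTF-8

# Hypothetical helper: TRUE only if every letter of the alphabet
# maps to exactly one byte when encoded as UTF-8
is_single_byte <- function(alphabet) {
  all(vapply(enc2utf8(alphabet),
             function(x) length(charToRaw(x)) == 1L,
             logical(1)))
}

print(is_single_byte(c("A", "C", "G")))
print(is_single_byte(c("A", utf8_char)))
```

A check like this could be run on a candidate alphabet before attempting to build a one-byte-per-letter lookup table.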
> So my questions are:
> - What is the best practice for extending a class from XString with a restricted character set, which is not byte encoded?
> - Is there a way to use byte encoding for chars with two or more bytes?
> Thanks in advance for any help and suggestions.
> Best regards,
> PS: Regarding the second question: one could change as.integer(charToRaw(paste(letters, collapse=""))) to lapply(lapply(letters,charToRaw),as.integer) in .letterAsByteVal, but the result would then no longer be atomic, which I think is required for it to be accepted by the C backend. I didn't test it.
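The difference between the two expressions in the PS can be illustrated in base R. This is a sketch under assumptions: the variable alphabet is a hypothetical stand-in for the letters argument, the non-ASCII letter is forced to UTF-8, and nothing here touches .letterAsByteVal or the C backend itself.

```r
# Hypothetical two-letter alphabet: one ASCII letter, one two-byte UTF-8 letter
alphabet <- c("A", enc2utf8("\u00ab"))

# Original form: one flat integer vector, but letter boundaries are lost
flat <- as.integer(charToRaw(enc2utf8(paste(alphabet, collapse = ""))))

# Proposed form: one integer vector per letter, wrapped in a list
per_letter <- lapply(lapply(alphabet, charToRaw), as.integer)

print(is.atomic(flat))        # atomic vector
print(is.atomic(per_letter))  # a list is not atomic
print(length(flat))           # total number of bytes
print(lengths(per_letter))    # number of bytes per letter
```

The flat vector has more elements than there are letters, so it can no longer be used as a one-to-one letter-to-byte mapping, while the list keeps the mapping but changes the type.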
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319