[Bioc-devel] AAString - Amino acid code enforced?
Hervé Pagès
hpages at fredhutch.org
Sat Apr 14 08:33:32 CEST 2018
Hi Felix,
Please see my answer in the issue you opened on GitHub:
https://github.com/Bioconductor/Biostrings/issues/10
Cheers,
H.
On 04/02/2018 06:07 AM, Felix Ernst wrote:
> Dear all,
>
> probably this is for Hervé Pagès:
>
> I tried the following code, which, according to ?AAString, should not work, since ÜÄÖ are not part of any AA code.
>
>> AAString("ÜÄÖ")
> 3-letter "AAString" instance
> seq: ÜÄÖ
>> sessionInfo()
> R version 3.4.4 (2018-03-15)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows >= 8 x64 (build 9200)
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
> [5] LC_TIME=German_Germany.1252
>
> attached base packages:
> [1] stats4 parallel stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Biostrings_2.46.0 XVector_0.18.0 IRanges_2.12.0 S4Vectors_0.16.0 BiocGenerics_0.24.0
>
> loaded via a namespace (and not attached):
> [1] zlibbioc_1.24.0 compiler_3.4.4 tools_3.4.4 yaml_2.1.18
>
>
> I don't have access to the devel version of Biostrings right now, but I checked out the current code in the GitHub repo and its recent changes. I am pretty sure that this behavior is also present in the current devel branch. Can someone confirm this?
>
> My current interest is in using the XString classes and methods for an additional biological string representation. My initial question was how to restrict this to a certain character set if the characters are not stored byte encoded. Byte encoding is not available to me, since characters like '«' or '=' result in a two-byte code when passed to the charToRaw function. This trips up the build of the internal lookup tables, which are passed down to the C backend.
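The byte-length issue described in the paragraph above can be reproduced in base R. This is only a minimal sketch: the two-letter `alphabet` is a hypothetical example, and the strings are explicitly converted to UTF-8 so the result does not depend on the session locale.

```r
# Hypothetical example alphabet: one ASCII letter and one non-ASCII letter.
# "\u00dc" is the letter U with umlaut; enc2utf8() forces UTF-8 encoding.
alphabet <- enc2utf8(c("A", "\u00dc"))

# Number of bytes per letter when the strings are UTF-8 encoded.
utf8_bytes <- lapply(alphabet, charToRaw)
lengths(utf8_bytes)

# The same letters converted to Latin-1 occupy a single byte each.
latin1_bytes <- lapply(iconv(alphabet, from = "UTF-8", to = "latin1"), charToRaw)
lengths(latin1_bytes)
```

Under these assumptions the non-ASCII letter takes two bytes in UTF-8 but one byte in Latin-1, so whether a letter fits a one-byte lookup table depends on the encoding of the string, not only on the letter itself.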
>
> Therefore I looked into how an AAString differs from a BString in this respect, and discovered that it currently doesn't. I also looked into the current 2.47.12 repo, which, as far as I can tell, does not use the AMINO_ACID_CODE constant in the creation of an AAString object.
>
> So my questions are:
> - What is the best practice for extending a class from XString with a restricted character set, which is not byte encoded?
> - Is there a way to use byte encoding for chars with two or more bytes?
>
> Thanks in advance for any help and suggestions.
>
> Best regards,
> Felix
>
> PS: Regarding the second question: one could change "as.integer(charToRaw(paste(letters, collapse="")))" to "lapply(lapply(letters, charToRaw), as.integer)" in .letterAsByteVal, but the result would no longer be an atomic vector, which I think is required for it to be accepted by the C backend. I didn't test it.
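The difference between the two expressions in the PS can be checked without Biostrings. The sketch below applies both to a hypothetical two-letter `alphabet` (forced to UTF-8); it only illustrates the shape of the results and makes no claim about what the C backend actually accepts.

```r
# Hypothetical alphabet with one multi-byte letter, forced to UTF-8.
alphabet <- enc2utf8(c("A", "\u00dc"))

# Original form: one flat integer vector, but with more entries than letters.
flat <- as.integer(charToRaw(paste(alphabet, collapse = "")))
is.atomic(flat)
length(flat)

# Proposed form: one list element per letter, each holding that letter's bytes.
per_letter <- lapply(lapply(alphabet, charToRaw), as.integer)
is.atomic(per_letter)
lengths(per_letter)
```

The flat form stays atomic but loses the one-value-per-letter correspondence, while the list form keeps the correspondence but is no longer atomic.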
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319