[R] extract fixed width fields from a string

jim holtman jholtman at gmail.com
Fri Jan 20 21:55:39 CET 2012


Here is part of it.  This converts a base 36 string to a number and is
case insensitive.  It works by mapping the letters to the characters
that start just after '9' in ASCII, so that subtracting 48 (the code of
'0') from each character code gives the digit's value.
You can extend it to base 64 using the same approach.


> base36ToInteger <- function (Str)
+ {
+     common <- chartr(
+         "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # input
+       , ":;<=>?@ABCDEFGHIJKLMNOPQRS:;<=>?@ABCDEFGHIJKLMNOPQRS"  # 'magic' translation
+       , Str
+       )
+     x <- as.numeric(charToRaw(common)) - 48
+     sum(x * 36 ^ rev(seq(length(x)) - 1))
+ }
> base36ToInteger('1')
[1] 1
> base36ToInteger('12')
[1] 38
> base36ToInteger('123')
[1] 1371
> base36ToInteger('1234')
[1] 49360
> base36ToInteger('12345')
[1] 1776965
> base36ToInteger('123456')
[1] 63970746
>
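
A minimal sketch of the base 64 extension mentioned above, assuming the
digit order 0-9a-zA-Z-_ given in the quoted message below.  It swaps the
chartr() trick for a match() lookup against the alphabet, because
chartr() accepts range specifications and a literal "-" in the alphabet
could be read as one.  The names toInteger and digits64 are made up for
illustration.

```r
# Hypothetical helper: convert a string of digits to a number, given the
# alphabet as a character vector ordered by digit value (first entry is 0).
toInteger <- function(Str, alphabet) {
    chars <- strsplit(Str, "")[[1]]
    x <- match(chars, alphabet) - 1          # digit values; NA if not in alphabet
    if (any(is.na(x))) stop("invalid digit in '", Str, "'")
    sum(x * length(alphabet) ^ rev(seq_along(x) - 1))
}

# Assumed base 64 alphabet, in the order 0-9a-zA-Z-_ (case sensitive).
digits64 <- c(0:9, letters, LETTERS, "-", "_")

toInteger("a", digits64)    # 'a' is the 11th digit, so its value is 10
toInteger("10", digits64)   # 1 * 64 + 0
```

Passing c(0:9, letters) as the alphabet together with tolower(Str)
should give the same case-insensitive base 36 behaviour as the function
above.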


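For the rest of the problem, here is one possible way to pull the
fixed-width fields out of the example string, assuming the layout
described in the quoted message below: 9-character records made of a
6-character base 36 id, a 2-character decimal count and a 1-character
base 64 offset.  It uses substring() for the slicing and strtoi() for
base 36.  parseRecord and its column names are hypothetical, and the
sketch assumes a non-empty payload whose length is a multiple of 9.

```r
# Hypothetical helper: split one "timestamp|payload" line into a data frame
# with one row per 9-character record.
parseRecord <- function(line) {
    parts <- strsplit(line, "|", fixed = TRUE)[[1]]
    timestamp <- as.numeric(parts[1])        # decimal number before "|"
    payload <- parts[2]

    # Start position of each 9-character record.
    starts <- seq(1, nchar(payload), by = 9)
    chunks <- substring(payload, starts, starts + 8)

    # Assumed base 64 digit order: 0-9a-zA-Z-_ (case sensitive).
    digits64 <- c(0:9, letters, LETTERS, "-", "_")

    data.frame(
        timestamp = timestamp,
        id     = strtoi(substring(chunks, 1, 6), base = 36L),   # base 36
        count  = as.integer(substring(chunks, 7, 8)),           # base 10
        offset = match(substring(chunks, 9, 9), digits64) - 1L, # base 64 digit
        stringsAsFactors = FALSE
    )
}

parseRecord("1288915200|00000704000000905a00000A118")
```

For the example string this gives ids 7, 9, 10, counts 4, 5, 11 and
offsets 0, 10, 8, which matches the values listed in the question.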

On Fri, Jan 20, 2012 at 3:25 PM, Sam Steingold <sds at gnu.org> wrote:
> On Fri, Jan 20, 2012 at 14:05, Sarah Goslee <sarah.goslee at gmail.com> wrote:
>> Reproducible example, please. This doesn't make a whole lot of sense
>> otherwise.
>
> here is the string:
> "1288915200|00000704000000905a00000A118"
>
> I want the following data extracted from it:
> 1. the decimal number before "|": 1288915200
> 2. the string after "|" split into 3 parts, each of length 9 bytes,
> and then split into 3 more parts:
> id: the first 6 bytes, int, base 36;
> count: the next 2 bytes, int, base 10;
> offset: the last 1 byte, int, base 64 (0-9a-zA-Z-_)
> i.e., the above line is:
> id=7; count=4; offset=0
> id=9; count=5; offset=10
> id=10; count=11; offset=8
>
> thanks.
>
>> On Fri, Jan 20, 2012 at 1:52 PM, Sam Steingold <sds at gnu.org> wrote:
>>> Hi,
>>> I have a data frame with one column containing string of the form "ABC...|XYZ..."
>>> where ABC etc are fields of 6 alphanumeric characters each
>>> and XYZ etc are fields of 8 alphanumeric characters each;
>>> "|" is a mandatory separator;
>>> I do not know in advance how many fields of each kind will each row contain.
>>> I need to extract these fields from the string.
>>
>> This is already a data frame, so you don't need to import it into R,
>> just process it?
>
> yes.
>
>> I don't know. Save them as a list, most likely.
>
> can a column contain lists?
>
>>> First thing I want to do is to have a count table of them.
>>> Then I thought of adding an extra column for each field value and
>>> putting 0/1 there, e.g., frame
>>> 1,AB
>>> 2,BCD
>>
>> I thought we had integers at this point?
>
> yes, A..D are placeholders for integers
>
>>> What do people do?
>>> Can I have a column of "sets" in a data frame?
>>> Does R support the "set" data type?
>>
>> factor() seems to be what you're looking for.
>
> no, a column of factors will contain a single factor item in each row.
> e.g.:
> 1 A
> 2 B
> 3 A
> 4 C
> I want each row to contain a set of factor items:
> 1 A&B
> 2 A
> 3 C
> 4 <void>
>
>
> --
> Sam Steingold <http://sds.podval.org>



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


