[R] extract fixed width fields from a string

Bert Gunter gunter.berton at gene.com
Fri Jan 20 20:06:31 CET 2012


Sam:

On Fri, Jan 20, 2012 at 10:52 AM, Sam Steingold <sds at gnu.org> wrote:
> Hi,
> I have a data frame with one column containing strings of the form "ABC...|XYZ..."
> where ABC etc are fields of 6 alphanumeric characters each
> and XYZ etc are fields of 8 alphanumeric characters each;
> "|" is a mandatory separator;
> I do not know in advance how many fields of each kind each row will contain.
> I need to extract these fields from the string.
>
> === How do I do that?
>
> first I need to split the string in 2 on '|' - how?
?strsplit
strsplit(thecolumn, "|", fixed = TRUE)
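
As a small illustration of the call above (the sample strings and the names x, halves, left and right are made up for this sketch): fixed = TRUE matters because "|" is otherwise read as a regular-expression alternation rather than a literal character. If the column is a factor, it would need as.character() first.

```r
# Hypothetical sample data: 6-character fields, "|", then 8-character fields
x <- c("ABCDEF123456|ABCDEFGH", "ABCDEF|ABCDEFGH12345678")

# One list element per row, each holding the part before and after "|"
halves <- strsplit(x, "|", fixed = TRUE)

# Pull out the first and second part of every row as plain character vectors
left  <- vapply(halves, `[`, character(1), 1)
right <- vapply(halves, `[`, character(1), 2)
```

A row with nothing after the separator would yield NA in right, so that case may need separate handling.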

> then I need to split the two strings by 6/8 characters -- how?
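If the fields within each half have no delimiter between them, strsplit on "|" alone will not separate them; see ?substring for cutting a string at fixed widths.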
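
If the fields inside each half run together with no delimiter, one possible approach is to cut at fixed offsets with substring(). The helper name split_fixed below is hypothetical, and the sketch assumes each string's length is a multiple of the field width.

```r
# Hypothetical helper: cut one string into consecutive pieces of `width` characters
split_fixed <- function(s, width) {
  if (is.na(s) || !nzchar(s)) return(character(0))
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}

split_fixed("ABCDEF123456", 6)       # "ABCDEF" "123456"
split_fixed("ABCDEFGH12345678", 8)   # "ABCDEFGH" "12345678"
```

Applied to a whole column, something like lapply(left, split_fixed, width = 6) would return one character vector of fields per row.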

> then I need to convert each 6/8 character string into an integer base 36
> or 64 (depending on the field) - how?
No clue. Depends on the encoding AFAICS.
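
For the base-36 case only, here is a minimal sketch under the assumption that the digits are 0-9 followed by A-Z, case-insensitive (the post does not state the alphabet). Base R's strtoi(x, base = 36L) handles the same alphabet, but it returns integers, and a 6-character base-36 value can exceed the integer range, so this hypothetical helper base36_to_num accumulates in doubles instead. Base 64 is left out because the digit alphabet is unknown.

```r
# Hypothetical helper: convert one base-36 string to a number (double)
# Assumed alphabet: "0".."9" then "A".."Z"
base36_to_num <- function(s) {
  digits <- c(0:9, LETTERS)                       # coerced to character
  chars  <- strsplit(toupper(s), "", fixed = TRUE)[[1]]
  vals   <- match(chars, digits) - 1              # digit values 0..35
  if (any(is.na(vals))) return(NA_real_)          # character outside the alphabet
  sum(vals * 36^(rev(seq_along(vals)) - 1))       # positional weights
}

base36_to_num("00000Z")   # 35
base36_to_num("000010")   # 36
```

Doubles represent integers exactly up to 2^53, which comfortably covers 6 base-36 digits.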

-- Bert

>
> === What do I do with them once I extract them?
>
> First thing I want to do is to have a count table of them.
> Then I thought of adding an extra column for each field value and
> putting 0/1 there, e.g., frame
> 1,AB
> 2,BCD
> will turn into
> 1,1,1,0,0
> 2,0,1,1,1
> however this would work only if the number of different field values is
> manageable.
> What do people do?
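
One way to get both the count table and the 0/1 columns described above, sketched with the two-row example from the question (single-letter fields A, B, C, D; the names fields, counts, lev and ind are made up for illustration):

```r
# One character vector of extracted fields per row (hypothetical data)
fields <- list(c("A", "B"), c("B", "C", "D"))

# Count table over all rows
counts <- table(unlist(fields))

# 0/1 indicator matrix: one row per data row, one column per distinct field
lev <- sort(unique(unlist(fields)))
ind <- t(vapply(fields, function(f) as.integer(lev %in% f), integer(length(lev))))
colnames(ind) <- lev
```

Here ind has the rows 1,1,0,0 and 0,1,1,1, matching the layout in the question. As noted there, the matrix grows with the number of distinct field values, so this is only practical when that number is manageable.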
> Can I have a column of "sets" in a data frame?
> Does R support the "set" data type?
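
Base R has no dedicated set type as far as I know, but union(), intersect(), setdiff() and %in% treat ordinary vectors as sets, and a data frame column can be a list holding one vector per row. A minimal sketch with made-up data:

```r
# Hypothetical data frame with a list column of "sets"
df <- data.frame(id = 1:2)
df$fields <- list(c("A", "B"), c("B", "C", "D"))

# Set operations on two rows
common <- intersect(df$fields[[1]], df$fields[[2]])   # "B"
either <- union(df$fields[[1]], df$fields[[2]])       # "A" "B" "C" "D"

# Which rows contain field "C"?
has_c <- vapply(df$fields, function(f) "C" %in% f, logical(1))
```

List columns do not print or export as neatly as atomic columns, so whether this is preferable to the 0/1 layout depends on what is done with the data afterwards.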
>
> Thanks!
>
> PS. thanks to Sarah Goslee who answered my previous question in so much detail!
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
> http://camera.org http://openvotingconsortium.org http://iris.org.il
> http://mideasttruth.com http://memri.org http://honestreporting.com
> Don't take life too seriously, you'll never get out of it alive!



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


