[R] extract fixed width fields from a string
Sam Steingold
sds at gnu.org
Fri Jan 20 19:52:36 CET 2012
Hi,
I have a data frame with one column containing strings of the form "ABC...|XYZ...",
where ABC etc. are fields of 6 alphanumeric characters each
and XYZ etc. are fields of 8 alphanumeric characters each;
"|" is a mandatory separator.
I do not know in advance how many fields of each kind each row will contain.
I need to extract these fields from the string.
=== How do I do that?
First I need to split the string in two on '|' - how?
Then I need to split each of the two strings into chunks of 6 or 8 characters - how?
Then I need to convert each 6- or 8-character string into an integer, base 36
or 64 (depending on the field) - how?
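For illustration, the three steps above might be sketched in base R as follows. This is only a sketch under assumptions: the example string, the helper names split_fixed and to_number, and the base-64 digit alphabet (0-9, A-Z, a-z, "+", "/") are all hypothetical, since the post does not say which alphabet the data uses.

```r
# Hypothetical example row: two 6-character fields, "|", two 8-character fields.
x <- "00000A00000Z|000000010000001A"

# Step 1: split on a literal "|" (fixed = TRUE because "|" is a regex metacharacter).
halves <- strsplit(x, "|", fixed = TRUE)[[1]]

# Step 2: cut a string into consecutive chunks of `width` characters.
split_fixed <- function(s, width) {
  if (is.na(s) || nchar(s) == 0) return(character(0))
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}
left  <- split_fixed(halves[1], 6)
right <- split_fixed(halves[2], 8)

# Step 3a: base 36 via strtoi(). It returns NA when the value does not fit
# in a 32-bit integer, which can happen for large 6-character base-36 values.
left_int <- strtoi(left, base = 36L)

# Step 3b: strtoi() stops at base 36, so base 64 needs a hand-written converter.
# ASSUMPTION: this digit alphabet is a guess, not something the data guarantees.
alphabet64 <- c(0:9, LETTERS, letters, "+", "/")
to_number <- function(s, alphabet) {
  digits <- match(strsplit(s, "")[[1]], alphabet) - 1
  powers <- rev(seq_along(digits)) - 1
  sum(digits * length(alphabet)^powers)  # a double, so 64^8 fits exactly
}
right_num <- vapply(right, to_number, numeric(1),
                    alphabet = alphabet64, USE.NAMES = FALSE)
```

The same to_number helper could be used for the base-36 fields (with a 36-character alphabet) if overflow in strtoi turns out to be a problem.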
=== What do I do with them once I extract them?
The first thing I want is a count table of them.
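A count table could be built roughly like this, assuming the fields have already been extracted into one character vector per row. The data frame, its column name s, and the helpers split_fixed and extract_fields are hypothetical names used only for this sketch.

```r
# Hypothetical data frame; the column name `s` is an assumption.
df <- data.frame(
  id = 1:2,
  s  = c("AAAAAABBBBBB|XXXXXXXX", "BBBBBB|XXXXXXXXYYYYYYYY"),
  stringsAsFactors = FALSE
)

# Cut a string into consecutive chunks of `width` characters.
split_fixed <- function(s, width) {
  if (is.na(s) || nchar(s) == 0) return(character(0))
  starts <- seq(1, nchar(s), by = width)
  substring(s, starts, starts + width - 1)
}

# All fields of one row: 6-character fields before "|", 8-character fields after.
extract_fields <- function(s) {
  halves <- strsplit(s, "|", fixed = TRUE)[[1]]
  c(split_fixed(halves[1], 6), split_fixed(halves[2], 8))
}

fields <- lapply(df$s, extract_fields)  # one character vector per row
counts <- table(unlist(fields))         # occurrences of each field value
```

sort(counts, decreasing = TRUE) would then list the most frequent field values first.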
Then I thought of adding an extra column for each field value and
putting 0/1 there, e.g., the frame
1,AB
2,BCD
will turn into
1,1,1,0,0
2,0,1,1,1
however this would work only if the number of different field values is
manageable.
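The 0/1 layout from the example above could be produced along these lines. It is a minimal sketch that uses the single-letter values A, B, C, D from the example rather than real 6- or 8-character fields, and the variable names are hypothetical.

```r
# The example from above: row 1 has fields A, B; row 2 has fields B, C, D.
ids    <- c(1, 2)
fields <- list(c("A", "B"), c("B", "C", "D"))

# One indicator column per distinct field value.
values <- sort(unique(unlist(fields)))
ind <- t(sapply(fields, function(f) as.integer(values %in% f)))
colnames(ind) <- values

result <- data.frame(id = ids, ind)
```

With two or more distinct values, as here, sapply() returns a matrix with one column per row of data, which is why it is transposed. The result has one column per distinct value, so it stays practical only while the number of distinct values is small.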
What do people do?
Can I have a column of "sets" in a data frame?
Does R support the "set" data type?
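As far as base R goes, there is no dedicated set type, but ordinary vectors can be treated as sets with union(), intersect(), setdiff() and %in%, and a data frame column can be a list holding one such vector per row. A small sketch, with a hypothetical column name fields:

```r
df <- data.frame(id = 1:2)

# A list column: each cell holds a vector that is treated as a set.
df$fields <- list(c("A", "B"), c("B", "C", "D"))

common   <- intersect(df$fields[[1]], df$fields[[2]])  # values in both rows
combined <- union(df$fields[[1]], df$fields[[2]])      # values in either row
only_2nd <- setdiff(df$fields[[2]], df$fields[[1]])    # values only in row 2
has_c    <- "C" %in% df$fields[[2]]                    # membership test
```

These functions do not enforce uniqueness on the stored vectors themselves, so unique() may be needed when the sets are built.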
Thanks!
PS. thanks to Sarah Goslee who answered my previous question in so much detail!
--
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://camera.org http://openvotingconsortium.org http://iris.org.il
http://mideasttruth.com http://memri.org http://honestreporting.com
Don't take life too seriously, you'll never get out of it alive!