[R] parsing strings between [ ] in columns

Barry Rowlingson b.rowlingson at lancaster.ac.uk
Thu Feb 18 09:41:11 CET 2010


On Thu, Feb 18, 2010 at 8:29 AM, milton ruser <milton.ruser at gmail.com> wrote:
> Dear all,
>
> I have a data.frame with a column like the x shown below
> myDF<-data.frame(cbind(x=c("[[1, 0, 0], [0, 1]]",
>   "[[1, 1, 0], [0, 1]]","[[1, 0, 0], [1, 1]]",
>   "[[0, 0, 1], [0, 1]]")))
>> myDF
>                    x
> 1 [[1, 0, 0], [0, 1]]
> 2 [[1, 1, 0], [0, 1]]
> 3 [[1, 0, 0], [1, 1]]
> 4 [[0, 0, 1], [0, 1]]
>
> As you can see my x column is composed of some
> strings between [[]], using commas to separate
> some "fields".
>
> I need to identify the numbers of
> groups inside the main [ ] and call each
> group with different sequential string.
> On the example above I would like to have:
>
>  A         B
> 1 [1, 0, 0] [0, 1]
> 2 [1, 1, 0] [0, 1]
> 3 [1, 0, 0] [1, 1]
> 4 [0, 0, 1] [0, 1]
> Although here I have only two groups, my
> real dataset will have many more (~30).
> After identifying the groups I would like
> to identify the subgroups:
>  A1 A2 A3  B1 B2
> 1 1  0  0   0  1
> 2 1  1  0   0  1
> 3 1  0  0   1  1
> 4 0  0  1   0  1
>
> Any hints are welcome.
>


This looks like the same syntax as JSON, so you might be able to use
the fromJSON function from the rjson package:

> x="[[1, 0, 0], [0, 1]]"
> library(rjson)
> fromJSON(x)
[[1]]
[1] 1 0 0

[[2]]
[1] 0 1

> unlist(fromJSON(x))
[1] 1 0 0 0 1

So just apply that to each row of your data frame and collect the
results in a new data frame. The plyr package may help.
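
A minimal sketch of that apply-and-collect step, using base R's lapply
and rbind rather than plyr. It assumes rjson is installed and that
every row has the same group structure; the names parsed and flat are
made up for illustration.

```r
library(rjson)  # assumption: the rjson package is installed

myDF <- data.frame(
  x = c("[[1, 0, 0], [0, 1]]", "[[1, 1, 0], [0, 1]]",
        "[[1, 0, 0], [1, 1]]", "[[0, 0, 1], [0, 1]]"),
  stringsAsFactors = FALSE
)

# Parse each row's string and flatten the nested list to one numeric vector
parsed <- lapply(as.character(myDF$x), function(s) unlist(fromJSON(s)))

# Stack the vectors row-wise; this only works if all rows have the same length
flat <- as.data.frame(do.call(rbind, parsed))
flat
```

The columns come out with default names (V1, V2, ...), so they still
need renaming afterwards.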

All the rows of your data frame have to have the same structure, so
you only need to parse the first one to work out your naming system.
In this case you can get it from the length of the list and the
lengths of its elements:

> l = fromJSON(x)
> unlist(lapply(l,length))
[1] 3 2

So you want A1 to A3 and B1 to B2. I'm not sure what you want when
you get to the 27th group and the letters run out. You can generate
these names with a bit of rep and paste; it's a bit early in the day
for me to get my head round that at the moment.
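
One possible way to build those names with rep and paste, as a sketch
only. The group sizes are typed in by hand here to match the 3 and 2
found above, and the single-letter labelling is an assumption that
covers at most 26 groups, so the 27th-group question stays open.

```r
# Group sizes, as returned by unlist(lapply(l, length)) for the example string
sizes <- c(3, 2)

# Assumed labelling scheme: one capital letter per group (A..Z only)
stopifnot(length(sizes) <= length(LETTERS))
groups <- rep(LETTERS[seq_along(sizes)], times = sizes)

# sequence() counts 1..n within each group, giving 1 2 3 1 2 here
col_names <- paste(groups, sequence(sizes), sep = "")
col_names
# [1] "A1" "A2" "A3" "B1" "B2"
```

The result can be assigned with names() to a data frame whose columns
are in the same flattened order.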

But rjson will parse and split up your grouped numbers anyway. There
are probably other solutions using strsplit, sub and gsub.
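
For what it's worth, here is a rough base-R sketch of that kind of
regex approach, with no extra packages. It assumes exactly one level
of nesting and purely numeric entries, as in the example; the variable
names are illustrative.

```r
x <- "[[1, 0, 0], [0, 1]]"

# Strip the outer pair of brackets, leaving "[1, 0, 0], [0, 1]"
inner <- sub("^\\[(.*)\\]$", "\\1", x)

# Extract each bracketed group as its own string
groups <- regmatches(inner, gregexpr("\\[[^]]*\\]", inner))[[1]]

# Remove the brackets, split on commas and convert to numbers
# (as.numeric tolerates the leading spaces left by the split)
values <- lapply(strsplit(gsub("\\[|\\]", "", groups), ","), as.numeric)
values
```

This gives the same list of numeric vectors as fromJSON(x) does for
this input, but it is more fragile if the format ever changes.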

Barry

-- 
blog: http://geospaced.blogspot.com/
web: http://www.maths.lancs.ac.uk/~rowlings
web: http://www.rowlingson.com/
twitter: http://twitter.com/geospacedman
pics: http://www.flickr.com/photos/spacedman


