[R] Parsing Complex Text in Single Cell

arun smartpink111 at yahoo.com
Thu Jan 30 18:19:03 CET 2014


Another way would be:
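library(qdap)     # bracketXtract(), genXtract()
library(stringr)  # str_trim()
x <- scan(what="character",....)   # arguments as in Rui's scan() call below

x1 <- c(x, x)                      # the input twice over (records duplicated)
x1 <- paste(x1, collapse=" ")
# text inside each {...}, with the double quotes removed
x2 <- gsub('"', "", bracketXtract(x1, "curly"))
# values sit between ":" and ","; the appended "," closes the last field
res2 <- as.data.frame(str_trim(do.call(rbind, genXtract(paste0(x2, ","), ":", ","))),
                      stringsAsFactors=FALSE)
res2[,1:3] <- lapply(res2[,1:3], as.numeric)   # trial, corr, resp_dur
# names sit between "," and ":"; the prepended "," opens the first field
colnames(res2) <- str_trim(genXtract(paste0(",", x2), ",", ":")[[1]])
row.names(res2) <- 1:nrow(res2)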

head(res2,3)
#  trial corr resp_dur  stim        cond
#1     1    1      799 ←←←←←   congruent
#2     2    1        0 xx→xx        nogo
#3     3    0       NA ←←→←← incongruent
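For readers without qdap, the same extraction can be sketched in base R, using regmatches() and gregexpr() in place of bracketXtract() and genXtract(). This is not A.K.'s code: the helper name parse_cell() is hypothetical, it assumes every record has the same keys in the same order (as in this thread), and the sample cell uses ASCII stand-ins for the arrow stimuli.

```r
# Hypothetical helper: a base-R stand-in for the qdap extraction above.
# Assumes every record has the same keys in the same order.
parse_cell <- function(txt) {
  # each {...} record, then strip the braces and double quotes
  recs <- regmatches(txt, gregexpr("\\{[^}]*\\}", txt))[[1]]
  recs <- gsub('[{}"]', "", recs)
  # split each record into "key:value" fields
  fields <- strsplit(recs, ",", fixed = TRUE)
  keys <- sub(":.*$", "", fields[[1]])
  # everything after the first ":" is the value; one row per record
  vals <- t(sapply(fields, function(f) sub("^[^:]*:", "", f)))
  res <- as.data.frame(vals, stringsAsFactors = FALSE)
  names(res) <- keys
  res[res == "null"] <- NA                  # JSON null -> NA, no warning
  res[1:3] <- lapply(res[1:3], as.numeric)  # trial, corr, resp_dur
  res
}

# Sample cell: three records from this thread, arrows replaced by ASCII
txt <- paste0('{"trial":1,"corr":1,"resp_dur":799,"stim":"<<<<<","cond":"congruent"},',
              '{"trial":2,"corr":1,"resp_dur":0,"stim":"xx>xx","cond":"nogo"},',
              '{"trial":3,"corr":0,"resp_dur":null,"stim":"<<><<","cond":"incongruent"}]')
res <- parse_cell(txt)
res
```

Mapping "null" to NA before as.numeric() means this version converts without coercion warnings.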

A.K.




On Thursday, January 30, 2014 6:37 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
Hello,

Maybe something like the following.


# quote = "" keeps scan() from treating the embedded " characters as quotes
x <- scan(what = "character", quote = "", text = '
{"trial":1,"corr":1,"resp_dur":799,"stim":"←←←←←","cond
":"congruent"},{"trial":2,"corr":1,"resp_dur":0,"stim":"xx→xx","cond":"
nogo"},{"trial":3,"corr":0,"resp_dur":null,"stim":"←←→←←
","cond":"incongruent"},{"trial":4,"corr":1,"resp_dur":528,"stim":"
→→→→→","cond":"congruent"},{"trial":5,"corr":0,"resp_dur":null,"
stim":"□□→□□","cond":"neutral"},{"trial":6,"corr":0,"resp_dur
":574,"stim":"→→←→→","cond":"incongruent"},{"trial":7,"corr":1,"
resp_dur":541,"stim":"□□→□□","cond":"neutral"},{"trial":8,"corr
":1,"resp_dur":500,"stim":"□□←□□","cond":"neutral"},{"trial":9,"
corr":1,"resp_dur":0,"stim":"xx→xx","cond":"nogo"},{"trial":10,"corr":0,"
resp_dur":637,"stim":"←←→←←","cond":"incongruent"}]')


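x <- paste(x, collapse = '')   # rejoin the pieces scan() split at line breaks
x <- gsub('"', '', x)          # drop the double quotes
x <- gsub('\\]', '', x)        # drop the closing bracket

y <- unlist(strsplit(x, "\\{"))   # one piece per record
y <- sub("\\}", "", y)
y <- y[y != ""]
y <- strsplit(y, ",")             # "key:value" fields of each record

fun <- function(x){
    y <- strsplit(x, ":")
    z <- lapply(y, '[[', 2)                      # the values
    z[1:3] <- lapply(z[1:3], as.numeric)         # trial, corr, resp_dur
    names(z) <- paste0("V", seq_along(z))        # same names in every row for rbind()
    as.data.frame(z, stringsAsFactors = FALSE)   # numeric columns stay numeric
}

res <- do.call(rbind, lapply(y, fun))
names(res) <- sapply(strsplit(y[[1]], ":"), '[[', 1)
res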
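If the exported CSV contains more than one such cell (for example one per row; the full file is not shown here, so this is an assumption), the steps above can be wrapped in a function and applied cell by cell. A minimal sketch: the helper parse_trials(), the column names id and gng, and the two sample cells (records from this thread with ASCII stand-ins for the arrows) are all hypothetical.

```r
# Hypothetical helper: the strsplit() steps above, wrapped for one cell.
parse_trials <- function(cell) {
  x <- gsub('"', '', cell, fixed = TRUE)       # drop double quotes
  x <- gsub(']', '', x, fixed = TRUE)          # drop the closing bracket
  y <- unlist(strsplit(x, "{", fixed = TRUE))  # one piece per record
  y <- sub("}", "", y, fixed = TRUE)
  y <- y[y != ""]
  y <- strsplit(y, ",", fixed = TRUE)          # "key:value" fields
  rows <- lapply(y, function(f) {
    kv <- strsplit(f, ":", fixed = TRUE)
    z <- lapply(kv, `[[`, 2)
    # 'null' becomes NA; the coercion warnings are silenced on purpose
    z[1:3] <- suppressWarnings(lapply(z[1:3], as.numeric))
    names(z) <- sapply(kv, `[[`, 1)
    as.data.frame(z, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Hypothetical stand-in for the CSV; column names 'id' and 'gng' are assumed
dat <- data.frame(
  id  = c("p01", "p02"),
  gng = c('{"trial":1,"corr":1,"resp_dur":799,"stim":"<<<<<","cond":"congruent"},{"trial":2,"corr":1,"resp_dur":0,"stim":"xx>xx","cond":"nogo"}]',
          '{"trial":3,"corr":0,"resp_dur":null,"stim":"<<><<","cond":"incongruent"},{"trial":4,"corr":1,"resp_dur":528,"stim":">>>>>","cond":"congruent"}]'),
  stringsAsFactors = FALSE
)

# parse each cell and stack the results, keeping the row's id
long <- do.call(rbind, lapply(seq_len(nrow(dat)), function(i)
  data.frame(id = dat$id[i], parse_trials(dat$gng[i]), stringsAsFactors = FALSE)))
long
```

Keeping the id next to every parsed trial gives one long table, which is usually easier to summarise than one data frame per cell.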


Note that the two warnings are harmless: they come from the two 'null'
values in your data, which are coerced to NA.
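If the warnings are unwanted, one option is to turn the literal string null into NA before calling as.numeric(). A small sketch; the helper name to_num() is hypothetical.

```r
# Hypothetical helper: map the JSON literal "null" to NA, then convert.
# as.numeric(NA_character_) gives NA without a coercion warning.
to_num <- function(v) as.numeric(ifelse(v == "null", NA, v))

vals <- c("799", "0", "null", "528")
num <- to_num(vals)
num
# [1] 799   0  NA 528
```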

Hope this helps,

Rui Barradas

Em 29-01-2014 22:14, Patzelt, Edward escreveu:
> R Experts -
>
> We have a complex problem whereby Qualtrics exported our data into a single
> cell as seen below.
>
> We attempted to parse it using scan() without much success. Hoping to get a
> little nudge here. I've posted the full data set here:
> https://www.dropbox.com/s/e246uiui6jrux6c/CoopandSelfControl_N90_1.24.14_GNGData.csv
>
> {"trial":1,"corr":1,"resp_dur":799,"stim":"←←←←←","cond
> ":"congruent"},{"trial":2,"corr":1,"resp_dur":0,"stim":"xx→xx","cond":"
> nogo"},{"trial":3,"corr":0,"resp_dur":null,"stim":"←←→←←
> ","cond":"incongruent"},{"trial":4,"corr":1,"resp_dur":528,"stim":"
> →→→→→","cond":"congruent"},{"trial":5,"corr":0,"resp_dur":null,"
> stim":"□□→□□","cond":"neutral"},{"trial":6,"corr":0,"resp_dur
> ":574,"stim":"→→←→→","cond":"incongruent"},{"trial":7,"corr":1,"
> resp_dur":541,"stim":"□□→□□","cond":"neutral"},{"trial":8,"corr
> ":1,"resp_dur":500,"stim":"□□←□□","cond":"neutral"},{"trial":9,"
> corr":1,"resp_dur":0,"stim":"xx→xx","cond":"nogo"},{"trial":10,"corr":0,"
> resp_dur":637,"stim":"←←→←←","cond":"incongruent"}]
>
>
> Cheers,
>
>
> Edward
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
