[R] Using gregexpr and regmatches but getting Iconv error
Adel
adel.daoud at socav.gu.se
Thu Dec 11 16:40:30 CET 2014
Hi
I have run into a problem when using gregexpr() and regmatches(); I get the
following error message:
Error in iconv(x, "latin1", "ASCII") :
'x' must be a list of NULL or raw vectors
The data:
(1)
I have two journal articles and, after some regex manipulation, I have
arrived at the following:
# manipulate only two full-text articles
author.test <- articles1[1:2]
# extract author information
r <- gregexpr("(\"authors\":(.*?)\"(.*?)\")|(\"authors\": \\[(.*?)\\],)",
author.test)
authors.raw <- regmatches(author.test, r)
authors.raw
[[1]]
[1] "\"authors\": [\"Allan G. KING\", \"B. Lindsay LOWELL\", \"Frank D. BEAN\"],"
[[2]]
[1] "\"authors\": \"Chris Baldry\", \""
(2)
Now, if I try further regex manipulation on this result, I get the error
stated above.
r <- gregexpr("([^(\"authors\":)])(.*?)(\"(.*?)\")", authors.raw)
authors.raw <- regmatches(authors.raw, r)
Error in iconv(x, "latin1", "ASCII") :
'x' must be a list of NULL or raw vectors
(3)
One way to avoid this is to unlist(authors.raw) - see below - but the
problem is that I then lose information that was contained in the list. The
first element contains the three authors of the first paper, and I want to
keep them grouped in that list format.
> authors.raw <- unlist(regmatches(authors.raw, r))
> authors.raw
[1] " [\"Allan G. KING\"" ", \"B. Lindsay LOWELL\"" ", \"Frank D. BEAN\"" " \"Chris Baldry\""
(4)
So what I want is to avoid unlist() and apply gregexpr() several times in a
row. Any ideas?
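For illustration, here is a minimal sketch of one possible way to do this, under stated assumptions: since regmatches() returns a list, each later pass could be run inside lapply() so that every article keeps its own list element. The helper name extract_per_article and both patterns are hypothetical, and the input is a hand-typed stand-in for the authors.raw output shown in (1), not the real data.

```r
# Hand-typed stand-in for the list returned by the first regmatches() call
authors.raw <- list(
  "\"authors\": [\"Allan G. KING\", \"B. Lindsay LOWELL\", \"Frank D. BEAN\"],",
  "\"authors\": \"Chris Baldry\", \""
)

# Hypothetical helper: run one regex pass over each list element separately,
# so the result still has one element per article
extract_per_article <- function(x, pattern) {
  lapply(x, function(el) {
    el <- as.character(el)
    # regmatches() receives a character vector here, never a list
    unlist(regmatches(el, gregexpr(pattern, el)))
  })
}

# Pass 1: every complete double-quoted token (including the "authors" key)
quoted <- extract_per_article(authors.raw, "\"[^\"]+\"")

# Pass 2: the text inside the quotes
unquoted <- extract_per_article(quoted, "[^\"]+")

# Drop the key itself, leaving only the names for each article
authors <- lapply(unquoted, function(el) el[el != "authors"])
```

With this stand-in input, authors should remain a list of length two, with the three names of the first paper in the first element and the single name in the second. Real article text may need different patterns.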
Thanks in advance
Adel