[R] How to remove square brackets, etc. from address strings?
Sabina Arndt
sabina.arndt at hotmail.de
Fri May 25 22:31:22 CEST 2012
Hello r-help members,
The solutions which Sarah Goslee and arun sent to me in such a prompt
and helpful manner work well with the examples I cut from the data.frame
I'm analyzing. Thank you very much for that!
I incorporated them into my R script, but unfortunately it still doesn't
work properly on the full data, and I have no idea why.
You see, I want to extract country names from the contents of
tab-delimited text files. This is an example of the data I'm using:
http://pastebin.com/mYZNDXg6
This is the script I'm using to import the data:
http://pastebin.com/Z10UUH3z (It requires the text files to be in a
folder which doesn't contain any other .txt files.)
This is the script I'm using to extract the country names:
http://pastebin.com/G37fuPba
This is the string that's in the relevant field of the first record I'm
working on:
[Engel, Kathrin M. Y.; Schroeck, Kristin; Schoeneberg, Torsten; Schulz,
Angela] Univ Leipzig, Fac Med, Inst Biochem, Leipzig, Germany; [Teupser,
Daniel; Holdt, Lesca Miriam; Thiery, Joachim] Univ Leipzig, Fac Med,
Inst Lab Med Clin Chem & Mol Diagnost, Leipzig, Germany; [Toenjes, Anke;
Kern, Matthias; Blueher, Matthias; Stumvoll, Michael] Univ Leipzig, Fac
Med, Dept Internal Med, Leipzig, Germany; [Dietrich, Kerstin; Kovacs,
Peter] Univ Leipzig, Fac Med, Interdisciplinary Ctr Clin Res, Leipzig,
Germany; [Kruegel, Ute] Univ Leipzig, Fac Med, Rudolf Boehm Inst
Pharmacol & Toxicol, Leipzig, Germany; [Scheidt, Holger A.; Schiller,
Juergen; Huster, Daniel] Univ Leipzig, Fac Med, Inst Med Phys & Biophys,
Leipzig, Germany; [Brockmann, Gudrun A.] Humboldt Univ, Inst Anim Sci,
D-10099 Berlin, Germany; [Augustin, Martin] Ingenium Pharmaceut AG,
Martinsried, Germany
This is the incorrect result my extraction script gives me for the first
record:
> C1s[1]
[1] "[ENGEL, KATHRIN M. Y." "KRISTIN" "TORSTEN"
[4] "GERMANY" "DANIEL" "LESCA MIRIAM"
[7] "GERMANY" "ANKE" "MATTHIAS"
[10] "MATTHIAS" "GERMANY" "KERSTIN"
[13] "GERMANY" "GERMANY" "[SCHEIDT, HOLGER A."
[16] "JUERGEN" "GERMANY" "HUMBOLDT"
[19] "GERMANY"
For some reason the first and sixth of the eight bracketed author groups
are not removed ... Do you understand why?
This is the result I'd like to get instead:
> C1s[1]
[1] "GERMANY" "GERMANY" "GERMANY"
[4] "GERMANY" "GERMANY" "GERMANY"
[7] "HUMBOLDT" "GERMANY"
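For comparison, here is a minimal sketch of one way to strip the bracketed author groups before splitting the field into addresses. It is not taken from the linked scripts: the function name extract_countries is hypothetical, the sample string is an excerpt (the first, sixth and eighth addresses of the record above), and treating the last comma-separated field of each address as the country is an assumption. The brackets are removed first because the author groups themselves contain semicolons, which would otherwise be mistaken for address separators.

```r
# Hypothetical helper; a sketch under the assumptions stated above.
extract_countries <- function(address_field) {
  # Remove every "[...]" group. "[^]]*" matches any run of characters
  # other than "]", so periods, commas and semicolons inside are covered.
  no_authors <- gsub("\\[[^]]*\\]", "", address_field)
  # Collapse line breaks and repeated whitespace into single spaces.
  no_authors <- gsub("[[:space:]]+", " ", no_authors)
  # Split into one element per address and drop empty pieces.
  addresses <- trimws(strsplit(no_authors, ";", fixed = TRUE)[[1]])
  addresses <- addresses[nzchar(addresses)]
  # Assumption: the country is the text after the last comma.
  toupper(trimws(sub(".*,", "", addresses)))
}

# Excerpt of the record shown above (addresses 1, 6 and 8).
x <- paste(
  "[Engel, Kathrin M. Y.; Schroeck, Kristin; Schoeneberg, Torsten; Schulz,",
  "Angela] Univ Leipzig, Fac Med, Inst Biochem, Leipzig, Germany; [Scheidt,",
  "Holger A.; Schiller, Juergen; Huster, Daniel] Univ Leipzig, Fac Med,",
  "Inst Med Phys & Biophys, Leipzig, Germany; [Augustin, Martin]",
  "Ingenium Pharmaceut AG, Martinsried, Germany"
)

countries <- extract_countries(x)
print(countries)
```

For a whole column, something like lapply(df$C1, extract_countries) would apply the same steps to each record, where df$C1 stands in for whichever column holds the address field.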
What am I doing wrong? What are the errors in my R-script?
Would anybody be so kind as to take a look and help me out, please?
Thank you very much in advance!
Faithfully yours,
Sabina Arndt