[R] XML to CSV
Franzini, Gabriele [Nervianoms]
Gabriele.Franzini at nervianoms.com
Wed Jan 25 15:43:00 CET 2017
They are attributes, not elements, so, if I understood the question correctly:
"//DischargeMedication/Medication/@MedAdmin"
"//DischargeMedication/Medication/@MedID"
should do.
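
A minimal sketch of how these expressions could be used with the xml2 package suggested earlier in this thread. The <Record> wrapper element and the output file name are assumptions made for illustration only; the real file will have its own structure around <DischargeMedication>.

```r
library(xml2)

# Hypothetical sample mirroring the fragment quoted in this thread.
# The <Record> wrapper is an assumption, not taken from the real data.
doc <- read_xml('<Record>
  <DischargeMedication>
    <Medication MedAdmin="0" MedID="10"/>
    <Medication MedAdmin="0" MedID="11"/>
  </DischargeMedication>
</Record>')

# Option 1: select the Medication elements, then read their attributes.
meds <- xml_find_all(doc, "//DischargeMedication/Medication")
med_df <- data.frame(
  MedAdmin = xml_attr(meds, "MedAdmin"),
  MedID    = xml_attr(meds, "MedID"),
  stringsAsFactors = FALSE
)

# Option 2: select the attribute nodes directly with the XPath given above.
med_ids <- xml_text(xml_find_all(doc, "//DischargeMedication/Medication/@MedID"))

# Write one row per Medication element (file name is illustrative).
write.csv(med_df, file = "medications.csv", row.names = FALSE)
```

Option 1 keeps both attributes of the same element on the same row, which is probably the safer choice if some Medication elements lack one of the attributes: xml_attr() returns NA for a missing attribute instead of silently shortening the vector.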
HTH,
Gabriele
From: Andrew Lachance [mailto:alachanc at bates.edu]
Sent: Wednesday, January 25, 2017 3:12 PM
To: Franzini, Gabriele [Nervianoms]
Cc: r-help at r-project.org
Subject: Re: [R] XML to CSV
Hello all,
Thank you for the extremely helpful information. As a follow-up, some of the nested elements are of the form below:
<DischargeMedication>
<Medication MedAdmin="0" MedID="10"/>
<Medication MedAdmin="0" MedID="11"/>
I've been having trouble extracting this information and was wondering if anyone had any suggestions.
Thank you,
Andrew
On Thu, Jan 5, 2017 at 7:39 AM, Franzini, Gabriele [Nervianoms] <Gabriele.Franzini at nervianoms.com> wrote:
Hello Andrew,
as you are starting from a "clean slate" anyway in handling XML files, you could take a look at XSLT processing -- also an off-topic area.
There are free tools available around, and many examples of "XML to CSV XSLT" on StackOverflow.
HTH,
Gabriele
-----Original Message-----
On January 4, 2017 12:45:08 PM PST, Ben Tupper <btupper at bigelow.org> wrote:
>Hi,
>
>You should keep replies on the list - you never know when someone will
>swoop in with the right answer to make your life easier.
>
>Below is a simple example that uses xpath syntax to identify (and in
>this case retrieve) children that match your xpath expression. xpath
>expressions are sort of like /a/directory/structure/description so you
>can visualize elements of XML like nested folders or subdirectories.
>
>Hopefully this will get you started. A lot more on xpath here
>http://www.w3schools.com/xml/xml_xpath.asp There are other extraction
>tools in xml2 - just type ?xml2 at the command prompt to see more.
>
>Since you have more deeply nested elements you'll need to play with
>this a bit first.
>
>library(xml2)
>uri = 'http://www.w3schools.com/xml/simple.xml'
>x = read_xml(uri)
>
>name_nodes = xml_find_all(x, "//name")
>name = xml_text(name_nodes)
>
>price_nodes = xml_find_all(x, "//price")
>price = xml_text(price_nodes)
>
>calories_nodes = xml_find_all(x, "//calories")
>calories = xml_double(calories_nodes)
>
>X = data.frame(name, price, calories, stringsAsFactors = FALSE)
>write.csv(X, file = 'foo.csv')
>
>Cheers,
>Ben
>
>> On Jan 4, 2017, at 2:13 PM, Andrew Lachance <alachanc at bates.edu> wrote:
>>
>> Hello Ben,
>>
>> Thank you for the advice. I am extremely new to any sort of coding so
>I have learned a lot already. Essentially, I was given an XML file and
>was told to convert all of it to a csv so that it could be uploaded
>into a database. Unfortunately the information I am working with is
>medical information and I can't really share it. I initially tried to
>convert it using online programs, however that ended up with a large
>amount of blank spaces that wasn't useful for uploading into the
>database.
>>
>> So essentially, my goal is to parse all the data in the XML to a
>coherent, succinct CSV that could be uploaded. In the document, there
>are 361 patient files with 13 subcategories for each patient which
>further branches off to around 150 categories total. Since I am so new,
>I have been having a hard time seeing the bigger picture or knowing if
>there are any intermediary steps that will prevent all the blank spaces
>that the online conversion programs created.
>>
>> I will look through the information on the xml2 package. Any advice
>or recommendations would be greatly appreciated as I have felt fairly
>stuck. Once again, thank you very much for your help.
>>
>> Best,
>> Andrew
>>
>> On Tue, Jan 3, 2017 at 2:29 PM, Ben Tupper <btupper at bigelow.org> wrote:
>> Hi,
>>
>> It's hard to know what to advise - much depends upon the XML data you
>have and what you want to extract from it. Without knowing about those
>two things there is little anyone could do to help. Can you post some
>example data to the internet and provide the link here? Then state
>explicitly what you want to have in hand at the end.
>>
>> If you are just starting out I suggest that you try the xml2 package
>(https://cran.r-project.org/web/packages/xml2/) rather than the XML
>package (https://cran.r-project.org/web/packages/XML/). I have been
>using it much more since the authors added the ability to create xml
>nodes (rather than just extracting data from existing xml nodes).
>>
>> Cheers,
>> Ben
>>
>> P.S. Hello to my niece Olivia S on the Bates EMS team.
>>
>>
>> > On Jan 3, 2017, at 11:27 AM, Andrew Lachance <alachanc at bates.edu> wrote:
>> >
>> > <http://stats.stackexchange.com/questions/254328/how-to-convert-a-large-xml-file-to-a-csv-file-using-r?noredirect=1#>
>> >
>> > I am completely new to R and have tried to use several functions
>within the
>> > xml packages to convert an XML to a csv and have had little
>success. Since
>> > I am so new, I am not sure what the necessary steps are to complete
>this
>> > conversion without a lot of NA.
>> >
>> > --
>> > Andrew D. Lachance
>> > Chief of Service, Bates Emergency Medical Service
>> > Residence Coordinator, Hopkins House
>> > Bates College Class of 2017
>> > alachanc at bates.edu
>> > (207) 620-4854
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>> Ben Tupper
>> Bigelow Laboratory for Ocean Sciences
>> 60 Bigelow Drive, P.O. Box 380
>> East Boothbay, Maine 04544
>> http://www.bigelow.org
>>