[R] XML to CSV

Franzini, Gabriele [Nervianoms] Gabriele.Franzini at nervianoms.com
Thu Jan 5 13:39:05 CET 2017


Hello Andrew,

since you are starting from a clean slate with XML files anyway, you could take a look at XSLT processing, although that is also off-topic for this list.
Free tools are available, and there are many examples of "XML to CSV XSLT" on StackOverflow.
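
If you would rather stay in R, here is a minimal sketch of one way to avoid the blank cells you mentioned: select one node per patient first, then pull each field relative to that node, so a missing field becomes a single empty cell instead of throwing the columns out of step. The element names (patient, vitals, pulse, weight) and the output file name are made up for illustration, since the real file can't be shared; only the xml2 functions are real.

```r
library(xml2)

# Hypothetical stand-in for the real file: the second patient has no <weight>
doc <- read_xml('<patients>
  <patient id="p1">
    <name>Alpha</name>
    <vitals><pulse>72</pulse><weight>80.5</weight></vitals>
  </patient>
  <patient id="p2">
    <name>Beta</name>
    <vitals><pulse>64</pulse></vitals>
  </patient>
</patients>')

# One node per record, so every column is built from the same set of parents
patients <- xml_find_all(doc, "//patient")

# xml_find_first() gives one result per patient; a missing child becomes NA
field <- function(nodes, xpath) xml_text(xml_find_first(nodes, xpath))

X <- data.frame(
  id     = xml_attr(patients, "id"),
  name   = field(patients, "./name"),
  pulse  = as.numeric(field(patients, "./vitals/pulse")),
  weight = as.numeric(field(patients, "./vitals/weight")),
  stringsAsFactors = FALSE
)

# na = "" writes missing values as empty cells rather than the string "NA"
write.csv(X, file = "patients.csv", row.names = FALSE, na = "")
```

Each additional field is one more line in the data.frame() call; for the more deeply nested branches, lengthen the relative path.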

HTH,
Gabriele

-----Original Message-----

On January 4, 2017 12:45:08 PM PST, Ben Tupper <btupper at bigelow.org> wrote:
>Hi,
>
>You should keep replies on the list - you never know when someone will
>swoop in with the right answer to make your life easier.
>
>Below is a simple example that uses xpath syntax to identify (and in
>this case retrieve) children that match your xpath expression.  xpath
>expressions are sort of like /a/directory/structure/description so you
>can visualize elements of XML like nested folders or subdirectories.
>
>Hopefully this will get you started.  There is a lot more on xpath here:
>http://www.w3schools.com/xml/xml_xpath.asp  There are other extraction
>tools in xml2 - just type ?xml2 at the command prompt to see more.
>
>Since you have more deeply nested elements you'll need to play with
>this a bit first.
>
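>library(xml2)
>
># read the example document straight from the web
>uri = 'http://www.w3schools.com/xml/simple.xml'
>x = read_xml(uri)
>
># "//name" matches every <name> element anywhere in the document
>name_nodes = xml_find_all(x, "//name")
>name = xml_text(name_nodes)
>
># prices carry a currency symbol, so keep them as text
>price_nodes = xml_find_all(x, "//price")
>price = xml_text(price_nodes)
>
># calories are plain numbers, so convert them directly
>calories_nodes = xml_find_all(x, "//calories")
>calories = xml_double(calories_nodes)
>
># one row per food item, written out as CSV
>X = data.frame(name, price, calories, stringsAsFactors = FALSE)
>write.csv(X, file = 'foo.csv')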
>
>Cheers,
>Ben
>
>> On Jan 4, 2017, at 2:13 PM, Andrew Lachance <alachanc at bates.edu>
>wrote:
>> 
>> Hello Ben,
>> 
>> Thank you for the advice. I am extremely new to any sort of coding so
>I have learned a lot already. Essentially, I was given an XML file and
>was told to convert all of it to a csv so that it could be uploaded
>into a database. Unfortunately the information I am working with is
>medical information and I can't really share it. I initially tried to
>convert it using online programs, however that ended up with a large
>number of blank spaces that weren't useful for uploading into the
>database.
>> 
>> So essentially, my goal is to parse all the data in the XML to a
>coherent, succinct CSV that could be uploaded. In the document, there
>are 361 patient files with 13 subcategories for each patient which
>further branch off to around 150 categories total. Since I am so new,
>I have been having a hard time seeing the bigger picture or knowing if
>there are any intermediary steps that will prevent all the blank spaces
>that the online conversion programs created.
>> 
>> I will look through the information on the xml2 package. Any advice
>or recommendations would be greatly appreciated as I have felt fairly
>stuck. Once again, thank you very much for your help.
>> 
>> Best,
>> Andrew
>> 
>> On Tue, Jan 3, 2017 at 2:29 PM, Ben Tupper <btupper at bigelow.org> wrote:
>> Hi,
>> 
>> It's hard to know what to advise - much depends upon the XML data you
>have and what you want to extract from it. Without knowing about those
>two things there is little anyone could do to help.  Can you post
>example data to the internet and provide the link here?  Then state
>explicitly what you want to have in hand at the end.
>> 
>> If you are just starting out I suggest that you try the xml2 package (
>https://cran.r-project.org/web/packages/xml2/ ) rather than the XML
>package ( https://cran.r-project.org/web/packages/XML/ ). I have been
>using it much more since the authors added the ability to create xml
>nodes (rather than just extracting data from existing xml nodes).
>> 
>> Cheers,
>> Ben
>> 
>> P.S.  Hello to my niece Olivia S on the Bates EMS team.
>> 
>> 
>> > On Jan 3, 2017, at 11:27 AM, Andrew Lachance <alachanc at bates.edu> wrote:
>> >
>> > <http://stats.stackexchange.com/questions/254328/how-to-convert-a-large-xml-file-to-a-csv-file-using-r?noredirect=1#>
>> >
>> > I am completely new to R and have tried to use several functions
>within the
>> > xml packages to convert an XML to a csv and have had little
>success. Since
>> > I am so new, I am not sure what the necessary steps are to complete
>this
>> > conversion without a lot of NA.
>> >
>> > --
>> > Andrew D. Lachance
>> > Chief of Service, Bates Emergency Medical Service
>> > Residence Coordinator, Hopkins House
>> > Bates College Class of 2017
>> > alachanc at bates.edu
>> > (207) 620-4854
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> 
>> Ben Tupper
>> Bigelow Laboratory for Ocean Sciences
>> 60 Bigelow Drive, P.O. Box 380
>> East Boothbay, Maine 04544
>> http://www.bigelow.org
>> 
>> -- 
>> Andrew D. Lachance
>> Chief of Service, Bates Emergency Medical Service
>> Residence Coordinator, Hopkins House
>> Bates College Class of 2017
>> alachanc at bates.edu
>> (207) 620-4854
>
>Ben Tupper
>Bigelow Laboratory for Ocean Sciences
>60 Bigelow Drive, P.O. Box 380
>East Boothbay, Maine 04544
>http://www.bigelow.org
>



