[R] XML to CSV

Andrew Lachance alachanc at bates.edu
Wed Jan 25 15:12:05 CET 2017


Hello all,

Thank you for the extremely helpful information. As a follow-up, some of
the nested elements are of the form below:
<DischargeMedication>
    <Medication MedAdmin="0" MedID="10"/>
    <Medication MedAdmin="0" MedID="11"/>

I've been having trouble extracting this information and was wondering if
anyone had any suggestions.
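A minimal sketch with xml2, assuming the `<Medication>` elements sit inside some per-patient wrapper. The `<Patients>`/`<Patient>` elements and the `PatientID` attribute below are hypothetical stand-ins for the real (unshared) structure; only `<DischargeMedication>` and `<Medication MedAdmin=... MedID=...>` come from the post. Because `MedAdmin` and `MedID` are attributes rather than element text, `xml_attr()` is the call that reads them (`xml_text()` returns empty strings for these self-closing elements). Putting the medications in their own long table, one row per `<Medication>` keyed by patient, is one way to avoid the many blank columns you get when a one-to-many element is forced into a single wide row.

```r
library(xml2)

# Hypothetical sample: <Patients>, <Patient> and PatientID are assumptions
# for illustration; only <DischargeMedication>/<Medication> are from the post.
doc <- read_xml('<Patients>
  <Patient PatientID="A1">
    <DischargeMedication>
      <Medication MedAdmin="0" MedID="10"/>
      <Medication MedAdmin="0" MedID="11"/>
    </DischargeMedication>
  </Patient>
  <Patient PatientID="A2">
    <DischargeMedication>
      <Medication MedAdmin="1" MedID="12"/>
    </DischargeMedication>
  </Patient>
</Patients>')

# Every <Medication> that is a child of a <DischargeMedication>, in document order.
med_nodes <- xml_find_all(doc, "//DischargeMedication/Medication")

meds <- data.frame(
  # Walk back up to the enclosing <Patient> so each row keeps its owner.
  PatientID = xml_attr(xml_find_first(med_nodes, "ancestor::Patient"), "PatientID"),
  # The values are stored as attributes, so read them with xml_attr().
  MedAdmin  = as.integer(xml_attr(med_nodes, "MedAdmin")),
  MedID     = as.integer(xml_attr(med_nodes, "MedID")),
  stringsAsFactors = FALSE
)

# One row per medication; row.names = FALSE keeps an index column out of the CSV.
write.csv(meds, "discharge_medication.csv", row.names = FALSE)
```

If other repeating elements exist under each patient, the same pattern could give one table per element, all carrying the same patient key, to be joined later in the database.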

Thank you,
Andrew

On Thu, Jan 5, 2017 at 7:39 AM, Franzini, Gabriele [Nervianoms] <
Gabriele.Franzini at nervianoms.com> wrote:

> Hello Andrew,
>
> as you are starting with a "clean slate" in handling XML files anyway, you
> could take a look at XSLT processing, which is also off-topic here.
> There are free tools available around, and many examples of "XML to CSV
> XSLT" on StackOverflow.
>
> HTH,
> Gabriele
>
> -----Original Message-----
>
> On January 4, 2017 12:45:08 PM PST, Ben Tupper <btupper at bigelow.org>
> wrote:
> >Hi,
> >
> >You should keep replies on the list - you never know when someone will
> >swoop in with the right answer to make your life easier.
> >
> >Below is a simple example that uses XPath syntax to identify (and in
> >this case retrieve) children that match your XPath expression. XPath
> >expressions are sort of like /a/directory/structure/description, so you
> >can visualize elements of XML like nested folders or subdirectories.
> >
> >Hopefully this will get you started. There is a lot more on XPath here:
> >http://www.w3schools.com/xml/xml_xpath.asp
> >There are other extraction tools in xml2; just type ?xml2 at the
> >command prompt to see more.
> >
> >Since you have more deeply nested elements you'll need to play with
> >this a bit first.
> >
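> >library(xml2)
> >uri = 'http://www.w3schools.com/xml/simple.xml'
> >x = read_xml(uri)
> >
> ># "//name" matches every <name> element anywhere in the document
> >name_nodes = xml_find_all(x, "//name")
> >name = xml_text(name_nodes)
> >
> ># prices are kept as text (they may include a currency symbol)
> >price_nodes = xml_find_all(x, "//price")
> >price = xml_text(price_nodes)
> >
> ># xml_double() converts the node text to numbers
> >calories_nodes = xml_find_all(x, "//calories")
> >calories = xml_double(calories_nodes)
> >
> ># the three vectors line up only because every <food> has all three
> ># children; row.names = FALSE keeps an extra index column out of the CSV
> >X = data.frame(name, price, calories, stringsAsFactors = FALSE)
> >write.csv(X, file = 'foo.csv', row.names = FALSE)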
> >
> >Cheers,
> >Ben
> >
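Ben's example gathers each field with a document-wide query such as "//name", which only lines up when every record has every field; a record missing one child would shift or break the columns. A sketch of a per-record variant, assuming hypothetical `<patient>` records with `<id>`, `<age>` and `<sex>` children (made-up names, not the real schema): query each field relative to the record nodes with `xml_find_first()`, so a missing child becomes `NA` and every row stays aligned with its record.

```r
library(xml2)

# Hypothetical records; element names are made up for illustration.
# The second patient has no <age>, which would misalign a "//age" query.
doc <- read_xml('<patients>
  <patient><id>1</id><age>34</age><sex>F</sex></patient>
  <patient><id>2</id><sex>M</sex></patient>
</patients>')

records <- xml_find_all(doc, "//patient")

# The leading "./" makes the XPath relative to each <patient>. For a record
# without that child, xml_find_first() returns a missing node whose text is
# NA, so each column has exactly one entry per record.
field <- function(nodes, name) {
  xml_text(xml_find_first(nodes, paste0("./", name)))
}

fields <- c("id", "age", "sex")
out <- as.data.frame(
  setNames(lapply(fields, function(f) field(records, f)), fields),
  stringsAsFactors = FALSE
)

# na = "" writes missing values as empty cells; adjust to what the database expects.
write.csv(out, "patients.csv", row.names = FALSE, na = "")
```

With around 150 categories, the `fields` vector could be built from the child names actually present (for example with `xml_name()` on the children of each record) rather than typed by hand.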
> >> On Jan 4, 2017, at 2:13 PM, Andrew Lachance <alachanc at bates.edu> wrote:
> >>
> >> Hello Ben,
> >>
> >> Thank you for the advice. I am extremely new to any sort of coding, so
> >> I have learned a lot already. Essentially, I was given an XML file and
> >> told to convert all of it to a CSV so that it could be uploaded into a
> >> database. Unfortunately, the information I am working with is medical,
> >> so I can't really share it. I initially tried to convert it using online
> >> programs, but that produced a large number of blank spaces that weren't
> >> useful for uploading into the database.
> >>
> >> So essentially, my goal is to parse all the data in the XML into a
> >> coherent, succinct CSV that could be uploaded. The document contains
> >> 361 patient files with 13 subcategories for each patient, which further
> >> branch off into around 150 categories in total. Since I am so new, I
> >> have been having a hard time seeing the bigger picture or knowing
> >> whether there are any intermediate steps that would prevent all the
> >> blank spaces the online conversion programs created.
> >>
> >> I will look through the information on the xml2 package. Any advice
> >> or recommendations would be greatly appreciated, as I have felt fairly
> >> stuck. Once again, thank you very much for your help.
> >>
> >> Best,
> >> Andrew
> >>
> >> On Tue, Jan 3, 2017 at 2:29 PM, Ben Tupper <btupper at bigelow.org> wrote:
> >> Hi,
> >>
> >> It's hard to know what to advise; much depends upon the XML data you
> >> have and what you want to extract from it. Without knowing those two
> >> things there is little anyone can do to help. Can you post some example
> >> data to the internet and provide the link here? Then state explicitly
> >> what you want to have in hand at the end.
> >>
> >> If you are just starting out, I suggest that you try the xml2 package
> >> (https://cran.r-project.org/web/packages/xml2/) rather than the XML
> >> package (https://cran.r-project.org/web/packages/XML/). I have been
> >> using it much more since the authors added the ability to create XML
> >> nodes (rather than just extracting data from existing XML nodes).
> >>
> >> Cheers,
> >> Ben
> >>
> >> P.S.  Hello to my niece Olivia S on the Bates EMS team.
> >>
> >>
> >> > On Jan 3, 2017, at 11:27 AM, Andrew Lachance <alachanc at bates.edu> wrote:
> >> >
> >> > http://stats.stackexchange.com/questions/254328/how-to-convert-a-large-xml-file-to-a-csv-file-using-r?noredirect=1#
> >> >
> >> > I am completely new to R and have tried to use several functions
> >> > within the XML packages to convert an XML file to a CSV, with little
> >> > success. Since I am so new, I am not sure what steps are needed to
> >> > complete this conversion without ending up with a lot of NAs.
> >> >
> >> > --
> >> > Andrew D. Lachance
> >> > Chief of Service, Bates Emergency Medical Service
> >> > Residence Coordinator, Hopkins House
> >> > Bates College Class of 2017
> >> > alachanc at bates.edu
> >> > (207) 620-4854
> >> >
> >> > ______________________________________________
> >> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >> Ben Tupper
> >> Bigelow Laboratory for Ocean Sciences
> >> 60 Bigelow Drive, P.O. Box 380
> >> East Boothbay, Maine 04544
> >> http://www.bigelow.org
> >
>