[R] XML to CSV
btupper at bigelow.org
Wed Jan 4 21:45:08 CET 2017
You should keep replies on the list - you never know when someone will swoop in with the right answer to make your life easier.
Below is a simple example that uses xpath syntax to identify (and, in this case, retrieve) the elements that match your xpath expression. Xpath expressions look a bit like /a/directory/structure/description, so you can picture the elements of an XML file as nested folders or subdirectories.
Hopefully this will get you started. There is a lot more on xpath at http://www.w3schools.com/xml/xml_xpath.asp. There are other extraction tools in xml2; just type ?xml2 at the command prompt to see more.
Since you have more deeply nested elements, you'll need to play with this a bit first.
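library(xml2)   # install.packages("xml2") first if needed

uri = 'http://www.w3schools.com/xml/simple.xml'
x = read_xml(uri)
# "//name" matches every <name> element anywhere in the document
name_nodes = xml_find_all(x, "//name")
name = xml_text(name_nodes)              # values as character strings
price_nodes = xml_find_all(x, "//price")
price = xml_text(price_nodes)
calories_nodes = xml_find_all(x, "//calories")
calories = xml_double(calories_nodes)    # values converted to numbers
X = data.frame(name, price, calories, stringsAsFactors = FALSE)
# row.names = FALSE avoids writing an extra unnamed index column
write.csv(X, file = 'foo.csv', row.names = FALSE)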
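The //name, //price and //calories searches above each collect values across the whole document, so if any record lacks one of those elements the vectors end up with different lengths, and data.frame() either stops with an error or, when one length divides the other, silently recycles values into the wrong rows. A minimal sketch of a more defensive pattern, assuming each record is a <food> element as in simple.xml: select the records with an absolute, directory-like path, then look up each field relative to its own record with xml_find_first(), which yields NA when the field is missing. The inline XML is a made-up stand-in with one <calories> deliberately left out, and menu.csv is a hypothetical file name.

```r
library(xml2)

# Made-up stand-in for simple.xml: the second <food> has no <calories>
doc <- read_xml('<breakfast_menu>
  <food><name>Waffles</name><price>$5.95</price><calories>650</calories></food>
  <food><name>Toast</name><price>$4.50</price></food>
</breakfast_menu>')

# Absolute path, read like a directory: root -> food (one node per record)
foods <- xml_find_all(doc, "/breakfast_menu/food")

# "./" paths search only inside each <food>; xml_find_first() returns one
# result per record (a missing node when nothing matches), and
# xml_text()/xml_double() turn missing nodes into NA
menu <- data.frame(
  name     = xml_text(xml_find_first(foods, "./name")),
  price    = xml_text(xml_find_first(foods, "./price")),
  calories = xml_double(xml_find_first(foods, "./calories")),
  stringsAsFactors = FALSE
)

write.csv(menu, file = "menu.csv", row.names = FALSE)
```

Because every column is built from the same set of record nodes, each column has exactly one entry per record, and a missing field shows up as NA in its own row instead of shifting later values.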
> On Jan 4, 2017, at 2:13 PM, Andrew Lachance <alachanc at bates.edu> wrote:
> Hello Ben,
> Thank you for the advice. I am extremely new to any sort of coding, so I have learned a lot already. Essentially, I was given an XML file and was told to convert all of it to a CSV so that it could be uploaded into a database. Unfortunately, the information I am working with is medical information and I can't really share it. I initially tried to convert it using online programs; however, that produced a large number of blank spaces that weren't useful for uploading into the database.
> So essentially, my goal is to parse all the data in the XML into a coherent, succinct CSV that could be uploaded. In the document, there are 361 patient files with 13 subcategories for each patient, which further branch off into around 150 categories in total. Since I am so new, I have been having a hard time seeing the bigger picture or knowing whether there are any intermediate steps that would prevent all the blank spaces that the online conversion programs created.
> I will look through the information on the xml2 package. Any advice or recommendations would be greatly appreciated as I have felt fairly stuck. Once again, thank you very much for your help.
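For nested records like the ones described above (patients with subcategories that branch into many fields), one way to avoid a wide table full of blanks is a "long" CSV with one row per patient per field. A minimal sketch, assuming a hypothetical patient -> section -> field layout with an id attribute on each <patient>; the element names, values, and the file name patients_long.csv are all made up, since the real file can't be shared.

```r
library(xml2)

# Hypothetical, made-up structure: patient -> section -> field
doc <- read_xml('<patients>
  <patient id="1">
    <demographics><age>34</age><sex>F</sex></demographics>
    <vitals><pulse>72</pulse></vitals>
  </patient>
  <patient id="2">
    <demographics><age>51</age></demographics>
    <vitals><pulse>88</pulse><bp>120/80</bp></vitals>
  </patient>
</patients>')

patients <- xml_find_all(doc, "/patients/patient")

# One small data frame per section, stacked into a single long table
long <- do.call(rbind, lapply(patients, function(p) {
  sections <- xml_children(p)              # e.g. <demographics>, <vitals>
  do.call(rbind, lapply(sections, function(s) {
    leaves <- xml_children(s)              # e.g. <age>, <pulse>
    if (length(leaves) == 0) return(NULL)  # skip empty sections
    data.frame(
      patient_id = xml_attr(p, "id"),      # recycled across the section's rows
      section    = xml_name(s),
      field      = xml_name(leaves),
      value      = xml_text(leaves),
      stringsAsFactors = FALSE
    )
  }))
}))

write.csv(long, file = "patients_long.csv", row.names = FALSE)
```

Each value lands in exactly one row, so a field that a patient lacks simply produces no row rather than an empty cell. If the database needs one column per field instead, the long table can be reshaped to a wide one afterwards, with absent fields showing up as NA.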
> On Tue, Jan 3, 2017 at 2:29 PM, Ben Tupper <btupper at bigelow.org> wrote:
> It's hard to know what to advise - much depends upon the XML data you have and what you want to extract from it. Without knowing those two things, there is little anyone can do to help. Can you post a toy example of the data to the internet and provide the link here? Then state explicitly what you want to have in hand at the end.
> If you are just starting out, I suggest that you try the xml2 package ( https://cran.r-project.org/web/packages/xml2/ ) rather than the XML package ( https://cran.r-project.org/web/packages/XML/ ). I have been using it much more since the authors added the ability to create XML nodes (rather than just extracting data from existing XML nodes).
> P.S. Hello to my niece Olivia S on the Bates EMS team.
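Since the real file holds medical information, one way to produce the kind of toy example asked for above is to build a small fake file with xml2's node-creation functions (xml_new_root(), xml_add_child(), and the xml_text<- setter). A minimal sketch; the layout, element names, values, and the output file name toy_patients.xml are all hypothetical, chosen only to mimic a patient -> section -> field nesting.

```r
library(xml2)

# Made-up values only: a toy file with the same kind of nesting as the
# real data (patient -> section -> field) that is safe to post publicly
root <- xml_new_root("patients")

toy <- list(
  list(id = "1", age = "34", pulse = "72"),
  list(id = "2", age = "51", pulse = "88")
)

for (rec in toy) {
  # A named argument to xml_add_child() becomes an attribute: <patient id="...">
  patient <- xml_add_child(root, "patient", id = rec$id)

  demo <- xml_add_child(patient, "demographics")
  age  <- xml_add_child(demo, "age")
  xml_text(age) <- rec$age

  vitals <- xml_add_child(patient, "vitals")
  pulse  <- xml_add_child(vitals, "pulse")
  xml_text(pulse) <- rec$pulse
}

write_xml(root, "toy_patients.xml")
```

The toy file can then be posted and read back with read_xml(), so others on the list can try extraction code against the same structure without seeing any patient data.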
> > On Jan 3, 2017, at 11:27 AM, Andrew Lachance <alachanc at bates.edu> wrote:
> > <http://stats.stackexchange.com/questions/254328/how-to-convert-a-large-xml-file-to-a-csv-file-using-r?noredirect=1#>
> > I am completely new to R and have tried to use several functions within the
> > XML packages to convert an XML file to a CSV, and have had little success.
> > Since I am so new, I am not sure what the necessary steps are to complete
> > this conversion without ending up with a lot of NA values.
> > --
> > Andrew D. Lachance
> > Chief of Service, Bates Emergency Medical Service
> > Residence Coordinator, Hopkins House
> > Bates College Class of 2017
> > alachanc at bates.edu
> > (207) 620-4854
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Ben Tupper
> Bigelow Laboratory for Ocean Sciences
> 60 Bigelow Drive, P.O. Box 380
> East Boothbay, Maine 04544
> http://www.bigelow.org