[R] saveXML() prefix argument

Duncan Temple Lang dtemplelang at ucdavis.edu
Sun Oct 20 17:49:07 CEST 2013


Thanks Earl and Milan.
Yes, the C code that serializes the document branches and behaves
differently for the different combinations of file, encoding and indent.
I have updated the code to use a different routine in libxml2 for this
combination, and that routine honors the indentation. The fix will be in the
next release of XML.

In the meantime, you can use

   cat( saveXML( doc, encoding = "UTF-8", indent = TRUE),  file = "bob.xml")

rather than 
    saveXML(doc, file = "bob.xml", encoding = "UTF-8", indent = TRUE)
i.e. move the file argument to cat().
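
For anyone who wants to confirm the workaround on their own machine, here is a
minimal sketch. It assumes the XML package is installed; the temporary file
path and the one-element document are only illustrative, and the check at the
end just counts lines rather than inspecting the exact layout.

```r
library(XML)

# Build a tiny document containing a non-ASCII character.
# "\u00f1" is the escape for the n-with-tilde in "español".
doc <- newXMLDoc()
root <- newXMLNode("tips", doc = doc)
newXMLNode("h1", "espa\u00f1ol", parent = root)

# Serialize to a string first (no file argument), then let cat() write it.
txt <- saveXML(doc, encoding = "UTF-8", indent = TRUE)
out <- tempfile(fileext = ".xml")   # illustrative output path
cat(txt, file = out)

# If the indentation survived, the file spans more than one line.
lines <- readLines(out, warn = FALSE)
length(lines) > 1
```

If the last expression is FALSE, the string returned by saveXML() was not
indented in the first place, and the problem lies elsewhere.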

 Thanks,
     D.

On 10/19/13 4:36 AM, Milan Bouchet-Valat wrote:
> On Friday, 18 October 2013 at 13:27 -0400, Earl Brown wrote:
>> Thanks Duncan. However, now I can't get the Spanish and Portuguese accented vowels to come out correctly and still keep the indents in the saved document, even when I set encoding = "UTF-8":
>>
>> library("XML")
>> concepts <- c("español", "português")
>> info <- c("info about español", "info about português")
>>
>> doc <- newXMLDoc()
>> root <- newXMLNode("tips", doc = doc)
>> for (i in 1:length(concepts)) {
>> 	cur.concept <- concepts[i]
>> 	cur.info <- info[i]
>> 	cur.tip <- newXMLNode("tip", attrs = c(id = i), parent = root)
>> 	newXMLNode("h1", cur.concept, parent = cur.tip)
>> 	newXMLNode("p", cur.info, parent = cur.tip)
>> }
>>
>> # accented vowels don't come through correctly, but the indents are correct:
>> saveXML(doc, file = "test1.xml", indent = T)
>>
>> Resulting file looks like this:
>> <?xml version="1.0"?>
>> <tips>
>>   <tip id="1">
>>     <h1>espa&#xF1;ol</h1>
>>     <p>info about espa&#xF1;ol</p>
>>   </tip>
>>   <tip id="2">
>>     <h1>portugu&#xEA;s</h1>
>>     <p>info about portugu&#xEA;s</p>
>>   </tip>
>> </tips>
>>
>> # accented vowels are correct, but the indents are no longer correct:
>> saveXML(doc, file = "test2.xml", indent = T, encoding = "UTF-8")
>>
>> Resulting file:
>> <?xml version="1.0" encoding="UTF-8"?>
>> <tips><tip id="1"><h1>español</h1><p>info about español</p></tip><tip id="2"><h1>português</h1><p>info about português</p></tip></tips>
>>
>> I tried to work around the problem by simply loading that resulting
>> file and saving it again:
>> doc2 <- xmlInternalTreeParse(file = "test2.xml", asTree = T)
>> saveXML(doc2, file = "test_word_around.xml", indent = T)
>>
>> but still don't get the indents.
>>
>> Does setting encoding = "UTF-8" override indent = TRUE in saveXML()?
> I can confirm the same issue happens here. What is interesting is that
> without the 'file' argument, the returned string includes the expected
> line breaks and spacing. These do not appear when redirecting the output
> to a file.
> 
>> saveXML(doc, encoding="UTF-8", indent=T)
> [1] "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<tips>\n  <tip id=\"1\">\n    <h1>español</h1>\n    <p>info about español</p>\n  </tip>\n  <tip id=\"2\">\n    <h1>português</h1>\n    <p>info about português</p>\n  </tip>\n</tips>\n"
> 
>> saveXML(doc, encoding="UTF-8", indent=T, file="test.xml")
> 
> Contents of test.xml:
> <?xml version="1.0" encoding="UTF-8"?>
> <tips><tip id="1"><h1>español</h1><p>info about español</p></tip><tip id="2"><h1>português</h1><p>info about português</p></tip></tips>
> 
> 
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-redhat-linux-gnu (64-bit)
> 
> locale:
>  [1] LC_CTYPE=fr_FR.utf8       LC_NUMERIC=C             
>  [3] LC_TIME=fr_FR.utf8        LC_COLLATE=fr_FR.utf8    
>  [5] LC_MONETARY=fr_FR.utf8    LC_MESSAGES=fr_FR.utf8   
>  [7] LC_PAPER=C                LC_NAME=C                
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C           
> [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C      
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] XML_3.96-1.1
> 
> 
> Regards
> 
>


