[R] question about XML (package)
Duncan Temple Lang
duncan at research.bell-labs.com
Tue Mar 4 14:58:03 CET 2003
Apologies for the late reply; I was travelling and didn't see the message
until Ott brought it to my attention today.
Indeed, Stephen's diagnosis and workaround are correct: excessive
trimming. I have just put a new version of the package (XML_0.93-2)
on the Omegahat web site:
http://www.omegahat.org/RSXML
So with the input
<?xml version="1.0"?>
<fields>
<v1>a1 </v1>
<v1>1 </v1>
<v1>a b</v1>
<v1>a b c</v1>
<v1> a b c </v1>
<v2> 2 </v2>
<v3> 3</v3>
<v3> 3 </v3>
</fields>
we get
> v = xmlRoot(xmlTreeParse("oot.xml"))
> xmlSApply(v, xmlValue)
     v1      v1      v1      v1      v1      v2      v3      v3
   "a1"     "1"   "a b" "a b c" "a b c"     "2"     "3"     "3"
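To check an installed version of the package, the comparison between the default trimming and Stephen's trim=FALSE workaround can be sketched as follows. This is only a minimal sketch under stated assumptions: it parses the contents of the originally reported test.xml from a string (via asText=TRUE) instead of from a file, the exact spacing inside the elements is taken from the report as quoted below, and the variable names are illustrative.

```r
library(XML)

# Contents of the originally reported test.xml, held in a string.
# (Assumption for illustration: the thread reads this from a file.)
txt <- paste(
  '<?xml version="1.0"?>',
  '<fields>',
  '<v1>1 </v1>',
  '<v2> 2 </v2>',
  '<v3> 3</v3>',
  '</fields>',
  sep = "\n"
)

# Parse once with the default trim=TRUE and once with trim=FALSE.
trimmed   <- xmlRoot(xmlTreeParse(txt, asText = TRUE))
untrimmed <- xmlRoot(xmlTreeParse(txt, asText = TRUE, trim = FALSE))

# Compare the text content of <v1> under both settings.
v1_trimmed   <- xmlValue(trimmed[["v1"]])
v1_untrimmed <- xmlValue(untrimmed[["v1"]])
print(c(trimmed = v1_trimmed, untrimmed = v1_untrimmed))
```

With a fixed version, the trimmed value of v1 should no longer be empty; with trim=FALSE the trailing space is expected to be kept.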
Thanks for bringing it to my attention.
D.
Stephen C. Upton wrote:
> Ott,
>
> I get the same thing on the Windows version. If you set "trim=FALSE" in the
> xmlTreeParse function call, it works. I suspect xmlTreeParse is trimming
> a little too much! But xmlTreeParse (with trim=TRUE) also works when the
> first character is a non-digit - see below. We'll probably need to look
> at the source code, unless someone else has better insight.
>
> > a <- xmlTreeParse("test.xml",trim=FALSE)
> > a$doc
> $file
> [1] "test.xml"
>
> $version
> [1] "1.0"
>
> $children
> $children$fields
> <fields>
>
>
> <v1>
> 1
> </v1>
>
>
> <v2>
> 2
> </v2>
>
>
> <v3>
> 3
> </v3>
>
>
> </fields>
>
> However, it also works when the first character is a non-digit - so far.
> Here's a revised test.xml file:
> <?xml version="1.0"?>
> <fields>
> <v1>a1 </v1>
> <v2>2 </v2>
> <v3> 3</v3>
> </fields>
>
> > a <- xmlTreeParse("test.xml")
> > a
> $doc
> $file
> [1] "test.xml"
>
> $version
> [1] "1.0"
>
> $children
> $children$fields
> <fields>
> <v1>
> a1
> </v1>
> <v2>
> </v2>
> <v3>
> 3
> </v3>
> </fields>
>
> HTH
> steve
>
>
> -------------------------------
> > version
> _
> platform i386-pc-mingw32
> arch i386
> os mingw32
> system i386, mingw32
> status
> major 1
> minor 6.2
> year 2003
> month 01
> day 10
> language R -
>
> Ott Toomet wrote:
>
> > Hi,
> >
> > I have a problem with spacing in XML files when reading them with
> > xmlTreeParse. I don't know the exact specification of XML, but
> > according to what I have read before it should work.
> >
> > consider a tiny test.xml file:
> >
> > <?xml version="1.0"?>
> > <fields>
> > <v1>1 </v1>
> > <v2> 2 </v2>
> > <v3> 3</v3>
> > </fields>
> >
> > i.e. I have three fields v1, v2 and v3 which differ only by spacing.
> > Now when reading it as
> >
> > > a <- xmlTreeParse("/home/otoomet/tyyq/Taani-piir/andmed/test.xml")
> > > a$doc$children$fields
> > <fields>
> > <v1>
> > </v1>
> > <v2>
> > 2
> > </v2>
> > <v3>
> > 3
> > </v3>
> > </fields>
> >
> > you can see that field v1 is empty. Is it my misinterpretation, or a
> > problem with the library?
> >
> > Thanks in advance,
> >
> > Ott
> >
> > -----------------
> > > version
> > _
> > platform i686-pc-linux-gnu
> > arch i686
> > os linux-gnu
> > system i686, linux-gnu
> > status
> > major 1
> > minor 5.1
> > year 2002
> > month 06
> > day 17
> > language R
> > ------------
> > Package: XML
> > Version: 0.93-1
> > Date: 2002/11/06
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > http://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
--
_______________________________________________________________
Duncan Temple Lang duncan at research.bell-labs.com
Bell Labs, Lucent Technologies office: (908)582-3217
700 Mountain Avenue, Room 2C-259 fax: (908)582-3340
Murray Hill, NJ 07974-2070
http://cm.bell-labs.com/stat/duncan