[R] XML and str
Martin Maechler
maechler at stat.math.ethz.ch
Sat Feb 10 16:54:23 CET 2007
>>>>> "DTL" == Duncan Temple Lang <duncan at wald.ucdavis.edu>
>>>>> on Sat, 10 Feb 2007 07:18:30 -0800 writes:
DTL> Martin Maechler wrote:
>>>>>>> "Ashley" == Ashley Ford <ford at signal.QinetiQ.com>
>>>>>>> on Wed, 07 Feb 2007 17:18:56 +0000 writes:
>>
Ashley> If I read in an .xml file eg with
>>
>> >> xeg <- xmlTreeParse(system.file("exampleData", "test.xml",
>> package="XML"))
>>
Ashley> It appears to be OK however examining it with str() gives an apparent
Ashley> error
>>
>> >> str(xeg, 2)
Ashley> List of 2
Ashley> $ doc:List of 3
Ashley> ..$ file : list()
Ashley> .. ..- attr(*, "class")= chr [1:2] "XMLComment" "XMLNode"
Ashley> ..$ version :List of 4
Ashley> .. ..- attr(*, "class")= chr "XMLNode"
Ashley> ..$ children:Error in obj$children[[...]] : subscript out of bounds
>>
Ashley> I am unsure if this is a feature or a bug and if the latter whether it
Ashley> is in XML or str, it is not causing a problem but I would like to
Ashley> understand what is happening, any ideas ?
>>
>> Yes - thank you for providing a well-reproducible example.
>> After setting
>> options(error = recover)
>>
>> I do
>>
>> > obj <- xeg$doc
>> > mode(obj) # "list"
>> [1] "list"
>> > is.list(obj) # TRUE
>> [1] TRUE
>> > length(obj) # 3
>> [1] 3
>> > obj[[3]] # ---> the error you see above.
>> Error in obj$children[[...]] : subscript out of bounds
>>
>> Enter a frame number, or 0 to exit
>>
>> 1: obj[[3]]
>> 2: `[[.XMLDocumentContent`(obj, 3)
>>
>> Selection: 0
>>
>> > obj$children # works, should be identical to obj[[3]]
>> $comment
>> <!--A comment-->
>>
>> $foo
>> <foo x="1">
>> <element attrib1="my value"/>
>> ......
>>
>> This shows that the XML package implements the "[[" method
>> wrongly IMHO and also inconsistently with the "$" method.
>>
>>> From a strict OOP view, the XML author could argue that
>> this is not a bug in XML but rather str() which assumes that
>> x[[length(x)]] works for objects of mode "list" even when they
>> are not of *class* "list", but I hope he would still rather
>> consider changing [[.XMLDocumentContent ...
>>
DTL> More likely, the appropriate fix is to have
DTL> length() return the relevant value.
Hmm.
> library(XML)
> xeg <- xmlTreeParse(system.file("exampleData", "test.xml", package= "XML"))
> obj <- xeg$doc
> mode(obj) # "list"
[1] "list"
> is.list(obj) # TRUE
[1] TRUE
> length(obj) # 3
[1] 3
> obj[[3]] # ---> the error you see above.
Error in obj$children[[...]] : subscript out of bounds
> names(obj)
[1] "file" "version" "children"
> class(obj)
[1] "XMLDocumentContent"
> methods(class=class(obj))
[1] xmlApply.XMLDocumentContent* [[.XMLDocumentContent*
[3] xmlRoot.XMLDocumentContent* xmlSApply.XMLDocumentContent*
> XML:::`[[.XMLDocumentContent`
function (obj, ...)
{
obj$children[[...]]
}
<environment: namespace:XML>
So length(obj) is 3, and obj is a simple S3 object:
just a list with 3 named components.
Do you really want to define length(.) to also return the
length of obj$children instead of the length() of the list
itself?
With that you'd have your XMLDocumentContent objects ``look''
like lists with three named components on one hand
(and help(xmlTreeParse) does mention these components)
but behave in other contexts as if they were just their own component
'obj$children'. Of course you then should also define
print.XMLDocumentContent() and
str.XMLDocumentContent() accordingly,
so users would barely know about the "file" and "version"
component of 'obj'.
But is this really desirable ?
With the above "[[.XMLDoc..." you break the basic S-language
premise that "[[" and "$" behave consistently.
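The inconsistency can be reproduced without the XML package. What follows is only a minimal sketch under stated assumptions: the class name "ToyDoc" and its contents are hypothetical stand-ins, and only the one-line "[[" method mirrors the XML:::`[[.XMLDocumentContent` definition printed above. It also shows what a length() method along the suggested lines would and would not change.

```r
# Hypothetical stand-in for an XMLDocumentContent object:
# a plain list with three named components and an S3 class attribute.
doc <- structure(
  list(file     = "test.xml",
       version  = "1.0",
       children = list(comment = "A comment", foo = "a node")),
  class = "ToyDoc"
)

# Same shape as the XML package's method: "[[" indexes into $children,
# not into the list itself.
`[[.ToyDoc` <- function(obj, ...) obj$children[[...]]

n_before <- length(doc)   # still counts the three list components

# What str() effectively attempts: x[[length(x)]].
# Here that becomes children[[3]], which does not exist.
err <- tryCatch(doc[[length(doc)]], error = function(e) e)

# "$" is unaffected and reaches the real third component.
kids <- doc$children

# Assumed variant of the proposed fix: make length() agree with "[[".
length.ToyDoc <- function(x) length(x$children)

n_after <- length(doc)        # now counts the children
last    <- doc[[length(doc)]] # works again
nms     <- names(doc)         # but names() still reports the list components
```

With the length() method in place, x[[length(x)]] succeeds, yet names(doc) still lists "file", "version" and "children". The object then looks like a three-component list in some contexts and like its own children in others, which is the mixed behaviour questioned above.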
You could solve "everything" elegantly if you used S4 instead of S3
classes, since there's no defined correspondence between slot
access and "[[" (and yes, then (with S4), I'd agree that
setMethod("length", "XMLDocumentContent",
function(x) length(x@children))
would be needed too -- and fine).
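As an illustration of the S4 route, here is a minimal sketch, not the XML package's actual design. The class name "ToyDocS4", its slot types, and the "[[" method are assumptions made for the example; only the length() method follows the setMethod() call given above.

```r
library(methods)

# Hypothetical S4 counterpart: file, version and children are slots,
# so slot access (@) is independent of "[[" and length().
setClass("ToyDocS4",
         representation(file     = "character",
                        version  = "character",
                        children = "list"))

# length() reports the number of children, as in the setMethod() call above.
setMethod("length", "ToyDocS4",
          function(x) length(x@children))

# "[[" indexes the children; this method is an assumption for the sketch.
setMethod("[[", "ToyDocS4",
          function(x, i, j, ...) x@children[[i]])

doc4 <- new("ToyDocS4",
            file     = "test.xml",
            version  = "1.0",
            children = list(comment = "A comment", foo = "a node"))

n4    <- length(doc4)         # number of children
last4 <- doc4[[length(doc4)]] # consistent with length()
file4 <- doc4@file            # metadata is reached through slots
```

Here length() and "[[" agree with each other, and the "file" and "version" information is reached through slots, so nothing depends on the object also behaving like a three-component list.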
Martin
DTL> I even recall considering this at the time of writing
DTL> the package initially. But that was back in 1999/2000
DTL> and S4 and R/S-Plus compatibility were not what they
DTL> are now. It could be changed. Not certain when I will
DTL> get a chance.
Ashley> examining components eg
>> >> str(xeg$doc$children,2)
>>
Ashley> List of 2
Ashley> $ comment: list()
Ashley> ..- attr(*, "class")= chr [1:2] "XMLComment" "XMLNode"
Ashley> etc
>>
Ashley> is OK.
>>
Ashley> XML Version 1.4-1,
Ashley> same behaviour on Windows and Linux, R version 2.4.1 (2006-12-18)
>>