[R] subsetting/slicing xml2 nodesets
Tobias Fellinger
tobby @end|ng |rom htu@@t
Wed Aug 21 12:00:59 CEST 2019
Dear R-help members,
I'm working with the xml2 package to parse an XML document, and I don't
understand how subsetting / slicing of xml_nodeset objects works. I'd expect
xml_find_all to return only children of the nodes I selected with [ or
[[, but it returns all matching nodes in the whole document. I could not find
any documentation on the [ and [[ operators for xml_nodeset. Below is a
small example and the sessionInfo() output.
Thanks in advance, Tobias Fellinger
# load package
library(xml2)
# test document as text
test_chr <- "
<html>
<body>
<p>paragraph 1</p>
<p>paragraph 2</p>
</body>
</html>
"
# parse test document
test_doc <- read_xml(test_chr)
# extract nodeset
test_nodeset <- xml_find_all(test_doc, "//p")
# subset nodeset (working as expected)
test_nodeset[1]
# {xml_nodeset (1)}
# [1] <p>paragraph 1</p>
test_nodeset[[1]]
# {xml_node}
# <p>
# extract from subset (not working as expected)
xml_find_all(test_nodeset[1], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>
xml_find_all(test_nodeset[[1]], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>
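A possible explanation for the two results above, offered as a sketch and not as a statement from the xml2 documentation: in XPath, an expression that starts with // is an absolute path, so it is evaluated from the root of the document that contains the context node, whichever node was selected with [ or [[. Prefixing the expression with a dot (.//p) makes it relative to the context node. The sample document and the names doc, divs, abs_hits and rel_hits below are illustrative assumptions and not part of the original example.

```r
library(xml2)

# hypothetical document: two <div> elements holding different paragraphs
doc <- read_xml("<root><div><p>a</p><p>b</p></div><div><p>c</p></div></root>")
divs <- xml_find_all(doc, "//div")

# absolute XPath: searched from the document root, the context node is ignored
abs_hits <- xml_find_all(divs[[1]], "//p")
length(abs_hits)
# [1] 3

# relative XPath: only <p> descendants of the first <div>
rel_hits <- xml_find_all(divs[[1]], ".//p")
xml_text(rel_hits)
# [1] "a" "b"
```

Applied to the example above, ".//p" on test_nodeset[[1]] would be expected to return an empty nodeset, because a <p> node is not a descendant of itself.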
sessionInfo()
# R version 3.6.0 (2019-04-26)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=German_Austria.1252 LC_CTYPE=German_Austria.1252
#     LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
#     LC_TIME=German_Austria.1252
#
# attached base packages:
# [1] stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] xml2_1.2.2
#
# loaded via a namespace (and not attached):
# [1] compiler_3.6.0 tools_3.6.0 Rcpp_1.0.2 packrat_0.5.0