[R] subsetting/slicing xml2 nodesets

Tobias Fellinger tobby @end|ng |rom htu@@t
Wed Aug 21 12:00:59 CEST 2019


Dear R-help members,

I'm working with the xml2 package to parse an XML document, and I don't
understand how subsetting / slicing of xml_nodesets works. I'd expect
xml_find_all to return only children of the nodes I selected with [ or
[[, but it returns all matching nodes found in the whole document. I did
not find any documentation on the [ and [[ operators for xml_nodeset.
Below is a small example and the sessionInfo.

Thanks in advance,
Tobias Fellinger



# load package
library(xml2)

# test document as text
test_chr <- "
<html>
<body>
<p>paragraph 1</p>
<p>paragraph 2</p>
</body>
</html>
"

# parse test document
test_doc <- read_xml(test_chr)

# extract nodeset
test_nodeset <- xml_find_all(test_doc, "//p")

# subset nodeset (working as expected)
test_nodeset[1]
# {xml_nodeset (1)}
# [1] <p>paragraph 1</p>
test_nodeset[[1]]
# {xml_node}
# <p>

# extract from subset (not working as expected)
xml_find_all(test_nodeset[1], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>
xml_find_all(test_nodeset[[1]], "//p")
# {xml_nodeset (2)}
# [1] <p>paragraph 1</p>
# [2] <p>paragraph 2</p>
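
One possible explanation, offered as a sketch rather than a confirmed answer: in XPath, an expression starting with // is evaluated from the document root, whatever context node is passed in, while .// is evaluated relative to the context node. The example below uses a made-up document (the div layout and the names doc, divs, abs_hits and rel_hits are assumptions for illustration, not part of the test document above) to compare the two forms.

```r
library(xml2)

# hypothetical document: two <div> elements holding one and two paragraphs
doc <- read_xml(
  "<html><body><div><p>a</p></div><div><p>b</p><p>c</p></div></body></html>"
)

# select both <div> nodes, then work with the second one only
divs <- xml_find_all(doc, "//div")
second_div <- divs[[2]]

# absolute path: "//p" starts at the document root,
# so paragraphs from the whole document are returned
abs_hits <- xml_find_all(second_div, "//p")
length(abs_hits)
# [1] 3

# relative path: ".//p" starts at the context node,
# so only paragraphs inside the second <div> are returned
rel_hits <- xml_find_all(second_div, ".//p")
xml_text(rel_hits)
# [1] "b" "c"
```

Applied to the test document above, .//p on test_nodeset[[1]] should give an empty nodeset, since that <p> element has no <p> descendants of its own.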

sessionInfo()
# R version 3.6.0 (2019-04-26)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 7 x64 (build 7601) Service Pack 1
#
# Matrix products: default
#
# locale:
#   [1] LC_COLLATE=German_Austria.1252  LC_CTYPE=German_Austria.1252
#       LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
#       LC_TIME=German_Austria.1252
#
# attached base packages:
#   [1] stats     graphics  grDevices utils     datasets  methods   base
#
# other attached packages:
#   [1] xml2_1.2.2
#
# loaded via a namespace (and not attached):
#   [1] compiler_3.6.0 tools_3.6.0    Rcpp_1.0.2     packrat_0.5.0


