[R] XML getNodeSet syntax for PUBMED XML export

Duncan Temple Lang duncan at wald.ucdavis.edu
Wed Sep 8 19:11:48 CEST 2010


Hi Rob

  library(XML)

  doc = xmlParse("url for document")

  # the attribute in the PubMed export is MajorTopicYN
  dn = getNodeSet(doc, "//DescriptorName[@MajorTopicYN = 'Y']")

will do what you want, I believe.

XPath - a language for expressing such queries - is quite
simple: it is built from a few primitive concepts that can be
combined into complex compound queries. In //DescriptorName, the
// means "anywhere in the document" and DescriptorName is the node
test. The [] is a predicate that keeps only those matching nodes
for which its condition is true, here that the MajorTopicYN
attribute equals 'Y'.
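
As a minimal sketch of the difference between the node test and the
predicate, assuming the XML package is installed and using the fragment
from the original question as inline text (the variable names txt,
all_desc, major_desc and major_qual are illustrative only):

```r
library(XML)

# Sample fragment copied from the quoted PubMed export
txt <- '<MeshHeadingList>
<MeshHeading>
<DescriptorName MajorTopicYN="Y">Antibodies, Monoclonal</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName MajorTopicYN="N">Blood Platelets</DescriptorName>
<QualifierName MajorTopicYN="N">immunology</QualifierName>
<QualifierName MajorTopicYN="Y">physiology</QualifierName>
<QualifierName MajorTopicYN="N">ultrastructure</QualifierName>
</MeshHeading>
</MeshHeadingList>'

# Parse the string directly rather than reading from a URL or file
doc <- xmlParse(txt, asText = TRUE)

# Node test only: every DescriptorName, whatever its attribute value
all_desc <- xpathSApply(doc, "//DescriptorName", xmlValue)

# Node test plus predicate: only DescriptorName nodes with MajorTopicYN = "Y"
major_desc <- xpathSApply(doc, "//DescriptorName[@MajorTopicYN = 'Y']", xmlValue)

# The same predicate applied to a different element name
major_qual <- xpathSApply(doc, "//QualifierName[@MajorTopicYN = 'Y']", xmlValue)

print(all_desc)
print(major_desc)
print(major_qual)
```

Single quotes around Y inside the double-quoted R string avoid any
escaping, and the space in the tag is not part of the query at all: the
attribute is addressed with @ inside the predicate.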

   D.

On 9/8/10 9:09 AM, Rob James wrote:
>      I am looking for the syntax to capture XML tags marked with
> /DescriptorName MajorTopicYN="Y"/, but the combination of the internal
> space (between "Name" and "Major") and the embedded quote marks is
> defeating me. I can get all the "DescriptorName" tags, but these include
> both the MajorTopicYN = "Y" and "N" variants. Any suggestions?
> 
> Thanks in advance.
> 
> Prototype text from PUBMED
> 
> <MeshHeadingList>
> <MeshHeading>
> <DescriptorName MajorTopicYN="Y">Antibodies, Monoclonal</DescriptorName>
> </MeshHeading>
> <MeshHeading>
> <DescriptorName MajorTopicYN="N">Blood Platelets</DescriptorName>
> <QualifierName MajorTopicYN="N">immunology</QualifierName>
> <QualifierName MajorTopicYN="Y">physiology</QualifierName>
> <QualifierName MajorTopicYN="N">ultrastructure</QualifierName>
> </MeshHeading>
> </MeshHeadingList>
> 


