[R] How to extract following data

Dieter Menne dieter.menne at menne-biomed.de
Wed Nov 5 08:36:02 CET 2008


RON70 <ron_michael70 <at> yahoo.com> writes:

> 
> - <Temp diffgr:id="Temp14" msdata:rowOrder="13">
>   <Date>2005-01-17T00:00:00+05:30</Date> 
>   <SecurityID>10149</SecurityID> 
>   <PriceClose>1288.40002</PriceClose> 
>   </Temp>
....

This looks suspiciously like XML. Let's hope the real data look more like the
example below, without the leading "-" and with a proper XML header:

<?xml version="1.0" encoding="utf-8"?>
<temps>
<Temp diffgr:id="Temp14" msdata:rowOrder="13">
  <Date>2005-01-17T00:00:00+05:30</Date> 
  <SecurityID>10149</SecurityID> 
  <PriceClose>1288.40002</PriceClose> 
</Temp>
<Temp diffgr:id="Temp15" msdata:rowOrder="14">
  <Date>2005-01-18T00:00:00+05:30</Date> 
  <SecurityID>10149</SecurityID> 
  <PriceClose>1291.69995</PriceClose> 
</Temp>
<Temp diffgr:id="Temp16" msdata:rowOrder="15">
  <Date>2005-01-19T00:00:00+05:30</Date> 
  <SecurityID>10149</SecurityID> 
  <PriceClose>1288.19995</PriceClose> 
</Temp>
</temps>

The following code should get you started; the dates still need some massaging.
The parser emits warnings because the namespace prefixes diffgr and msdata are
not declared in this fragment. For a first attempt you can ignore these, but it
is better to work from the complete file, which presumably declares them.

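library(XML)
# Parse into an internal tree so that XPath queries can be run on it
doc <- xmlInternalTreeParse("temp.xml")
# Extract the text of every matching node; xmlValue returns character strings
Date <- sapply(getNodeSet(doc, "//Date"), xmlValue)
SecurityID <- as.integer(sapply(getNodeSet(doc, "//SecurityID"), xmlValue))
PriceClose <- as.numeric(sapply(getNodeSet(doc, "//PriceClose"), xmlValue))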
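
As for massaging the dates, here is a minimal sketch in base R. It assumes that
only the calendar day matters, so the time and the +05:30 offset are simply
cut off. The values are typed in by hand from the sample above so that the
snippet runs on its own; the names Day and prices are made up for illustration.

```r
# Timestamps as they appear in the sample XML (character strings)
Date <- c("2005-01-17T00:00:00+05:30",
          "2005-01-18T00:00:00+05:30",
          "2005-01-19T00:00:00+05:30")
SecurityID <- c(10149L, 10149L, 10149L)
PriceClose <- c(1288.40002, 1291.69995, 1288.19995)

# Assumption: only the calendar day matters, so keep the first 10
# characters (YYYY-MM-DD) and drop the time and the +05:30 offset
Day <- as.Date(substr(Date, 1, 10))

# Collect the columns in a data frame, one row per <Temp> element
prices <- data.frame(Date = Day, SecurityID = SecurityID,
                     PriceClose = PriceClose)
print(prices)
```

If the time of day or the offset does matter, as.POSIXct with a format string
containing %z may be the better route; see ?strptime for how offsets are
handled on input.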



Dieter


