[R] How to extract following data
Dieter Menne
dieter.menne at menne-biomed.de
Wed Nov 5 08:36:02 CET 2008
RON70 <ron_michael70 <at> yahoo.com> writes:
>
> - <Temp diffgr:id="Temp14" msdata:rowOrder="13">
> <Date>2005-01-17T00:00:00+05:30</Date>
> <SecurityID>10149</SecurityID>
> <PriceClose>1288.40002</PriceClose>
> </Temp>
....
This looks suspiciously like XML. Let's hope the real data are more like the
sample below, without the "-" and with a proper header:
<?xml version="1.0" encoding="utf-8"?>
<temps>
<Temp diffgr:id="Temp14" msdata:rowOrder="13">
<Date>2005-01-17T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.40002</PriceClose>
</Temp>
<Temp diffgr:id="Temp15" msdata:rowOrder="14">
<Date>2005-01-18T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1291.69995</PriceClose>
</Temp>
<Temp diffgr:id="Temp16" msdata:rowOrder="15">
<Date>2005-01-19T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.19995</PriceClose>
</Temp>
</temps>
The following code should get you started; the Dates still need some massaging.
The parser gives warnings because the prefixes diffgr and msdata are not
declared. For a first attempt you can ignore these, but it is better to get the
full data set.
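library(XML)
# Parse the file into an internal (C-level) tree so XPath queries can be used
doc = xmlInternalTreeParse("temp.xml")
# Pull the text of every matching node and convert it where appropriate
Date = sapply(getNodeSet(doc, "//Date"), xmlValue)
SecurityID = as.integer(sapply(getNodeSet(doc, "//SecurityID"), xmlValue))
PriceClose = as.numeric(sapply(getNodeSet(doc, "//PriceClose"), xmlValue))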
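As for massaging the Dates, here is a minimal sketch of one way it could go. It is not tested against the real file. It assumes the sample above is pasted in as a string with the diffgr/msdata attributes dropped (so there are no prefix warnings), and that only the calendar date matters, so the time and the +05:30 offset are simply cut off. The names txt and prices are made up for the illustration.

```r
library(XML)

# Inline copy of the sample data; the diffgr:/msdata: attributes are left out
# on purpose, so the parser has no undeclared prefixes to warn about.
txt <- '<?xml version="1.0" encoding="utf-8"?>
<temps>
<Temp>
<Date>2005-01-17T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.40002</PriceClose>
</Temp>
<Temp>
<Date>2005-01-18T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1291.69995</PriceClose>
</Temp>
<Temp>
<Date>2005-01-19T00:00:00+05:30</Date>
<SecurityID>10149</SecurityID>
<PriceClose>1288.19995</PriceClose>
</Temp>
</temps>'

# asText = TRUE tells xmlParse that txt holds the XML itself, not a file name
doc <- xmlParse(txt, asText = TRUE)

# xpathSApply applies xmlValue to every node matched by the XPath expression
rawDate <- xpathSApply(doc, "//Date", xmlValue)

prices <- data.frame(
  # Keep only the leading "YYYY-MM-DD"; time of day and UTC offset are dropped
  Date       = as.Date(substr(rawDate, 1, 10)),
  SecurityID = as.integer(xpathSApply(doc, "//SecurityID", xmlValue)),
  PriceClose = as.numeric(xpathSApply(doc, "//PriceClose", xmlValue))
)

print(prices)
```

If the time of day or the offset matters for the real data, the substr() shortcut is too crude and the full string would have to be converted with as.POSIXct and a suitable format instead.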
Dieter