[R] How to extract following data

Gabor Grothendieck ggrothendieck at gmail.com
Wed Nov 5 13:32:03 CET 2008


Here is another solution made slightly shorter by using
strapply twice:

library(gsubfn)  # provides strapply
library(zoo)
z <- zoo(strapply(Lines, "[0-9]+[.][0-9]+", as.numeric)[[1]],
  strapply(Lines, "....-..-..", as.Date)[[1]])

or to create a data frame:

DF <- data.frame(date = strapply(Lines, "....-..-..", as.Date)[[1]],
     price = strapply(Lines, "[0-9]+[.][0-9]+", as.numeric)[[1]])
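
For anyone without gsubfn installed, roughly the same extraction can be sketched in base R with regmatches and gregexpr. This is a swapped-in technique, not strapply itself. The helper name extract_all is made up for illustration, and the date pattern is written with explicit digit classes rather than dots, which is an assumption about how strict you want the match to be. The last two lines print the result in the "date price" layout asked for in the original question; sprintf with %.5f is used because the default printing of a numeric column would show fewer decimals.

```r
# Sample input copied from the original question (three <Temp> records).
Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
 <Date>2005-01-17T00:00:00+05:30</Date>
 <SecurityID>10149</SecurityID>
 <PriceClose>1288.40002</PriceClose>
 </Temp>
- <Temp diffgr:id="Temp15" msdata:rowOrder="14">
 <Date>2005-01-18T00:00:00+05:30</Date>
 <SecurityID>10149</SecurityID>
 <PriceClose>1291.69995</PriceClose>
 </Temp>
- <Temp diffgr:id="Temp16" msdata:rowOrder="15">
 <Date>2005-01-19T00:00:00+05:30</Date>
 <SecurityID>10149</SecurityID>
 <PriceClose>1288.19995</PriceClose>
 </Temp>'

# Hypothetical helper (not part of gsubfn or base R):
# return every substring of the single string x that matches pattern.
extract_all <- function(pattern, x) {
  regmatches(x, gregexpr(pattern, x))[[1]]
}

# Dates look like yyyy-mm-dd; prices are digits, a dot, then digits.
dates  <- as.Date(extract_all("[0-9]{4}-[0-9]{2}-[0-9]{2}", Lines))
prices <- as.numeric(extract_all("[0-9]+[.][0-9]+", Lines))

DF <- data.frame(date = dates, price = prices)

# One "date price" line per record, keeping five decimal places.
out <- sprintf("%s %.5f", format(DF$date), DF$price)
writeLines(out)
```

With the three sample records this should print 2005-01-17 1288.40002, 2005-01-18 1291.69995 and 2005-01-19 1288.19995, one per line. Like the strapply versions, it assumes the SecurityID values never contain a decimal point; otherwise the price pattern would pick them up too.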

On Wed, Nov 5, 2008 at 6:22 AM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> As others have pointed out, it's close to XML but not quite
> there; however, you could use strapply in gsubfn to extract
> the data.  It pulls out the data matching the regular expression,
> giving a vector, vec, consisting of: date price date price ...
> Pulling out the odd and even elements separately and
> converting them to Date and numeric, respectively, gives the
> resulting data.frame.
>
> See
> http://gsubfn.googlecode.com
> for more on the gsubfn package and
> the three vignettes in the zoo package for more on zoo.
>
> Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
>  <Date>2005-01-17T00:00:00+05:30</Date>
>  <SecurityID>10149</SecurityID>
>  <PriceClose>1288.40002</PriceClose>
>  </Temp>
> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
>  <Date>2005-01-18T00:00:00+05:30</Date>
>  <SecurityID>10149</SecurityID>
>  <PriceClose>1291.69995</PriceClose>
>  </Temp>
> - <Temp diffgr:id="Temp16" msdata:rowOrder="15">
>  <Date>2005-01-19T00:00:00+05:30</Date>
>  <SecurityID>10149</SecurityID>
>  <PriceClose>1288.19995</PriceClose>
>  </Temp>'
>
> library(gsubfn)
> vec <- strapply(Lines, "....-..-..|[0-9]+[.][0-9]+")[[1]]
> ix <- seq_along(vec) %% 2 == 1
> DF <- data.frame(date = as.Date(vec[ix]), price = as.numeric(vec[!ix]))
>
> # or, instead of the last line, you could convert it to a zoo object so
> # that it's in a more convenient form for time series manipulation:
>
> library(zoo)
> z <- zoo(as.numeric(vec[!ix]), as.Date(vec[ix]))
>
>
>
> On Wed, Nov 5, 2008 at 1:22 AM, RON70 <ron_michael70 at yahoo.com> wrote:
>>
>> Hi everyone,
>>
>> I have this kind of raw dataset :
>>
>> - <Temp diffgr:id="Temp14" msdata:rowOrder="13">
>>  <Date>2005-01-17T00:00:00+05:30</Date>
>>  <SecurityID>10149</SecurityID>
>>  <PriceClose>1288.40002</PriceClose>
>>  </Temp>
>> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
>>  <Date>2005-01-18T00:00:00+05:30</Date>
>>  <SecurityID>10149</SecurityID>
>>  <PriceClose>1291.69995</PriceClose>
>>  </Temp>
>> - <Temp diffgr:id="Temp16" msdata:rowOrder="15">
>>  <Date>2005-01-19T00:00:00+05:30</Date>
>>  <SecurityID>10149</SecurityID>
>>  <PriceClose>1288.19995</PriceClose>
>>  </Temp>
>>
>> I am looking for an R procedure to extract the data from this, in the
>> following format:
>>
>> 2005-01-17 1288.40002
>> 2005-01-18 1291.69995
>> 2005-01-19 1288.19995
>>
>> Can R help me to do this?
>>
>


