[R] The time series analysis functions/packages don't seem to like my data

Ted Byers r.ted.byers at gmail.com
Sat Jul 4 02:54:42 CEST 2009


Sorry, I should have read the read.zoo documentation before replying
to thank Gabor for his response.

Here is how it starts:

"read.zoo(zoo) R Documentation

Reading and Writing zoo Series
Description
read.zoo and write.zoo are convenience functions for reading and
writing "zoo" series from/to text files. They are convenience
interfaces to read.table and write.table, respectively.

Usage
read.zoo(file, format = "", tz = "", FUN = NULL,
  regular = FALSE, index.column = 1, aggregate = FALSE, ...)"

Clearly this should solve both our problems.
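
A minimal sketch of how this might look for a tab-delimited file like
mine, assuming the first column holds dates in YYYY-MM-DD form and the
second holds prices. The temporary file and its three rows are only a
stand-in for BP.csv (the values are copied from the tail of the output
quoted below). Note that sep and header are not read.zoo arguments of
their own; they are passed through "..." to read.table.

```r
library(zoo)

# Hypothetical stand-in for BP.csv: tab-delimited, no header, date then price
tmp <- tempfile(fileext = ".csv")
writeLines(c("2009-06-23\t47.11",
             "2009-06-24\t46.97",
             "2009-06-25\t47.43"), tmp)

# format tells read.zoo how to parse the index column as Date;
# sep and header are handed on to read.table
prices <- read.zoo(tmp, format = "%Y-%m-%d", sep = "\t", header = FALSE)

class(prices)         # a "zoo" series
class(index(prices))  # indexed by "Date"

# Simple period-over-period returns: (p[t] - p[t-1]) / p[t-1].
# lag(prices, -1) aligns the previous price with the current date.
returns <- diff(prices) / lag(prices, -1)
returns
```

As far as I can tell, the drawdown functions in PerformanceAnalytics
expect returns rather than prices, so it is probably the returns series
and not the prices that should go to table.Drawdowns. I have not
verified that here.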

Cheers,

Ted

On Fri, Jul 3, 2009 at 8:40 PM, Mark Knecht<markknecht at gmail.com> wrote:
> On Fri, Jul 3, 2009 at 4:34 PM, Ted Byers<r.ted.byers at gmail.com> wrote:
>> Hi Mark
>>
>> Thanks for replying.
>>
>> Here is a short snippet that reproduces the problem:
>>
>> library(PerformanceAnalytics)
>> thedata = read.csv("K:\\Work\\SignalTest\\BP.csv", sep = "\t", header = FALSE, na.strings = "")
>> thedata
>> x = as.timeseries(thedata)
>> x
>> table.Drawdowns(thedata,top = 10)
>> table.Drawdowns(thedata$V2, top = 10)
>>
>> The object 'thedata' has exactly what I expected. The line 'thedata'
>> prints the correct contents of the file with each row prepended by a
>> line number.  The last few lines are:
>>
>> 8191 2009-06-17 48.40
>> 8192 2009-06-18 47.72
>> 8193 2009-06-19 48.83
>> 8194 2009-06-22 46.85
>> 8195 2009-06-23 47.11
>> 8196 2009-06-24 46.97
>> 8197 2009-06-25 47.43
>>
>> The number of lines (8197), dates (and their format) and prices are correct.
>>
>> The last four lines produce the following output:
>>> x = as.timeseries(thedata)
>> Error: could not find function "as.timeseries"
>>> x
>> Error: object 'x' not found
>>> table.Drawdowns(thedata,top = 10)
>> Error in 1 + na.omit(x) : non-numeric argument to binary operator
>>> table.Drawdowns(thedata$V2, top = 10)
>> Error in if (thisSign == priorSign) { :
>>  missing value where TRUE/FALSE needed
>>>
>>
>> Are the functions in your example in Rmetrics or PerformanceAnalytics?
>> (like I said, I am just beginning this exploration, and I started with
>> table.Drawdowns because it produces information that I need first)
>> And given that my data is in tab delimited files, and can be read
>> using read.csv, how do I feed my data into your four statements?
>>
>> My guess is I am missing something in coercing my data in (the data
>> frame?) thedata into a timeseries array of the sort the time series
>> analysis functions need: and one of the things I find a bit confusing
>> is that some of the documentation for this mentions S3 classes and
>> some mentions S4 classes (I don't know if that means I have to make
>> multiple copies of my data to get the output I need).  I could coerce
>> thedata$V2 into a numeric vector, but I'd rather not separate the
>> prices from their dates unless that is necessary (how would one
>> produce monthly, annual or annualized rates of return if one did
>> that?).
>>
>> Thanks
>>
>> Ted
>>
>> On Fri, Jul 3, 2009 at 6:39 PM, Mark Knecht<markknecht at gmail.com> wrote:
>>> On Fri, Jul 3, 2009 at 2:48 PM, Ted Byers<r.ted.byers at gmail.com> wrote:
>>>> I have hundreds of megabytes of price data time series, and perl
>>>> scripts that extract it to tab delimited files (I have C++ programs
>>>> that must analyse this data too, so I get Perl to extract it rather
>>>> than have multiple connections to the DB).
>>>>
>>>> I can read the data into an R object without any problems.
>>>>
>>>> thedata = read.csv("K:\\Work\\SignalTest\\BP.csv", sep = "\t", header = FALSE, na.strings = "")
>>>> thedata
>>>>
>>>> The above statements give me precisely what I expect.  The last few
>>>> lines of output are:
>>>> 8190 2009-06-16 49.30
>>>> 8191 2009-06-17 48.40
>>>> 8192 2009-06-18 47.72
>>>> 8193 2009-06-19 48.83
>>>> 8194 2009-06-22 46.85
>>>> 8195 2009-06-23 47.11
>>>> 8196 2009-06-24 46.97
>>>> 8197 2009-06-25 47.43
>>>>
>>>> I have loaded Rmetrics and PerformanceAnalytics, among other packages.
>>>> I tried as.timeseries, but R 2.9.1 tells me there is no such function.
>>>> I tried as.ts(thedata), but that only replaces the date field by the
>>>> row label in 'thedata'.
>>>>
>>>> If I apply the PerformanceAnalytics drawdowns function to either
>>>> thedata or thedata$V2, I get errors:
>>>>> table.Drawdowns(thedata,top = 10)
>>>> Error in 1 + na.omit(x) : non-numeric argument to binary operator
>>>>> table.Drawdowns(thedata$V2, top = 10)
>>>> Error in if (thisSign == priorSign) { :
>>>>  missing value where TRUE/FALSE needed
>>>>>
>>>>
>>>> thedata$V2 by itself does give me the price data from the file.
>>>>
>>>> I am a relative novice in using R for timeseries, so I wouldn't be
>>>> surprised if I missed something that would be obvious to someone more
>>>> practiced in using R, but I don't see what that could be from the
>>>> documentation of the functions I am looking at using.  I have no
>>>> shortage of data, and I don't want to write C++ code, or perl code, to
>>>> do all the kinds of calculations provided in Rmetrics and
>>>> PerformanceAnalytics, but getting my data into the functions these
>>>> packages provide is killing me!
>>>>
>>>> What did I miss?
>>>>
>>>> Thanks
>>>>
>>>> Ted
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>> Could you supply some portion of the results when you run the example
>>> on your data? The example goes like:
>>>
>>> data(edhec)
>>> R=edhec[,"Funds.of.Funds"]
>>> findDrawdowns(R)
>>> sortDrawdowns(findDrawdowns(R))
>>>
>>> How are you using the function with your data?
>>>
>>> - Mark
>>>
>>
>
> Sorry, findDrawdowns is part of PerformanceAnalytics. I've added that
> to this code so you can just copy it and run it all.
>
> require(PerformanceAnalytics)
> data(edhec)
> class(edhec)
> R=edhec[,"Funds.of.Funds"]
> class(R)
> findDrawdowns(R)
> sortDrawdowns(findDrawdowns(R))
>
> This is a subject that interests me. My data is also read in using
> read.csv, which I think may cause similar problems should I ever want
> to use these functions, so I'm interested in how to solve it. I'm a
> newbie so be VERY careful about anything I say!
>
> What I see is that edhec is of class zoo, as is R. Here's what I did to check:
>
>> require(PerformanceAnalytics)
>> data(edhec)
>> class(edhec)
> [1] "zoo"
>> R=edhec[,"Funds.of.Funds"]
>
>
>
>> class(R)
> [1] "zoo"
>> findDrawdowns(R)
> <SNIP>
>
>> sortDrawdowns(findDrawdowns(R))
> <SNIP>
>
>
> Note that R, class zoo, has dates as the names and then a single column of data:
>
>>
>> R
> Jan 1997 Feb 1997 Mar 1997 Apr 1997 May 1997 Jun 1997 Jul 1997 Aug 1997
>   0.0317   0.0106  -0.0077   0.0009   0.0275   0.0225   0.0435   0.0051
> Sep 1997 Oct 1997 Nov 1997 Dec 1997 Jan 1998 Feb 1998 Mar 1998 Apr 1998
>   0.0334  -0.0099  -0.0034   0.0089  -0.0036   0.0256   0.0373   0.0125
> May 1998 Jun 1998 Jul 1998 Aug 1998 Sep 1998 Oct 1998 Nov 1998 Dec 1998
>  -0.0072   0.0021  -0.0007  -0.0616  -0.0037  -0.0002   0.0220   0.0222
> Jan 1999 Feb 1999 Mar 1999 Apr 1999 May 1999 Jun 1999 Jul 1999 Aug 1999
>   0.0202  -0.0063   0.0213   0.0400   0.0119   0.0282   0.0088   0.0028
> <SNIP>
>>
>
>> names(R)
>  [1] "1997-01-31" "1997-02-28" "1997-03-31" "1997-04-30" "1997-05-31"
>  [6] "1997-06-30" "1997-07-31" "1997-08-31" "1997-09-30" "1997-10-31"
> [11] "1997-11-30" "1997-12-31" "1998-01-31" "1998-02-28" "1998-03-31"
> [16] "1998-04-30" "1998-05-31" "1998-06-30" "1998-07-31" "1998-08-31"
> [21] "1998-09-30" "1998-10-31" "1998-11-30" "1998-12-31" "1999-01-31"
> [26] "1999-02-28" "1999-03-31" "1999-04-30" "1999-05-31" "1999-06-30"
> <SNIP>
>
>> as.matrix(R)
>                 R
> 1997-01-31  0.0317
> 1997-02-28  0.0106
> 1997-03-31 -0.0077
> 1997-04-30  0.0009
> 1997-05-31  0.0275
> 1997-06-30  0.0225
> 1997-07-31  0.0435
> 1997-08-31  0.0051
> 1997-09-30  0.0334
> 1997-10-31 -0.0099
> 1997-11-30 -0.0034
> 1997-12-31  0.0089
> 1998-01-31 -0.0036
> 1998-02-28  0.0256
> 1998-03-31  0.0373
> 1998-04-30  0.0125
>
> <SNIP>
>
> So the question, as of yet unanswered by me, is how to coerce the data
> into that format. If we can then we will see if it works with dollar
> data as opposed to the fractional stuff in the edhec file.
>
> I'll be looking at this but don't expect much. I'm out in the deep end
> at this point.
>
> Cheers,
> Mark
>



