[R] How to import and create time series data frames in an efficient way?

Bert Gunter bgunter@4567 @end|ng |rom gm@||@com
Fri Nov 15 05:44:54 CET 2019


Ha! -- A bug! "Corrected" version inline below:
Bert Gunter

On Thu, Nov 14, 2019 at 8:10 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:

> Brute force approach, possibly inefficient:
>
> 1. You have a vector of file names. Sort them in the appropriate (time)
> order. These names are also the component names of all the data frames in
> your list that you read in, call it yourlist.
>
> 2. Create a vector of all the unique ticker names, perhaps by creating a
> vector of all the names and then unique() -ing it. Call this vector snames
> with n.names in it. It will probably have length several hundred at least I
> assume.
>
> 3. Suppose the  6 columns of data of each data frame that you want are
> named cnames = c("stocknames","Open", "High", "Low", "Close", "Volume").
>
> 4. You could proceed as you suggested, but it would likely be more
> efficient, since all data that you want are numeric, to create a 3D array
> of NA's via:
>
> yourdat <- array(dim = c(n.dates, n.names, 5), dimnames = list(NULL,
> snames, cnames[-1]))
>
> 5. Then just loop  through your list of files and use indexing to fill in
> the columns x category slices for each date. Stocks that are missing will
> be NA automatically. e.g. (warning: UNTESTED):
>
> For date "d", let df be the data frame from date "d" in your list, i.e.
>
> df <- yourlist[["d"]][, cnames]
> ## Note The order of the listed stocks in the "stocknames" column can be
> different from frame to frame of your master list.
>
> Then fill in the flat for the dth date (i.e. dth row) in your array by:
>
> ## corrected line here:

> yourdat[ ,df[ ,"stocknames"], cnames[-1] <- as.matrix(df[ ,-1]) ## need
> to omit the column names so it converts to numeric matrix
>
## need to get the names of the stocks in the "stocknames" column in the
order they appear in df.

>
> This should fill in  the values of the 2nd and 3rd dimensions of the array
> for all the stocks on the dth date with the data for each stock in the data
> frame matched to the appropriate column in the array.
>
> The entire loop will give all dates for all stocks and all categories with
> NA's for missing days. (*IF IT WORKS!*)
> You may need to modify this sightly if, for example, your stock names are
> row names rather than a field in your data frame. I leave such adjustments
> to you.
>
> Note again that this is fairly elementary with just arrays and indexing.
> Basic tutorials should tell you about all of this. Also, when plotting,
> you'll have to convert your dates to suitable date-time format.
>
> Cheers,
> Bert
>
>
>
>
> On Thu, Nov 14, 2019 at 4:55 PM Nhan La <lathanhnhan using gmail.com> wrote:
>
>> Hi Bert,
>>
>> I've attempted to find the answer and actually been able to import the
>> individual data sets into a list of data frames.
>>
>> But I'm not sure how to go ahead with the next step. I'm not necessarily
>> asking for a final answer. Perhaps if you (I mean others as well) would
>> like a constructive coaching, you would suggest a few key words to look at?
>>
>> Sorry for the HTML thing, this is my first post. I'll do better next
>> times.
>>
>> Thanks,
>> Nathan
>>
>>
>>
>> On Fri, Nov 15, 2019 at 11:34 AM Bert Gunter <bgunter.4567 using gmail.com>
>> wrote:
>>
>>> So you've made no attempt at all to do this for yourself?!
>>>
>>> That suggests to me that you need to spend time with some R tutorials.
>>>
>>> Also, please post in plain text on this plain text list. HTML can get
>>> mangled, as it may have here.
>>>
>>> -- Bert
>>> "The trouble with having an open mind is that people keep coming along
>>> and sticking things into it."
>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>
>>>
>>> On Thu, Nov 14, 2019 at 4:11 PM Nhan La <lathanhnhan using gmail.com> wrote:
>>>
>>>> I have many separate data files in csv format for a lot of daily stock
>>>> prices. Over a few years there are hundreds of those data files, whose
>>>> names are the dates of data record.
>>>>
>>>> In each file there are variables of ticker (or stock trading code),
>>>> date,
>>>> open price, high price, low price, close price, and trading volume. For
>>>> example, inside a data file named 20150128.txt it looks like this:
>>>>
>>>> FB,20150128,1.075,1.075,0.97,0.97,725221
>>>> AAPL,20150128,2.24,2.24,2.2,2.24,63682
>>>> AMZN,20150128,0.4,0.415,0.4,0.415,194900
>>>> NFLX,20150128,50.19,50.21,50.19,50.19,761845
>>>> GOOGL,20150128,1.62,1.645,1.59,1.63,684835 ...................and many
>>>> more..................
>>>>
>>>> In case it's relevant, the number of stocks in these files are not
>>>> necessarily the same (so there will be missing data). I need to import
>>>> and
>>>> create 5 separate time series data frames from those files, one each for
>>>> Open, High, Low, Close and Volume. In each data frame, rows are indexed
>>>> by
>>>> date, and columns by ticker. For example, the data frame Open may look
>>>> like
>>>> this:
>>>>
>>>> DATE,FB,AAPL,AMZN,NFLX,GOOGL,... 20150128,1.5,2.2,0.4,5.1,1.6,...
>>>> 20150129,NA,2.3,0.5,5.2,1.7,... ...
>>>>
>>>> What will be an efficient way to do that? I've used the following codes
>>>> to
>>>> read the files into a list of data frames but don't know what to do next
>>>> from here.
>>>>
>>>> files = list.files(pattern="*.txt") mydata = lapply(files,
>>>> read.csv,head=FALSE)
>>>>
>>>> Thanks,
>>>>
>>>> Nathan
>>>>
>>>> Disclaimer: In case it's relevant, this question is also posted on
>>>> stackoverflow.
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>

	[[alternative HTML version deleted]]



More information about the R-help mailing list