[R] How to import and create time series data frames in an efficient way?

Mon Nov 18 03:33:49 CET 2019

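Thanks Bert. I also managed to get this to work:

library(plyr)   # provides ldply
library(readr)  # provides read_csv
files = list.files(pattern = "\\.txt$")  # pattern is a regular expression, not a glob
df = ldply(files, read_csv, col_names = c("ticker", "date", "open", "high",
                                          "low", "close", "volume"))

Cheers,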
Nathan
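
For later readers, a minimal sketch of one way to go from a combined long data frame like the one above to the five date-by-ticker tables asked for in the original question. It uses only base R. The small data frame is a hypothetical stand-in for the result of the ldply call: the first three rows come from the sample file in the question, and the 20150129 rows are made-up illustration values. The names fields and wide are assumptions, not anything from the thread.

```r
# Hypothetical stand-in for the combined (long) data frame read from the files.
# Rows for 20150129 are invented purely for illustration.
df <- data.frame(
  ticker = c("FB", "AAPL", "AMZN", "AAPL", "AMZN"),
  date   = c(20150128, 20150128, 20150128, 20150129, 20150129),
  open   = c(1.075, 2.24, 0.4, 2.3, 0.5),
  high   = c(1.075, 2.24, 0.415, 2.35, 0.52),
  low    = c(0.97, 2.2, 0.4, 2.25, 0.48),
  close  = c(0.97, 2.24, 0.415, 2.3, 0.5),
  volume = c(725221, 63682, 194900, 70000, 150000),
  stringsAsFactors = FALSE
)

fields <- c("open", "high", "low", "close", "volume")

# One date-by-ticker table per field. tapply() leaves NA wherever a
# (date, ticker) combination does not occur in df.
wide <- lapply(setNames(fields, fields), function(f) {
  m <- tapply(df[[f]], list(df$date, df$ticker), function(x) x[1])
  as.data.frame(m)
})

wide$open  # rows are dates, columns are tickers
```

Each element of wide (wide$open, wide$high, and so on) has dates as row names and tickers as columns, with NA for a ticker that is absent on a given date.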

On Fri, Nov 15, 2019 at 3:45 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:

> Ha! -- A bug! "Corrected" version inline below:
> Bert Gunter
>
> On Thu, Nov 14, 2019 at 8:10 PM Bert Gunter <bgunter.4567 using gmail.com>
> wrote:
>
>> Brute force approach, possibly inefficient:
>>
>> 1. You have a vector of file names. Sort them in the appropriate (time)
>> order. These names are also the component names of all the data frames in
>> the list that you read in; call it yourlist. Let n.dates be its length.
>>
>> 2. Create a vector of all the unique ticker names, perhaps by collecting
>> all the names into one vector and then calling unique() on it. Call this
>> vector snames, with n.names elements. I assume it will have at least
>> several hundred.
>>
>> 3. Suppose the  6 columns of data of each data frame that you want are
>> named cnames = c("stocknames","Open", "High", "Low", "Close", "Volume").
>>
>> 4. You could proceed as you suggested, but it would likely be more
>> efficient, since all data that you want are numeric, to create a 3D array
>> of NA's via:
>>
>> yourdat <- array(dim = c(n.dates, n.names, 5), dimnames = list(NULL,
>> snames, cnames[-1]))
>>
>> 5. Then just loop  through your list of files and use indexing to fill in
>> the columns x category slices for each date. Stocks that are missing will
>> be NA automatically. e.g. (warning: UNTESTED):
>>
>> For date "d", let df be the data frame from date "d" in your list, i.e.
>>
>> df <- yourlist[["d"]][, cnames]
>> ## Note The order of the listed stocks in the "stocknames" column can be
>> different from frame to frame of your master list.
>>
>> Then fill in the flat for the dth date (i.e. dth row) in your array by:
>>
>> ## corrected line here:
>
>> yourdat[d, df[ ,"stocknames"], cnames[-1]] <- as.matrix(df[ ,-1]) ## d is
>> the row index of the date; need to omit the stocknames column so it
>> converts to a numeric matrix
>>
> ## need to get the names of the stocks in the "stocknames" column in the
> order they appear in df.
>
>>
>> This should fill in  the values of the 2nd and 3rd dimensions of the
>> array for all the stocks on the dth date with the data for each stock in
>> the data frame matched to the appropriate column in the array.
>>
>> The entire loop will give all dates for all stocks and all categories
>> with NA's for missing days. (*IF IT WORKS!*)
>> You may need to modify this slightly if, for example, your stock names are
>> row names rather than a field in your data frame. I leave such adjustments
>> to you.
>>
>> Note again that this is fairly elementary with just arrays and indexing.
>> Basic tutorials should tell you about all of this. Also, when plotting,
>> you'll have to convert your dates to suitable date-time format.
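
The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not a tested version of the original suggestion: yourlist here is a hypothetical two-date stand-in (the 20150129 values are made up), the list is assumed to be named by date, and the date names are used as row names of the array so that each date can be indexed by name.

```r
cnames <- c("stocknames", "Open", "High", "Low", "Close", "Volume")

# Hypothetical stand-in for the list of per-date data frames, named by date.
# The 20150129 values are invented purely for illustration.
yourlist <- list(
  "20150128" = data.frame(stocknames = c("FB", "AAPL"),
                          Open = c(1.075, 2.24), High = c(1.075, 2.24),
                          Low = c(0.97, 2.2), Close = c(0.97, 2.24),
                          Volume = c(725221, 63682),
                          stringsAsFactors = FALSE),
  "20150129" = data.frame(stocknames = "AAPL",
                          Open = 2.3, High = 2.35, Low = 2.25, Close = 2.3,
                          Volume = 70000,
                          stringsAsFactors = FALSE)
)

dates  <- sort(names(yourlist))                                    # step 1
snames <- unique(unlist(lapply(yourlist, function(x) x$stocknames)))  # step 2

# Step 4: dates x stocks x categories array, initially all NA
yourdat <- array(NA_real_,
                 dim = c(length(dates), length(snames), 5),
                 dimnames = list(dates, snames, cnames[-1]))

# Step 5: fill one date slice at a time; stocks absent on a date stay NA
for (d in dates) {
  df <- yourlist[[d]][, cnames]
  yourdat[d, df$stocknames, cnames[-1]] <- as.matrix(df[, -1])
}

# One date-by-ticker data frame per category, e.g. Open
Open <- as.data.frame(yourdat[, , "Open"])
```

The same slicing with "High", "Low", "Close" and "Volume" gives the other four tables.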
>>
>> Cheers,
>> Bert
>>
>>
>>
>>
>> On Thu, Nov 14, 2019 at 4:55 PM Nhan La <lathanhnhan using gmail.com> wrote:
>>
>>> Hi Bert,
>>>
>>> I've attempted to find the answer and actually been able to import the
>>> individual data sets into a list of data frames.
>>>
>>> But I'm not sure how to go ahead with the next step. I'm not necessarily
>>> asking for a final answer. Perhaps, if you (I mean others as well) would
>>> like to offer some constructive coaching, you could suggest a few
>>> keywords to look up?
>>>
>>> Sorry for the HTML thing, this is my first post. I'll do better next
>>> time.
>>>
>>> Thanks,
>>> Nathan
>>>
>>>
>>>
>>> On Fri, Nov 15, 2019 at 11:34 AM Bert Gunter <bgunter.4567 using gmail.com>
>>> wrote:
>>>
>>>> So you've made no attempt at all to do this for yourself?!
>>>>
>>>> That suggests to me that you need to spend time with some R tutorials.
>>>>
>>>> Also, please post in plain text on this plain text list. HTML can get
>>>> mangled, as it may have here.
>>>>
>>>> -- Bert
>>>> "The trouble with having an open mind is that people keep coming along
>>>> and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>
>>>>
>>>> On Thu, Nov 14, 2019 at 4:11 PM Nhan La <lathanhnhan using gmail.com> wrote:
>>>>
>>>>> I have many separate data files in csv format for a lot of daily stock
>>>>> prices. Over a few years there are hundreds of those data files, whose
>>>>> names are the dates of data record.
>>>>>
>>>>> In each file there are variables of ticker (or stock trading code),
>>>>> date, open price, high price, low price, close price, and trading
>>>>> volume. For example, inside a data file named 20150128.txt it looks
>>>>> like this:
>>>>>
>>>>> FB,20150128,1.075,1.075,0.97,0.97,725221
>>>>> AAPL,20150128,2.24,2.24,2.2,2.24,63682
>>>>> AMZN,20150128,0.4,0.415,0.4,0.415,194900
>>>>> NFLX,20150128,50.19,50.21,50.19,50.19,761845
>>>>> GOOGL,20150128,1.62,1.645,1.59,1.63,684835
>>>>> ... and many more ...
>>>>>
>>>>> In case it's relevant, the number of stocks in these files is not
>>>>> necessarily the same (so there will be missing data). I need to import
>>>>> and create 5 separate time series data frames from those files, one
>>>>> each for Open, High, Low, Close and Volume. In each data frame, rows
>>>>> are indexed by date, and columns by ticker. For example, the data
>>>>> frame Open may look like this:
>>>>>
>>>>> DATE,FB,AAPL,AMZN,NFLX,GOOGL,...
>>>>> 20150128,1.5,2.2,0.4,5.1,1.6,...
>>>>> 20150129,NA,2.3,0.5,5.2,1.7,...
>>>>> ...
>>>>>
>>>>> What will be an efficient way to do that? I've used the following
>>>>> code to read the files into a list of data frames but don't know what
>>>>> to do next from here.
>>>>>
>>>>> files = list.files(pattern = "\\.txt$")
>>>>> mydata = lapply(files, read.csv, header = FALSE)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Nathan
>>>>>
>>>>> Disclaimer: In case it's relevant, this question is also posted on
>>>>> stackoverflow.
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>