[R] dates and time series management
arun
smartpink111 at yahoo.com
Wed Jun 5 03:57:00 CEST 2013
Hi,
I forgot that you wanted the result as a data.frame. This version combines the per-file columns with do.call(cbind, ...):
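fun1 <- function(lstf) {
  # read every file; x$V1 (year) and x$V2 (month) below assume the header
  # row names the columns V1, V2, ... (remaining columns = days)
  lst1 <- lapply(lstf, function(x) read.table(x, sep = "", header = TRUE, stringsAsFactors = FALSE))
  # keep only the rows for 1961-2005
  lst2 <- lapply(lst1, function(x) x[x$V1 >= 1961 & x$V1 <= 2005, ])
  # pad files that start after 1961 or end before 2005 with NA rows,
  # 12 rows per missing year (assumes every year present has all 12 months)
  lst3 <- lapply(lst2, function(x) {
    if (min(x$V1) > 1961 || max(x$V1) < 2005) {
      n1 <- (min(x$V1) - 1961) * 12
      x1 <- as.data.frame(matrix(NA, ncol = ncol(x), nrow = n1))
      n2 <- (2005 - max(x$V1)) * 12
      x2 <- as.data.frame(matrix(NA, ncol = ncol(x), nrow = n2))
      rbind(x1, x, x2)  # column names (V1, V2, ...) must match for rbind
    } else {
      x
    }
  })
  # drop year and month, then stack the day columns into one column
  lst4 <- lapply(lst3, function(x) data.frame(col1 = unlist(x[, -(1:2)])))
  # use the file name as the column name and reset the row names
  lst5 <- lapply(seq_along(lst4), function(i) {
    x <- lst4[[i]]
    colnames(x) <- lstf[i]
    row.names(x) <- 1:nrow(x)
    x
  })
  do.call(cbind, lst5)
}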
res<-fun1(lstf1)
head(res)
# dt3031093-1.txt dt3031093-2.txt dt3031093-3.txt
#1 0.21 0.21 NA
#2 0.00 0.00 NA
#3 0.21 0.21 NA
#4 0.00 0.00 NA
#5 0.00 0.00 NA
#6 0.00 0.00 NA
dim(res)
#[1] 16740 3
(2005-1960)*12*31  # expected rows: 45 years x 12 months x 31 day columns
#[1] 16740
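The padding step in fun1 counts rows, so it assumes every year present in a file has all 12 monthly rows. A minimal sketch of an alternative, assuming (as fun1 does) that column V1 holds the year and V2 the month: merge each file onto a complete year-month grid, which also fills months that are missing inside a file. The helper name pad_to_window() is hypothetical, not from the thread.

```r
# Hypothetical helper: trim or pad one file to a year window by merging
# onto a complete year-month grid. Assumes V1 = year and V2 = month.
pad_to_window <- function(x, start = 1961, end = 2005) {
  # one row per (year, month) pair; expand.grid varies V2 fastest
  grid <- expand.grid(V2 = 1:12, V1 = start:end)[, c("V1", "V2")]
  x <- x[x$V1 >= start & x$V1 <= end, ]
  # all.x = TRUE keeps every grid row; months absent from x get NA
  out <- merge(grid, x, by = c("V1", "V2"), all.x = TRUE)
  out <- out[order(out$V1, out$V2), ]
  rownames(out) <- NULL
  out
}

# Toy file: 1970-1971 with two day columns, May 1970 deliberately missing
toy <- data.frame(V1 = rep(1970:1971, each = 12), V2 = rep(1:12, 2),
                  V3 = 1:24, V4 = 25:48)
toy <- toy[-5, ]
p <- pad_to_window(toy, start = 1969, end = 1972)
nrow(p)  # 48 rows: 4 years x 12 months
```

In fun1, lapply(lst1, pad_to_window) could stand in for the lst2 and lst3 steps; the stacking and naming steps stay the same.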
A.K.
----- Original Message -----
From: arun <smartpink111 at yahoo.com>
To: Zilefac Elvis <zilefacelvis at yahoo.com>
Cc: R help <r-help at r-project.org>
Sent: Tuesday, June 4, 2013 8:14 PM
Subject: Re: dates and time series management
Hi,
Maybe this helps.
Only one of your files was attached, so I duplicated it and changed the dates.
lstf1<- list.files(pattern=".txt")
lstf1
#[1] "dt3031093-1.txt" "dt3031093-2.txt" "dt3031093-3.txt"
# The 3rd file has fewer observations.
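fun1 <- function(lstf) {
  # read every file; x$V1 (year) and x$V2 (month) below assume the header
  # row names the columns V1, V2, ... (remaining columns = days)
  lst1 <- lapply(lstf, function(x) read.table(x, sep = "", header = TRUE, stringsAsFactors = FALSE))
  # keep only the rows for 1961-2005
  lst2 <- lapply(lst1, function(x) x[x$V1 >= 1961 & x$V1 <= 2005, ])
  # pad files that start after 1961 or end before 2005 with NA rows,
  # 12 rows per missing year (assumes every year present has all 12 months)
  lst3 <- lapply(lst2, function(x) {
    if (min(x$V1) > 1961 || max(x$V1) < 2005) {
      n1 <- (min(x$V1) - 1961) * 12
      x1 <- as.data.frame(matrix(NA, ncol = ncol(x), nrow = n1))
      n2 <- (2005 - max(x$V1)) * 12
      x2 <- as.data.frame(matrix(NA, ncol = ncol(x), nrow = n2))
      rbind(x1, x, x2)  # column names (V1, V2, ...) must match for rbind
    } else {
      x
    }
  })
  # drop year and month, then stack the day columns into one column
  lst4 <- lapply(lst3, function(x) data.frame(col1 = unlist(x[, -(1:2)])))
  # use the file name as the column name and reset the row names
  lst5 <- lapply(seq_along(lst4), function(i) {
    x <- lst4[[i]]
    colnames(x) <- lstf[i]
    row.names(x) <- 1:nrow(x)
    x
  })
  lst5  # a list of one-column data frames, one per file
}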
res<-fun1(lstf1)
lapply(res,head,3)
#[[1]]
# dt3031093-1.txt
#1 0.21
#2 0.00
#3 0.21
#
#[[2]]
# dt3031093-2.txt
#1 0.21
#2 0.00
#3 0.21
#
#[[3]]
# dt3031093-3.txt
#1 NA
#2 NA
#3 NA
sapply(res,nrow)
#[1] 16740 16740 16740
A.K.
________________________________
From: Zilefac Elvis <zilefacelvis at yahoo.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc: "smartpink111 at yahoo.com" <smartpink111 at yahoo.com>
Sent: Tuesday, June 4, 2013 6:23 PM
Subject: dates and time series management
Hi,
I have 100 files (4 attached for your reference) with different file names and different start and end dates. Years and months occupy the 1st and 2nd columns, and days occupy the remaining 31 of the 33 columns in each file.
If a file starts before 1961 and ends after 2005, extract the rows from 1961 to 2005.
Otherwise, if a file starts after 1961 or ends before 2005, keep its values as they are, generate a "%Y-%m" date vector from 1961 to 2005, and fill the months without values with NA. For example, one file has data from 1970 to 2000; I would like to generate dates from 1961 to 2005 and fill 1961-1969 and 2001-2005 with NA. Do the same for all 100 files.
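The "%Y-%m" date vector described above can be generated in base R with seq() on Date objects; a minimal sketch covering the full 1961-2005 window:

```r
# Monthly "%Y-%m" labels covering the full 1961-2005 window
month_starts <- seq(as.Date("1961-01-01"), as.Date("2005-12-01"), by = "month")
ym <- format(month_starts, "%Y-%m")
length(ym)   # 540 (45 years x 12 months)
head(ym, 3)  # "1961-01" "1961-02" "1961-03"
```

Each label corresponds to one row of a padded file; with 31 day columns stacked, 540 x 31 gives the 16740 rows reported in the reply.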
After extracting and padding, all files will share a common date window (1961-2005).
Next, delete the year and month from each file (i.e., the first two columns) and convert each file to a single column vector (i.e., place column 4 under column 3, and so on). The expected output is then 100 files, each holding one column vector.
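Stacking "column 4 under column 3" is what unlist() does to a data frame's columns. A small sketch on a hypothetical toy file with three day columns (real files have 31), contrasted with a row-wise order that keeps the days of each month together:

```r
# Toy file: year, month, then three day columns (real files have 31)
d <- data.frame(V1 = c(1961, 1961), V2 = c(1, 2),
                V3 = c(11, 21), V4 = c(12, 22), V5 = c(13, 23))
# Column-wise stacking: column 4 goes under column 3, and so on
by_day <- unlist(d[, -(1:2)], use.names = FALSE)
by_day   # 11 21 12 22 13 23
# Row-wise alternative: all days of month 1, then all days of month 2
by_date <- as.vector(t(as.matrix(d[, -(1:2)])))
by_date  # 11 12 13 21 22 23
```

The replies use the column-wise form, which matches the request; the row-wise form is shown only in case a chronological series is wanted later.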
Finally, I would like to use the original file names as the column names and combine all 100 columns into one data.frame.
With 4 files, the final output should have 4 columns with equal numbers of rows, e.g. 16354 rows x 4 columns.
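A minimal sketch of the final combining step, assuming the stacked columns are already padded to equal length; the short vectors and file names below are hypothetical placeholders. The length check matters because cbind() can silently recycle shorter vectors.

```r
# Hypothetical stacked columns, one per file, already padded to equal length
cols  <- list(c(0.21, 0, 0.21), c(0.21, 0, 0.21), c(NA, NA, NA), c(0, 0, 1))
files <- c("file-1.txt", "file-2.txt", "file-3.txt", "file-4.txt")  # hypothetical names
# cbind() can recycle shorter vectors, so check the lengths first
stopifnot(length(unique(lengths(cols))) == 1)
combined <- as.data.frame(do.call(cbind, cols))
colnames(combined) <- files  # assigning names directly keeps "file-1.txt" unmangled
dim(combined)  # 3 4
```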
Thanks so much.
Atem