[R] Please help

Sun Mar 30 23:31:40 CEST 2014

Hi Atem,

Try this:

I created 3 folders (Precip, Tmax, Tmin) within the folder "sample"
#working directory: sample
list.files()
#[1] "Imputation_Daily_Sim01.dat"    "Imputation_Daily_Sim02.dat"   
#[3] "Imputation_Daily_Sim03.dat"    "Precip"                       
#[5] "Sim1971-2000_Daily_Sim001.dat" "Sim1971-2000_Daily_Sim002.dat"
#[7] "Sim1971-2000_Daily_Sim003.dat" "Tmax"                         
#[9] "Tmin" 

list.files(pattern="Sim1971-2000")
#[1] "Sim1971-2000_Daily_Sim001.dat" "Sim1971-2000_Daily_Sim002.dat"
#[3] "Sim1971-2000_Daily_Sim003.dat"

lst1 <- lapply(list.files(pattern="Sim1971-2000"),function(x) readLines(x))

#drop the 1970 records; "^" anchors the match to the year at the start of each line
lst1Not1970 <- lapply(lst1,function(x) x[!grepl("^1970",x)]) 
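Since the analysis period is 1971-2000, records after 2000 may need to be dropped as well. A minimal sketch of filtering on the year range instead of a single year, assuming each line starts with a four-digit year (the example lines are made up, not taken from the real files):

```r
# Made-up lines in the assumed layout: YYYYMMDD, site code, then three values
x <- c("19701231G001  0.0  -5.1   3.2",
       "19710101G001  1.2  -4.0   2.5",
       "20001231G001  0.0  -9.9  -1.0",
       "20010101G001  0.3  -8.7   0.4")

# Take the year from the first four characters and keep 1971-2000 only
yr <- as.numeric(substr(x, 1, 4))
kept <- x[yr >= 1971 & yr <= 2000]
```

Applied to the real data, the same subsetting would go inside the lapply() call in place of the grepl() filter.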

#Using a small subset:
lst1Sub <- lapply(lst1Not1970,function(x) x[1:1000]) 

#to process the full data, replace lst1Sub with lst1Not1970 below 

lst2 <- lapply(lst1Sub,function(x) {
  #date and site code: everything up to and including the site ID
  dateSite <- gsub("(.*G\\d+).*","\\1",x)
  dat1 <- data.frame(Year=as.numeric(substr(dateSite,1,4)),
                     Month=as.numeric(substr(dateSite,5,6)),
                     Day=as.numeric(substr(dateSite,7,8)),
                     Site=substr(dateSite,9,12),
                     stringsAsFactors=FALSE)
  #the simulated values that follow the site code
  Sims <- gsub(".*G\\d+\\s+(.*)","\\1",x)
  #insert spaces where a minus sign makes two values run together
  idx <- grep("\\d+-",Sims)
  Sims[idx] <- gsub("(.*)([- ][0-9]+\\.[0-9]+)","\\1 \\2",
                    gsub("^([0-9]+\\.[0-9]+)(.*)","\\1 \\2",Sims[idx]))
  Sims1 <- read.table(text=Sims,header=FALSE)
  names(Sims1) <- c("Precipitation", "Tmin", "Tmax")
  cbind(dat1,Sims1)
})
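To see what the nested gsub() calls do, here is a small sketch on invented value strings. The helper name split_glued and the example values are assumptions for illustration only; the regular expressions are the ones used above.

```r
# Hypothetical helper: add spaces where a minus sign glues two values together
split_glued <- function(s) {
  glued <- grep("\\d+-", s)
  s[glued] <- gsub("(.*)([- ][0-9]+\\.[0-9]+)", "\\1 \\2",
                   gsub("^([0-9]+\\.[0-9]+)(.*)", "\\1 \\2", s[glued]))
  s
}

# Invented value fields: the first is glued, the second is already separated
vals <- c("0.00-25.30-15.20", "1.20 3.40 10.50")
fixed <- split_glued(vals)

# After the fix, read.table() can split every line into three numeric columns
parsed <- read.table(text = fixed, header = FALSE,
                     col.names = c("Precipitation", "Tmin", "Tmax"))
```

The inner gsub() separates the first value from the rest, and the outer one separates the last value, so three values that run together become three whitespace-delimited fields.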

Precip <- lapply(lst2,function(x) x[,1:5])

Tmin <- lapply(lst2,function(x) x[,c(1:4,6)]) 

Tmax <- lapply(lst2,function(x) x[,c(1:4,7)])

Precip1 <- cbind(Precip[[1]][,1:4],do.call(cbind,lapply(Precip,`[`,5)))

names(Precip1)[5:ncol(Precip1)] <- paste0("Sim",sprintf("%03d",1:length(Precip))) 
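The cbind() step assumes that every simulation file lists the same dates and sites in the same order. A minimal sketch of checking that before binding, using two invented toy simulations (sim_a, sim_b and their values are made up):

```r
# Toy stand-ins for two simulations; all values are invented
sim_a <- data.frame(Year = 1971, Month = 1, Day = 1:2, Site = "G001",
                    Precipitation = c(0.0, 1.5), stringsAsFactors = FALSE)
sim_b <- data.frame(Year = 1971, Month = 1, Day = 1:2, Site = "G001",
                    Precipitation = c(0.2, 0.0), stringsAsFactors = FALSE)
sims <- list(sim_a, sim_b)

# Build one key per row from the date and site columns, then compare them
keys <- lapply(sims, function(d) do.call(paste, d[, 1:4]))
same_keys <- all(vapply(keys, identical, logical(1), keys[[1]]))
stopifnot(same_keys)

# Bind the value column of each simulation next to the shared key columns
wide <- cbind(sims[[1]][, 1:4], do.call(cbind, lapply(sims, `[`, 5)))
names(wide)[5:ncol(wide)] <- paste0("Sim", sprintf("%03d", seq_along(sims)))
```

If the check fails, merging on Year, Month, Day and Site would be a safer way to combine the simulations than cbind().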

lapply(split(Precip1,Precip1$Site),function(x) write.table(x,file=paste(getwd(),"Precip",paste0(unique(x$Site),".dat"),sep="/"),row.names=FALSE,quote=FALSE))

Tmin1 <- cbind(Tmin[[1]][,1:4],do.call(cbind,lapply(Tmin,`[`,5)))

names(Tmin1) <- names(Precip1)

lapply(split(Tmin1,Tmin1$Site),function(x) write.table(x,file=paste(getwd(),"Tmin",paste0(unique(x$Site),".dat"),sep="/"),row.names=FALSE,quote=FALSE)) 

Tmax1 <- cbind(Tmax[[1]][,1:4],do.call(cbind,lapply(Tmax,`[`,5)))

names(Tmax1) <- names(Precip1)

lapply(split(Tmax1,Tmax1$Site),function(x) write.table(x,file=paste(getwd(),"Tmax",paste0(unique(x$Site),".dat"),sep="/"),row.names=FALSE,quote=FALSE)) 
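For the later step of reading each site file and applying a function to it, something along these lines may work. This is only a sketch: it writes one invented table to a temporary folder standing in for "Precip", and colMeans() is just a placeholder for whatever function you need.

```r
# Invented example table in the same layout as the per-site files
site_df <- data.frame(Year = 1971, Month = 1, Day = 1:3, Site = "G001",
                      Sim001 = c(0.0, 1.5, 0.2), Sim002 = c(0.3, 0.0, 2.1),
                      stringsAsFactors = FALSE)

# Write it into a temporary folder standing in for "Precip"
out_dir <- file.path(tempdir(), "Precip")
dir.create(out_dir, showWarnings = FALSE)
write.table(site_df, file = file.path(out_dir, "G001.dat"),
            row.names = FALSE, quote = FALSE)

# Read every .dat file back and apply a function to the simulation columns
files <- list.files(out_dir, pattern = "\\.dat$", full.names = TRUE)
site_means <- lapply(files, function(f) {
  d <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
  colMeans(d[, grep("^Sim", names(d)), drop = FALSE])
})
names(site_means) <- sub("\\.dat$", "", basename(files))
```

Naming the result by site code keeps track of which output belongs to which file.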

Hope this helps.
A.K.

On Friday, March 28, 2014 2:07 AM, Zilefac Elvis <zilefacelvis at yahoo.com> wrote:

Hi AK,
Suppose you were working with the large file, which could not be downloaded. 
My final output will be as follows:
Three folders:
1) Precip
2) Tmin or minimum temperature
3) Tmax or maximum temperature

Within each folder, we will have 120 files. Each file is named by the site code, e.g. GGG1, GGG2, ..., G120.
Each file will be a dataframe with the first 3 columns as the date (Year,Month,Day). Years are from 1971-2000. For the large file, the date columns are followed by the simulation numbers, e.g. Year,Month,Day,sim001,sim002...sim100. For the sample file, it would be Year,Month,Day,sim001,sim002,sim003.

Thanks again.
Atem.

On Thursday, March 27, 2014 11:55 PM, Zilefac Elvis <zilefacelvis at yahoo.com> wrote:

Hi AK,
Attached is a sample from the large file. The expected output is explained at the end of this message (bold).
It is a little lengthy but is worth it given that the number of sites is large. I have attached three simulations, so you will have sim1,sim2,sim3 instead of sim1 to sim100 as in the previous message.
############################################################################
I have done some simulations in R and would like to order my data to usable format.
The data is too large, so I have attached it via Dropbox.
When you load Calibration.RData to the workspace, you will find the site codes (column 1) in "Prairies.Sites".
My initial dataset was in the form of a dataframe with columns denoting stations. So I had three dataframes, one each for precipitation, Tmin, and Tmax. You reshaped the dataframes individually into three column vectors (see the file called PrecipTminTmax) using this code:
library(reshape2)
dat1 <- read.table("predictand.csv",header=TRUE,stringsAsFactors=TRUE,sep="\t") # Predictand.csv had 123 columns, with columns 1,2,3 as the date.
dat1<-precipitation
dat2M <- melt(dat1,id.var=c("year","month","day"))
dat2M1 <- dat2M[with(dat2M,order(year,month,day,variable)),]
dim(dat2M1)
#[1] 1972320       5
row.names(dat2M1) <- 1:nrow(dat2M1)
PrecipTminTmax<-cbind(precipitation,Tmin,Tmax)

The problem to be solved
Attached is a large file (SimCalibration.zip) containing my simulations (001 to 100). Please import only the files starting with "Sim1971-2000_Daily_". The rest are not important. My analysis is for the period 1971-2000. Any data before or after this period should be ignored.
My simulation was done in R using Fortran encoding to read data values. All files are ".dat". In each file, the columns are as follows :
Year, Month, Day, Site, Precip, Tmin, Tmax. In another project involving rainfall only, I read such files into R using this code:
rain.data <- scan("gaugvals.all",what=character(),sep="\n",n=257212)
rain.data <- data.frame(Year=as.numeric(substr(rain.data,1,4)),
                        Month=as.numeric(substr(rain.data,5,6)),
                        Day=as.numeric(substr(rain.data,7,8)),
                        Site=substr(rain.data,10,12),
                        Rain=as.numeric(substr(rain.data,13,18)))

1) So, I would like to read all files beginning with "Sim1971-2000_Daily_".
2) Split each file by variable name (Precip, Tmin, Tmax) and then arrange each variable in the form of a dataframe. For example, I will take precip from site GGG1 and have a data frame with colnames such as Year,Month,Day, sim1,sim2,...,sim100. Repeat this for all 120 sites, so that for Precip you will have 120 files corresponding to the site codes. Each file has nrows with Year,Month,Day, sim1...sim100 columns.
3) Please repeat the above for Tmin and Tmax so that in the end I will have three folders (Precip, Tmin and Tmax). Each folder has 120 files, with each file being a dataframe containing the date and 100 columns.

When you successfully go through this "difficult" section, I will access each folder, read each file and apply a function to it one at a time.

Thanks AK, this is part of my MSc thesis project. Your help would be fully acknowledged. You have helped me a lot towards the success of this project.
Atem.

On Thursday, March 27, 2014 9:09 PM, arun <smartpink111 at yahoo.com> wrote:

Hi Atem,

I tried to download the first file. 
It is taking me forever.  With the speed I have, I doubt it would be successful.  Can you just provide some small reproducible example data and what your expected output would be?
Arun