[R] correlation matrix between data from different files

Rui Barradas rui1174 at sapo.pt
Fri Apr 13 18:18:22 CEST 2012


Hello,


jeff6868 wrote
> 
> Dear users,
> 
> I'm a fairly new French R user, and I have a problem building a
> correlation matrix.
> I have temperature data for each weather station of my study area and for
> each year (for example, a data file for weather station N°1 for the
> year 2009, a data file for station N°2 for the year 2010, ...). So I have 70
> weather stations with one data file per year since 2005. Each station has
> 4 temperature sensors.
> Each data file has exactly the same structure: date and hour, sensor1,
> sensor2, sensor3, sensor4. Here's an example:
> 
> time              sensor1  sensor2  sensor3  sensor4
> 01/01/2008 00:00  -0.25    -2.43    -3.25    -2.37
> 01/01/2008 00:15  -0.18    -2.37    -3.18    -2.25
> 01/01/2008 00:30  -0.25    -2.5     -3.37    -2.56
> 01/01/2008 00:45  -0.25    -2.37    -3.31    -2.37
> 
> I need to build a correlation matrix for each sensor across the
> different stations (one correlation matrix between the sensor 1 series of
> all 70 stations, another one for sensor 2, ...).
> For each year and each station I have to find the best correlation. For
> example, which of the 70 weather stations is best correlated
> with station 1 for sensor 1? And with station 2? ... and so on for
> each sensor and each station.
> 
> Example:
> 
> Sensor 1 for the year 2009
> 
>            Station 1  Station 2  Station 3  [...]
> Station 1  1          0.910      0.748
> Station 2  0.910      1          0.6
> Station 3  0.748      0.6        1
> [...]
> 
> And the same for each year from 2005 to 2011, for each of the 4
> sensors.
> 
> Do you have any idea how I can do this in R?
> Should I first merge all the sensors into one file, or can I do it with
> the data in separate files (as I have at the moment)?
> Thank you very much for all your answers!
> 


You don't need to merge all the files, but you must do some preprocessing.
If you put all the data for one year in a 3-d array (observations x sensors x
stations), then you can simply use 'cor'.

I've made up some fake data, in files named "station1_2009.dat", etc. (only 6
stations), each with the same number of observations. If you have 70 stations
per year, you'll need an automated process to access them. Something like the
function below would solve part of that problem.
What follows assumes that the number of observations is the same in all files.
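
For anyone who wants to try the code without real data, here is a minimal sketch of how such fake files could be generated. Everything in it is an assumption made for illustration: the values are simulated, the output directory `outdir` is hypothetical, and the files are written tab-separated because the time stamp contains a space (such files have to be read with sep="\t").

```r
# Sketch only: simulated temperatures, not real station data.
set.seed(1)
outdir    <- tempdir()   # hypothetical location for the fake files
year      <- 2009
nstations <- 6
nobs      <- 96          # one day at 15-minute steps

# Time stamps in the same day/month/year layout as the example in the question
time <- format(seq(as.POSIXct("2009-01-01 00:00", tz = "UTC"),
                   by = "15 min", length.out = nobs),
               "%d/%m/%Y %H:%M")

# A common signal shared by all stations, so that correlations are not ~0
base <- cumsum(rnorm(nobs, sd = 0.1))

for (i in seq_len(nstations)) {
    dat <- data.frame(
        time    = time,
        sensor1 = round(base + rnorm(nobs, sd = 0.2), 2),
        sensor2 = round(base - 2 + rnorm(nobs, sd = 0.2), 2),
        sensor3 = round(base - 3 + rnorm(nobs, sd = 0.2), 2),
        sensor4 = round(base - 2 + rnorm(nobs, sd = 0.2), 2)
    )
    fname <- file.path(outdir, paste0("station", i, "_", year, ".dat"))
    write.table(dat, fname, sep = "\t", quote = FALSE, row.names = FALSE)
}

# Read one file back to check the layout
chk <- read.table(file.path(outdir, "station1_2009.dat"),
                  header = TRUE, sep = "\t")
str(chk)
```

To use these files with the code that follows, set the working directory to `outdir` (or prepend it to the file names) and add sep="\t" to the read.table calls.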

# This function gives file names with the pattern above
filenames <- function(y, n=70){
    tmp <- paste("station", seq_len(n), sep="")
    tmp <- paste(tmp, y, sep="_")
    paste(tmp, "dat", sep=".")
}


Sensors <- paste("sensor", 1:4, sep="")
Stations <- paste("station", 1:6, sep="")

nsensors <- length(Sensors)
nstations <- length(Stations)

year <- 2009
fnames <- filenames(year, nstations)

# If nobs is the same in all files, any one will do.
nobs <- nrow(read.table(fnames[1], header=TRUE))

yr2009 <- array(NA, dim=c(nobs, nsensors, nstations))
for(i in seq_len(nstations)){
    tmp <- read.table(fnames[i], header=TRUE)
    yr2009[ , , i] <- as.matrix(tmp[, Sensors])
}

dimnames(yr2009) <- list(seq.int(nobs), Sensors, Stations)

# correlations for sensor 1
cor(yr2009[ , 1, ])

# a list of correlations for the 4 sensors
cor2009 <- lapply(Sensors, function(s) cor(yr2009[ , s, ]))
names(cor2009) <- Sensors
cor2009$sensor1
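
The original question also asks which station is best correlated with each station. One possible way to get that from a correlation matrix is sketched below. It is self-contained and uses simulated data (the matrix `x` is made up); with the real data, a matrix such as cor2009$sensor1 would take the place of `cm`. If the series contain missing values, cor() also accepts use = "pairwise.complete.obs".

```r
# Sketch with simulated data: one column per station, for a single sensor.
set.seed(42)
nobs     <- 100
Stations <- paste("station", 1:6, sep = "")
base     <- rnorm(nobs)
# Each station = common signal + station-specific noise
x <- sapply(seq_along(Stations), function(i) base + rnorm(nobs, sd = i / 2))
colnames(x) <- Stations

cm <- cor(x)
diag(cm) <- NA   # ignore each station's correlation with itself

# For each row (station), the column (other station) with the highest correlation
best <- data.frame(
    station    = rownames(cm),
    best_match = colnames(cm)[apply(cm, 1, which.max)],
    r          = apply(cm, 1, max, na.rm = TRUE),
    row.names  = NULL
)
best
```

Setting the diagonal to NA is what keeps every station from being reported as its own best match.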


Don't pay much attention to the files part. What's relevant is to create and
fill the array.
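
To repeat this for several years, the create-and-fill step can be wrapped in a function. The following is only a sketch: `read_year` is a hypothetical helper, the demo files are simulated and written to a temporary directory, and tab-separated files are assumed.

```r
# Hypothetical helper: read all station files of one year into a 3-d array
# (observations x sensors x stations). Assumes tab-separated files that all
# have the same number of observations.
read_year <- function(year, nstations, sensors, dir = ".") {
    stations <- paste0("station", seq_len(nstations))
    fnames   <- file.path(dir, paste0(stations, "_", year, ".dat"))
    first    <- read.table(fnames[1], header = TRUE, sep = "\t")
    arr <- array(NA_real_,
                 dim = c(nrow(first), length(sensors), nstations),
                 dimnames = list(NULL, sensors, stations))
    for (i in seq_len(nstations)) {
        tmp <- read.table(fnames[i], header = TRUE, sep = "\t")
        stopifnot(nrow(tmp) == nrow(first))   # same n. obs. in all files
        arr[, , i] <- as.matrix(tmp[, sensors])
    }
    arr
}

# Demo with simulated files: 3 stations, 2 years, 20 observations each
set.seed(7)
demo_dir <- file.path(tempdir(), "stations_demo")
dir.create(demo_dir, showWarnings = FALSE)
Sensors <- paste0("sensor", 1:4)
years   <- 2009:2010
for (y in years) {
    for (i in 1:3) {
        dat <- as.data.frame(matrix(round(rnorm(20 * 4), 2), ncol = 4,
                                    dimnames = list(NULL, Sensors)))
        write.table(dat,
                    file.path(demo_dir, paste0("station", i, "_", y, ".dat")),
                    sep = "\t", quote = FALSE, row.names = FALSE)
    }
}

# One list element per year, each holding one correlation matrix per sensor
cors <- lapply(years, function(y) {
    arr <- read_year(y, 3, Sensors, demo_dir)
    sapply(Sensors, function(s) cor(arr[, s, ]), simplify = FALSE)
})
names(cors) <- years
cors[["2009"]]$sensor1
```

The result is indexed first by year and then by sensor, so cors[["2010"]]$sensor3 is the station-by-station matrix for sensor 3 in 2010.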

Hope this helps,

Rui Barradas

