[R] Get means of matrix

Dennis Murphy djmuser at gmail.com
Wed Nov 18 22:37:17 CET 2015


Hi:

Here's another way to look at the problem. Instead of manually adding
a new column after k datasets have been read in, read your individual
data files into a list, as long as they all have the same variable
names and the same class (in this case, data.frame). Then create a
vector of names for the list components and use 'apply family' logic
to get the column means, returning the combined results to a data
frame or matrix. Here's a toy example to illustrate the point.
Firstly, three data frames are created and saved to external files:

# Create some artificial data and ship to external files

d1 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))
d2 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))
d3 <- data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))

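# row.names = FALSE keeps the row names out of the file; otherwise
# read.csv() brings them back as an extra column "X" that colMeans()
# would average along with x1, x2 and x3
write.csv(d1, file = "d1.csv", row.names = FALSE, quote = FALSE)
write.csv(d2, file = "d2.csv", row.names = FALSE, quote = FALSE)
write.csv(d3, file = "d3.csv", row.names = FALSE, quote = FALSE)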

###
# Now, read them back in and store them in a list object

# Vector of file names to process
files <- paste0("d", 1:3, ".csv")

# Create the list of data frames and assign names to list components
L <- lapply(files, function(x) read.csv(x, header = TRUE))
names(L) <- paste0("d", 1:3)
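
Everything after this point assumes the list components share the same
variable names and class, so it may be worth checking that first. A
minimal sketch, using data frames built in memory rather than read from
files (the names all_df and same_names are only illustrative):

```r
# Toy list of three data frames with identical structure
set.seed(1)
L <- lapply(1:3, function(i) {
  data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))
})
names(L) <- paste0("d", 1:3)

# TRUE if every component is a data frame
all_df <- all(vapply(L, is.data.frame, logical(1)))

# TRUE if every component has the same column names, in the same order
same_names <- all(vapply(L, function(d) identical(names(d), names(L[[1]])),
                         logical(1)))

# Stop early if either condition fails
stopifnot(all_df, same_names)
```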

# Compute column means from each list component and row bind them
# Method 1: base R
do.call(rbind, lapply(L, colMeans))
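
If you prefer, sapply() followed by t() should give the same matrix in
base R. This is only a sketch on in-memory toy data: sapply() returns
the column means as columns, so the result is transposed to get one row
per data frame.

```r
# Toy list standing in for the data frames read from the csv files
set.seed(2)
L <- lapply(1:3, function(i) {
  data.frame(x1 = rpois(10, 20), x2 = rpois(10, 23), x3 = rpois(10, 25))
})
names(L) <- paste0("d", 1:3)

# One row per data frame, one column per variable
m1 <- do.call(rbind, lapply(L, colMeans))

# sapply() puts each data frame's means in a column, so transpose
m2 <- t(sapply(L, colMeans))
```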


# Method 2: plyr package
library(plyr)
ldply(L, colMeans)
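
The same list idea can be pushed toward the region means asked about in
the original post. What follows is a rough sketch, not a tested
solution for the real data: block_means() is a hypothetical helper, the
block size of 20 comes from the example regions in the question, and
the 40 x 40 toy data frames are made up. Note that tapply() needs index
vectors of the same length as the data, which is what the "arguments
must have same length" error is complaining about.

```r
# Hypothetical helper: mean of every (row block x column block) region
block_means <- function(d, row_size = 20, col_size = 20) {
  m <- as.matrix(d)
  # Block number for each row and each column: 1, 1, ..., 2, 2, ...
  rblock <- (seq_len(nrow(m)) - 1) %/% row_size + 1
  cblock <- (seq_len(ncol(m)) - 1) %/% col_size + 1
  # Expand to one row-block and one column-block label per cell of m,
  # so both index vectors have length(m) elements
  ri <- rblock[as.vector(row(m))]
  ci <- cblock[as.vector(col(m))]
  tapply(as.vector(m), list(ri, ci), mean, na.rm = TRUE)
}

# Toy data: three 40 x 40 data frames, i.e. 2 x 2 regions each
set.seed(42)
make_df <- function() as.data.frame(matrix(rpois(40 * 40, 20), nrow = 40))
L <- list(d1 = make_df(), d2 = make_df(), d3 = make_df())

# One matrix of region means per data frame
region_means <- lapply(L, block_means)
region_means$d1
```

Element [i, j] of each result is the mean of row block i and column
block j, so region_means$d1[2, 1] covers rows 21:40 and columns 1:20 of
d1.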


Dennis

On Wed, Nov 18, 2015 at 2:19 AM, Jesús Para Fernández
<j.para.fernandez at hotmail.com> wrote:
> Hi everyone
>
> I have a data frame "data" which is the result of joining multiple csv files (400 rows and 600 columns each). The "data" data frame has n rows and m columns (200000 rows and 600 columns), and I have added a new column, "csvdata", in which I specify the number of the csv file each row belongs to.
>
> So, the dataframe "data" looks like:
>
> x1    x2     x3    ....    xn    csvdata
> 21   23    32    ....    12    1
> 27   21    39    ....    14    1
> 24   22    30    ....    11    1
> ..............................................
> 21   24    32    ....   19     2
> 27   21    39    ....    14    2
> ..............................................
> 27   22     30    ....    11    n
>
>
>
> I want to store in a matrix the mean values of different subsets of the data for every csv, for example:
>
> region 1,1 (rows 1:20, columns 1:20) for every "csvdata" value
> region 2,1 (rows 21:40, columns 1:20) for every "csvdata" value
> ....
>
> And so on for the whole data.frame.
>
> I have tried:
>
> area1<-tapply(as.matrix(data[1:20,1]),datos$csvdata,mean,na.rm=T)
> area2<-tapply(as.matrix(data[1:20,1]),datos$csvdata,mean,na.rm=T)
>
> But this is the error I get:
>
> Error in tapply(data[1:30, ], datos$nueva, mean, na.rm = T) :
>   arguments must have same length
>
> I'm sure it is not very complex, but I have no idea how to do it.
>
> Thanks to all.
>
>