[R] mean calculation

arun smartpink111 at yahoo.com
Mon Jan 26 14:50:56 CET 2015


Hi Juvin,

The error "dim(X) must have a positive length" usually appears when you pass a vector (which has no "dim" attribute) to "apply", e.g.

    apply(1:5,2,mean)
    #Error in apply(1:5, 2, mean) : dim(X) must have a positive length
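
A minimal sketch of why this happens, using made-up values rather than the original data: a plain vector has no dimensions, so "apply" has no margin to work over. Selecting a single column of a data.frame with `[, j]` also returns a vector unless `drop=FALSE` is used. The name `toy` is only for illustration.

```r
v <- 1:5
is.null(dim(v))   # a plain vector carries no "dim" attribute

# Capture the error message instead of stopping the script
msg <- tryCatch(apply(v, 2, mean), error = function(e) conditionMessage(e))
print(msg)

# Made-up data frame: single-column selection drops to a vector by default
toy <- data.frame(a = c(NA, 2, 4), b = c(0, 12, 3))
col_vec <- toy[, 1]                 # vector, dim() is NULL
col_df  <- toy[, 1, drop = FALSE]   # still a data.frame with one column

# apply works on the data.frame, not on the vector
col_mean <- apply(col_df, 2, mean, na.rm = TRUE)
print(col_mean)
```

For a plain vector, calling `mean(v)` directly is enough.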



Also, if your dataset originally has 1206 columns, it is not clear why you needed the code below: "rainfall" is already a "data.frame", and selecting columns 1 to 1206 simply returns all of them.


      precip=data.frame(rainfall[1:1206]) 
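
As a small illustration with a made-up three-column data frame (`toy` is a hypothetical stand-in for "rainfall"), wrapping a selection of all columns in `data.frame()` just gives back an equivalent object, so the summary can be computed on the original directly:

```r
# Made-up stand-in for the real table
toy <- data.frame(a = c(NA, 2, 4), b = c(0, 12, 3), c = c(5, 6, 7))

# Same pattern as in the question, scaled down to three columns
precip <- data.frame(toy[1:3])

# Compare the copy with the original
same <- isTRUE(all.equal(precip, toy))
print(same)

# Column means can be taken from the original data frame directly
toy_means <- colMeans(toy, na.rm = TRUE)
print(toy_means)
```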



Based on the data provided,

    rainfall <-  read.table(text="1    2    3    4    5    6    7    8    9    10    11 
NA    0    0    0    0    12    0    0    0    0    0 
NA    0    0    0    0    0    0    0    0    0    0 
NA    0    0    0    0    14    0    0    0    0    5 
NA    0    0    0    0    0    0    0    0    0    0 
NA    0    0    27    0    0    0    0    20    0    165 
NA    0    88    38    0    0    0    0    0    0    26 
NA    12    12    0    0    0    0    0    0    0    2 
NA    2    0    0    0    0    0    0    0    0    0 
NA    2    0    0    0    0    0    0    0    0    0 
NA    0    24    1    0    0    0    0    3    0    62 
NA    26    0    0    0    0    0    0    0    0    33",sep="", header=TRUE, check.names=FALSE) 



    apply(rainfall, 2, function(x) c(mean=mean(x, na.rm=TRUE),
                       median=median(x, na.rm=TRUE), max=max(x, na.rm=TRUE)))

    #          1         2        3  4 5         6 7 8         9 10        11
    #mean    NaN  3.818182 11.27273  6 0  2.363636 0 0  2.090909  0  26.63636
    #median   NA  0.000000  0.00000  0 0  0.000000 0 0  0.000000  0   2.00000
    #max    -Inf 26.000000 88.00000 38 0 14.000000 0 0 20.000000  0 165.00000
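
Column "1" above is entirely NA, which is why it shows NaN, NA and -Inf (and "max" emits a warning). If a plain NA is preferred for such columns, one possible sketch is a small wrapper; `safe_stats` and `toy` are hypothetical names introduced here, and the data are made up:

```r
# Hypothetical helper: summarise one column, returning NA for all three
# statistics when the column has no non-missing values.
safe_stats <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0L) {
    return(c(mean = NA_real_, median = NA_real_, max = NA_real_))
  }
  c(mean = mean(x), median = median(x), max = max(x))
}

# Made-up data frame with one all-NA column, mirroring column "1" above
toy <- data.frame(a = c(NA, NA, NA), b = c(0, 12, 3), c = c(5, NA, 1))

# sapply visits the data frame column by column and binds the results
stats <- sapply(toy, safe_stats)
print(stats)
```

The same function could be passed to "apply" with MARGIN=2 in place of the anonymous function.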



Or using `colMaxs`, `colMedians` from `matrixStats`

    library(matrixStats)
    m <- as.matrix(rainfall)   # matrixStats functions expect a matrix
    rbind(mean=colMeans(m, na.rm=TRUE), median=colMedians(m, na.rm=TRUE),
          max=colMaxs(m, na.rm=TRUE))

Another option would be to use `summarise_each` from `dplyr`

    library(dplyr)
    rainfall %>%
        summarise_each(funs(mean=mean(., na.rm=TRUE), median=median(., na.rm=TRUE),
                            max=max(., na.rm=TRUE)))
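
In later dplyr releases `summarise_each` and `funs` were deprecated. A rough equivalent, assuming dplyr 1.0.0 or newer is installed, uses `across()`; the data frame `toy` is made up for illustration:

```r
library(dplyr)

# Made-up data with some missing values
toy <- data.frame(a = c(1, NA, 3), b = c(0, 12, 3))

# One row of output; columns are named <column>_<statistic> by default
res <- toy %>%
  summarise(across(everything(),
                   list(mean   = ~ mean(.x, na.rm = TRUE),
                        median = ~ median(.x, na.rm = TRUE),
                        max    = ~ max(.x, na.rm = TRUE))))
print(res)
```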

A.K.


I tried to calculate a mean from a csv table by forming a data frame, but it says "dim(X) must have a positive length". The table has 1206 columns and 31 rows. I want to calculate the mean, median, and maximum from the table. The table has some NA values which I don't want to include. The table looks as follows:
1    2    3    4    5    6    7    8    9    10    11 
NA    0    0    0    0    12    0    0    0    0    0 
NA    0    0    0    0    0    0    0    0    0    0 
NA    0    0    0    0    14    0    0    0    0    5 
NA    0    0    0    0    0    0    0    0    0    0 
NA    0    0    27    0    0    0    0    20    0    165 
NA    0    88    38    0    0    0    0    0    0    26 
NA    12    12    0    0    0    0    0    0    0    2 
NA    2    0    0    0    0    0    0    0    0    0 
NA    2    0    0    0    0    0    0    0    0    0 
NA    0    24    1    0    0    0    0    3    0    62 
NA    26    0    0    0    0    0    0    0    0    33 

I used the following code to calculate the mean:
rainfall=read.table('bmark.csv',header=T,sep=',')
precip=data.frame(rainfall[1:1206])
monthlyMean=apply(precip, MARGIN=2,FUN=mean,na.rm=TRUE)

Any help would be appreciated.

Juvin


