[R] Summary statistics for matrix columns

arun smartpink111 at yahoo.com
Fri Nov 23 21:38:42 CET 2012


Hi,
No problem.

There are a couple of other packages that deal with summary statistics:
library(pastecs)
?stat.desc  # help page for the descriptive statistics table in pastecs

library(matrixStats) 
#Using the functions from package: matrixStats
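fun1 <- function(x) {
  # colQuantiles() returns the 0/25/50/75/100% quantiles by default,
  # so columns 2 and 4 are the first and third quartiles
  res <- rbind(colMins(x), colQuantiles(x)[, 2], colMedians(x), colMeans(x),
               colSds(x), colQuantiles(x)[, 4], colIQRs(x), colMaxs(x))
  row.names(res) <- c("Min.", "1st Qu.", "Median", "Mean", "sd", "3rd Qu.", "IQR", "Max.")
  res
}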

set.seed(125)
x <- matrix(sample(1:80),nrow=8)
colnames(x)<- paste("Col",1:ncol(x),sep="")  
fun1(x)
#            Col1     Col2     Col3     Col4     Col5     Col6     Col7     Col8
#Min.    10.00000  1.00000 17.00000  3.00000 18.00000 11.00000 13.00000 15.00000
#1st Qu. 24.75000 29.50000 26.00000  7.75000 40.00000 17.25000 27.50000 34.75000
#Median  34.00000 46.00000 42.50000 35.50000 49.50000 23.50000 51.50000 51.50000
#Mean    42.50000 42.75000 41.75000 35.75000 44.87500 26.87500 44.75000 50.12500
#sd      25.05993 27.77846 19.57221 28.40397 16.39196 16.60841 21.97239 25.51995
#3rd Qu. 67.75000 58.50000 50.00000 63.25000 54.25000 30.25000 56.25000 70.50000
#IQR     43.00000 29.00000 24.00000 55.50000 14.25000 13.00000 28.75000 35.75000
#Max.    74.00000 77.00000 76.00000 70.00000 65.00000 63.00000 79.00000 80.00000
#            Col9    Col10
#Min.     2.00000  6.00000
#1st Qu. 24.50000 12.50000
#Median  33.50000 48.00000
#Mean    34.87500 40.75000
#sd      24.39811 28.21727
#3rd Qu. 45.25000 63.00000
#IQR     20.75000 50.50000
#Max.    71.00000 72.00000
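
As a quick sanity check, the eight rows can be reproduced for a single column with base R alone. This is only a sketch: `v` and `check` are names made up here, and the exact numbers you get depend on how your R version implements sample().

```r
set.seed(125)
x <- matrix(sample(1:80), nrow = 8)
colnames(x) <- paste("Col", 1:ncol(x), sep = "")

# Base-R equivalents of the eight rows of fun1(x), first column only
v <- x[, "Col1"]
check <- c(Min.      = min(v),
           `1st Qu.` = quantile(v, 0.25, names = FALSE),
           Median    = median(v),
           Mean      = mean(v),
           sd        = sd(v),
           `3rd Qu.` = quantile(v, 0.75, names = FALSE),
           IQR       = IQR(v),
           Max.      = max(v))
check
```

If matrixStats is installed, all.equal(unname(check), unname(fun1(x)[, "Col1"])) should come back TRUE.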

I thought this would be faster than the previous methods, but it turned out to be the slowest.

set.seed(125)
x1 <- matrix(sample(1:800000),nrow=1000)
colnames(x1)<- paste("Col",1:ncol(x1),sep="")

system.time(fun1(x1))
#   user  system elapsed
#  0.968   0.000   0.956
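
One possible contributor to the time above is that fun1() computes the quantiles twice, and then the median and IQR separately again. A rough base-R sketch that takes all the order statistics from a single quantile() call per column is below. col_summary is a hypothetical name, and I have not timed it against the other methods, so treat it as something to try rather than as a faster solution.

```r
col_summary <- function(x) {
  # One quantile() call per column gives min, quartiles, median and max together
  q <- apply(x, 2, quantile, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  mu <- colMeans(x)
  # Sample standard deviation per column, from the column-centered matrix
  sdev <- sqrt(colSums(sweep(x, 2, mu)^2) / (nrow(x) - 1))
  # IQR is the difference between the third and first quartiles
  res <- rbind(q[1, ], q[2, ], q[3, ], mu, sdev, q[4, ], q[4, ] - q[2, ], q[5, ])
  dimnames(res) <- list(c("Min.", "1st Qu.", "Median", "Mean", "sd",
                          "3rd Qu.", "IQR", "Max."),
                        colnames(x))
  res
}

set.seed(125)
m <- matrix(sample(1:80), nrow = 8)
colnames(m) <- paste("Col", 1:ncol(m), sep = "")
col_summary(m)
```

Wrapping the call in system.time(col_summary(x1)) gives a number to compare on your own machine.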
A.K.








________________________________
From: Fares Said <frespider at hotmail.com>
To: arun <smartpink111 at yahoo.com> 
Cc: Pete Brecknock <Peter.Brecknock at bp.com>; R help <r-help at r-project.org> 
Sent: Friday, November 23, 2012 10:23 AM
Subject: Re: [R] Summary statistics for matrix columns

Thank you all 


On 2012-11-23, at 10:19, "arun" <smartpink111 at yahoo.com> wrote:

> Hi,
> You are right.
> It is slower when compared to Pete's solution:
> set.seed(125)
> x <- matrix(sample(1:800000),nrow=1000)
> colnames(x)<- paste("Col",1:ncol(x),sep="")
> 
> system.time({
> res<-sapply(data.frame(x),function(x) c(summary(x),sd=sd(x),IQR=IQR(x)))
>  res1<-as.matrix(res) 
> res2<-res1[c(1:4,7,5,8,6),] })
> #   user  system elapsed
> #  0.596   0.000   0.597
> 
> system.time({
> res<-apply(x,2,function(x) c(Min=min(x),
>                         "1st Qu" =quantile(x, 0.25,names=FALSE),
>                         Median = quantile(x, 0.5, names=FALSE),
>                         Mean= mean(x),
>                         Sd=sd(x),
>                         "3rd Qu" = quantile(x,0.75,names=FALSE),
>                         IQR=IQR(x),
>                         Max = max(x))) })
> #   user  system elapsed
> #  0.384   0.000   0.384
> 
> 
> A.K.
> 
> 
> 
> ----- Original Message -----
> From: Pete Brecknock <Peter.Brecknock at bp.com>
> To: r-help at r-project.org
> Cc: 
> Sent: Friday, November 23, 2012 8:42 AM
> Subject: Re: [R] Summary statistics for matrix columns
> 
> frespider wrote
>> Hi,
>> 
>> It is possible, but don't you think converting to a data.frame will
>> slow the code down?
>> 
>> Thanks 
>> 
>> Date: Thu, 22 Nov 2012 18:31:35 -0800
>> From: ml-node+s789695n4650500h51 at .nabble
>> To: frespider@
>> Subject: RE: Summary statistics for matrix columns
>> 
>> 
>> 
>> Hi,
>>
>> Is it possible to use as.matrix()?
>>
>> res<-sapply(data.frame(x),function(x) c(summary(x),sd=sd(x),IQR=IQR(x)))
>> res1<-as.matrix(res)
>> is.matrix(res1)
>> #[1] TRUE
>> res1[c(1:4,7,5,8,6),]
>> 
>> #            Col1     Col2     Col3     Col4     Col5     Col6     Col7     Col8
>> #Min.    10.00000  1.00000 17.00000  3.00000 18.00000 11.00000 13.00000 15.00000
>> #1st Qu. 24.75000 29.50000 26.00000  7.75000 40.00000 17.25000 27.50000 34.75000
>> #Median  34.00000 46.00000 42.50000 35.50000 49.50000 23.50000 51.50000 51.50000
>> #Mean    42.50000 42.75000 41.75000 35.75000 44.88000 26.88000 44.75000 50.12000
>> #sd      25.05993 27.77846 19.57221 28.40397 16.39196 16.60841 21.97239 25.51995
>> #3rd Qu. 67.75000 58.50000 50.00000 63.25000 54.25000 30.25000 56.25000 70.50000
>> #IQR     43.00000 29.00000 24.00000 55.50000 14.25000 13.00000 28.75000 35.75000
>> #Max.    74.00000 77.00000 76.00000 70.00000 65.00000 63.00000 79.00000 80.00000
>> #            Col9    Col10
>> #Min.     2.00000  6.00000
>> #1st Qu. 24.50000 12.50000
>> #Median  33.50000 48.00000
>> #Mean    34.88000 40.75000
>> #sd      24.39811 28.21727
>> #3rd Qu. 45.25000 63.00000
>> #IQR     20.75000 50.50000
>> #Max.    71.00000 72.00000
>> 
>> This solves both the row order and the matrix output!
>> 
>> A.K.
> 
> Then maybe ....
> 
> x <- matrix(sample(1:8000),nrow=100) 
> colnames(x)<- paste("Col",1:ncol(x),sep="") 
> 
> apply(x,2,function(x) c(Min=min(x), 
>                         "1st Qu" =quantile(x, 0.25,names=FALSE), 
>                         Median = quantile(x, 0.5, names=FALSE),
>                         Mean= mean(x),
>                         Sd=sd(x), 
>                         "3rd Qu" = quantile(x,0.75,names=FALSE),
>                         IQR=IQR(x),
>                         Max = max(x)))
> 
> HTH
> 
> Pete
> 
> 
> 
> --
> View this message in context: http://r.789695.n4.nabble.com/Summary-statistics-for-matrix-columns-tp4650489p4650547.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



