[R] Summary statistics for matrix columns

William Dunlap wdunlap at tibco.com
Sat Nov 24 18:13:49 CET 2012


> Doesn't range mean the difference between the max and min?

That is one meaning of "range". There are many. To see R's definition, type
   ? range
or
   help("range")
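In R, range(x) returns a vector of two numbers, c(min(x), max(x)), not their difference. So Range = range(x) inside c(...) contributes two elements, which c() names Range1 and Range2. The max-minus-min value is diff(range(x)). A minimal sketch, using a made-up vector v for illustration:

```r
# A small example vector (made-up data, for illustration only)
v <- c(7, 2, 9, 4)

# range() returns two numbers: the minimum and the maximum
r <- range(v)
print(r)        # [1] 2 9

# The "range" in the max-minus-min sense is the difference of those two values
spread <- diff(range(v))
print(spread)   # [1] 7

# Inside c(), a length-2 element gets numbered names, hence two "Range" entries
s <- c(Min = min(v), Range = range(v), Max = max(v))
print(names(s)) # names: Min, Range1, Range2, Max
```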

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of frespider
> Sent: Saturday, November 24, 2012 4:58 AM
> To: r-help at r-project.org
> Subject: Re: [R] Summary statistics for matrix columns
> 
> 
> 
> Hi A.K.,
> 
> I have one more question, if you can answer it, please:
> 
> M <- matrix(sample(1:8000),nrow=100)
> colnames(M)<- paste("Col",1:ncol(M),sep="")
> apply(M,2,function(x) c(Min=min(x),"1st Qu" =quantile(x, 0.25,names=FALSE),
>                         Range = range(x),
>                         Median = quantile(x, 0.5, names=FALSE),
>                         Mean= mean(x),Std=sd(x),
>                         "3rd Qu" = quantile(x,0.75,names=FALSE),
>                         IQR=IQR(x),Max = max(x)))
> 
> Why do I get two Range entries? Doesn't range mean the difference between the max and min?
> 
> 
> Thanks
> Date: Fri, 23 Nov 2012 16:08:12 -0800
> From: ml-node+s789695n4650613h54 at n4.nabble.com
> To: frespider at hotmail.com
> Subject: Re: Summary statistics for matrix columns
> 
> Hi,
> 
> No problem.
> 
> There are a couple of other packages that deal with summary statistics:
> 
> library(pastecs)
> ?stat.desc
> 
> library(matrixStats)
> 
> # Using the functions from package matrixStats.
> # colQuantiles() defaults to probs = c(0, 0.25, 0.5, 0.75, 1), so column 2
> # holds the 1st quartile and column 4 the 3rd quartile.
> fun1 <- function(x){
>   res <- rbind(colMins(x), colQuantiles(x)[,2], colMedians(x), colMeans(x),
>                colSds(x), colQuantiles(x)[,4], colIQRs(x), colMaxs(x))
>   row.names(res) <- c("Min.","1st Qu.","Median","Mean","sd","3rd Qu.","IQR","Max.")
>   res
> }
> 
> set.seed(125)
> x <- matrix(sample(1:80),nrow=8)
> colnames(x)<- paste("Col",1:ncol(x),sep="")
> fun1(x)
> #            Col1     Col2     Col3     Col4     Col5     Col6     Col7     Col8
> #Min.    10.00000  1.00000 17.00000  3.00000 18.00000 11.00000 13.00000 15.00000
> #1st Qu. 24.75000 29.50000 26.00000  7.75000 40.00000 17.25000 27.50000 34.75000
> #Median  34.00000 46.00000 42.50000 35.50000 49.50000 23.50000 51.50000 51.50000
> #Mean    42.50000 42.75000 41.75000 35.75000 44.87500 26.87500 44.75000 50.12500
> #sd      25.05993 27.77846 19.57221 28.40397 16.39196 16.60841 21.97239 25.51995
> #3rd Qu. 67.75000 58.50000 50.00000 63.25000 54.25000 30.25000 56.25000 70.50000
> #IQR     43.00000 29.00000 24.00000 55.50000 14.25000 13.00000 28.75000 35.75000
> #Max.    74.00000 77.00000 76.00000 70.00000 65.00000 63.00000 79.00000 80.00000
> #            Col9    Col10
> #Min.     2.00000  6.00000
> #1st Qu. 24.50000 12.50000
> #Median  33.50000 48.00000
> #Mean    34.87500 40.75000
> #sd      24.39811 28.21727
> #3rd Qu. 45.25000 63.00000
> #IQR     20.75000 50.50000
> #Max.    71.00000 72.00000
> 
> I thought this would be faster than the previous methods, but it was the slowest.
> 
> set.seed(125)
> x1 <- matrix(sample(1:800000),nrow=1000)
> colnames(x1)<- paste("Col",1:ncol(x1),sep="")
> system.time(fun1(x1))
> #   user  system elapsed
> #  0.968   0.000   0.956
> 
> A.K.
> 
> ________________________________
> From: Fares Said <[hidden email]>
> To: arun <[hidden email]>
> Cc: Pete Brecknock <[hidden email]>; R help <[hidden email]>
> Sent: Friday, November 23, 2012 10:23 AM
> Subject: Re: [R] Summary statistics for matrix columns
> 
> Thank you all
> 
> On 2012-11-23, at 10:19, "arun" <[hidden email]> wrote:
> 
> > Hi,
> > You are right.
> > It is slower than Pete's solution:
> > set.seed(125)
> > x <- matrix(sample(1:800000),nrow=1000)
> > colnames(x)<- paste("Col",1:ncol(x),sep="")
> >
> > system.time({
> > res<-sapply(data.frame(x),function(x) c(summary(x),sd=sd(x),IQR=IQR(x)))
> > res1<-as.matrix(res)
> > # reorder rows: Min., 1st Qu., Median, Mean, sd, 3rd Qu., IQR, Max.
> > res2<-res1[c(1:4,7,5,8,6),] })
> > # user  system elapsed
> > #  0.596   0.000   0.597
> >
> > system.time({
> > res<-apply(x,2,function(x) c(Min=min(x),
> >                         "1st Qu" =quantile(x, 0.25,names=FALSE),
> >                         Median = quantile(x, 0.5, names=FALSE),
> >                         Mean= mean(x),
> >                         Sd=sd(x),
> >                         "3rd Qu" = quantile(x,0.75,names=FALSE),
> >                         IQR=IQR(x),
> >                         Max = max(x))) })
> > # user  system elapsed
> > #  0.384   0.000   0.384
> >
> > A.K.
> >
> > ----- Original Message -----
> > From: Pete Brecknock <[hidden email]>
> > To: [hidden email]
> > Cc:
> > Sent: Friday, November 23, 2012 8:42 AM
> > Subject: Re: [R] Summary statistics for matrix columns
> >
> > frespider wrote
> >> Hi,
> >>
> >> It is possible, but don't you think converting to a data.frame will slow
> >> the code down?
> >>
> >> Thanks
> >>
> >> Date: Thu, 22 Nov 2012 18:31:35 -0800
> >> From: ml-node+s789695n4650500h51 at .nabble
> >> To: frespider@
> >> Subject: RE: Summary statistics for matrix columns
> >>
> >> Hi,
> >>
> >> Is it possible to use as.matrix()?
> >>
> >> res<-sapply(data.frame(x),function(x) c(summary(x),sd=sd(x),IQR=IQR(x)))
> >> res1<-as.matrix(res)
> >> is.matrix(res1)
> >> #[1] TRUE
> >> res1[c(1:4,7,5,8,6),]
> >>
> >> #            Col1     Col2     Col3     Col4     Col5     Col6     Col7     Col8
> >> #Min.    10.00000  1.00000 17.00000  3.00000 18.00000 11.00000 13.00000 15.00000
> >> #1st Qu. 24.75000 29.50000 26.00000  7.75000 40.00000 17.25000 27.50000 34.75000
> >> #Median  34.00000 46.00000 42.50000 35.50000 49.50000 23.50000 51.50000 51.50000
> >> #Mean    42.50000 42.75000 41.75000 35.75000 44.88000 26.88000 44.75000 50.12000
> >> #sd      25.05993 27.77846 19.57221 28.40397 16.39196 16.60841 21.97239 25.51995
> >> #3rd Qu. 67.75000 58.50000 50.00000 63.25000 54.25000 30.25000 56.25000 70.50000
> >> #IQR     43.00000 29.00000 24.00000 55.50000 14.25000 13.00000 28.75000 35.75000
> >> #Max.    74.00000 77.00000 76.00000 70.00000 65.00000 63.00000 79.00000 80.00000
> >> #            Col9    Col10
> >> #Min.     2.00000  6.00000
> >> #1st Qu. 24.50000 12.50000
> >> #Median  33.50000 48.00000
> >> #Mean    34.88000 40.75000
> >> #sd      24.39811 28.21727
> >> #3rd Qu. 45.25000 63.00000
> >> #IQR     20.75000 50.50000
> >> #Max.    71.00000 72.00000
> >>
> >> A.K.
> >
> > Then maybe ....
> >
> > x <- matrix(sample(1:8000),nrow=100)
> > colnames(x)<- paste("Col",1:ncol(x),sep="")
> >
> > apply(x,2,function(x) c(Min=min(x),
> >                         "1st Qu" =quantile(x, 0.25,names=FALSE),
> >                         Median = quantile(x, 0.5, names=FALSE),
> >                         Mean= mean(x),
> >                         Sd=sd(x),
> >                         "3rd Qu" = quantile(x,0.75,names=FALSE),
> >                         IQR=IQR(x),
> >                         Max = max(x)))
> >
> > HTH
> >
> > Pete
> >
> > --
> > View this message in context: http://r.789695.n4.nabble.com/Summary-statistics-for-matrix-columns-tp4650489p4650547.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > [hidden email] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >

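A minimal sketch that follows Pete's apply() approach from this thread but reports Range as a single value, diff(range(x)), so the result has one Range row instead of two. It assumes base R only, computes the three quartiles with one quantile() call per column instead of three, and uses a hypothetical helper name, col_summary, that does not appear in any of the replies above.

```r
# Hypothetical helper (not from the thread): column-wise summary statistics
# for a numeric matrix, with a single Range entry equal to max - min.
col_summary <- function(M) {
  stats_one <- function(x) {
    # One quantile() call gives the 1st quartile, median and 3rd quartile
    q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
    c(Min      = min(x),
      "1st Qu" = q[1],
      Median   = q[2],
      Mean     = mean(x),
      Std      = sd(x),
      "3rd Qu" = q[3],
      IQR      = q[3] - q[1],     # equals IQR(x) with the default quantile type
      Range    = diff(range(x)),  # one number: max(x) - min(x)
      Max      = max(x))
  }
  # MARGIN = 2 applies stats_one to each column; the result has one column per column of M
  apply(M, 2, stats_one)
}

# Same kind of example data as the question (seed chosen to match the thread)
set.seed(125)
M <- matrix(sample(1:8000), nrow = 100)
colnames(M) <- paste("Col", 1:ncol(M), sep = "")
res <- col_summary(M)
print(round(res[, 1:3], 2))
```

Each column of res holds nine statistics; the Range row is a single number per column, so there are no Range1/Range2 rows.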