[R] colnames

Gilbert Wu gilbert.wu at sabrefund.com
Tue Jul 19 16:49:05 CEST 2005


Hi Adai,

Thank you very much for your suggestions. Your optimized function would come in very handy cause I will need to generate a matrix of size around 2250 * 1000.

Regards,

Gilbert


-----Original Message-----
From: Adaikalavan Ramasamy [mailto:ramasamy at cancer.org.uk]
Sent: 19 July 2005 12:20
To: Gilbert Wu
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] colnames


First, your problem could be boiled down to the following example. See
how the colnames of the two outputs vary.

df <- cbind.data.frame( "100"=1:2, "200"=3:4 )
df/df
  X100 X200
1    1    1
2    1    1

m  <- as.matrix( df )   # coerce to matrix class
m/m
  100 200
1   1   1
2   1   1

It appears that whenever R has to create a new dataframe automatically,
it tries to get nice colnames. See help(data.frame). I am not exactly
sure why this behaviour is different when creating a matrix. But I do
not think this is a major problem for most people. If you coerce your
input to matrix, the problem goes away.


Next, note the following points :
 a) "mat[ 1:3, 1:ncol(mat) ]" is equivalent to simply "mat[ 1:3,  ]". 
 b) "mat[ 2:nrow(mat), ]" is equivalent to simply "mat[ -1,  ]"
See help(subset) for more information.

Using the points above, we can simplify your function as 

 p.RIs2Returns <- function (mat){

   mat <- as.matrix(mat)
   x <- mat[ -nrow(mat), ]
   y <- mat[ -1, ]
  
   return( y/x -1 )
 }

If your data contains only numerical data, it is probably good idea to
work with matrices as matrix operations are faster.


Finally, we can shorten your function. You can use the diff (which works
column-wise if input is a matrix) and apply function if you know that 

	y/x  =  exp(log(y/x))  =  exp( log(y) - log(x) )

which could be coded in R as

	exp( diff( log(r1) ) )

and then subtract 1 from above to get your returns.

Regards, Adai



On Tue, 2005-07-19 at 09:17 +0100, Gilbert Wu wrote:
> Hi Adai,
> 
> Many Thanks for the examples.
> 
> I work for a financial institution. We are exploring R as a tool to implement our portfolio optimization strategies. Hence, R is still a new language to us.
> 
> The script I wrote tried to make a returns matrix from the daily return indices extracted from a SQL database. Please find below the output that produces the 'X' prefix in the colnames. The reason to preserve the column names is that they are stock identifiers which are to be used by other sub systems rather than R.
> 
> I would welcome any suggestion to improve the script.
> 
> 
> Regards,
> 
> Gilbert
> 
> > "p.RIs2Returns" <-
> + function (RIm)
> + {
> + x<-RIm[1:(nrow(RIm)-1), 1:ncol(RIm)]
> + y<-RIm[2:nrow(RIm), 1:ncol(RIm)]
> + RReturns <- (y/x -1)
> + RReturns
> + }
> > 
> > 
> > channel<-odbcConnect("ourSQLDB")
> > result<-sqlQuery(channel,paste("select * from equityRIs;"))
> > odbcClose(channel)
> > result
>    stockid    sdate  dbPrice
> 1   899188 20050713  7.59500
> 2   899188 20050714  7.60500
> 3   899188 20050715  7.48000
> 4   899188 20050718  7.41500
> 5   902232 20050713 10.97000
> 6   902232 20050714 10.94000
> 7   902232 20050715 10.99000
> 8   902232 20050718 11.05000
> 9   901714 20050713 17.96999
> 10  901714 20050714 18.00999
> 11  901714 20050715 17.64999
> 12  901714 20050718 17.64000
> 13  28176U 20050713  5.19250
> 14  28176U 20050714  5.25000
> 15  28176U 20050715  5.25000
> 16  28176U 20050718  5.22500
> 17  15322M 20050713 11.44000
> 18  15322M 20050714 11.50000
> 19  15322M 20050715 11.33000
> 20  15322M 20050718 11.27000
> > r1<-reshape(result, timevar="stockid", idvar="sdate", direction="wide")
> > r1
>      sdate dbPrice.899188 dbPrice.902232 dbPrice.901714 dbPrice.28176U dbPrice.15322M
> 1 20050713          7.595          10.97       17.96999         5.1925          11.44
> 2 20050714          7.605          10.94       18.00999         5.2500          11.50
> 3 20050715          7.480          10.99       17.64999         5.2500          11.33
> 4 20050718          7.415          11.05       17.64000         5.2250          11.27
> > #Set sdate as the rownames
> > rownames(r1) <-as.character(r1[1:nrow(r1),1:1])
> > #Get rid of the first column
> > r1 <- r1[1:nrow(r1),2:ncol(r1)]
> > r1
>          dbPrice.899188 dbPrice.902232 dbPrice.901714 dbPrice.28176U dbPrice.15322M
> 20050713          7.595          10.97       17.96999         5.1925          11.44
> 20050714          7.605          10.94       18.00999         5.2500          11.50
> 20050715          7.480          10.99       17.64999         5.2500          11.33
> 20050718          7.415          11.05       17.64000         5.2250          11.27
> > colnames(r1) <- as.character(sub("[[:alnum:]]*\\.","", colnames(r1)))
> > r1
>          899188 902232   901714 28176U 15322M
> 20050713  7.595  10.97 17.96999 5.1925  11.44
> 20050714  7.605  10.94 18.00999 5.2500  11.50
> 20050715  7.480  10.99 17.64999 5.2500  11.33
> 20050718  7.415  11.05 17.64000 5.2250  11.27
> > RRs<-p.RIs2Returns(r1)
> > RRs
>               X899188      X902232      X901714      X28176U      X15322M
> 20050714  0.001316656 -0.002734731  0.002225933  0.011073664  0.005244755
> 20050715 -0.016436555  0.004570384 -0.019988906  0.000000000 -0.014782609
> 20050718 -0.008689840  0.005459509 -0.000566006 -0.004761905 -0.005295675
> > 
>




More information about the R-help mailing list