[R-SIG-Finance] Mode list to mode numerical.... fast..

Joshua Ulrich josh.m.ulrich at gmail.com
Thu May 1 16:04:02 CEST 2014


On Thu, May 1, 2014 at 8:47 AM, Steve Greiner <sgreiner at factset.com> wrote:
> Okay, I've had it!!!..   Every time I read in a dataset using something like:
> returnmatrix = read.csv("S&P.csv", header=TRUE, sep=",")
>
> It comes back with "returnmatrix" as mode list. How can I quickly convert the dataset to mode numerical? This is pissing me off. I can do it manually by creating a new matrix and assigning the values of the list matrix to the numerical matrix element by element, but it's time consuming. What can anybody recommend?

?read.csv says it returns a data.frame (which is a list with some
specific attributes).  If you want to convert it to a matrix, just
use:
returnmatrix = as.matrix(read.csv("S&P.csv", header=TRUE, sep=","))
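A minimal sketch of that conversion, in case it helps to see the modes change. The temporary file and its column names are made up for illustration and stand in for "S&P.csv", which I'm assuming is all-numeric with a header row:

```r
# Hypothetical stand-in for "S&P.csv": a small all-numeric file with a header.
tmp <- tempfile(fileext = ".csv")
writeLines(c("AAPL,MSFT,IBM",
             "0.01,-0.02,0.005",
             "0.003,0.011,-0.007"), tmp)

df <- read.csv(tmp)   # header=TRUE and sep="," are already the read.csv defaults
mode(df)              # "list": a data.frame is a list of columns

m <- as.matrix(df)
mode(m)               # "numeric", because every column here is numeric
dim(m)                # 2 rows, 3 columns

unlink(tmp)           # remove the temporary file
```

If any column is non-numeric (a date column, say), as.matrix() returns a character matrix instead, so such columns would need to be dropped or converted first.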

You don't say exactly what data "S&P.csv" contains... but if it's a
large matrix, then you can get some fairly substantial performance
improvement by following the advice in the "Memory usage" section of
?read.csv, which says:

'read.table' is not the right tool for reading large matrices,
especially those with many columns: it is designed to read _data
frames_ which may have columns of very different classes.  Use
'scan' instead for matrices.

So you could try something like:

column_names = scan("S&P.csv", nlines=1, sep=",", what="")
returnmatrix = matrix(scan("S&P.csv", skip=1, sep=","),
    ncol=length(column_names), byrow=TRUE,
    dimnames=list(NULL, column_names))

Note nlines=1 rather than n=1: n=1 would read only the first column
name.  byrow=TRUE is needed because scan() returns the values in the
order they appear in the file (row by row), while matrix() fills
column by column by default.
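Here is a small self-contained check of the scan() approach, as a sketch rather than a benchmark. The temporary file is a made-up stand-in for "S&P.csv" and is assumed to be all-numeric below the header. It reads the header with nlines=1 and passes byrow=TRUE, since scan() hands back the values in row order, then compares the result against the read.csv() route:

```r
# Hypothetical stand-in for "S&P.csv": header row plus all-numeric data.
tmp <- tempfile(fileext = ".csv")
writeLines(c("AAPL,MSFT,IBM",
             "0.01,-0.02,0.005",
             "0.003,0.011,-0.007"), tmp)

# Read only the header line as character strings.
column_names <- scan(tmp, what = "", sep = ",", nlines = 1, quiet = TRUE)

# Read everything after the header as one long numeric vector (row order).
values <- scan(tmp, sep = ",", skip = 1, quiet = TRUE)

# Reshape into a matrix; byrow=TRUE restores the file's row layout.
m_scan <- matrix(values, ncol = length(column_names), byrow = TRUE,
                 dimnames = list(NULL, column_names))

# Cross-check against the data.frame route.
m_csv <- as.matrix(read.csv(tmp))
same <- isTRUE(all.equal(m_scan, m_csv, check.attributes = FALSE))

unlink(tmp)
```

Leaving out byrow=TRUE still gives a matrix of the right shape, but with the values scrambled across columns, so it's worth spot-checking a few cells against the file.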

> Steve
>
> Steven P. Greiner, Ph.D.
> Director of Portfolio Risk
> FactSet Research Systems, Inc.
> sgreiner at factset.com
>

Best,
--
Joshua Ulrich  |  about.me/joshuaulrich
FOSS Trading  |  www.fosstrading.com


