[R] Correlation coefficient of large data sets
jwiley.psych at gmail.com
Tue Mar 16 06:32:22 CET 2010
The command for correlating two variables is the same as for a whole set
(see ?cor). How did you read the data in? If it is a matrix or data
frame, you should be able to just call cor(name_of_your_matrix), and it
will return the correlation matrix for all variables in your matrix or
data frame.
If you read each of your 230,000 variables in separately, you can
combine them into a matrix or dataframe using cbind(variablename1, 2,
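
A minimal sketch of both steps, using small made-up data. The names x1, x2, x3, and dat are hypothetical and only stand in for your own variables. Note that cor() treats each column as a variable and each row as an observation:

```r
# Toy data: 12 observations of 3 variables (names are hypothetical)
set.seed(1)
x1 <- rnorm(12)
x2 <- rnorm(12)
x3 <- x1 + rnorm(12, sd = 0.1)

# Variables that were read in separately: bind them column-wise
dat <- cbind(x1, x2, x3)

# Correlation between two variables
r12 <- cor(x1, x2)

# Correlation matrix for the whole set; columns are the variables
cm <- cor(dat)
dim(cm)  # one row and one column per variable
```

If your variables are stored as rows rather than columns, transpose first with t(dat) before calling cor().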
On Mon, Mar 15, 2010 at 10:12 PM, Vincent Davis
<vincent at vincentdavis.net> wrote:
> So I am very new to R. I have been using Python for a project and need to
> calculate the correlation coefficient matrix for my data set. The data is in
> the range of 10-15 observations of 230,000 variables, i.e. the correlation
> matrix would be 230,000 x 230,000. Using Python and numpy.corrcoef(), I run
> out of memory if I try to do this with more than ~30,000 variables.
> I was able to load the data into R; remember I am a newbie, so this is big :)
> I could find commands that would calculate the correlation between 2
> variables but not for a set of variables. How do I do this?
> Am I going to be able to do this with R? I have the 64-bit version installed
> and have access to an 8-core machine with 48 GB of memory.
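
On the memory question, here is a rough back-of-the-envelope estimate. It assumes the full correlation matrix is held densely in memory as 8-byte doubles (the variable names are only for illustration):

```r
# Rough size of a dense p x p correlation matrix of doubles
p <- 230000                # number of variables, from the question
bytes_per_double <- 8      # R stores numeric values as 8-byte doubles

total_bytes <- p^2 * bytes_per_double
total_gb <- total_bytes / 1e9   # decimal gigabytes
total_gb
```

That comes to roughly 423 GB, well beyond 48 GB, so the full matrix would not fit in memory in one piece under this assumption. One possible workaround is to compute it in blocks: cor(x, y) with two matrices returns the correlations between the columns of x and the columns of y, so blocks of columns can be processed and written out one at a time.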
Senior in Psychology
University of California, Riverside