[R] [ExternalEmail] Pearson Correlation Speed
Nathan S. Watson-Haigh
nathan.watson-haigh at csiro.au
Tue Dec 16 03:23:35 CET 2008
Charles C. Berry wrote:
> On Mon, 15 Dec 2008, Nathan S. Watson-Haigh wrote:
>
>> Nathan S. Watson-Haigh wrote:
>>> I'm trying to calculate Pearson correlation coefficients for a large
>>> matrix of size 18563 x 18563. The following function takes about XX
>>> minutes to complete, and I'd like to do this calculation about 15 times,
>>> so speed is somewhat of an issue.
>
> I think you are on the wrong track, Nathan.
>
> The matrix you are starting with is 18563 x 18563 and the result of
> finding the correlations amongst the columns of that matrix is also 18563
> x 18563. It will require more than 5 Gigabytes of memory to store the
> result and the original matrix.
Yes, the memory usage is somewhat large. Luckily I have the use of a
cluster with lots of shared memory! However, I'm interested to learn how
you arrived at the calculation of the memory requirements.
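For reference, one plausible way to arrive at a figure like that is
sketched below. It assumes both matrices are stored densely as
double-precision values at 8 bytes per element, which is R's default for
numeric data; the variable names are only illustrative.

```r
# Back-of-envelope memory estimate for two dense numeric matrices.
n <- 18563                         # rows and columns, as stated in the thread
bytes_per_double <- 8              # assumption: double-precision storage

one_matrix <- n^2 * bytes_per_double   # bytes for a single n x n matrix
both <- 2 * one_matrix                 # input matrix plus correlation result

one_matrix / 1e9                   # gigabytes for one matrix
both / 1e9                         # gigabytes for both together
```

Under those assumptions each matrix needs roughly 2.76 GB, so the pair
comes to a little over 5.5 GB before counting any intermediate copies.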
>
> Likely the time needed to do the calc is inflated by caching issues
> and, if your machine has too little memory to hold the result and all
> the intermediate pieces, by swapping as well.
>
> You can finesse these by breaking your problem into smaller pieces, say
> computing the correlations between each pair of 19 blocks of columns
> (columns 1:977, 977+1:977, ..., 18*977+1:977), then assembling the
> results.
This is possible; however, why is something like this not implemented
internally in the cor() function if it scales poorly due to the large
memory requirements?
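A minimal sketch of the block-wise idea suggested above, as I understand
it, might look like the following. block_cor() is a hypothetical helper
name, not something in base R, and the toy matrix is far smaller than the
real data; it only relies on cor(x, y) returning the correlations between
the columns of x and the columns of y.

```r
# Sketch: correlations among the columns of x, computed one block pair at a time.
# block_cor() is a hypothetical helper, not part of base R.
block_cor <- function(x, block_size) {
  p <- ncol(x)
  starts <- seq(1, p, by = block_size)        # first column of each block
  result <- matrix(NA_real_, p, p)
  for (i in seq_along(starts)) {
    ci <- starts[i]:min(starts[i] + block_size - 1, p)
    for (j in i:length(starts)) {
      cj <- starts[j]:min(starts[j] + block_size - 1, p)
      block <- cor(x[, ci, drop = FALSE], x[, cj, drop = FALSE])
      result[ci, cj] <- block
      result[cj, ci] <- t(block)              # mirror block, by symmetry
    }
  }
  result
}

set.seed(1)
x <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)   # small toy data
all.equal(block_cor(x, block_size = 3), cor(x))
```

Note that this version still assembles the full result in memory. To
actually reduce the peak footprint, each block could presumably be
written to disk or summarised as it is produced instead of being stored
in one large matrix.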
>
> ---
>
> BTW, R already has the necessary machinery to calculate the crossproduct
> matrix (etc) needed to find the correlations. You can access the low level
> linear algebra that R uses. You can marry R to an optimized BLAS if you
> like.
>
> So pulling in some other code to do this will not save you anything. If
> you ever do decide to import C[++] code there is excellent documentation
> in the Writing R Extensions manual, which you should review before
> attempting to import C++ code into R.
Thanks, I have seen this and it seemed quite technical to use as a
starting point for someone unfamiliar with both C++ and incorporating
C++ code into R.
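For what it's worth, the crossproduct route mentioned above can be
sketched in plain R without any C++. This is only an illustration under
the assumption that there are no missing values; cor_crossprod() is a
hypothetical name, and scale() and crossprod() are the base R functions.

```r
# Sketch: Pearson correlations from the cross-product of standardized columns.
# cor_crossprod() is a hypothetical helper; assumes x has no missing values.
cor_crossprod <- function(x) {
  n <- nrow(x)
  z <- scale(x)             # center each column and divide by its standard deviation
  crossprod(z) / (n - 1)    # t(z) %*% z, scaled to give correlations
}

set.seed(1)
x <- matrix(rnorm(200 * 10), nrow = 200, ncol = 10)   # small toy data
all.equal(cor_crossprod(x), cor(x))
```

Since crossprod() is handed to the BLAS, linking R against an optimized
BLAS should speed up this step without any change to the R code.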
Cheers,
Nathan