[R] inconsistency with cor() - "x must be numeric"

peter dalgaard pdalgd at gmail.com
Tue Dec 14 00:16:25 CET 2010

On Dec 13, 2010, at 23:23 , Justin Fincher wrote:

> I apologize for the lack of an example.  I was trying not to be too
> long-winded.  Below is the first portion of my function that is causing the
> error. (I'm including both calls to cor(), though it quits after the first
> throws an error).  I do not believe he has redefined cor() as he is a novice
> user and we tried this after starting a fresh session.  And I will look into
> upgrading.  I realize it is a little out of date since it is the version in
> the repository for my distribution and not the latest-and-greatest from R.
> I just didn't realize a change like that would be made that would
> (seemingly to me) reduce functionality. Thank you again for your help.

Well, let me put it this way: Once you realize what you are doing, you will appreciate that R is not letting you do that anymore...

> - Fincher
>   # As they don't change, hard code gene density values
>   gene_densities =
> data.frame(chrom=c("chr1","chr2","chr3","chr4","chr5","chr6","chr7",
> "chr8","chr9","chr10","chr11","chr12","chr13",
> "chr14","chr15","chr16","chr17","chr18","chr19",
> "chr20","chr21","chr22","chrX","chrY"),
> avg_density=c(10.19,6.457,6.71,4.917,6.083,7.491,7.453,
>                                       5.939,7.27,7.132,11.38,9.429,3.757,
>                                       7.607,8.455,11.81,17.84,4.649,26.52,
>                                       11.19,6.51,11.28,7.535,2.931))
>   acc_averages = c()
>   # subset out relevant data
>   accessibility_data = subset(accessibility_data,
> accessibility_data$V9==";color=000000")
>   # calculate mean accessibility value for each chromosome
>   for(i in seq(1,22)){
>      sub = paste("chr",i,sep="")
>      temp = subset(accessibility_data,accessibility_data$V1==sub)
>      acc_averages = rbind(acc_averages,c(sub,as.double(mean(temp$V6))))
>   }
>   temp = subset(accessibility_data,accessibility_data$V1=="chrX")
>   acc_averages = rbind(acc_averages,c("chrX",as.double(mean(temp$V6))))

This line, and the similar one three lines earlier, is the culprit. The c() construct creates a character vector because its first argument is character (c() coerces all of its arguments to a common type). Hence, acc_averages is a character matrix. Now, are you _sure_ you know what happens if you correlate something with the character vector acc_averages[,2]? It may have given you the right thing for Pearson correlations, but it certainly did not for rank correlations before 2.11.0, where the ranks were based on the _alphabetical_ ordering of the data! That led to a "non-bug report" and the subsequent check for numeric data.
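
To make the coercion concrete, here is a minimal sketch. The three values are made up for illustration and are not from your data; the point is only that ranking the character column orders the strings, not the numbers.

```r
# c() coerces everything to one common type; character wins over numeric
row <- c("chr1", mean(c(10.5, 9.5)))
is.character(row)                 # TRUE: the mean is now the string "10"

# Build a small matrix the same way the loop does (illustrative values only)
acc <- rbind(c("chr2", 9), c("chr10", 10), c("chr1", 100))
vals_chr <- acc[, 2]              # character: "9" "10" "100"
vals_num <- as.numeric(vals_chr)  # numeric:   9   10   100

rank(vals_num)                    # ranks by numeric size
rank(vals_chr)                    # ranks by string ordering, which differs

# Current versions of cor() refuse character input instead of guessing
res <- try(cor(1:3, vals_chr), silent = TRUE)
inherits(res, "try-error")
```

Converting with as.numeric() before calling cor() avoids the error, but it is cleaner never to let the numbers become strings in the first place.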

I'm fairly confident that you'd really want to do the whole thing with a suitable aggregate() call, but for now, how about just keeping the labels and the values in two separate vectors?
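
Something along these lines, as a rough sketch only. The small data frame below is a hypothetical stand-in for your accessibility_data (I am assuming V1 holds the chromosome label and V6 the numeric value, as in your code); chroms, acc_means and acc_by_chrom are names I made up.

```r
# Hypothetical stand-in for accessibility_data, after the V9 subsetting
accessibility_data <- data.frame(
  V1 = c("chr1", "chr1", "chr2", "chr2", "chrX"),
  V6 = c(1, 3, 10, 20, 5),
  stringsAsFactors = FALSE
)

# Option 1: keep labels and values in two separate vectors
chroms <- c(paste("chr", 1:2, sep = ""), "chrX")
acc_means <- sapply(chroms, function(ch) {
  mean(accessibility_data$V6[accessibility_data$V1 == ch])
})
is.numeric(acc_means)   # TRUE: the means stay numeric, labels are only names

# Option 2: let aggregate() compute the per-chromosome means in one call
acc_by_chrom <- aggregate(accessibility_data$V6,
                          by = list(chrom = accessibility_data$V1),
                          FUN = mean)
acc_by_chrom            # data frame with columns chrom and x (the mean)
```

With the aggregate() version, merge(gene_densities, acc_by_chrom, by = "chrom") would line the two tables up by chromosome, after which cor() can be applied to the two numeric columns.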

Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

