[R] inconsistency with cor() - "x must be numeric"

Tue Dec 14 00:16:25 CET 2010

On Dec 13, 2010, at 23:23 , Justin Fincher wrote:

> I apologize for the lack of an example.  I was trying not to be too
> long-winded.  Below is the first portion of my function that is causing the
> error. (I'm including both calls to cor(), though it quits after the first
> throws an error).  I do not believe he has redefined cor() as he is a novice
> user and we tried this after starting a fresh session.  And I will look into
> upgrading.  I realize it is a little out of date since it is the version in
> the repository for my distribution and not the latest-and-greatest from R.
> I just didn't realize a change like that would be made that would
> (seemingly to me) reduce functionality. Thank you again for your help.

Well, let me put it this way: Once you realize what you are doing, you will appreciate that R is not letting you do that anymore...

> 
> - Fincher
> 
>   # As they don't change, hard code gene density values
>   gene_densities =
>     data.frame(chrom=c("chr1","chr2","chr3","chr4","chr5","chr6","chr7",
>                        "chr8","chr9","chr10","chr11","chr12","chr13",
>                        "chr14","chr15","chr16","chr17","chr18","chr19",
>                        "chr20","chr21","chr22","chrX","chrY"),
>                avg_density=c(10.19,6.457,6.71,4.917,6.083,7.491,7.453,
>                              5.939,7.27,7.132,11.38,9.429,3.757,
>                              7.607,8.455,11.81,17.84,4.649,26.52,
>                              11.19,6.51,11.28,7.535,2.931))
> 
>   acc_averages = c()
> 
>   # subset out relevant data
>   accessibility_data = subset(accessibility_data,
>                               accessibility_data$V9==";color=000000")
> 
>   # calculate mean accessibility value for each chromosome
>   for(i in seq(1,22)){
>      sub = paste("chr",i,sep="")
>      temp = subset(accessibility_data,accessibility_data$V1==sub)
>      acc_averages = rbind(acc_averages,c(sub,as.double(mean(temp$V6))))
>   }
>   temp = subset(accessibility_data,accessibility_data$V1=="chrX")
>   acc_averages = rbind(acc_averages,c("chrX",as.double(mean(temp$V6))))

This line and the similar one three lines earlier are the culprit. The c() construct creates a character vector because one of its arguments (here the first) is character, and rbind()ing such vectors makes acc_averages a character matrix. Now, are you _sure_ you know what happens if you correlate something with the character vector acc_averages[,2]? It may have given you the right thing for Pearson correlations, but it certainly did not for rank correlations before R 2.11.0: the ranks were based on the _alphabetical_ ordering of the data! That led to a "non-bug report" and the subsequent check for numeric data.
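
A minimal sketch of the coercion, using made-up values rather than your data (the names one_row, m, vals, rank_as_text and rank_as_number are only for illustration):

```r
# Hypothetical values for illustration; not the poster's data.

# Mixing a label and a number in c() coerces the number to a string.
one_row <- c("chr1", as.double(9))
class(one_row)                  # "character"

# Stacking such rows with rbind() therefore gives a character matrix.
m <- rbind(c("chr1", 9), c("chr2", 10), c("chr3", 100))
vals <- m[, 2]
is.character(vals)              # TRUE

# Ranking the strings orders them as text ("10" < "100" < "9"),
# which differs from the numeric order (9 < 10 < 100).
rank_as_text   <- rank(vals)
rank_as_number <- rank(as.numeric(vals))
print(rank_as_text)
print(rank_as_number)
```

Any rank-based statistic computed from the text ordering would be measuring something quite different from what was intended.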

I'm fairly confident that you'd really want to do the whole thing with a suitable aggregate() call, but for now, how about just keeping the labels and the values in two separate vectors?
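
Something along these lines might do, as a rough sketch only. The toy accessibility_data below is invented; only the column names V1 (chromosome) and V6 (value) and the three density values are taken from your post, and chroms, acc_means and merged are hypothetical names:

```r
# Toy stand-in for accessibility_data (invented values).
accessibility_data <- data.frame(
  V1 = c("chr1", "chr1", "chr2", "chr2", "chrX"),
  V6 = c(1, 3, 10, 20, 5),
  stringsAsFactors = FALSE
)

# Subset of the gene densities from the post, for the same chromosomes.
gene_densities <- data.frame(
  chrom = c("chr1", "chr2", "chrX"),
  avg_density = c(10.19, 6.457, 7.535),
  stringsAsFactors = FALSE
)

# Option 1: keep labels and values in two separate vectors,
# so the means stay numeric.
chroms <- c("chr1", "chr2", "chrX")
acc_means <- vapply(
  chroms,
  function(ch) mean(accessibility_data$V6[accessibility_data$V1 == ch]),
  numeric(1)
)

# Option 2: let aggregate() compute the per-chromosome means.
acc_averages <- aggregate(V6 ~ V1, data = accessibility_data, FUN = mean)

# Match chromosomes by name before correlating, rather than by position.
merged <- merge(gene_densities, acc_averages, by.x = "chrom", by.y = "V1")
rho <- cor(merged$avg_density, merged$V6, method = "spearman")
print(rho)
```

Either way the values passed to cor() are numeric, so the check in current versions of R is satisfied.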

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com