[R] inconsistency with cor() - "x must be numeric"
Joshua Wiley
jwiley.psych at gmail.com
Mon Dec 13 23:48:04 CET 2010
Hi,
I can certainly understand not wanting to be long winded, and no
damage done. Here's a link to the R news file:
http://cran.stat.ucla.edu/src/base/NEWS and if you search in your
browser for "cor() and cov()" you should find what happened.
At any rate, I could not fully check your code because: object
'accessibility_data' not found, but my guess would be that you created
a matrix (if inadvertently), and at least one of the columns had some
character data in it, which would push *all* the data to character
class (even though a particular column may hold numeric data, it is now
stored as character). Previously, I think cor() did not check this
and would silently convert using as.numeric().
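To illustrate what I suspect is happening, here is a minimal sketch. The
names acc_demo and dens_demo and the mean values are made up for
illustration (I do not have your data); only the rbind(c(label, value))
pattern is taken from your function.

```r
# Hypothetical stand-ins for the per-chromosome means (not the real data)
acc_demo <- c()
acc_demo <- rbind(acc_demo, c("chr1", 0.42))
acc_demo <- rbind(acc_demo, c("chr2", 0.37))
acc_demo <- rbind(acc_demo, c("chr3", 0.51))

# c("chr1", 0.42) mixes character and double, so the whole vector (and
# therefore the whole matrix) is coerced to character
str(acc_demo)
is.numeric(acc_demo[, 2])  # FALSE: column two holds strings such as "0.42"

# First three gene densities from the original post
dens_demo <- c(10.19, 6.457, 6.71)

# Wrapped in try() so the script keeps running; versions of R that check
# the argument type stop here with an error instead of converting
res <- try(cor(acc_demo[, 2], dens_demo), silent = TRUE)
inherits(res, "try-error")
```

If str() reports a chr matrix, that would explain the "x must be
numeric" message on the newer version.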
I would look at:
str(acc_averages)
and I bet you will find that it is not numeric. If this is the case,
one fix would be:
correlation = cor(as.numeric(acc_averages[,2]),
gene_densities$avg_density[1:23])
Probably a better fix would be to initialize acc_averages as a
data.frame rather than with c(); that way it can store different types
of data without moving everything up the hierarchy of classes. To see
what I mean, look at ?rbind under the heading "Value", the second
paragraph.
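A rough sketch of the data.frame approach, assuming your data look
roughly like the made-up acc_data_demo below (V1 = chromosome label,
V6 = accessibility value, as in your code). All object names ending in
_demo and the values in V6 are hypothetical.

```r
# Hypothetical stand-in for accessibility_data
acc_data_demo <- data.frame(
  V1 = c("chr1", "chr1", "chr2", "chr2", "chrX", "chrX"),
  V6 = c(0.40, 0.44, 0.30, 0.36, 0.50, 0.58),
  stringsAsFactors = FALSE
)
chroms_demo <- c("chr1", "chr2", "chrX")

# One mean per chromosome; sapply() returns a numeric vector because
# every element is a single double
means_demo <- sapply(chroms_demo, function(ch) {
  mean(acc_data_demo$V6[acc_data_demo$V1 == ch])
})

# Labels and values live in separate columns, each with its own type
acc_averages_demo <- data.frame(chrom = chroms_demo, avg = means_demo,
                                stringsAsFactors = FALSE)
str(acc_averages_demo)

# Gene densities for chr1, chr2 and chrX, copied from the original post
dens_demo <- c(10.19, 6.457, 7.535)
correlation_demo <- cor(acc_averages_demo$avg, dens_demo)
cat("Correlation (demo data):", correlation_demo, "\n")
```

Because the avg column stays numeric, it can go to cor() directly with no
as.numeric() call.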
Cheers,
Josh
On Mon, Dec 13, 2010 at 2:23 PM, Justin Fincher <fincher at cs.fsu.edu> wrote:
> I apologize for the lack of example. I was trying not to be too long
> winded. Below is the first portion of my function that is causing the
> error. (I'm including both calls to cor(), though it quits after the first
> throws an error). I do not believe he has redefined cor() as he is a novice
> user and we tried this after starting a fresh session. And I will look into
> upgrading. I realize it is a little out of date since it is the version in
> the repository for my distribution and not the latest-and-greatest from R.
> I just didn't realize a change like that would be made that would
> (seemingly to me) reduce functionality. Thank you again for your help.
> - Fincher
> # As they don't change, hard code gene density values
> gene_densities =
>   data.frame(chrom=c("chr1","chr2","chr3","chr4","chr5","chr6","chr7",
>                      "chr8","chr9","chr10","chr11","chr12","chr13",
>                      "chr14","chr15","chr16","chr17","chr18","chr19",
>                      "chr20","chr21","chr22","chrX","chrY"),
>              avg_density=c(10.19,6.457,6.71,4.917,6.083,7.491,7.453,
>                            5.939,7.27,7.132,11.38,9.429,3.757,
>                            7.607,8.455,11.81,17.84,4.649,26.52,
>                            11.19,6.51,11.28,7.535,2.931))
>
> acc_averages = c()
> # subset out relevant data
> accessibility_data = subset(accessibility_data,
> accessibility_data$V9==";color=000000")
>
> # calculate mean accessibility value for each chromosome
> for(i in seq(1,22)){
> sub = paste("chr",i,sep="")
> temp = subset(accessibility_data,accessibility_data$V1==sub)
> acc_averages = rbind(acc_averages,c(sub,as.double(mean(temp$V6))))
> }
> temp = subset(accessibility_data,accessibility_data$V1=="chrX")
> acc_averages = rbind(acc_averages,c("chrX",as.double(mean(temp$V6))))
>
> # Output the correlation without including chromosome Y
> correlation = cor(acc_averages[,2],gene_densities$avg_density[1:23])
> cat("Correlation w/o chrY:",correlation,'\n')
>
> temp = subset(accessibility_data,accessibility_data$V1=="chrY")
> acc_averages = rbind(acc_averages,c("chrY",mean(temp$V6)))
> # Output overall correlation
> correlation = cor(acc_averages[,2],gene_densities$avg_density)
> cat("Correlation w/chrY:",correlation,'\n')
>
> On Mon, Dec 13, 2010 at 17:06, Joshua Wiley <jwiley.psych at gmail.com> wrote:
>>
>> Hi Fincher,
>>
>> cor() only works on numeric arguments now (as of R 2.11 or 2.10 if
>> memory serves). So, I would update your function to ensure that you
>> are only passing numeric data to cor() and the error should go away
>> (it will probably be easier on you if you can update your version of R
>> to the latest and greatest...quite a bit has changed since 2.8.1). If
>> you post a reproducible example of your function, I'm sure we can help
>> update it.
>>
>> Cheers,
>>
>> Josh
>>
>> On Mon, Dec 13, 2010 at 1:56 PM, Justin Fincher <fincher at cs.fsu.edu>
>> wrote:
>> > Howdy,
>> > I have written a small function to generate a simple plot and my
>> > colleague is having an error when attempting to run it. Essentially I
>> > loop through categories in a data frame and take the average value for
>> > each category. The categories are in $V1, subset first then mean taken
>> > and concatenated to previous values using
>> > rbind(c("label",mean(data$V6))). The result is a two-column matrix
>> > with labels in column one and values in column two. Within the
>> > function I calculate the correlation of column two and another set of
>> > values that are part of the function. On my computer (linux box
>> > running R 2.8.1) the function runs correctly. On my colleague's
>> > computer (Windows box running R 2.12) the function throws an error at
>> > the cor() function call saying that "x must be numeric." We are
>> > running on the exact same data set and source'ing the same function
>> > definition. Any help would be appreciated.
>> >
>> > - Fincher
>> >
>> >
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> University of California, Los Angeles
>> http://www.joshuawiley.com/
>>
>
>
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/