[R] Re: In need of help with correlations
Petr PIKAL
petr.pikal at precheza.cz
Mon Apr 11 09:04:40 CEST 2011
Hi
r-help-bounces at r-project.org wrote on 09.04.2011 19:24:38:
> I am in need of someone's help in correlating gene expression. I'm somewhat
> new to R, and can't seem to find anyone local to help me with what I think
> is a simple problem.
>
> I need to obtain pearson and spearman correlation coefficients, and
> corresponding p-values for all of the genes in my dataset that correlate to
> one specific gene of interest. I'm working with mouse Affymetrix Mouse 430
> 2.0 arrays, so I've got about 45,000 probesets (rows; with 1st column
> containing identifiers) and 30 biological replicates (columns; with the top
> row containing the header information).
>
> I've looked through several Intro manuals and the R help files.
>
> I know that "cor(x,y, use ="everything", method = c("pearson")) " can help
> obtain the coefficients.
>
> I also know that "cor.test()" is supposed to test the significance of a
> single correlation coefficient.
>
> I've also found the bioconductor package "genefilter" / "genefinder" that
> looks for correlations to a given gene (although I can't get it to work).
>
> So far I've been able to:
>
> #Read in the csv file
> data<-read.csv("my data.csv")
>
> #Check the dimensions, names, class, fix(data) to ensure the file was
> #loaded properly
> dim(data)
> names(data)
> class(data)
> fix(data)
>
> #So far I've been able to successfully correlate the entire 'column' matrix
> #through:
> x <- data[,2:30]
> y <- data[,2:30]
>
> corr.data<-cor(x,y, use = "everything", method = c("pearson"))
>
> write.csv(corr.data, file = "correlation of my data by columns.csv")
>
> -----------------------------------
>
> Now if I try and run the 'cor.test()' function on the same matrix, I get an
> error message with 'x' must be a numeric vector. This I don't understand.
The cor.test help page says

x, y: numeric vectors of data values. ‘x’ and ‘y’ must have the
same length.

However, your data[,2:30] is most probably a data frame, not a numeric
vector; see

str(data[,2:30])
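
A small illustration of the difference, using made-up data (the data frame "toy" and its column names are invented for this example and are not taken from your file):

```r
# Hypothetical toy data: one identifier column plus three numeric columns
toy <- data.frame(id = c("g1", "g2", "g3", "g4"),
                  s1 = c(1.2, 3.4, 2.2, 5.1),
                  s2 = c(0.9, 3.9, 2.0, 4.8),
                  s3 = c(2.5, 1.1, 3.3, 0.7))

block  <- toy[, 2:4]   # several columns: still a data frame
column <- toy[, 2]     # a single column: a plain numeric vector

is.data.frame(block)   # TRUE, which is why cor.test() rejects it as 'x'
is.numeric(column)     # TRUE, which is what cor.test() expects

# Two single columns work; the result is a list-like "htest" object
res <- cor.test(toy[, 2], toy[, 3])
res$estimate           # the correlation coefficient
res$p.value            # its p-value
```
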
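To use cor.test you need to pass it two numeric vectors, e.g.

cor.test(data[,2], data[,3])

or do it in a loop (untested)

result <- matrix(NA, 20, 20)
for (i in 2:19) {
  # parentheses matter: i+1:20 would mean i + (1:20)
  for (j in (i+1):20) {
    # cor.test returns a list, so keep only the p-value
    result[i,j] <- cor.test(data[,i], data[,j])$p.value
  }
}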
But most probably there are other ways.
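
For the actual goal, one probeset against all the others, something along these lines might work. It is only a sketch on simulated data: the object names (expr, target, cor_to_target, out) and the probeset ID "probe_3" are invented, and it assumes the expression values sit in a numeric matrix with probesets in rows and samples in columns.

```r
set.seed(1)
# Simulated stand-in for the real data: 10 probesets (rows) x 30 samples (columns)
expr <- matrix(rnorm(10 * 30), nrow = 10,
               dimnames = list(paste0("probe_", 1:10), paste0("s", 1:30)))

target <- expr["probe_3", ]   # hypothetical probeset of interest

# Correlate one row with the target; return the coefficient and the p-value
cor_to_target <- function(x, method) {
  ct <- cor.test(x, target, method = method)
  c(estimate = unname(ct$estimate), p.value = ct$p.value)
}

# apply() over rows gives a 2 x n matrix, so transpose to n x 2
pearson  <- t(apply(expr, 1, cor_to_target, method = "pearson"))
spearman <- t(apply(expr, 1, cor_to_target, method = "spearman"))

out <- data.frame(probeset   = rownames(expr),
                  pearson.r  = pearson[, "estimate"],
                  pearson.p  = pearson[, "p.value"],
                  spearman.r = spearman[, "estimate"],
                  spearman.p = spearman[, "p.value"])
head(out)
```

With the real file, expr could presumably be built with as.matrix(data[, -1]) and rownames(expr) <- data[, 1], assuming the first column holds the probeset identifiers and all remaining columns are numeric.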
Regards
Petr
> And this is not my goal, but rather me trying to learn how to go about doing
> correlation analysis in R.
>
> I've also tried transposing the data.frame using "as.data.frame(t(data))"
> and doing so gives the same error message as above.
>
> Can anyone help me with figuring out how to conduct a correlation analysis
> for specific gene/probeset, and help me understand why I get the above error
> message? I know it probably is a simple analysis, that is probably just over
> my head right now since I'm still new to R. But I can't figure it out and
> have been trying with a bunch of different variations for the past week.
>
> Thank you in advance for your help.