[R] Pairwise correlation

R. Michael Weylandt michael.weylandt at gmail.com
Thu Nov 17 15:30:37 CET 2011


I think something like this should do it, but I can't test without data:

rownames(mydata) <- mydata[,1] # Put the elements in the first column as rownames
mydata <- mydata[,-1] # drop the things that are now rownames
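
A minimal sketch of the same idea done at import time, assuming the gene IDs sit in the first column of the CSV. The inline data, the `gene` and `s1`..`s4` column names, and the object names are hypothetical stand-ins for the real file. It also shows why the labels turned into numbers: data.matrix() converts a character column to integer codes.

```r
# Hypothetical miniature of the expression table: gene IDs in column 1.
csv_text <- "gene,s1,s2,s3,s4
geneA,1,2,3,4
geneB,2,4,6,9
geneC,4,3,2,1"

# Without row.names, the ID column is ordinary character data ...
raw <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
# ... and data.matrix() turns it into integer codes, losing the labels.
id_codes <- data.matrix(raw)[, "gene"]

# row.names = 1 moves the first column into the row names at import time.
mydata <- read.csv(text = csv_text, header = TRUE, row.names = 1)
expr <- data.matrix(mydata)   # numeric matrix, row and column names kept
gene_cor <- cor(t(expr))      # gene-by-gene Pearson correlation
print(round(gene_cor, 3))
```

With a real file, the `text = csv_text` argument would be replaced by the quoted file name.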

Michael

On Thu, Nov 17, 2011 at 9:23 AM, Musa Hassan <musahass at gmail.com> wrote:
> Hi Michael,
> Thanks for the response. I have noticed that the error occurred during my
> data read. It appears that the rownames (which become my colnames when the
> data is transposed) were converted to numbers instead of staying strings.
> The original header names don't change, just the rownames. I have
> to figure out how to import the data without the strings being converted.
> Right now I am using:
> mydata = read.csv("mydata.csv", header=TRUE, stringsAsFactors=FALSE)
>
> then to convert the data frame to matrix
> mydata=data.matrix(mydata)
>
> Then I just do the correlation as Peter suggested.
>
> expression=cor(t(expression))
>
> Thanks.
>
> On 17 November 2011 08:51, R. Michael Weylandt <michael.weylandt at gmail.com>
> wrote:
>>
>> On Wed, Nov 16, 2011 at 11:22 PM, muzz56 <musahass at gmail.com> wrote:
>> > Thanks to everyone who replied to my post, I finally got it to work. I am
>> > however not sure how well it worked since it ran so quickly, but it seems
>> > like I have a 2000 x 2000 data set.
>>
>> Behold the great and mighty power that is R! Don't worry -- on a
>> decent machine the correlation of a 2k x 2k data set should be pretty
>> fast. (It's about 9 seconds on my old-ish laptop with a bunch of other
>> junk running)
>>
>> > My follow-up questions would be: how do I get only pairs with, say, a
>> > certain Pearson correlation value? Additionally, it seems like my output
>> > didn't retain the headers but instead replaced them with numbers, making
>> > it hard to know which gene pairs correlate.
>>
>> This is a little worrisome: R carries column names through cor() so
>> this would suggest you weren't using them. Were your headers listed as
>> part of your data (instead of being names)? If so, they would have
>> been taken as numbers.
>>
>> Take a look at dimnames(NAMEOFDATA) -- if your headers aren't there,
>> then they are being treated as data instead of names. If they are there,
>> can you provide some reproducible code so we can debug more fully?
>> The easiest way to send data is to use the dput() function to get a
>> copy-pasteable plain text representation. It would also be great if
>> you could restrict it to a subset of your data rather than the full 4M
>> data points, but if that's hard to do, don't worry.
>>
>> You should have expected behavior like
>>
>> X <- matrix(1:9,3)
>> colnames(X) <- c("A","B","C")
>> cor(X) # Prints with labels
>>
>> Michael
>>
>> >
>> > On 16 November 2011 17:11, Nordlund, Dan (DSHS/RDA) [via R] <
>> > ml-node+s789695n4078114h81 at n4.nabble.com> wrote:
>> >
>> >> > -----Original Message-----
>> >> > From: [hidden email] [mailto:r-help-bounces at r-project.org] On Behalf Of muzz56
>> >> > Sent: Wednesday, November 16, 2011 12:28 PM
>> >> > To: [hidden email]
>> >> > Subject: Re: [R] Pairwise correlation
>> >> >
>> >> > Thanks Peter. I tried this after reading in the csv (read.csv) and
>> >> > converting the data to a matrix (as.matrix). But when I try the
>> >> > correlation, I keep getting the error (x must be numeric), yet when
>> >> > I view the data, it's numeric.
>> >> >
>> >>
>> >> What does R tell you if you execute the following?
>> >>
>> >> str(x)
>> >>
>> >> Just because the data looks like it is numeric when it prints doesn't
>> >> mean it is.
>> >>
>> >>
>> >> Dan
>> >>
>> >> Daniel J. Nordlund
>> >> Washington State Department of Social and Health Services
>> >> Planning, Performance, and Accountability
>> >> Research and Data Analysis Division
>> >> Olympia, WA 98504-5204
>> >>
>> >>
>> >> ______________________________________________
>> >> [hidden email] mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>> >
>> >
>
>
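
The follow-up question quoted above (keeping only gene pairs whose Pearson correlation passes some cutoff) is not answered in the thread. One possible approach is sketched here, assuming a labelled numeric matrix with genes in rows. The simulated data, the 0.9 cutoff, and all object names are illustrative assumptions, not part of the original exchange.

```r
# Hypothetical data: 5 "genes" (rows) measured in 20 samples (columns).
set.seed(1)
expr <- matrix(rnorm(100), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:20)))
# Make gene2 a noisy copy of gene1 so at least one strong pair exists.
expr["gene2", ] <- expr["gene1", ] + rnorm(20, sd = 0.1)

cm <- cor(t(expr))   # 5 x 5 gene-by-gene Pearson correlations
threshold <- 0.9     # assumed cutoff; choose your own

# upper.tri() keeps each pair once and drops the diagonal (self-correlation).
keep <- which(abs(cm) >= threshold & upper.tri(cm), arr.ind = TRUE)

# Translate the row/column indices back into gene names.
hits <- data.frame(gene_a = rownames(cm)[keep[, "row"]],
                   gene_b = colnames(cm)[keep[, "col"]],
                   r      = cm[keep],
                   stringsAsFactors = FALSE)
print(hits)
```

Using abs() keeps strongly negative pairs as well; drop it if only positive correlations are of interest.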


