[R] Pairwise correlation

R. Michael Weylandt michael.weylandt at gmail.com
Thu Nov 17 21:32:40 CET 2011


I can't tell how it's stored from that, and the email servers garble it
up. Use dput() to create a plain-text representation and paste that
back in.
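
For illustration, a minimal sketch of the dput() round trip. The data
frame here is a hypothetical stand-in built from a few values in the
sample quoted below; the object and column names are assumptions, not
the poster's actual object.

```r
# Hypothetical stand-in for the poster's data: gene names in the first
# column, expression values in the remaining columns.
mydata <- data.frame(
  Gene   = c("Fth1", "B2m", "Tmsb4x"),
  Array1 = c(26016.01, 7573.64, 6192.44),
  Array2 = c(23134.66, 7768.52, 4277.22),
  Array3 = c(17445.71, 6608.24, 5024.59),
  stringsAsFactors = FALSE
)

# Print a plain-text representation of a small subset; this is the text
# to paste into an email.
dput(head(mydata, 3))

# A reader can rebuild the object by evaluating the pasted text.
txt <- paste(capture.output(dput(mydata)), collapse = "\n")
rebuilt <- eval(parse(text = txt))
str(rebuilt)  # shows the stored type of every column
```

Because the pasted text records the class of each column, it also shows
whether a column that prints like numbers is really stored as character.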

Thanks,
Michael

On Thu, Nov 17, 2011 at 9:37 AM, muzz56 <musahass at gmail.com> wrote:
> Hi Michael,
> Here is a sample of the data.
>
> Gene    Array1   Array2   Array3   Array4   Array5   Array6   Array7   Array8   Array9   Array10  Array11
> Fth1    26016.01 23134.66 17445.71 39856.04 27245.45 23622.98 37887.75 49857.46 25864.73 21852.51 29198.4
> B2m     7573.64  7768.52  6608.24  8571.65  6380.78  6242.76  6903.92  7330.63  7256.18  5678.21  10937.05
> Tmsb4x  6192.44  4277.22  5024.59  4851.51  3062.55  4562.43  7948.1   5018.58  3200.17  2855.77  6139.23
> H2-D1   3141.41  3986.06  3328.62  4726.6   3589.89  2885.95  7509.88  5257.62  4742.26  3431.33  5300.72
> Prdx5   3935.7   3938.9   3401.68  4193.14  4028.95  3438.19  6640.15  5486.61  4424.57  3368.83  5265.92
>
> I want to retain the gene names in the data. What you've proposed will take
> them out and I'll have to append them back to the results after the cor()
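
On keeping the gene names: a minimal sketch, assuming the layout in the
sample above (names in the first column, arrays in the rest). Moving the
names into rownames keeps them as labels through cor(), so nothing has
to be appended afterwards. The object names and the 0.9 cutoff are
illustrative assumptions only.

```r
# Hypothetical example data: three genes and four arrays from the sample.
mydata <- data.frame(
  Gene   = c("Fth1", "B2m", "Tmsb4x"),
  Array1 = c(26016.01, 7573.64, 6192.44),
  Array2 = c(23134.66, 7768.52, 4277.22),
  Array3 = c(17445.71, 6608.24, 5024.59),
  Array4 = c(39856.04, 8571.65, 4851.51),
  stringsAsFactors = FALSE
)

rownames(mydata) <- mydata[, 1]   # gene names become row labels
expr <- as.matrix(mydata[, -1])   # numeric matrix; the row labels are kept

gene_cor <- cor(t(expr))          # genes are rows, so transpose first
dimnames(gene_cor)                # both dimensions carry the gene names

# List each gene pair once (upper triangle) whose absolute Pearson
# correlation exceeds an arbitrary cutoff.
idx <- which(abs(gene_cor) > 0.9 & upper.tri(gene_cor), arr.ind = TRUE)
pairs <- data.frame(
  gene1 = rownames(gene_cor)[idx[, "row"]],
  gene2 = colnames(gene_cor)[idx[, "col"]],
  r     = gene_cor[idx]
)
pairs
```

The upper.tri() mask drops the diagonal and the mirrored duplicates, so
each pair is reported at most once.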
>
> On 17 November 2011 09:33, Michael Weylandt [via R] <
> ml-node+s789695n4080177h34 at n4.nabble.com> wrote:
>
>> I think something like this should do it, but I can't test without data:
>>
>> # Put the elements in the first column as rownames
>> rownames(mydata) <- mydata[,1]
>> mydata <- mydata[,-1] # drop the things that are now rownames
>>
>> Michael
>>
>> On Thu, Nov 17, 2011 at 9:23 AM, Musa Hassan <[hidden email]> wrote:
>>
>> > Hi Michael,
>> > Thanks for the response. I have noticed that the error occurred during my
>> > data read. It appears that the rownames (which, when the data is transposed,
>> > become my colnames) were converted to numbers instead of strings as they
>> > should be. The original header names don't change, just the rownames. I have
>> > to figure out how to import the data without the strings being converted.
>> > Right now I am using:
>> > mydata = read.csv("mydata.csv", header=TRUE, stringsAsFactors=FALSE)
>> >
>> > then to convert the data frame to matrix
>> > mydata=data.matrix(mydata)
>> >
>> > Then I just do the correlation as Peter suggested.
>> >
>> > expression=cor(t(expression))
>> >
>> > Thanks.
>> >
>> > On 17 November 2011 08:51, R. Michael Weylandt <[hidden email]> wrote:
>> >>
>> >> On Wed, Nov 16, 2011 at 11:22 PM, muzz56 <[hidden email]> wrote:
>> >> > Thanks to everyone who replied to my post; I finally got it to work. I am,
>> >> > however, not sure how well it worked since it ran so quickly, but it seems
>> >> > like I have a 2000 x 2000 data set.
>> >>
>> >> Behold the great and mighty power that is R! Don't worry -- on a
>> >> decent machine the correlation of a 2k x 2k data set should be pretty
>> >> fast. (It's about 9 seconds on my old-ish laptop with a bunch of other
>> >> junk running)
>> >>
>> >> > My followup questions would be: how do I get only pairs with, say, a
>> >> > certain Pearson correlation value? Additionally, it seems like my output
>> >> > didn't retain the headers but instead replaced them with numbers, making
>> >> > it hard to know which gene pairs correlate.
>> >>
>> >> This is a little worrisome: R carries column names through cor() so
>> >> this would suggest you weren't using them. Were your headers listed as
>> >> part of your data (instead of being names)? If so, they would have
>> >> been taken as numbers.
>> >>
>> >> Take a look at dimnames(NAMEOFDATA) -- if your headers aren't there,
>> >> then they are being treated as data instead of names. If they are there,
>> >> can you provide some reproducible code so we can debug more fully?
>> >> The easiest way to send data is to use the dput() function to get a
>> >> copy-pasteable plain text representation. It would also be great if
>> >> you could restrict it to a subset of your data rather than the full 4M
>> >> data points, but if that's hard to do, don't worry.
>> >>
>> >> You should have expected behavior like
>> >>
>> >> X <- matrix(1:9,3)
>> >> colnames(X) <- c("A","B","C")
>> >> cor(X) # Prints with labels
>> >>
>> >> Michael
>> >>
>> >> >
>> >> > On 16 November 2011 17:11, Nordlund, Dan (DSHS/RDA) [via R] <[hidden email]> wrote:
>> >> >
>> >> >> > -----Original Message-----
>> >> >> > From: [hidden email] [mailto:r-help-bounces at r-project.org] On Behalf Of muzz56
>> >> >> > Sent: Wednesday, November 16, 2011 12:28 PM
>> >> >> > To: [hidden email]
>> >> >> > Subject: Re: [R] Pairwise correlation
>> >> >> >
>> >> >> > Thanks Peter. I tried this after reading in the csv (read.csv) and
>> >> >> > converted the data to a matrix (as.matrix). But when I tried the
>> >> >> > correlation, I kept getting the error (x must be numeric), yet when I
>> >> >> > view the data, it's numeric.
>> >> >> >
>> >> >>
>> >> >> What does R tell you if you execute the following?
>> >> >>
>> >> >> str(x)
>> >> >>
>> >> >> Just because the data looks like it is numeric when it prints doesn't
>> >> >> mean it is.
>> >> >>
>> >> >>
>> >> >> Dan
>> >> >>
>> >> >> Daniel J. Nordlund
>> >> >> Washington State Department of Social and Health Services
>> >> >> Planning, Performance, and Accountability
>> >> >> Research and Data Analysis Division
>> >> >> Olympia, WA 98504-5204
>> >> >>
>> >> >>
>> >> >> ______________________________________________
>> >> >> [hidden email] mailing list
>> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> >> PLEASE do read the posting guide
>> >> >> http://www.R-project.org/posting-guide.html
>> >> >> and provide commented, minimal, self-contained, reproducible code.
>> >> >>
>> >> >
>> >
>> >
>>
>
>