[BioC] Sorting matrix by column

Kasoji, Manjula (NIH/NCI) [C] manjula.kasoji at nih.gov
Wed Oct 24 16:07:16 CEST 2012


Well, the fix was easy. I just called noquote() on my matrix, and now I
can order the rows and simply use the duplicated() function to remove the
duplicates. duplicated() keeps the first row for each symbol, which here
is the one with the higher FC. Pasting an example below in case others
want to view.

:-) Thanks!

> zz=z[order(z[,2]),]
> zz
         ID       Gene Symbol logFC            Adj.PVal
10496580 10496580 Gbp3        1.00088125196237 0.044611409531886
10496539 10496539 Gbp5        1.30128040497582 0.0319569661457467
10531994 10531994 Gbp6        1.19085298275753 0.0490943973094095
10496569 10496569 Gbp7        1.0421928217272  0.0490943973094095
10376324 10376324 Gm12250     1.60937590067288 0.030458897264666
10490273 10490273 Gm14305     1.01718341526644 0.0368613093977217
10455961 10455961 Iigp1       1.0315422556842  0.044611409531886
10376326 10376326 Irgm2       1.36277705961511 0.0323289276651196
10398039 10398039 Serpina3f   1.0686870563162  0.044611409531886
10385518 10385518 Tgtp1       1.64120481997653 0.0384608883577761
10385533 10385533 Tgtp1       1.37274810522256 0.044611409531886


> zz[!duplicated(zz[,2]),]
         ID       Gene Symbol logFC            Adj.PVal
10496580 10496580 Gbp3        1.00088125196237 0.044611409531886
10496539 10496539 Gbp5        1.30128040497582 0.0319569661457467
10531994 10531994 Gbp6        1.19085298275753 0.0490943973094095
10496569 10496569 Gbp7        1.0421928217272  0.0490943973094095
10376324 10376324 Gm12250     1.60937590067288 0.030458897264666
10490273 10490273 Gm14305     1.01718341526644 0.0368613093977217
10455961 10455961 Iigp1       1.0315422556842  0.044611409531886
10376326 10376326 Irgm2       1.36277705961511 0.0323289276651196
10398039 10398039 Serpina3f   1.0686870563162  0.044611409531886
10385518 10385518 Tgtp1       1.64120481997653 0.0384608883577761
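
Sorting on the gene symbol alone leaves rows that share a symbol in their
original order, so !duplicated() keeps whichever of them came first. Below
is a minimal sketch of a variant that also sorts by decreasing logFC within
each symbol, so the first row per symbol is always the one with the highest
logFC. The toy matrix z reuses three rows from the output above; the names
fc, ord and best are made up for illustration.

```r
# Toy character matrix shaped like the one above (three of its rows).
z <- cbind(c("10385533", "10385518", "10496580"),
           c("Tgtp1", "Tgtp1", "Gbp3"),
           c("1.37274810522256", "1.64120481997653", "1.00088125196237"))
colnames(z) <- c("ID", "Gene Symbol", "logFC")
rownames(z) <- z[, "ID"]

# logFC is stored as text in a character matrix, so convert before sorting.
fc <- as.numeric(z[, "logFC"])

# Primary key: gene symbol; secondary key: logFC, largest first.
# (Use -abs(fc) instead if the largest magnitude is wanted.)
ord <- order(z[, "Gene Symbol"], -fc)
zz <- z[ord, , drop = FALSE]

# duplicated() flags every repeat after the first, so the first row
# per symbol (now the one with the highest logFC) is kept.
best <- zz[!duplicated(zz[, "Gene Symbol"]), , drop = FALSE]
best
```

With the rows deliberately given in the "wrong" order, the Tgtp1 row that
survives is still the one with logFC 1.64.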


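For anyone hitting the same "unimplemented type 'list'" error: here is a
sketch of how cbind() with a list (which is what mget() returns) gives a
list matrix, and one way to avoid it by flattening the list to a character
vector and building a data.frame first. The object names sym, fc, bad, good
and sorted are hypothetical, the values are borrowed from the first rows of
x further down the thread, and taking only the first symbol per probe is an
assumption.

```r
# Stand-in for an mget() result: a named list with one symbol per probe ID.
sym <- list("10344624" = "Lypla1", "10344633" = "Tcea1", "10344637" = "Atp6v1h")
fc  <- c(0.3592492, 0.1886117, 0.6713107)

# cbind() with a list argument produces a matrix of mode "list".
bad <- cbind(ID = names(sym), "Gene Symbol" = sym, logFC = fc)
is.list(bad[, "Gene Symbol"])   # TRUE, which is what order() objects to

# Flatten the list to a plain character vector (first symbol per probe,
# an assumption) and keep the columns in a data.frame.
symbol <- vapply(sym, function(s) as.character(s[1]), character(1))
good <- data.frame(ID = names(sym), Symbol = symbol, logFC = fc,
                   stringsAsFactors = FALSE)

# Sorting by the character column now works.
sorted <- good[order(good$Symbol), ]
sorted
```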

On 10/23/12 3:32PM, "James W. MacDonald" <jmacdon at uw.edu> wrote:

>If you want to annotate data, an easier way to do it is to use the
>annaffy package - you can output either text or HTML tables. I have some
>functions in affycoretools to automate going from a MArrayLM object to
>the HTML or text tables if you are interested.
>
>Best,
>
>Jim
>
>On 10/23/2012 2:32 PM, Kasoji, Manjula (NIH/NCI) [C] wrote:
>> Thanks, guys. I think I got that because I did a cbind() with my ebayes()
>> results and my annotation results from mget() that I used to annotate my
>> significant genes from the mogene10sttranscriptcluster db.
>>
>> I'll try out a few things. If you guys have any further suggestions or
>> recommendations I will certainly appreciate them.
>>
>> Thanks!
>>
>> On 10/23/12 11:57AM, "Axel Klenk"<axel.klenk at actelion.com>  wrote:
>>
>>> Dear Manjula,
>>>
>>> wow. How did you create that? :-)
>>>
>>> order() doesn't like lists:
>>>
>>>> order(list(1:3))
>>> Error in order(list(1:3)) : unimplemented type 'list' in 'orderVector1'
>>>
>>> and I think you should try to make your x look something like the
>>> data.frame that Jim has used in his example and it will work.
>>>
>>> Cheers,
>>>
>>> Axel (not Alex!!) Klenk
>>> Research Informatician
>>> Information Management Drug Discovery
>>>
>>> Actelion Pharmaceuticals Ltd. • Gewerbestrasse 16 • CH-4123 Allschwil
>>> • Switzerland
>>> G12.O1.R10
>>>
>>> axel.klenk at actelion.com • www.actelion.com
>>> Address for visitors: Hegenheimermattweg 92
>>>
>>>
>>> On Tue, Oct 23, 2012 at 5:45 PM, Kasoji, Manjula (NIH/NCI) [C]
>>> <manjula.kasoji at nih.gov>  wrote:
>>>> Hi Alex,
>>>>
>>>> Please see the output below:
>>>>
>>>>> str(x)
>>>>
>>>> List of 80
>>>>   $ : chr "10371400"
>>>>   $ : chr "10453900"
>>>>   $ : chr "10375051"
>>>>   $ : chr "10575211"
>>>>   $ : chr "10566254"
>>>>   $ : chr "10602372"
>>>>   $ : chr "10398428"
>>>>   $ : chr "10383518"
>>>>   $ : chr "10397054"
>>>>   $ : chr "10384020"
>>>>   $ : chr "10608710"
>>>>   $ : chr "10363762"
>>>>   $ : chr "10375058"
>>>>   $ : chr "10381603"
>>>>   $ : chr "10442373"
>>>>   $ : chr "10421227"
>>>>   $ : chr "10534966"
>>>>   $ : chr "10398408"
>>>>   $ : chr "10398418"
>>>>   $ : chr "10572772"
>>>>   $ : chr "Lypla1"
>>>>   $ : chr "Tcea1"
>>>>   $ : chr "Atp6v1h"
>>>>   $ : chr "Oprk1"
>>>>
>>>>> class(x[,2])
>>>> [1] "list"
>>>>
>>>>
>>>>
>>>>
>>>> On 10/23/12 11:42AM, "Axel Klenk"<axel.klenk at actelion.com>  wrote:
>>>>
>>>>> Dear Guest,
>>>>>
>>>>> I think your approach is valid in general and it is your x that is
>>>>> causing the problem; column 'Gene Symbol' appears to contain two values.
>>>>> What is the result of
>>>>>
>>>>> str(x)
>>>>>
>>>>> and/or
>>>>>
>>>>> class(x[,2])
>>>>>
>>>>> ?
>>>>>
>>>>> Cheers,
>>>>>
>>>>> - axel
>>>>>
>>>>>
>>>>> Axel Klenk
>>>>> Research Informatician
>>>>> Information Management Drug Discovery
>>>>>
>>>>> Actelion Pharmaceuticals Ltd. • Gewerbestrasse 16 • CH-4123 Allschwil
>>>>> • Switzerland
>>>>> G12.O1.R10
>>>>>
>>>>> axel.klenk at actelion.com • www.actelion.com
>>>>> Address for visitors: Hegenheimermattweg 92
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Oct 23, 2012 at 5:15 PM, Guest [guest] <guest at bioconductor.org> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I would like to sort a matrix by a specific column (column 2). I tried
>>>>>> the order() function, but I get an error. I think it is because the
>>>>>> values in column 2 are not numeric, they are gene symbols. This may be a
>>>>>> general R question, but I thought I would post it here since it is
>>>>>> microarray data analysis.
>>>>>>
>>>>>> I have matrix x:
>>>>>>
>>>>>>> x
>>>>>>           ID         Gene Symbol     logFC      Adj.PVal
>>>>>> 10344624 "10371400" "Lypla1"        0.3592492  0.9999522
>>>>>> 10344633 "10453900" "Tcea1"         0.1886117  0.9999522
>>>>>> 10344637 "10375051" "Atp6v1h"       0.6713107  0.9999522
>>>>>> 10344653 "10575211" "Oprk1"         -0.2342731 0.9999522
>>>>>> 10344658 "10566254" "Rb1cc1"        1.790676   0.9999522
>>>>>> 10344674 "10602372" "Fam150a"       1.397496   0.9999522
>>>>>> 10344679 "10398428" "St18"          -0.3278807 0.9999522
>>>>>> 10344707 "10383518" "Pcmtd1"        -0.2231074 0.9999522
>>>>>> 10344713 "10397054" "Ahcy"          -0.1844897 0.9999522
>>>>>> 10344723 "10384020" "Rrs1"          -0.2322781 0.9999522
>>>>>> 10344725 "10608710" "Adhfe1"        0.5993566  0.9999522
>>>>>> 10344741 "10363762" "Hnrnpa3"       -0.2660978 0.9999522
>>>>>> 10344743 "10375058" "3110035E14Rik" 0.9178868  0.9999522
>>>>>> 10344750 "10381603" "Sgk3"          -0.2961638 0.9999522
>>>>>> 10344772 "10442373" "6030422M02Rik" -0.1653454 0.9999522
>>>>>> 10344789 "10421227" "Cspp1"         -0.1480766 0.9999522
>>>>>> 10344799 "10534966" "Cspp1"         -0.2436361 0.9999522
>>>>>> 10344801 "10398408" "Cspp1"         -0.4040665 0.9999522
>>>>>> 10344803 "10398418" "Cspp1"         -0.2556627 0.9999522
>>>>>> 10344805 "10572772" "Cspp1"         -0.1864641 0.9999522
>>>>>>
>>>>>> I want to sort on the "Gene Symbol" column so that I can remove the
>>>>>> duplicates and keep the one with the highest log fold change.
>>>>>>
>>>>>> I tried the following and received an error.
>>>>>>> x[order(x[,2]),]
>>>>>> Error in order(x[, 2]) : unimplemented type 'list' in 'orderVector1'
>>>>>>
>>>>>> If anyone has any suggestions for an easy way to sort a significant
>>>>>> gene list, remove duplicated values, and keep the value with highest
>>>>>> fold change, that would be helpful!
>>>>>>
>>>>>> I've posted my session info below.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Guest
>>>>>>
>>>>>>   -- output of sessionInfo():
>>>>>>
>>>>>>> sessionInfo()
>>>>>> R version 2.15.1 (2012-06-22)
>>>>>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] tools_2.15.1
>>>>>>
>>>>>> --
>>>>>> Sent via the guest posting facility at bioconductor.org.
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioconductor mailing list
>>>>>> Bioconductor at r-project.org
>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>> Search the archives:
>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>> --
>>>>>
>>>>> The information of this email and in any file transmitted with it is
>>>>> strictly confidential and may be legally privileged.
>>>>> It is intended solely for the addressee. If you are not the intended
>>>>> recipient, any copying, distribution or any other use of this email is
>>>>> prohibited and may be unlawful. In such case, you should please notify
>>>>> the sender immediately and destroy this email.
>>>>> The content of this email is not legally binding unless confirmed by
>>>>> letter.
>>>>> Any views expressed in this message are those of the individual sender,
>>>>> except where the message states otherwise and the sender is authorised
>>>>> to state them to be the views of the sender's company. For further
>>>>> information about Actelion please see our website at
>>>>> http://www.actelion.com
>>>>>
>
>-- 
>James W. MacDonald, M.S.
>Biostatistician
>University of Washington
>Environmental and Occupational Health Sciences
>4225 Roosevelt Way NE, # 100
>Seattle WA 98105-6099
>


