[R] "order" issue

Zoppoli, Gabriele (NIH/NCI) [G] zoppolig at mail.nih.gov
Mon May 24 01:57:52 CEST 2010


after read.delim:

'data.frame':   60 obs. of  4 variables:
 $ Cell       : Factor w/ 60 levels "BR:BT_549","BR:HS578T",..: 23 51 20 25 34 16 44 3 60 55 ...
 $ hsa-miR-204: num  -4.37 -4.34 -4.33 -4.29 -4.26 ...
 $ hsa-miR-210: num  -0.223 1.575 1.66 1.668 0.373 ...
 $ Tissue     : Factor w/ 9 levels "Breast","CNS",..: 5 8 5 5 6 3 7 1 9 9 ...

before:

 chr [1:60, 1:4] "ME:SK_MEL_5" "ME:SK_MEL_28" "ME:SK_MEL_2" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:60] "48" "47" "46" "50" ...
  ..$ : chr [1:4] "Product" "hsa.miR.204" "hsa.miR.210" "Tissue"

Looks like the issue is that, after the first time I read the txt file with read.delim, I removed the first three rows by doing

x=x[-c(1:3),]

because the first three rows were characters (parameters like "probe name", "chromosomal position", etc.)
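
A minimal sketch of why dropping the annotation rows does not change the column types, and one possible way to re-infer them afterwards. The tiny tab-separated text and the names `txt` and `raw` are made up for illustration and are not the poster's file; it has one annotation row instead of three.

```r
# Hypothetical miniature of the file: one annotation row above the data rows
txt <- paste("Product\tA\tB",
             "probe name\tp1\tp2",
             "LC:NCI_H460\t0.0042\t-0.6023",
             "ME:MDA_N\t0.21077\t0.05502",
             sep = "\n")

raw <- read.delim(text = txt, stringsAsFactors = FALSE)
str(raw)                 # A and B are read as character because of "p1"/"p2"

x <- raw[-1, ]           # dropping the annotation row keeps the column types
is.character(x$A)

# Re-infer each column's type from its remaining values
x[] <- lapply(x, type.convert, as.is = TRUE)
str(x)                   # A and B are numeric, Product stays character
```

If the annotation rows are not needed at all, reading the header line and the data rows separately (for example with the `skip` argument of read.delim) is another option that avoids the conversion step.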

So maybe R remembers that the columns were character and not numeric... How would you tell R (sorry for the naive wording, but I've taught myself R over time and I misuse some words; I hope it's clear anyway) that a matrix is all numeric? Doing as.numeric(x) turns everything into one long column of numbers, but loses the matrix structure...
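
One possible answer to the question above, as a sketch: as.numeric() drops the "dim" attribute, while changing the storage mode in place keeps it. The small matrix `m` is illustrative (values borrowed from the thread), and this only makes sense for columns that really hold numbers; text columns such as Product or Tissue would become NA, so for mixed data a data.frame is the more natural container.

```r
# Illustrative character matrix holding only the two numeric columns
m <- matrix(c("-0.1915", "0.0042", "-4.37408",
              "-0.16744", "-0.6023", "-0.22311"),
            nrow = 3,
            dimnames = list(c("44", "33", "30"), c("A", "B")))

v <- as.numeric(m)              # plain vector: dim and dimnames are dropped
is.null(dim(v))

num <- m
storage.mode(num) <- "double"   # converts the values, keeps dim and dimnames
str(num)

# Equivalent: rebuild the matrix around the converted values
num2 <- matrix(as.numeric(m), nrow = nrow(m), dimnames = dimnames(m))
```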

Thank you all guys! You're really precious!




Gabriele Zoppoli, MD
Ph.D. Fellow, Experimental and Clinical Oncology and Hematology, University of Genova, Genova, Italy
Guest Researcher, LMP, NCI, NIH, Bethesda MD

Work: 301-451-8575
Mobile: 301-204-5642
Email: zoppolig at mail.nih.gov
________________________________________
From: William Dunlap [wdunlap at tibco.com]
Sent: Sunday, May 23, 2010 7:05 PM
To: Zoppoli, Gabriele (NIH/NCI) [G]; ted.harding at manchester.ac.uk
Cc: R-help at r-project.org
Subject: RE: [R] "order" issue

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Zoppoli,
> Gabriele (NIH/NCI) [G]
> Sent: Sunday, May 23, 2010 3:44 PM
> To: ted.harding at manchester.ac.uk
> Cc: R-help at r-project.org
> Subject: Re: [R] "order" issue
>
> crazy stuff!!! I tried to reload the txt file, and now it's working...

When you "reloaded" the txt file (with what function?) it
probably was made into a "data.frame", with some columns
factors or characters and some columns numerics.  It looks
like your original problem arose after you converted that
data.frame into a "matrix", all of whose columns must be
the same (character in this case).  Sorting character
representations of numbers is different than sorting the
numbers as numbers.
  > sort(c(1, 0.05, 0.0000, -0.10, -2))
  [1] -2.00 -0.10  0.00  0.05  1.00
  > sort(as.character(c(1, 0.05, 0.0000, -0.10, -2)))
  [1] "-0.1" "-2"   "0"    "0.05" "1"

Use str(x) again to see if this is what is happening.
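
The same effect with order() on a character matrix, as a small sketch. The matrix `m` is illustrative (four rows taken from the posted data); the exact order of the strings depends on the locale's collation, so only the numeric result is spelled out in the comments.

```r
# A character matrix, as produced when text and numbers share one matrix
m <- cbind(Product = c("ME:SK_MEL_5", "LC:NCI_H226", "LC:NCI_H460", "ME:MDA_MB_435"),
           A       = c("1.74749", "-4.37408", "0.0042", "-0.1915"))

by_string <- m[order(m[, "A"]), ]              # compares the strings, not the values
by_number <- m[order(as.numeric(m[, "A"])), ]  # -4.37408, -0.1915, 0.0042, 1.74749

by_number
```

Converting only the sort key with as.numeric() leaves the matrix itself untouched, which may be enough when the matrix has to stay character for other reasons.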

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

>
> this is the original (attached)
>
> thanks!
>
> Gabriele Zoppoli, MD
> Ph.D. Fellow, Experimental and Clinical Oncology and
> Hematology, University of Genova, Genova, Italy
> Guest Researcher, LMP, NCI, NIH, Bethesda MD
>
> Work: 301-451-8575
> Mobile: 301-204-5642
> Email: zoppolig at mail.nih.gov
> ________________________________________
> From: Ted Harding [Ted.Harding at manchester.ac.uk]
> Sent: Sunday, May 23, 2010 6:31 PM
> To: Zoppoli, Gabriele (NIH/NCI) [G]
> Cc: R help
> Subject: RE: [R] "order" issue
>
> On 23-May-10 21:39:06, Zoppoli, Gabriele (NIH/NCI) [G] wrote:
> > Hi everybody, this is a real dummy thing.
> >
> > I sorted a matrix based on a given column, and what I get is right,
> > until it comes to columns of negative and positive values; then,
> > "order" orders everything from max to min in the negative values, and
> > then AGAIN from max to min in the positive values!!!
> >
> > Why isn't everything ordered from max to min, and that's it?
> > Thank you!!!
> >
> > Attached is the txt file I use; try:
> >
> > x=x[order(x[,2]),]
> >
> > What I get is:
> >
> > print(x)
> >
> >           Product           A           B   Tissue
> > 44  ME:MDA_MB_435     -0.1915    -0.16744 Melanoma
> > 17     CNS:SNB_75    -0.23183     1.03945      CNS
> > 37       LE:K_562    -0.58218      1.8581 Leukemia
> > 43    ME:MALME_3M    -0.67327    -1.33493 Melanoma
> > 49    ME:UACC_257    -0.72431    -1.84753 Melanoma
> > 42         ME:M14    -0.73942    -0.73904 Melanoma
> > 40          LE:SR    -0.93541     2.95346 Leukemia
> > 25      CO:SW_620    -1.53265    -1.35446    Colon
> > 63      RE:CAKI_1    -2.48443     0.43245    Renal
> > 39   LE:RPMI_8226    -2.59561     -1.9448 Leukemia
> > 26        LC:A549    -2.66221     0.71215     Lung
> > 61        RE:A498    -2.89402     0.93287    Renal
> > 9       BR:HS578T    -2.94118      1.1217   Breast
> > 34    LC:NCI_H522    -2.94381      0.3859     Lung
> > 66       RE:TK_10    -2.95281     1.26245    Renal
> > 52 OV:NCI_ADR_RES    -3.04456     0.17046  Ovarian
> > 57     OV:SK_OV_3    -3.04477     2.15405  Ovarian
> > 53     OV:OVCAR_3     -3.0705    -0.31743  Ovarian
> > 14     CNS:SF_295    -3.09348    -1.00095      CNS
> > 54     OV:OVCAR_4    -3.13137    -0.47497  Ovarian
> > 36       LE:HL_60    -3.16745    -3.16745 Leukemia
> > 38      LE:MOLT_4    -3.20055    -1.72841 Leukemia
> > 11  BR:MDA_MB_231    -3.24907     1.58326   Breast
> > 59        PR:PC_3    -3.36612     1.39328 Prostate
> > 19     CO:HCT_116    -3.39764     0.43061    Colon
> > 12        BR:T47D    -3.41228     1.13818   Breast
> > 22      CO:HCT_15    -3.45342     0.16357    Colon
> > 64     RE:RXF_393    -3.49615     2.59144    Renal
> > 28      LC:HOP_62     -3.4968     0.67884     Lung
> > 60       RE:786_0     -3.5086     1.75056    Renal
> > 35    LE:CCRF_CEM    -3.54526    -2.09262 Leukemia
> > 29      LC:HOP_92    -3.60636     0.87116     Lung
> > 21    CO:HCC_2998    -3.61457    -0.32362    Colon
> > 13     CNS:SF_268    -3.63916     2.54378      CNS
> > 20     CO:COLO205    -3.64656     0.54344    Colon
> > 56     OV:OVCAR_8    -3.66053     -0.9594  Ovarian
> > 24        CO:KM12    -3.68703     2.19991    Colon
> > 55     OV:OVCAR_5     -3.7852     2.43038  Ovarian
> > 8       BR:BT_549    -3.80239    -0.43099   Breast
> > 15     CNS:SF_539    -3.86184     1.39114      CNS
> > 65       RE:SN12C    -3.90776     0.85244    Renal
> > 31     LC:NCI_H23    -3.91625    -1.14955     Lung
> > 62        RE:ACHN    -3.96246    -0.62365    Renal
> > 67       RE:UO_31    -3.99791    -1.09215    Renal
> > 10        BR:MCF7    -4.00187     1.46303   Breast
> > 51      OV:IGROV1    -4.02758     2.04324  Ovarian
> > 23        CO:HT29    -4.11624    -0.02799    Colon
> > 41     ME:LOXIMVI     -4.2572     0.37259 Melanoma
> > 32   LC:NCI_H322M    -4.28534     1.66783     Lung
> > 27        LC:EKVX    -4.32847     1.66042     Lung
> > 58      PR:DU_145    -4.33961     1.57548 Prostate
> > 30    LC:NCI_H226    -4.37408    -0.22311     Lung
> > 33    LC:NCI_H460      0.0042     -0.6023     Lung
> > 18       CNS:U251     0.01263     1.66389      CNS
> > 16     CNS:SNB_19     0.16583     0.03737      CNS
> > 45       ME:MDA_N     0.21077     0.05502 Melanoma
> > 50     ME:UACC_62     0.52503      0.1605 Melanoma
> > 46    ME:SK_MEL_2     0.55255     -1.6667 Melanoma
> > 47   ME:SK_MEL_28      1.7425     1.45266 Melanoma
> > 48    ME:SK_MEL_5     1.74749    -1.47817 Melanoma
> >
> > Gabriele Zoppoli, MD
>
> Somewhat strange indeed! The only further question I can think of
> is to ask what "x" looked like before you re-ordered it.
> Using the "x.txt" file you supplied, I get:
>
>   x <- read.table("x.txt")
>   str(x)
>   # 'data.frame':   60 obs. of  4 variables:
>   #  $ Product: Factor w/ 60 levels "BR:BT_549","BR:HS578T",..: 37 10 30 36 42 35 33 18 56 32 ...
>   #  $ A      : num  -0.192 -0.232 -0.582 -0.673 -0.724 ...
>   #  $ B      : num  -0.167 1.039 1.858 -1.335 -1.848 ...
>   #  $ Tissue : Factor w/ 9 levels "Breast","CNS",..: 6 2 4 6 6 6 4 3 9 4 ...
>
>
> so x[,2] and x[,3] are indeed numeric. Then (similar to yours):
>
>   X<-x[order(x[,2]),]
>   print(X)
>   #           Product        A        B   Tissue
>   # 30    LC:NCI_H226 -4.37408 -0.22311     Lung
>   # 58      PR:DU_145 -4.33961  1.57548 Prostate
>   # 27        LC:EKVX -4.32847  1.66042     Lung
>   # 32   LC:NCI_H322M -4.28534  1.66783     Lung
>   # 41     ME:LOXIMVI -4.25720  0.37259 Melanoma
>   # 23        CO:HT29 -4.11624 -0.02799    Colon
>   # 51      OV:IGROV1 -4.02758  2.04324  Ovarian
>   # 10        BR:MCF7 -4.00187  1.46303   Breast
>   # 67       RE:UO_31 -3.99791 -1.09215    Renal
>   # 62        RE:ACHN -3.96246 -0.62365    Renal
>   # 31     LC:NCI_H23 -3.91625 -1.14955     Lung
>   # 65       RE:SN12C -3.90776  0.85244    Renal
>   # 15     CNS:SF_539 -3.86184  1.39114      CNS
>   # 8       BR:BT_549 -3.80239 -0.43099   Breast
>   # 55     OV:OVCAR_5 -3.78520  2.43038  Ovarian
>   # 24        CO:KM12 -3.68703  2.19991    Colon
>   # 56     OV:OVCAR_8 -3.66053 -0.95940  Ovarian
>   # 20     CO:COLO205 -3.64656  0.54344    Colon
>   # 13     CNS:SF_268 -3.63916  2.54378      CNS
>   # 21    CO:HCC_2998 -3.61457 -0.32362    Colon
>   # 29      LC:HOP_92 -3.60636  0.87116     Lung
>   # 35    LE:CCRF_CEM -3.54526 -2.09262 Leukemia
>   # 60       RE:786_0 -3.50860  1.75056    Renal
>   # 28      LC:HOP_62 -3.49680  0.67884     Lung
>   # 64     RE:RXF_393 -3.49615  2.59144    Renal
>   # 22      CO:HCT_15 -3.45342  0.16357    Colon
>   # 12        BR:T47D -3.41228  1.13818   Breast
>   # 19     CO:HCT_116 -3.39764  0.43061    Colon
>   # 59        PR:PC_3 -3.36612  1.39328 Prostate
>   # 11  BR:MDA_MB_231 -3.24907  1.58326   Breast
>   # 38      LE:MOLT_4 -3.20055 -1.72841 Leukemia
>   # 36       LE:HL_60 -3.16745 -3.16745 Leukemia
>   # 54     OV:OVCAR_4 -3.13137 -0.47497  Ovarian
>   # 14     CNS:SF_295 -3.09348 -1.00095      CNS
>   # 53     OV:OVCAR_3 -3.07050 -0.31743  Ovarian
>   # 57     OV:SK_OV_3 -3.04477  2.15405  Ovarian
>   # 52 OV:NCI_ADR_RES -3.04456  0.17046  Ovarian
>   # 66       RE:TK_10 -2.95281  1.26245    Renal
>   # 34    LC:NCI_H522 -2.94381  0.38590     Lung
>   # 9       BR:HS578T -2.94118  1.12170   Breast
>   # 61        RE:A498 -2.89402  0.93287    Renal
>   # 26        LC:A549 -2.66221  0.71215     Lung
>   # 39   LE:RPMI_8226 -2.59561 -1.94480 Leukemia
>   # 63      RE:CAKI_1 -2.48443  0.43245    Renal
>   # 25      CO:SW_620 -1.53265 -1.35446    Colon
>   # 40          LE:SR -0.93541  2.95346 Leukemia
>   # 42         ME:M14 -0.73942 -0.73904 Melanoma
>   # 49    ME:UACC_257 -0.72431 -1.84753 Melanoma
>   # 43    ME:MALME_3M -0.67327 -1.33493 Melanoma
>   # 37       LE:K_562 -0.58218  1.85810 Leukemia
>   # 17     CNS:SNB_75 -0.23183  1.03945      CNS
>   # 44  ME:MDA_MB_435 -0.19150 -0.16744 Melanoma
>   # 33    LC:NCI_H460  0.00420 -0.60230     Lung
>   # 18       CNS:U251  0.01263  1.66389      CNS
>   # 16     CNS:SNB_19  0.16583  0.03737      CNS
>   # 45       ME:MDA_N  0.21077  0.05502 Melanoma
>   # 50     ME:UACC_62  0.52503  0.16050 Melanoma
>   # 46    ME:SK_MEL_2  0.55255 -1.66670 Melanoma
>   # 47   ME:SK_MEL_28  1.74250  1.45266 Melanoma
>   # 48    ME:SK_MEL_5  1.74749 -1.47817 Melanoma
>
> and now the values in X[,2] are indeed in the correct numerical order,
> yet essentially the same command as yours has been executed.
>
> I have not succeeded in reproducing your result by ordering on other
> columns of "x" or on the row-names of "x".
>
> So it is a mystery! The only thing I can think of is that the
> columns of "x" (as seen by R) are different from what you think
> they should be. Since your file "x.txt" looks like the value
> of "x" after your re-ordering, it is impossible to test such
> guesses on the original "x".
>
> Ted.
>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 23-May-10                                       Time: 23:31:25
> ------------------------------ XFMail ------------------------------
>


