[R] Output of order() incorrectly ordered?

Paul Hiemstra p.hiemstra at geo.uu.nl
Tue Mar 25 11:06:15 CET 2008


Hi Shirley,

You can use the function sort_df() from the reshape package to sort an 
entire data.frame by one or more of its columns (named in its vars 
argument).
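
As a minimal sketch of the idea, and not a diagnosis of your file: the 
lines below read a few rows taken from your message, check that the first 
column really was parsed as numeric, and then sort on it. The inline 
sample_lines vector is a hypothetical stand-in for file.txt, and the 
quote = "" and comment.char = "" settings are my assumption (with the 
read.table defaults, quote or # characters in the text columns can change 
how lines are split). The sort_df() call only runs if reshape is 
installed.

```r
# Hypothetical stand-in for file.txt: a few rows copied from the post.
sample_lines <- c(
  "0.083044\t375.276\t680220\tmajority",
  "5.50816e-09\t2.48914e-05\t26377\tconformation",
  "0.000169618\t0.766505\t1546938\tinteraction",
  "3.90425e-05\t0.176433\t1655338\tvitamin",
  "1.01517e-13\t4.58754e-10\t699918\telastase"
)

# Assumption: disable quote and comment handling so that characters in the
# text columns cannot change how the lines are parsed.
df <- read.table(text = sample_lines, sep = "\t", quote = "",
                 comment.char = "", stringsAsFactors = FALSE)

# order() sorts numerically only if the column is numeric, so check first.
str(df)
stopifnot(is.numeric(df$V1))

# Base R: sort the whole data frame by the first column.
df_ordered <- df[order(df$V1), ]

# Same sort with sort_df(), if the reshape package is available.
if (requireNamespace("reshape", quietly = TRUE)) {
  df_sorted <- reshape::sort_df(df, vars = "V1")
  stopifnot(identical(df_sorted$V1, df_ordered$V1))
}

print(df_ordered)
```

If str(df) reports V1 as a factor or character column, the sort will not 
be numeric, and that would be worth checking before anything else.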

cheers,
Paul

Shirley Wu wrote:
> Hello,
>
> I have a data frame consisting of four columns and would like to sort  
> based on the first column and then write the sorted data frame to a  
> file.
>
>  > df <- read.table("file.txt", sep="\t")
> where file.txt is simply a tab-delimited file containing 4 columns of  
> data (first 2 numeric, second 2 character). I then do,
>
>  > df_ordered <- df[order(df$V1), ]
>
> OR, I assume equivalently,
>
>  > df_ordered <- df[ do.call(order, df), ]
>
> and then,
>
>  > write.table(df_ordered, file="newfile.txt", ...)
>
> The input data file looks like this:
>
> 0.083044        375.276 680220  majority
> 5.50816e-09     2.48914e-05     26377   conformation
> 0.000169618     0.766505        1546938 interaction
> 3.90425e-05     0.176433        1655338 vitamin
> 0.0378182       170.9   1510941 array
> 3.00359e-07     0.00135732      69421   oligo(dT)-cellulose
> 1.01517e-13     4.58754e-10     699918  elastase
> ...
>
> I'd like the output file to look the same except sorted by the first
> column. The output of the commands above gives me something that is
> sorted in some places but not sorted in others:
>
> [sorted section]
> ...
> 1.87276e-07     0.000846299     1142090 vitamin K
> 1.89026e-07     0.000854207     917889  leader peptide
> 1.90884e-07     0.000862605     31206   s
> 0.00536062      24.2246 1706420 prevent
> 5.42648e-05     0.245223        1513041 measured
> 5.42648e-05     0.245223        1513040 measured
> 0.019939        90.1044 12578   fly
> 0.00135512      6.12377 61688   GPI
> 0.00124421      5.62257 681915  content
> 0.0128271       57.9655 681916  estimated
> ...
> [sorted section]
> ...
> [unsorted section]
> ...
> [etc]
>
> I'm not sure if this is a problem with the input data or with order()  
> or what. I am only doing this in R because many of my numeric values  
> are expressed in exponential notation and UNIX sort does not handle  
> this to my knowledge, but this behavior baffles me. I am pretty new  
> to R so it's possible I'm missing something.
>
> Any insight would be greatly appreciated!
>
> Thanks,
> -Shirley
> graduate student
> Stanford University
>


-- 
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: 	+31302535773
Fax:	+31302531145
http://intamap.geo.uu.nl/~paul


