[R] Help with Order

David Winsemius dwinsemius at comcast.net
Mon Jan 11 14:49:43 CET 2010


On Jan 11, 2010, at 7:49 AM, Duncan Murdoch wrote:

> On 11/01/2010 7:37 AM, Steve Sidney wrote:
>> Dear List
>> As a fairly new R programmer I seem to have run into a strange  
>> problem - probably my inexperience with R
>> After reading and merging successive files into a single data  
>> frame, I find that order does not sort the data as expected.
>> I have multiple references in each file but each file refers to  
>> measurement data obtained at a different time.
>> Here's the code
>> library(reshape)
>> # Enter file name to Read & Save data
>> FileName=readline("Enter File name:\n")
>> # Find first occurrence of file
>> for ( round1 in 1 : 6) {
>> ReadFile=paste(round1,"C_",FileName,"_Stats.csv", sep="")
>> if (file.exists(ReadFile))
>> break
>> }
>> x = data.frame(read.csv(ReadFile, header=TRUE),rnd=round1)
>> for ( round2 in (round1+1) : 6) {
>> #
>> ReadFile=paste(round2,"C_",FileName,"_Stats.csv", sep="")
>> if (file.exists(ReadFile)) {
>> y = data.frame(read.csv(ReadFile, header=TRUE),rnd = round2)
>>    if (round2 == (round1 +1))
>>    z=data.frame(merge(x,y,all=TRUE))
>>    z=data.frame(merge(y,z,all=TRUE))
>> }
>> }
>> ordered = order(z$lab_id)

Following Duncan's hypothesis that lab_id is being held as a factor,
perhaps change this to:
ordered = order(as.character(z$lab_id))
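
A minimal sketch of why that might matter, assuming lab_id really is a
factor whose levels are not in alphabetical order (which can happen
when factors from several files are combined). The IDs below are made
up for illustration, not taken from Steve's files:

```r
# Hypothetical lab IDs; the levels are deliberately not alphabetical.
lab_id <- factor(c("L10", "L02", "L07"), levels = c("L10", "L02", "L07"))

# order() on a factor follows the integer codes behind the levels ...
by_codes <- order(lab_id)

# ... whereas order() on the character values follows the labels themselves.
by_strings <- order(as.character(lab_id))

as.character(lab_id[by_codes])
as.character(lab_id[by_strings])
```

If the two results differ on your own data, the factor levels are the
likely culprit.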

>> results = z[ordered,]
>> res = data.frame(lab=results[,"lab_id"],bw=results[,"ZBW"],wi=results[,"ZWI"],
>>     pf_zbw=0,pf_zwi=0,r=results[,"rnd"])
>> #
>> # Establish no of samples recorded
>> nsmpls = length(res[,c("lab")])
>> # Evaluate Z_scores for Between Lab Results
>> for ( i in 1 : nsmpls) {
>> if (res[i,"bw"] > 3 | res[i,"bw"] < -3)
>> res[i,"pf_zbw"]=1
>> }
>> # Evaluate Z_scores for Within Lab Results
>> for ( i in 1 : nsmpls) {
>> if (res[i,"wi"] > 3 | res[i,"wi"] < -3)
>> res[i,"pf_zwi"]=1
>> }
>> dd = melt(res, id=c("lab","r"), "pf_zbw")
>> b = cast(dd, lab ~ r)
>> If anyone could see why the ordering only works for about 55 of 70  
>> records and could steer me in the right direction I would be obliged
>
> I can't try out your code, but I'd guess it's due to conversion of  
> strings to factors.  Sorting factors will sort them by their  
> numerical value, not by the strings.
>
> So the solution is to set stringsAsFactors=FALSE, either in each  
> read.csv call, or globally with options(stringsAsFactors=FALSE).
>
> Duncan Murdoch
>
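
Duncan's alternative of fixing it at read time could look roughly like
this. It is only a sketch: the inline CSV text stands in for one of the
_Stats.csv files, and its column values are invented.

```r
# Invented stand-in for the contents of one of the CSV files.
csv_text <- "lab_id,ZBW\nL10,0.5\nL02,-3.2\nL07,1.1\n"

# Strings converted to factors on reading.
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Strings kept as plain character vectors.
as_string <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$lab_id)
class(as_string$lab_id)

# With character data, order() sorts on the strings themselves.
sorted <- as_string[order(as_string$lab_id), ]
sorted$lab_id
```

The same argument would need to go into every read.csv call in the
loop, so that the merged column stays character throughout.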


David Winsemius, MD
Heritage Laboratories
West Hartford, CT


