[R] Help with Order

Mon Jan 11 15:12:12 CET 2010

David, Duncan

Thanks for the swift response.

You guys hit the nail on the head. That's exactly what the problem was.

All the best
Steve
----- Original Message ----- 
From: "David Winsemius" <dwinsemius at comcast.net>
To: "Duncan Murdoch" <murdoch at stats.uwo.ca>
Cc: "Steve Sidney" <sbsidney at mweb.co.za>; <r-help at r-project.org>
Sent: Monday, January 11, 2010 3:49 PM
Subject: Re: [R] Help with Order

> 
> On Jan 11, 2010, at 7:49 AM, Duncan Murdoch wrote:
> 
>> On 11/01/2010 7:37 AM, Steve Sidney wrote:
>>> Dear List
>>> As a fairly new R programmer I seem to have run into a strange  
>>> problem - probably my inexperience with R
>>> After reading and merging successive files into a single data  
>>> frame, I find that order does not sort the data as expected.
>>> I have multiple references in each file but each file refers to  
>>> measurement data obtained at a different time.
>>> Here's the code
>>> library(reshape)
>>> # Enter file name to Read & Save data
>>> FileName=readline("Enter File name:\n")
>>> # Find first occurrence of file
>>> for ( round1 in 1 : 6) {
>>> ReadFile=paste(round1,"C_",FileName,"_Stats.csv", sep="")
>>> if (file.exists(ReadFile))
>>> break
>>> }
>>> x = data.frame(read.csv(ReadFile, header=TRUE),rnd=round1)
>>> for ( round2 in (round1+1) : 6) {
>>> #
>>> ReadFile=paste(round2,"C_",FileName,"_Stats.csv", sep="")
>>> if (file.exists(ReadFile)) {
>>> y = data.frame(read.csv(ReadFile, header=TRUE),rnd = round2)
>>>    if (round2 == (round1 +1))
>>>    z=data.frame(merge(x,y,all=TRUE))
>>>    z=data.frame(merge(y,z,all=TRUE))
>>> }
>>> }
>>> ordered = order(z$lab_id)
> 
> Following Duncan's hypothesis, perhaps change this to :
> ordered = order(as.character(z$lab_id))
> 
>>> results = z[ordered,]
>>> res = data.frame(lab=results[,"lab_id"], bw=results[,"ZBW"],
>>>     wi=results[,"ZWI"], pf_zbw=0, pf_zwi=0, r=results[,"rnd"])
>>> #
>>> # Establish no of samples recorded
>>> nsmpls = length(res[,c("lab")])
>>> # Evaluate Z_scores for Between Lab Results
>>> for ( i in 1 : nsmpls) {
>>> if (res[i,"bw"] > 3 | res[i,"bw"] < -3)
>>> res[i,"pf_zbw"]=1
>>> }
>>> # Evaluate Z_scores for Within Lab Results
>>> for ( i in 1 : nsmpls) {
>>> if (res[i,"wi"] > 3 | res[i,"wi"] < -3)
>>> res[i,"pf_zwi"]=1
>>> }
>>> dd = melt(res, id=c("lab","r"), "pf_zbw")
>>> b = cast(dd, lab ~ r)
>>> If anyone could see why the ordering only works for about 55 of 70  
>>> records and could steer me in the right direction I would be obliged.
>>
>> I can't try out your code, but I'd guess it's due to conversion of  
>> strings to factors.  Sorting factors will sort them by their  
>> numerical value, not by the strings.
>>
>> So the solution is to set stringsAsFactors=FALSE, either in each  
>> read.csv call, or globally with options(stringsAsFactors=FALSE).
>>
>> Duncan Murdoch
>>
> 
> 
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
> 
>
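
For readers of the archive, the diagnosis in this thread can be illustrated with a minimal sketch. The lab IDs below are hypothetical and are not taken from Steve's files; the level order is set by hand to mimic a factor whose levels were accumulated in first-seen order, which is one way merged data could end up with non-alphabetical levels.

```r
# Hypothetical lab IDs; the levels are deliberately NOT in alphabetical
# order, as can happen when factors from several files are combined.
lab_id <- factor(c("L10", "L02", "L05", "L01"),
                 levels = c("L10", "L02", "L05", "L01"))

# order() on a factor uses the underlying integer codes (the level order),
# so the result follows the levels, not the strings.
by_codes <- as.character(lab_id[order(lab_id)])

# David's suggestion: convert to character first, then order by the strings.
by_strings <- as.character(lab_id[order(as.character(lab_id))])

print(by_codes)
print(by_strings)
```

Reading the files with read.csv(..., stringsAsFactors=FALSE), as Duncan suggests, avoids the factor conversion in the first place. From R 4.0.0 onward stringsAsFactors defaults to FALSE, so this only needs to be set explicitly on older versions such as the one in use at the time of this thread.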