[R] Help with Order
Steve Sidney
sbsidney at mweb.co.za
Mon Jan 11 15:12:12 CET 2010
David, Duncan
Thanks for the swift response.
You guys hit the nail on the head. That's exactly what the problem was.
All the best
Steve
----- Original Message -----
From: "David Winsemius" <dwinsemius at comcast.net>
To: "Duncan Murdoch" <murdoch at stats.uwo.ca>
Cc: "Steve Sidney" <sbsidney at mweb.co.za>; <r-help at r-project.org>
Sent: Monday, January 11, 2010 3:49 PM
Subject: Re: [R] Help with Order
>
> On Jan 11, 2010, at 7:49 AM, Duncan Murdoch wrote:
>
>> On 11/01/2010 7:37 AM, Steve Sidney wrote:
>>> Dear List
>>> As a fairly new R programmer I seem to have run into a strange
>>> problem, probably due to my inexperience with R.
>>> After reading and merging successive files into a single data
>>> frame, I find that order does not sort the data as expected.
>>> I have multiple references in each file but each file refers to
>>> measurement data obtained at a different time.
>>> Here's the code
>>> library(reshape)
>>> # Enter file name to Read & Save data
>>> FileName=readline("Enter File name:\n")
>>> # Find first occurrence of file
>>> for ( round1 in 1 : 6) {
>>>   ReadFile=paste(round1,"C_",FileName,"_Stats.csv", sep="")
>>>   if (file.exists(ReadFile))
>>>     break
>>> }
>>> x = data.frame(read.csv(ReadFile, header=TRUE),rnd=round1)
>>> for ( round2 in (round1+1) : 6) {
>>>   #
>>>   ReadFile=paste(round2,"C_",FileName,"_Stats.csv", sep="")
>>>   if (file.exists(ReadFile)) {
>>>     y = data.frame(read.csv(ReadFile, header=TRUE),rnd = round2)
>>>     if (round2 == (round1 +1))
>>>       z=data.frame(merge(x,y,all=TRUE))
>>>     z=data.frame(merge(y,z,all=TRUE))
>>>   }
>>> }
>>> ordered = order(z$lab_id)
>
> Following Duncan's hypothesis, perhaps change this to :
> ordered = order(as.character(z$lab_id))
>
>>> results = z[ordered,]
>>> res = data.frame(lab=results[,"lab_id"],bw=results[,"ZBW"],wi=results[,"ZWI"],
>>>   pf_zbw=0,pf_zwi=0,r=results[,"rnd"])
>>> #
>>> # Establish no of samples recorded
>>> nsmpls = length(res[,c("lab")])
>>> # Evaluate Z_scores for Between Lab Results
>>> for ( i in 1 : nsmpls) {
>>>   if (res[i,"bw"] > 3 | res[i,"bw"] < -3)
>>>     res[i,"pf_zbw"]=1
>>> }
>>> # Evaluate Z_scores for Within Lab Results
>>> for ( i in 1 : nsmpls) {
>>>   if (res[i,"wi"] > 3 | res[i,"wi"] < -3)
>>>     res[i,"pf_zwi"]=1
>>> }
>>> dd = melt(res, id=c("lab","r"), "pf_zbw")
>>> b = cast(dd, lab ~ r)
>>> If anyone could see why the ordering only works for about 55 of 70
>>> records and could steer me in the right direction I would be obliged
>>
>> I can't try out your code, but I'd guess it's due to conversion of
>> strings to factors. Sorting factors will sort them by their
>> numerical value, not by the strings.
>>
>> So the solution is to set stringsAsFactors=FALSE, either in each
>> read.csv call, or globally with options(stringsAsFactors=FALSE).
>>
>> Duncan Murdoch
>>
>
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
>
>
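
For anyone finding this thread in the archive, here is a minimal sketch of the diagnosis above. The lab IDs, values, and level order are made up for illustration and are not Steve's data; the level order only mimics what can happen when factor columns read from several files are merged. It shows order() ranking a factor by its integer level codes, David's as.character() workaround, and Duncan's stringsAsFactors=FALSE suggestion (read.csv's text= argument stands in for a real file).

```r
# Hypothetical lab IDs. The levels are deliberately not alphabetical,
# as can happen after merging factor columns from separate files.
lab_id <- factor(c("L03", "L01", "L02", "L04"),
                 levels = c("L01", "L03", "L02", "L04"))

# order() on a factor ranks by the integer level codes, not the labels
by_codes <- as.character(lab_id[order(lab_id)])
print(by_codes)

# David's suggestion: order on the character values instead
by_labels <- as.character(lab_id[order(as.character(lab_id))])
print(by_labels)

# Duncan's suggestion: keep the column as character when reading,
# so order() compares the strings directly
d <- read.csv(text = "lab_id,ZBW\nL03,1.2\nL01,-0.4",
              stringsAsFactors = FALSE)
by_read <- d$lab_id[order(d$lab_id)]
print(by_read)
```

Since R 4.0.0 the default is stringsAsFactors=FALSE, so read.csv on a current installation returns character columns unless factors are requested explicitly.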