[R] Help with Order
Duncan Murdoch
murdoch at stats.uwo.ca
Mon Jan 11 13:49:14 CET 2010
On 11/01/2010 7:37 AM, Steve Sidney wrote:
> Dear List
>
> As a fairly new R programmer I seem to have run into a strange problem -
> probably my inexperience with R
>
> After reading and merging successive files into a single data frame, I find
> that order does not sort the data as expected.
>
> I have multiple references in each file but each file refers to measurement
> data obtained at a different time.
>
> Here's the code
>
> library(reshape)
> # Enter file name to Read & Save data
> FileName=readline("Enter File name:\n")
> # Find first occurrence of file
> for ( round1 in 1 : 6) {
> ReadFile=paste(round1,"C_",FileName,"_Stats.csv", sep="")
> if (file.exists(ReadFile))
> break
> }
>
> x = data.frame(read.csv(ReadFile, header=TRUE),rnd=round1)
> for ( round2 in (round1+1) : 6) {
> #
> ReadFile=paste(round2,"C_",FileName,"_Stats.csv", sep="")
> if (file.exists(ReadFile)) {
> y = data.frame(read.csv(ReadFile, header=TRUE),rnd = round2)
> if (round2 == (round1 +1))
> z=data.frame(merge(x,y,all=TRUE))
> z=data.frame(merge(y,z,all=TRUE))
> }
> }
> ordered = order(z$lab_id)
>
> results = z[ordered,]
>
> res = data.frame(
> lab=results[,"lab_id"],bw=results[,"ZBW"],wi=results[,"ZWI"],pf_zbw=0,pf_zwi=0,
> r=results[,"rnd"])
>
>
> #
> # Establish no of samples recorded
> nsmpls = length(res[,c("lab")])
>
> # Evaluate Z_scores for Between Lab Results
> for ( i in 1 : nsmpls) {
> if (res[i,"bw"] > 3 | res[i,"bw"] < -3)
> res[i,"pf_zbw"]=1
> }
> # Evaluate Z_scores for Within Lab Results
> for ( i in 1 : nsmpls) {
> if (res[i,"wi"] > 3 | res[i,"wi"] < -3)
> res[i,"pf_zwi"]=1
> }
>
> dd = melt(res, id=c("lab","r"), "pf_zbw")
> b = cast(dd, lab ~ r)
> If anyone could see why the ordering only works for about 55 of 70 records
> and could steer me in the right direction I would be obliged
I can't try out your code, but I'd guess it's due to conversion of
strings to factors. Sorting a factor orders it by its underlying integer
codes (the order of its levels), not by the strings.
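A minimal sketch of that effect, using made-up lab IDs rather than the poster's data. The level order is set by hand here, on the assumption that merging data frames can leave a factor's levels in a non-alphabetical order:

```r
# Hypothetical lab IDs; levels are deliberately not in alphabetical order
lab_id <- factor(c("L10", "L02", "L07", "L02"),
                 levels = c("L10", "L07", "L02"))

# order() on a factor uses the integer codes, i.e. the level order
by_factor <- as.character(lab_id[order(lab_id)])

# order() on the character values sorts by the strings themselves
by_string <- as.character(lab_id[order(as.character(lab_id))])

print(by_factor)
print(by_string)
```

The two results differ whenever the level order and the alphabetical order disagree, which would look like a sort that works for only some of the records.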
So the solution is to set stringsAsFactors=FALSE, either in each
read.csv call, or globally with options(stringsAsFactors=FALSE).
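As a rough illustration of the per-call option, assuming a small inline CSV in place of one of the *_Stats.csv files (the column values are invented):

```r
# Hypothetical CSV content standing in for one of the input files
csv_text <- "lab_id,ZBW\nL10,0.5\nL02,-1.2\nL07,3.4\n"

# Keep lab_id as character instead of converting it to a factor
x <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# order() now sorts by the strings themselves
sorted_ids <- x$lab_id[order(x$lab_id)]
print(sorted_ids)
```

For a data frame that has already been built, converting the column with as.character() before calling order() should have the same effect.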
Duncan Murdoch