[R] Efficient selection and alteration of dataframe records
Gabor Grothendieck
ggrothendieck at myway.com
Thu Feb 3 22:29:45 CET 2005
I did not attempt to follow your code or discussion but you could
try these:
1. try to pin down what part of your code is taking the time
2. try to eliminate the loop, if possible
3. use matrices rather than data frames -- matrices are faster
Daniel E. Bunker <deb37 <at> columbia.edu> writes:
:
: Hi All,
:
: I am writing a simulation that examines the effects of species
: extinctions on ecological communties by sequentially removing
: individuals of a given species (sometimes using weighted probabilities)
: and replacing the lost individuals with species identities randomly
: sampled from the remaining individuals. Thus I use two dataframes. One
: contains all the individuals and their species identities (plotdf). The
: other contains each species and their associated weights (traitdf).
:
: While I have code that works, it runs slowly. I suspect there is a more
: efficient way.
:
: First, I 'sample' one species from the species file (traitdf), then I
: use that result to 'subset' the individuals dataframe (plotdf) into two
: new files: individuals of the extincted species (plotdf.del) and
: retained individuals (plotdf.old).
:
: I then use a 'for' loop to run through each record in plotdf.del and
: randomly sample a new species identity from plotdf.old. (Note that I
: also need one species specific variable from the traitdf dataframe,
: which I have been attaching using 'merge.') When all are replaced, I
: simply 'rbind' plotdf.old and plotdf.del back together. I then delete
: another species, etc, etc.
:
: My guess is that there is a way to replace the lost individuals using a
: 'sample' that simply excludes the lost individuals (records). This
: would avoid splitting the data frame and 'rbind'ing it back together.
: If I could also inlcude a second variable from the 'sample'd records,
: this would eliminate the need for the 'merge'.
:
: I am running R2.0.0 on windows 2000.
:
: Simplified code is below.
:
: Any suggestions would be greatly appreciated.
:
: Thanks for your time, Dan
:
: plotdf=data.frame(
: tag=1:100,
:
: pspp=c(rep("Sp1",40),rep("Sp2",30),rep("Sp3",20),rep("Sp4",5),rep("Sp5",5)),
: dim=runif(100)*100)
: plotdf[1,]
:
: abun.table=as.data.frame(table(plotdf$pspp))
:
: #2.1 calculate Smax (count of species)
: Smax=length(abun.table$Freq[abun.table$Freq>0])
: Smax
:
: traitdf=data.frame(
: tspp=c("Sp1","Sp2","Sp3","Sp4","Sp5"),
: width=runif(5),
: abun=abun.table$Freq)
: traitdf[1,]
:
: rm(abun.table)
:
: #3. merge plotdf and traitdf
: plotdft=merge(plotdf, traitdf, by.x ="pspp", by.y="tspp")
:
: #4 define summary dataframe sumdf
: sumdf=data.frame(s.n=NA, s.S=NA, s.crop=NA)
:
: #reset all data to raw data.
: #b. calculate crop in plotdft with all species present
: plotdft$crop=plotdft$width*exp(-2.0+2.42*(log(plotdft$dim)))
: #c. sum crop
: sumcrop=sum(plotdft$crop)
: #d. write n, S, crop to sumdf
: sumdflength=length(sumdf$s.n)
: sumdf[sumdflength+1,1]=1;
: sumdf[sumdflength+1,2]=Smax;
: sumdf[sumdflength+1,3]=sumcrop;
:
: #6. SPECIES DELETION LOOP. This is the species deletion loop.
: #a. repeat from n=1:Smax-1 (S=Smax-n+1)
: for(n in 1:(Smax-1)) {
: S=Smax-n+1;
:
: #b. remove and replace one species
: #1. sample one species based on weight (e.g., abundance)
: #delsp = sample(traitdf$tspp, size=1);delsp
: delsp = sample(traitdf$tspp, size=1, prob=traitdf[,3]);
:
: #2. select traitdf records that match delsp
: traitdf.del = subset(traitdf, tspp==delsp);traitdf.del[1,]
:
: #3. and delete that species from trait data
: traitdf = subset(traitdf, tspp!=delsp[1]);
:
: #4. split that species from plot data into new df
: plotdf.old = subset(plotdf, plotdf$pspp!=delsp);plotdf.old[1,]
: plotdf.del = subset(plotdf, plotdf$pspp==delsp);plotdf.del[1,]
:
: #5. replace delsp params with params randomly selected from
: remaining spp:
: for (x in 1:length(plotdf.del$pspp)){
: newsp = sample(plotdf.old$pspp, size=1);#print(newsp[1])
: plotdf.del$pspp[x]=newsp[1]
: }
: #6. rbind plotdf and splitdf into plotdf,
: plotdf=rbind(plotdf.old,plotdf.del);plotdf[1,]
:
: #b. calculate standing crop,etc
: #1. merge plotdf and traitdf
: plotdft=merge(plotdf, traitdf, by.x ="pspp", by.y="tspp")
:
: #2. calculate crop in plotdft
: plotdft$crop=plotdft$width*exp(-2.0+2.42*log(plotdft$dim))
:
: #3. sum crop
: sumcrop=sum(plotdft$crop)
:
: #4. calculate S
: abun.table=as.data.frame(table(plotdf$pspp))
: S=length(abun.table$Freq[abun.table$Freq>0])
:
: #c. write n, S, crop to sumdf
: sumdflength=length(sumdf$s.n)
: sumdf[sumdflength+1,1]=n+1;
: sumdf[sumdflength+1,2]=S;
: sumdf[sumdflength+1,3]=sumcrop;
: }#d. REPEAT SPECIES DELETION LOOP
: #housekeeping
: rm(delsp, plotdf, plotdf.del, plotdf.old, plotdft, traitdf.del)
: gc()
:
: #8. plot results, fit line
: print(sumdf)
: traitdf
:
: plot(sumdf$s.S, sumdf$s.crop)
:
More information about the R-help
mailing list