[R] Sequential Naming of ggplot .pngs using plyr
Matthew Dowle
mdowle at mdowle.plus.com
Thu Aug 11 10:42:39 CEST 2011
Hi Justin,
In data.table 1.6.1 there was this news item :
o j's environment is now consistently reused so
that local variables may be set which persist
from group to group; e.g., incrementing a group
counter :
DT[,list(z,groupInd<-groupInd+1),by=x]
One of the reasons data.table is fast is that there is no function
run per group. It's just that j expression. That's run in the same
persistent environment for each group, so you can do things
like increment a group counter within it.
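The persistence idea can be sketched in base R alone. This is only an illustration, not data.table's internals: one expression (standing in for j) is evaluated once per group in a single environment, so a variable assigned in it is still there for the next group. The names env, z_group and pieces are made up for the example.

```r
# Illustration only: base R, not data.table internals.
x <- c("a", "a", "b", "c", "c")
z <- c(10, 20, 30, 40, 50)

env <- new.env()
env$groupInd <- 0                 # the counter lives in the shared environment

# The expression standing in for j; it is evaluated once per group.
j <- quote({
  groupInd <- groupInd + 1        # survives to the next group because env is reused
  data.frame(z = z_group, groupInd = groupInd)
})

pieces <- lapply(split(z, x), function(zg) {
  env$z_group <- zg               # hypothetical name for the current group's values
  eval(j, env)                    # same environment every time
})
result <- do.call(rbind, pieces)
print(result)
```

If a fresh environment were created for each group instead, groupInd would be reset each time and every group would get the same index.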
If your data were in 'long' format (data.table prefers long format,
like a database) it might be something like (the ggplot line is untested) :
ctr = 1
DT[,{
    # assumes long-format columns 'variable', 'site' and 'val'
    png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,width=11,units='in',pointsize=9,res=300)
    print(ggplot(.SD,aes(x=site,y=val))+geom_boxplot()+opts(title=paste('plot number',ctr,sep=' ')))
    dev.off()
    ctr<-ctr+1 },
by=variable]
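A counter is not the only way to get the numbering. As a minimal sketch in base R, the file names and titles can be built up front from the position of each variable, so nothing has to be added to the data itself. The toy data frame, the jobs list and the file pattern below are assumptions for illustration only.

```r
# Toy data in the same shape as the example in the thread (much smaller).
dat <- data.frame(a = 1:4, b = 5:8, c = 9:12, d = 13:16,
                  site = rep(c("e", "f"), each = 2))
vars <- setdiff(names(dat), "site")

# One small "job" per variable: the sequence number is fixed here,
# before any plotting happens.
jobs <- Map(function(i, v) {
  list(column = v,
       file   = sprintf("plot_number_%d.png", i),
       title  = paste("plot number", i))
}, seq_along(vars), vars)

for (job in jobs) cat(job$column, "->", job$file, "\n")
```

Each job carries everything the png() and ggplot() calls need, and the numbering does not depend on the order in which the jobs run, so the same list could be handed to a parallel loop.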
Btw, there was a new feature in 1.6.3, where you can subassign
into data.table 500 times faster than <-. See the NEWS from
1.6.3 for an example :
http://datatable.r-forge.r-project.org/
Matthew
"Justin Haynes" <jtor14 at gmail.com> wrote in message
news:CAFaj53kjqy=1bJy+iLjeeLYKgvx=rTE2h_HA24Pt20wQVchh4A at mail.gmail.com...
> Thanks Ista,
>
> In my real code that is exactly what I'm doing, but I want to prepend the
> names with a sequential number for easier reference once the pngs are
> made.
>
> My initial thought was to add the sequential number to the data before
> sending it to plyr and drawing it out there, but that seems like an
> excessive extra step when I have 1e6 - 1e7 rows.
>
>
> Justin
>
>
> On Wed, Aug 10, 2011 at 2:42 PM, Ista Zahn <izahn at psych.rochester.edu> wrote:
>
>> Hi Justin,
>>
>> On Wed, Aug 10, 2011 at 5:04 PM, Justin Haynes <jtor14 at gmail.com> wrote:
>> > If I have data:
>> >
>> >
>> > dat<-data.frame(a=rnorm(20),b=rnorm(20),c=rnorm(20),d=rnorm(20),site=rep(letters[5:8],each=5))
>> >
>> > And want to plot like this:
>> >
>> > ctr<-1
>> > for(i in c('a','b','c','d')){
>> >   png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,width=11,units='in',pointsize=9,res=300)
>> >   print(ggplot(dat[,names(dat) %in% c('site',i)],aes(x=factor(site),y=dat[,i]))+geom_boxplot()+opts(title=paste('plot number',ctr,sep=' ')))
>> >   dev.off()
>> >   ctr<-ctr+1
>> > }
>> >
>> > Is there a way to do the same naming using plyr (or data.table or foreach,
>> > which I am not familiar with at all!)?
>>
>> This is not "the same naming", but the same general idea can be
>> achieved with plyr using
>>
>> d_ply(melt(dat,id.vars='site'),.(variable),function(df) {
>>   png(file=paste("plyr_plot_",unique(df$variable),".png",sep=""),height=8.5,width=11,units='in',pointsize=9,res=300)
>>   print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
>>   dev.off()
>> })
>>
>> I'm not up to speed on .parallel, foreach etc., so I'll leave the rest
>> to someone else.
>>
>> Best,
>> Ista
>> >
>> > m.dat<-melt(dat,id.vars='site')
>> > ddply(m.dat,.(variable),function(df)
>> > print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot()+ ..?)
>> >
>> > And better yet, is there a way to do it using .parallel=T?
>> >
>> > Faceting is not really an option (unless I can facet onto multiple pages
>> > of a pdf or something) because these need to go into reports as
>> > individually labelled and titled plots.
>> >
>> >
>> > As a bit of a corollary, is it really worth the headache to resolve this
>> > if I am only using melt/plyr to split on the four letter variables? With a
>> > larger set of data (1e6 rows), the melt/plyr version takes a significant
>> > amount of time but .parallel=T drops the time significantly. Is the right
>> > answer a foreach loop and can I do that with the increasing counter? (I
>> > haven't gotten beyond Hadley's .parallel feature in my parallel R
>> > dealings.)
>> >
>> > > dat<-data.frame(a=rnorm(1e6),b=rnorm(1e6),c=rnorm(1e6),d=rnorm(1e6),site=rep(letters[5:8],each=2.5e5))
>> > > ctr<-1
>> > > system.time(for(i in c('a','b','c','d')){
>> > + png(file=paste('/tmp/plot_number_',ctr,'.png',sep=''),height=8.5,width=11,units='in',pointsize=9,res=300)
>> > + print(ggplot(dat[,names(dat) %in% c('site',i)],aes(x=factor(site),y=dat[,i]))+geom_boxplot()+opts(title=paste('plot number',ctr,sep=' ')))
>> > + dev.off()
>> > + ctr<-ctr+1
>> > + })
>> >    user  system elapsed
>> >  54.630   0.120  54.843
>> >
>> > > system.time(
>> > + ddply(melt(dat,id.vars='site'),.(variable),function(df) {
>> > + png(file='/tmp/plyr_plot.png',height=8.5,width=11,units='in',pointsize=9,res=300)
>> > + print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
>> > + dev.off()
>> > + },.parallel=F)
>> > + )
>> >    user  system elapsed
>> >   58.40    0.13   58.63
>> >
>> > > system.time(
>> > + ddply(melt(dat,id.vars='site'),.(variable),function(df) {
>> > + png(file='/tmp/plyr_plot.png',height=8.5,width=11,units='in',pointsize=9,res=300)
>> > + print(ggplot(df,aes(x=factor(site),y=value))+geom_boxplot())
>> > + dev.off()
>> > + },.parallel=T)
>> > + )
>> >    user  system elapsed
>> >   70.33    3.46   27.61
>> >
>> > How might I speed this up and include the sequential plot names?
>> >
>> > Thanks a bunch!
>> >
>> > Justin
>> >
>> >
>>
>>
>>
>> --
>> Ista Zahn
>> Graduate student
>> University of Rochester
>> Department of Clinical and Social Psychology
>> http://yourpsyche.org
>>
>