[R-SIG-Mac] Poor plotting performance on Mac OS X

David Winsemius dwinsemius at comcast.net
Wed Aug 2 17:56:29 CEST 2017


> On Jul 6, 2017, at 4:12 AM, Ashley Betts <Ashley.Betts at saltbushsoftware.com> wrote:
> 
> Hi All,
>    I'm quite new to R and recently started investigating the geospatial plotting capabilities of R via ggplot2. I started by using some of the publicly available datasets from the Australian Bureau of Statistics. Plotting the Level 3 Statistical Area boundaries took over 2 hours on my 2012 MacBook Pro. As there were over 3M rows in the fortify’ed data frame I initially thought this was just how long it must take. I then ran the exact same script on my work laptop, which is similarly spec’ed, and it ran in approximately 30 seconds. This now has me extremely disappointed in the performance on the Mac, which is where I use R the most. I changed my BLAS library to the Accelerate library on a whim that this might make a difference. It did not. Whilst I primarily use RStudio, I also ran the same script in R.app, and if there was any improvement it was not noticeable. I did notice in the Windows run that it seemed to use multiple cores (which is what made me investigate the BLAS change) whilst the Mac seems to stay bound to a single core. My initial thoughts were that it must be something to do with ggplot, but after sampling the rsession process a number of times (see attached Sample of rsession.txt) it appears to be spending most of its time in CGContextDrawPath in Apple’s CoreGraphics, so I assume it is a graphics-related issue. I’m running R 3.4 on my Mac and 3.3.2 on the Windows machine. I’ve attached the script, process sample text and a number of screenshots that I hope will be helpful in analysing the issue. Could someone possibly let me know if this is a PEBKAC issue or an actual problem with R? If the latter, how do I go about getting the issue resolved?
> 
> The SA3 boundary data is available here:
> 
> http://www.abs.gov.au/AUSSTATS/abs@.nsf/DetailsPage/1270.0.55.001July%202016?OpenDocument
> 
> as 'Statistical Area Level 3 (SA3) ASGS Ed 2016 Digital Boundaries in ESRI Shapefile Format'

I tried opening that .R file (which I was surprised made it through the usual scrubbing process) and I see this at the top.
=====copied=====
library(rio)
library(ggplot2)
library(rgdal)
library(rgeos)
library(dplyr)

convert("../Data/ABS/14100DS0001_2017-03.xlsx", "absregdata.csv")
======end-copy===

After seeing that, I followed your link, which leads to a download page rather than to a document. I did find the referenced shapefile on that page and downloaded it:

http://www.abs.gov.au/AUSSTATS/subscriber.nsf/log?openagent&1270055001_sa3_2016_aust_shape.zip&1270.0.55.001&Data%20Cubes&43942523105745CBCA257FED0013DB07&0&July%202016&12.07.2016&Latest

That's the shapefile that is referenced later in the code, but I see no way to find the CSV file that you are loading, which your script creates from ../Data/ABS/14100DS0001_2017-03.xlsx. So I see no method of reproducing your observations.

You are also several versions behind the current dplyr release. I happen to have the same versions of the rgdal, rgeos, and sp packages that you do, but those, too, are slightly out of date.
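
For reference, a minimal sketch of how the versions could be listed for a report like this one. The package list is taken from the library() calls quoted above plus sp; the helper name installed_version is made up for illustration, and packages that are not installed are reported as NA rather than raising an error. Pasting the output of sessionInfo() is the more usual way to share the same information.

```r
# Packages named in the quoted script, plus sp (mentioned above).
pkgs <- c("rio", "ggplot2", "rgdal", "rgeos", "sp", "dplyr")

# Hypothetical helper: version string if the package is installed, NA otherwise.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(packageVersion(pkg))
  } else {
    NA_character_
  }
}

versions <- vapply(pkgs, installed_version, character(1))

print(R.version.string)  # the running R version
print(versions)          # named character vector, one entry per package
```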

So I am unable to attempt to reproduce your difficulties. At a minimum, you should supply data that allows this. You should also try restarting your Mac and running the script in a clean session with as few other applications loaded as possible. Memory fragmentation often prevents large jobs from running in memory, and long run times are possible if the system needs to page out to disk and your system disk is not an SSD.
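
As an illustration of what a self-contained example might look like, here is a rough sketch that times polygon drawing on synthetic data, so nobody needs the missing file to run it. Everything in it is an assumption made for illustration: the function names make_polygons and time_render are invented, the circles are only a stand-in for the SA3 boundaries in the long/lat/group layout that fortify() produces, and base graphics is used instead of ggplot2 to keep it dependency-free. The default device is pdf(file = NULL), which creates no file, so by default this mostly measures the R-side work.

```r
# Hypothetical stand-in for the fortified data: n_poly closed polygons with
# n_vert vertices each, laid out on a grid, in long/lat/group columns.
make_polygons <- function(n_poly = 200, n_vert = 500) {
  theta <- seq(0, 2 * pi, length.out = n_vert)
  do.call(rbind, lapply(seq_len(n_poly), function(i) {
    data.frame(long  = i %% 20 + 0.4 * cos(theta),
               lat   = i %/% 20 + 0.4 * sin(theta),
               group = i)
  }))
}

# Open a graphics device, draw every polygon, and return the timing.
# open_device is any zero-argument function that opens a device.
time_render <- function(df, open_device = function() pdf(file = NULL)) {
  open_device()
  on.exit(dev.off())  # always close the device we opened
  system.time({
    plot(range(df$long), range(df$lat), type = "n",
         xlab = "long", ylab = "lat")
    for (piece in split(df, df$group)) polygon(piece$long, piece$lat)
  })
}

polys  <- make_polygons()
timing <- time_render(polys)
cat("rows:", nrow(polys), " elapsed seconds:", timing[["elapsed"]], "\n")
```

On the Mac, passing something like function() quartz() or function() png(tempfile(fileext = ".png")) as open_device and comparing the elapsed times should give some indication of whether the time depends on the graphics device.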

(I'm able to read the results of your sampling but not to understand them. It's possible that more savvy Mac users will be able to tell whether my hypothesis, that this is caused by paging out to disk, is correct.)
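
One rough way to probe the paging hypothesis from inside R is to compare the size of the plotting data with the memory the session currently holds. The sketch below assumes a made-up data frame of 3 million rows with only three columns (the real fortified frame would have more columns and so be larger); the variable names are hypothetical. If both figures are small relative to the machine's physical RAM, paging becomes a less likely explanation.

```r
# Hypothetical stand-in: 3 million rows, as in the original report,
# with two numeric columns and one integer grouping column.
n  <- 3e6
df <- data.frame(long  = runif(n),
                 lat   = runif(n),
                 group = sample.int(500, n, replace = TRUE))

# Size of the data frame in megabytes.
size_mb <- as.numeric(object.size(df)) / 1024^2

# gc() returns a matrix whose second column is the memory in use, in Mb,
# for cons cells (Ncells) and vector heap (Vcells).
mem     <- gc()
used_mb <- sum(mem[, 2])

cat(sprintf("data frame: %.1f Mb, R session in use: %.1f Mb\n",
            size_mb, used_mb))
```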

Hope this helps,
David.



> 
> Regards,
> 
> Ashley
> 


David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law


