[R] When is *interactive* data visualization useful to use?

Mike Marchywka marchywka at hotmail.com
Fri Feb 11 13:49:12 CET 2011



----------------------------------------
> From: tal.galili at gmail.com
> Date: Fri, 11 Feb 2011 08:26:16 +0200
> To: r-help at r-project.org
> Subject: [R] When is *interactive* data visualization useful to use?
>
> Hello all,
>
> Before getting to my question, I would like to apologize for asking this
> question here. My question is not directly an R question, however, I still
> find the topic relevant to the R community of users - especially due to only
> *partial* (current) support for interactive data visualization (see here:
> http://cran.r-project.org/web/views/Graphics.html where with iplots we are
> waiting for iplots extreme, and with rggobi, it currently cannot run with R
> 2.12 and Windows 7).


I guess I would just mention a few related issues, central to R, that I have
encountered. This is not well organized, but if there is a point here, it is
that maybe the thing to do is make R work better with streaming data and
provide a way to pipe text data to and from other graphically oriented
tools, which could come from many unrelated sources.

One issue is the concept of streaming for dealing with unlimited data,
and the other is playing nice with other tools. I ran into your concerns
with R just a few days ago, wondering if interactive graphics might be a good
way to survey a plot I had: many thousands of points that were hard to
explore without interactive zoom seemed a natural fit. People here often
complain about memory limits with large data sets, and it is not unreasonable
to want to work with indefinitely long data streams and examine results in real time.
I had encountered this in the past; IIRC I wanted to watch histograms from a Monte
Carlo simulation and know right away if things were going wrong.
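
To make the histogram idea concrete, here is a rough sketch of what I mean,
not anything polished. It keeps only the running bin counts and folds in one
batch at a time, so the full data set never has to sit in memory. The bin
edges, the batch size and the name update_counts are all made up for
illustration, and rnorm() just stands in for real simulation output.

```r
# Sketch: accumulate histogram counts batch by batch.
# breaks, batch size and update_counts() are illustrative assumptions.
breaks <- seq(-5, 5, by = 0.5)            # fixed bins, chosen up front
counts <- numeric(length(breaks) - 1)     # running totals, one per bin

update_counts <- function(counts, chunk, breaks) {
  # findInterval() gives the bin index of each value
  bin <- findInterval(chunk, breaks, rightmost.closed = TRUE)
  # discard values that fall outside the bin range
  bin <- bin[bin >= 1 & bin < length(breaks)]
  counts + tabulate(bin, nbins = length(breaks) - 1)
}

set.seed(1)
for (i in 1:20) {
  chunk <- rnorm(1000)                    # stand-in for one batch of output
  counts <- update_counts(counts, chunk, breaks)
  # in an interactive session, redraw after each batch to watch the shape
  if (interactive()) {
    barplot(counts, names.arg = head(breaks, -1),
            main = paste("after batch", i))
  }
}
```

The bins have to be fixed before the data arrive, which is the price for not
storing anything but the counts.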

You would probably want to consider R's capabilities along with those of
related tools and the means for sharing data. Even complex models or data
are normally reducible to text that can be piped around to various tools, so
having a feature like this in any tool or package is important.
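
As a small illustration of the piping idea, something along these lines would
let R sit in the middle of a pipeline: CSV text in on one connection, CSV text
out on another. This is only a sketch; summarise_stream and the sample data
are hypothetical, and a text connection stands in for the pipe so the example
runs on its own.

```r
# Sketch of a text filter: read CSV from one connection, write a
# per-column summary as CSV to another. summarise_stream() is hypothetical.
summarise_stream <- function(input, output) {
  dat <- read.csv(input)
  # keep only the numeric columns
  num <- dat[vapply(dat, is.numeric, logical(1))]
  res <- data.frame(column = names(num),
                    mean = vapply(num, mean, numeric(1)),
                    sd = vapply(num, sd, numeric(1)),
                    row.names = NULL)
  write.csv(res, output, row.names = FALSE)
  invisible(res)
}

# In a script run with Rscript the call would be
#   summarise_stream(file("stdin"), stdout())
# Here an in-memory text connection plays the part of the pipe.
demo <- summarise_stream(textConnection("x,y\n1,10\n2,20\n3,30"), stdout())
```

Saved as a script (say summary.R, a made-up name), it could then be used as
"some_tool | Rscript summary.R | some_plotting_tool".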


If you want to author fixed results but let the viewer interact with them,
maybe look at things like PDF once there are more open source tools for dealing with it.
I have grown up hating PDF, but apparently the viewers can offer reasonable
interactivity with properly authored PDF files. The "standard" is hardly well
supported by open source tools,
and many features of the standard get described as "only available if you buy this from Adobe."
This creates two issues: one is just cost and annoyance, but the other is the ability
to check results. If you suspect something is wrong with open source you can always
look; as for taking someone's word for software correctness, well, take a look at the credit
rating agencies LOL. There is also always a concern about an attitude problem here, as
web designers seem to think "well, we created a huge
brand-name file that is also a 'standard'; if it is that big and from a big
company there must be lots of information in all those bytes," as if they get paid by
the megabyte, when often just a csv file would be more useful to R users.


If you really want professional graphics with good interactivity and are willing
to dig a little as part of a larger survey, I'd be curious to know if there is anything
that can be extracted from all the interactive games LOL...



>
> And now for my question:
>
> While preparing for a talk I will give soon, I recently started digging into
> two major (Free) tools for interactive data visualization:
> GGobi and Mondrian - both offer a great range of
> capabilities (even if they're a bit buggy).
>
> I wish to ask for your help in articulating (both to myself, and for my
> future audience) *When is it helpful to use interactive plots? Either for
> data exploration (for ourselves) or data presentation (for a "client")?*
>
> For when explaining the data to a client, I can see the value of animation
> for:
>
> - Using "identify/linking/brushing" for seeing which data point in the
> graph is what.
> - Presenting a sensitivity analysis of the data (e.g: "if we remove this
> point, here is what we will get")
> - Showing the effect of different groups in the data (e.g: "let's look at
> our graphs for males and now for the females")
> - Showing the effect of time (or age, or in general, offering another
> dimension to the presentation)
>
> For when exploring the data ourselves, I can see the value of
> identify/linking/brushing when exploring an outlier in a dataset we are
> working on.
>
> But other than these two examples, I am not sure what other practical use
[[elided Hotmail spam]]
>
> It could be argued that the interactive part is good for exploring (For
> example) a different behavior of different groups/clusters in the data. But
> when (in practice) I approached such situation, what I tended to do was to
> run the relevant statistical procedures (and post-hoc tests) - and what I
> found to be significant I would then plot with colors clearly dividing the
> data to the relevant groups. From what I've seen, this is a safer approach
> than "wandering around" the data (which could easily lead to data dredging,
> where the scope of the multiple comparison needed for correction is not even
> clear).
>
> I'd be very happy to read your experience/thoughts on this matter.
>
>
> Thanks in advance,
> Tal