[R] Plot of large dataset
Duncan Murdoch
murdoch at stats.uwo.ca
Mon Sep 8 19:01:09 CEST 2008
I'd start with scatterplots of the two subsets (pass vs fail), but with
280k points, those are likely to be fairly uninformative masses of black
ink. However, there might be enough separation between them that you
don't need anything else.
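A rough sketch of that first step, using simulated data in place of the real file. The column names (flag, x, y) and the normal distributions are my assumptions, and since the original post does not say which of 0 or 1 means "pass", the panels are labelled by the raw value:

```r
# Simulated stand-in for the real 280k-line file (assumed layout: flag, x, y).
set.seed(1)
n <- 280000
dat <- data.frame(flag = rbinom(n, 1, 0.5), x = rnorm(n), y = rnorm(n))
# With the real data, something like
#   dat <- read.csv("yourfile.csv", header = FALSE, col.names = c("flag", "x", "y"))
# would replace the simulation above (the file name is hypothetical).

ones  <- dat[dat$flag == 1, ]
zeros <- dat[dat$flag == 0, ]

# Shared axis limits so the two panels can be compared directly.
xlim <- range(dat$x)
ylim <- range(dat$y)

op <- par(mfrow = c(1, 2))   # two panels side by side
plot(ones$x, ones$y, pch = ".", xlim = xlim, ylim = ylim,
     xlab = "X", ylab = "Y", main = "flag = 1")
plot(zeros$x, zeros$y, pch = ".", xlim = xlim, ylim = ylim,
     xlab = "X", ylab = "Y", main = "flag = 0")
par(op)                      # restore the previous layout
```

Using pch = "." keeps each point as small as possible, which delays (but does not prevent) the black-ink problem.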
If not, then a pair of hexbin plots (from the Bioconductor hexbin
package), e.g.
library(hexbin)
plot(hexbin(rnorm(280000), rnorm(280000)))
may work. Other possibilities are to use partially transparent points,
and possibly to use jittering if there are a lot of ties.
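For the transparency and jittering suggestions, something along these lines might do in base graphics. This is only a sketch: the data are simulated, the rounding is there just to manufacture ties, and the alpha level and jitter amount are guesses you would need to tune for the real data:

```r
set.seed(2)
n <- 280000
# Round to one decimal so that many points share identical coordinates
# (an assumption, purely to illustrate ties).
x <- round(rnorm(n), 1)
y <- round(rnorm(n), 1)

# Semi-transparent black: where points overlap, the ink builds up,
# so dense regions appear darker than sparse ones.
col_alpha <- rgb(0, 0, 0, alpha = 0.05)

# jitter() adds uniform noise in [-amount, amount]; 0.05 is half the
# rounding step, so tied points spread out without crossing into
# neighbouring grid cells.
xj <- jitter(x, amount = 0.05)
yj <- jitter(y, amount = 0.05)

plot(xj, yj, pch = 16, cex = 0.4, col = col_alpha, xlab = "X", ylab = "Y")
```

Note that not every graphics device supports transparency; pdf() and the Cairo-based devices do, postscript() does not.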
I would avoid 3D histograms; they aren't nearly as informative.
Duncan Murdoch
On 9/8/2008 11:40 AM, Jason Thibodeau wrote:
> I apologize, I forgot to type the title.
>
> On Mon, Sep 8, 2008 at 11:39 AM, Jason Thibodeau <jbloudg20 at gmail.com> wrote:
>
>> Hello all,
>>
>> I have a very large file (280k lines) containing three comma separated
>> variables. The first variable is a 0 or 1 depicting a pass or fail. The
>> other two are X and Y coordinates. Is there a good way I can represent this
>> data in a chart/plot form other than using a 3d histogram? If I need to use
>> the histogram, should I base my chart off the example contained in the RGL
>> package?
>>
>> Thanks a lot.
>>
>> --
>> Jason Thibodeau
>> ECE Dept., University of Connecticut
>> 371 Fairfield Way, Storrs, CT 06269
>> Phone: 860-486-5274 , Fax: 860-486-2447
>> Email: jpt03002 at engr.uconn.edu
>> URL: www.engr.uconn.edu/~jpt03002 <http://www.engr.uconn.edu/%7Ejpt03002>
>>
>
>
>