[R] a replace for subset

Jeff Newmiller jdnewmil at dcn.davis.ca.us
Sat Apr 16 15:55:45 CEST 2016


Use the split function to automatically create a list of pre-subsetted 
data frames, and then generate your output however you wish to. For 
example (using Jim Lemon's sample data generator):

library(ggplot2)

mydata <- data.frame( RE = sample( 5:50, 100, TRUE)
                     , LU = sample( 1500:4500, 100 )
                     , COUNTRY = factor( sample( c( "DE","FR","JP","AU")
                                               , 100
                                               , TRUE
                                               )
                                       )
                     , Light = factor( sample( c( "ON", "OFF" )
                                             , 100
                                             , TRUE
                                             )
                                     )
                     , OR = factor( sample( c( "S", "T" )
                                          , 100
                                          , TRUE
                                          )
                                  )
                     , PAT = factor( sample( c( "low", "high", "middle" )
                                           ,100
                                           ,TRUE
                                           )
                                   )
                     )
# split wants you to specify a list of columns to create unique
# groups by;
# data frames are lists of columns;
# data frame indexing lets you specify a subset of columns
mydataList0 <- split( mydata
                     , mydata[ , c( "COUNTRY", "Light" ) ]
                     )
# you should use the str() function frequently in an interactive
# fashion to help you understand the data you are working with:
str( mydataList0 )

# if you try to specify a single column as a subset of columns,
# R will by default drop the "list of columns" (data frame) structure
# and return a bare vector... to keep it, use drop=FALSE
mydataList <- split( mydata
                    , mydata[ , c( "COUNTRY" ), drop = FALSE ]
                    )

# I happen to like packing information into a single plot where possible.
# Since you did not provide a minimal reproducible example, I cannot
# tell whether this will work for you. You can use some variant of 
# mydataList0 if you don't like this approach.
for ( idx in seq_along( mydataList ) ) {
     print( ggplot( mydataList[[ idx ]], aes( x=RE, y=LU, shape=Light ) ) +
             geom_point() +
             facet_grid( PAT ~ OR ) +
             ggtitle( paste( "Country ="
                           , mydataList[[ idx ]][1,"COUNTRY"]))
     )
}
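
If you really do want one separate plot for every combination of COUNTRY, 
Light, OR and PAT, as the original question describes, the same split idea 
can be extended to all four columns. The following is only a sketch under 
assumptions: the data frame "demo", the names "groupCols", "demoList" and 
"plotList", and the seed are made up for illustration, and drop = TRUE is 
my choice for discarding combinations that have no rows.

```r
library(ggplot2)

# hypothetical sample data with the same column names as above
set.seed( 42 )
demo <- data.frame( RE = sample( 5:50, 100, TRUE )
                  , LU = sample( 1500:4500, 100 )
                  , COUNTRY = factor( sample( c( "DE", "FR", "JP", "AU" )
                                            , 100, TRUE ) )
                  , Light = factor( sample( c( "ON", "OFF" ), 100, TRUE ) )
                  , OR = factor( sample( c( "S", "T" ), 100, TRUE ) )
                  , PAT = factor( sample( c( "low", "high", "middle" )
                                        , 100, TRUE ) )
                  )

# columns whose combinations define the groups
groupCols <- c( "COUNTRY", "Light", "OR", "PAT" )

# drop = TRUE discards combinations that contain no rows;
# list names are the factor levels joined by ".", e.g. "FR.OFF.S.low"
demoList <- split( demo, demo[ , groupCols ], drop = TRUE )

# build one ggplot object per non-empty combination, titled by its name
plotList <- lapply( names( demoList ), function( nm ) {
    ggplot( demoList[[ nm ]], aes( x = RE, y = LU ) ) +
        geom_point() +
        ggtitle( nm )
} )
names( plotList ) <- names( demoList )
```

Calling print() on any element of plotList draws that plot, and looping 
over seq_along( plotList ) draws them all. With only 100 rows spread over 
up to 48 combinations, many of these plots will contain very few points, 
which is one reason I prefer the faceted version above.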

For future reference, the Posting Guide mentions several good practices 
for asking questions online that will help you understand your own problem 
better and make it easier for us to provide answers.

On Sat, 16 Apr 2016, ch.elahe via R-help wrote:

> Hi, 
> I have a data set (mydata), which a part of this is like the following: 
>
>
> 'data.frame':   36190 obs. of 16 variables: 
> $ RE                    : int  38 41 11 67 30 18 38 41 41 30 ... 
> $ LU                     : int  4200 3330 530 4500 3000 1790 4700 3400 3640 4000 ... 
> $ COUNTRY        : Factor w/ 4 levels "DE","FR","JP", "FR"... 
> $Light                  : Factor w/2 levels   "ON","OFF","ON", ... 
> $OR                     : Factor w/2 levels   "S","T","S",... 
> $PAT                  : Factor w/3 levels   "low", "high", "middle",... 
>
>
> Now I want to plot RE vs LU with ggplot2 for all the possible cases, I know how to do subsetting for the data but I want to know is there any shorter way to do that? For example I want to have a plot for RE vs LU for (COUNTRY= FR, Light=off, OR=S, PAT=low) and one for (COUNTRY= FR, Light=on, OR=S, PAT=high) and ..., as you see doing subset is time consuming, is there any other way? 
> Thank you for any help. 
> Elahe
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k


