FW: [R] Newbie struggling with "factors"

Warnes, Gregory R gregory_r_warnes at groton.pfizer.com
Fri Mar 29 17:56:38 CET 2002


Hint #1: to do any useful transformations on your variables you will
probably need to convert them temporarily into character variables (aka
strings).  Do that with

	as.character(n$OSUSE)
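
A small sketch of why the conversion matters (the vector `f` below is made up for illustration and simply mirrors the OSUSE column): a factor stores integer codes that index into its levels, so string functions need the character values rather than the codes.

```r
# Hypothetical example data, mirroring the OSUSE column from the question
f <- factor(c("1", "1,3", "1,2,3"))

# Internally a factor is a vector of integer codes plus a set of levels
codes <- as.integer(f)      # positions in levels(f), not the survey answers
labs  <- as.character(f)    # the original strings

# Splitting works on the strings, giving one character vector per response
parts <- strsplit(labs, ",")
```

This is also why mode() reports "numeric" for a factor: it is describing the underlying codes.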

Probably you will want to convert each of the variables that are in this
format into a set of indicator (TRUE/FALSE) variables.  Something like this:

	n <- data.frame(OSUSE = c("1","1,3","1,2,3"))
	# strsplit() needs character input, so convert the factor first
	boxes <- strsplit(as.character(n$OSUSE), ",")
	n$OSUSE.Windows   <- sapply( boxes, function(X) ( "1" %in% X ) )
	n$OSUSE.Macintosh <- sapply( boxes, function(X) ( "2" %in% X ) )
	n$OSUSE.Unix      <- sapply( boxes, function(X) ( "3" %in% X ) )
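
If a question has many boxes, the same idea can be written as a loop. This is only a sketch: the lookup vector `boxes` and the name `picked` are assumptions made for this example, and the code-to-name mapping is the one from the question.

```r
# Example data; stringsAsFactors = TRUE reproduces the "factor" column
n <- data.frame(OSUSE = c("1", "1,3", "1,2,3"), stringsAsFactors = TRUE)

# Hypothetical lookup: box name -> code used in the exported data
boxes <- c(Windows = "1", Macintosh = "2", Unix = "3")

# Split each response once, then add one TRUE/FALSE column per box
picked <- strsplit(as.character(n$OSUSE), ",")
for (nm in names(boxes)) {
  n[[paste0("OSUSE.", nm)]] <- sapply(picked, function(X) boxes[[nm]] %in% X)
}
```

Each new column is named OSUSE.<box name>, matching the hand-written version.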

Alternatively, if you often have variables like this, you might consider
creating a new object type that extends factor and that includes the
operations that you need.  

Something like:

### Start Sample Code ###

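checklist <- function(X, boxnames)
  {
    # make sure X is a factor, even if it was read in as character
    X <- as.factor(X)
    attr(X, "boxnames") <- boxnames
    class(X) <- c("checklist","factor")
    return(X)
  }

contains <- function(X, name)
  {
    # translate a box name (or a unique prefix of it) into its number
    if(is.character(name) )
      name <- pmatch( name, attr(X,"boxnames" ) )

    # strsplit() needs character input, so convert the factor first
    retval <- sapply( strsplit(as.character(X), ",") , function(X) ( name %in% X ) )
    return(retval)
  }

numchecked <- function(X)
  {
    # count the comma-separated entries in each response
    retval <- sapply( strsplit(as.character(X), ","), length )
    return(retval)
  }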

summary.checklist <- function(x, ...)
  {
    sum <- apply( as.matrix(x), 2, sum )
    mean <- apply( as.matrix(x), 2, mean )
    return( rbind(sum,mean))
  }

as.matrix.checklist <- function(x, ...)
  {
    sapply( attr(x, "boxnames"), function(YY) contains(x, YY) )
  }

### End Sample Code ###

Here are some examples of using these functions:

> n <- data.frame(OSUSE = c("1","1,3","1,2,3"))
> 
> n$OSUSE <- checklist(n$OSUSE, c("Windows","Macintosh","Unix"))
#
# Check if OSUSE includes a specific OS
#
> contains( n$OSUSE, "Windows")
[1] TRUE TRUE TRUE
> contains( n$OSUSE, "Macintosh")
[1] FALSE FALSE  TRUE
> contains( n$OSUSE, "Unix")
[1] FALSE  TRUE  TRUE
>
#
# Compute the average number of checked items
# 
> numchecked(n$OSUSE)
[1] 1 2 3
> mean(numchecked(n$OSUSE))
[1] 2
> 
#
# Create a matrix showing whether each box was checked or not
#
> as.matrix(n$OSUSE)
     Windows Macintosh  Unix
[1,]    TRUE     FALSE FALSE
[2,]    TRUE     FALSE  TRUE
[3,]    TRUE      TRUE  TRUE
> 
#
# Show some summary info
#
> summary(n$OSUSE)
     Windows Macintosh      Unix
sum        3 1.0000000 2.0000000
mean       1 0.3333333 0.6666667		


Of course, you'll want to modify these classes to suit your needs.  A little
time up front can help a lot.
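
As one possible modification, here is a rough sketch of how the "most common item checked" could be found. It works on plain strings rather than the checklist class, and the names `osuse`, `counts` and `mostchecked` are made up for this example.

```r
# Example responses and box names from the question
osuse    <- c("1", "1,3", "1,2,3")
boxnames <- c("Windows", "Macintosh", "Unix")

# Pool every checked code across all responses
picked <- unlist(strsplit(osuse, ","))

# Tabulate by box; fixing the levels keeps boxes nobody checked at zero
counts <- table(factor(picked,
                       levels = as.character(seq_along(boxnames)),
                       labels = boxnames))

# Name(s) of the most frequently checked box (ties return several names)
mostchecked <- names(counts)[counts == max(counts)]
```

The same counts appear in the "sum" row of the summary output, so this could also be folded into summary.checklist.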

If you like, I'll include these classes and any enhancements that you make
in my 'gregmisc' library.


-Greg


> -----Original Message-----
> From: Tom Arnold [mailto:thomas_l_arnold at yahoo.com]
> Sent: Friday, March 29, 2002 8:59 AM
> To: R
> Subject: [R] Newbie struggling with "factors"
> 
> 
> I am processing some survey results, and my data are
> being read in as "factors". I don't know how to
> process these things in any way.
> 
> To start with, several of the survey questions are
> multi-choice check boxes on the original (web-based)
> survey, as in "check all that apply".
> 
> These are encoded as numbers. For example, if the
> survey has a question:
> Which operating systems have you used? (Check all that
> apply)
> [ ]Windows
> [ ]Macintosh
> [ ]Unix
> 
> ...then the data exported for three different
> responses might look like
> ;1;
> ;1,3;
> ;1,2,3;
> 
> ...where ";" is the field delimiter. 
> I use read.table to get the data in. I read all the
> survey data into a table "n" and the field above is
> called "OSUSE". When I query R about the field, it
> tells me it is class "factor"
> 
> > class(n$OSUSE)
> [1] "factor"
> > mode(n$OSUSE)
> [1] "numeric"
> 
> I'd like to be able to do some simple things like:
> what is the most common item checked (1, 2, or 3?)
> What is the average number of boxes checked?
> 
> But I can't find any way to manipulate this "factor"
> field. What's the secret?
> 
> Thanks.
> 
> =====
> Tom Arnold
> Summit Media Partners
> Visit our web site at http://www.summitmediapartners.com
> 
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
> -.-.-.-.-.-.-.-.-
> r-help mailing list -- Read 
> http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: 
> r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
> _._._._._._._._._
> 

