[R] multiple values in one column

Fri Apr 6 21:03:41 CEST 2012

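This is a function I use for these kinds of situations.  It is pretty useful as long as the delimiter within the column and the spelling of the values are both consistent.

The function returns a vector of 0/1 values: 1 if the text in level is found among the delimited values, 0 otherwise.  Missing values in var stay NA.
var = the variable (a vector of delimited strings)
level = the value of interest in var
sep = the delimiter used within var (defaults to ",")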

split_levels <- function(var, level, sep=","){

#*** coerce to character so factors and all-NA columns can be split
  var <- as.character(var)

#*** identify level in var (1 if found, 0 otherwise).
  f <- function(v){
    v <- unlist(strsplit(v, sep))
    as.numeric(any(level %in% v))
  }

#*** split the variable; vapply guarantees one number per element
  new.var <- vapply(var, f, numeric(1), USE.NAMES=FALSE)

#*** assign NA's where they were in original variable
  new.var[is.na(var)] <- NA
  return(new.var)
}
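
As a rough sketch of how this could be applied to the data in the original question: the data frame name `students` and its layout are assumptions made for illustration, and a condensed copy of split_levels() is repeated only so the snippet runs on its own.

```r
# Condensed copy of split_levels(), repeated so this snippet is self-contained
split_levels <- function(var, level, sep = ",") {
  var <- as.character(var)
  out <- vapply(strsplit(var, sep),
                function(v) as.numeric(any(level %in% v)),
                numeric(1))
  out[is.na(var)] <- NA
  out
}

# Assumed layout of the example data from the original question
students <- data.frame(
  first = c("John", "Jane"),
  last  = c("Smith", "Doe"),
  sex   = c("M", "F"),
  major = c("ANTH", "HIST,BIOL"),
  stringsAsFactors = FALSE
)

# Every distinct major that appears anywhere in the column
majors <- sort(unique(unlist(strsplit(students$major, ","))))

# Add one 0/1 indicator column per major
for (m in majors) {
  students[[m]] <- split_levels(students$major, m)
}

# Summarize sex by a single major
table(sex = students$sex, HIST = students$HIST)
```

Each indicator column can then be used like any other variable, so a student with two majors is counted once under each of them.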

  Benjamin Nutter |  Biostatistician     |  Quantitative Health Sciences
  Cleveland Clinic    |  9500 Euclid Ave.  |  Cleveland, OH 44195  | (216) 445-1365

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Mark Grimes
Sent: Friday, April 06, 2012 11:16 AM
To: John D. Muccigrosso
Cc: r-help at r-project.org
Subject: Re: [R] multiple values in one column

John

I have to deal with this kind of thing too for my class.

# 	Some functions
	# for ad$Full.name = "Mark Grimes"
	get.first.name <- function(cell){
		x <- unlist(strsplit(as.character(cell), " "))
		return(x[1])
	}
	# assumes exactly "First Last"; with a middle name, the middle name is returned
	get.last.name <- function(cell){
		x <- unlist(strsplit(as.character(cell), " "))
		return(x[2])
	}
	# For roster$Name = "Grimes, Mark L"
	get.first.namec <- function(cell){
		x <- unlist(strsplit(as.character(cell), ", "))
		y <- get.first.name(x[2])
		return(y)
	}
	get.last.namec <- function(cell){
		x <- unlist(strsplit(as.character(cell), ", "))
		return(x[1])
	}
Each function handles one cell at a time, so use them with the apply family (e.g. sapply) when processing class files.
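
A minimal sketch of the apply-family usage, assuming a hypothetical `roster` data frame with names in "Last, First Middle" form. The helpers are restated in compact form only so the snippet runs on its own.

```r
# Compact restatements of the helpers above, so this runs standalone
get.first.name <- function(cell) {
  unlist(strsplit(as.character(cell), " "))[1]
}
get.first.namec <- function(cell) {
  get.first.name(unlist(strsplit(as.character(cell), ", "))[2])
}
get.last.namec <- function(cell) {
  unlist(strsplit(as.character(cell), ", "))[1]
}

# Hypothetical roster, made up for illustration
roster <- data.frame(Name = c("Grimes, Mark L", "Doe, Jane"),
                     stringsAsFactors = FALSE)

# Apply each helper to every cell of the Name column
roster$first <- sapply(roster$Name, get.first.namec, USE.NAMES = FALSE)
roster$last  <- sapply(roster$Name, get.last.namec,  USE.NAMES = FALSE)
```

USE.NAMES = FALSE keeps sapply from labelling the result with the original strings.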

Hope this helps,

Mark

On Apr 6, 2012, at 9:09 AM, John D. Muccigrosso wrote:

> I have some data files in which some fields have multiple values. For example
> 
> first  last   sex   major
> John   Smith  M     ANTH
> Jane   Doe    F     HIST,BIOL
> 
> What's the best R-like way to handle these data (Jane's major in my example), so that I can do things like summarize the other fields by them (e.g., sex by major)?
> 
> Right now I'm processing the files (in excel since they're spreadsheets) by duplicating lines with two values in the major field, eliminating one value per row. I suspect there's a nifty R way to do this.
> 
> Thanks in advance!
> 
> John Muccigrosso
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
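
The row duplication described in the question (one row per major) can also be done in base R. This is only a sketch under assumptions: the data frame name `students` and its layout are made up to match the example, and the delimiter is taken to be a plain comma with no surrounding spaces.

```r
# Assumed layout of the example data from the question
students <- data.frame(
  first = c("John", "Jane"),
  last  = c("Smith", "Doe"),
  sex   = c("M", "F"),
  major = c("ANTH", "HIST,BIOL"),
  stringsAsFactors = FALSE
)

# Split each major field into its individual values
parts <- strsplit(students$major, ",")

# Repeat each row once per major it lists, then fill in one major per row
long <- students[rep(seq_len(nrow(students)), lengths(parts)), ]
long$major <- unlist(parts)
rownames(long) <- NULL

# Summarize sex by major on the expanded data
table(sex = long$sex, major = long$major)
```

Note that a row whose major is an empty string splits into zero values and is dropped by this approach, while a row whose major is NA is kept as a single NA row.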
