[R] help with text patterns in strings
arun
smartpink111 at yahoo.com
Wed Jun 19 22:46:27 CEST 2013
Hi Burnette,
As this is a continuation of the earlier thread, you could post it to the same thread with a cc: to R-help.
Try this:
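# count the responses matching each day code; res1 keeps the range names of vec3
res1<-sapply(vec3,function(x) length(vec2New[grep(x,vec2New)]) )
dat1<-data.frame(res1,Name=names(vec3))
# set the factor levels so the ranges print in chronological order
dat1$Name<-factor(dat1$Name,levels=c("early","mid","late","wknd"))
# sum the counts within each range
with(dat1,tapply(res1,list(Name),FUN=sum))
#early mid late wknd
# 0 1 4 6
#or, without the data frame (the ranges then come out in alphabetical order)
sapply(split(res1,names(vec3)),sum)
#early late mid wknd
# 0 4 1 6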
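A compact variant of the same idea, sketched under the assumption that `vec2New` holds the capitalized responses produced by the script below (hard-coded here so the block runs on its own): give each day code its range name when `vec3` is created, count with `vapply()`, and let `tapply()` sum within a factor whose levels fix the output order. This is one option in base R, not the only way to do it.

```r
# Capitalized, split responses as produced by the script below (hard-coded here)
vec2New <- c("Friday", "Wednesday", "Friday", "Saturday", "Sunday",
             "Saturday", "Sat", "Sun", "Th", "F", "Sa")

# Day codes, each named with the range it belongs to (duplicate names are allowed)
vec3 <- c(wknd = "Su", early = "M", early = "Tu", mid = "W",
          late = "Th", late = "F", wknd = "Sa")

# Number of responses containing each day code; names(vec3) are carried along
res1 <- vapply(vec3, function(x) sum(grepl(x, vec2New)), integer(1))

# Sum the counts within each range, in the order given by the factor levels
ranges <- factor(names(res1), levels = c("early", "mid", "late", "wknd"))
grouped <- tapply(res1, ranges, sum)
print(grouped)  # early 0, mid 1, late 4, wknd 6
```

Building the factor with explicit levels is what keeps the chronological order; without it, `tapply()` and `split()` sort the range names alphabetically.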
A.K.
----- Original Message -----
From: "Crombie, Burnette N" <bcrombie at utk.edu>
To: arun <smartpink111 at yahoo.com>
Cc:
Sent: Wednesday, June 19, 2013 3:55 PM
Subject: RE: [R] help with text patterns in strings
Arun, let me know if I should post this email separately, but it involves the script from our previous conversation. I've been experimenting with potential scenarios for my data, and I'm unclear how to recount vec3 after assigning range names to the different days of the week. For this example, I want my output to go from:
#Su M Tu W Th F Sa
# 2 0 0 1 1 3 4
to:
# early mid late wknd
# 0 1 4 6
Thanks for your help throughout, but, again, let me know if I should start a new thread.
Burnette
##########################################
# Begin script
##########################################
dat3<- read.csv("~/Rburnette/TextStringMatch.csv", stringsAsFactors=FALSE)
dat3
#respondent.ID response
# 1 Friday
# 2 Wednesday
# 3 Friday, saturday,Sunday
# 4 Saturday
# 5 Sat, sun
# 6 Th,F, Sa
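Readers without TextStringMatch.csv can reproduce `dat3` inline. This is a hypothetical stand-in rebuilt from the printout above, not the original file; it relies on the `text =` argument that `read.csv()` passes through to `read.table()`.

```r
# Hypothetical stand-in for TextStringMatch.csv, rebuilt from the printout above
csv_text <- 'respondent.ID,response
1,Friday
2,Wednesday
3,"Friday, saturday,Sunday"
4,Saturday
5,"Sat, sun"
6,"Th,F, Sa"'

# text = reads from the string instead of a file; quoted fields keep their commas
dat3 <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(dat3)
```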
# Rename the variable "response" to "Ans" to fit the script that's already been written
# fix(dat3) can be used to do this manually, but then you need to keep "dat3" as the data frame, not "dat3edit"
# if you're not familiar with it, fix() opens a spreadsheet-like window in which values can be edited and a column's type (character vs numeric) can be changed
# fix() assigns the edited object back into the workspace when the window is closed, so the change lasts for the current R session
# to keep the change beyond the session, save the object (e.g., with write.csv() or save()) or save the workspace
##########################################
library(gdata)
dat3edit <- rename.vars(dat3,from="response", to="Ans")
dat3edit
#respondent.ID Ans
# 1 Friday
# 2 Wednesday
# 3 Friday, saturday,Sunday
# 4 Saturday
# 5 Sat, sun
# 6 Th,F, Sa
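If loading gdata only for `rename.vars()` is undesirable, the same rename can be done in base R by assigning into `names()`. A minimal sketch; the small `dat3` here is a stand-in built from the printout above.

```r
# Small stand-in for dat3 (values copied from the printout above)
dat3 <- data.frame(respondent.ID = 1:6,
                   response = c("Friday", "Wednesday", "Friday, saturday,Sunday",
                                "Saturday", "Sat, sun", "Th,F, Sa"),
                   stringsAsFactors = FALSE)

# Base-R rename: replace only the column name that equals "response"
dat3edit <- dat3
names(dat3edit)[names(dat3edit) == "response"] <- "Ans"
names(dat3edit)
```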
# get rid of the spaces embedded in text strings
##########################################
dat3edit$Ans2 <- gsub(" ","",dat3edit$Ans)
dat3edit$Ans2
# [1] "Friday" "Wednesday" "Friday,saturday,Sunday" "Saturday"
# [5] "Sat,sun" "Th,F,Sa"
# split up multiple responses within an observation so they can be counted separately
##########################################
vec2<-unlist(strsplit(dat3edit$Ans2,","))
vec2
# [1] "Friday" "Wednesday" "Friday" "saturday" "Sunday" "Saturday" "Sat" "sun"
# [9] "Th" "F" "Sa"
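The space-removal and splitting steps can also be combined by splitting on a regular expression that swallows whitespace around each comma. Unlike `gsub(" ","",...)`, this leaves spaces inside a response untouched; spaces at the very start or end of a response would survive, which `trimws()` can handle if needed. A sketch using the responses shown above:

```r
# Responses as they appear in dat3edit$Ans (spaces not yet removed)
Ans <- c("Friday", "Wednesday", "Friday, saturday,Sunday",
         "Saturday", "Sat, sun", "Th,F, Sa")

# Split on a comma plus any whitespace around it, then flatten to one vector
vec2 <- unlist(strsplit(Ans, "\\s*,\\s*"))
vec2
```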
#consistently format all (split up) responses to start with a capital letter for more accurate matching to a “universal” response code created in the next step
##########################################
library(Hmisc)
vec2New<-capitalize(vec2)
vec2New
# [1] "Friday" "Wednesday" "Friday" "Saturday" "Sunday" "Saturday" "Sat" "Sun"
# [9] "Th" "F" "Sa"
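Hmisc is a large dependency for this one step. A base-R sketch that upper-cases the first character and leaves the rest as-is gives the same result on this data; `capitalize_first` is a hypothetical helper name, not an existing function.

```r
vec2 <- c("Friday", "Wednesday", "Friday", "saturday", "Sunday",
          "Saturday", "Sat", "sun", "Th", "F", "Sa")

# Hypothetical helper: upper-case the first character, paste the rest back unchanged
capitalize_first <- function(x) paste0(toupper(substr(x, 1, 1)), substring(x, 2))

vec2New <- capitalize_first(vec2)
vec2New
```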
#match capitalized data to a “universal” response code of choice
##########################################
vec3<- c("Su","M","Tu","W","Th","F","Sa")
sapply(vec3,function(x) length(vec2New[grep(x,vec2New)]) )
#Su M Tu W Th F Sa
# 2 0 0 1 1 3 4
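`grep()` treats each code as a regular expression and matches it anywhere in a response, so a code could in principle match in the middle of a longer word. If the codes are meant as abbreviations, a prefix match is stricter. A sketch using base R's `startsWith()` (available since R 3.3.0); on this data it returns the same counts as the `grep()` version.

```r
vec2New <- c("Friday", "Wednesday", "Friday", "Saturday", "Sunday",
             "Saturday", "Sat", "Sun", "Th", "F", "Sa")
vec3 <- c("Su", "M", "Tu", "W", "Th", "F", "Sa")

# Count responses that begin with each code (fixed-string prefix match, no regex)
counts <- sapply(vec3, function(x) sum(startsWith(vec2New, x)))
counts
```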
#assign range names to vec3
##########################################
names(vec3) <- c("wknd","early","early","mid","late","late","wknd")
vec3
# wknd early early mid late late wknd
# "Su" "M" "Tu" "W" "Th" "F" "Sa"