[R] help with text patterns in strings
arun
smartpink111 at yahoo.com
Mon Jun 17 22:37:49 CEST 2013
Hi,
#tolower() returns a modified copy, so assign the result back to the column:
dat1$Ans<-tolower(dat1$Ans)
#But, if you do this:
vec1<- c("su","m","tu","w","th","f","sa")
vec2<-unlist(strsplit(dat1$Ans,","))
sapply(vec1,function(x) length(vec2[grep(x,vec2)]) )
#su  m tu  w th  f sa
# 2  0  2  1  1  3  4
#which is incorrect here: "tu" got two matches from sa"tu"rday
Instead:
#Suppose your data looks like this:
dat2<- data.frame(Ans=c("friday","wednesday","Friday,Saturday,sunday","saturday","sat,Sun","th,F,Sa"),stringsAsFactors=FALSE)
vec2<- unlist(strsplit(dat2$Ans,","))
library(Hmisc)
vec2New<-capitalize(vec2)
vec2New
#[1] "Friday" "Wednesday" "Friday" "Saturday" "Sunday" "Saturday"
#[7] "Sat" "Sun" "Th" "F" "Sa"
vec1<- c("Su","M","Tu","W","Th","F","Sa")
sapply(vec1,function(x) length(vec2New[grep(x,vec2New)]) )
#Su M Tu W Th F Sa
# 2 0 0 1 1 3 4
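Another possible way to avoid the stray "tu" match inside "saturday", while keeping everything in lowercase, is to anchor each abbreviation to the start of the string with "^". This is only a sketch on the example data; the name `counts` is made up for illustration.

```r
# Lowercased, already-split answers from the example
vec2 <- c("friday", "wednesday", "friday", "saturday", "sunday",
          "saturday", "sat", "sun", "th", "f", "sa")
vec1 <- c("su", "m", "tu", "w", "th", "f", "sa")

# "^" ties each abbreviation to the beginning of the string,
# so "tu" can no longer match in the middle of "saturday"
counts <- sapply(vec1, function(x) sum(grepl(paste0("^", x), vec2)))
counts
```

Note that this still relies on every abbreviation in vec1 being an unambiguous prefix; a bare "t" or "s" would match more than one day.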
#Or using Bill's solution (dayNames is the vector of full weekday names from his post):
dayNames[pmatch(vec2New,dayNames,duplicates.ok=TRUE)]
# [1] "Friday" "Wednesday" "Friday" "Saturday" "Sunday" "Saturday"
#[7] "Saturday" "Sunday" "Thursday" "Friday" "Saturday"
table(dayNames[pmatch(vec2New,dayNames,duplicates.ok=TRUE)])
# Friday Saturday Sunday Thursday Wednesday
# 3 4 2 1 1
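For anyone running the pmatch() approach on its own, here is a self-contained sketch. The definition of `dayNames` below is an assumption (the original is not shown in this message), and `answers`, `parts`, `idx`, `days` and `dayCounts` are illustrative names. It compares in lowercase instead of capitalizing, and uses a factor so that days nobody mentioned show up with a count of zero.

```r
# Assumed definition; the original dayNames is not shown in this message
dayNames <- c("Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday")

answers <- c("friday", "Wednesday", "Friday,Saturday,sunday",
             "saturday", "sat,Sun", "th,F,Sa")
parts <- unlist(strsplit(answers, ","))

# Compare in lowercase so mixed-case input still matches
idx <- pmatch(tolower(parts), tolower(dayNames), duplicates.ok = TRUE)

# A factor with all seven levels keeps the days with no mentions
days <- factor(dayNames[idx], levels = dayNames)
dayCounts <- table(days)
dayCounts
```

pmatch() returns NA for an abbreviation that fits more than one day (for example a bare "t" or "s"), so it may be worth checking anyNA(idx) before trusting the counts.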
A.K.
----- Original Message -----
From: "Crombie, Burnette N" <bcrombie at utk.edu>
To: arun <smartpink111 at yahoo.com>
Cc:
Sent: Monday, June 17, 2013 4:12 PM
Subject: RE: [R] help with text patterns in strings
Arun, thanks. Your script achieves the goal I stated, but now I'm tweaking it as I see possible obstacles with my real data.
I anticipate the responses, since they are handwritten, will be a mixture of upper- and lowercase text, so I decided to prevent issues by using the "tolower()" function.
It did not work as I intended when editing your script (see below).
How do I use "tolower()" so that it saves the modification of my variable in the data frame?
Do I have to rename the original data frame in order to save my changes (create a new object)?
dat1<- data.frame(Ans=c("Friday","Wednesday","Friday,Saturday,Sunday","Saturday","Sat,Sun","Th,F,Sa"),stringsAsFactors=FALSE)
dat1
# Ans
# 1 Friday
# 2 Wednesday
# 3 Friday,Saturday,Sunday
# 4 Saturday
# 5 Sat,Sun
# 6 Th,F,Sa
tolower(dat1$Ans)
dat1
#the output I want:
# Ans
# 1 friday
# 2 wednesday
# 3 friday,saturday,sunday
# 4 saturday
# 5 sat,sun
# 6 th,f,sa
#but the real R output is not all lowercase
vec1<- c("su","m","tu","w","th","f","sa")
vec2<-unlist(strsplit(dat1$Ans,","))
vec2
#the output I want
#[1] "friday" "wednesday" "friday" "saturday" "sunday" "saturday"
#[7] "sat" "sun" "th" "f" "sa"
#but the real R output is not all lowercase
sapply(vec1,function(x) length(vec2[grep(x,vec2)]) )
#su m tu w th f sa
# 2 0 0 1 1 3 4
-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: Monday, June 17, 2013 3:15 PM
To: Crombie, Burnette N
Cc: R help
Subject: Re: [R] help with text patterns in strings
Hi,
Maybe this helps:
dat1<- data.frame(Ans=c("Friday","Wednesday","Friday,Saturday,Sunday","Saturday","Sat,Sun","Th,F,Sa"),stringsAsFactors=FALSE)
dat1
# Ans
# 1 Friday
# 2 Wednesday
# 3 Friday,Saturday,Sunday
# 4 Saturday
# 5 Sat,Sun
# 6 Th,F,Sa
vec1<- c("Su","M","Tu","W","Th","F","Sa")
vec2<-unlist(strsplit(dat1$Ans,","))
vec2
#[1] "Friday" "Wednesday" "Friday" "Saturday" "Sunday" "Saturday"
#[7] "Sat" "Sun" "Th" "F" "Sa"
sapply(vec1,function(x) length(vec2[grep(x,vec2)]) )
#Su M Tu W Th F Sa
# 2 0 0 1 1 3 4
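One detail that may matter with real data: the answers in the original question have a space after each comma ("Sat, Sun"), while the example data frame does not. A hedged sketch of splitting that tolerates the extra whitespace is below; `ans` and `pieces` are illustrative names, and trimws() needs R 3.2.0 or later.

```r
# Answers as written in the original question, with a space after each comma
ans <- c("Friday", "Wednesday", "Friday, Saturday, Sunday",
         "Saturday", "Sat, Sun", "Th, F, Sa")

# Split on a comma plus any whitespace after it, then trim whatever remains
pieces <- trimws(unlist(strsplit(ans, ",\\s*")))
pieces
```

Plain grep() would not be bothered by a leading space, but anchored patterns and pmatch() would fail on " Sun", so trimming first is the safer order.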
A.K.
----- Original Message -----
From: bcrombie <bcrombie at utk.edu>
To: r-help at r-project.org
Cc:
Sent: Monday, June 17, 2013 1:59 PM
Subject: [R] help with text patterns in strings
Let’s say I have a data set that includes a column of answers to a question “What days of the week are you most likely to eat steak?”.
The answers provided are [1] “Friday”, [2] “Wednesday”, [3] “Friday, Saturday, Sunday”, [4] “Saturday”, [5] “Sat, Sun”, [6] “Th, F, Sa”
How can I tell R to count “Friday, Saturday, Sunday”, “Sat, Sun”, and “Th, F, Sa” as three separate entries for each unique observation?
And is there a way to simultaneously tell R that, for example, “Friday” is the same as “Fri” or “F”; “Saturday” is the same as “Sat” or “Sa”; etc.?
Thanks for your assistance.
--
View this message in context: http://r.789695.n4.nabble.com/help-with-text-patterns-in-strings-tp4669714.html
Sent from the R help mailing list archive at Nabble.com.