[R] help with text patterns in strings

arun smartpink111 at yahoo.com
Mon Jun 17 22:37:49 CEST 2013


Hi,
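dat1$Ans<-tolower(dat1$Ans) #assign the result back; tolower() on its own returns a new vector and leaves dat1 unchanged
#But if you then count with lower-case patterns like this: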

 vec1<- c("su","m","tu","w","th","f","sa")
 vec2<-unlist(strsplit(dat1$Ans,","))
 sapply(vec1,function(x) length(vec2[grep(x,vec2)]) )
#su  m tu  w th  f sa
# 2  0  2  1  1  3  4
#This is incorrect: "tu" gets two matches from the "tu" inside sa"tu"rday
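One way to stop "tu" from matching inside "saturday" is to anchor each abbreviation to the start of a token with "^". This is a minimal sketch, assuming every abbreviation is a prefix of the full day name; the trimws() call (R >= 3.2.0) is an addition that strips the blank left after a comma in answers written as "Sat, Sun".

```r
# Store the lower-cased answers back in the data frame
dat1 <- data.frame(Ans = c("Friday", "Wednesday", "Friday,Saturday,Sunday",
                           "Saturday", "Sat,Sun", "Th,F,Sa"),
                   stringsAsFactors = FALSE)
dat1$Ans <- tolower(dat1$Ans)

vec1 <- c("su", "m", "tu", "w", "th", "f", "sa")
# trimws() removes any space left after a comma, e.g. "sat, sun"
vec2 <- trimws(unlist(strsplit(dat1$Ans, ",")))

# "^" anchors the pattern to the start of each token, so "tu"
# no longer matches the middle of "saturday"
counts <- sapply(vec1, function(x) sum(grepl(paste0("^", x), vec2)))
counts
#su  m tu  w th  f sa
# 2  0  0  1  1  3  4
```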



Instead:
#Suppose your data looks like this: 

dat2<- data.frame(Ans=c("friday","wednesday","Friday,Saturday,sunday","saturday","sat,Sun","th,F,Sa"),stringsAsFactors=FALSE)
vec2<- unlist(strsplit(dat2$Ans,","))
library(Hmisc)
vec2New<-capitalize(vec2) #capitalize() upper-cases the first letter of each string
vec2New
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday" 
# [7] "Sat"       "Sun"       "Th"        "F"         "Sa"       
vec1<- c("Su","M","Tu","W","Th","F","Sa")

sapply(vec1,function(x) length(vec2New[grep(x,vec2New)]) )

#Su  M Tu  W Th  F Sa 
# 2  0  0  1  1  3  4 
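If you would rather not load Hmisc, base R's grepl() accepts ignore.case = TRUE, so the case of the answers no longer matters at all (this would also cover all-caps answers such as "SATURDAY", which capitalize() leaves untouched). A sketch using the same dat2, keeping the "^" anchor so that abbreviations only match at the start of a token:

```r
dat2 <- data.frame(Ans = c("friday", "wednesday", "Friday,Saturday,sunday",
                           "saturday", "sat,Sun", "th,F,Sa"),
                   stringsAsFactors = FALSE)
tokens <- trimws(unlist(strsplit(dat2$Ans, ",")))

vec1 <- c("Su", "M", "Tu", "W", "Th", "F", "Sa")
# ignore.case = TRUE lets "Sa" match "sat", "Sat" and "SATURDAY" alike;
# "^" still restricts each match to the start of a token
counts <- sapply(vec1, function(x)
  sum(grepl(paste0("^", x), tokens, ignore.case = TRUE)))
counts
#Su  M Tu  W Th  F Sa
# 2  0  0  1  1  3  4
```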


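#Or using Bill's pmatch() solution (dayNames is assumed here to hold the full English day names):
dayNames <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
dayNames[pmatch(vec2New,dayNames,duplicates.ok=TRUE)]
# [1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday" 
# [7] "Saturday"  "Sunday"    "Thursday"  "Friday"    "Saturday" 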


table(dayNames[pmatch(vec2New,dayNames,duplicates.ok=TRUE)])
#   Friday  Saturday    Sunday  Thursday Wednesday 
#        3         4         2         1         1 
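pmatch() is case-sensitive, which is why capitalize() is needed above. A variant sketch lower-cases both sides instead, and turns the result into a factor with all seven days as levels so that days nobody chose still show up with a count of 0. The dayNames vector is again an assumed definition, not necessarily the one from Bill's post.

```r
# Assumed lookup table of full day names
dayNames <- c("Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday")

dat2 <- data.frame(Ans = c("friday", "wednesday", "Friday,Saturday,sunday",
                           "saturday", "sat,Sun", "th,F,Sa"),
                   stringsAsFactors = FALSE)
tokens <- trimws(unlist(strsplit(dat2$Ans, ",")))

# Compare in lower case so "sat" and "Sat" both partially match "saturday";
# duplicates.ok = TRUE lets many answers map to the same day
idx <- pmatch(tolower(tokens), tolower(dayNames), duplicates.ok = TRUE)
days <- dayNames[idx]

# All seven levels are kept, so Monday and Tuesday appear with count 0
dayCounts <- table(factor(days, levels = dayNames))
dayCounts
# Sunday 2, Monday 0, Tuesday 0, Wednesday 1, Thursday 1, Friday 3, Saturday 4
```

An ambiguous abbreviation such as "S" or "T" matches more than one day, so pmatch() returns NA for it and it is left out of the table; checking any(is.na(idx)) is a quick way to spot such answers.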



A.K.


----- Original Message -----
From: "Crombie, Burnette N" <bcrombie at utk.edu>
To: arun <smartpink111 at yahoo.com>
Cc: 
Sent: Monday, June 17, 2013 4:12 PM
Subject: RE: [R] help with text patterns in strings

Arun, thanks. Your script achieves the goal I stated, but now I'm tweaking it because I see possible obstacles with my real data.
Since the responses are handwritten, I anticipate they will be a mixture of upper- and lowercase text, so I decided to prevent issues by using the "tolower()" function.
It did not work as I intended when editing your script (see below).
How do I use "tolower()" so that the modification of my variable is saved in the data frame?
Do I have to rename the original data frame in order to save my changes (create a new object)?

dat1<- data.frame(Ans=c("Friday","Wednesday","Friday,Saturday,Sunday","Saturday","Sat,Sun","Th,F,Sa"),stringsAsFactors=FALSE)
dat1
#                 Ans
# 1                 Friday
# 2              Wednesday
# 3 Friday,Saturday,Sunday
# 4               Saturday
# 5                Sat,Sun
# 6                Th,F,Sa

tolower(dat1$Ans)
dat1
#the output I want:
#                 Ans
# 1                 friday
# 2              wednesday
# 3 friday,saturday,sunday
# 4               saturday
# 5                sat,sun
# 6                th,f,sa
#but the real R output is not all lowercase

vec1<- c("su","m","tu","w","th","f","sa")
vec2<-unlist(strsplit(dat1$Ans,","))

vec2
#the output I want
#[1] "friday"    "wednesday" "friday"    "saturday"  "sunday"    "saturday" 
#[7] "sat"       "sun"       "th"        "f"         "sa"
#but the real R output is not all lowercase

sapply(vec1,function(x) length(vec2[grep(x,vec2)]) )
#su  m tu  w th  f sa
# 2  0  0  1  1  3  4

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com] 
Sent: Monday, June 17, 2013 3:15 PM
To: Crombie, Burnette N
Cc: R help
Subject: Re: [R] help with text patterns in strings

Hi,
Maybe this helps:

dat1<- data.frame(Ans=c("Friday","Wednesday","Friday,Saturday,Sunday","Saturday","Sat,Sun","Th,F,Sa"),stringsAsFactors=FALSE)
 dat1
                     Ans
1                 Friday
2              Wednesday
3 Friday,Saturday,Sunday
4               Saturday
5                Sat,Sun
6                Th,F,Sa


 vec1<- c("Su","M","Tu","W","Th","F","Sa")
 vec2<-unlist(strsplit(dat1$Ans,","))

vec2

 #[1] "Friday"    "Wednesday" "Friday"    "Saturday"  "Sunday"    "Saturday" 
 #[7] "Sat"       "Sun"       "Th"        "F"         "Sa"
sapply(vec1,function(x) length(vec2[grep(x,vec2)]) )
#Su  M Tu  W Th  F Sa
# 2  0  0  1  1  3  4
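Because unlist() drops the link between each day and the row it came from, the counts above are for the whole column. If the days also need to be kept per observation, as the original question asks, one possible sketch is a "long" data frame with a respondent number next to each day. It assumes R >= 3.2.0 for lengths() and trimws(); the column names id and day are illustrative.

```r
dat1 <- data.frame(Ans = c("Friday", "Wednesday", "Friday,Saturday,Sunday",
                           "Saturday", "Sat,Sun", "Th,F,Sa"),
                   stringsAsFactors = FALSE)

# One list element per respondent, one string per day mentioned
parts <- strsplit(dat1$Ans, ",")

# Repeat each row number once per day so every day keeps its respondent
long <- data.frame(id  = rep(seq_len(nrow(dat1)), lengths(parts)),
                   day = trimws(unlist(parts)),
                   stringsAsFactors = FALSE)
long

# Number of days each respondent mentioned: 1 1 3 1 2 3
table(long$id)
```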

A.K.


----- Original Message -----
From: bcrombie <bcrombie at utk.edu>
To: r-help at r-project.org
Cc: 
Sent: Monday, June 17, 2013 1:59 PM
Subject: [R] help with text patterns in strings

Let’s say I have a data set that includes a column of answers to a question “What days of the week are you most likely to eat steak?”.
The answers provided are [1] “Friday”, [2] “Wednesday”, [3] “Friday, Saturday, Sunday”, [4] “Saturday”, [5] “Sat, Sun”, [6] “Th, F, Sa”.
How can I tell R to count “Friday, Saturday, Sunday”, “Sat, Sun”, and “Th, F, Sa” as separate entries, one per day named, within each observation?
And is there a way to simultaneously tell R that, for example, “Friday” is the same as “Fri” or “F”; “Saturday” is the same as “Sat” or “Sa”; etc.?
Thanks for your assistance.




--
View this message in context: http://r.789695.n4.nabble.com/help-with-text-patterns-in-strings-tp4669714.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.